[Interest] back and forth from UTF-8

Tue Dec 10 13:17:39 CET 2013

Hi,

I am working on a program that uses an SQLite database.  Default tables are
embedded in the program as comma-separated string constants; they are used
to create the first instance of the full database if one is not provided.

The software will be available in three languages, so I am using UTF-8 as
much as possible.

I have learned that string constants embedded in the code are not
automatically treated as UTF-8.  However, once the software converts those
values to UTF-8, they appear correctly, UTF-8 encoded, everywhere else I
read them.

The problem is this: if the software inserts those string constants into
the database as they are, I have to convert them to UTF-8 every time I
retrieve them.  If it converts them to UTF-8 before inserting them into the
database, they are already UTF-8 encoded when it retrieves them.

So I have opted for the second approach: the data is stored already
encoded in UTF-8.

But there are other string constants as well.  A log class sends messages
over the network for debugging purposes, and mixing string constants with
database values in those messages produces a mess.  The values from the
database come out correctly, because they are already UTF-8 encoded, but
the string constants used as parts of the log message come out as garbage
whenever they contain accented (non-ASCII) characters.

So, is there a way to have those string constants automatically treated as
UTF-8?  The source file itself is UTF-8 encoded, but that does not seem to
be the point.  Writing and executing that many conversions is a huge
burden, especially if there is a way to have those string constants
compiled as UTF-8.

I am using Qt 4.8.5 and QTextCodec as follows:

        QTextCodec *codec = QTextCodec::codecForName("UTF-8");
        QStringList mCommaSeparatedRegisters;
        mCommaSeparatedRegisters // "Class, Name, Index, Value"
            << codec->toUnicode("ConjuntoCamera,Aceleracao,0,88000")
            << codec->toUnicode("ConjuntoCamera,Aceleracao,1,88000")
            << codec->toUnicode("ConjuntoCamera,DefasagemCentroLente,0,32000")
            << codec->toUnicode("ConjuntoCamera,DefasagemCentroLente,1,32000")
            << codec->toUnicode("ConjuntoCamera,DistanciaReferenciaRelativa,0,500000")
            << codec->toUnicode("ConjuntoCamera,DistanciaReferenciaRelativa,1,-500000")
            ...
            << codec->toUnicode("MaquinaMontadora,NomeCliente,0,RAZÃO SOCIAL")  // this one must end up UTF-8 coded;
            << codec->toUnicode("MaquinaMontadora,NumeroSerie,0,NÚM. DE SÉRIE") // this one, too, and there are many others;
            ;

Then the string list is parsed into the database.

This way, the database is populated with UTF-8 encoded data; if I don't use
the QTextCodec, it is not.

In other parts of the program, for example:

        // "Valor obtido da operação" means "Value obtained from the operation"
        emit Log(DEBUG, QString("Valor obtido da operação: %1").arg(mDataBaseValue("MaquinaMontadora,NomeCliente,0")));

The word "operação" appears as garbage, but the value from the database
appears correctly, that is, "RAZÃO SOCIAL" (I hope everyone has an e-mail
client capable of displaying accented characters).

Thanks
Francisco