[Interest] Smaller QString Serialization

Josh jnfo-d at grauman.com
Tue Apr 15 07:39:44 CEST 2025


Hello all,

It looks like the standard QString serialization writes a 32-bit size 
(uses 0xFFFFFFFF for Null String, and uses 0xFFFFFFFE if 32-bit isn't big 
enough and then writes a 64-bit size), followed by 16 bits for each 
character.

To save space, I want to use a scheme that writes an 8-bit size and 
reserves the last four values (255, 254, 253, 252) for a null QString or 
for a following 16-bit, 32-bit, or 64-bit size, depending on how large the 
QString is, followed by UTF-8 data.
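
As a rough illustration of the size prefix described above, here is a minimal sketch in plain standard C++ rather than Qt, so it compiles on its own. The names Header, encodeHeader and decodeHeader are hypothetical, and big-endian byte order is an assumption (it matches QDataStream's default, but the scheme itself does not specify one).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Marker values from the scheme above; anything below 252 is the size itself.
constexpr std::uint8_t kNull   = 255;
constexpr std::uint8_t kSize16 = 254;
constexpr std::uint8_t kSize32 = 253;
constexpr std::uint8_t kSize64 = 252;

// Hypothetical helper type: either "null string" or a UTF-8 byte count.
struct Header {
    bool isNull;
    std::uint64_t size;
};

// Appends `value` as `bytes` big-endian bytes (assumed byte order).
inline void appendBigEndian(std::vector<std::uint8_t> &out, std::uint64_t value, int bytes)
{
    for (int shift = 8 * (bytes - 1); shift >= 0; shift -= 8)
        out.push_back(static_cast<std::uint8_t>(value >> shift));
}

// Encodes only the size prefix; the UTF-8 payload would follow it.
inline std::vector<std::uint8_t> encodeHeader(const Header &h)
{
    std::vector<std::uint8_t> out;
    if (h.isNull) {
        out.push_back(kNull);
    } else if (h.size < 252) {
        out.push_back(static_cast<std::uint8_t>(h.size));
    } else if (h.size <= 0xFFFFu) {
        out.push_back(kSize16);
        appendBigEndian(out, h.size, 2);
    } else if (h.size <= 0xFFFFFFFFu) {
        out.push_back(kSize32);
        appendBigEndian(out, h.size, 4);
    } else {
        out.push_back(kSize64);
        appendBigEndian(out, h.size, 8);
    }
    return out;
}

// Decodes a prefix produced by encodeHeader.
inline Header decodeHeader(const std::vector<std::uint8_t> &in)
{
    assert(!in.empty());
    const std::uint8_t first = in[0];
    if (first == kNull)
        return Header{true, 0};
    if (first < 252)
        return Header{false, first};
    const std::size_t bytes = (first == kSize16) ? 2 : (first == kSize32) ? 4 : 8;
    assert(in.size() >= 1 + bytes);
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < bytes; ++i)
        value = (value << 8) | in[1 + i];
    return Header{false, value};
}
```

With this layout the prefix costs 1 byte for null strings and for sizes below 252, then 3, 5, or 9 bytes for the larger size classes.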

This will make a null QString 1 byte instead of 4, a single Latin 
character like 'a' 2 bytes instead of 6, and most long Latin1 strings 
around half the size...
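
The size arithmetic can be checked with a small sketch. It assumes the standard layout described earlier (a 32-bit size plus 16 bits per character, ignoring the 64-bit extension) and that every character is ASCII, so one character is one UTF-8 byte. Latin-1 characters above 0x7F take two bytes in UTF-8, so they save nothing. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Bytes taken by the standard layout: 32-bit size field plus 2 bytes per UTF-16 code unit.
inline std::uint64_t standardBytes(bool isNull, std::uint64_t codeUnits)
{
    if (isNull)
        return 4;                            // just the 0xFFFFFFFF marker
    assert(2 * codeUnits < 0xFFFFFFFEu);     // this sketch ignores the 64-bit extension
    return 4 + 2 * codeUnits;
}

// Bytes taken by the compact layout: 1-byte prefix, optional wider size, then UTF-8 data.
inline std::uint64_t compactBytes(bool isNull, std::uint64_t utf8Bytes)
{
    if (isNull)
        return 1;                            // just the 255 marker
    if (utf8Bytes < 252)
        return 1 + utf8Bytes;                // size fits in the first byte
    if (utf8Bytes <= 0xFFFFu)
        return 3 + utf8Bytes;                // marker + 16-bit size
    if (utf8Bytes <= 0xFFFFFFFFu)
        return 5 + utf8Bytes;                // marker + 32-bit size
    return 9 + utf8Bytes;                    // marker + 64-bit size
}
```

For a single 'a' this gives 2 bytes against 6, and for 1000 ASCII characters 1003 bytes against 2004.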

I wrote a quick implementation, and it seems to work well.

My primary question is: is there a way to avoid doing a copy for the utf8 
data (the char *buffer in operator>> and the QByteArray utf8 in 
operator<<)?

Any other obvious ways to speed it up, or other suggestions?

Josh

QDataStream &operator<<(QDataStream &out, const QString &str)
{
   if(str.isNull())
     out << (quint8)255; //null marker
   else
   {
     QByteArray utf8=str.toUtf8();
     qsizetype size=utf8.size();
     if(size<252)
     {
       out << (quint8)size;
       out.writeRawData(utf8.data(), size);
     }
     else if(size<65536)
     {
       out << (quint8)254 << (quint16)size;
       out.writeRawData(utf8.data(), size);
     }
     else if(size<4294967296)
     {
       out << (quint8)253 << (quint32)size;
       out.writeRawData(utf8.data(), size);
     }
     else
     {
       //fixed-width size: qsizetype is only 32 bits on 32-bit builds
       out << (quint8)252 << (quint64)size;
       out.writeRawData(utf8.data(), size);
     }
   }
   return(out);
}

QDataStream &operator>>(QDataStream &in, QString &str)
{
   quint8 firstSize;
   in >> firstSize;
   if(firstSize==255) //null marker
     str = QString();
   else if(firstSize<252)
   {
     char* buffer = new char[firstSize];
     in.readRawData(buffer, firstSize);
     str = QString::fromUtf8(buffer, firstSize);
     delete[] buffer;
   }
   else if(firstSize==254)
   {
     quint16 secondSize;
     in >> secondSize;
     char* buffer = new char[secondSize];
     in.readRawData(buffer, secondSize);
     str = QString::fromUtf8(buffer, secondSize);
     delete[] buffer;
   }
   else if(firstSize==253)
   {
     quint32 secondSize;
     in >> secondSize;
     char* buffer = new char[secondSize];
     in.readRawData(buffer, secondSize);
     str = QString::fromUtf8(buffer, secondSize);
     delete[] buffer;
   }
   else if(firstSize==252)
   {
     quint64 secondSize; //fixed-width, so 32-bit and 64-bit builds read the same format
     in >> secondSize;
     char* buffer = new char[secondSize];
     in.readRawData(buffer, secondSize);
     str = QString::fromUtf8(buffer, secondSize);
     delete[] buffer;
   }
   return(in);
}

