[Development] RFC: Proposal for a semi-radical change in Qt APIs taking strings

Bubke Marco Marco.Bubke at theqtcompany.com
Thu Oct 15 10:00:03 CEST 2015


On October 15, 2015 08:45:30 Knoll Lars <Lars.Knoll at theqtcompany.com> wrote:

> On 14/10/15 23:51, "Bubke Marco" <Marco.Bubke at theqtcompany.com> wrote:
>
>>On October 14, 2015 23:10:26 Thiago Macieira <thiago.macieira at intel.com> wrote:
>>
>>> On Wednesday 14 October 2015 20:52:12 Bubke Marco wrote:
>>>> On October 14, 2015 22:13:11 Thiago Macieira <thiago.macieira at intel.com> wrote:
>>>> And I don't want a UTF-8-backed QString. For my use cases implicit
>>>> sharing is overkill. Move semantics would be enough. I want
>>>> localeAwareCompare(const char *s1, const char *s2).
>>>
>>> Do it on your own. You just said that ICU has the function you want, so
>>> use it.
>>
>>So Qt is always shipping with ICU?
>
> No, we wanted to do this at some point, but it turns out that it’s not
> possible to rely on it on all platforms.

Is there another UTF-8 backend on Windows? I heard that Microsoft is now using it on some platforms too.

>> 
>>
>>> Qt does not have to provide a comparator that operates on something
>>> other than its native string type.
>>
>>Isn't Qt a framework to help developers? Sorry, but your argumentation
>>does not sound very empirical.
>
> Of course our aim should be to help developers. But there will always be
> some use cases which we will not cover. The question is whether this is
> one of them or not.

Most file and network content is in UTF-8, and so is most database content. For most cases it simply has a size and performance advantage: there are not many texts that consist purely of Chinese characters; mostly they are a mixture. On Linux, which is very important for embedded, UTF-8 dominates the APIs. Ask yourself whether we really don't want to support that. We could start simply and expand slowly. If the standard library supported UTF-8 collation well on all platforms we could skip it, but today you have to write your own solution again and again.
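To put rough numbers on the size argument, here is a minimal C++ sketch that counts how many bytes a sequence of code points needs in UTF-8 and in UTF-16. The helpers `utf8Length`, `utf16Length` and `encodedSizes` are hypothetical illustrations, not Qt API, and the input is assumed to be already decoded into valid code points.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Bytes needed to store one code point in UTF-8 (1 to 4 bytes).
std::size_t utf8Length(char32_t cp)
{
    assert(cp <= 0x10FFFF && "not a Unicode code point");
    if (cp < 0x80) return 1;     // ASCII
    if (cp < 0x800) return 2;    // Latin, Greek, Cyrillic, ...
    if (cp < 0x10000) return 3;  // rest of the BMP, including CJK
    return 4;                    // supplementary planes
}

// Bytes needed in UTF-16: one 16-bit unit, or a surrogate pair above the BMP.
std::size_t utf16Length(char32_t cp)
{
    return cp < 0x10000 ? 2 : 4;
}

struct EncodedSizes { std::size_t utf8; std::size_t utf16; };

// Total encoded size of a string in both encodings.
EncodedSizes encodedSizes(const std::u32string &text)
{
    EncodedSizes sizes{0, 0};
    for (char32_t cp : text) {
        sizes.utf8 += utf8Length(cp);
        sizes.utf16 += utf16Length(cp);
    }
    return sizes;
}
```

For mostly-ASCII markup with a few CJK characters, such as `U"<html>\u4E2D\u6587</html>"`, this gives 19 bytes in UTF-8 versus 30 in UTF-16; for the two CJK characters alone it is 6 versus 4, which is the case where UTF-16 is smaller.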

>> 
>>
>>>
>>>> Maybe Windows and Mac OS will bring support to the standard library so
>>>> we don't need it, but in the meantime it would be very helpful.
>>>> 
>>>> A UTF-8-based QTextDocument would maybe be nice too.
>>>
>>> What for? It needs to keep a lot of extra structures, so the cost of
>>> conversion and extra memory is minimal. And besides, QTextDocument
>>> really needs a seekable string, not UTF-8.
>>
>>Is UTF-16 seekable? You still have surrogates, and code points can be
>>combined.
>
> For the most parts. When it comes to positioning cursors inside the text,
> you’ll always need to take care of complex text layouting, diacritics and
> (in the case of utf16) surrogates. Still, a lot of the seeking is probably
> easier with utf16 than with utf8.

Actually, I would like to see measurements. Has anybody seen any? What are Chrome and Mozilla doing?
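As a small illustration of the seeking question (not a measurement), this C++ sketch counts code points in UTF-16 and in UTF-8. In both encodings the number of code units differs from the number of code points once the text leaves the BMP (UTF-16) or ASCII (UTF-8), so in the general case mapping a code point index to an offset needs a scan either way. The function names are hypothetical, and neither function handles combining sequences: `e` followed by U+0301 still counts as two code points.

```cpp
#include <cstddef>
#include <string_view>

// Count code points in UTF-16: a high surrogate (0xD800-0xDBFF) followed by
// a low surrogate (0xDC00-0xDFFF) forms one code point; every other unit
// counts as one.
std::size_t codePointsUtf16(std::u16string_view s)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char16_t u = s[i];
        if (u >= 0xD800 && u <= 0xDBFF && i + 1 < s.size()
            && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF)
            ++i; // skip the low surrogate of the pair
        ++count;
    }
    return count;
}

// Count code points in UTF-8: every byte that is not a continuation byte
// (10xxxxxx) starts a new code point. Assumes well-formed input.
std::size_t codePointsUtf8(std::string_view s)
{
    std::size_t count = 0;
    for (char c : s)
        if ((static_cast<unsigned char>(c) & 0xC0) != 0x80)
            ++count;
    return count;
}
```

For example, `u"a\U0001F600"` has 3 UTF-16 code units but 2 code points; the same text in UTF-8 is 5 bytes and also 2 code points.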

>> 
>>
>>Let's describe an example. I send the QTextDocument content to a library
>>which expects UTF-8 content and gives me back positions. This gets
>>interesting if you use non-ASCII characters. Actually, the new clang code
>>model works that way.
>
> We also have the opposite case, where we need to send utf16 to a 3rd party
> or system library, and get back positions. Unfortunately, not all APIs
> take the same encoding.

Yes, but in my experience C APIs use UTF-8 or support all UTF encodings. There are some older ones that use wchar_t, but that is a different story anyway. It may be different for Java etc., but I don't think people who use Qt use many Java APIs. ;-)
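The clang case boils down to translating offsets between encodings. Here is a minimal C++ sketch, assuming the library reports byte offsets into well-formed UTF-8 that fall on code point boundaries; `utf8OffsetToUtf16Offset` is a hypothetical helper, not an existing Qt or libclang function.

```cpp
#include <cstddef>
#include <string_view>

// Convert a byte offset into a UTF-8 buffer into the equivalent UTF-16
// code-unit offset, which is what a QString-based editor works with.
// Assumes well-formed UTF-8 and an offset on a code point boundary.
std::size_t utf8OffsetToUtf16Offset(std::string_view utf8, std::size_t byteOffset)
{
    std::size_t utf16Offset = 0;
    for (std::size_t i = 0; i < byteOffset && i < utf8.size(); ++i) {
        unsigned char b = static_cast<unsigned char>(utf8[i]);
        if ((b & 0xC0) == 0x80)
            continue; // continuation byte: no new code point starts here
        // A lead byte of 0xF0 or above starts a 4-byte sequence, i.e. a code
        // point outside the BMP, which needs a surrogate pair in UTF-16.
        utf16Offset += (b >= 0xF0) ? 2 : 1;
    }
    return utf16Offset;
}
```

For `"int \xC3\xA4 = 0;"` (`int ä = 0;`), byte offset 7 (the `=`) maps to UTF-16 offset 6. The reverse direction, for libraries that take UTF-16, is the same walk over UTF-16 code units.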

>> 
>>>
>>> Even if we provide UTF-8 support classes, those will not propagate to
>>> the GUI. Forget it.
>>
>>What about compressing UTF-16 the way Python does it for UTF-32? If you
>>are only using ASCII, you set a flag and can remove all those useless
>>zeros. It would have implications for data(), but maybe we should not
>>provide access to the internal representation. If you use UTF-32 as the
>>base, you no longer need surrogates.
>
> That’s back to a mixed representation in QString. I personally think that
> combines the worst of both worlds.
>
I am not so sure either, but I would like to have an open discussion about it, with examples of other use cases: how other programs like Mozilla and Chrome handle it, and what other languages support. How Internet-centric solutions handle it would be especially interesting.
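For discussion, here is a minimal C++ sketch of the flag-based idea, modelled on Python 3.3's flexible string representation (PEP 393), which stores a string with 1, 2 or 4 bytes per code point depending on its largest code point. `CompactString` is a hypothetical class for illustration, not a proposal for QString's actual layout.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

// Hypothetical CompactString: picks the narrowest fixed-width storage that
// can hold every code point. Each storage is fixed width, so indexing by
// code point is O(1) and no surrogates are needed.
class CompactString
{
public:
    explicit CompactString(const std::u32string &text)
    {
        char32_t maxCp = 0;
        for (char32_t cp : text)
            maxCp = cp > maxCp ? cp : maxCp;

        if (maxCp <= 0xFF)
            m_data = narrow<std::uint8_t>(text);   // Latin-1: 1 byte each
        else if (maxCp <= 0xFFFF)
            m_data = narrow<std::uint16_t>(text);  // BMP only: 2 bytes each
        else
            m_data = std::vector<char32_t>(text.begin(), text.end()); // 4 bytes
    }

    std::size_t size() const
    {
        return std::visit([](const auto &v) { return v.size(); }, m_data);
    }

    // Code point at index i, whatever the storage width (throws if out of range).
    char32_t at(std::size_t i) const
    {
        return std::visit([i](const auto &v) { return char32_t(v.at(i)); }, m_data);
    }

    // Bytes per code point used by the chosen representation.
    std::size_t bytesPerCodePoint() const
    {
        return std::visit([](const auto &v) {
            return sizeof(typename std::decay_t<decltype(v)>::value_type);
        }, m_data);
    }

private:
    // Copy code points into a narrower element type; the caller guarantees
    // that every code point fits.
    template <typename T>
    static std::vector<T> narrow(const std::u32string &text)
    {
        std::vector<T> out;
        out.reserve(text.size());
        for (char32_t cp : text)
            out.push_back(static_cast<T>(cp));
        return out;
    }

    std::variant<std::vector<std::uint8_t>,
                 std::vector<std::uint16_t>,
                 std::vector<char32_t>> m_data;
};
```

Indexing stays O(1) and there are no surrogates, but the sketch also shows Lars's concern: there is no single internal buffer that a data() accessor could hand out as UTF-16, so such a call would need a conversion whenever the narrow or wide storage is in use.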


