[Development] Utf8 local aware compare

Bubke Marco Marco.Bubke at digia.com
Thu Sep 11 14:53:45 CEST 2014


What about icu::Collator::compareUTF8() in ICU?

http://userguide.icu-project.org/strings/utf-8:
"ICU has some APIs dedicated for UTF-8. They tend to have been added for "worker functions" like comparing strings, to avoid the string conversion overhead, rather than for "builder functions" like factory methods and attribute setters.

For example, icu::Collator::compareUTF8() compares two UTF-8 strings incrementally, without converting all of the two strings to UTF-16 if there is an early base letter difference."

My understanding is that this works if your UTF-8 characters are precomposed.
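
To illustrate why precomposition might matter here, a minimal sketch (plain C++, no ICU or Qt; the helper names are made up for this example): the same visible text "é" has two valid UTF-8 encodings, so anything that looks only at bytes sees them as different strings.

```cpp
#include <string>

// "é" as a single precomposed code point, U+00E9 (UTF-8 bytes: C3 A9).
inline std::string precomposedEAcute()
{
    return "\xC3\xA9";
}

// "é" as base letter U+0065 followed by combining acute accent U+0301
// (UTF-8 bytes: 65 CC 81).
inline std::string decomposedEAcute()
{
    return "e\xCC\x81";
}

// Byte-wise equality, which is all that memcmp/strcmp-style code can see.
inline bool bytewiseEqual(const std::string &a, const std::string &b)
{
    return a == b;
}
```

Here bytewiseEqual(precomposedEAcute(), decomposedEAcute()) is false although both strings render as the same letter, so any byte-level shortcut presumably has to assume one consistent normalization form in the stored data.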
________________________________________
From: Knoll Lars
Sent: Thursday, September 11, 2014 2:09 PM
To: Bubke Marco; development at qt-project.org
Subject: Re: [Development] Utf8 local aware compare

Hi Marco,

On 11/09/14 13:23, "Bubke Marco" <Marco.Bubke at digia.com> wrote:
>Okay, first the context. I want to write locale-aware collations
>for a local database that stores its text in UTF-8, so always converting
>the text with QString::fromUtf8 is a little expensive. Under Unix,
>strcoll would work, but Windows has no UTF-8 collation. One idea to get
>around that problem is to convert character by character to UTF-16 and
>then compare. Is there some internal API where I could do the conversion
>per character (not per byte)?

There’s no real way to do this. Collation needs to work on whole strings,
and can’t compare byte by byte. And the platform APIs on Windows, Mac and
ICU are all UTF-16 based. So the API below wouldn’t help as we’d still have
to convert the string to UTF-16 before passing it into the platform APIs.
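
For illustration only, the whole-string conversion step mentioned above could look roughly like the sketch below. It is plain standard C++ and not Qt's implementation (in Qt the step is QString::fromUtf8); decodeOne and utf8ToUtf16 are hypothetical names, and the decoder is simplified (it does not reject overlong encodings). The point is that the full string is walked and a new buffer allocated before any UTF-16 based collation API can be called.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decodes one code point starting at s[i] and
// advances i past it. Malformed input yields U+FFFD.
inline char32_t decodeOne(const std::string &s, std::size_t &i)
{
    const unsigned char b0 = static_cast<unsigned char>(s[i++]);
    if (b0 < 0x80)
        return b0; // ASCII
    int extra = 0;
    char32_t cp = 0;
    if ((b0 & 0xE0) == 0xC0) { extra = 1; cp = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { extra = 2; cp = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { extra = 3; cp = b0 & 0x07; }
    else return 0xFFFD; // stray continuation byte or invalid lead byte
    for (int k = 0; k < extra; ++k) {
        if (i >= s.size())
            return 0xFFFD; // truncated sequence
        const unsigned char b = static_cast<unsigned char>(s[i]);
        if ((b & 0xC0) != 0x80)
            return 0xFFFD; // not a continuation byte; leave it unconsumed
        cp = (cp << 6) | (b & 0x3F);
        ++i;
    }
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0xFFFD; // outside the Unicode range or a surrogate
    return cp;
}

// Hypothetical helper: converts a whole UTF-8 string to UTF-16.
inline std::u16string utf8ToUtf16(const std::string &utf8)
{
    std::u16string out;
    out.reserve(utf8.size()); // UTF-16 never needs more units than UTF-8 bytes
    std::size_t i = 0;
    while (i < utf8.size()) {
        char32_t cp = decodeOne(utf8, i);
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000; // encode as a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

The resulting std::u16string is what a UTF-16 based collation function would then receive; comparing the converted strings with operator< would only give code-unit order, not locale-aware order.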

Lars

>
>
>Anyway, best would be:
>
>QCollator::compare(const char *utf8String1, const char *utf8String2,
>                   size_t size1=-1, size_t size2=-1) const
>(see http://qt-project.org/doc/qt-5/qcollator.html#compare)
>
>Thank you, Marco