[Development] toLower() vs. toCaseFolded()
Thiago Macieira
thiago.macieira at intel.com
Wed Dec 9 19:09:14 CET 2015
On Wednesday 09 December 2015 11:54:54 Marc Mutz wrote:
> Hi,
>
> http://doc.qt.io/qt-5/qstring.html#toCaseFolded is very vague on what case
> folding actually _is_ and how it's different from toLower().
>
> Can someone please tell me the difference and why toCaseFolded() exists in
> the first place? Is it faster? Is it guaranteed to not make the string
> longer/shorter?
It's a different form of case transformation in Unicode. If you care about it,
you've probably read some document that said that you should do case folding.
An example is the "stringprep" / "nameprep" step of Internationalised Domain
Names: it requires case folding.
The mappings are listed in the Unicode data file CaseFolding.txt.
For example, İ (U+0130) has the simple lowercase mapping i, but its full case
folding is i + combining dot above (U+0069 U+0307).
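To make the lowercase/case-fold distinction concrete, here is a minimal sketch in Python rather than Qt. It relies on Python's built-in str.lower() and str.casefold(), which follow the Unicode full mappings; QString::toLower() and QString::toCaseFolded() are not guaranteed to produce the same results. The helper name caseless_equal is made up for illustration.

```python
# Illustration with Python's built-in string methods, not Qt's implementation.

def caseless_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after Unicode case folding."""
    return a.casefold() == b.casefold()


sharp_s = "\u00df"  # LATIN SMALL LETTER SHARP S

lowered = sharp_s.lower()    # already lowercase, so lower() leaves it unchanged
folded = sharp_s.casefold()  # full case folding maps it to "ss"

print(repr(lowered), repr(folded))

# Lowercasing both sides is not enough for a caseless match here ...
print("Straße".lower() == "STRASSE".lower())
# ... while case folding both sides is.
print(caseless_equal("Straße", "STRASSE"))
```

In this implementation the folded string is one character longer than the input, so full case folding does not preserve length in general.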
Also, don't confuse case folding with title-casing. Title case is the
difference between U+01C7 (LJ), U+01C8 (Lj) and U+01C9 (lj). The mapping is
found in the third of the case-mapping columns (the last column) in
UnicodeData.txt:
01C7;LATIN CAPITAL LETTER LJ;Lu;0;L;<compat> 004C 004A;;;;N;LATIN CAPITAL LETTER L J;;;01C9;01C8
01C8;LATIN CAPITAL LETTER L WITH SMALL LETTER J;Lt;0;L;<compat> 004C 006A;;;;N;LATIN LETTER CAPITAL L SMALL J;;01C7;01C9;01C8
01C9;LATIN SMALL LETTER LJ;Ll;0;L;<compat> 006C 006A;;;;N;LATIN SMALL LETTER L J;;01C7;;01C8
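As a rough sketch of how to read those records, the following Python splits each one on ";" and picks out the last three fields, which hold the simple uppercase, lowercase and titlecase mappings. An empty field means the character maps to itself. The names RECORDS and case_columns are made up for illustration, and the final check uses Python's own str.title(), not anything from Qt.

```python
# The three UnicodeData.txt records quoted above, one record per line.
RECORDS = [
    "01C7;LATIN CAPITAL LETTER LJ;Lu;0;L;<compat> 004C 004A;;;;N;"
    "LATIN CAPITAL LETTER L J;;;01C9;01C8",
    "01C8;LATIN CAPITAL LETTER L WITH SMALL LETTER J;Lt;0;L;<compat> 004C 006A;;;;N;"
    "LATIN LETTER CAPITAL L SMALL J;;01C7;01C9;01C8",
    "01C9;LATIN SMALL LETTER LJ;Ll;0;L;<compat> 006C 006A;;;;N;"
    "LATIN SMALL LETTER L J;;01C7;;01C8",
]


def case_columns(record: str) -> dict:
    """Hypothetical helper: extract category and simple case mappings."""
    fields = record.split(";")
    code = fields[0]
    return {
        "code": code,
        "category": fields[2],
        # An empty mapping field means "maps to itself".
        "upper": fields[12] or code,
        "lower": fields[13] or code,
        "title": fields[14] or code,
    }


table = {row["code"]: row for row in map(case_columns, RECORDS)}

for row in table.values():
    print(row["code"], row["category"],
          "upper=" + row["upper"], "lower=" + row["lower"], "title=" + row["title"])

# Python's str.title() applies the same titlecase mapping to the first letter.
print("\u01c9".title() == chr(int(table["01C9"]["title"], 16)))
```

All three characters share the titlecase mapping U+01C8, which is distinct from both the uppercase form U+01C7 and the lowercase form U+01C9.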
> Preferably answer by updating the docs to be a bit more descriptive.
I'd like Konstantin to do that, as he might have a better idea than "Unicode
says so"...
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel Open Source Technology Center