[Development] toLower() vs. toCaseFolded()

Thiago Macieira thiago.macieira at intel.com
Wed Dec 9 19:09:14 CET 2015


On Wednesday 09 December 2015 11:54:54 Marc Mutz wrote:
> Hi,
> 
> http://doc.qt.io/qt-5/qstring.html#toCaseFolded is very vague on what case
> folding actually _is_ and how it's different from toLower().
> 
> Can someone please tell me the difference and why toCaseFolded() exists in
> the first place? Is it faster? Is it guaranteed to not make the string
> longer/shorter?

It's a separate case transformation defined by Unicode, intended for caseless 
comparison rather than for display. If you care about it, you've probably read 
a document that told you to do case folding. An example is the "stringprep" / 
"nameprep" step of Internationalised Domain Names: it requires case folding.

The mappings are given in the Unicode file CaseFolding.txt.

For example, İ (U+0130) lowercases to i, but casefolds to i + combining dot 
above (U+0069 U+0307).
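
To make the İ example concrete, here is a minimal sketch in Python that reads the two mappings straight from data-file lines. The lines are typed in by hand in the formats of CaseFolding.txt and UnicodeData.txt and should be checked against the published files; the helper names are made up for illustration, and none of this is meant to describe how QString implements toLower() or toCaseFolded().

```python
# Entries typed in by hand in the formats of CaseFolding.txt and
# UnicodeData.txt (an assumption: verify against the published files).
CASE_FOLDING = """\
0130; F; 0069 0307; # LATIN CAPITAL LETTER I WITH DOT ABOVE
0130; T; 0069; # LATIN CAPITAL LETTER I WITH DOT ABOVE
"""
UNICODE_DATA_0130 = (
    "0130;LATIN CAPITAL LETTER I WITH DOT ABOVE;Lu;0;L;0049 0307;;;;N;"
    "LATIN CAPITAL LETTER I DOT;;;0069;"
)


def to_text(hex_codes):
    """Turn a field such as '0069 0307' into the string it denotes."""
    return "".join(chr(int(h, 16)) for h in hex_codes.split())


def parse_case_folding(text):
    """Return {(character, status): folded string} for each data line."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not line:
            continue
        code, status, mapping = [f.strip() for f in line.split(";")[:3]]
        table[(chr(int(code, 16)), status)] = to_text(mapping)
    return table


def simple_lowercase(record):
    """Field 13 (0-based) of a UnicodeData.txt record is the simple
    lowercase mapping; an empty field means the character maps to itself."""
    fields = record.split(";")
    return to_text(fields[13]) or chr(int(fields[0], 16))


folds = parse_case_folding(CASE_FOLDING)
lower = simple_lowercase(UNICODE_DATA_0130)
full_fold = folds[("\u0130", "F")]  # status F = full case folding

print([f"U+{ord(c):04X}" for c in lower])      # ['U+0069']
print([f"U+{ord(c):04X}" for c in full_fold])  # ['U+0069', 'U+0307']
```

The second entry, with status T, is the special folding for Turkic languages, which maps İ to a plain i. Note also that the full folding turns one code point into two, so a full case fold can make a string longer.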

Also, don't confuse case folding with title-casing. Title case is the 
difference between LJ, Lj, and lj, and its mapping is the third of the three 
case-mapping fields (the last field of each record) in UnicodeData.txt.

01C7;LATIN CAPITAL LETTER LJ;Lu;0;L;<compat> 004C 004A;;;;N;LATIN CAPITAL LETTER L J;;;01C9;01C8
01C8;LATIN CAPITAL LETTER L WITH SMALL LETTER J;Lt;0;L;<compat> 004C 006A;;;;N;LATIN LETTER CAPITAL L SMALL J;;01C7;01C9;01C8
01C9;LATIN SMALL LETTER LJ;Ll;0;L;<compat> 006C 006A;;;;N;LATIN SMALL LETTER L J;;01C7;;01C8
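
The three case-mapping fields of those records can be pulled out with a few lines of Python. This is only a sketch: the function name is hypothetical, and the fallback rules (an empty uppercase or lowercase field means the character maps to itself, an empty titlecase field means "same as uppercase") follow my reading of the UnicodeData.txt format description.

```python
# The three records quoted above, one per line.
RECORDS = """\
01C7;LATIN CAPITAL LETTER LJ;Lu;0;L;<compat> 004C 004A;;;;N;LATIN CAPITAL LETTER L J;;;01C9;01C8
01C8;LATIN CAPITAL LETTER L WITH SMALL LETTER J;Lt;0;L;<compat> 004C 006A;;;;N;LATIN LETTER CAPITAL L SMALL J;;01C7;01C9;01C8
01C9;LATIN SMALL LETTER LJ;Ll;0;L;<compat> 006C 006A;;;;N;LATIN SMALL LETTER L J;;01C7;;01C8
"""


def case_mappings(record):
    """Return (code, upper, lower, title) as hex strings for one record.

    Fields 12, 13 and 14 (0-based) hold the simple uppercase, lowercase
    and titlecase mappings.
    """
    fields = record.split(";")
    code = fields[0]
    upper = fields[12] or code   # empty: already uppercase
    lower = fields[13] or code   # empty: already lowercase
    title = fields[14] or upper  # empty: same as the uppercase mapping
    return code, upper, lower, title


table = [case_mappings(r) for r in RECORDS.splitlines()]
for code, upper, lower, title in table:
    print(f"U+{code}: upper=U+{upper} lower=U+{lower} title=U+{title}")
```

All three characters end up with the same triple: uppercase U+01C7 (LJ), lowercase U+01C9 (lj), titlecase U+01C8 (Lj).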


> Preferably answer by updating the docs to be a bit more descriptive.

I'd like Konstantin to do that, as he might have a better idea than "Unicode 
says so"...

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center



