[Development] Qt::CaseInsensitive comparison is not the same as toLower() comparison
Thiago Macieira
thiago.macieira at intel.com
Wed Feb 10 19:27:04 CET 2016
Hi all
(especially Konstantin!)
When reviewing a change, I noticed that QString::startsWith with
CaseInsensitive compares things like this:
    if (foldCase(data[i]) != foldCase((ushort)latin[i]))
        return false;
with foldCase() being convertCase_helper<QUnicodeTables::CasefoldTraits>(ch),
whereas toLower() uses QUnicodeTables::LowercaseTraits.
There's a slight but important difference for a few character pairs; see below.
The code has been like that since forever. So I have to ask:
=> Is this intended?
If you write code like:
    qDebug() << a.startsWith(b, Qt::CaseInsensitive)
             << (a.toLower() == b.toLower());
You'll get a different result for pairs such as the following (see
util/unicode/data/CaseFolding.txt for more):
µ U+00B5 MICRO SIGN
μ U+03BC GREEK SMALL LETTER MU
s U+0073 LATIN SMALL LETTER S
ſ U+017F LATIN SMALL LETTER LONG S
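
To make the difference concrete, here is a minimal standalone sketch. It is
not Qt's implementation: toLowerSimple(), foldCaseSimple(), equalMapped() and
checkPairs() are hypothetical names, and the mappings are hand-written for the
two pairs above only (plus ASCII), where the real code consults the full
Unicode tables.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Lowercasing, restricted to ASCII for this sketch. U+00B5, U+03BC, U+0073
// and U+017F are all lowercase already, so they map to themselves.
inline char32_t toLowerSimple(char32_t c)
{
    if (c >= U'A' && c <= U'Z')
        return static_cast<char32_t>(c + (U'a' - U'A'));
    return c;
}

// Simple case folding: same as lowercasing here, except for the two
// code points that CaseFolding.txt folds onto a different letter.
inline char32_t foldCaseSimple(char32_t c)
{
    switch (c) {
    case 0x00B5: return 0x03BC; // MICRO SIGN -> GREEK SMALL LETTER MU
    case 0x017F: return 0x0073; // LATIN SMALL LETTER LONG S -> s
    default:     return toLowerSimple(c);
    }
}

using Mapper = char32_t (*)(char32_t);

// Compare two strings code point by code point after applying `map`.
inline bool equalMapped(const std::u32string &a, const std::u32string &b,
                        Mapper map)
{
    if (a.size() != b.size())
        return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (map(a[i]) != map(b[i]))
            return false;
    }
    return true;
}

// The two pairs listed above: equal after folding, different after lowering.
inline void checkPairs()
{
    assert(equalMapped(U"\u00B5", U"\u03BC", foldCaseSimple));
    assert(!equalMapped(U"\u00B5", U"\u03BC", toLowerSimple));
    assert(equalMapped(U"\u017F", U"s", foldCaseSimple));
    assert(!equalMapped(U"\u017F", U"s", toLowerSimple));
}
```
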
And then there are the differences between toUpper and toLower. The following
pairs compare false with toLower(), compare true with toUpper(), currently
compare false with CaseInsensitive/toCaseFolded() but *should* compare
true[1]:
ß U+00DF LATIN SMALL LETTER SHARP S
ẞ U+1E9E LATIN CAPITAL LETTER SHARP S
SS
ŉ U+0149 LATIN SMALL LETTER N PRECEDED BY APOSTROPHE
ʼN
ﬀ U+FB00 LATIN SMALL LIGATURE FF
FF
[1] CaseFolding.txt says:
# The data supports both implementations that require simple case foldings
# (where string lengths don't change), and implementations that allow full
# case folding (where string lengths may grow). Note that where they can be
# supported, the full case foldings are superior: for example, they allow
# "MASSE" and "Maße" to match.
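
As an illustration of what full case folding would mean, here is a small
standalone sketch. It is an assumption-laden toy, not a proposed patch:
foldCaseFull() and checkFullFolding() are hypothetical names, and only the
full foldings for U+00DF, U+1E9E, U+0149 and U+FB00 (plus ASCII) are
hard-coded. The point is that the output can be longer than the input, so the
comparison has to happen on the folded strings, not position by position.

```cpp
#include <cassert>
#include <string>

// Full case folding for a handful of code points; everything else is
// only ASCII-lowercased in this sketch.
inline std::u32string foldCaseFull(const std::u32string &in)
{
    std::u32string out;
    for (char32_t c : in) {
        switch (c) {
        case 0x00DF: // LATIN SMALL LETTER SHARP S
        case 0x1E9E: // LATIN CAPITAL LETTER SHARP S
            out += U"ss";
            break;
        case 0x0149: // LATIN SMALL LETTER N PRECEDED BY APOSTROPHE
            out += U"\u02BCn";
            break;
        case 0xFB00: // LATIN SMALL LIGATURE FF
            out += U"ff";
            break;
        default:
            out += (c >= U'A' && c <= U'Z') ? static_cast<char32_t>(c + 32) : c;
            break;
        }
    }
    return out;
}

// The example from CaseFolding.txt, plus the groups listed earlier.
inline void checkFullFolding()
{
    assert(foldCaseFull(U"MASSE") == foldCaseFull(U"Ma\u00DFe"));
    assert(foldCaseFull(U"\u00DF") == foldCaseFull(U"SS"));
    assert(foldCaseFull(U"\u1E9E") == foldCaseFull(U"SS"));
    assert(foldCaseFull(U"\u0149") == foldCaseFull(U"\u02BCN"));
    assert(foldCaseFull(U"\uFB00") == foldCaseFull(U"FF"));
}
```

Because the folded form of a one-character string can be two characters long,
a loop like the one in startsWith() that compares data[i] against latin[i]
cannot implement this without restructuring.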
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel Open Source Technology Center