[Development] Oslo, we have a problem</apollo 13> [char8_t]
Thiago Macieira
thiago.macieira at intel.com
Mon Jul 8 17:31:23 CEST 2019
On Monday, 8 July 2019 10:53:42 -03 Konstantin Ritt wrote:
> > See my reply to Marc: users want US-ASCII case-insensitive text matching
> > and case folding routines, for network protocols that are US-ASCII
> > case-insensitive (DNS, IRC, etc.).
>
> That's exactly what strnicmp() and std::toupper()/std::tolower() are for.
No, those are exactly what they are NOT for.

First, those are locale-dependent and should not be used unless you control
the locale or you specifically want to interpret your 8-bit content using the
system locale's codec. On most modern Unix systems, that's UTF-8. But it's not
uncommon to find applications run with LC_ALL=C, which forces those functions
to US-ASCII behaviour.
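
For protocols such as DNS, where the result must not follow the user's locale, one minimal alternative is to fold only the 26 ASCII letters yourself and leave every other byte alone. The sketch below assumes C++17; `asciiToLower` and `asciiCompareCaseInsensitive` are hypothetical names, not a Qt or standard library API.

```cpp
#include <cstddef>
#include <string_view>

// Fold only 'A'..'Z'; every other byte (digits, punctuation, UTF-8 lead and
// continuation bytes) passes through unchanged. No locale is consulted.
constexpr unsigned char asciiToLower(unsigned char c) noexcept
{
    return (c >= 'A' && c <= 'Z') ? static_cast<unsigned char>(c - 'A' + 'a') : c;
}

// Three-way comparison with the same sign convention as strcasecmp()/strnicmp(),
// but with fixed US-ASCII semantics: the result does not depend on
// setlocale() or LC_ALL.
constexpr int asciiCompareCaseInsensitive(std::string_view a, std::string_view b) noexcept
{
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        const int ca = asciiToLower(static_cast<unsigned char>(a[i]));
        const int cb = asciiToLower(static_cast<unsigned char>(b[i]));
        if (ca != cb)
            return ca < cb ? -1 : 1;
    }
    if (a.size() == b.size())
        return 0;
    return a.size() < b.size() ? -1 : 1; // shorter string sorts first
}

// These hold at compile time, so no runtime locale can change them.
static_assert(asciiCompareCaseInsensitive("I", "i") == 0);
static_assert(asciiCompareCaseInsensitive("WWW.Example.COM", "www.example.com") == 0);
static_assert(asciiCompareCaseInsensitive("abc", "ABD") < 0);
```

Because the mapping is a fixed table rather than a call into the C library, the same binary behaves identically under C, en_US.UTF-8 and tr_TR.UTF-8.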

And then there's tr_TR.UTF-8, under which strnicmp("I", "i", 1) != 0. If this
is what you want, great. Just be careful when you use it while expecting
US-ASCII behaviour, such as when parsing the IRC protocol. The IRC client
ksirc once had a bug where joining channel #irc would also join #ırc, and
would then open further tabs for #Irc and #İrc depending on the messages you
received.
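
Applied to the IRC case, a sketch of channel-name comparison that cannot reproduce the ksirc problem might look like this. It assumes the case mapping described in RFC 1459, section 2.2 (where `{`, `}` and `|` are treated as the lower-case forms of `[`, `]` and `\`); real servers may advertise a different mapping, and `ircFold`/`sameIrcChannel` are hypothetical names. C++17 is assumed.

```cpp
#include <cstddef>
#include <string_view>

// IRC-style case folding per RFC 1459, section 2.2 (assumed mapping):
// ASCII letters fold to lower case and '[', ']', '\' fold to '{', '}', '|'.
// No locale is consulted, so tr_TR.UTF-8 cannot change the result.
constexpr char ircFold(char c) noexcept
{
    if (c >= 'A' && c <= 'Z')
        return static_cast<char>(c - 'A' + 'a');
    switch (c) {
    case '[':  return '{';
    case ']':  return '}';
    case '\\': return '|';
    default:   return c; // all other bytes, including UTF-8 sequences, unchanged
    }
}

// True if both names refer to the same channel under the mapping above.
constexpr bool sameIrcChannel(std::string_view a, std::string_view b) noexcept
{
    if (a.size() != b.size())
        return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (ircFold(a[i]) != ircFold(b[i]))
            return false;
    }
    return true;
}

// "#irc", "#Irc" and "#IRC" are one channel...
static_assert(sameIrcChannel("#irc", "#Irc"));
static_assert(sameIrcChannel("#IRC", "#irc"));
// ...while "#ırc" (U+0131, bytes C4 B1) and "#İrc" (U+0130, bytes C4 B0)
// are distinct channels that never collide with "#irc".
static_assert(!sameIrcChannel("#irc", "#\xC4\xB1rc"));
static_assert(!sameIrcChannel("#\xC4\xB1rc", "#\xC4\xB0rc"));
```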
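
Finally, std::toupper and std::tolower are FLAWED BY DESIGN. Do not use them,
ever. Uppercasing and lowercasing are string functions: case mapping is not
one-to-one (one character can map to several, and the result can depend on
the surrounding characters), so any API that returns a single character is
flawed. SG16, the C++ committee's Unicode study group, intends to fix that in
the proposed std::text functionality.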
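
To illustrate why a character-to-character API cannot work, consider German ß (U+00DF), whose full upper-case mapping in Unicode's SpecialCasing.txt is the two-letter string "SS": `int std::toupper(int)` has nowhere to put the second letter. The sketch below assumes C++17 and maps `std::u32string` to `std::u32string`, covering only ASCII a to z plus ß; `toUpperSketch` is a hypothetical name, and production code would rely on complete Unicode data (for example ICU's `u_strToUpper()`).

```cpp
#include <string>
#include <string_view>

// Hypothetical helper for illustration: a string-to-string upper-casing
// function covering only ASCII 'a'..'z' and U+00DF (ß). It is NOT a full
// Unicode case mapper; it only shows why the result must be a string.
std::u32string toUpperSketch(std::u32string_view in)
{
    std::u32string out;
    out.reserve(in.size()); // the output may still grow beyond this
    for (char32_t c : in) {
        if (c >= U'a' && c <= U'z') {
            out.push_back(static_cast<char32_t>(c - U'a' + U'A'));
        } else if (c == U'\u00DF') {
            // The full upper-case mapping of ß is the two-letter string "SS":
            // one input code point produces two output code points.
            out.append(U"SS");
        } else {
            out.push_back(c); // everything else is passed through unchanged
        }
    }
    return out;
}
```

The output can be longer than the input, which a single-character return type rules out. Lowercasing has a related problem: Greek capital sigma (Σ) becomes final ς at the end of a word and σ elsewhere, so the mapping needs to see the surrounding string.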
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel System Software Products