[Development] HEADS-UP: QStringLiteral

Thiago Macieira thiago.macieira at intel.com
Thu Aug 22 17:25:59 CEST 2019


On Thursday, 22 August 2019 05:31:44 PDT Edward Welbourne wrote:
> That's the UTF8 path Thiago is talking about.
> There is no short-cut, although I do wonder why there isn't a "search
> for the first byte whose top bit is set", which might equip us with one.

There is. It's that code you didn't understand: the simdDecodeAscii() function 
is called from the UTF-8 decoder and fails only if the input isn't ASCII.

    for ( ; end - src >= 16; src += 16, dst += 16) {
        __m128i data = _mm_loadu_si128((const __m128i*)src);
[load 16 characters]

#ifdef __AVX2__
        const int BitSpacing = 2;
        // load and zero extend to an YMM register
        const __m256i extended = _mm256_cvtepu8_epi16(data);
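[this is the Latin1 to UTF-16 expansion: each byte is zero-extended to 16 
bits, which is only the right result if all 16 bytes are US-ASCII; that is 
checked next]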

        uint n = _mm256_movemask_epi8(extended);
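[this extracts the high bit from each byte; the zero-extended upper bytes 
always contribute 0, so n is non-zero only if some input byte was non-ASCII]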
        if (!n) {
            // store
            _mm256_storeu_si256((__m256i*)dst, extended);
            continue;
[if all 16 characters were US-ASCII, store them and continue with the next 16]
        }

[here, we handle the case of the input containing non-ASCII: store the input 
that was US-ASCII, then find the first US-ASCII character scanning backwards 
from the end]
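
For anyone who wants to try the idea in isolation, here is a minimal sketch 
of the same "widen until the first byte with its top bit set" loop. It is not 
the Qt code: widenAsciiPrefix() is a hypothetical name, it uses plain SSE2 
(one mask bit per input byte, so no BitSpacing) with a scalar fallback, and 
it only returns the length of the ASCII prefix instead of handing the rest 
over to a UTF-8 decoder.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE2__)
#  include <emmintrin.h>
#endif

// Hypothetical helper, not Qt's simdDecodeAscii(): widens US-ASCII bytes from
// [src, end) into UTF-16 code units at dst and returns how many bytes were
// converted before the first byte with its top bit set (or the end of input).
std::size_t widenAsciiPrefix(const unsigned char *src, const unsigned char *end,
                             char16_t *dst)
{
    assert(src <= end);
    const unsigned char *const start = src;

#if defined(__SSE2__)
    const __m128i zero = _mm_setzero_si128();
    for ( ; end - src >= 16; src += 16, dst += 16) {
        // load 16 characters
        const __m128i data = _mm_loadu_si128(reinterpret_cast<const __m128i *>(src));

        // bit i of the mask is the top bit of byte i
        if (_mm_movemask_epi8(data) != 0)
            break;      // non-ASCII in this chunk; src and dst are not advanced

        // zero-extend each byte to 16 bits and store all 16 code units
        _mm_storeu_si128(reinterpret_cast<__m128i *>(dst),
                         _mm_unpacklo_epi8(data, zero));
        _mm_storeu_si128(reinterpret_cast<__m128i *>(dst + 8),
                         _mm_unpackhi_epi8(data, zero));
    }
#endif

    // scalar tail: finishes short inputs and the ASCII part of a rejected chunk
    for ( ; src != end && *src < 0x80; ++src, ++dst)
        *dst = *src;

    return static_cast<std::size_t>(src - start);
}
```

A caller would pass a destination buffer with room for at least end - src 
code units and, if the return value is less than the input length, continue 
with a real UTF-8 decoder from that offset.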

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel System Software Products





