[Development] HEADS-UP: QStringLiteral
Thiago Macieira
thiago.macieira at intel.com
Thu Aug 22 17:25:59 CEST 2019
On Thursday, 22 August 2019 05:31:44 PDT Edward Welbourne wrote:
> That's the UTF8 path Thiago is talking about.
> There is no short-cut, although I do wonder why there isn't a "search
> for the first byte whose top bit is set", which might equip us with one.
There is. It's that code you didn't understand: the simdDecodeAscii() function
is called from the UTF-8 decoder and fails only if the input isn't ASCII.
    for ( ; end - src >= 16; src += 16, dst += 16) {
        __m128i data = _mm_loadu_si128((const __m128i*)src);
[load 16 characters]

#ifdef __AVX2__
        const int BitSpacing = 2;
        // load and zero extend to an YMM register
        const __m256i extended = _mm256_cvtepu8_epi16(data);
[this is the Latin1 to UTF-16 expansion, which may be the wrong result if the
input is not US-ASCII]

        uint n = _mm256_movemask_epi8(extended);
[this extracts the high bit from each byte of the extended register; the upper
byte of each UTF-16 unit is zero, so only every second bit can be set, hence
BitSpacing = 2]

        if (!n) {
            // store
            _mm256_storeu_si256((__m256i*)dst, extended);
            continue;
[if the input was US-ASCII, repeat]
        }
[here, we handle the case of the input containing non-ASCII: store the input
that was US-ASCII, then find the first US-ASCII scanning backwards from the
end]
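
For readers without the Qt sources at hand, the same idea can be sketched as a stand-alone function. This is only an illustration under stated assumptions, not the Qt implementation: decodeAsciiPrefix() is a hypothetical name, it uses plain SSE2 instead of the AVX2 path quoted above, and it locates the first non-ASCII byte with a scalar loop rather than by bit-scanning the mask.

```cpp
#include <cstddef>
#if defined(__SSE2__)
#  include <emmintrin.h>
#endif

// Hypothetical helper, not Qt API: widen the leading US-ASCII bytes of src
// to UTF-16 code units in dst (dst must have room for len units).
// Returns the number of bytes converted. A result smaller than len means
// src[result] is the first byte whose top bit is set.
std::size_t decodeAsciiPrefix(const unsigned char *src, std::size_t len, char16_t *dst)
{
    std::size_t i = 0;
#if defined(__SSE2__)
    const __m128i zero = _mm_setzero_si128();
    // do sixteen characters at a time
    for ( ; len - i >= 16; i += 16) {
        const __m128i data =
            _mm_loadu_si128(reinterpret_cast<const __m128i *>(src + i));
        // one mask bit per input byte: bit k is the top bit of byte k
        const unsigned n = static_cast<unsigned>(_mm_movemask_epi8(data));
        if (n)
            break;  // non-ASCII in this block; the scalar loop finds it
        // zero-extend 16 bytes to 16 UTF-16 units (two 128-bit stores)
        _mm_storeu_si128(reinterpret_cast<__m128i *>(dst + i),
                         _mm_unpacklo_epi8(data, zero));
        _mm_storeu_si128(reinterpret_cast<__m128i *>(dst + i + 8),
                         _mm_unpackhi_epi8(data, zero));
    }
#endif
    // tail, and the ASCII part of a block that contained non-ASCII
    for ( ; i < len && src[i] < 0x80; ++i)
        dst[i] = src[i];
    return i;
}
```

A caller would hand the remaining bytes, starting at src[result], to a full UTF-8 decoder, and could call the helper again once that decoder is back on ASCII input.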
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel System Software Products