[Development] QArrayDataOps/Pointer UB post-mortem
Thiago Macieira
thiago.macieira at intel.com
Tue May 26 18:40:05 CEST 2026
On Tuesday, 26 May 2026 09:29:29 Pacific Daylight Time Thiago Macieira wrote:
> BTW, my prompt to Claude was:
>
> I have a test bug that looks like a compiler problem. It could be a source
> problem, but I don't think so.
> When running @tests/auto/corelib/tools/qarraydata/tst_qarraydata.cpp test
> arrayDataOps, I get the following output:
> ...
> The compiled assembly is @tst_qarraydata.cpp.s and the object file is
> @tst_qarraydata.cpp.o
>
> So I did tell it that the problem could be a compiler bug. That's probably
> confirmation bias, because I had concluded the same.
>
> But in any case, it found the exact same piece of assembly I had and
> concluded the exact same thing: the cached size wasn't updated.
>
> What we both got wrong is the attribution: the issue wasn't a compiler bug,
> but a source code bug that caused UB.
I resumed the session and told it to look at operator* and operator->.
This is what it replied after thinking a bit:
====
● Read qarraydatapointer.h
│ src/corelib/tools/qarraydatapointer.h
└ L103:121 (19 lines read)
● This changes the conclusion significantly. The real bug is here.
operator->() does static_cast<DataOps *>(this) — a downcast from
QArrayDataPointer<T>* to the derived DataOps* (QArrayDataOps<T>). This is only
valid if the object actually is a DataOps. In SimpleVector, d is stored as a
plain QArrayDataPointer<T> — not DataOps. So the downcast is undefined
behaviour.
GCC 15.2.1 exploits this UB through TBAA (type-based alias analysis):
- The callee copyAppend receives this typed as DataOps* (derived)
- The caller's stack variable is typed as QArrayDataPointer<T> (base)
- GCC's TBAA concludes: writes through a DataOps* member cannot alias the
base-typed stack variable's size field
- Therefore: no reload needed after the call — stale 0 is reused
Revised conclusion
This is a source bug, not a compiler bug. GCC 15.2.1 is correct to exploit
the UB; it just does so more aggressively than earlier versions.
The fix is in SimpleVector — it should store a DataOps (i.e.
QArrayDataOps<T>) rather than a bare QArrayDataPointer<T>, matching how
production Qt containers use this pattern. In QList and friends the stored
member IS of the derived type, so operator->() is valid there. SimpleVector is
the odd one out.
====
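To make the downcast in that reply concrete, here is a minimal sketch of the
pattern. Pointer and Ops are hypothetical stand-ins for QArrayDataPointer<T>
and QArrayDataOps<T>, not the real Qt classes, and the fixed-size buffer is
only there to keep the example self-contained. The static_cast is well-defined
only when the object really is the base subobject of an Ops, which is the case
the helper function exercises; the commented-out lines show the shape that is
UB.

```cpp
#include <cassert>
#include <cstddef>

template <typename T> struct Ops;   // hypothetical stand-in for QArrayDataOps<T>

// Hypothetical stand-in for QArrayDataPointer<T>.
template <typename T> struct Pointer {
    T *ptr = nullptr;
    std::size_t size = 0;

    // Downcast to the derived type. Well-defined only if *this is the
    // base subobject of an Ops<T>; otherwise the behaviour is undefined.
    Ops<T> *operator->() { return static_cast<Ops<T> *>(this); }
};

template <typename T> struct Ops : Pointer<T> {
    // Appends [b, e) to the buffer and updates size through a derived-typed this.
    void copyAppend(const T *b, const T *e) {
        for (; b != e; ++b)
            this->ptr[this->size++] = *b;
    }
};

// Valid case: the object really is an Ops<int>, so the downcast is fine.
std::size_t appendThroughValidDowncast() {
    int storage[4] = {};
    const int src[2] = {7, 9};

    Ops<int> ops;
    ops.ptr = storage;
    Pointer<int> &base = ops;          // refers to the base subobject of ops
    base->copyAppend(src, src + 2);    // downcast back to Ops<int> is valid
    assert(storage[0] == 7 && storage[1] == 9);
    return base.size;

    // The shape discussed above, for comparison (not executed here):
    //   Pointer<int> plain;               // NOT an Ops<int>
    //   plain.ptr = storage;
    //   plain->copyAppend(src, src + 2);  // invalid downcast: UB
}
```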
The proposed fix would have worked, but would be undesirable. Its understanding
of the production containers is wrong. So when I asked it where it was seeing
the derived type, it changed its mind:
====
● QList stores DataPointer d — i.e. QArrayDataPointer<T>, exactly the same as
SimpleVector. There is no QArrayDataOps<T> object anywhere.
My previous revision was wrong. QList::d is QArrayDataPointer<T>, not
DataOps. The same downcast UB exists in QList too — it's a pervasive design
pattern throughout Qt.
This means the "source bug" framing is also incorrect. The operator*/
operator-> downcast is intentional and universal. GCC has tolerated it for
years.
Conclusion: back to compiler bug
The downcast pattern is deliberate and identical in both QList and
SimpleVector. If it were truly invalid UB that GCC was correctly exploiting,
QList would be broken too. The fact that only this specific inlined call site
in the test misfires — where GCC can see the allocation of detached.d on the
stack and caches size = 0 across the copyAppend call — points to a GCC 15
regression in load elimination / alias analysis for this particular code
shape, not a principled TBAA exploit.
====
This shows it does *not* fully understand the consequences of UB. I expect
it's the same for most human developers. So when I asked it about UB
manifesting differently in different contexts, it came to the correct
conclusion:
====
Revised conclusion (final)
The root cause is UB in the source: operator->() performs an invalid downcast
from QArrayDataPointer<T>* to DataOps* when the object is not actually a
DataOps. GCC 15.2.1's optimizer exploits this via TBAA in the specific inlined
call shape present in the test, but not (yet) in QList's call sites. The
behaviour difference between the two is entirely consistent with how UB works —
it can be latent and context-sensitive.
The fix belongs in qarraydatapointer.h: operator->() should not perform a
downcast at all. The DataOps methods should be accessible without pretending
the object is of a derived type it isn't.
====
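One possible shape for "accessible without a downcast" is sketched below. It
is purely illustrative and is not the change made in Qt: Pointer, OpsView and
ops() are hypothetical names, and the operations are reached through a small
view object that holds a reference to the data pointer instead of pretending
the object has a derived type.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for QArrayDataPointer<T>.
template <typename T> struct Pointer {
    T *ptr;
    std::size_t size;
};

// Hypothetical operations view: refers to the Pointer<T> it operates on,
// so every access goes through the object's real type.
template <typename T> struct OpsView {
    Pointer<T> &d;

    // Appends [b, e) to the buffer and updates d.size directly.
    void copyAppend(const T *b, const T *e) {
        for (; b != e; ++b)
            d.ptr[d.size++] = *b;
    }
};

// Helper that builds the view; no cast of any kind is involved.
template <typename T> OpsView<T> ops(Pointer<T> &d) { return OpsView<T>{d}; }

std::size_t appendWithoutDowncast() {
    int storage[4] = {};
    const int src[3] = {1, 2, 3};

    Pointer<int> d{storage, 0};
    ops(d).copyAppend(src, src + 3);   // writes go through Pointer<int> itself
    assert(storage[2] == 3);
    return d.size;
}
```

Because the callee modifies the size through a reference to the same type the
caller holds, there is no type mismatch for alias analysis to act on.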
--
Thiago Macieira - thiago.macieira (AT) intel.com
Principal Engineer - Intel DCG - Platform & Sys. Eng.