[Development] Splitting Qt Network out of qtbase (was: QtBase network failures)

Sat Jun 25 18:33:42 CEST 2022

On Saturday, 25 June 2022 01:57:24 PDT Volker Hilsheimer wrote:
> Perhaps this is a good time to discuss whether we should move Qt Network
> into its own repository. This would make qtbase integrations less exposed
> to network failure, which - even without certificates expiring - are a fact
> of life. And qtbase integrations already suffer from plenty of flakiness.
> And that an operational issue might require patches to merge and to get
> cherry picked, which might take several attempts, each taking several
> hours, just amplifies that problem further.

> Conceptually, we have made that kind of change before (when taking Qt
> Positioning out of the qtlocation repo). But there are some challenges.

> One challenge is that several of our Qt Core tests are using networking
> features (tests outside of tests/auto/network that include
> network-settings.h: tst_qdir, tst_qdiriterator, tst_qfile, tst_qfileinfo,
> tst_qiodevice, tst_qtextstream, tst_qfiledialog2). Without having looked
> into the details, I’d assume that we might not need an actual server to
> test many of those codepaths (or that those tests can be moved into a
> qtnetwork repo, ie. QTextStream::stillOpenWhenAtEnd doesn’t seem to test
> QTextStream, which never closes a QIODevice).

Personally, I'd prefer if those Core tests didn't use networking. The majority 
of them aren't actually using QtNetwork; they are the Windows portions that 
deal with the SMB server provided by the Network Test Server. So the issue 
isn't QtNetwork but the NTS, and it would remain anyway. That would 
leave a few tests like QTextStream that use QTcpSocket for some particular 
sequential QIODevice condition, but which could be replaced with an identical 
condition using a different class, like QProcess.

But what's the gain? This looks like a lot of effort to me, particularly if we 
don't move the UNC path tests in the file classes.

This isn't a scientific look, but from memory, the network test server and 
the networking tests haven't been the majority of spurious failures in the CI. 
They're a big contributor, but not the majority. From a random sampling of 
test failures in the past week, I see:

Non-test failures:
* general CI failures - "failed to acquire machine" [1]
* sccache network failures [2] 
* licensing issues with the INTEGRITY compiler
* timeouts [3]
* weird unexplained failures like [4] or [5]

Test failures:
* flaky tests on timing (QMutex, QDeadlineTimer, etc.)
* QFSModel on macOS on ARM [6]
* an unexplored std::filesystem issue on Windows [7]
* some widget issues like [8] or [9]

And yes, there were network test failures too, for example in
https://testresults.qt.io/coin/integration/qt/qtbase/tasks/1655473411

But they are nowhere near the majority, or even the plurality. The CI general 
failures, sccache failures and timeouts appear to be far more common and 
deserve more attention. 

Even among pure test failures, the network ones don't appear to be the largest 
contributor. So I have to ask: is the effort worth the benefit?

[1] https://testresults.qt.io/coin/integration/qt/qtbase/tasks/1655663797
[2] https://testresults.qt.io/coin/integration/qt/qtbase/tasks/1656078094
[3] https://testresults.qt.io/coin/integration/qt/qtbase/tasks/1656083019
[4] https://testresults.qt.io/coin/integration/qt/qtbase/tasks/1655995816
[5] https://testresults.qt.io/coin/integration/qt/qtbase/tasks/1656009936
[6] https://testresults.qt.io/coin/integration/qt/qtbase/tasks/1654781141
[7] https://testresults.qt.io/coin/integration/qt/qtbase/tasks/1654295531
[8] https://testresults.qt.io/coin/integration/qt/qtbase/tasks/1655717505
[9] https://testresults.qt.io/coin/integration/qt/qtbase/tasks/1647034101
-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Cloud Software Architect - Intel DCAI Cloud Engineering