[Development] Splitting Qt Network out of qtbase (was: QtBase network failures)

Mon Jun 27 14:42:15 CEST 2022

> Well, at least with qt positioning we need to handle all the cherry-picks to 5.15 manually

That is probably fine for positioning; I don't expect many patches to get
picked back there. But for any given period, I would say it's not unusual for
half of the patches going to Network to be cherry-picked.

Mårten

------

From: Ivan Solovev <ivan.solovev at qt.io> 
Sent: Monday, 27 June 2022 13:40
To: Mårten Nordheim <marten.nordheim at qt.io>; Volker Hilsheimer <volker.hilsheimer at qt.io>; Macieira, Thiago <thiago.macieira at intel.com>
Cc: development at qt-project.org
Subject: Re: [Development] Splitting Qt Network out of qtbase (was: QtBase network failures)

> If it can be done with little effort, including cherry-picking fixes backwards, 
> even to 5.15, then it might be something to be investigated.

Well, at least with qt positioning we need to handle all the cherry-picks to 5.15 manually, 
because the cherry-pick bot obviously does not detect that those need to go to qtlocation.git repo.

So, if we split network now, we will need to care about manually picking to 6.4, 6.3, 6.2 and 5.15.

Best regards, 
Ivan
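
The manual back-porting described above could be scripted roughly as follows. This is only a sketch under assumptions: the function name `manual_pick_plan`, the remote names `origin` and `qtnetwork`, and the idea that a split-out repository would keep the same file layout as qtbase are all hypothetical, and the `refs/for/<branch>` push target is the usual Gerrit convention rather than something stated in this thread. The script only prints git commands; it does not run them.

```python
def manual_pick_plan(sha, branches=("6.4", "6.3", "6.2", "5.15"),
                     source_remote="qtnetwork", target_remote="origin"):
    """Return the git commands for picking one commit to each branch by hand."""
    # Fetch the (hypothetical) split-out repository once, so the commit
    # object is available in the local clone.
    plan = [f"git fetch {source_remote}"]
    for branch in branches:
        plan += [
            # Start a scratch branch from the tip of the target branch.
            f"git checkout -B pick-{branch} {target_remote}/{branch}",
            # -x records the original commit id in the new commit message.
            f"git cherry-pick -x {sha}",
            # Push for review, assuming a Gerrit-style remote.
            f"git push {target_remote} HEAD:refs/for/{branch}",
        ]
    return plan


if __name__ == "__main__":
    for command in manual_pick_plan("<sha>"):
        print(command)
```

With the four branches named above, that is twelve manual steps per patch after the initial fetch, which gives a feel for the overhead if a large share of patches needs picking.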

________________________________________
From: Development <development-bounces at qt-project.org> on behalf of Mårten Nordheim <marten.nordheim at qt.io>
Sent: Monday, June 27, 2022 12:12 PM
To: Volker Hilsheimer <volker.hilsheimer at qt.io>; Macieira, Thiago <thiago.macieira at intel.com>
Cc: development at qt-project.org
Subject: Re: [Development] Splitting Qt Network out of qtbase (was: QtBase network failures) 

Hello!

While I don't necessarily object to splitting network out on a fundamental
level, I also find it to be questionable given the effort. If it can be done with
little effort, including cherry-picking fixes backwards, even to 5.15, then
it might be something to be investigated.

Off the top of my head, I can only think of the fact that network and IO tend
to be more interconnected compared to various other parts of Qt, so
historically it may have been useful to keep those together. But it's
likely less important these days.

As for what is flaky and what is not, I'm not sure it's useful to compare.
Of course, I always notice when something I haven't remotely touched
causes my integrations to fail, and I expect it's the same for others :)

Mårten

PS. [7] from Thiago's message was not a std::filesystem issue,
it was tst_QThread crashing on exit (probably a real failure from the
integration). I have seen the filesystem issue before but it only ever
happens for the first run, and never fails twice. Possibly granting it
the title of "flakiest test" in the database :)
But since it only happens once or not at all I've been unable to
debug it.

> -----Original Message-----
> From: Development <development-bounces at qt-project.org> On Behalf Of
> Volker Hilsheimer
> Sent: Sunday, 26 June 2022 15:39
> To: Macieira, Thiago <thiago.macieira at intel.com>
> Cc: development at qt-project.org
> Subject: Re: [Development] Splitting Qt Network out of qtbase (was: QtBase
> network failures)
> 
> 
> 
> > On 25 Jun 2022, at 18:33, Thiago Macieira <thiago.macieira at intel.com> wrote:
> >
> > On Saturday, 25 June 2022 01:57:24 PDT Volker Hilsheimer wrote:
> >> Perhaps this is a good time to discuss whether we should move Qt
> >> Network into its own repository. This would make qtbase integrations
> >> less exposed to network failure, which - even without certificates
> >> expiring - are a fact of life. And qtbase integrations already suffer
> >> from plenty of flakiness.
> >> And that an operational issue might require patches to merge and to
> >> get cherry picked, which might take several attempts, each taking
> >> several hours, just amplifies that problem further.
> >
> >> Conceptually, we have made that kind of change before (when taking Qt
> >> Positioning out of the qtlocation repo). But there are some challenges.
> >
> >> One challenge is that several of our Qt Core tests are using
> >> networking features (tests outside of tests/auto/network that include
> >> network-settings.h: tst_qdir, tst_qdiriterator, tst_qfile,
> >> tst_qfileinfo, tst_qiodevice, tst_qtextstream, tst_qfiledialog2).
> >> Without having looked into the details, I'd assume that we might not
> >> need an actual server to test many of those codepaths (or that those
> >> tests can be moved into a qtnetwork repo, ie.
> >> QTextStream::stillOpenWhenAtEnd doesn't seem to test QTextStream,
> >> which never closes a QIODevice).
> >
> > Personally, I'd prefer if those Core tests didn't use Networking. The
> > majority of them aren't actually using QtNetwork, they are the Windows
> > portion that deals with the SMB server provided by the Network Test
> > Server. So the issue isn't that of QtNetwork, but of the NTS and would
> > remain anyway. That would leave a few tests like QTextStream that use
> > QTcpSocket for some particular QIODevice sequential condition, but
> > which could be replaced with an identical condition with a different
> > class, like QProcess.
> >
> > But what's the gain? This looks like a lot of effort to me,
> > particularly if we don't move the UNC path tests in the file classes.
> >
> > Not looking scientifically at it, but from memory, the network test
> > server and the networking tests haven't been the majority of spurious
> > failures in the CI.
> > They're a big contributor, but not the majority. From a random
> > sampling of test failures in the past week, I see:
> >
> > Non-test failures:
> > * general CI failures - "failed to acquire machine" [1]
> > * sccache network failures [2]
> > * licensing issues with the INTEGRITY compiler
> > * timeouts [3]
> > * weird unexplained failures like [4] or [5]
> >
> > Test failures:
> > * flaky tests on timing (QMutex, QDeadlineTimer, etc.)
> > * QFSModel on macOS on ARM [6]
> > * a std::filesystem unexplored issue on Windows [7]
> > * some widget issues like [8] or [9]
> >
> > And yes, network test failures in
> > https://testresults.qt.io/coin/integration/qt/qtbase/tasks/1655473411
> >
> > But they are nowhere near the majority, or even the plurality. The CI
> > general failures, sccache failures and timeouts appear to be far more
> > common and deserve more attention.
> >
> > Even among pure test failures the network ones don't appear to be the
> > largest contributor. So I have to ask: is the effort worth the benefit?
> 
> 
> It's not going to be a silver bullet, and I agree that there are other sources of
> flakiness that are likely larger contributors to failing integrations. Anecdotally,
> it seems that every single patch that I was involved in during the last couple
> of weeks was blocked in some branch by either tst_qtcpsocket or
> tst_qnetworkreply failures (or both [1]). Perhaps it's likely that things are
> related - if the network is unstable enough, then we might either get an
> sccache failure, or a Qt Network test failure.
> 
> [1] https://codereview.qt-project.org/c/qt/qtbase/+/417978
> 
> However, and again anecdotally, qtbase seems to suffer from the
> dependency on a stable network a lot more than other repositories. Maybe
> because it's large enough for any glitch to likely hit either a build or a test run.
> In which case making qtbase smaller, and especially taking away those tests that
> might take a lot of time, might improve things as well.
> 
> So, I think that it might be worth it. Cleaning up the QtCore tests that don't
> just test file I/O with UNC paths (*) seems like an almost trivial refactoring -
> move the relevant tests to a test case in qtnetwork. The git work is also
> mostly a mechanical exercise, repeating what we did with Qt Positioning. We
> might not have to solve any hard engineering problem to take at least a little
> step forward.
> 
> Whereas making sccache fault tolerant, or making the CI system run test VMs
> with certain performance guarantees, or generally writing reliable tests that
> nevertheless depend on non-deterministic subsystems, seem like much
> harder engineering problems (that are still worth trying).
> 
> Volker
> 
> (*) As for the UNC stuff - it seems that we are testing only string parsing
> code. We are not taking care of any of the actual network traffic or SMB
> protocol. So do we need to access a share from a remote server at all? Would
> it be an option to create and share a folder on the Windows VMs running
> those tests during provisioning, and then use '\\$(COMPUTERNAME)'? That
> works for me on a local VM at least, all QFile tests pass (and we could
> probably even enable tst_QFile::largeUncFileSupport and simplify
> tst_QFile::writeLargeDataBlock_data) after running this as admin:
> 
> New-Item 'c:\testshare' -ItemType Directory
> New-SmbShare -Name testshare -Path 'c:\testshare' -ReadAccess Users
> Write-Output $("." * 32) > test.pri # test expects size of 34 bytes
> New-Item c:\testshare\readme.txt -ItemType File
> 
> New-Item 'c:\testsharewritable' -ItemType Directory
> New-SmbShare -Name testsharewritable -Path 'c:\testsharewritable' -ChangeAccess Users
> 
> _______________________________________________
> Development mailing list
> mailto:Development at qt-project.org
> https://lists.qt-project.org/listinfo/development
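
Volker's footnote (*) above proposes addressing a share on the test VM itself through `\\$(COMPUTERNAME)` instead of a remote server. As a rough illustration of the string handling involved, the sketch below builds such a UNC path and splits it again. It is an assumption-laden example, not Qt code: the helper name `local_unc_path` is hypothetical, the share and file names are taken from the PowerShell commands in the footnote, and the parsing shown is that of Python's `pathlib`, which only stands in for the kind of parsing the Qt tests exercise.

```python
import os
from pathlib import PureWindowsPath


def local_unc_path(share, *parts, host=None):
    """Build a UNC path to a share on this machine (hypothetical helper)."""
    # COMPUTERNAME is set on Windows; fall back to a placeholder elsewhere.
    host = host or os.environ.get("COMPUTERNAME", "localhost")
    # Join the pieces with backslashes: \\<host>\<share>\<parts...>
    return PureWindowsPath("\\".join([f"\\\\{host}", share, *parts]))


if __name__ == "__main__":
    path = local_unc_path("testshare", "readme.txt")
    print(path)        # full UNC path to the file
    print(path.drive)  # the \\<host>\<share> prefix
    print(path.name)   # the file name
```

No network access happens here; the path is only composed and taken apart as a string, which is the part of the behaviour the footnote says the existing tests actually cover.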