[Development] Adding QUIP24 - Blacklisting flaky tests
    Ville Voutilainen 
    ville.voutilainen at gmail.com
       
    Mon Dec 30 18:13:22 CET 2024
    
    
  
On Fri, 27 Dec 2024 at 19:26, Axel Spoerl via Development
<development at qt-project.org> wrote:
>
> Hi everyone,
>
> hopefully you have all had a merry Christmas and the year 2024 is gracefully moving towards new year's eve.
> I would like to draw your attention to https://codereview.qt-project.org/c/meta/quips/+/597911
> The commit adds QUIP 24, which is about blacklisting flaky tests.
>
> @Edward Welbourne has raised the topic during the contributor summit in Würzburg last fall.
>
> Thanks in advance for your comments and feedback.

Here's my rather candid, dulcet-toned 0.02 on this endeavor.

The QUIP (correctly) says that blacklisting was always meant to be a
temporary measure, and we have had quite a few tests blacklisted
for years.

You know, my general comment on that is "uh huh". And the follow-up
question is: how long ago was it that an integration failed due to
flaky tests? And how long ago was it that a submodule update failed
due to flaky tests?

Because until the answer to those questions is "many moons ago", I
find it alarming if we are going to do much of anything to our
blacklistings, especially remove them, and to a lesser extent alarming
to replace them with QSKIPs.

Fix them and then unblacklist, fine, I'll chip in. But before we are
in that "many moons ago" state, I don't think it's the correct approach
to strive to get rid of them categorically, unless they are gotten rid
of by actually fixing their flakiness. And perhaps we first need
coverage of a different kind than what the BFAILing tests provide
(according to some views of coverage that I can't quite make myself
agree with): blacklisting coverage sufficient to get us to that
"many moons ago" state.

The QUIP gives disconcerting indications that it's suggesting a
direction where blacklisted tests are a bad thing to have, and worse
than various alternatives. But a CI that fails apparently at random is
worse than anything; I can't fathom a situation where that's better
than pretty much any alternative I can come up with.
    
    