[Development] Removal of some of the blacklisted (non-working) autotests?

Milla Pohjanheimo milla.pohjanheimo at qt.io
Fri Nov 4 11:38:45 CET 2016


Good to see that a discussion arose from this.


>   In your email you wrote that blacklisted tests are just a burden for the CI.
>In general that is true, but note that currently they are compiling and they are
>_not_ crashing, so they do contribute to the quality of Qt.

I hadn't thought of it that way. What you say is true, but I think you will also agree that the contribution of the blacklisted tests to the quality of Qt isn't very big.


>On the other hand,
>they artificially inflate the test coverage level, hiding untested code paths,
>and they may affect other tests.


This is my concern as well. It gives a false impression that coverage is at a good level: in QtBase alone there are 201 blacklisted tests, which surely affects the coverage figures. And here we are only talking about those 50 tests that have been blacklisted on all platforms, some of them for over two years now.
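
For anyone not familiar with the mechanism: a test function is blacklisted through a plain-text BLACKLIST file next to the test, listing the platforms on which its result is ignored. As far as I remember, an "all platforms" entry looks roughly like this (the function names below are made up, this is only an illustrative sketch):

    # BLACKLIST in the test's source directory (illustrative sketch)
    # A [testFunction] or [testFunction:dataTag] header is followed by the
    # platform keywords on which the failure is ignored.
    [someFlakyFunction]
    *
    [anotherFunction:someDataRow]
    windows
    osx

Entries like the first one are exactly what this thread is about: the result is ignored everywhere, so the function contributes nothing to the pass/fail signal while still counting towards coverage.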


>Partial conclusion: such a test delivers interesting information. It is worth
>spending a bit of time checking what is going on. A tool that stress-tests a
>test would be a really good addition, as it would simplify debugging.


Jędrek's suggestion that new tests should follow a separate stress-test path is very interesting, and I'm really looking forward to having this in place in COIN. That way we could be rather certain that we don't introduce new flakiness into the system, and when a test starts to fail randomly it is most probably due to a real issue in Qt code rather than a "flaky" test.
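
Until something like that exists in COIN, even a trivial wrapper already helps when debugging a flaky candidate locally. Purely as an illustration (this is only a sketch, not an existing Qt or COIN tool; the name and defaults are made up), it could be as simple as running the test binary repeatedly and counting the failures:

    // stressrun.cpp -- hypothetical sketch, not an existing tool.
    // Runs a given test binary repeatedly and reports how often it fails,
    // to expose flaky behaviour before (or after) a test is merged.
    #include <QCoreApplication>
    #include <QProcess>
    #include <QDebug>

    int main(int argc, char *argv[])
    {
        QCoreApplication app(argc, argv);
        const QStringList args = app.arguments();
        if (args.size() < 2) {
            qWarning("usage: stressrun <test-binary> [runs]");
            return 2;
        }
        const int runs = args.size() > 2 ? args.at(2).toInt() : 100;

        int failures = 0;
        for (int i = 0; i < runs; ++i) {
            QProcess test;
            test.start(args.at(1), QStringList());
            test.waitForFinished(-1);   // no timeout; a hanging test needs manual attention
            if (test.exitStatus() != QProcess::NormalExit || test.exitCode() != 0)
                ++failures;
        }
        qDebug() << failures << "failures out of" << runs << "runs";
        return failures ? 1 : 0;
    }

Running a flaky candidate a few hundred times like that, before and after a fix, would at least give a rough idea of whether the flakiness is really gone.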

>Conclusion: I believe we should delete them, but be smart about it. Don't
>just go and remove all of them; first build the infrastructure / process that
>allows us not to decrease the test coverage. Even more, we need something that
>prevents flaky, badly written tests from being merged into Qt in the first place,
>otherwise we will be having this discussion again in 12 months.


Of course. [😊]  I wouldn't go and delete everything in one go; that's why I started this thread, to raise discussion. Eddy's suggestion of moving (some of) those tests to manual testing is also a good one. A test wouldn't be deleted but moved to the "manual" directory, where it wouldn't get built and wouldn't burden the CI.
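
If I understand the layout correctly, such a move would mostly be a matter of relocating the test's directory and updating the subdirs project files, roughly along these lines (the paths and the test name are only illustrative):

    # tests/auto/<area>/<area>.pro: delete the test's entry so the CI
    # no longer builds or runs it, i.e. remove the line
    #     SUBDIRS += someflakytest

    # tests/manual/manual.pro: list it under the manual tests instead
    SUBDIRS += someflakytest

    # and move the sources:
    #     tests/auto/<area>/someflakytest/  ->  tests/manual/someflakytest/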


I'm still all ears if you have other improvement ideas about the autotests.


- Milla



________________________________
From: Jedrzej Nowacki
Sent: Friday, November 4, 2016 10:10 AM
To: development at qt-project.org
Cc: Milla Pohjanheimo
Subject: Re: [Development] Removal of some of the blacklisted (non-working) autotests?

On Thursday, 3 November 2016 09:17:57 CET, Milla Pohjanheimo wrote:
> I would like to challenge you a bit about the removal of (some of) the
> blacklisted autotests.
> (...)

Hi,

    In your email you wrote that blacklisted tests are just a burden for the CI.
In general that is true, but note that currently they are compiling and they are
_not_ crashing, so they do contribute to the quality of Qt. On the other hand,
they artificially inflate the test coverage level, hiding untested code paths,
and they may affect other tests.

Partial conclusion: delete them, but watch code coverage and add new ones
where needed.

    Eddy mentioned that tests document certain Qt usages. There are two
aspects to this. In many cases we do not know what the author of a test wanted
to test; sometimes it was a use case, sometimes just an internal code path in
the implementation. On the other hand, we know that the code is not working
reliably, so in reality it is misleading documentation. Moreover, developers
tend to do copy & paste coding, and copying a blacklisted test is just plain
wrong, while being very difficult to catch in code review.

Partial conclusion: delete them, but add a policy that every test should
contain a short comment stating the purpose of the test.

Some tests are blacklisted only on certain platforms. Depending on why a test
is not working on a specific platform, we need to handle it differently. In
general it is OK to assume that Qt, as well as Qt bugs, is cross-platform. So a
test failing only on one platform is failing because of platform-specific code
(we do not have that much of it), or it is wrongly blacklisted, or it hits a
bug in the platform itself.

Partial conclusion: such a test delivers interesting information. It is worth
spending a bit of time checking what is going on. A tool that stress-tests a
test would be a really good addition, as it would simplify debugging.

Conclusion: I believe we should delete them, but be smart about it. Don't
just go and remove all of them; first build the infrastructure / process that
allows us not to decrease the test coverage. Even more, we need something that
prevents flaky, badly written tests from being merged into Qt in the first place,
otherwise we will be having this discussion again in 12 months.

Cheers,
  Jędrek