On Fri, Mar 22, 2019 at 1:57 PM Sandro Bonazzola <sbonazzo@redhat.com> wrote:


On Fri, Mar 22, 2019 at 12:42, Dan Kenigsberg <danken@redhat.com> wrote:


On Fri, 22 Mar 2019, 12:21 Sandro Bonazzola, <sbonazzo@redhat.com> wrote:


On Fri, Mar 22, 2019 at 11:14, Dan Kenigsberg <danken@redhat.com> wrote:


On Fri, 22 Mar 2019, 12:00 Sandro Bonazzola, <sbonazzo@redhat.com> wrote:


On Fri, Mar 22, 2019 at 10:52, Dan Kenigsberg <danken@redhat.com> wrote:
Yes, I'm repeating myself.
SKIPPING TESTS IS BAD

I agree. And having the suite fail on a broken test, skipping all the tests that follow it, is even worse.
This is why I would prefer that the rest of the product be tested while someone takes ownership of the broken test and fixes it.

This is a good reason to rewrite OST with pytest, which continues on failure.
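
To make the difference concrete, here is a minimal plain-Python sketch of the two runner behaviours. It is not OST or pytest code; the check names and both runner functions are hypothetical and exist only for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical checks standing in for suite steps (not real OST tests).
def check_add_host() -> None:
    assert 1 + 1 == 2

def check_get_host_hooks() -> None:
    # Simulates the randomly failing step discussed in this thread.
    raise RuntimeError("simulated failure")

def check_remove_host() -> None:
    assert "host".upper() == "HOST"

CHECKS: List[Callable[[], None]] = [
    check_add_host,
    check_get_host_hooks,
    check_remove_host,
]

def run_stop_on_failure(checks: List[Callable[[], None]]) -> Dict[str, str]:
    """Run checks in order and abort at the first failure."""
    results: Dict[str, str] = {}
    for check in checks:
        try:
            check()
        except Exception:
            results[check.__name__] = "failed"
            break  # every later check is never executed
        results[check.__name__] = "passed"
    return results

def run_continue_on_failure(checks: List[Callable[[], None]]) -> Dict[str, str]:
    """Run every check, recording failures without aborting the run."""
    results: Dict[str, str] = {}
    for check in checks:
        try:
            check()
        except Exception:
            results[check.__name__] = "failed"
        else:
            results[check.__name__] = "passed"
    return results
```

With the first runner, check_remove_host never appears in the results; with the second it runs and is reported as passed even though an earlier check failed.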

Patches are welcome :-)

This is not an empty gesture. The network suite came into being because of this issue (and others).

 
And a good reason to ping mperina on IRC to debug this. And a good reason not to merge new code.

It doesn't convince me that we should ignore the failure without due debugging.

Debugging is indeed needed, but not on a production system where it blocks the rest of the CI. The maintainer of the test can debug it in their own test environment.

The product of this system is bugs. We found one. If you skip it, we all risk it being forgotten. Skipping should be rare, and should happen only after the owner is found, admits that he is too busy/lazy to fix it now, and files a bug to fix it later.
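
A skip that carries its own paper trail could look roughly like the sketch below. It uses the standard library's unittest so that it stays self-contained; pytest offers the analogous `pytest.mark.skip(reason=...)`. The class name, test bodies and `BUG_REFERENCE` placeholder are hypothetical, and no real bug number is implied.

```python
import unittest

# Placeholder only: replace with the reference of the bug actually filed.
BUG_REFERENCE = "<reference of the filed bug>"

class HostHooksSuite(unittest.TestCase):
    # The skip reason names the tracking bug, so the skip is visible in reports.
    @unittest.skip(f"randomly failing; owner notified; tracked in {BUG_REFERENCE}")
    def test_get_host_hooks(self) -> None:
        self.fail("not reached while the skip is in place")

    def test_something_else(self) -> None:
        # Stand-in for the rest of the suite, which keeps running.
        self.assertEqual(2 + 2, 4)

def run_suite() -> unittest.TestResult:
    """Run the suite in-process and return the collected result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(HostHooksSuite)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

The skipped test and its reason are listed in `result.skipped`, so the report itself points back to the bug.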

We didn't find a bug in the product we are testing; we found a bug in the test, and it still needs to be identified.
According to Dafna: "we are randomly failing on get_host_hooks test for at least 3 weeks. its not a specific branch or project and there are no commonalities that I can see,"
If it were a bug in the product, I would totally agree with you: it could not be ignored. I'm not saying we should ignore this one either.
Being a bug in the test itself

I have no idea if this is the case. A NullPointerException smells like something coming from deep within Engine's data model.
 
I would rather take an unreliable test offline for further investigation in a development environment, and ensure that the rest of the tests are executed in the production environment, finding bugs in the product if there are any.

Skip is still on the table as an option. The infra team may request us to use it. But we should first put the pressure on them to fix it properly.
mperina and msobczyk are now aware of the issue; they should decide if they fix it now or asynchronously.