Re: [ovirt-devel] local vdsm build fails

12 Jun 2014

----- Original Message -----
...
From: "David Caro" <dcaroest@redhat.com>
To: "Nir Soffer" <nsoffer@redhat.com>
Cc: "Piotr Kliczewski" <piotr.kliczewski@gmail.com>, fsimonce@redhat.com, dcaro@redhat.com, devel@ovirt.org, "Dan Kenigsberg" <danken@redhat.com>
Sent: Thursday, June 12, 2014 1:24:39 PM
Subject: Re: [ovirt-devel]  local vdsm build fails
On Sun 08 Jun 2014 12:57:24 PM CEST, Nir Soffer wrote:
...
----- Original Message -----
...
From: "David Caro" <dcaroest@redhat.com>
To: "Nir Soffer" <nsoffer@redhat.com>
Cc: "Piotr Kliczewski" <piotr.kliczewski@gmail.com>, fsimonce@redhat.com, dcaro@redhat.com, devel@ovirt.org, "Dan Kenigsberg" <danken@redhat.com>
Sent: Friday, June 6, 2014 5:16:52 PM
Subject: Re: [ovirt-devel]  local vdsm build fails
On Fri 06 Jun 2014 03:53:41 PM CEST, Nir Soffer wrote:
...
----- Original Message -----
...
From: "Dan Kenigsberg" <danken@redhat.com>
To: "Piotr Kliczewski" <piotr.kliczewski@gmail.com>, fsimonce@redhat.com, nsoffer@redhat.com, dcaro@redhat.com
Cc: devel@ovirt.org
Sent: Friday, June 6, 2014 12:15:18 PM
Subject: Re: [ovirt-devel]  local vdsm build fails
On Fri, Jun 06, 2014 at 09:19:11AM +0200, Piotr Kliczewski wrote:
...
All,
I pulled the latest vdsm from master and noticed that the build is failing.
Here is the patch that causes the failure:
http://gerrit.ovirt.org/#/c/28226
and looking at jenkins comments I can see that jenkins was failing
for the same reason:
http://jenkins.ovirt.org/job/vdsm_master_storage_functional_tests_localfs_ge...
Nir has already fixed that as well. The storage tests were just fine, but
a post-build script was running cp incorrectly.
David pointed out that we need a way to distinguish between test errors and
failures.
He suggested looking up strings in the test output; we should not go
there, unless we want to "fix" this many more times in the future.
I suggest using these rules:
- SUCCESS - make check returns 0
- FAILURE - make check returns 1
- ERROR - anything else returned by make check or any other script.
I think that make check does work like this, but it should be easy to
change.
What do you think?
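As a rough illustration, the three rules could be applied by a small wrapper around the test command. This is a minimal Python sketch, assuming a hypothetical job script that runs `make check` and reports one of three results; `classify_exit_code` and `run_check` are made-up names, not part of vdsm or the Jenkins jobs. One caveat: GNU make itself exits with status 2 (not 1) when any recipe fails, so `make check` would need to be changed to exit with 1 on test failures before the FAILURE rule could ever match.

```python
import subprocess

# Result names follow the rules proposed above; the helper names are
# hypothetical, not part of vdsm or the Jenkins jobs.
SUCCESS, FAILURE, ERROR = "SUCCESS", "FAILURE", "ERROR"


def classify_exit_code(returncode):
    """Map the exit status of the test command to a job result."""
    if returncode == 0:
        return SUCCESS   # all tests passed
    if returncode == 1:
        return FAILURE   # tests ran and at least one failed
    return ERROR         # anything else, including death by a signal (< 0)


def run_check(cmd=("make", "check")):
    """Run the test command and classify how it ended."""
    try:
        proc = subprocess.run(list(cmd))
    except OSError:
        # The command could not even be started (e.g. make is missing).
        return ERROR
    return classify_exit_code(proc.returncode)
```

A post-build step could then treat ERROR as an infra alert and only FAILURE as a developer-facing report.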
...
Thanks for your report. Nir has already fixed this in
http://gerrit.ovirt.org/28426.
It was introduced in http://gerrit.ovirt.org/#/c/28226/ but was also missed
because we had turned PYFLAKES off in the unit test jobs. We must turn it
on in at least one of the tests (or initiate a new jenkins job for
`make check-local`).
As a quick fix, David has re-enabled PYFLAKES in
http://jenkins.ovirt.org/view/By%20Project/view/vdsm/job/vdsm_master_unit_te...
Regards,
Dan.
Perfect for me, but you should know that it will also fail when strange
things occur, for example out of memory, out of disk space, slave
disconnected, network error, etc.
If you are willing to treat those (the most common infra failures) as
devel failures, then no problem on my side,
I'm not - this is why we should separate test failures from test errors.
...
but I don't want you to
start ignoring test errors on the assumption that they are most probably
infra errors (don't get me wrong, it's totally normal to start ignoring an
alarm that is not a real problem; as infra members we will try to minimize
the infra issues, but the infra is not yet as stable as we'd like it to be).
It's too late for that now; people are already ignoring jenkins reports
because of the many false negatives :-)
So the return code is not a good solution then; we have to see whether it
failed and, if so, whether it was due to an infra error or a devel error.
I think that it's easier to filter for:
* A string that means the tests did run, probably at the end of the log,
so that a connection failure will be detected as an infra issue.
* A string that identifies whether the tests failed or passed.
If neither of those is found, then an infra failure is assumed.
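The string filtering described here could look roughly like the following minimal Python sketch. It assumes the test runner prints a unittest/nose-style summary at the end of the log (`Ran N tests in Xs`, followed by `OK` or `FAILED (...)`); the marker patterns and the `classify_log` name are assumptions for illustration, not anything the jobs currently use.

```python
import re

# Assumed markers: the summary that unittest/nose-style runners print at
# the end of a run, e.g. "Ran 12 tests in 3.456s" then "OK" or
# "FAILED (failures=1)".
RAN_RE = re.compile(r"^Ran \d+ tests? in [\d.]+s$", re.MULTILINE)
OK_RE = re.compile(r"^OK\b", re.MULTILINE)
FAILED_RE = re.compile(r"^FAILED \(", re.MULTILINE)


def classify_log(log_text):
    """Classify a job log as SUCCESS, FAILURE or INFRA_ERROR.

    The "tests did run" marker must be present; the verdict is looked
    up only after the last such marker. Anything else is assumed to be
    an infra failure.
    """
    runs = list(RAN_RE.finditer(log_text))
    if not runs:
        return "INFRA_ERROR"  # tests never finished (or never started)
    tail = log_text[runs[-1].end():]
    if FAILED_RE.search(tail):
        return "FAILURE"
    if OK_RE.search(tail):
        return "SUCCESS"
    return "INFRA_ERROR"  # summary cut off before the verdict
```

Every result hinges on the exact wording of the runner's summary: if that format changes, every job is silently reported as an infra error, which is the fragility raised earlier in the thread.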
Ok, how about:

1. make check will write a file with test results - no other output
   can go into that file so we don't have to use heuristics when
   parsing the file.

2. If the file is found and parsed successfully, tests either succeeded or failed.

3. Any other failure is a test error - failure is *never* assumed.
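The three steps could be sketched in Python as follows. Purely for illustration, this assumes that `make check` writes a JUnit-style XML file, such as the one nose produces with `--with-xunit`, whose root `<testsuite>` element carries `tests`, `failures` and `errors` attributes. The file format and the `classify_results` name are assumptions, not an agreed contract.

```python
import xml.etree.ElementTree as ET

SUCCESS, FAILURE, ERROR = "SUCCESS", "FAILURE", "ERROR"


def classify_results(source):
    """Classify a run from the results file written by `make check`.

    `source` is a path or file object. Assumes (hypothetically) a
    JUnit-style XML file with a root <testsuite tests=.. failures=..
    errors=..> element.
    """
    try:
        root = ET.parse(source).getroot()
    except (OSError, ET.ParseError):
        return ERROR  # rule 3: no file, or a truncated/garbled one
    if root.tag != "testsuite" or root.get("tests") is None:
        return ERROR  # rule 3: not the agreed format
    try:
        failures = int(root.get("failures", "0"))
        errors = int(root.get("errors", "0"))
    except ValueError:
        return ERROR  # rule 3: counts are not numbers
    # Rule 2: the file parsed, so the tests did run; any failing or
    # erroring test case is a development problem, not an infra one.
    return FAILURE if failures or errors else SUCCESS
```

A job script would call `classify_results()` on the file after `make check` exits and ignore the exit code entirely. Because only a readable file can produce SUCCESS or FAILURE, a lost slave or a full disk always surfaces as ERROR.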