local vdsm build fails


Piotr Kliczewski

6 Jun 2014

9:19 a.m.

All,

I pulled the latest vdsm from master and noticed that the build is failing. Here is the patch that causes the failure:

http://gerrit.ovirt.org/#/c/28226

Looking at the jenkins comments, I can see that jenkins was failing for the same reason:

http://jenkins.ovirt.org/job/vdsm_master_storage_functional_tests_localfs_ge...

Thanks, Piotr


Michal Skrivanek

6 Jun

9:23 a.m.

On Jun 6, 2014, at 09:19 , Piotr Kliczewski <piotr.kliczewski@gmail.com> wrote:

...

All,

I pulled the latest vdsm from master and noticed that build is failing.

Here is the patch that causes the failure:

http://gerrit.ovirt.org/#/c/28226

and looking at jenkins comments I can see that jenkins was failing with the same reason:

http://jenkins.ovirt.org/job/vdsm_master_storage_functional_tests_localfs_ge...

btw, at least yesterday there were again so many false errors, with jenkins not being able to run the tests properly, that it's unusable. Was that the reason the result was ignored? (Though the error is clearly relevant to that patch.)

Thanks, michal

...

Thanks, Piotr _______________________________________________ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel

David Caro

9:53 a.m.

On Fri 06 Jun 2014 09:23:33 AM CEST, Michal Skrivanek wrote:

...

On Jun 6, 2014, at 09:19 , Piotr Kliczewski <piotr.kliczewski@gmail.com> wrote:

...
All,

I pulled the latest vdsm from master and noticed that build is failing.

Here is the patch that causes the failure:

http://gerrit.ovirt.org/#/c/28226

and looking at jenkins comments I can see that jenkins was failing with the same reason:

http://jenkins.ovirt.org/job/vdsm_master_storage_functional_tests_localfs_gerrit/1064/console

btw at least yesterday again there were so many false errors with jenkins not being able to run the tests properly that it's unusable…. was that the reason the result was ignored? (though the err is clear about relevance to that patch)

Can you point out which jobs were false positives? Also, can you specify for each one how to determine if it's a test failure from the logs? As specific as possible? We can filter the logs for those failures and set a different message so you'll know from the gerrit comments if it was a real issue or infra failure.

...

Thanks, michal

...
Thanks, Piotr

-- 
David Caro
Red Hat S.L.
Continuous Integration Engineer - EMEA ENG Virtualization R&D
Email: dcaro@redhat.com
Web: www.redhat.com
RHT Global #: 82-62605

Francesco Romani

10:03 a.m.

----- Original Message -----

...

From: "David Caro" <dcaroest@redhat.com> To: "Michal Skrivanek" <michal.skrivanek@redhat.com> Cc: devel@ovirt.org Sent: Friday, June 6, 2014 9:53:23 AM Subject: Re: [ovirt-devel] local vdsm build fails

On Fri 06 Jun 2014 09:23:33 AM CEST, Michal Skrivanek wrote:

...
On Jun 6, 2014, at 09:19 , Piotr Kliczewski <piotr.kliczewski@gmail.com> wrote:

...
All,

I pulled the latest vdsm from master and noticed that build is failing.

Here is the patch that causes the failure:

http://gerrit.ovirt.org/#/c/28226

and looking at jenkins comments I can see that jenkins was failing with the same reason:

http://jenkins.ovirt.org/job/vdsm_master_storage_functional_tests_localfs_ge...

btw at least yesterday again there were so many false errors with jenkins not being able to run the tests properly that it's unusable…. was that the reason the result was ignored? (though the err is clear about relevance to that patch)

Can you point out which jobs were false positives? Also, can you specify for each one how to determine if it's a test failure from the logs? As specific as possible? We can filter the logs for those failures and set a different message so you'll know from the gerrit comments if it was a real issue or infra failure.

Quite some noise is from python segfaulting. I can reproduce the segfault locally, but I'm having a hard time pinpointing the issue. Reported the bug upstream:

https://github.com/nose-devs/nose/issues/817

Let me summarize what I (we) know about python segfaulting:

* the segfault should be reproducible on any box running nose >= 1.3.0, just using

  $ cd vdsm
  $ ./configure && make
  $ NOSE_WITH_XUNIT=1 make check

  or at least I can reproduce the issue on all the boxes I tried locally (vanilla F20, F19)

* if we run each test unit separately, we do NOT observe the failure. This triggers the segfault:

  $ cd tests
  $ ./run_tests_local.sh ./*.py

  This does not:

  $ cd tests
  $ for TEST in ./*.py; do ./run_tests_local.sh "$TEST"; done

* the stack traces I observed are huge, more than 750 levels deep. This suggests the stack was exhausted, which in turn was probably triggered by some kind of recursion gone wild. Note the offending stack trace is on just one thread; all the others are quiet.

* I tried to reproduce the issue with a simpler use case, with no luck so far.

At the moment I don't have better suggestions than to bite the bullet and dig into the huge stack trace, looking for repetitive patterns or some sort of hint.

A core dump is available for post-mortem analysis, from my laptop, which is an F20 with a few updates (mostly from virt-preview and a few other places; the full list, if relevant, is provided as pkgs.txt.gz in the folder below):

[link will be provided off-list]

core.20626.1000.gz is the fresh new core, core.20626.1000.md5 is its checksum.

-- 
Francesco Romani
RedHat Engineering Virtualization R & D
Phone: 8261328
IRC: fromani
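The observation that the crash shows up only when all test modules run in one process can be checked mechanically. What follows is a minimal sketch, assuming a POSIX host and Python 3.5 or later: it runs each command in its own process and labels the outcome from the return code, where a negative value means the child was killed by a signal (SIGSEGV for a segfault). The function names are made up for this illustration and are not part of vdsm or nose; only `run_tests_local.sh` comes from the message above.

```python
import signal
import subprocess
import sys


def classify_exit(returncode):
    """Turn a subprocess return code into a short human-readable label."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, subprocess reports "killed by signal N" as -N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "signal %d" % -returncode
        return "crashed (%s)" % name
    return "failed (exit %d)" % returncode


def run_isolated(cmd):
    """Run one command in its own process and classify how it ended."""
    return classify_exit(subprocess.call(cmd))


def main(paths=None):
    """Run every given test module separately (hypothetical entry point)."""
    paths = sys.argv[1:] if paths is None else paths
    for path in paths:
        print(path, run_isolated(["./run_tests_local.sh", path]))
```

Calling `main()` from the `tests` directory with a list of module paths would print one label per module, which makes it easy to see whether any single module crashes on its own.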

Dan Kenigsberg

11:15 a.m.

On Fri, Jun 06, 2014 at 09:19:11AM +0200, Piotr Kliczewski wrote:

...

All,

I pulled the latest vdsm from master and noticed that build is failing.

Here is the patch that causes the failure:

http://gerrit.ovirt.org/#/c/28226

and looking at jenkins comments I can see that jenkins was failing with the same reason:

http://jenkins.ovirt.org/job/vdsm_master_storage_functional_tests_localfs_ge...

Thanks for your report. Nir has already fixed this in http://gerrit.ovirt.org/28426.

It was introduced in http://gerrit.ovirt.org/#/c/28226/ but it was missed, partly because we have turned PYFLAKES off in the unit test jobs. We must turn it on in at least one of the tests (or initiate a new jenkins job for `make check-local`).

As a quick fix, David has re-enabled PYFLAKES in http://jenkins.ovirt.org/view/By%20Project/view/vdsm/job/vdsm_master_unit_te...

Regards, Dan.
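The thread does not say which pyflakes error broke the build. Purely as an illustration of the kind of static check such a job performs, here is a small sketch that finds imported names that are never used, without running the code. It uses only the standard `ast` module and is not pyflakes itself; the function name is hypothetical, and real pyflakes covers many more cases (undefined names, redefinitions, and so on).

```python
import ast


def unused_imports(source):
    """Return the sorted names that are imported but never referenced."""
    tree = ast.parse(source)
    imported = set()
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b" binds the top-level name "a".
                imported.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name != "*":  # star imports bind unknown names
                    imported.add(alias.asname or alias.name)
        elif isinstance(node, ast.Name):
            used.add(node.id)
    return sorted(imported - used)
```

A check like this is cheap enough to run on every patch, which is the point of keeping at least one job with the static checks enabled.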

Federico Simoncelli

12:12 p.m.

----- Original Message -----

...

From: "Dan Kenigsberg" <danken@redhat.com> To: "Piotr Kliczewski" <piotr.kliczewski@gmail.com>, fsimonce@redhat.com, nsoffer@redhat.com, dcaro@redhat.com Cc: devel@ovirt.org Sent: Friday, June 6, 2014 11:15:18 AM Subject: Re: [ovirt-devel] local vdsm build fails

On Fri, Jun 06, 2014 at 09:19:11AM +0200, Piotr Kliczewski wrote:

...
All,

I pulled the latest vdsm from master and noticed that build is failing.

Here is the patch that causes the failure:

http://gerrit.ovirt.org/#/c/28226

Sorry, the patch was verified in a series. I relied on gerrit running pyflakes for each individual patch (I didn't know it was disabled).

Just to be on the safe side, if a patch explicitly says (in the comment setting the "verified" flag) that it was tested in a series, we should probably merge the entire set together. There may be more important side effects than just not being able to build.

Sadly there's not always time to verify a long series one by one.

-- Federico

...

...
and looking at jenkins comments I can see that jenkins was failing with the same reason:

http://jenkins.ovirt.org/job/vdsm_master_storage_functional_tests_localfs_ge...

Thanks for your report. Nir has already fixed this in http://gerrit.ovirt.org/28426.

It was introduced in http://gerrit.ovirt.org/#/c/28226/ but missed also because we have turned PYFLAKES off in unit test jobs. We must turn it on in at least one of the tests (or initiate a new jenkins job for `make check-local`).

As a quick fix, David has re-enabled PYFLAKES in http://jenkins.ovirt.org/view/By%20Project/view/vdsm/job/vdsm_master_unit_te...

Regards, Dan.

Nir Soffer

3:53 p.m.

----- Original Message -----

...

From: "Dan Kenigsberg" <danken@redhat.com> To: "Piotr Kliczewski" <piotr.kliczewski@gmail.com>, fsimonce@redhat.com, nsoffer@redhat.com, dcaro@redhat.com Cc: devel@ovirt.org Sent: Friday, June 6, 2014 12:15:18 PM Subject: Re: [ovirt-devel] local vdsm build fails

On Fri, Jun 06, 2014 at 09:19:11AM +0200, Piotr Kliczewski wrote:

...
All,

I pulled the latest vdsm from master and noticed that build is failing.

Here is the patch that causes the failure:

http://gerrit.ovirt.org/#/c/28226

and looking at jenkins comments I can see that jenkins was failing with the same reason:

http://jenkins.ovirt.org/job/vdsm_master_storage_functional_tests_localfs_ge...

Nir has already fixed that as well. The storage tests were just fine, but a post build script was running cp incorrectly.

David pointed out that we need a way to distinguish between test errors and failures. He suggested looking up strings in the test output - we should not go there, unless we want to "fix" this many more times in the future.

I suggest using these rules:

- SUCCESS - make check returns 0
- FAILURE - make check returns 1
- ERROR - anything else returned by make check or any other script.

I think that make check does work like this, but it should be easy to change.

What do you think?
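The three rules can be sketched as a small wrapper. This is only an illustration of the proposed mapping, with hypothetical function names, not an existing jenkins or vdsm script. One caveat worth checking before relying on it: GNU make documents its own exit status as 2 when a recipe fails, so the value 1 would have to come from the test runner itself or from a wrapper around it.

```python
import subprocess


def classify_status(returncode):
    """Map a return code to the three states proposed above."""
    if returncode == 0:
        return "SUCCESS"
    if returncode == 1:
        return "FAILURE"
    # Any other code, including death by signal (negative), is an error.
    return "ERROR"


def run_and_classify(cmd):
    """Run a command and classify its result."""
    try:
        returncode = subprocess.call(cmd)
    except OSError:
        # The command could not even be started.
        return "ERROR"
    return classify_status(returncode)
```

For example, `run_and_classify(["make", "check"])` would report ERROR when the command cannot be started at all, rather than blaming the patch.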

...

Thanks for your report. Nir has already fixed this in http://gerrit.ovirt.org/28426.

It was introduced in http://gerrit.ovirt.org/#/c/28226/ but missed also because we have turned PYFLAKES off in unit test jobs. We must turn it on in at least one of the tests (or initiate a new jenkins job for `make check-local`).

As a quick fix, David has re-enabled PYFLAKES in http://jenkins.ovirt.org/view/By%20Project/view/vdsm/job/vdsm_master_unit_te...

Regards, Dan.

David Caro

4:16 p.m.

On Fri 06 Jun 2014 03:53:41 PM CEST, Nir Soffer wrote:

...

----- Original Message -----

...
From: "Dan Kenigsberg" <danken@redhat.com> To: "Piotr Kliczewski" <piotr.kliczewski@gmail.com>, fsimonce@redhat.com, nsoffer@redhat.com, dcaro@redhat.com Cc: devel@ovirt.org Sent: Friday, June 6, 2014 12:15:18 PM Subject: Re: [ovirt-devel] local vdsm build fails

On Fri, Jun 06, 2014 at 09:19:11AM +0200, Piotr Kliczewski wrote:

...
All,

I pulled the latest vdsm from master and noticed that build is failing.

Here is the patch that causes the failure:

http://gerrit.ovirt.org/#/c/28226

and looking at jenkins comments I can see that jenkins was failing with the same reason:

http://jenkins.ovirt.org/job/vdsm_master_storage_functional_tests_localfs_gerrit/1064/console

Nir has already fixed that as well. The storage tests were just fine, but a post build script was running cp incorrectly.

David pointed out that we need a way to distinguish between test errors and failures. He suggested looking up strings in the test output - we should not go there, unless we want to "fix" this many more times in the future.

I suggest using these rules:

- SUCCESS - make check returns 0
- FAILURE - make check returns 1
- ERROR - anything else returned by make check or any other script.

I think that make check does work like this, but it should be easy to change.

What do you think?

...
Thanks for your report. Nir has already fixed this in http://gerrit.ovirt.org/28426.

It was introduced in http://gerrit.ovirt.org/#/c/28226/ but missed also because we have turned PYFLAKES off in unit test jobs. We must turn it on in at least one of the tests (or initiate a new jenkins job for `make check-local`).

As a quick fix, David has re-enabled PYFLAKES in http://jenkins.ovirt.org/view/By%20Project/view/vdsm/job/vdsm_master_unit_tests/configure

Regards, Dan.

Perfect for me, but you should know that it will also fail when strange things occur, for example out of memory, out of disk space, slave disconnected, network error, etc.

If you are willing to treat those (the most common infra failures) as devel failures, then no problem on my side, but I don't want you to start ignoring test errors because it's most probably an infra error (don't get me wrong, it's totally normal to start ignoring an alarm that is not a real problem; as infra members we will try to minimize the infra issues, but it's not yet as stable as we'd like it to be).

-- David Caro

Nir Soffer

8 Jun

12:57 p.m.

----- Original Message -----

...

From: "David Caro" <dcaroest@redhat.com> To: "Nir Soffer" <nsoffer@redhat.com> Cc: "Piotr Kliczewski" <piotr.kliczewski@gmail.com>, fsimonce@redhat.com, dcaro@redhat.com, devel@ovirt.org, "Dan Kenigsberg" <danken@redhat.com> Sent: Friday, June 6, 2014 5:16:52 PM Subject: Re: [ovirt-devel] local vdsm build fails

On Fri 06 Jun 2014 03:53:41 PM CEST, Nir Soffer wrote:

...
----- Original Message -----

...
From: "Dan Kenigsberg" <danken@redhat.com> To: "Piotr Kliczewski" <piotr.kliczewski@gmail.com>, fsimonce@redhat.com, nsoffer@redhat.com, dcaro@redhat.com Cc: devel@ovirt.org Sent: Friday, June 6, 2014 12:15:18 PM Subject: Re: [ovirt-devel] local vdsm build fails

On Fri, Jun 06, 2014 at 09:19:11AM +0200, Piotr Kliczewski wrote:

...
All,

I pulled the latest vdsm from master and noticed that build is failing.

Here is the patch that causes the failure:

http://gerrit.ovirt.org/#/c/28226

and looking at jenkins comments I can see that jenkins was failing with the same reason:

http://jenkins.ovirt.org/job/vdsm_master_storage_functional_tests_localfs_ge...

Nir has already fixed that as well. The storage tests were just fine, but a post build script was running cp incorrectly.

David pointed out that we need a way to distinguish between test errors and failures. He suggested looking up strings in the test output - we should not go there, unless we want to "fix" this many more times in the future.

I suggest using these rules:

- SUCCESS - make check returns 0 - FAILURE - make check returns 1 - ERROR - anything else returned by make check or any other script.

I think that make check does work like this, but it should be easy to change.

What do you think?

...
Thanks for your report. Nir has already fixed this in http://gerrit.ovirt.org/28426.

It was introduced in http://gerrit.ovirt.org/#/c/28226/ but missed also because we have turned PYFLAKES off in unit test jobs. We must turn it on in at least one of the tests (or initiate a new jenkins job for `make check-local`).

As a quick fix, David has re-enabled PYFLAKES in http://jenkins.ovirt.org/view/By%20Project/view/vdsm/job/vdsm_master_unit_te...

Regards, Dan.

Perfect for me, but you should know that it will fail also when strange things occur, for example, out of memory, of disk space, slave disconnected, network error, etc.

If you are willing to treat those (the most common infra failures) as devel failures, then no problem on my side,

I'm not - this is why we should separate test failures from test errors.

...

but I don't want you to start ignoring test errors because it's most probably an infra error (don't get me wrong, it's totally normal to start ignoring an alarm that is not a real problem, as infra members we will try to minimize the infra issues, but it's not yet as stable as we'd like it to be).

This is too late now, people are already ignoring jenkins reports because of the many false negatives :-)

David Caro

12 Jun

12:24 p.m.

On Sun 08 Jun 2014 12:57:24 PM CEST, Nir Soffer wrote:

...

----- Original Message -----

...
From: "David Caro" <dcaroest@redhat.com> To: "Nir Soffer" <nsoffer@redhat.com> Cc: "Piotr Kliczewski" <piotr.kliczewski@gmail.com>, fsimonce@redhat.com, dcaro@redhat.com, devel@ovirt.org, "Dan Kenigsberg" <danken@redhat.com> Sent: Friday, June 6, 2014 5:16:52 PM Subject: Re: [ovirt-devel] local vdsm build fails

On Fri 06 Jun 2014 03:53:41 PM CEST, Nir Soffer wrote:

...
----- Original Message -----

...
From: "Dan Kenigsberg" <danken@redhat.com> To: "Piotr Kliczewski" <piotr.kliczewski@gmail.com>, fsimonce@redhat.com, nsoffer@redhat.com, dcaro@redhat.com Cc: devel@ovirt.org Sent: Friday, June 6, 2014 12:15:18 PM Subject: Re: [ovirt-devel] local vdsm build fails

On Fri, Jun 06, 2014 at 09:19:11AM +0200, Piotr Kliczewski wrote:

...
All,

I pulled the latest vdsm from master and noticed that build is failing.

Here is the patch that causes the failure:

http://gerrit.ovirt.org/#/c/28226

and looking at jenkins comments I can see that jenkins was failing with the same reason:

http://jenkins.ovirt.org/job/vdsm_master_storage_functional_tests_localfs_gerrit/1064/console

Nir has already fixed that as well. The storage tests were just fine, but a post build script was running cp incorrectly.

David pointed out that we need a way to distinguish between test errors and failures. He suggested looking up strings in the test output - we should not go there, unless we want to "fix" this many more times in the future.

I suggest using these rules:

- SUCCESS - make check returns 0
- FAILURE - make check returns 1
- ERROR - anything else returned by make check or any other script.

I think that make check does work like this, but it should be easy to change.

What do you think?

...
Thanks for your report. Nir has already fixed this in http://gerrit.ovirt.org/28426.

It was introduced in http://gerrit.ovirt.org/#/c/28226/ but missed also because we have turned PYFLAKES off in unit test jobs. We must turn it on in at least one of the tests (or initiate a new jenkins job for `make check-local`).

As a quick fix, David has re-enabled PYFLAKES in http://jenkins.ovirt.org/view/By%20Project/view/vdsm/job/vdsm_master_unit_tests/configure

Regards, Dan.

Perfect for me, but you should know that it will fail also when strange things occur, for example, out of memory, of disk space, slave disconnected, network error, etc.

If you are willing to treat those (the most common infra failures) as devel failures, then no problem on my side,

I'm not - this is why we should separate test failures from test errors.

...
but I don't want you to start ignoring test errors because it's most probably an infra error (don't get me wrong, it's totally normal to start ignoring an alarm that is not a real problem, as infra members we will try to minimize the infra issues, but it's not yet as stable as we'd like it to be).

This is too late now, people are already ignoring jenkins reports because of the many false negatives :-)

So the return code is not a good solution then; we have to see if it failed, and whether it was due to an infra error or a devel error. I think that it's easier to filter for:

* A string that means the tests did run, probably at the end of the log, so if there's a connection failure it will be detected as an infra issue.
* A string that identifies whether the test failed or passed.

And if none of those were found, then an infra failure is assumed.

-- David Caro
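The filtering idea can be sketched as follows. The three marker strings are hypothetical and would have to be printed by the job itself; nothing in the thread says which strings the real logs contain. The sketch reads the proposal strictly: a log counts as a test result only if both the "tests ran" marker and a pass/fail marker are present, and anything else is reported as an infra problem.

```python
# Hypothetical markers; a real job would have to print them itself.
RAN_MARKER = "== TEST RUN COMPLETE =="
PASSED_MARKER = "== RESULT: PASSED =="
FAILED_MARKER = "== RESULT: FAILED =="


def classify_log(log_text):
    """Classify a console log as PASSED, FAILED or INFRA."""
    if RAN_MARKER not in log_text:
        # The run never reached its end: disconnect, out of disk, etc.
        return "INFRA"
    if FAILED_MARKER in log_text:
        return "FAILED"
    if PASSED_MARKER in log_text:
        return "PASSED"
    # The run finished but reported no verdict; do not blame the patch.
    return "INFRA"
```

The weak point of this approach is the one raised earlier in the thread: the markers have to stay in sync with whatever the test runner prints.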

Nir Soffer

12:47 p.m.

----- Original Message -----

...

From: "David Caro" <dcaroest@redhat.com> To: "Nir Soffer" <nsoffer@redhat.com> Cc: "Piotr Kliczewski" <piotr.kliczewski@gmail.com>, fsimonce@redhat.com, dcaro@redhat.com, devel@ovirt.org, "Dan Kenigsberg" <danken@redhat.com> Sent: Thursday, June 12, 2014 1:24:39 PM Subject: Re: [ovirt-devel] local vdsm build fails

On Sun 08 Jun 2014 12:57:24 PM CEST, Nir Soffer wrote:

...
----- Original Message -----

...
From: "David Caro" <dcaroest@redhat.com> To: "Nir Soffer" <nsoffer@redhat.com> Cc: "Piotr Kliczewski" <piotr.kliczewski@gmail.com>, fsimonce@redhat.com, dcaro@redhat.com, devel@ovirt.org, "Dan Kenigsberg" <danken@redhat.com> Sent: Friday, June 6, 2014 5:16:52 PM Subject: Re: [ovirt-devel] local vdsm build fails

On Fri 06 Jun 2014 03:53:41 PM CEST, Nir Soffer wrote:

...
----- Original Message -----

...
From: "Dan Kenigsberg" <danken@redhat.com> To: "Piotr Kliczewski" <piotr.kliczewski@gmail.com>, fsimonce@redhat.com, nsoffer@redhat.com, dcaro@redhat.com Cc: devel@ovirt.org Sent: Friday, June 6, 2014 12:15:18 PM Subject: Re: [ovirt-devel] local vdsm build fails

On Fri, Jun 06, 2014 at 09:19:11AM +0200, Piotr Kliczewski wrote:

...
All,

I pulled the latest vdsm from master and noticed that build is failing.

Here is the patch that causes the failure:

http://gerrit.ovirt.org/#/c/28226

and looking at jenkins comments I can see that jenkins was failing with the same reason:

http://jenkins.ovirt.org/job/vdsm_master_storage_functional_tests_localfs_ge...

Nir has already fixed that as well. The storage tests were just fine, but a post build script was running cp incorrectly.

David pointed out that we need a way to distinguish between test errors and failures. He suggested looking up strings in the test output - we should not go there, unless we want to "fix" this many more times in the future.

I suggest using these rules:

- SUCCESS - make check returns 0 - FAILURE - make check returns 1 - ERROR - anything else returned by make check or any other script.

I think that make check does work like this, but it should be easy to change.

What do you think?

...
Thanks for your report. Nir has already fixed this in http://gerrit.ovirt.org/28426.

It was introduced in http://gerrit.ovirt.org/#/c/28226/ but missed also because we have turned PYFLAKES off in unit test jobs. We must turn it on in at least one of the tests (or initiate a new jenkins job for `make check-local`).

As a quick fix, David has re-enabled PYFLAKES in http://jenkins.ovirt.org/view/By%20Project/view/vdsm/job/vdsm_master_unit_te...

Regards, Dan.

Perfect for me, but you should know that it will fail also when strange things occur, for example, out of memory, of disk space, slave disconnected, network error, etc.

If you are willing to treat those (the most common infra failures) as devel failures, then no problem on my side,

I'm not - this is why we should separate test failures from test errors.

...
but I don't want you to start ignoring test errors because it's most probably an infra error (don't get me wrong, it's totally normal to start ignoring an alarm that is not a real problem, as infra members we will try to minimize the infra issues, but it's not yet as stable as we'd like it to be).

This is too late now, people are already ignoring jenkins reports because of the many false negatives :-)

So the return code is not a good solution then, we have to see if it failed, and if it was due to an infra error or a devel error. I think that it's easier to filter for:

* A string that means the tests did ran, probably at the end of the log so if there's a connection failure it will be detected as infra issue. * A string that identified if the test failed or passed

And if none of those were found, then an infra failure is supposed.

Ok, how about:

1. make check will write a file with test results - no other output can go into that file so we don't have to use heuristics when parsing the file.
2. If the file is found and parsed successfully, tests either succeeded or failed.
3. Any other failure is a test error - failure is *never* assumed
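One possible reading of these three steps, as a sketch only: assume the results file is an xunit-style XML report (the `NOSE_WITH_XUNIT=1` setting mentioned earlier in the thread suggests such a report is available) whose root element carries `failures` and `errors` counters. That file layout is an assumption, as are the function names. "Parsed successfully" is taken here to mean well-formed XML with both counters present as integers.

```python
import xml.etree.ElementTree as ET


def classify_xunit(xml_data):
    """Classify report contents; None means the file was not found."""
    if xml_data is None:
        return "ERROR"
    try:
        root = ET.fromstring(xml_data)
        failed = int(root.attrib["failures"]) + int(root.attrib["errors"])
    except (ET.ParseError, KeyError, ValueError):
        # Truncated or malformed report: never assume a test failure.
        return "ERROR"
    return "FAILURE" if failed else "SUCCESS"


def classify_results_file(path):
    """Read the report at `path` (hypothetical location) and classify it."""
    try:
        with open(path, "rb") as report:
            data = report.read()
    except (IOError, OSError):
        data = None
    return classify_xunit(data)
```

With this shape, a slave that disconnects or runs out of disk leaves a missing or truncated file and is reported as ERROR, while FAILURE is only ever derived from a complete report.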

David Caro

8:48 p.m.

On Thu 12 Jun 2014 12:47:11 PM CEST, Nir Soffer wrote:

...

----- Original Message -----

...
From: "David Caro" <dcaroest@redhat.com> To: "Nir Soffer" <nsoffer@redhat.com> Cc: "Piotr Kliczewski" <piotr.kliczewski@gmail.com>, fsimonce@redhat.com, dcaro@redhat.com, devel@ovirt.org, "Dan Kenigsberg" <danken@redhat.com> Sent: Thursday, June 12, 2014 1:24:39 PM Subject: Re: [ovirt-devel] local vdsm build fails

On Sun 08 Jun 2014 12:57:24 PM CEST, Nir Soffer wrote:

...
----- Original Message -----

...
From: "David Caro" <dcaroest@redhat.com> To: "Nir Soffer" <nsoffer@redhat.com> Cc: "Piotr Kliczewski" <piotr.kliczewski@gmail.com>, fsimonce@redhat.com, dcaro@redhat.com, devel@ovirt.org, "Dan Kenigsberg" <danken@redhat.com> Sent: Friday, June 6, 2014 5:16:52 PM Subject: Re: [ovirt-devel] local vdsm build fails

On Fri 06 Jun 2014 03:53:41 PM CEST, Nir Soffer wrote:

...
----- Original Message -----

...
From: "Dan Kenigsberg" <danken@redhat.com> To: "Piotr Kliczewski" <piotr.kliczewski@gmail.com>, fsimonce@redhat.com, nsoffer@redhat.com, dcaro@redhat.com Cc: devel@ovirt.org Sent: Friday, June 6, 2014 12:15:18 PM Subject: Re: [ovirt-devel] local vdsm build fails

On Fri, Jun 06, 2014 at 09:19:11AM +0200, Piotr Kliczewski wrote:
> All,
>
> I pulled the latest vdsm from master and noticed that build is failing.
>
> Here is the patch that causes the failure:
>
> http://gerrit.ovirt.org/#/c/28226
>
> and looking at jenkins comments I can see that jenkins was failing
> with the same reason:
>
> http://jenkins.ovirt.org/job/vdsm_master_storage_functional_tests_localfs_gerrit/1064/console

Nir has already fixed that as well. The storage tests were just fine, but a post build script was running cp incorrectly.

David pointed out that we need a way to distinguish between test errors and failures. He suggested looking up strings in the test output - we should not go there, unless we want to "fix" this many more times in the future.

I suggest using these rules:

- SUCCESS - make check returns 0
- FAILURE - make check returns 1
- ERROR - anything else returned by make check or any other script.

I think that make check does work like this, but it should be easy to change.

What do you think?

...
Thanks for your report. Nir has already fixed this in http://gerrit.ovirt.org/28426.

It was introduced in http://gerrit.ovirt.org/#/c/28226/ but missed also because we have turned PYFLAKES off in unit test jobs. We must turn it on in at least one of the tests (or initiate a new jenkins job for `make check-local`).

As a quick fix, David has re-enabled PYFLAKES in http://jenkins.ovirt.org/view/By%20Project/view/vdsm/job/vdsm_master_unit_tests/configure

Regards, Dan.

Perfect for me, but you should know that it will fail also when strange things occur, for example, out of memory, of disk space, slave disconnected, network error, etc.

If you are willing to treat those (the most common infra failures) as devel failures, then no problem on my side,

I'm not - this is why we should separate test failures from test errors.

...
but I don't want you to start ignoring test errors because it's most probably an infra error (don't get me wrong, it's totally normal to start ignoring an alarm that is not a real problem, as infra members we will try to minimize the infra issues, but it's not yet as stable as we'd like it to be).

This is too late now, people are already ignoring jenkins reports because of the many false negatives :-)

So the return code is not a good solution then, we have to see if it failed, and if it was due to an infra error or a devel error. I think that it's easier to filter for:

* A string that means the tests did ran, probably at the end of the log so if there's a connection failure it will be detected as infra issue.
* A string that identified if the test failed or passed

And if none of those were found, then an infra failure is supposed.

Ok, how about:

1. make check will write a file with test results - no other output can go into that file so we don't have to use heuristics when parsing the file.

2. If the file is found and parse successfully, tests either succeeded or failed.

What does 'parse successfully' mean?

3. Any other failure is a test error - failure is *never* assumed

You mean an infra issue?


Nir Soffer

9:11 p.m.

----- Original Message -----

...

From: "David Caro" <dcaroest@redhat.com> To: "Nir Soffer" <nsoffer@redhat.com> Cc: "Piotr Kliczewski" <piotr.kliczewski@gmail.com>, fsimonce@redhat.com, dcaro@redhat.com, devel@ovirt.org, "Dan Kenigsberg" <danken@redhat.com> Sent: Thursday, June 12, 2014 9:48:54 PM Subject: Re: [ovirt-devel] local vdsm build fails

On Thu 12 Jun 2014 12:47:11 PM CEST, Nir Soffer wrote:

...
----- Original Message -----

...
From: "David Caro" <dcaroest@redhat.com> To: "Nir Soffer" <nsoffer@redhat.com> Cc: "Piotr Kliczewski" <piotr.kliczewski@gmail.com>, fsimonce@redhat.com, dcaro@redhat.com, devel@ovirt.org, "Dan Kenigsberg" <danken@redhat.com> Sent: Thursday, June 12, 2014 1:24:39 PM Subject: Re: [ovirt-devel] local vdsm build fails

On Sun 08 Jun 2014 12:57:24 PM CEST, Nir Soffer wrote:

...
----- Original Message -----

...
From: "David Caro" <dcaroest@redhat.com> To: "Nir Soffer" <nsoffer@redhat.com> Cc: "Piotr Kliczewski" <piotr.kliczewski@gmail.com>, fsimonce@redhat.com, dcaro@redhat.com, devel@ovirt.org, "Dan Kenigsberg" <danken@redhat.com> Sent: Friday, June 6, 2014 5:16:52 PM Subject: Re: [ovirt-devel] local vdsm build fails

On Fri 06 Jun 2014 03:53:41 PM CEST, Nir Soffer wrote:

...
----- Original Message -----
> From: "Dan Kenigsberg" <danken@redhat.com>
> To: "Piotr Kliczewski" <piotr.kliczewski@gmail.com>, fsimonce@redhat.com, nsoffer@redhat.com, dcaro@redhat.com
> Cc: devel@ovirt.org
> Sent: Friday, June 6, 2014 12:15:18 PM
> Subject: Re: [ovirt-devel] local vdsm build fails
>
> On Fri, Jun 06, 2014 at 09:19:11AM +0200, Piotr Kliczewski wrote:
>> All,
>>
>> I pulled the latest vdsm from master and noticed that build is failing.
>>
>> Here is the patch that causes the failure:
>>
>> http://gerrit.ovirt.org/#/c/28226
>>
>> and looking at jenkins comments I can see that jenkins was failing with the same reason:
>>
>> http://jenkins.ovirt.org/job/vdsm_master_storage_functional_tests_localfs_ge...

Nir has already fixed that as well. The storage tests were just fine, but a post build script was running cp incorrectly.

David pointed out that we need a way to distinguish between test errors and failures. He suggested looking up strings in the test output - we should not go there, unless we want to "fix" this many more times in the future.

I suggest using these rules:

- SUCCESS - make check returns 0
- FAILURE - make check returns 1
- ERROR - anything else returned by make check or any other script.

I think that make check does work like this, but it should be easy to change.
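A minimal sketch of the return-code rule above, assuming the test command really does exit with 1 on a test failure. Note that GNU make itself normally exits with status 2 when a recipe fails, so in practice a wrapper around the test runner would probably be needed. The function names here are hypothetical.

```python
import subprocess

SUCCESS, FAILURE, ERROR = "SUCCESS", "FAILURE", "ERROR"


def classify_exit_code(returncode):
    """Map an exit status to the three states proposed in the thread."""
    if returncode == 0:
        return SUCCESS
    if returncode == 1:
        return FAILURE
    # Any other status, including negative values reported by subprocess
    # when the process was killed by a signal, is an error.
    return ERROR


def run_and_classify(cmd):
    """Run cmd (a list of arguments) and classify its exit status."""
    try:
        proc = subprocess.run(cmd)
    except OSError:
        # The command could not even be started.
        return ERROR
    return classify_exit_code(proc.returncode)
```

For example, `run_and_classify(["make", "check"])` would report ERROR rather than FAILURE if the process were killed by the out-of-memory killer, which is the distinction asked for here.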

What do you think?

>
> Thanks for your report. Nir has already fixed this in http://gerrit.ovirt.org/28426.
>
> It was introduced in http://gerrit.ovirt.org/#/c/28226/ but missed also because we have turned PYFLAKES off in unit test jobs. We must turn it on in at least one of the tests (or initiate a new jenkins job for `make check-local`).
>
> As a quick fix, David has re-enabled PYFLAKES in http://jenkins.ovirt.org/view/By%20Project/view/vdsm/job/vdsm_master_unit_te...
>
> Regards,
> Dan.
>

Perfect for me, but you should know that it will fail also when strange things occur, for example, out of memory, out of disk space, slave disconnected, network error, etc.

If you are willing to treat those (the most common infra failures) as devel failures, then no problem on my side,

I'm not - this is why we should separate test failures from test errors.

...
but I don't want you to start ignoring test errors because it's most probably an infra error (don't get me wrong, it's totally normal to start ignoring an alarm that is not a real problem, as infra members we will try to minimize the infra issues, but it's not yet as stable as we'd like it to be).

This is too late now, people are already ignoring jenkins reports because of the many false negatives :-)

So the return code is not a good solution then, we have to see if it failed, and if it was due to an infra error or a devel error. I think that it's easier to filter for:

* A string that means the tests did run, probably at the end of the log, so if there's a connection failure it will be detected as an infra issue.
* A string that identifies whether the test failed or passed.

And if none of those were found, then an infra failure is supposed.

Ok, how about:

1. make check will write a file with test results - no other output can go into that file so we don't have to use heuristics when parsing the file.

2. If the file is found and parsed successfully, tests either succeeded or failed.

What does 'parse successfully' mean?

That I can open and understand the contents of this file. For example if I expect the file to contain "PASS" or "FAILED" but it contains "BLAH" this is not a test failure, this is a test error.

...

...
3. Any other failure is a test error - failure is *never* assumed

You mean an infra issue?

To make it clear:

- test success - all tests completed and passed
- test failure - all tests completed and some of them failed
- test error - anything else, meaning, I don't know if tests passed or failed.

This can be caused by bad tests, infra issue, anything. Practically this means that the test author and infra should understand the error and fix it, otherwise the developer does not know if her code is good or bad.

Nir
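The three outcomes defined above, combined with the result-file idea, might look roughly like this. It is a sketch only: the file format (a single word, "PASS" or "FAILED", taken from the example in this message) and the function name `read_test_result` are assumptions, since the real format was still to be agreed.

```python
SUCCESS, FAILURE, ERROR = "success", "failure", "error"


def read_test_result(path):
    """Return SUCCESS, FAILURE or ERROR for a test result file.

    Assumes the file holds exactly one word: PASS or FAILED.
    """
    try:
        with open(path, encoding="utf-8") as f:
            content = f.read().strip()
    except (OSError, UnicodeDecodeError):
        # Missing or unreadable file: we do not know whether tests passed.
        return ERROR

    if content == "PASS":
        return SUCCESS
    if content == "FAILED":
        return FAILURE
    # Anything else (e.g. "BLAH" or an empty file) is never treated
    # as a test failure, only as a test error.
    return ERROR
```

The point of the design is that FAILURE is only reported when the file positively says so; every unexpected situation falls through to ERROR.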

David Caro

13 Jun 13 Jun

5:34 p.m.

On Thu 12 Jun 2014 09:11:07 PM CEST, Nir Soffer wrote:

...

----- Original Message -----

...
From: "David Caro" <dcaroest@redhat.com> To: "Nir Soffer" <nsoffer@redhat.com> Cc: "Piotr Kliczewski" <piotr.kliczewski@gmail.com>, fsimonce@redhat.com, dcaro@redhat.com, devel@ovirt.org, "Dan Kenigsberg" <danken@redhat.com> Sent: Thursday, June 12, 2014 9:48:54 PM Subject: Re: [ovirt-devel] local vdsm build fails

On Thu 12 Jun 2014 12:47:11 PM CEST, Nir Soffer wrote:

...
----- Original Message -----

...
From: "David Caro" <dcaroest@redhat.com> To: "Nir Soffer" <nsoffer@redhat.com> Cc: "Piotr Kliczewski" <piotr.kliczewski@gmail.com>, fsimonce@redhat.com, dcaro@redhat.com, devel@ovirt.org, "Dan Kenigsberg" <danken@redhat.com> Sent: Thursday, June 12, 2014 1:24:39 PM Subject: Re: [ovirt-devel] local vdsm build fails

On Sun 08 Jun 2014 12:57:24 PM CEST, Nir Soffer wrote:

...
----- Original Message -----

...
From: "David Caro" <dcaroest@redhat.com> To: "Nir Soffer" <nsoffer@redhat.com> Cc: "Piotr Kliczewski" <piotr.kliczewski@gmail.com>, fsimonce@redhat.com, dcaro@redhat.com, devel@ovirt.org, "Dan Kenigsberg" <danken@redhat.com> Sent: Friday, June 6, 2014 5:16:52 PM Subject: Re: [ovirt-devel] local vdsm build fails

On Fri 06 Jun 2014 03:53:41 PM CEST, Nir Soffer wrote:
> ----- Original Message -----
>> From: "Dan Kenigsberg" <danken@redhat.com>
>> To: "Piotr Kliczewski" <piotr.kliczewski@gmail.com>, fsimonce@redhat.com, nsoffer@redhat.com, dcaro@redhat.com
>> Cc: devel@ovirt.org
>> Sent: Friday, June 6, 2014 12:15:18 PM
>> Subject: Re: [ovirt-devel] local vdsm build fails
>>
>> On Fri, Jun 06, 2014 at 09:19:11AM +0200, Piotr Kliczewski wrote:
>>> All,
>>>
>>> I pulled the latest vdsm from master and noticed that build is failing.
>>>
>>> Here is the patch that causes the failure:
>>>
>>> http://gerrit.ovirt.org/#/c/28226
>>>
>>> and looking at jenkins comments I can see that jenkins was failing with the same reason:
>>>
>>> http://jenkins.ovirt.org/job/vdsm_master_storage_functional_tests_localfs_gerrit/1064/console
>
> Nir has already fixed that as well. The storage tests were just fine, but a post build script was running cp incorrectly.
>
> David pointed out that we need a way to distinguish between test errors and failures. He suggested looking up strings in the test output - we should not go there, unless we want to "fix" this many more times in the future.
>
> I suggest using these rules:
>
> - SUCCESS - make check returns 0
> - FAILURE - make check returns 1
> - ERROR - anything else returned by make check or any other script.
>
> I think that make check does work like this, but it should be easy to change.
>
> What do you think?
>
>>
>> Thanks for your report. Nir has already fixed this in http://gerrit.ovirt.org/28426.
>>
>> It was introduced in http://gerrit.ovirt.org/#/c/28226/ but missed also because we have turned PYFLAKES off in unit test jobs. We must turn it on in at least one of the tests (or initiate a new jenkins job for `make check-local`).
>>
>> As a quick fix, David has re-enabled PYFLAKES in http://jenkins.ovirt.org/view/By%20Project/view/vdsm/job/vdsm_master_unit_tests/configure
>>
>> Regards,
>> Dan.
>>

Perfect for me, but you should know that it will fail also when strange things occur, for example, out of memory, out of disk space, slave disconnected, network error, etc.

If you are willing to treat those (the most common infra failures) as devel failures, then no problem on my side,

I'm not - this is why we should separate test failures from test errors.

but I don't want you to start ignoring test errors because it's most probably an infra error (don't get me wrong, it's totally normal to start ignoring an alarm that is not a real problem, as infra members we will try to minimize the infra issues, but it's not yet as stable as we'd like it to be).

This is too late now, people are already ignoring jenkins reports because of the many false negatives :-)

So the return code is not a good solution then, we have to see if it failed, and if it was due to an infra error or a devel error. I think that it's easier to filter for:

* A string that means the tests did run, probably at the end of the log, so if there's a connection failure it will be detected as an infra issue.
* A string that identifies whether the test failed or passed.

And if none of those were found, then an infra failure is supposed.

Ok, how about:

1. make check will write a file with test results - no other output can go into that file so we don't have to use heuristics when parsing the file.

2. If the file is found and parsed successfully, tests either succeeded or failed.

What does 'parse successfully' mean?

That I can open and understand the contents of this file. For example if I expect the file to contain "PASS" or "FAILED" but it contains "BLAH" this is not a test failure, this is a test error.

So then you have moved from checking the jenkins log output to checking the contents of a file; I see no clear advantage. Nothing against it though (it cleans up the jenkins log, and lets you compress the logs). I'll need then the format of the file to check it.

...

...
...
3. Any other failure is a test error - failure is *never* assumed

You mean an infra issue?

To make it clear:

- test success - all tests completed and passed

- test failure - all tests completed and some of them failed

- test error - anything else, meaning, I don't know if tests passed or failed.

This can be caused by bad tests, infra issue, anything. Practically this means that the test author and infra should understand the error and fix it, otherwise the developer does not know if her code is good or bad.

Nir

--
David Caro

Red Hat S.L.
Continuous Integration Engineer - EMEA ENG Virtualization R&D

Email: dcaro@redhat.com
Web: www.redhat.com
RHT Global #: 82-62605

Nir Soffer

9:10 p.m.

----- Original Message -----

...

From: "David Caro" <dcaroest@redhat.com> To: "Nir Soffer" <nsoffer@redhat.com> Cc: "Piotr Kliczewski" <piotr.kliczewski@gmail.com>, fsimonce@redhat.com, dcaro@redhat.com, devel@ovirt.org, "Dan Kenigsberg" <danken@redhat.com> Sent: Friday, June 13, 2014 6:34:39 PM Subject: Re: [ovirt-devel] local vdsm build fails

On Thu 12 Jun 2014 09:11:07 PM CEST, Nir Soffer wrote:

...
----- Original Message -----

...
From: "David Caro" <dcaroest@redhat.com> To: "Nir Soffer" <nsoffer@redhat.com> Cc: "Piotr Kliczewski" <piotr.kliczewski@gmail.com>, fsimonce@redhat.com, dcaro@redhat.com, devel@ovirt.org, "Dan Kenigsberg" <danken@redhat.com> Sent: Thursday, June 12, 2014 9:48:54 PM Subject: Re: [ovirt-devel] local vdsm build fails

On Thu 12 Jun 2014 12:47:11 PM CEST, Nir Soffer wrote:

...
----- Original Message -----

...
From: "David Caro" <dcaroest@redhat.com> To: "Nir Soffer" <nsoffer@redhat.com> Cc: "Piotr Kliczewski" <piotr.kliczewski@gmail.com>, fsimonce@redhat.com, dcaro@redhat.com, devel@ovirt.org, "Dan Kenigsberg" <danken@redhat.com> Sent: Thursday, June 12, 2014 1:24:39 PM Subject: Re: [ovirt-devel] local vdsm build fails

On Sun 08 Jun 2014 12:57:24 PM CEST, Nir Soffer wrote:

...
----- Original Message -----
> From: "David Caro" <dcaroest@redhat.com>
> To: "Nir Soffer" <nsoffer@redhat.com>
> Cc: "Piotr Kliczewski" <piotr.kliczewski@gmail.com>, fsimonce@redhat.com, dcaro@redhat.com, devel@ovirt.org, "Dan Kenigsberg" <danken@redhat.com>
> Sent: Friday, June 6, 2014 5:16:52 PM
> Subject: Re: [ovirt-devel] local vdsm build fails
>
> On Fri 06 Jun 2014 03:53:41 PM CEST, Nir Soffer wrote:
>> ----- Original Message -----
>>> From: "Dan Kenigsberg" <danken@redhat.com>
>>> To: "Piotr Kliczewski" <piotr.kliczewski@gmail.com>, fsimonce@redhat.com, nsoffer@redhat.com, dcaro@redhat.com
>>> Cc: devel@ovirt.org
>>> Sent: Friday, June 6, 2014 12:15:18 PM
>>> Subject: Re: [ovirt-devel] local vdsm build fails
>>>
>>> On Fri, Jun 06, 2014 at 09:19:11AM +0200, Piotr Kliczewski wrote:
>>>> All,
>>>>
>>>> I pulled the latest vdsm from master and noticed that build is failing.
>>>>
>>>> Here is the patch that causes the failure:
>>>>
>>>> http://gerrit.ovirt.org/#/c/28226
>>>>
>>>> and looking at jenkins comments I can see that jenkins was failing with the same reason:
>>>>
>>>> http://jenkins.ovirt.org/job/vdsm_master_storage_functional_tests_localfs_ge...
>>
>> Nir has already fixed that as well. The storage tests were just fine, but a post build script was running cp incorrectly.
>>
>> David pointed out that we need a way to distinguish between test errors and failures. He suggested looking up strings in the test output - we should not go there, unless we want to "fix" this many more times in the future.
>>
>> I suggest using these rules:
>>
>> - SUCCESS - make check returns 0
>> - FAILURE - make check returns 1
>> - ERROR - anything else returned by make check or any other script.
>>
>> I think that make check does work like this, but it should be easy to change.
>>
>> What do you think?
>>
>>>
>>> Thanks for your report. Nir has already fixed this in http://gerrit.ovirt.org/28426.
>>>
>>> It was introduced in http://gerrit.ovirt.org/#/c/28226/ but missed also because we have turned PYFLAKES off in unit test jobs. We must turn it on in at least one of the tests (or initiate a new jenkins job for `make check-local`).
>>>
>>> As a quick fix, David has re-enabled PYFLAKES in http://jenkins.ovirt.org/view/By%20Project/view/vdsm/job/vdsm_master_unit_te...
>>>
>>> Regards,
>>> Dan.
>>>
>
> Perfect for me, but you should know that it will fail also when strange things occur, for example, out of memory, out of disk space, slave disconnected, network error, etc.
>
> If you are willing to treat those (the most common infra failures) as devel failures, then no problem on my side,

I'm not - this is why we should separate test failures from test errors.

> but I don't want you to start ignoring test errors because it's most probably an infra error (don't get me wrong, it's totally normal to start ignoring an alarm that is not a real problem, as infra members we will try to minimize the infra issues, but it's not yet as stable as we'd like it to be).

This is too late now, people are already ignoring jenkins reports because of the many false negatives :-)

So the return code is not a good solution then, we have to see if it failed, and if it was due to an infra error or a devel error. I think that it's easier to filter for:

* A string that means the tests did run, probably at the end of the log, so if there's a connection failure it will be detected as an infra issue.
* A string that identifies whether the test failed or passed.

And if none of those were found, then an infra failure is supposed.

Ok, how about:

1. make check will write a file with test results - no other output can go into that file so we don't have to use heuristics when parsing the file.

2. If the file is found and parsed successfully, tests either succeeded or failed.

What does 'parse successfully' mean?

That I can open and understand the contents of this file. For example if I expect the file to contain "PASS" or "FAILED" but it contains "BLAH" this is not a test failure, this is a test error.

So then you have moved from checking the jenkins log output to checking the contents of a file; I see no clear advantage. Nothing against it though (it cleans up the jenkins log, and lets you compress the logs).

The advantage is that test results are separated from the noise in the jenkins log, and checking the results is not fragile.

...

I'll need then the format of the file to check it.

Let's talk about this next week.

Nir


14 comments

7 participants

participants (7)

Dan Kenigsberg
David Caro
Federico Simoncelli
Francesco Romani
Michal Skrivanek
Nir Soffer
Piotr Kliczewski
