[VDSM] Travis builds still fail on .coverage rename

Nir Soffer

4 Jul 2018

12:48 p.m.

Dan, the Travis build still fails when renaming the coverage file, even after your last patch.

...........................SS.SS.................................................................................................................................................................SS..................................................S.S................................S................................SS.....SS............................................S...............SSS...S.....S.............................................S................................................................SSS............SSSS..SSSSSSSSS.SS..................................................................................................................................................................
----------------------------------------------------------------------
Ran 1267 tests in 99.239s
OK (SKIP=63)
[ -n "$NOSE_WITH_COVERAGE" ] && mv .coverage .coverage-nose-py2
make[1]: *** [check] Error 1
make[1]: Leaving directory `/vdsm/tests'
ERROR: InvocationError: '/usr/bin/make -C tests check'

https://travis-ci.org/oVirt/vdsm/jobs/399932012

Do you have any idea what is wrong there?

Why don't we get any error message from the failed command?

Nir


Dan Kenigsberg

4 Jul 2018

12:59 p.m.

On Wed, Jul 4, 2018 at 12:48 PM, Nir Soffer <nsoffer@redhat.com> wrote:


> Do you have any idea what is wrong there?
>
> Why don't we get any error message from the failed command?

No idea, nothing pops to mind. We can revert to the sillier [ -f .coverage ] condition instead of understanding the failure (yeah, this feels dirty).
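One possible reading of the silent failure, offered as a sketch and not as a confirmed diagnosis: make runs each recipe line through the shell and fails the target on any non-zero exit status. A "test && command" list exits non-zero when the test is false, and the test itself prints nothing, so make would report "Error 1" with no message from the command. The snippet below assumes NOSE_WITH_COVERAGE is unset, which the thread does not state; the file names are copied from the log.

```shell
#!/bin/sh
# Hypothetical reproduction. Assumption: NOSE_WITH_COVERAGE is unset.
unset NOSE_WITH_COVERAGE

# "test && command": a false test makes the whole list exit non-zero,
# silently, and mv never runs.
status_and=0
sh -c '[ -n "$NOSE_WITH_COVERAGE" ] && mv .coverage .coverage-nose-py2' \
    || status_and=$?

# "if" form: a false condition with no else branch exits 0.
status_if=0
sh -c 'if [ -n "$NOSE_WITH_COVERAGE" ]; then mv .coverage .coverage-nose-py2; fi' \
    || status_if=$?

echo "and-list: $status_and, if-form: $status_if"
```

Under that assumption the first form is what make would treat as a failed recipe line, while the second would let the target continue. The same applies to a [ -f .coverage ] test whenever the file does not exist.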

Nir Soffer

5 Jul 2018

2:52 a.m.

On Wed, Jul 4, 2018 at 1:00 PM Dan Kenigsberg <danken@redhat.com> wrote:


> No idea, nothing pops to mind. We can revert to the sillier [ -f .coverage ] condition instead of understanding (yeah, this feels dirty)

Thanks, your patch (https://gerrit.ovirt.org/#/c/92813/) fixed this failure.

Now we have failures in pywatch_test and in some network tests. Can someone from the network team look at this?

https://travis-ci.org/nirs/vdsm/builds/400204807

Nir

Dan Kenigsberg

5:43 p.m.

On Thu, Jul 5, 2018 at 2:52 AM, Nir Soffer <nsoffer@redhat.com> wrote:


> Thanks, your patch (https://gerrit.ovirt.org/#/c/92813/) fixed this failure.
>
> Now we have failures in pywatch_test and in some network tests. Can someone from the network team look at this?
> https://travis-ci.org/nirs/vdsm/builds/400204807

https://travis-ci.org/nirs/vdsm/jobs/400204808 shows

ConfigNetworkError: (21, 'Executing commands failed: ovs-vsctl: cannot create a bridge named vdsmbr_test because a bridge named vdsmbr_test already exists')

which I thought was limited to dirty ovirt-ci jenkins slaves. Any idea why it shows up here?

py-watch seems to be failing due to missing gdb on the travis image:

cmdutils.py 151 DEBUG ./py-watch 0.1 sleep 10 (cwd None)
cmdutils.py 159 DEBUG FAILED: <err> = 'Traceback (most recent call last):\n File "./py-watch", line 60, in <module>\n dump_trace(watched_proc)\n File "./py-watch", line 32, in dump_trace\n \'thread apply all py-bt\'])\n File "/usr/lib64/python2.7/site-packages/subprocess32.py", line 575, in call\n p = Popen(*popenargs, **kwargs)\n File "/usr/lib64/python2.7/site-packages/subprocess32.py", line 822, in __init__\n restore_signals, start_new_session)\n File "/usr/lib64/python2.7/site-packages/subprocess32.py", line 1567, in _execute_child\n raise child_exception_type(errno_num, err_msg)\nOSError: [Errno 2] No such file or directory: \'gdb\'\n'; <rc> = 1

Nir, could you remind me what "ERROR: InterpreterNotFound: python3.6" is and how we can avoid it? It keeps distracting me while debugging test failures.
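The OSError in the traceback above is what subprocess raises when the executable is not on PATH. A minimal sketch of a pre-flight check that would surface the missing dependency before the test runs follows; require_tool is a hypothetical helper name, not something defined in vdsm, and the thread does not say how the actual fix was made.

```shell
#!/bin/sh
# Hypothetical pre-flight check; "require_tool" is an illustrative name.
require_tool() {
    # command -v succeeds only if the name resolves to something runnable.
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "missing required tool: $1" >&2
    return 1
}

if require_tool gdb; then
    echo "gdb found, py-watch can dump a trace"
else
    echo "gdb not found, py-watch would fail with ENOENT"
fi
```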

Nir Soffer

5:53 p.m.

On Thu, Jul 5, 2018 at 5:43 PM Dan Kenigsberg <danken@redhat.com> wrote:


> https://travis-ci.org/nirs/vdsm/jobs/400204808 shows
>
> ConfigNetworkError: (21, 'Executing commands failed: ovs-vsctl: cannot create a bridge named vdsmbr_test because a bridge named vdsmbr_test already exists')
>
> which I thought was limited to dirty ovirt-ci jenkins slaves. Any idea why it shows up here?

Maybe one failed test leaves a dirty host for the next test?

> py-watch seems to be failing due to missing gdb on the travis image

Cool, easy fix.

> Nir, could you remind me what "ERROR: InterpreterNotFound: python3.6" is and how we can avoid it? It keeps distracting me while debugging test failures.

We can avoid it in Travis using an env matrix.

Currently we run "make check", which runs all the tox envs (e.g. storage-py27, storage-py36) regardless of the build type. This is good for manual usage, when you don't know which Python version is available on a developer machine. For example, if I have Python 3.7 installed, maybe I would like to test with it.

We can change this so that we test only the *-py27 envs on CentOS, and both *-py27 and *-py36 on Fedora.

We can do the same in oVirt CI, but it will be harder, since we don't have a declarative way to configure this.

Nir
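To illustrate the idea of running only the envs that match the available interpreters, here is a rough sketch. It is not the change that was eventually made; select_envs is a hypothetical helper, the env names are taken from the tox summaries in this thread, and the sketch always includes the py27 envs without checking for python2.7.

```shell
#!/bin/sh
# Hypothetical helper: build a comma-separated tox env list.
select_envs() {
    envs=""
    for suite in storage lib network virt; do
        # Always include the py27 env (assumed to be available).
        envs="${envs:+$envs,}${suite}-py27"
        # Add the py36 env only when python3.6 is on PATH.
        if command -v python3.6 >/dev/null 2>&1; then
            envs="$envs,${suite}-py36"
        fi
    done
    echo "$envs"
}

# Print the command instead of running it, so the sketch has no side effects.
echo "tox -e $(select_envs)"
```

tox accepts a comma-separated list after -e, so the printed command could be run as is. tox also offers a skip_missing_interpreters setting, which may be a simpler route when a skipped env is acceptable.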

Nir Soffer

10:55 p.m.

On Thu, Jul 5, 2018 at 5:53 PM Nir Soffer <nsoffer@redhat.com> wrote:

> > py-watch seems to be failing due to missing gdb on the travis image
>
> Cool, easy fix.

Fixed by https://gerrit.ovirt.org/#/c/92846/

> > Nir, could you remind me what "ERROR: InterpreterNotFound: python3.6" is and how we can avoid it? It keeps distracting me while debugging test failures.
>
> We can avoid it in Travis using an env matrix.

Fixed all builds using --enable-python3: https://gerrit.ovirt.org/#/c/92847/

Nir

Nir Soffer

6 Jul 2018

12:51 a.m.

On Thu, Jul 5, 2018 at 10:55 PM Nir Soffer <nsoffer@redhat.com> wrote:


> > > https://travis-ci.org/nirs/vdsm/jobs/400204808 shows
> > >
> > > ConfigNetworkError: (21, 'Executing commands failed: ovs-vsctl: cannot create a bridge named vdsmbr_test because a bridge named vdsmbr_test already exists')
> > >
> > > which I thought was limited to dirty ovirt-ci jenkins slaves. Any idea why it shows up here?
> >
> > Maybe one failed test leaves a dirty host for the next test?

Network tests now fail only on CentOS.

> > > py-watch seems to be failing due to missing gdb on the travis image
> >
> > Cool, easy fix.
>
> Fixed by https://gerrit.ovirt.org/#/c/92846/

The Fedora 28 build is green with this change:
https://travis-ci.org/nirs/vdsm/jobs/400549561

___________________________________ summary ____________________________________
tests: commands succeeded
storage-py27: commands succeeded
storage-py36: commands succeeded
lib-py27: commands succeeded
lib-py36: commands succeeded
network-py27: commands succeeded
network-py36: commands succeeded
virt-py27: commands succeeded
virt-py36: commands succeeded
congratulations :)

> Fixed all builds using --enable-python3: https://gerrit.ovirt.org/#/c/92847/

Here is an example from the CentOS build, with no false errors:

___________________________________ summary ____________________________________
tests: commands succeeded
storage-py27: commands succeeded
lib-py27: commands succeeded
ERROR: network-py27: commands failed
virt-py27: commands succeeded
make: *** [tests] Error 1
make: *** Waiting for unfinished jobs....
___________________________________ summary ____________________________________
pylint: commands succeeded
congratulations :)

Nir

Edward Haas

1:12 p.m.

I do not know if it is relevant or not, but the tests that travis runs for master are taken from the 4.2 branch. OVS tests are now running using pytest.

Nir Soffer

2:35 p.m.

On Fri, Jul 6, 2018 at 1:12 PM Edward Haas <ehaas@redhat.com> wrote:

> I do not know if it is relevant or not, but the tests that travis runs for master are taken from the 4.2 branch. OVS tests are now running using pytest.

What do you mean by "taken from 4.2 branch"? We run "make check" both in Travis (.travis.yml) and in oVirt CI (automation/check-patch.sh).


Edward Haas

6:25 p.m.


On 6 Jul 2018, at 14:35, Nir Soffer <nsoffer@redhat.com> wrote:

> On Fri, Jul 6, 2018 at 1:12 PM Edward Haas <ehaas@redhat.com> wrote:
> > I do not know if it is relevant or not, but the tests that travis runs for master are taken from the 4.2 branch. OVS tests are now running using pytest.
>
> What do you mean by "taken from 4.2 branch"?

I mean that the branch checked out is 4.2 and not master. It even says so on the console output.

> We run "make check" both in travis (.travis.yml) and ovirt ci (automation/check-patch.sh)

...
...
On Fri, Jul 6, 2018 at 12:51 AM, Nir Soffer <nsoffer@redhat.com> wrote:

...
On Thu, Jul 5, 2018 at 10:55 PM Nir Soffer <nsoffer@redhat.com> wrote:

...
On Thu, Jul 5, 2018 at 5:53 PM Nir Soffer <nsoffer@redhat.com> wrote:

...
On Thu, Jul 5, 2018 at 5:43 PM Dan Kenigsberg <danken@redhat.com> wrote:

On Thu, Jul 5, 2018 at 2:52 AM, Nir Soffer <nsoffer@redhat.com> wrote:
> On Wed, Jul 4, 2018 at 1:00 PM Dan Kenigsberg <danken@redhat.com> wrote:
>>
>> On Wed, Jul 4, 2018 at 12:48 PM, Nir Soffer <nsoffer@redhat.com> wrote:
>> > Dan, travis build still fail when renaming coverage file even after
>> > your last patch.
>> >
>> > ...........................SS.SS.................................................................................................................................................................SS..................................................S.S................................S................................SS.....SS............................................S...............SSS...S.....S.............................................S................................................................SSS............SSSS..SSSSSSSSS.SS..................................................................................................................................................................
>> > ----------------------------------------------------------------------
>> > Ran 1267 tests in 99.239s
>> > OK (SKIP=63)
>> > [ -n "$NOSE_WITH_COVERAGE" ] && mv .coverage .coverage-nose-py2
>> > make[1]: *** [check] Error 1
>> > make[1]: Leaving directory `/vdsm/tests'
>> > ERROR: InvocationError: '/usr/bin/make -C tests check'
>> >
>> > https://travis-ci.org/oVirt/vdsm/jobs/399932012
>> >
>> > Do you have any idea what is wrong there?
>> >
>> > Why we don't have any error message from the failed command?
>>
>> No idea, nothing pops to mind.
>> We can revert to the sillier [ -f .coverage ] condition instead of
>> understanding (yeah, this feels dirty)
>
> Thanks, your patch (https://gerrit.ovirt.org/#/c/92813/) fixed this
> failure.
>
> Now we have failures for the pywatch_test, and some network
> tests. Can someone from network look at this?
> https://travis-ci.org/nirs/vdsm/builds/400204807
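
On the question quoted above about the missing error message: one possible reading, which the thread does not confirm, is that the `[ -n "$NOSE_WITH_COVERAGE" ]` guard itself was false. A false `[ ... ] && cmd` list exits with status 1 and prints nothing, and make treats any non-zero recipe status as an error. A small sketch of that shell behavior, with `true` standing in for the `mv` (the helper name is made up for illustration):

```python
import subprocess


def run_guarded_rename(env_value):
    """Run the guard from the recipe, with 'true' in place of 'mv'."""
    return subprocess.run(
        ["sh", "-c", '[ -n "$NOSE_WITH_COVERAGE" ] && true'],
        env={"NOSE_WITH_COVERAGE": env_value, "PATH": "/usr/bin:/bin"},
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )


if __name__ == "__main__":
    unset = run_guarded_rename("")
    # The guard is false: non-zero status and nothing on stderr.
    print(unset.returncode, repr(unset.stderr))
    enabled = run_guarded_rename("1")
    print(enabled.returncode)
```

If this is what happened, a recipe line like `[ -n "$VAR" ] && cmd || true`, or an explicit `if`, would avoid failing the target when the variable is empty.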

https://travis-ci.org/nirs/vdsm/jobs/400204808 shows

ConfigNetworkError: (21, 'Executing commands failed: ovs-vsctl: cannot create a bridge named vdsmbr_test because a bridge named vdsmbr_test already exists')

which I thought was limited to dirty ovirt-ci jenkins slaves. Any idea why it shows here?

Maybe one failed test leaves a dirty host for the next test?

Network tests now fail only on CentOS.

...
...
...
py-watch seems to be failing due to missing gdb on the travis image

cmdutils.py 151 DEBUG ./py-watch 0.1 sleep 10 (cwd None)
cmdutils.py 159 DEBUG FAILED: <err> = 'Traceback (most recent call last):\n File "./py-watch", line 60, in <module>\n dump_trace(watched_proc)\n File "./py-watch", line 32, in dump_trace\n \'thread apply all py-bt\'])\n File "/usr/lib64/python2.7/site-packages/subprocess32.py", line 575, in call\n p = Popen(*popenargs, **kwargs)\n File "/usr/lib64/python2.7/site-packages/subprocess32.py", line 822, in __init__\n restore_signals, start_new_session)\n File "/usr/lib64/python2.7/site-packages/subprocess32.py", line 1567, in _execute_child\n raise child_exception_type(errno_num, err_msg)\nOSError: [Errno 2] No such file or directory: \'gdb\'\n'; <rc> = 1

Cool, easy fix.

Fixed by https://gerrit.ovirt.org/#/c/92846/
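
The patch above is the actual fix. As a complementary sketch only, a test could also check for gdb up front and skip with a clear reason instead of failing with OSError. The `gdb_available` helper and the test class are hypothetical and are not taken from the vdsm test suite.

```python
import shutil
import unittest


def gdb_available(which=shutil.which):
    """Return True if a gdb executable is found on PATH."""
    return which("gdb") is not None


class PyWatchGuardExample(unittest.TestCase):
    # Hypothetical test: skipped with a readable reason when gdb is missing.
    @unittest.skipUnless(gdb_available(), "gdb is not installed")
    def test_needs_gdb(self):
        self.assertTrue(gdb_available())
```

With this guard the summary would show a skip mentioning gdb rather than a traceback from subprocess32.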


Nir Soffer

6:41 p.m.

On Fri, 6 Jul 2018, 18:25 Edward Haas, <ehaas@redhat.com> wrote:

...

On 6 Jul 2018, at 14:35, Nir Soffer <nsoffer@redhat.com> wrote:

On Fri, Jul 6, 2018 at 1:12 PM Edward Haas <ehaas@redhat.com> wrote:

...
I do not know if it is relevant or not, but the tests that travis runs for master are taken from the 4.2 branch. OVS tests are now running using pytest.

What do you mean by "taken from 4.2 branch"?

I mean that the branch checked out is 4.2 and not master. It even says so on the console output.

Can you share the url of that build?


Edward Haas

7:05 p.m.

...

On 6 Jul 2018, at 18:41, Nir Soffer <nsoffer@redhat.com> wrote:

...
On Fri, 6 Jul 2018, 18:25 Edward Haas, <ehaas@redhat.com> wrote:

...
On 6 Jul 2018, at 14:35, Nir Soffer <nsoffer@redhat.com> wrote:

...
On Fri, Jul 6, 2018 at 1:12 PM Edward Haas <ehaas@redhat.com> wrote:

I do not know if it is relevant or not, but the tests that travis runs for master are taken from the 4.2 branch. OVS tests are now running using pytest.

What do you mean by "taken from 4.2 branch"?

I mean that the branch checked out is 4.2 and not master. It even says so on the console output.

Can you share the url of that build?

I just clicked the icon on the vdsm repo: https://travis-ci.org/oVirt/vdsm


Nir Soffer

9:16 p.m.

On Fri, Jul 6, 2018 at 7:05 PM Edward Haas <ehaas@redhat.com> wrote:

...

On 6 Jul 2018, at 18:41, Nir Soffer <nsoffer@redhat.com> wrote:

On Fri, 6 Jul 2018, 18:25 Edward Haas, <ehaas@redhat.com> wrote:

...
On 6 Jul 2018, at 14:35, Nir Soffer <nsoffer@redhat.com> wrote:

On Fri, Jul 6, 2018 at 1:12 PM Edward Haas <ehaas@redhat.com> wrote:

...
I do not know if it is relevant or not, but the tests that travis runs for master are taken from the 4.2 branch. OVS tests are now running using pytest.

What do you mean by "taken from 4.2 branch"?

I mean that the branch checked out is 4.2 and not master. It even says so on the console output.

Can you share the url of that build?

I just clicked the icon on the vdsm repo: https://travis-ci.org/oVirt/vdsm

This is indeed a 4.2 build. Any commit in github is tested in travis. We would also like to fix the 4.2 builds, but first we need to fix the master builds.

You can see here that the master build fails:
https://travis-ci.org/oVirt/vdsm/builds

Since we added gdb and python-debuginfo:
https://travis-ci.org/oVirt/vdsm/builds/400644077

- the centos build fails (network-py27)
  https://travis-ci.org/oVirt/vdsm/jobs/400644079

- the fedora 28 build passes
  https://travis-ci.org/oVirt/vdsm/jobs/400644081

- the fedora rawhide build fails because we cannot rebuild the image; python-libblockdev is missing in rawhide.
  https://travis-ci.org/oVirt/vdsm/jobs/400644083
  See https://lists.ovirt.org/archives/list/devel@ovirt.org/thread/CDNETITY5RYOCQB...

This is the first failing test:

__________________ TestOvsApiBase.test_execute_a_transaction ___________________

self = <network.integration.ovs.ovs_driver_test.TestOvsApiBase testMethod=test_execute_a_transaction>

    def test_execute_a_transaction(self):
        ovsdb = create()
        cmd_add_br = ovsdb.add_br(TEST_BRIDGE)
        cmd_list_bridge_info = ovsdb.list_bridge_info()
        t = ovsdb.transaction()
        t.add(cmd_add_br)
        t.add(cmd_list_bridge_info)
>       t.commit()

network/integration/ovs/ovs_driver_test.py:51:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <vdsm.network.ovs.driver.vsctl.Transaction object at 0x7f72a7989b50>

    def commit(self):
        if not self.commands:
            return
        timeout_option = []
        if self.timeout:
            timeout_option = ['--timeout=' + str(self.timeout)]
        args = []
        for command in self.commands:
            args += ['--'] + command.cmd
        exec_line = [_ovs_vsctl_cmd()] + timeout_option + OUTPUT_FORMAT + args
        logging.debug('Executing commands: %s' % ' '.join(exec_line))
        rc, out, err = netcmd.exec_sync(exec_line)
        if rc != 0:
            err = err.splitlines()
            if OvsDBConnectionError.is_ovs_db_conn_error(err):
                raise OvsDBConnectionError('\n'.join(err))
            else:
                raise ConfigNetworkError(
                    ne.ERR_BAD_PARAMS,
>                   'Executing commands failed: %s' % '\n'.join(err))
E       ConfigNetworkError: (21, 'Executing commands failed: 2018-07-05T22:24:25Z|00002|fatal_signal|WARN|terminating with signal 14 (Alarm clock)')

../lib/vdsm/network/ovs/driver/vsctl.py:78: ConfigNetworkError
------------------------------ Captured log call -------------------------------
vsctl.py 68 DEBUG Executing commands: /usr/bin/ovs-vsctl --timeout=5 --oneline --format=json -- add-br vdsmbr_test -- list Bridge
cmdutils.py 151 DEBUG /usr/bin/ovs-vsctl --timeout=5 --oneline --format=json -- add-br vdsmbr_test -- list Bridge (cwd None)
cmdutils.py 159 DEBUG FAILED: <err> = '2018-07-05T22:24:25Z|00002|fatal_signal|WARN|terminating with signal 14 (Alarm clock)\n'; <rc> = -14
---------------------------- Captured log teardown -----------------------------
vsctl.py 68 DEBUG Executing commands: /usr/bin/ovs-vsctl --timeout=5 --oneline --format=json -- list Bridge
cmdutils.py 151 DEBUG /usr/bin/ovs-vsctl --timeout=5 --oneline --format=json -- list Bridge (cwd None)
cmdutils.py 159 DEBUG SUCCESS: <err> = ''; <rc> = 0

It failed with a timeout.

The second test fails because the bridge already exists:

____________ TestOvsApiWithSingleRealBridge.test_create_remove_bond ____________

self = <network.integration.ovs.ovs_driver_test.TestOvsApiWithSingleRealBridge testMethod=test_create_remove_bond>

    def setUp(self):
        self.ovsdb = create()
>       self.ovsdb.add_br(TEST_BRIDGE).execute()

network/integration/ovs/ovs_driver_test.py:69:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../lib/vdsm/network/ovs/driver/vsctl.py:99: in execute
    t.add(self)
../lib/vdsm/network/ovs/driver/__init__.py:46: in __exit__
    self.result = self.commit()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <vdsm.network.ovs.driver.vsctl.Transaction object at 0x7f72a7989210>

    def commit(self):
        if not self.commands:
            return
        timeout_option = []
        if self.timeout:
            timeout_option = ['--timeout=' + str(self.timeout)]
        args = []
        for command in self.commands:
            args += ['--'] + command.cmd
        exec_line = [_ovs_vsctl_cmd()] + timeout_option + OUTPUT_FORMAT + args
        logging.debug('Executing commands: %s' % ' '.join(exec_line))
        rc, out, err = netcmd.exec_sync(exec_line)
        if rc != 0:
            err = err.splitlines()
            if OvsDBConnectionError.is_ovs_db_conn_error(err):
                raise OvsDBConnectionError('\n'.join(err))
            else:
                raise ConfigNetworkError(
                    ne.ERR_BAD_PARAMS,
>                   'Executing commands failed: %s' % '\n'.join(err))
E       ConfigNetworkError: (21, 'Executing commands failed: ovs-vsctl: cannot create a bridge named vdsmbr_test because a bridge named vdsmbr_test already exists')

../lib/vdsm/network/ovs/driver/vsctl.py:78: ConfigNetworkError
------------------------------ Captured log call -------------------------------
vsctl.py 68 DEBUG Executing commands: /usr/bin/ovs-vsctl --timeout=5 --oneline --format=json -- add-br vdsmbr_test
cmdutils.py 151 DEBUG /usr/bin/ovs-vsctl --timeout=5 --oneline --format=json -- add-br vdsmbr_test (cwd None)
cmdutils.py 159 DEBUG FAILED: <err> = 'ovs-vsctl: cannot create a bridge named vdsmbr_test because a bridge named vdsmbr_test already exists\n'; <rc> = 1
---------------------------- Captured log teardown -----------------------------
vsctl.py 68 DEBUG Executing commands: /usr/bin/ovs-vsctl --timeout=5 --oneline --format=json -- list Bridge
cmdutils.py 151 DEBUG /usr/bin/ovs-vsctl --timeout=5 --oneline --format=json -- list Bridge (cwd None)
cmdutils.py 159 DEBUG SUCCESS: <err> = ''; <rc> = 0

I don't know anything about these tests, but this failure looks like:

1. the first test has a timeout
2. the first test's cleanup did not run because the cleanup code is not correct
3. the second test fails because the first test did not clean up

This looks like a real issue in the code.
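
To illustrate the three-step failure described above, here is a rough sketch of one way to make cleanup robust: register it before the risky call, and make the removal idempotent. `FakeOvs` is an in-memory stand-in invented for this example, not the vdsm OVS driver, and whether the timed-out command really left a bridge behind is an assumption.

```python
import unittest

TEST_BRIDGE = "vdsmbr_test"


class FakeOvs(object):
    """In-memory stand-in for the switch; not the vdsm driver."""

    def __init__(self):
        self.bridges = set()

    def add_br(self, name, fail_after_create=False):
        if name in self.bridges:
            raise RuntimeError("bridge %s already exists" % name)
        self.bridges.add(name)
        if fail_after_create:
            # Models a command that times out after the bridge was created.
            raise RuntimeError("timeout")

    def del_br_if_exists(self, name):
        # Idempotent: safe to call whether or not the bridge exists.
        self.bridges.discard(name)


OVS = FakeOvs()


class FirstTest(unittest.TestCase):
    def test_add_bridge_times_out(self):
        # Register cleanup before the risky call so it runs even on failure.
        self.addCleanup(OVS.del_br_if_exists, TEST_BRIDGE)
        with self.assertRaises(RuntimeError):
            OVS.add_br(TEST_BRIDGE, fail_after_create=True)


class SecondTest(unittest.TestCase):
    def test_add_bridge(self):
        self.addCleanup(OVS.del_br_if_exists, TEST_BRIDGE)
        OVS.add_br(TEST_BRIDGE)
        self.assertIn(TEST_BRIDGE, OVS.bridges)
```

Because the cleanup is registered first, the second test starts from a clean state even though the first one hit the simulated timeout.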


Edward Haas

7 Jul 7 Jul

9:02 a.m.

On Fri, Jul 6, 2018 at 9:16 PM, Nir Soffer <nsoffer@redhat.com> wrote:

...

On Fri, Jul 6, 2018 at 7:05 PM Edward Haas <ehaas@redhat.com> wrote:

...
On 6 Jul 2018, at 18:41, Nir Soffer <nsoffer@redhat.com> wrote:

On Fri, 6 Jul 2018, 18:25 Edward Haas, <ehaas@redhat.com> wrote:

...
On 6 Jul 2018, at 14:35, Nir Soffer <nsoffer@redhat.com> wrote:

On Fri, Jul 6, 2018 at 1:12 PM Edward Haas <ehaas@redhat.com> wrote:

...
I do not know if it is relevant or not, but the tests that travis runs for master are taken from the 4.2 branch. OVS tests are now running using pytest.

What do you mean by "taken from 4.2 branch"?

I mean that the branch checked out is 4.2 and not master. It even says so on the console output.

Can you share the url of that build?

I just clicked the icon on the vdsm repo: https://travis-ci.org/oVirt/vdsm

This is indeed a 4.2 build. Any commit in github is tested in travis. We would also like to fix the 4.2 builds, but first we need to fix the master builds.

You can see here that the master build fails: https://travis-ci.org/oVirt/vdsm/builds

Since we added gdb and python-debuginfo: https://travis-ci.org/oVirt/vdsm/builds/400644077

- the centos build fails (network-py27) https://travis-ci.org/oVirt/vdsm/jobs/400644079

- the fedora 28 build passes https://travis-ci.org/oVirt/vdsm/jobs/400644081

- the fedora rawhide build fails because we cannot rebuild the image; python-libblockdev is missing in rawhide. https://travis-ci.org/oVirt/vdsm/jobs/400644083 See https://lists.ovirt.org/archives/list/devel@ovirt.org/thread/CDNETITY5RYOCQBIQQF2NUF6RAHGJRPW/

I don't know anything about these tests, but this failure looks like:

1. the first test has a timeout
2. the first test's cleanup did not run because the cleanup code is not correct
3. the second test fails because the first test did not clean up

This looks like a real issue in the code.

This is the same problem we had on oVirt CI: there are linux bridges on the node.

I have posted a patch to fail earlier and show the real problem:
https://gerrit.ovirt.org/#/c/92867/

The travis-ci run for it is here:
https://travis-ci.org/EdDev/vdsm/jobs/401143906

This is the problem:

cmdutils.py 151 DEBUG /usr/share/openvswitch/scripts/ovs-ctl --system-id=random start (cwd None)
cmdutils.py 159 DEBUG FAILED: <err> = 'rmmod: ERROR: Module bridge is in use by: br_netfilter\n'; <rc> = 1

Any idea who is creating the "br_netfilter" bridge?
I guess this is travis-ci related.

Thanks,
Edy.
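
As a debugging aid for the rmmod error above, here is a small sketch that lists which kernel modules hold a reference to a given module, based on the /proc/modules text format (name, size, reference count, comma-separated users, state, address). The rmmod message names the modules using `bridge`, so `br_netfilter` would show up here as a module rather than as a bridge device. The `module_users` helper and the sample text are made up for illustration.

```python
def module_users(proc_modules_text, module):
    """Return names of modules that hold a reference to `module`."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0] == module:
            used_by = fields[3]
            if used_by == "-":
                return []
            # The list ends with a trailing comma and may contain markers
            # such as "[permanent]"; keep only plain module names.
            return [name for name in used_by.split(",")
                    if name and not name.startswith("[")]
    return []


# Made-up sample in the /proc/modules format, not taken from the failing host.
SAMPLE = (
    "br_netfilter 24576 0 - Live 0x0000000000000000\n"
    "bridge 151336 1 br_netfilter, Live 0x0000000000000000\n"
)

if __name__ == "__main__":
    print(module_users(SAMPLE, "bridge"))
```

On a real host the text would come from reading /proc/modules, and the output could tell us which module to unload before `bridge`.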

...

...
...
We run "make check" both in travis (.travis.yml) and ovirt ci (automation/check-patch.sh)

...
On Fri, Jul 6, 2018 at 12:51 AM, Nir Soffer <nsoffer@redhat.com> wrote:

...
On Thu, Jul 5, 2018 at 10:55 PM Nir Soffer <nsoffer@redhat.com> wrote:

...
On Thu, Jul 5, 2018 at 5:53 PM Nir Soffer <nsoffer@redhat.com> wrote:

> On Thu, Jul 5, 2018 at 5:43 PM Dan Kenigsberg <danken@redhat.com> wrote:
>
>> On Thu, Jul 5, 2018 at 2:52 AM, Nir Soffer <nsoffer@redhat.com> wrote:
>> > On Wed, Jul 4, 2018 at 1:00 PM Dan Kenigsberg <danken@redhat.com> wrote:
>> >>
>> >> On Wed, Jul 4, 2018 at 12:48 PM, Nir Soffer <nsoffer@redhat.com> wrote:
>> >> > Dan, travis build still fail when renaming coverage file even after
>> >> > your last patch.
>> >> >
>> >> > ...........................SS.SS............................
>> >> > ............................................................
>> >> > ............................................................
>> >> > .............SS.............................................
>> >> > .....S.S................................S...................
>> >> > .............SS.....SS......................................
>> >> > ......S...............SSS...S.....S.........................
>> >> > ....................S.......................................
>> >> > .........................SSS............SSSS..SSSSSSSSS.SS..
>> >> > ............................................................
>> >> > ............................................................
>> >> > ........................................
>> >> > ----------------------------------------------------------------------
>> >> > Ran 1267 tests in 99.239s
>> >> > OK (SKIP=63)
>> >> > [ -n "$NOSE_WITH_COVERAGE" ] && mv .coverage .coverage-nose-py2
>> >> > make[1]: *** [check] Error 1
>> >> > make[1]: Leaving directory `/vdsm/tests'
>> >> > ERROR: InvocationError: '/usr/bin/make -C tests check'
>> >> >
>> >> > https://travis-ci.org/oVirt/vdsm/jobs/399932012
>> >> >
>> >> > Do you have any idea what is wrong there?
>> >> >
>> >> > Why we don't have any error message from the failed command?
>> >>
>> >> No idea, nothing pops to mind.
>> >> We can revert to the sillier [ -f .coverage ] condition instead of
>> >> understanding (yeah, this feels dirty)
>> >
>> > Thanks, your patch (https://gerrit.ovirt.org/#/c/92813/) fixed this
>> > failure.
>> >
>> > Now we have failures for the pywatch_test, and some network
>> > tests. Can someone from network look at this?
>> > https://travis-ci.org/nirs/vdsm/builds/400204807
>>
>> https://travis-ci.org/nirs/vdsm/jobs/400204808 shows
>>
>> ConfigNetworkError: (21, 'Executing commands failed:
>> ovs-vsctl: cannot create a bridge named vdsmbr_test because a bridge
>> named vdsmbr_test already exists')
>>
>> which I thought was limited to dirty ovirt-ci jenkins slaves. Any idea
>> why it shows here?
>
> Maybe one failed test leave dirty host to the next test?

Network tests now fail only on CentOS.
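On the silent "Error 1" quoted above: one possible explanation, not confirmed anywhere in this thread, is plain shell semantics. When the variable is empty, `[ -n "$VAR" ]` exits with status 1, the `&&` list returns that status without printing anything, and make treats a non-zero status from a recipe line as a failure. A minimal sketch of the difference between an AND-list and an `if`; the function names are invented for the illustration and `true` stands in for the `mv`.

```shell
#!/bin/sh
# Guard written as an AND-list: when the value is empty, the exit status
# of the whole list is the status of the failed test (1), and no message
# is printed.
guard_with_and() {
    [ -n "$1" ] && true   # `true` stands in for: mv .coverage .coverage-nose-py2
}

# Same guard written with `if`: when the condition is false and there is
# no else branch, the exit status is 0.
guard_with_if() {
    if [ -n "$1" ]; then
        true
    fi
}

guard_with_and "" && rc=0 || rc=$?
echo "and-list, empty value: exit status $rc"

guard_with_if "" && rc=0 || rc=$?
echo "if, empty value: exit status $rc"
```

The first line reports status 1 and the second status 0, with nothing written to stderr in either case, which would match a make failure that carries no error message.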

...
>> py-watch seems to be failing due to missing gdb on the travis image
>>
>> cmdutils.py 151 DEBUG ./py-watch 0.1 sleep 10
>> (cwd None)
>> cmdutils.py 159 DEBUG FAILED: <err> = 'Traceback
>> (most recent call last):\n File "./py-watch", line 60, in
>> <module>\n
>> dump_trace(watched_proc)\n File "./py-watch", line 32, in
>> dump_trace\n \'thread apply all py-bt\'])\n File
>> "/usr/lib64/python2.7/site-packages/subprocess32.py", line 575, in
>> call\n p = Popen(*popenargs, **kwargs)\n File
>> "/usr/lib64/python2.7/site-packages/subprocess32.py", line 822, in
>> __init__\n restore_signals, start_new_session)\n File
>> "/usr/lib64/python2.7/site-packages/subprocess32.py", line 1567, in
>> _execute_child\n raise child_exception_type(errno_num,
>> err_msg)\nOSError: [Errno 2] No such file or directory: \'gdb\'\n';
>> <rc> = 1
>
> Cool, easy fix.

Fixed by https://gerrit.ovirt.org/#/c/92846/

Fedora 28 build is green with this change: https://travis-ci.org/nirs/vdsm/jobs/400549561

___________________________________ summary ____________________________________
  tests: commands succeeded
  storage-py27: commands succeeded
  storage-py36: commands succeeded
  lib-py27: commands succeeded
  lib-py36: commands succeeded
  network-py27: commands succeeded
  network-py36: commands succeeded
  virt-py27: commands succeeded
  virt-py36: commands succeeded
  congratulations :)

...
>> Nir, could you remind me what is "ERROR: InterpreterNotFound:
>> python3.6" and how can we avoid it? it keeps distracting during
>> debugging test failures.
>
> We can avoid it in travis using env matrix.
>
> Currently we run "make check" which run all the tox envs
> (e.g. storage-py27,storage-py36) regardless of the build type. This is good
> for manual usage when you don't know which python version is available
> on a developer machine. For example if I have python 3.7 installed, maybe
> I like to test.
>
> We can change this so we will test only the *-py27 on centos, and both
> *-py27 and *-py36 on Fedora.
>
> We can do the same in ovirt CI but it will be harder, we don't have
> a declarative way to configure this.

Fixed all builds using --enable-python3: https://gerrit.ovirt.org/#/c/92847/

Here is an example from CentOS build - no false errors.

___________________________________ summary ____________________________________
  tests: commands succeeded
  storage-py27: commands succeeded
  lib-py27: commands succeeded
ERROR:   network-py27: commands failed
  virt-py27: commands succeeded
make: *** [tests] Error 1
make: *** Waiting for unfinished jobs....
___________________________________ summary ____________________________________
  pylint: commands succeeded
  congratulations :)
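The idea quoted above, running only the tox envs whose interpreter is actually installed, can be sketched in shell. This is not the linked patch, which according to the message above relies on --enable-python3; the function name `select_envs` and the optional interpreter argument are assumptions made for the example. The env names are the ones shown in the summaries, and `tox -e` accepts a comma-separated env list.

```shell
#!/bin/sh
# select_envs [PY3_INTERPRETER]
# Print a comma-separated tox env list. The *-py27 envs are always
# included; the *-py36 envs are added only when the given interpreter
# (default: python3.6) is found on PATH.
select_envs() {
    py3=${1:-python3.6}
    envs=""
    for name in storage lib network virt; do
        envs="${envs:+$envs,}${name}-py27"
        if command -v "$py3" >/dev/null 2>&1; then
            envs="$envs,${name}-py36"
        fi
    done
    printf '%s\n' "$envs"
}

# Example use (assumes tox is installed):
#   tox -e "$(select_envs)"
```

On a host without python3.6 this yields only the py27 envs, so tox never tries the missing interpreter and no InterpreterNotFound lines appear in the output.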

...
Nir

Edward Haas

9:10 a.m.

On Sat, Jul 7, 2018 at 9:02 AM, Edward Haas <ehaas@redhat.com> wrote:


This is the same problem we had on oVirt CI, there are linux bridges on the node.
I have posted a patch to fail earlier and show the real problem: https://gerrit.ovirt.org/#/c/92867/
The travis-ci run for it is here: https://travis-ci.org/EdDev/vdsm/jobs/401143906

This is the problem:

cmdutils.py 151 DEBUG /usr/share/openvswitch/scripts/ovs-ctl --system-id=random start (cwd None)
cmdutils.py 159 DEBUG FAILED: <err> = 'rmmod: ERROR: Module bridge is in use by: br_netfilter\n'; <rc> = 1

Any idea who is creating the "br_netfilter" bridge? I guess this is travis-ci related.

Actually, this may be Docker or some other package that is installed/setup on it. How can I run the docker with the tests locally to debug this?

...

Thanks,

...

Edy.


Greg Sheremeta

1:36 p.m.

Hey,

I was in a similar place with docker networking and webdriver in Jenkins a few months ago :)

Best way to mimic CI locally is using mock_runner. [I tried on a fresh machine yesterday and got stuck -- will try again soon, it runs on my old machine great]

https://ovirt-jira.atlassian.net/browse/OVIRT-2275
https://www.ovirt.org/blog/2017/01/ovirt-system-tests-to-the-rescue/

The magic incantation on my older machine (for build-artifacts, but the last flag switches the stage):

cd /my/project
../jenkins/mock_configs/mock_runner.sh --mock-confs-dir ../jenkins/mock_configs fc28:fedora-28-x86_64 --build-only

Best wishes,
Greg

On Sat, Jul 7, 2018 at 2:45 AM Edward Haas <ehaas@redhat.com> wrote:


Actually, this may be Docker or some other package that is installed/setup on it. How can I run the docker with the tests locally to debug this?


_______________________________________________
Devel mailing list -- devel@ovirt.org
To unsubscribe send an email to devel-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/devel@ovirt.org/message/MBZJ2YZCGALBXD...

-- GREG SHEREMETA SENIOR SOFTWARE ENGINEER - TEAM LEAD - RHV UX Red Hat NA <https://www.redhat.com/> gshereme@redhat.com IRC: gshereme <https://red.ht/sig>

Nir Soffer

8 Jul 8 Jul

1:42 p.m.

On Sat, Jul 7, 2018 at 9:11 AM Edward Haas <ehaas@redhat.com> wrote:


This is the same problem we had on oVirt CI, there are linux bridges on the node.
I have posted a patch to fail earlier and show the real problem: https://gerrit.ovirt.org/#/c/92867/
The travis-ci run for it is here: https://travis-ci.org/EdDev/vdsm/jobs/401143906

This is the problem:

cmdutils.py 151 DEBUG /usr/share/openvswitch/scripts/ovs-ctl --system-id=random start (cwd None)
cmdutils.py 159 DEBUG FAILED: <err> = 'rmmod: ERROR: Module bridge is in use by: br_netfilter\n'; <rc> = 1

Who is using rmmod?

...

Any idea who is creating the "br_netfilter" bridge? I guess this is travis-ci related.

Why do we care about br_netfilter? Do we require a system without any bridge?

...

Actually, this may be Docker or some other package that is installed/setup on it. How can I run the docker with the tests locally to debug this?

Run this in the vdsm root directory (copied from .travis.yml):

    export DOCKER_IMAGE=ovirtorg/vdsm-test-centos
    docker pull $DOCKER_IMAGE
    docker run \
        --env TRAVIS_CI=1 \
        --privileged \
        --rm \
        -it \
        -v `pwd`:/vdsm:Z \
        $DOCKER_IMAGE \
        bash -c "cd /vdsm && ./autogen.sh --system && make && make --jobs=2 check"

Since this is a privileged container, you probably want to run this inside a VM.


Edward Haas

4:40 p.m.

On Sun, Jul 8, 2018 at 1:42 PM, Nir Soffer <nsoffer@redhat.com> wrote:

On Sat, Jul 7, 2018 at 9:11 AM Edward Haas <ehaas@redhat.com> wrote:

On Sat, Jul 7, 2018 at 9:02 AM, Edward Haas <ehaas@redhat.com> wrote:

On Fri, Jul 6, 2018 at 9:16 PM, Nir Soffer <nsoffer@redhat.com> wrote:

On Fri, Jul 6, 2018 at 7:05 PM Edward Haas <ehaas@redhat.com> wrote:

On 6 Jul 2018, at 18:41, Nir Soffer <nsoffer@redhat.com> wrote:

On Fri, 6 Jul 2018, 18:25 Edward Haas, <ehaas@redhat.com> wrote:

On 6 Jul 2018, at 14:35, Nir Soffer <nsoffer@redhat.com> wrote:

On Fri, Jul 6, 2018 at 1:12 PM Edward Haas <ehaas@redhat.com> wrote:

> I do not know if it is relevant or not, but the tests that travis
> runs for master are taken from the 4.2 branch.
> OVS tests are now running using pytest.

What do you mean by "taken from 4.2 branch"?

I mean that the branch checked out is 4.2 and not master. It even says so on the console output.

Can you share the url of that build?

I just clicked the icon on the vdsm repo: https://travis-ci.org/oVirt/vdsm

This is indeed a 4.2 build. Any commit in github is tested in travis. We would also like to fix the 4.2 builds, but first we need to fix the master builds.

You can see here that the master builds fail: https://travis-ci.org/oVirt/vdsm/builds

Since we added gdb and python-debuginfo: https://travis-ci.org/oVirt/vdsm/builds/400644077

- centos build fails (network-py27) https://travis-ci.org/oVirt/vdsm/jobs/400644079

- fedora 28 build passes https://travis-ci.org/oVirt/vdsm/jobs/400644081

- fedora rawhide fails because we cannot rebuild the image, python-libblokdev is missing in rawhide. https://travis-ci.org/oVirt/vdsm/jobs/400644083 See https://lists.ovirt.org/archives/list/devel@ovirt.org/thread/CDNETITY5RYOCQBIQQF2NUF6RAHGJRPW/

I don't know anything about these tests, but this failure looks like:

1. first test has a timeout
2. first test cleanup did not run because the cleanup code is not correct
3. second test fails because the first test did not clean up

This looks like a real issue in the code.

This is the same problem we had on oVirt CI, there are linux bridges on the node. I have posted a patch to fail earlier and show the real problem: https://gerrit.ovirt.org/#/c/92867/

The travis-ci run for it is here: https://travis-ci.org/EdDev/vdsm/jobs/401143906

This is the problem:

cmdutils.py 151 DEBUG /usr/share/openvswitch/scripts/ovs-ctl --system-id=random start (cwd None)
cmdutils.py 159 DEBUG FAILED: <err> = 'rmmod: ERROR: Module bridge is in use by: br_netfilter\n'; <rc> = 1

Who is using rmmod?

The ovs service is trying to load the ovs kmod; to do so it needs to unload the bridge module first and reload it after the ovs one.

Any idea who is creating the "br_netfilter" bridge? I guess this is travis-ci related.

Why do we care about br_netfilter? do we require a system without any bridge?

Yes, in case ovs kmod has not been loaded in advance.
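
To see what is holding the bridge module on a given host before ovs-ctl runs, the "Used by" column of lsmod is enough. A minimal sketch, assuming the usual lsmod column layout; the helper name module_users and the sample listing are made up for illustration and were not captured from the travis slave:

```shell
#!/bin/sh
# Hypothetical helper (not part of vdsm): print the modules that hold a
# reference to the given kernel module, reading lsmod-style output on stdin.
module_users() {
    # lsmod columns: Module, Size, use count, comma-separated users
    awk -v mod="$1" '$1 == mod { print $4 }'
}

# Illustrative listing only.
sample='Module                  Size  Used by
br_netfilter           22256  0
bridge                151336  1 br_netfilter'

printf '%s\n' "$sample" | module_users bridge
```

On a real host this would be run as "lsmod | module_users bridge". An empty result means nothing else is holding the module.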

Actually, this may be Docker or some other package that is installed/setup on it. How can I run the docker with the tests locally to debug this?

Run this in vdsm root directory (copied from .travis.yml):

export DOCKER_IMAGE=ovirtorg/vdsm-test-centos

docker pull $DOCKER_IMAGE

docker run \
    --env TRAVIS_CI=1 \
    --privileged \
    --rm \
    -it \
    -v `pwd`:/vdsm:Z \
    $DOCKER_IMAGE \
    bash -c "cd /vdsm && ./autogen.sh --system && make && make --jobs=2 check"

Since this is a privileged container, you probably want to run this inside a vm.

OK, will try. But I think the kmod is up to the machine the docker runs in, so in this case it is the travis slave.

We run "make check" both in travis (.travis.yml) and ovirt ci (automation/check-patch.sh)


Daniel Belenky

6:59 a.m.

On Thu, Jul 5, 2018 at 17:56 Nir Soffer <nsoffer@redhat.com> wrote:

On Thu, Jul 5, 2018 at 5:43 PM Dan Kenigsberg <danken@redhat.com> wrote:

On Thu, Jul 5, 2018 at 2:52 AM, Nir Soffer <nsoffer@redhat.com> wrote:

On Wed, Jul 4, 2018 at 1:00 PM Dan Kenigsberg <danken@redhat.com> wrote:

On Wed, Jul 4, 2018 at 12:48 PM, Nir Soffer <nsoffer@redhat.com> wrote:

Dan, the travis build still fails when renaming the coverage file, even after your last patch.

...........................SS.SS.................................................................................................................................................................SS..................................................S.S................................S................................SS.....SS............................................S...............SSS...S.....S.............................................S................................................................SSS............SSSS..SSSSSSSSS.SS..................................................................................................................................................................

Ran 1267 tests in 99.239s

OK (SKIP=63)
[ -n "$NOSE_WITH_COVERAGE" ] && mv .coverage .coverage-nose-py2
make[1]: *** [check] Error 1
make[1]: Leaving directory `/vdsm/tests'
ERROR: InvocationError: '/usr/bin/make -C tests check'

https://travis-ci.org/oVirt/vdsm/jobs/399932012

Do you have any idea what is wrong there?

Why don't we have any error message from the failed command?

No idea, nothing pops to mind. We can revert to the sillier [ -f .coverage ] condition instead of understanding (yeah, this feels dirty)

Thanks, your patch (https://gerrit.ovirt.org/#/c/92813/) fixed this failure.
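
A likely reason for the missing error message, offered as an assumption since the thread does not confirm it: when NOSE_WITH_COVERAGE is empty, the test [ -n ... ] itself returns 1, so the whole && list returns 1 and make treats the recipe line as failed even though no command printed anything. The sketch below uses echo in place of mv, and the function names are made up:

```shell
#!/bin/sh
# Same shape as the failing recipe line; echo stands in for mv.
guard_and() {
    [ -n "$NOSE_WITH_COVERAGE" ] && echo "rename .coverage"
}

# An if statement without else returns 0 when its condition is false.
guard_if() {
    if [ -n "$NOSE_WITH_COVERAGE" ]; then
        echo "rename .coverage"
    fi
}

# Simulate a build where coverage is disabled.
NOSE_WITH_COVERAGE=""

status_and=0
guard_and || status_and=$?
status_if=0
guard_if || status_if=$?

echo "and-list status: $status_and"   # non-zero: make would report Error 1
echo "if status: $status_if"          # zero: make would continue
```

Any guard that returns 0 when there is nothing to rename avoids the silent failure; whether the merged patch used an if statement or another form is not stated here.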

Now we have failures for the pywatch_test, and some network tests. Can someone from network look at this? https://travis-ci.org/nirs/vdsm/builds/400204807

https://travis-ci.org/nirs/vdsm/jobs/400204808 shows

ConfigNetworkError: (21, 'Executing commands failed: ovs-vsctl: cannot create a bridge named vdsmbr_test because a bridge named vdsmbr_test already exists')

which I thought was limited to dirty ovirt-ci jenkins slaves. Any idea why it shows here?

Maybe one failed test leaves a dirty host for the next test?

py-watch seems to be failing due to missing gdb on the travis image

cmdutils.py 151 DEBUG ./py-watch 0.1 sleep 10 (cwd None)
cmdutils.py 159 DEBUG FAILED: <err> = 'Traceback (most recent call last):\n File "./py-watch", line 60, in <module>\n dump_trace(watched_proc)\n File "./py-watch", line 32, in dump_trace\n \'thread apply all py-bt\'])\n File "/usr/lib64/python2.7/site-packages/subprocess32.py", line 575, in call\n p = Popen(*popenargs, **kwargs)\n File "/usr/lib64/python2.7/site-packages/subprocess32.py", line 822, in __init__\n restore_signals, start_new_session)\n File "/usr/lib64/python2.7/site-packages/subprocess32.py", line 1567, in _execute_child\n raise child_exception_type(errno_num, err_msg)\nOSError: [Errno 2] No such file or directory: \'gdb\'\n'; <rc> = 1

Cool, easy fix.

Nir, could you remind me what is "ERROR: InterpreterNotFound: python3.6" and how can we avoid it? It keeps distracting during debugging test failures.

We can avoid it in travis using env matrix.

Currently we run "make check" which runs all the tox envs (e.g. storage-py27,storage-py36) regardless of the build type. This is good for manual usage when you don't know which python version is available on a developer machine. For example, if I have python 3.7 installed, maybe I'd like to test with it.

We can change this so we will test only the *-py27 on centos, and both *-py27 and *-py36 on Fedora.
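
The selection described above could look roughly like this. It is only a sketch: the function tox_envs and its distro argument are hypothetical, and the env names are the ones that appear in the tox summaries in this thread.

```shell
#!/bin/sh
# Hypothetical helper: build the comma-separated tox env list for a distro.
tox_envs() {
    case "$1" in
        centos) pythons="py27" ;;        # CentOS image has no python3.6
        *)      pythons="py27 py36" ;;   # Fedora and anything else: both
    esac
    envs=""
    for component in storage lib network virt; do
        for py in $pythons; do
            # Append with a comma separator, except for the first entry.
            envs="${envs:+$envs,}$component-$py"
        done
    done
    echo "$envs"
}

tox_envs centos
tox_envs fedora
```

The result can be passed to tox with its -e option, for example: tox -e "$(tox_envs centos)".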

We can do the same in ovirt CI but it will be harder, we don't have a declarative way to configure this.

This behavior could also be achieved in STDCI V1 relatively easily by having different scripts and configurations for CentOS and Fedora. I'd be glad to help set up such a configuration if needed. BTW, we actually do support a YAML configuration for exactly such cases (STDCI V2). We are currently proactively migrating smaller oVirt projects (leaving VDSM and engine for last).

Nir
_______________________________________________
Devel mailing list -- devel@ovirt.org
To unsubscribe send an email to devel-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/devel@ovirt.org/message/UKVQVCL2M6NE5V...

-- Daniel Belenky RHV DevOps

participants (5)

Dan Kenigsberg
Daniel Belenky
Edward Haas
Greg Sheremeta
Nir Soffer
