On Sun, Jul 8, 2018 at 1:42 PM, Nir Soffer <nsoffer(a)redhat.com> wrote:
On Sat, Jul 7, 2018 at 9:11 AM Edward Haas <ehaas(a)redhat.com> wrote:
> On Sat, Jul 7, 2018 at 9:02 AM, Edward Haas <ehaas(a)redhat.com> wrote:
>
>>
>>
>> On Fri, Jul 6, 2018 at 9:16 PM, Nir Soffer <nsoffer(a)redhat.com> wrote:
>>
>>> On Fri, Jul 6, 2018 at 7:05 PM Edward Haas <ehaas(a)redhat.com> wrote:
>>>
>>>>
>>>>
>>>> On 6 Jul 2018, at 18:41, Nir Soffer <nsoffer(a)redhat.com> wrote:
>>>>
>>>>
>>>>
>>>> On Fri, 6 Jul 2018, 18:25 Edward Haas, <ehaas(a)redhat.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On 6 Jul 2018, at 14:35, Nir Soffer <nsoffer(a)redhat.com> wrote:
>>>>>
>>>>> On Fri, Jul 6, 2018 at 1:12 PM Edward Haas <ehaas(a)redhat.com> wrote:
>>>>>
>>>>>> I do not know if it is relevant or not, but the tests that travis
>>>>>> runs for master are taken from the 4.2 branch.
>>>>>> OVS tests are now running using pytest.
>>>>>>
>>>>>
>>>>> What do you mean by "taken from 4.2 branch"?
>>>>>
>>>>>
>>>>> I mean that the branch checked out is 4.2 and not master. It even
>>>>> says so on the console output.
>>>>>
>>>>
>>>> Can you share the url of that build?
>>>>
>>>>
>>>> I just clicked the icon on the vdsm repo: https://travis-ci.org/oVirt/vdsm
>>>>
>>>
>>> This is indeed a 4.2 build. Any commit in github is tested in travis.
>>> We would also like to fix the 4.2 builds, but first we need to fix
>>> the master builds.
>>>
>>> You can see here that the master builds fail:
>>> https://travis-ci.org/oVirt/vdsm/builds
>>>
>>> Since we added gdb and python-debuginfo:
>>> https://travis-ci.org/oVirt/vdsm/builds/400644077
>>>
>>> - centos build fails (network-py27)
>>>   https://travis-ci.org/oVirt/vdsm/jobs/400644079
>>>
>>> - fedora 28 build passes
>>>   https://travis-ci.org/oVirt/vdsm/jobs/400644081
>>>
>>> - fedora rawhide fails because we cannot rebuild the image,
>>>   python-libblokdev is missing in rawhide.
>>>   https://travis-ci.org/oVirt/vdsm/jobs/400644083
>>>   See https://lists.ovirt.org/archives/list/devel@ovirt.org/thread/CDNETITY5RYOCQBIQQF2NUF6RAHGJRPW/
>>>
>>>
>>> I don't know anything about these tests, but this failure looks like:
>>>
>>> 1. the first test has a timeout
>>> 2. the first test's cleanup did not run because the cleanup code is
>>> not correct
>>> 3. the second test fails because the first test did not clean up
>>>
>>> This looks like a real issue in the code.
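If the hypothesis above is right, the usual remedy is to make the cleanup unconditional. A minimal Python sketch, assuming hypothetical create and remove callables in place of the real bridge setup code (none of these names come from vdsm):

```python
import contextlib


@contextlib.contextmanager
def temporary_bridge(name, create, remove):
    """Create a bridge for a test and always remove it afterwards.

    create and remove are hypothetical callables standing in for the
    real setup and teardown commands.
    """
    create(name)
    try:
        yield name
    finally:
        # Runs even if the test body raises (for example on a timeout),
        # so the next test does not find a leftover bridge.
        remove(name)
```

A test body wrapped in "with temporary_bridge(...)" would then leave no bridge behind, whether it passes, fails, or times out with an exception.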
>>>
>>
>> This is the same problem we had on oVirt CI: there are linux bridges on
>> the node.
>> I have posted a patch to fail earlier and show the real problem:
>> https://gerrit.ovirt.org/#/c/92867/
>> The travis-ci run for it is here:
>> https://travis-ci.org/EdDev/vdsm/jobs/401143906
>> This is the problem:
>> cmdutils.py 151 DEBUG /usr/share/openvswitch/scripts/ovs-ctl
>> --system-id=random start (cwd None)
>> cmdutils.py 159 DEBUG FAILED: <err> = 'rmmod: ERROR: Module bridge is
>> in use by: br_netfilter\n'; <rc> = 1
>>
>
Who is using rmmod?
The ovs service is trying to load the ovs kmod; to do so, it needs to
unload the bridge kmod and reload it after the ovs one.
> Any idea who is creating the "br_netfilter" bridge? I guess this is
>> travis-ci related.
>>
>
Why do we care about br_netfilter? Do we require a system without any
bridge?
Yes, in case the ovs kmod has not been loaded in advance.
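To see which module keeps bridge busy on a host, one option is to read /proc/modules, where the fourth field lists the modules holding a reference (or "-" when there are none). A small sketch; module_users is an illustrative helper, not something that exists in vdsm:

```python
def module_users(proc_modules_text, module):
    """Return the names of modules that hold a reference to `module`.

    Parses text in the /proc/modules format, where the fourth field is
    a comma-terminated list of dependent modules, or "-" when empty.
    Returns None if the module is not loaded at all.
    """
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0] == module:
            if fields[3] == "-":
                return []
            # The list ends with a trailing comma, so drop empty names.
            return [name for name in fields[3].split(",") if name]
    return None
```

On a real host it could be called as module_users(open("/proc/modules").read(), "bridge"). Inside a container this reports the modules of the host kernel, since containers share it.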
> Actually, this may be Docker or some other package that is
> installed/setup on it.
> How can I run the docker with the tests locally to debug this?
>
Run this in vdsm root directory (copied from .travis.yml):
export DOCKER_IMAGE=ovirtorg/vdsm-test-centos
docker pull $DOCKER_IMAGE
docker run \
--env TRAVIS_CI=1 \
--privileged \
--rm \
-it \
-v `pwd`:/vdsm:Z \
$DOCKER_IMAGE \
bash -c "cd /vdsm && ./autogen.sh --system && make && make --jobs=2 check"
Since this is a privileged container, you probably want to run this inside a
VM.
OK, will try. But I think the kmod depends on the machine docker runs on,
which in this case is the travis slave.
> We run "make check" both in travis (.travis.yml) and ovirt ci
>>>>> (automation/check-patch.sh)
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Jul 6, 2018 at 12:51 AM, Nir Soffer <nsoffer(a)redhat.com>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jul 5, 2018 at 10:55 PM Nir Soffer <nsoffer(a)redhat.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> On Thu, Jul 5, 2018 at 5:53 PM Nir Soffer <nsoffer(a)redhat.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> On Thu, Jul 5, 2018 at 5:43 PM Dan Kenigsberg <danken(a)redhat.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> On Thu, Jul 5, 2018 at 2:52 AM, Nir Soffer <nsoffer(a)redhat.com>
>>>>>>>>>> wrote:
>>>>>>>>>> > On Wed, Jul 4, 2018 at 1:00 PM Dan Kenigsberg <danken(a)redhat.com> wrote:
>>>>>>>>>> >>
>>>>>>>>>> >> On Wed, Jul 4, 2018 at 12:48 PM, Nir Soffer <nsoffer(a)redhat.com> wrote:
>>>>>>>>>> >> > Dan, travis build still fails when renaming coverage file
>>>>>>>>>> >> > even after your last patch.
>>>>>>>>>> >> >
>>>>>>>>>> >> >
>>>>>>>>>> >> >
>>>>>>>>>> >> > ...........................SS.
>>>>>>>>>> >> > SS..........................................................
>>>>>>>>>> >> > ............................................................
>>>>>>>>>> >> > ...........................................SS...............
>>>>>>>>>> >> > ...................................S.S......................
>>>>>>>>>> >> > ..........S................................SS.....SS........
>>>>>>>>>> >> > ....................................S...............SSS...S.
>>>>>>>>>> >> > ....S.............................................S.........
>>>>>>>>>> >> > .......................................................SSS..
>>>>>>>>>> >> > ..........SSSS..SSSSSSSSS.SS................................
>>>>>>>>>> >> > ............................................................
>>>>>>>>>> >> > ............................................................
>>>>>>>>>> >> > ..........
>>>>>>>>>> >> > ----------------------------------------------------------------------
>>>>>>>>>> >> > Ran 1267 tests in 99.239s
>>>>>>>>>> >> > OK (SKIP=63)
>>>>>>>>>> >> > [ -n "$NOSE_WITH_COVERAGE" ] && mv .coverage .coverage-nose-py2
>>>>>>>>>> >> > make[1]: *** [check] Error 1
>>>>>>>>>> >> > make[1]: Leaving directory `/vdsm/tests'
>>>>>>>>>> >> > ERROR: InvocationError: '/usr/bin/make -C tests check'
>>>>>>>>>> >> >
>>>>>>>>>> >> > https://travis-ci.org/oVirt/vdsm/jobs/399932012
>>>>>>>>>> >> >
>>>>>>>>>> >> > Do you have any idea what is wrong there?
>>>>>>>>>> >> >
>>>>>>>>>> >> > Why don't we have any error message from the failed command?
>>>>>>>>>> >>
>>>>>>>>>> >> No idea, nothing pops to mind.
>>>>>>>>>> >> We can revert to the sillier [ -f .coverage ] condition instead of
>>>>>>>>>> >> understanding (yeah, this feels dirty)
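One possible explanation for the silent failure, not confirmed anywhere in this thread: in a line of the form [ -n "$VAR" ] && mv ..., an empty variable makes the test return 1, the mv never runs, and the whole line exits with status 1. make then reports "Error 1" without any message from the command itself. The sketch below only reproduces that exit status, using a hypothetical FLAG variable and "true" as a stand-in for mv:

```python
import os
import subprocess


def recipe_status(value):
    """Exit status of `[ -n "$FLAG" ] && true` with FLAG set to value.

    FLAG is a hypothetical variable standing in for NOSE_WITH_COVERAGE,
    and `true` stands in for the mv command.
    """
    env = dict(os.environ, FLAG=value)
    return subprocess.call(["sh", "-c", '[ -n "$FLAG" ] && true'], env=env)
```

With an empty value the status is nonzero although nothing actually went wrong, which is why conditional steps in a make recipe are often written as an if statement or followed by "|| true".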
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> > Thanks, your patch (https://gerrit.ovirt.org/#/c/92813/) fixed this
>>>>>>>>>> > failure.
>>>>>>>>>> >
>>>>>>>>>> > Now we have failures for the pywatch_test, and some network
>>>>>>>>>> > tests. Can someone from network look at this?
>>>>>>>>>> > https://travis-ci.org/nirs/vdsm/builds/400204807
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> https://travis-ci.org/nirs/vdsm/jobs/400204808 shows
>>>>>>>>>>
>>>>>>>>>> ConfigNetworkError: (21, 'Executing commands failed:
>>>>>>>>>> ovs-vsctl: cannot create a bridge named vdsmbr_test because a bridge
>>>>>>>>>> named vdsmbr_test already exists')
>>>>>>>>>>
>>>>>>>>>> which I thought was limited to dirty ovirt-ci jenkins slaves.
>>>>>>>>>> Any idea why it shows here?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Maybe one failed test leaves a dirty host for the next test?
>>>>>>>>>
>>>>>>>>
>>>>>>> Network tests now fail only on CentOS.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> py-watch seems to be failing due to missing gdb on the travis
>>>>>>>>>> image
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> cmdutils.py 151 DEBUG ./py-watch 0.1 sleep 10 (cwd None)
>>>>>>>>>> cmdutils.py 159 DEBUG FAILED: <err> = 'Traceback
>>>>>>>>>> (most recent call last):\n File "./py-watch", line 60, in <module>\n
>>>>>>>>>> dump_trace(watched_proc)\n File "./py-watch", line 32, in
>>>>>>>>>> dump_trace\n \'thread apply all py-bt\'])\n File
>>>>>>>>>> "/usr/lib64/python2.7/site-packages/subprocess32.py", line 575, in
>>>>>>>>>> call\n p = Popen(*popenargs, **kwargs)\n File
>>>>>>>>>> "/usr/lib64/python2.7/site-packages/subprocess32.py", line 822, in
>>>>>>>>>> __init__\n restore_signals, start_new_session)\n File
>>>>>>>>>> "/usr/lib64/python2.7/site-packages/subprocess32.py", line 1567, in
>>>>>>>>>> _execute_child\n raise child_exception_type(errno_num,
>>>>>>>>>> err_msg)\nOSError: [Errno 2] No such file or directory:
>>>>>>>>>> \'gdb\'\n'; <rc> = 1
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Cool, easy fix.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Fixed by https://gerrit.ovirt.org/#/c/92846/
>>>>>>>>
>>>>>>>
>>>>>>> Fedora 28 build is green with this change:
>>>>>>> https://travis-ci.org/nirs/vdsm/jobs/400549561
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ___________________________________ summary ____________________________________
>>>>>>> tests: commands succeeded
>>>>>>> storage-py27: commands succeeded
>>>>>>> storage-py36: commands succeeded
>>>>>>> lib-py27: commands succeeded
>>>>>>> lib-py36: commands succeeded
>>>>>>> network-py27: commands succeeded
>>>>>>> network-py36: commands succeeded
>>>>>>> virt-py27: commands succeeded
>>>>>>> virt-py36: commands succeeded
>>>>>>> congratulations :)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Nir, could you remind me what "ERROR: InterpreterNotFound:
>>>>>>>>>> python3.6" is and how we can avoid it? It keeps distracting me
>>>>>>>>>> while debugging test failures.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> We can avoid it in travis using env matrix.
>>>>>>>>>
>>>>>>>>> Currently we run "make check", which runs all the tox envs
>>>>>>>>> (e.g. storage-py27,storage-py36) regardless of the build type.
>>>>>>>>> This is good for manual usage when you don't know which python
>>>>>>>>> version is available on a developer machine. For example, if I have
>>>>>>>>> python 3.7 installed, maybe I would like to test with it.
>>>>>>>>>
>>>>>>>>> We can change this so we will test only the *-py27 envs on centos,
>>>>>>>>> and both *-py27 and *-py36 on Fedora.
>>>>>>>>>
>>>>>>>>> We can do the same in ovirt CI but it will be harder, since we don't
>>>>>>>>> have a declarative way to configure this.
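As an illustration of choosing envs by what is installed, here is a rough sketch that builds a tox env list from the interpreters found on PATH. The suite names come from the tox summaries in this thread; the mapping and the available_envs helper are assumptions made for the example and are not part of the vdsm build:

```python
import shutil

# Suite names as they appear in the tox summaries quoted in this thread.
SUITES = ["storage", "lib", "network", "virt"]

# Hypothetical mapping from tox env suffix to interpreter executable.
INTERPRETERS = {"py27": "python2.7", "py36": "python3.6"}


def available_envs(which=shutil.which):
    """Return tox env names whose interpreter is found on PATH.

    `which` is injectable so the selection can be checked without
    depending on the interpreters installed on the current machine.
    """
    return [
        "{}-{}".format(suite, tag)
        for suite in SUITES
        for tag, exe in sorted(INTERPRETERS.items())
        if which(exe) is not None
    ]
```

The result could be passed to tox as a comma-separated list, for example tox -e "storage-py27,lib-py27", so that envs for missing interpreters are never requested and InterpreterNotFound does not show up in the log.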
>>>>>>>>>
>>>>>>>>
>>>>>>>> Fixed all builds using --enable-python3:
>>>>>>>> https://gerrit.ovirt.org/#/c/92847/
>>>>>>>>
>>>>>>>
>>>>>>> Here is an example from CentOS build - no false errors.
>>>>>>>
>>>>>>> ___________________________________ summary ____________________________________
>>>>>>> tests: commands succeeded
>>>>>>> storage-py27: commands succeeded
>>>>>>> lib-py27: commands succeeded
>>>>>>> ERROR: network-py27: commands failed
>>>>>>> virt-py27: commands succeeded
>>>>>>> make: *** [tests] Error 1
>>>>>>> make: *** Waiting for unfinished jobs....
>>>>>>> ___________________________________ summary ____________________________________
>>>>>>> pylint: commands succeeded
>>>>>>> congratulations :)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Nir
>>>>>>>>
>>>>>>>
>>>>>>
>>