[ovirt-devel] ovirt master system tests fail

Daniel Belenky dbelenky at redhat.com
Thu Jan 12 12:12:03 UTC 2017


Hi all,

The test-repo_ovirt_experimental_master job is failing, and the failure
appears to be in the 'add_host' phase of the 'bootstrap' suite.
From the logs, it seems that the suite was unable to bring the host up, or
that something is wrong with the host itself:

<error type="exceptions.RuntimeError" message="Host lago-basic-suite-master-host1 is in non operational state
-------------------- >> begin captured logging << --------------------
lago.ssh: DEBUG: start task Get ssh client for lago-basic-suite-master-host0
lago.ssh: DEBUG: Still got 100 tries for lago-basic-suite-master-host0
lago.ssh: DEBUG: end task Get ssh client for lago-basic-suite-master-host0
lago.ssh: DEBUG: Running aab0eff8 on lago-basic-suite-master-host0: yum install -y iptables
lago.ssh: DEBUG: Command aab0eff8 on lago-basic-suite-master-host0 returned with 0
lago.ssh: DEBUG: Command aab0eff8 on lago-basic-suite-master-host0 output:
 Loaded plugins: fastestmirror
 Loading mirror speeds from cached hostfile
 * base: centos.host-engine.com
 * extras: linux.mirrors.es.net
 * updates: mirror.n5tech.com
 Package iptables-1.4.21-17.el7.x86_64 already installed and latest version
 Nothing to do
lago.ssh: DEBUG: start task Get ssh client for lago-basic-suite-master-host1
lago.ssh: DEBUG: Still got 100 tries for lago-basic-suite-master-host1
lago.ssh: DEBUG: end task Get ssh client for lago-basic-suite-master-host1
lago.ssh: DEBUG: Running ab5c94f2 on lago-basic-suite-master-host1: yum install -y iptables
lago.ssh: DEBUG: Command ab5c94f2 on lago-basic-suite-master-host1 returned with 0
lago.ssh: DEBUG: Command ab5c94f2 on lago-basic-suite-master-host1 output:
 Loaded plugins: fastestmirror
 Loading mirror speeds from cached hostfile
 * base: mirror.n5tech.com
 * extras: ftp.osuosl.org
 * updates: mirrors.usc.edu
 Package iptables-1.4.21-17.el7.x86_64 already installed and latest version
 Nothing to do
ovirtlago.testlib: ERROR: * Unhandled exception in <function _host_is_up at 0x322e938>
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 217, in assert_equals_within
    res = func()
  File "/home/jenkins/workspace/test-repo_ovirt_experimental_master/ovirt-system-tests/basic-suite-master/test-scenarios/002_bootstrap.py", line 162, in _host_is_up
    raise RuntimeError('Host %s is in non operational state' % host.name())
RuntimeError: Host lago-basic-suite-master-host1 is in non operational state
--------------------- >> end captured logging << ---------------------">
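
For anyone not familiar with the check that raised this error, here is a
minimal sketch of the pattern the traceback suggests: a status check is
polled until it passes or a timeout expires, and a "non operational"
status aborts the wait immediately by raising. This is not the actual
ovirtlago or 002_bootstrap.py code; wait_until, make_host_check and the
status strings are hypothetical names used only for illustration.

```python
import time


def wait_until(check, timeout=10.0, interval=0.5,
               clock=time.monotonic, sleep=time.sleep):
    """Call check() until it returns True or the timeout expires.

    Exceptions raised by check() propagate at once, so a fatal state
    ends the wait early instead of running into the timeout.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def make_host_check(get_status, name):
    """Build a check that mimics the failure mode seen in the traceback.

    get_status is a hypothetical callable returning a status string.
    """
    def _host_is_up():
        status = get_status()
        if status == 'up':
            return True
        if status == 'non_operational':
            # Fatal state: stop polling and fail the test right away.
            raise RuntimeError(
                'Host %s is in non operational state' % name)
        return False  # any other state: keep polling
    return _host_is_up


if __name__ == '__main__':
    # Simulated status sequence; a real test would query the engine.
    statuses = iter(['installing', 'installing', 'non_operational'])
    check = make_host_check(lambda: next(statuses), 'host1')
    try:
        wait_until(check, sleep=lambda seconds: None)
    except RuntimeError as err:
        print(err)
```

If the real check works this way, the error means the engine itself
reported the host as non operational, rather than the test simply timing
out while waiting for the host.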


In the engine.log, I found a timeout in the RPC call (but this error also
appears in jobs that succeed, so it might not be relevant):

2017-01-12 05:49:53,383-05 ERROR
[org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand]
(org.ovirt.thread.pool-7-thread-2) [76b0383f] Command
'PollVDSCommand(HostName = lago-basic-suite-master-host1,
VdsIdVDSCommandParametersBase:{runAsync='true',
hostId='40eb11ba-e6ac-478a-b8b1-73b7892ace65'})' execution failed:
VDSGenericException: VDSNetworkException: Timeout during rpc call
2017-01-12 05:49:53,383-05 DEBUG
[org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand]
(org.ovirt.thread.pool-7-thread-2) [76b0383f] Exception:
org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException:
VDSGenericException: VDSNetworkException: Timeout during rpc call

... (the full error is very long, so I won't paste it here; it's in
engine.log)

2017-01-12 05:49:58,291-05 ERROR
[org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand]
(org.ovirt.thread.pool-7-thread-1) [30b2ca77] Timeout waiting for VDSM
response: Internal timeout occured



In the host's vdsm.log, there are some errors too:

2017-01-12 05:51:48,336 ERROR (jsonrpc/0) [storage.StorageDomainCache]
looking for unfetched domain 380623d8-1e85-4831-9048-3d05932f3d3a
(sdc:151)
2017-01-12 05:51:48,336 ERROR (jsonrpc/0) [storage.StorageDomainCache]
looking for domain 380623d8-1e85-4831-9048-3d05932f3d3a (sdc:168)
2017-01-12 05:51:48,395 WARN  (jsonrpc/0) [storage.LVM] lvm vgs
failed: 5 [] ['  WARNING: Not using lvmetad because config setting
use_lvmetad=0.', '  WARNING: To avoid corruption, rescan devices to
make changes visible (pvscan --cache).', '  Volume group
"380623d8-1e85-4831-9048-3d05932f3d3a" not found', '  Cannot process
volume group 380623d8-1e85-4831-9048-3d05932f3d3a'] (lvm:377)
2017-01-12 05:51:48,398 ERROR (jsonrpc/0) [storage.StorageDomainCache]
domain 380623d8-1e85-4831-9048-3d05932f3d3a not found (sdc:157)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sdc.py", line 155, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 185, in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist:
(u'380623d8-1e85-4831-9048-3d05932f3d3a',)


and

2017-01-12 05:53:45,375 ERROR (JsonRpc (StompReactor))
[vds.dispatcher] SSL error receiving from
<yajsonrpc.betterAsyncore.Dispatcher connected ('::1', 43814, 0, 0) at
0x235a2d8>: unexpected eof (betterAsyncore:119)


Link to Jenkins
<http://jenkins.ovirt.org/view/experimental%20jobs/job/test-repo_ovirt_experimental_master/4693/artifact/exported-artifacts/basic_suite_master.sh-el7/exported-artifacts/>

Can someone please take a look?

Thanks,

Daniel Belenky
RHV DevOps
Red Hat Israel