
On Fri, Dec 23, 2016 at 6:20 PM, Barak Korren <bkorren@redhat.com> wrote:
On 22 December 2016 at 21:56, Nir Soffer <nsoffer@redhat.com> wrote:
On Thu, Dec 22, 2016 at 9:12 PM, Fred Rolland <frolland@redhat.com> wrote:
SuperVdsm fails to start:
MainThread::ERROR::2016-12-22 12:42:08,699::supervdsmServer::317::SuperVdsm.Server::(main) Could not start Super Vdsm
Traceback (most recent call last):
  File "/usr/share/vdsm/supervdsmServer", line 297, in main
    server = manager.get_server()
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 493, in get_server
    self._authkey, self._serializer)
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 162, in __init__
    self.listener = Listener(address=address, backlog=16)
  File "/usr/lib64/python2.7/multiprocessing/connection.py", line 136, in __init__
    self._listener = SocketListener(address, family, backlog)
  File "/usr/lib64/python2.7/multiprocessing/connection.py", line 260, in __init__
    self._socket.bind(address)
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 2] No such file or directory
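For context, the usual reading of that last error: bind() on a Unix socket fails with ENOENT when the directory that should hold the socket file does not exist. A minimal sketch of that failure mode (the path below is only illustrative, not necessarily supervdsm's actual socket path):

    # Sketch only: bind a Unix socket into a directory that does not exist
    # and observe the same "[Errno 2] No such file or directory" as above.
    import errno
    import socket

    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        # Hypothetical path; the parent directory is missing on purpose.
        sock.bind("/var/run/nonexistent-dir/svdsm.sock")
    except socket.error as e:
        assert e.errno == errno.ENOENT
        print("bind failed: %s" % e)  # [Errno 2] No such file or directory
    finally:
        sock.close()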
On Thu, Dec 22, 2016 at 7:54 PM, Barak Korren <bkorren@redhat.com> wrote:
It's hard to tell at the moment when this started, because we had so many package issues that the tests failed before reaching that point for most of the day.
Since we currently have an issue in Lago with collecting AddHost logs (hopefully we'll resolve this in the next release early next week), I've run the tests locally and attached the bundle of generated logs to this message.
Included in the attached file are engine logs, host-deploy logs and VDSM logs for both test hosts.
From a quick look inside it seems the issue is with VDSM failing to start.
From host-deploy/ovirt-host-deploy-20161222124209-192.168.203.4-604a4799.log:
2016-12-22 12:42:05 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:813 execute: ('/bin/systemctl', 'start', 'vdsmd.service'), executable='None', cwd='None', env=None
2016-12-22 12:42:09 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:863 execute-result: ('/bin/systemctl', 'start', 'vdsmd.service'), rc=1
2016-12-22 12:42:09 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:921 execute-output: ('/bin/systemctl', 'start', 'vdsmd.service') stdout:
2016-12-22 12:42:09 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:926 execute-output: ('/bin/systemctl', 'start', 'vdsmd.service') stderr:
A dependency job for vdsmd.service failed. See 'journalctl -xe' for details.
This means that one of the services vdsm depends on could not start.
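As an aside, a rough way to narrow down which dependency actually failed, assuming the usual systemd tools are on the host (a sketch, not part of the deploy code):

    # Sketch: list vdsmd's Requires/Wants units from systemd and check
    # each one's failed state with "systemctl is-failed".
    import subprocess

    def unit_dependencies(unit):
        # "systemctl show -p Requires -p Wants <unit>" prints lines such as
        # "Requires=a.service b.target"; split them into unit names.
        out = subprocess.check_output(
            ["systemctl", "show", "-p", "Requires", "-p", "Wants", unit])
        deps = []
        for line in out.decode().splitlines():
            _, _, value = line.partition("=")
            deps.extend(value.split())
        return deps

    for dep in unit_dependencies("vdsmd.service"):
        # is-failed exits 0 when the unit is in a failed state.
        if subprocess.call(["systemctl", "is-failed", "--quiet", dep]) == 0:
            print("failed dependency: %s" % dep)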
2016-12-22 12:42:09 DEBUG otopi.context context._executeMethod:142 method exception
Traceback (most recent call last):
  File "/tmp/ovirt-bUCuRxXXzU/pythonlib/otopi/context.py", line 132, in _executeMethod
    method['method']()
  File "/tmp/ovirt-bUCuRxXXzU/otopi-plugins/ovirt-host-deploy/vdsm/packages.py", line 209, in _start
    self.services.state('vdsmd', True)
  File "/tmp/ovirt-bUCuRxXXzU/otopi-plugins/otopi/services/systemd.py", line 141, in state
    service=name,
RuntimeError: Failed to start service 'vdsmd'
This error is not very useful for anyone. What we need in the otopi log is the output of 'journalctl -xe' (as suggested by systemctl).
Didi, can we collect this info when starting a service fails?
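Something along these lines could work - a minimal sketch of the kind of hook I mean, not the actual otopi services API:

    # Sketch: on service start failure, capture the recent journal so the
    # host-deploy log is self-contained.
    import subprocess

    def start_service_with_journal(unit):
        rc = subprocess.call(["systemctl", "start", unit])
        if rc != 0:
            # journalctl -xe is what systemctl itself suggests; --no-pager
            # keeps the output suitable for writing to a log file.
            journal = subprocess.check_output(
                ["journalctl", "-xe", "--no-pager"]).decode()
            raise RuntimeError(
                "Failed to start service %r\n%s" % (unit, journal))

    # Usage example:
    start_service_with_journal("vdsmd.service")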
Barak, can you log in to the host with this error and collect the output?
By the time I logged in to the host, all IP addresses were gone (I'm guessing the setup process killed dhclient), so I'm having to work via the serial console.
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 54:52:c0:a8:cb:02 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5652:c0ff:fea8:cb02/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 54:52:c0:a8:cc:02 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5652:c0ff:fea8:cc02/64 scope link
       valid_lft forever preferred_lft forever
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 54:52:c0:a8:cc:03 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5652:c0ff:fea8:cc03/64 scope link
       valid_lft forever preferred_lft forever
5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 54:52:c0:a8:ca:02 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5652:c0ff:fea8:ca02/64 scope link
       valid_lft forever preferred_lft forever
Here is the interesting stuff I can gather from journalctl:
Dec 22 12:42:06 lago-basic-suite-master-host0 ovirt-imageio-daemon[5007]: Traceback (most recent call last):
Dec 22 12:42:06 lago-basic-suite-master-host0 ovirt-imageio-daemon[5007]:   File "/usr/bin/ovirt-imageio-daemon", line 14, in <module>
Thanks, Barak. My guess is still that Bug 1400003 - imageio fails during system startup - is the culprit.