
On Fri, Dec 23, 2016 at 6:20 PM, Barak Korren <bkorren@redhat.com> wrote:
On 22 December 2016 at 21:56, Nir Soffer <nsoffer@redhat.com> wrote:
On Thu, Dec 22, 2016 at 9:12 PM, Fred Rolland <frolland@redhat.com> wrote:
SuperVdsm fails to start:
MainThread::ERROR::2016-12-22 12:42:08,699::supervdsmServer::317::SuperVdsm.Server::(main) Could not start Super Vdsm
Traceback (most recent call last):
  File "/usr/share/vdsm/supervdsmServer", line 297, in main
    server = manager.get_server()
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 493, in get_server
    self._authkey, self._serializer)
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 162, in __init__
    self.listener = Listener(address=address, backlog=16)
  File "/usr/lib64/python2.7/multiprocessing/connection.py", line 136, in __init__
    self._listener = SocketListener(address, family, backlog)
  File "/usr/lib64/python2.7/multiprocessing/connection.py", line 260, in __init__
    self._socket.bind(address)
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 2] No such file or directory
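For context, the usual reading of that last error: bind() on a Unix socket fails with ENOENT when the directory that should hold the socket file does not exist. A minimal sketch of that failure mode (the path below is only illustrative, not necessarily supervdsm's actual socket path):

    # Sketch only: bind a Unix socket into a directory that does not exist
    # and observe the same "[Errno 2] No such file or directory" as above.
    import errno
    import socket

    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        # Hypothetical path; the parent directory is missing on purpose.
        sock.bind("/var/run/nonexistent-dir/svdsm.sock")
    except socket.error as e:
        assert e.errno == errno.ENOENT
        print("bind failed: %s" % e)  # [Errno 2] No such file or directory
    finally:
        sock.close()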
On Thu, Dec 22, 2016 at 7:54 PM, Barak Korren <bkorren@redhat.com> wrote:
It's hard to tell at the moment when this started, because we had so many package issues that the tests failed before reaching that point for most of the day.
Since we currently have an issue in Lago with collecting AddHost logs (hopefully we'll resolve this in the next release early next week), I've run the tests locally and attached the bundle of generated logs to this message.
Included in the attached file are engine logs, host-deploy logs and VDSM logs for both test hosts.
From a quick look inside it seems the issue is with VDSM failing to start.
From host-deploy/ovirt-host-deploy-20161222124209-192.168.203.4-604a4799.log:
2016-12-22 12:42:05 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:813 execute: ('/bin/systemctl', 'start', 'vdsmd.service'), executable='None', cwd='None', env=None
2016-12-22 12:42:09 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:863 execute-result: ('/bin/systemctl', 'start', 'vdsmd.service'), rc=1
2016-12-22 12:42:09 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:921 execute-output: ('/bin/systemctl', 'start', 'vdsmd.service') stdout:
2016-12-22 12:42:09 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:926 execute-output: ('/bin/systemctl', 'start', 'vdsmd.service') stderr:
A dependency job for vdsmd.service failed. See 'journalctl -xe' for details.
This means that one of the services vdsm depends on could not start.
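As an aside, a rough way to narrow down which dependency actually failed, assuming the usual systemd tools are on the host (a sketch, not part of the deploy code):

    # Sketch: list vdsmd's Requires/Wants units from systemd and check
    # each one's failed state with "systemctl is-failed".
    import subprocess

    def unit_dependencies(unit):
        # "systemctl show -p Requires -p Wants <unit>" prints lines such as
        # "Requires=a.service b.target"; split them into unit names.
        out = subprocess.check_output(
            ["systemctl", "show", "-p", "Requires", "-p", "Wants", unit])
        deps = []
        for line in out.decode().splitlines():
            _, _, value = line.partition("=")
            deps.extend(value.split())
        return deps

    for dep in unit_dependencies("vdsmd.service"):
        # is-failed exits 0 when the unit is in a failed state.
        if subprocess.call(["systemctl", "is-failed", "--quiet", dep]) == 0:
            print("failed dependency: %s" % dep)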
2016-12-22 12:42:09 DEBUG otopi.context context._executeMethod:142 method exception
Traceback (most recent call last):
  File "/tmp/ovirt-bUCuRxXXzU/pythonlib/otopi/context.py", line 132, in _executeMethod
    method['method']()
  File "/tmp/ovirt-bUCuRxXXzU/otopi-plugins/ovirt-host-deploy/vdsm/packages.py", line 209, in _start
    self.services.state('vdsmd', True)
  File "/tmp/ovirt-bUCuRxXXzU/otopi-plugins/otopi/services/systemd.py", line 141, in state
    service=name,
RuntimeError: Failed to start service 'vdsmd'
This error is not very useful for anyone. What we need in the otopi log is the output of 'journalctl -xe' (as suggested by systemctl).
Didi, can we collect this info when starting a service fails?
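Something along these lines could work - a minimal sketch of the kind of hook I mean, not the actual otopi services API:

    # Sketch: on service start failure, capture the recent journal so the
    # host-deploy log is self-contained.
    import subprocess

    def start_service_with_journal(unit):
        rc = subprocess.call(["systemctl", "start", unit])
        if rc != 0:
            # journalctl -xe is what systemctl itself suggests; --no-pager
            # keeps the output suitable for writing to a log file.
            journal = subprocess.check_output(
                ["journalctl", "-xe", "--no-pager"]).decode()
            raise RuntimeError(
                "Failed to start service %r\n%s" % (unit, journal))

    # Usage example:
    start_service_with_journal("vdsmd.service")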
Barak, can you log in to the host with this error and collect the output?
By the time I logged in to the host, all IP addresses were gone (I'm guessing the setup process killed dhclient), so I'm having to work via the serial console.
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 54:52:c0:a8:cb:02 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5652:c0ff:fea8:cb02/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 54:52:c0:a8:cc:02 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5652:c0ff:fea8:cc02/64 scope link
       valid_lft forever preferred_lft forever
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 54:52:c0:a8:cc:03 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5652:c0ff:fea8:cc03/64 scope link
       valid_lft forever preferred_lft forever
5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 54:52:c0:a8:ca:02 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5652:c0ff:fea8:ca02/64 scope link
       valid_lft forever preferred_lft forever
Here is the interesting stuff I can gather from journalctl:
Dec 22 12:42:06 lago-basic-suite-master-host0 ovirt-imageio-daemon[5007]: Traceback (most recent call last):
Dec 22 12:42:06 lago-basic-suite-master-host0 ovirt-imageio-daemon[5007]:   File "/usr/bin/ovirt-imageio-daemon", line 14, in <module>
Thanks, Barak. My guess is still that Bug 1400003 - imageio fails during system startup - is the culprit.