On Fri, Dec 23, 2016 at 6:20 PM, Barak Korren <bkorren(a)redhat.com> wrote:
On 22 December 2016 at 21:56, Nir Soffer <nsoffer(a)redhat.com>
wrote:
> On Thu, Dec 22, 2016 at 9:12 PM, Fred Rolland <frolland(a)redhat.com> wrote:
>> SuperVdsm fails to start:
>>
>> MainThread::ERROR::2016-12-22 12:42:08,699::supervdsmServer::317::SuperVdsm.Server::(main) Could not start Super Vdsm
>> Traceback (most recent call last):
>>   File "/usr/share/vdsm/supervdsmServer", line 297, in main
>>     server = manager.get_server()
>>   File "/usr/lib64/python2.7/multiprocessing/managers.py", line 493, in get_server
>>     self._authkey, self._serializer)
>>   File "/usr/lib64/python2.7/multiprocessing/managers.py", line 162, in __init__
>>     self.listener = Listener(address=address, backlog=16)
>>   File "/usr/lib64/python2.7/multiprocessing/connection.py", line 136, in __init__
>>     self._listener = SocketListener(address, family, backlog)
>>   File "/usr/lib64/python2.7/multiprocessing/connection.py", line 260, in __init__
>>     self._socket.bind(address)
>>   File "/usr/lib64/python2.7/socket.py", line 224, in meth
>>     return getattr(self._sock,name)(*args)
>> error: [Errno 2] No such file or directory
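
The ENOENT comes from bind() on supervdsm's control socket: binding an
AF_UNIX socket raises "[Errno 2] No such file or directory" when the
parent directory of the socket path does not exist. A minimal sketch of
the failure mode (the path below is illustrative, assuming the run
directory was never created):

    import socket

    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        # Raises ENOENT if /var/run/vdsm is missing, matching the
        # "error: [Errno 2]" in the traceback above.
        sock.bind("/var/run/vdsm/svdsm.sock")
    except socket.error as e:
        print("bind failed: %s" % e)
    finally:
        sock.close()

So this points at a missing run directory at the time supervdsm tried
to start.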
>>
>>
>> On Thu, Dec 22, 2016 at 7:54 PM, Barak Korren <bkorren(a)redhat.com> wrote:
>>>
>>> It's hard to tell when this started, because we had so many package
>>> issues making the tests fail before reaching that point for most of
>>> the day.
>>>
>>> Since we currently have an issue in Lago with collecting AddHost logs
>>> (hopefully we'll resolve this in the next release early next week),
>>> I've run the tests locally and attached the bundle of generated logs
>>> to this message.
>>>
>>> Included in the attached file are engine logs, host-deploy logs and
>>> VDSM logs for both test hosts.
>>>
>>> From a quick look inside, it seems the issue is that VDSM fails to start.
>
> From host-deploy/ovirt-host-deploy-20161222124209-192.168.203.4-604a4799.log:
>
> 2016-12-22 12:42:05 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:813 execute: ('/bin/systemctl', 'start', 'vdsmd.service'), executable='None', cwd='None', env=None
> 2016-12-22 12:42:09 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:863 execute-result: ('/bin/systemctl', 'start', 'vdsmd.service'), rc=1
> 2016-12-22 12:42:09 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:921 execute-output: ('/bin/systemctl', 'start', 'vdsmd.service') stdout:
>
>
> 2016-12-22 12:42:09 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:926 execute-output: ('/bin/systemctl', 'start', 'vdsmd.service') stderr:
> A dependency job for vdsmd.service failed. See 'journalctl -xe' for details.
>
> This means that one of the services vdsm depends on could not start.
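
One way to pin down which dependency failed is to walk vdsmd's
dependency tree and check each unit's state - a quick diagnostic
sketch, not part of vdsm or otopi:

    import subprocess

    # List vdsmd's dependencies ('--plain' strips the tree drawing).
    deps = subprocess.check_output(
        ['systemctl', 'list-dependencies', '--plain', 'vdsmd.service'])
    for unit in deps.decode('utf-8', 'replace').split():
        if unit.endswith('.service'):
            # 'is-active --quiet' exits non-zero for inactive/failed units.
            if subprocess.call(['systemctl', 'is-active', '--quiet', unit]):
                print('inactive dependency: %s' % unit)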
>
> 2016-12-22 12:42:09 DEBUG otopi.context context._executeMethod:142 method exception
> Traceback (most recent call last):
>   File "/tmp/ovirt-bUCuRxXXzU/pythonlib/otopi/context.py", line 132, in _executeMethod
>     method['method']()
>   File "/tmp/ovirt-bUCuRxXXzU/otopi-plugins/ovirt-host-deploy/vdsm/packages.py", line 209, in _start
>     self.services.state('vdsmd', True)
>   File "/tmp/ovirt-bUCuRxXXzU/otopi-plugins/otopi/services/systemd.py", line 141, in state
>     service=name,
> RuntimeError: Failed to start service 'vdsmd'
>
> This error is not very useful on its own. What we need in the otopi log
> is the output of 'journalctl -xe' (as suggested by systemctl).
>
> Didi, can we collect this info when starting a service fails?
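
Something like this in the systemd plugin would do it - a hedged
sketch, not the actual otopi API (the helper name is hypothetical):

    import logging
    import subprocess

    # Hypothetical helper: capture the journal tail when a unit fails
    # to start, so the root cause lands in the otopi log.
    def start_service(name):
        rc = subprocess.call(['/bin/systemctl', 'start', '%s.service' % name])
        if rc != 0:
            # 'journalctl -xe' is bounded ('-e' implies a recent-entries
            # limit), so it is safe to capture non-interactively.
            tail = subprocess.check_output(['journalctl', '-xe'])
            logging.error("starting %s failed (rc=%d), journal tail:\n%s",
                          name, rc, tail.decode('utf-8', 'replace'))
            raise RuntimeError("Failed to start service '%s'" % name)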
>
> Barak, can you log in to the host with this error and collect the output?
>
By the time I logged in to the host, all IP addresses were gone (I'm
guessing the setup process killed dhclient), so I'm having to work via
the serial console:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 54:52:c0:a8:cb:02 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5652:c0ff:fea8:cb02/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 54:52:c0:a8:cc:02 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5652:c0ff:fea8:cc02/64 scope link
       valid_lft forever preferred_lft forever
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 54:52:c0:a8:cc:03 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5652:c0ff:fea8:cc03/64 scope link
       valid_lft forever preferred_lft forever
5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 54:52:c0:a8:ca:02 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5652:c0ff:fea8:ca02/64 scope link
       valid_lft forever preferred_lft forever
Here is the interesting stuff I can gather from journalctl:
Dec 22 12:42:06 lago-basic-suite-master-host0 ovirt-imageio-daemon[5007]: Traceback (most recent call last):
Dec 22 12:42:06 lago-basic-suite-master-host0 ovirt-imageio-daemon[5007]:   File "/usr/bin/ovirt-imageio-daemon", line 14, in <module>
Thanks, Barak.

My guess remains Bug 1400003 ("imageio fails during system startup")
as the culprit.
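
The traceback in your excerpt is cut off; if the host is still
reachable, the full daemon log should confirm it - a small sketch for
pulling it from the journal (unit name taken from the excerpt above):

    import subprocess

    # Dump everything the journal has for the imageio daemon unit.
    log = subprocess.check_output(
        ['journalctl', '--no-pager', '-u', 'ovirt-imageio-daemon.service'])
    print(log.decode('utf-8', 'replace'))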