<div dir="ltr"><div class="gmail_quote"><div dir="ltr">On Mon, Nov 6, 2017 at 4:16 PM Yedidyah Bar David <<a href="mailto:didi@redhat.com">didi@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On Mon, Nov 6, 2017 at 1:57 PM, Dafna Ron <<a href="mailto:dron@redhat.com" target="_blank">dron@redhat.com</a>> wrote:<br>
> Adding Didi.<br>
><br>
><br>
> On 11/06/2017 11:51 AM, Ala Hino wrote:<br>
><br>
> Suspected patch (<a href="https://gerrit.ovirt.org/#/c/83612/" rel="noreferrer" target="_blank">https://gerrit.ovirt.org/#/c/83612/</a>) is about cold merge<br>
> and has nothing to do with host deploy.<br>
><br>
> On Mon, Nov 6, 2017 at 1:39 PM, Dafna Ron <<a href="mailto:dron@redhat.com" target="_blank">dron@redhat.com</a>> wrote:<br>
>><br>
>> Hi,<br>
>><br>
>> Test 002_bootstrap.verify_add_hosts failed.<br>
>><br>
>> I can see that we only tried to install one of the hosts (host-0), and that failed.<br>
>> The second host has no log, which means we did not try to deploy it.<br>
>><br>
>> The error suggests that ovirt-imageio-daemon failed to start. However,<br>
>> there is another message, about conflicting vdsm and libvirt TLS<br>
>> configurations, that I think should also be addressed.<br>
>><br>
>> Link to suspected patch: <a href="https://gerrit.ovirt.org/#/c/83612/" rel="noreferrer" target="_blank">https://gerrit.ovirt.org/#/c/83612/</a><br>
>><br>
>><br>
>> Link to Job:<br>
>> <a href="http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/</a><br>
>><br>
>><br>
>> Link to all logs:<br>
>> <a href="http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/artifact/" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/artifact/</a><br>
>><br>
>><br>
>> <a href="http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-engine/_var_log/ovirt-engine/host-deploy/ovirt-host-deploy-20171106025647-lago-basic-suite-master-host-0-5530ab1f.log" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-engine/_var_log/ovirt-engine/host-deploy/ovirt-host-deploy-20171106025647-lago-basic-suite-master-host-0-5530ab1f.log</a><br>
>><br>
>><br>
>> (Relevant) error snippet from the log:<br>
>><br>
>> <error><br>
>><br>
>> 2017-11-06 02:56:46,526-0500 DEBUG<br>
>> otopi.plugins.ovirt_host_deploy.vdsm.packages plugin.execute:921<br>
>> execute-output: ('/usr/bin/vdsm-tool', 'configure', '--force') stdout:<br>
>><br>
>> Checking configuration status...<br>
>><br>
>> abrt is not configured for vdsm<br>
>> WARNING: LVM local configuration: /etc/lvm/lvmlocal.conf is not based on<br>
>> vdsm configuration<br>
>> lvm requires configuration<br>
>> libvirt is not configured for vdsm yet<br>
>> FAILED: conflicting vdsm and libvirt-qemu tls configuration.<br>
>> vdsm.conf with ssl=True requires the following changes:<br>
>> libvirtd.conf: listen_tcp=0, auth_tcp="sasl", listen_tls=1<br>
>> qemu.conf: spice_tls=1.<br>
>> multipath requires configuration<br>
>><br>
>><br>
>> 2017-11-06 02:56:47,551-0500 DEBUG otopi.plugins.otopi.services.systemd<br>
>> plugin.execute:926 execute-output: ('/usr/bin/systemctl', 'start',<br>
>> 'ovirt-imageio-daemon.service') stderr:<br>
>> Job for ovirt-imageio-daemon.service failed because the control process<br>
>> exited with error code. See "systemctl status ovirt-imageio-daemon.service"<br>
>> and "journalctl -xe" for details.<br>
>><br>
>> 2017-11-06 02:56:47,552-0500 DEBUG otopi.context<br>
>> context._executeMethod:143 method exception<br>
>> Traceback (most recent call last):<br>
>> File "/tmp/ovirt-R4R8gZhaQI/pythonlib/otopi/context.py", line 133, in<br>
>> _executeMethod<br>
>> method['method']()<br>
>> File<br>
>> "/tmp/ovirt-R4R8gZhaQI/otopi-plugins/ovirt-host-deploy/vdsm/packages.py",<br>
>> line 179, in _start<br>
>> self.services.state('ovirt-imageio-daemon', True)<br>
>> File "/tmp/ovirt-R4R8gZhaQI/otopi-plugins/otopi/services/systemd.py",<br>
>> line 141, in state<br>
>> service=name,<br>
>> RuntimeError: Failed to start service 'ovirt-imageio-daemon'<br>
>> 2017-11-06 02:56:47,553-0500 ERROR otopi.context<br>
>> context._executeMethod:152 Failed to execute stage 'Closing up': Failed to<br>
>> start service 'ovirt-imageio-daemon'<br>
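The vdsm-tool output above lists the libvirtd.conf and qemu.conf values that vdsm.conf with ssl=True requires. The following Python sketch shows one way such a consistency check could look. It is a hypothetical illustration, not vdsm's code: REQUIRED, parse_conf and tls_conflicts are invented names, the parser only handles simple key = value lines, and a key that is absent from a file is reported as a mismatch even though libvirt and qemu have built-in defaults.

```python
# Hypothetical sketch, not vdsm's implementation: compare libvirtd.conf and
# qemu.conf settings with the values vdsm-tool says vdsm.conf ssl=True needs.

# Required values, copied from the vdsm-tool message (quotes kept as written).
REQUIRED = {
    "libvirtd.conf": {"listen_tcp": "0", "auth_tcp": '"sasl"', "listen_tls": "1"},
    "qemu.conf": {"spice_tls": "1"},
}


def parse_conf(text):
    """Parse simple 'key = value' lines, skipping blanks and '#' comments.

    Naive on purpose: a '#' inside a quoted value would be cut off.
    """
    values = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    return values


def tls_conflicts(confs, ssl=True):
    """Return (file, key, expected, actual) for each mismatched setting.

    confs maps a file name to its parsed settings; a missing key is
    reported with actual=None. In this sketch, ssl=False requires nothing.
    """
    if not ssl:
        return []
    problems = []
    for name, required in sorted(REQUIRED.items()):
        settings = confs.get(name, {})
        for key, expected in sorted(required.items()):
            actual = settings.get(key)
            if actual != expected:
                problems.append((name, key, expected, actual))
    return problems


if __name__ == "__main__":
    libvirtd = parse_conf('listen_tcp = 1\nauth_tcp = "sasl"\nlisten_tls = 1\n')
    qemu = parse_conf("spice_tls = 1\n")
    print(tls_conflicts({"libvirtd.conf": libvirtd, "qemu.conf": qemu}))
```

With the sample input, the script prints [('libvirtd.conf', 'listen_tcp', '0', '1')], since listen_tcp is the only setting that differs from the required value.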
<br>
In /var/log/messages of the host [1], there is:<br>
<br>
Nov 6 02:56:47 lago-basic-suite-master-host-0 systemd: Starting oVirt<br>
ImageIO Daemon...<br>
Nov 6 02:56:47 lago-basic-suite-master-host-0 python: detected<br>
unhandled Python exception in '/usr/bin/ovirt-imageio-daemon'<br>
Nov 6 02:56:47 lago-basic-suite-master-host-0 python: can't<br>
communicate with ABRT daemon, is it running? [Errno 2] No such file or<br>
directory<br>
Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:<br>
Traceback (most recent call last):<br>
Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:<br>
File "/usr/bin/ovirt-imageio-daemon", line 14, in <module><br>
Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:<br>
server.main(sys.argv)<br>
Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:<br>
File "/usr/lib/python2.7/site-packages/ovirt_imageio_daemon/server.py",<br>
line 57, in main<br>
Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:<br>
start(config)<br>
Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:<br>
File "/usr/lib/python2.7/site-packages/ovirt_imageio_daemon/server.py",<br>
line 85, in start<br>
Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:<br>
WSGIRequestHandler)<br>
Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:<br>
File "/usr/lib64/python2.7/SocketServer.py", line 419, in __init__<br>
Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:<br>
self.server_bind()<br>
Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:<br>
File "/usr/lib64/python2.7/wsgiref/simple_server.py", line 48, in<br>
server_bind<br>
Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:<br>
HTTPServer.server_bind(self)<br>
Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:<br>
File "/usr/lib64/python2.7/BaseHTTPServer.py", line 108, in<br>
server_bind<br>
Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:<br>
SocketServer.TCPServer.server_bind(self)<br>
Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:<br>
File "/usr/lib64/python2.7/SocketServer.py", line 430, in server_bind<br>
Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:<br>
self.socket.bind(self.server_address)<br>
Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:<br>
File "/usr/lib64/python2.7/socket.py", line 224, in meth<br>
Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:<br>
return getattr(self._sock,name)(*args)<br>
Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:<br>
socket.error: [Errno 98] Address already in use<br>
Nov 6 02:56:47 lago-basic-suite-master-host-0 systemd:<br>
ovirt-imageio-daemon.service: main process exited, code=exited,<br>
status=1/FAILURE<br>
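The last traceback line, socket.error: [Errno 98] Address already in use, is EADDRINUSE on Linux: bind() was refused because another socket already holds the address. Python's HTTPServer, which WSGIServer builds on, sets allow_reuse_address = 1 (SO_REUSEADDR), so a connection lingering in TIME_WAIT is unlikely to cause this; a socket that is still listening on the port is the more likely explanation. A minimal Python sketch that reproduces the error on loopback, using an OS-assigned port rather than the daemon's real one (second_bind_error is an illustrative name):

```python
# Minimal sketch: reproduce "[Errno 98] Address already in use" by binding
# a second socket to an address that a listening socket already holds.
# Port 0 lets the OS pick a free port; the daemon's real port is not used.
import errno
import socket


def second_bind_error(host="127.0.0.1", reuse_addr=True):
    """Return the errno of the second bind(), or None if it succeeded.

    reuse_addr=True mirrors HTTPServer, which sets SO_REUSEADDR.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if reuse_addr:
            first.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        first.bind((host, 0))
        first.listen(1)
        port = first.getsockname()[1]

        second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            if reuse_addr:
                second.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            second.bind((host, port))  # fails while `first` is listening
        except socket.error as e:
            return e.errno
        finally:
            second.close()
        return None
    finally:
        first.close()


if __name__ == "__main__":
    err = second_bind_error()
    print(err, errno.errorcode.get(err))  # on Linux: 98 EADDRINUSE
```

SO_REUSEADDR does not help here: it lets a new socket reuse a port left in TIME_WAIT, not one that another socket is actively listening on.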
<br>
ovirt-host-deploy stops it and then immediately tries to start it again:<br>
<br>
2017-11-06 02:56:47,203-0500 DEBUG<br>
otopi.plugins.otopi.services.systemd plugin.executeRaw:863<br>
execute-result: ('/usr/bin/systemctl', 'stop',<br>
'ovirt-imageio-daemon.service'), rc=0<br>
...<br>
2017-11-06 02:56:47,550-0500 DEBUG<br>
otopi.plugins.otopi.services.systemd plugin.executeRaw:863<br>
execute-result: ('/usr/bin/systemctl', 'start',<br>
'ovirt-imageio-daemon.service'), rc=1<br>
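The successful stop and the failed start are logged less than half a second apart. One hypothetical guard is to check whether anything still accepts connections on the daemon's port before starting it again. port_in_use and wait_for_port_free below are illustrative names, not part of otopi or ovirt-host-deploy; the port is a caller-supplied parameter because this thread does not say which port the daemon binds, and probing 127.0.0.1 only detects a listener bound to loopback or to all addresses.

```python
# Hypothetical helper, not part of otopi or ovirt-host-deploy: before
# restarting a service, wait until nothing accepts TCP connections on its
# port. Probing 127.0.0.1 only sees listeners bound to loopback or to all
# addresses.
import socket
import time


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        return s.connect_ex((host, port)) == 0
    finally:
        s.close()


def wait_for_port_free(port, host="127.0.0.1", timeout=10.0, interval=0.2):
    """Poll until port_in_use() is False; return False if `timeout` expires."""
    deadline = time.monotonic() + timeout
    while port_in_use(port, host):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True
```

For example, wait_for_port_free(port, timeout=10) returning False after a successful stop would suggest that some other process, such as a second daemon instance, still holds the port, rather than a problem in the daemon's own startup code.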
<br>
Also, the imageio-daemon log [2] looks a bit odd to me: it has five<br>
'Starting' lines, but none of the other lines I would expect from<br>
reading its source, and which I can see in another run that<br>
finished successfully [3].<br>
<br>
Adding Idan, though I'm not sure this is a bug in the daemon.<br>
<br>
[1] <a href="http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host-0/_var_log/" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host-0/_var_log/</a><br>
<br>
[2] <a href="http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host-0/_var_log/ovirt-imageio-daemon/daemon.log" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host-0/_var_log/ovirt-imageio-daemon/daemon.log</a><br>
<br>
[3] <a href="http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3628/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host-0/_var_log/ovirt-imageio-daemon/daemon.log" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3628/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host-0/_var_log/ovirt-imageio-daemon/daemon.log</a></blockquote><div><br></div><div>It looks like the daemon is already running on this host. Maybe host deploy</div><div>is trying to start the service twice?</div><div><br></div><div>We have not changed the startup code in a couple of years, so this must be caused by</div><div>a change in another component.</div><div><br></div><div>This patch will make future issues easier to detect by logging any startup error</div><div>to the daemon log:</div><div><a href="https://gerrit.ovirt.org/83670/">https://gerrit.ovirt.org/83670/</a><br></div><div><br></div><div>Nir</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
<br>
>><br>
>> </error><br>
>><br>
>><br>
><br>
><br>
<br>
<br>
<br>
--<br>
Didi<br>
_______________________________________________<br>
Devel mailing list<br>
<a href="mailto:Devel@ovirt.org" target="_blank">Devel@ovirt.org</a><br>
<a href="http://lists.ovirt.org/mailman/listinfo/devel" rel="noreferrer" target="_blank">http://lists.ovirt.org/mailman/listinfo/devel</a><br>
</blockquote></div></div>