<div dir="ltr"><div class="gmail_quote"><div dir="ltr">On Mon, Nov 6, 2017 at 4:16 PM Yedidyah Bar David &lt;<a href="mailto:didi@redhat.com">didi@redhat.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On Mon, Nov 6, 2017 at 1:57 PM, Dafna Ron &lt;<a href="mailto:dron@redhat.com" target="_blank">dron@redhat.com</a>&gt; wrote:<br>
&gt; adding Didi.<br>
&gt;<br>
&gt;<br>
&gt; On 11/06/2017 11:51 AM, Ala Hino wrote:<br>
&gt;<br>
&gt; Suspected patch (<a href="https://gerrit.ovirt.org/#/c/83612/" rel="noreferrer" target="_blank">https://gerrit.ovirt.org/#/c/83612/</a>) is about cold merge<br>
&gt; and has nothing to do with host deploy.<br>
&gt;<br>
&gt; On Mon, Nov 6, 2017 at 1:39 PM, Dafna Ron &lt;<a href="mailto:dron@redhat.com" target="_blank">dron@redhat.com</a>&gt; wrote:<br>
&gt;&gt;<br>
&gt;&gt; Hi,<br>
&gt;&gt;<br>
&gt;&gt; We failed test 002_bootstrap.verify_add_hosts<br>
&gt;&gt;<br>
&gt;&gt; I can see we only tried to install one of the hosts (host-0) and failed.<br>
&gt;&gt; The second host has no log, which means we did not try to deploy it.<br>
&gt;&gt;<br>
&gt;&gt; The error suggests that ovirt-imageio-daemon failed to start. However,<br>
&gt;&gt; there is another message that I think should be addressed about conflicting<br>
&gt;&gt; vdsm and libvirt configurations.<br>
&gt;&gt;<br>
&gt;&gt; Link to suspected patches: <a href="https://gerrit.ovirt.org/#/c/83612/" rel="noreferrer" target="_blank">https://gerrit.ovirt.org/#/c/83612/</a><br>
&gt;&gt;<br>
&gt;&gt;<br>
&gt;&gt; Link to Job:<br>
&gt;&gt; <a href="http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/</a><br>
&gt;&gt;<br>
&gt;&gt;<br>
&gt;&gt; Link to all logs:<br>
&gt;&gt; <a href="http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/artifact/" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/artifact/</a><br>
&gt;&gt;<br>
&gt;&gt;<br>
&gt;&gt; <a href="http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-engine/_var_log/ovirt-engine/host-deploy/ovirt-host-deploy-20171106025647-lago-basic-suite-master-host-0-5530ab1f.log" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-engine/_var_log/ovirt-engine/host-deploy/ovirt-host-deploy-20171106025647-lago-basic-suite-master-host-0-5530ab1f.log</a><br>
&gt;&gt;<br>
&gt;&gt;<br>
&gt;&gt; (Relevant) error snippet from the log:<br>
&gt;&gt;<br>
&gt;&gt; &lt;error&gt;<br>
&gt;&gt;<br>
&gt;&gt; 2017-11-06 02:56:46,526-0500 DEBUG<br>
&gt;&gt; otopi.plugins.ovirt_host_deploy.vdsm.packages plugin.execute:921<br>
&gt;&gt; execute-output: (&#39;/usr/bin/vdsm-tool&#39;, &#39;configure&#39;, &#39;--force&#39;) stdout:<br>
&gt;&gt;<br>
&gt;&gt; Checking configuration status...<br>
&gt;&gt;<br>
&gt;&gt; abrt is not configured for vdsm<br>
&gt;&gt; WARNING: LVM local configuration: /etc/lvm/lvmlocal.conf is not based on<br>
&gt;&gt; vdsm configuration<br>
&gt;&gt; lvm requires configuration<br>
&gt;&gt; libvirt is not configured for vdsm yet<br>
&gt;&gt; FAILED: conflicting vdsm and libvirt-qemu tls configuration.<br>
&gt;&gt; vdsm.conf with ssl=True requires the following changes:<br>
&gt;&gt; libvirtd.conf: listen_tcp=0, auth_tcp=&quot;sasl&quot;, listen_tls=1<br>
&gt;&gt; qemu.conf: spice_tls=1.<br>
&gt;&gt; multipath requires configuration<br>
&gt;&gt;<br>
&gt;&gt;<br>
&gt;&gt; 2017-11-06 02:56:47,551-0500 DEBUG otopi.plugins.otopi.services.systemd<br>
&gt;&gt; plugin.execute:926 execute-output: (&#39;/usr/bin/systemctl&#39;, &#39;start&#39;,<br>
&gt;&gt; &#39;ovirt-imageio-daemon.service&#39;) stderr:<br>
&gt;&gt; Job for ovirt-imageio-daemon.service failed because the control process<br>
&gt;&gt; exited with error code. See &quot;systemctl status ovirt-imageio-daemon.service&quot;<br>
&gt;&gt; and &quot;journalctl -xe&quot; for details.<br>
&gt;&gt;<br>
&gt;&gt; 2017-11-06 02:56:47,552-0500 DEBUG otopi.context<br>
&gt;&gt; context._executeMethod:143 method exception<br>
&gt;&gt; Traceback (most recent call last):<br>
&gt;&gt;   File &quot;/tmp/ovirt-R4R8gZhaQI/pythonlib/otopi/context.py&quot;, line 133, in<br>
&gt;&gt; _executeMethod<br>
&gt;&gt;     method[&#39;method&#39;]()<br>
&gt;&gt;   File<br>
&gt;&gt; &quot;/tmp/ovirt-R4R8gZhaQI/otopi-plugins/ovirt-host-deploy/vdsm/packages.py&quot;,<br>
&gt;&gt; line 179, in _start<br>
&gt;&gt;     self.services.state(&#39;ovirt-imageio-daemon&#39;, True)<br>
&gt;&gt;   File &quot;/tmp/ovirt-R4R8gZhaQI/otopi-plugins/otopi/services/systemd.py&quot;,<br>
&gt;&gt; line 141, in state<br>
&gt;&gt;     service=name,<br>
&gt;&gt; RuntimeError: Failed to start service &#39;ovirt-imageio-daemon&#39;<br>
&gt;&gt; 2017-11-06 02:56:47,553-0500 ERROR otopi.context<br>
&gt;&gt; context._executeMethod:152 Failed to execute stage &#39;Closing up&#39;: Failed to<br>
&gt;&gt; start service &#39;ovirt-imageio-daemon&#39;<br>
<br>
In /var/log/messages of the host [1], there is:<br>
<br>
Nov  6 02:56:47 lago-basic-suite-master-host-0 systemd: Starting oVirt<br>
ImageIO Daemon...<br>
Nov  6 02:56:47 lago-basic-suite-master-host-0 python: detected<br>
unhandled Python exception in &#39;/usr/bin/ovirt-imageio-daemon&#39;<br>
Nov  6 02:56:47 lago-basic-suite-master-host-0 python: can&#39;t<br>
communicate with ABRT daemon, is it running? [Errno 2] No such file or<br>
directory<br>
Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:<br>
Traceback (most recent call last):<br>
Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:<br>
File &quot;/usr/bin/ovirt-imageio-daemon&quot;, line 14, in &lt;module&gt;<br>
Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:<br>
server.main(sys.argv)<br>
Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:<br>
File &quot;/usr/lib/python2.7/site-packages/ovirt_imageio_daemon/server.py&quot;,<br>
line 57, in main<br>
Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:<br>
start(config)<br>
Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:<br>
File &quot;/usr/lib/python2.7/site-packages/ovirt_imageio_daemon/server.py&quot;,<br>
line 85, in start<br>
Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:<br>
WSGIRequestHandler)<br>
Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:<br>
File &quot;/usr/lib64/python2.7/SocketServer.py&quot;, line 419, in __init__<br>
Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:<br>
self.server_bind()<br>
Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:<br>
File &quot;/usr/lib64/python2.7/wsgiref/simple_server.py&quot;, line 48, in<br>
server_bind<br>
Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:<br>
HTTPServer.server_bind(self)<br>
Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:<br>
File &quot;/usr/lib64/python2.7/BaseHTTPServer.py&quot;, line 108, in<br>
server_bind<br>
Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:<br>
SocketServer.TCPServer.server_bind(self)<br>
Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:<br>
File &quot;/usr/lib64/python2.7/SocketServer.py&quot;, line 430, in server_bind<br>
Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:<br>
self.socket.bind(self.server_address)<br>
Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:<br>
File &quot;/usr/lib64/python2.7/socket.py&quot;, line 224, in meth<br>
Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:<br>
return getattr(self._sock,name)(*args)<br>
Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:<br>
socket.error: [Errno 98] Address already in use<br>
Nov  6 02:56:47 lago-basic-suite-master-host-0 systemd:<br>
ovirt-imageio-daemon.service: main process exited, code=exited,<br>
status=1/FAILURE<br>
<br>
ovirt-host-deploy stops the service and immediately tries to start it again:<br>
<br>
2017-11-06 02:56:47,203-0500 DEBUG<br>
otopi.plugins.otopi.services.systemd plugin.executeRaw:863<br>
execute-result: (&#39;/usr/bin/systemctl&#39;, &#39;stop&#39;,<br>
&#39;ovirt-imageio-daemon.service&#39;), rc=0<br>
...<br>
2017-11-06 02:56:47,550-0500 DEBUG<br>
otopi.plugins.otopi.services.systemd plugin.executeRaw:863<br>
execute-result: (&#39;/usr/bin/systemctl&#39;, &#39;start&#39;,<br>
&#39;ovirt-imageio-daemon.service&#39;), rc=1<br>
<br>
Also, imageio-daemon&#39;s log [2] looks a bit odd to me: it has 5<br>
&#39;Starting&#39; lines, but none of the other lines I would expect from<br>
reading its source, and which I do see in another run that<br>
finished successfully [3].<br>
<br>
Adding Idan, but not sure it&#39;s a bug in the daemon.<br>
<br>
[1] <a href="http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host-0/_var_log/" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host-0/_var_log/</a><br>
<br>
[2] <a href="http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host-0/_var_log/ovirt-imageio-daemon/daemon.log" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host-0/_var_log/ovirt-imageio-daemon/daemon.log</a><br>
<br>
[3] <a href="http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3628/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host-0/_var_log/ovirt-imageio-daemon/daemon.log" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3628/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host-0/_var_log/ovirt-imageio-daemon/daemon.log</a></blockquote><div><br></div><div>Looks like the daemon is already running on this host - maybe host deploy</div><div>is trying to start the service twice?</div><div><br></div><div>We have not changed the startup code in a couple of years, so this must be caused by a</div><div>change in another component.</div><div><br></div><div>This patch will make it easier to detect future issues by logging any startup error</div><div>to the daemon log:</div><div><a href="https://gerrit.ovirt.org/83670/">https://gerrit.ovirt.org/83670/</a><br></div><div><br></div><div>Nir</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
<br>
&gt;&gt;<br>
&gt;&gt; &lt;/error&gt;<br>
&gt;&gt;<br>
&gt;&gt;<br>
&gt;<br>
&gt;<br>
<br>
<br>
<br>
--<br>
Didi<br>
</blockquote></div></div>
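<div>For reference, a minimal sketch of the failure mode suggested above: binding a second socket to a port that still has a live listener fails with "Address already in use", while the same bind succeeds once the first listener is closed. This is only an illustration and not the daemon's code. The helper name try_bind and the use of an ephemeral loopback port are assumptions made for the example, and it is written for Python 3, whereas the traceback above comes from Python 2.7.</div>

```python
import errno
import socket


def try_bind(port, host="127.0.0.1"):
    """Hypothetical helper: return None if bind succeeds, else the errno."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
    except OSError as e:
        return e.errno
    finally:
        sock.close()
    return None


# Stand-in for an already running daemon: a listener on an ephemeral port.
first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
first.bind(("127.0.0.1", 0))
first.listen(1)
port = first.getsockname()[1]

# A second "start" while the first listener is alive cannot bind the port.
result_while_running = try_bind(port)

# After the first listener is closed, the same bind is expected to succeed.
first.close()
result_after_stop = try_bind(port)

print("while running:", result_while_running)  # errno.EADDRINUSE (98 on Linux)
print("after stop:", result_after_stop)
```

<div>If the hypothesis holds, checking on the host which process owns the daemon's port at the time of the failed start should show whether an earlier instance was still alive.</div>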