<div dir="ltr">This run still uses the older daemon; the patch improving logging was merged today at 13:02.<div>Please check again with the current version.</div></div><br><div class="gmail_quote"><div dir="ltr">On Tue, Nov 7, 2017 at 11:54 AM Dafna Ron &lt;<a href="mailto:dron@redhat.com">dron@redhat.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div text="#000000" bgcolor="#FFFFFF">
    <div class="m_1524040804961939071moz-cite-prefix">We had the same failure this morning:<br>
      <br>
      Failed build:<br>
      <br>
<a class="m_1524040804961939071moz-txt-link-freetext" href="http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3646/" target="_blank">http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3646/</a><br>
      <br>
      All Logs: <br>
      <br>
<a class="m_1524040804961939071moz-txt-link-freetext" href="http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3646/artifact/" target="_blank">http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3646/artifact/</a><br>
      <br>
      engine log: <br>
      <br>
<a class="m_1524040804961939071moz-txt-link-freetext" href="http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3646/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-engine/_var_log/ovirt-engine/host-deploy/ovirt-host-deploy-20171107030411-lago-basic-suite-master-host-0-5f90b210.log" target="_blank">http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3646/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-engine/_var_log/ovirt-engine/host-deploy/ovirt-host-deploy-20171107030411-lago-basic-suite-master-host-0-5f90b210.log</a><br>
      <br>
      host logs: <br>
      <br>
<a class="m_1524040804961939071moz-txt-link-freetext" href="http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3646/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host-0/_var_log/" target="_blank">http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3646/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host-0/_var_log/</a></div></div><div text="#000000" bgcolor="#FFFFFF"><div class="m_1524040804961939071moz-cite-prefix"><br>
      <br>
      <br>
      On 11/06/2017 08:26 PM, Nir Soffer wrote:<br>
    </div></div><div text="#000000" bgcolor="#FFFFFF">
    <blockquote type="cite">
      <div dir="ltr">
        <div class="gmail_quote">
          <div dir="ltr">On Mon, Nov 6, 2017 at 4:16 PM Yedidyah Bar
            David &lt;<a href="mailto:didi@redhat.com" target="_blank">didi@redhat.com</a>&gt; wrote:<br>
          </div>
          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On Mon,
            Nov 6, 2017 at 1:57 PM, Dafna Ron &lt;<a href="mailto:dron@redhat.com" target="_blank">dron@redhat.com</a>&gt; wrote:<br>
            &gt; adding Didi.<br>
            &gt;<br>
            &gt;<br>
            &gt; On 11/06/2017 11:51 AM, Ala Hino wrote:<br>
            &gt;<br>
            &gt; Suspected patch (<a href="https://gerrit.ovirt.org/#/c/83612/" rel="noreferrer" target="_blank">https://gerrit.ovirt.org/#/c/83612/</a>)
            is about cold merge<br>
            &gt; and has nothing to do with host deploy.<br>
            &gt;<br>
            &gt; On Mon, Nov 6, 2017 at 1:39 PM, Dafna Ron &lt;<a href="mailto:dron@redhat.com" target="_blank">dron@redhat.com</a>&gt; wrote:<br>
            &gt;&gt;<br>
            &gt;&gt; Hi,<br>
            &gt;&gt;<br>
            &gt;&gt; We failed test 002_bootstrap.verify_add_hosts<br>
            &gt;&gt;<br>
            &gt;&gt; I can see we only tried to install one of the hosts
            (host-0) and failed.<br>
            &gt;&gt; The second host has no log, which means we did not
            try to deploy it.<br>
            &gt;&gt;<br>
            &gt;&gt; The error suggests that ovirt-imageio-daemon
            failed to start. However,<br>
            &gt;&gt; there is another message that I think should be
            addressed about conflicting<br>
            &gt;&gt; vdsm and libvirt configurations.<br>
            &gt;&gt;<br>
            &gt;&gt; Link to suspected patches: <a href="https://gerrit.ovirt.org/#/c/83612/" rel="noreferrer" target="_blank">https://gerrit.ovirt.org/#/c/83612/</a><br>
            &gt;&gt;<br>
            &gt;&gt;<br>
            &gt;&gt; Link to Job:<br>
            &gt;&gt; <a href="http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/</a><br>
            &gt;&gt;<br>
            &gt;&gt;<br>
            &gt;&gt; Link to all logs:<br>
            &gt;&gt; <a href="http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/artifact/" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/artifact/</a><br>
            &gt;&gt;<br>
            &gt;&gt;<br>
            &gt;&gt; <a href="http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-engine/_var_log/ovirt-engine/host-deploy/ovirt-host-deploy-20171106025647-lago-basic-suite-master-host-0-5530ab1f.log" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-engine/_var_log/ovirt-engine/host-deploy/ovirt-host-deploy-20171106025647-lago-basic-suite-master-host-0-5530ab1f.log</a><br>
            &gt;&gt;<br>
            &gt;&gt;<br>
            &gt;&gt; (Relevant) error snippet from the log:<br>
            &gt;&gt;<br>
            &gt;&gt; &lt;error&gt;<br>
            &gt;&gt;<br>
            &gt;&gt; \<br>
            &gt;&gt;<br>
            &gt;&gt; 2017-11-06 02:56:46,526-0500 DEBUG<br>
            &gt;&gt; otopi.plugins.ovirt_host_deploy.vdsm.packages
            plugin.execute:921<br>
            &gt;&gt; execute-output: (&#39;/usr/bin/vdsm-tool&#39;, &#39;configure&#39;,
            &#39;--force&#39;) stdout:<br>
            &gt;&gt;<br>
            &gt;&gt; Checking configuration status...<br>
            &gt;&gt;<br>
            &gt;&gt; abrt is not configured for vdsm<br>
            &gt;&gt; WARNING: LVM local configuration:
            /etc/lvm/lvmlocal.conf is not based on<br>
            &gt;&gt; vdsm configuration<br>
            &gt;&gt; lvm requires configuration<br>
            &gt;&gt; libvirt is not configured for vdsm yet<br>
            &gt;&gt; FAILED: conflicting vdsm and libvirt-qemu tls
            configuration.<br>
            &gt;&gt; vdsm.conf with ssl=True requires the following
            changes:<br>
            &gt;&gt; libvirtd.conf: listen_tcp=0, auth_tcp=&quot;sasl&quot;,
            listen_tls=1<br>
            &gt;&gt; qemu.conf: spice_tls=1.<br>
            &gt;&gt; multipath requires configuration<br>
            &gt;&gt;<br>
            &gt;&gt;<br>
            &gt;&gt; 2017-11-06 02:56:47,551-0500 DEBUG
            otopi.plugins.otopi.services.systemd<br>
            &gt;&gt; plugin.execute:926 execute-output:
            (&#39;/usr/bin/systemctl&#39;, &#39;start&#39;,<br>
            &gt;&gt; &#39;ovirt-imageio-daemon.service&#39;) stderr:<br>
            &gt;&gt; Job for ovirt-imageio-daemon.service failed because
            the control process<br>
            &gt;&gt; exited with error code. See &quot;systemctl status
            ovirt-imageio-daemon.service&quot;<br>
            &gt;&gt; and &quot;journalctl -xe&quot; for details.<br>
            &gt;&gt;<br>
            &gt;&gt; 2017-11-06 02:56:47,552-0500 DEBUG otopi.context<br>
            &gt;&gt; context._executeMethod:143 method exception<br>
            &gt;&gt; Traceback (most recent call last):<br>
            &gt;&gt;   File
            &quot;/tmp/ovirt-R4R8gZhaQI/pythonlib/otopi/context.py&quot;, line
            133, in<br>
            &gt;&gt; _executeMethod<br>
            &gt;&gt;     method[&#39;method&#39;]()<br>
            &gt;&gt;   File<br>
            &gt;&gt;
&quot;/tmp/ovirt-R4R8gZhaQI/otopi-plugins/ovirt-host-deploy/vdsm/packages.py&quot;,<br>
            &gt;&gt; line 179, in _start<br>
            &gt;&gt;     self.services.state(&#39;ovirt-imageio-daemon&#39;,
            True)<br>
            &gt;&gt;   File
            &quot;/tmp/ovirt-R4R8gZhaQI/otopi-plugins/otopi/services/systemd.py&quot;,<br>
            &gt;&gt; line 141, in state<br>
            &gt;&gt;     service=name,<br>
            &gt;&gt; RuntimeError: Failed to start service
            &#39;ovirt-imageio-daemon&#39;<br>
            &gt;&gt; 2017-11-06 02:56:47,553-0500 ERROR otopi.context<br>
            &gt;&gt; context._executeMethod:152 Failed to execute stage
            &#39;Closing up&#39;: Failed to<br>
            &gt;&gt; start service &#39;ovirt-imageio-daemon&#39;<br>
            <br>
            In /var/log/messages of the host [1], there is:<br>
            <br>
            Nov  6 02:56:47 lago-basic-suite-master-host-0 systemd:
            Starting oVirt<br>
            ImageIO Daemon...<br>
            Nov  6 02:56:47 lago-basic-suite-master-host-0 python:
            detected<br>
            unhandled Python exception in
            &#39;/usr/bin/ovirt-imageio-daemon&#39;<br>
            Nov  6 02:56:47 lago-basic-suite-master-host-0 python: can&#39;t<br>
            communicate with ABRT daemon, is it running? [Errno 2] No
            such file or<br>
            directory<br>
            Nov  6 02:56:47 lago-basic-suite-master-host-0
            ovirt-imageio-daemon:<br>
            Traceback (most recent call last):<br>
            Nov  6 02:56:47 lago-basic-suite-master-host-0
            ovirt-imageio-daemon:<br>
            File &quot;/usr/bin/ovirt-imageio-daemon&quot;, line 14, in
            &lt;module&gt;<br>
            Nov  6 02:56:47 lago-basic-suite-master-host-0
            ovirt-imageio-daemon:<br>
            server.main(sys.argv)<br>
            Nov  6 02:56:47 lago-basic-suite-master-host-0
            ovirt-imageio-daemon:<br>
            File
            &quot;/usr/lib/python2.7/site-packages/ovirt_imageio_daemon/server.py&quot;,<br>
            line 57, in main<br>
            Nov  6 02:56:47 lago-basic-suite-master-host-0
            ovirt-imageio-daemon:<br>
            start(config)<br>
            Nov  6 02:56:47 lago-basic-suite-master-host-0
            ovirt-imageio-daemon:<br>
            File
            &quot;/usr/lib/python2.7/site-packages/ovirt_imageio_daemon/server.py&quot;,<br>
            line 85, in start<br>
            Nov  6 02:56:47 lago-basic-suite-master-host-0
            ovirt-imageio-daemon:<br>
            WSGIRequestHandler)<br>
            Nov  6 02:56:47 lago-basic-suite-master-host-0
            ovirt-imageio-daemon:<br>
            File &quot;/usr/lib64/python2.7/SocketServer.py&quot;, line 419, in
            __init__<br>
            Nov  6 02:56:47 lago-basic-suite-master-host-0
            ovirt-imageio-daemon:<br>
            self.server_bind()<br>
            Nov  6 02:56:47 lago-basic-suite-master-host-0
            ovirt-imageio-daemon:<br>
            File &quot;/usr/lib64/python2.7/wsgiref/simple_server.py&quot;, line
            48, in<br>
            server_bind<br>
            Nov  6 02:56:47 lago-basic-suite-master-host-0
            ovirt-imageio-daemon:<br>
            HTTPServer.server_bind(self)<br>
            Nov  6 02:56:47 lago-basic-suite-master-host-0
            ovirt-imageio-daemon:<br>
            File &quot;/usr/lib64/python2.7/BaseHTTPServer.py&quot;, line 108, in<br>
            server_bind<br>
            Nov  6 02:56:47 lago-basic-suite-master-host-0
            ovirt-imageio-daemon:<br>
            SocketServer.TCPServer.server_bind(self)<br>
            Nov  6 02:56:47 lago-basic-suite-master-host-0
            ovirt-imageio-daemon:<br>
            File &quot;/usr/lib64/python2.7/SocketServer.py&quot;, line 430, in
            server_bind<br>
            Nov  6 02:56:47 lago-basic-suite-master-host-0
            ovirt-imageio-daemon:<br>
            self.socket.bind(self.server_address)<br>
            Nov  6 02:56:47 lago-basic-suite-master-host-0
            ovirt-imageio-daemon:<br>
            File &quot;/usr/lib64/python2.7/socket.py&quot;, line 224, in meth<br>
            Nov  6 02:56:47 lago-basic-suite-master-host-0
            ovirt-imageio-daemon:<br>
            return getattr(self._sock,name)(*args)<br>
            Nov  6 02:56:47 lago-basic-suite-master-host-0
            ovirt-imageio-daemon:<br>
            socket.error: [Errno 98] Address already in use<br>
            Nov  6 02:56:47 lago-basic-suite-master-host-0 systemd:<br>
            ovirt-imageio-daemon.service: main process exited,
            code=exited,<br>
            status=1/FAILURE<br>
            <br>
            ovirt-host-deploy stops it, and immediately tries to start
            it:<br>
            <br>
            2017-11-06 02:56:47,203-0500 DEBUG<br>
            otopi.plugins.otopi.services.systemd plugin.executeRaw:863<br>
            execute-result: (&#39;/usr/bin/systemctl&#39;, &#39;stop&#39;,<br>
            &#39;ovirt-imageio-daemon.service&#39;), rc=0<br>
            ...<br>
            2017-11-06 02:56:47,550-0500 DEBUG<br>
            otopi.plugins.otopi.services.systemd plugin.executeRaw:863<br>
            execute-result: (&#39;/usr/bin/systemctl&#39;, &#39;start&#39;,<br>
            &#39;ovirt-imageio-daemon.service&#39;), rc=1<br>
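            <div>One way to test whether the old listener is still holding the port between the stop and the start is to poll the address before starting the service. The sketch below is only an illustration for Python 3: the helper names are hypothetical, it is not part of ovirt-host-deploy or the daemon, and the address and port the daemon binds are not shown in this thread, so they are parameters here.</div>
<pre>
```python
import socket
import time


def can_bind(host, port):
    """Return True if a TCP socket can currently bind (host, port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
    except OSError:
        # Typically EADDRINUSE: something else still owns the address.
        return False
    else:
        return True
    finally:
        sock.close()


def wait_until_free(host, port, attempts=50, interval=0.1):
    """Poll until (host, port) can be bound; give up after `attempts` tries."""
    for _ in range(attempts):
        if can_bind(host, port):
            return True
        time.sleep(interval)
    return False
```
</pre>
            <div>If wait_until_free() returns False, some process is still bound to the address, which would be consistent with the "Address already in use" error above but does not by itself identify that process.</div>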
            <br>
            Also, imageio-daemon&#39;s log [2] looks a bit odd to me: it
            has 5<br>
            &#39;Starting&#39; lines, but none of the other lines I would<br>
            expect from reading its
            source, and which<br>
            I do see in another<br>
            run that finished successfully [3].<br>
            <br>
            Adding Idan, but I am not sure it is a bug in the daemon.<br>
            <br>
            [1] <a href="http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host-0/_var_log/" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host-0/_var_log/</a><br>
            <br>
            [2] <a href="http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host-0/_var_log/ovirt-imageio-daemon/daemon.log" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host-0/_var_log/ovirt-imageio-daemon/daemon.log</a><br>
            <br>
            [3] <a href="http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3628/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host-0/_var_log/ovirt-imageio-daemon/daemon.log" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3628/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host-0/_var_log/ovirt-imageio-daemon/daemon.log</a></blockquote>
          <div><br>
          </div>
          <div>Looks like the daemon is already running on this host -
            maybe host deploy</div>
          <div>is trying to start the service twice?</div>
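          <div>For reference, the same error is easy to reproduce outside the daemon. This is a minimal sketch for Python 3 (the daemon in the log runs on Python 2.7, where the exception is socket.error); the function names are made up for the example, and it only assumes that a second listener binds an address another listener already holds:</div>
<pre>
```python
import errno
import socket


def bind_listener(port=0):
    """Bind a TCP listener on localhost and return the socket."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind(("127.0.0.1", port))
        sock.listen(1)
    except Exception:
        sock.close()
        raise
    return sock


def second_bind_errno():
    """Bind the same port twice; return the errno of the second attempt."""
    first = bind_listener()  # port 0: the kernel picks a free port
    port = first.getsockname()[1]
    try:
        second = bind_listener(port)
    except OSError as e:  # socket.error is an alias of OSError in Python 3
        return e.errno
    else:
        second.close()
        return None
    finally:
        first.close()
```
</pre>
          <div>On Linux, second_bind_errno() should return errno.EADDRINUSE, which is the Errno 98 shown in the traceback.</div>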
          <div><br>
          </div>
          <div>We have not changed the startup code in a couple of years, so
            this must be caused by a</div>
          <div>change in another component.</div>
          <div><br>
          </div>
          <div>This patch will make it easier to detect future issues,
            logging any error</div>
          <div>to the daemon log during startup:</div>
          <div><a href="https://gerrit.ovirt.org/83670/" target="_blank">https://gerrit.ovirt.org/83670/</a><br>
          </div>
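          <div>The general idea, logging a startup failure with its traceback before exiting, can be sketched as below. This is not the content of the linked patch; run_logged and its arguments are hypothetical names used only for illustration.</div>
<pre>
```python
import logging


def run_logged(start, log):
    """Call start(); log any exception with its traceback, then re-raise."""
    log.info("Starting")
    try:
        start()
    except Exception:
        # Logger.exception() logs at ERROR level and appends the traceback,
        # so the failure ends up in the daemon log and not only in syslog.
        log.exception("Startup failed")
        raise
    log.info("Started")
```
</pre>
          <div>With a wrapper like this, a bind failure would leave a "Startup failed" record and the traceback in the daemon log after the "Starting" line.</div>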
          <div><br>
          </div>
          <div>Nir</div>
          <div> </div>
          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
            <br>
            &gt;&gt;<br>
            &gt;&gt; &lt;/error&gt;<br>
            &gt;&gt;<br>
            &gt;&gt;<br>
            &gt;<br>
            &gt;<br>
            <br>
            <br>
            <br>
            --<br>
            Didi<br>
          </blockquote>
        </div>
      </div>
    </blockquote>
    <p><br>
    </p>
  </div></blockquote></div>