<div dir="ltr">This build still used the older daemon; the patch improving logging was merged today at 13:02.<div>Please check again with the current version.</div></div><br><div class="gmail_quote"><div dir="ltr">On Tue, Nov 7, 2017 at 11:54 AM Dafna Ron <<a href="mailto:dron@redhat.com">dron@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
<div class="m_1524040804961939071moz-cite-prefix">We had the same failure this morning:<br>
<br>
Failed build:<br>
<br>
<a class="m_1524040804961939071moz-txt-link-freetext" href="http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3646/" target="_blank">http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3646/</a><br>
<br>
All logs:<br>
<br>
<a class="m_1524040804961939071moz-txt-link-freetext" href="http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3646/artifact/" target="_blank">http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3646/artifact/</a><br>
<br>
Host-deploy log (on the engine):<br>
<br>
<a class="m_1524040804961939071moz-txt-link-freetext" href="http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3646/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-engine/_var_log/ovirt-engine/host-deploy/ovirt-host-deploy-20171107030411-lago-basic-suite-master-host-0-5f90b210.log" target="_blank">http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3646/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-engine/_var_log/ovirt-engine/host-deploy/ovirt-host-deploy-20171107030411-lago-basic-suite-master-host-0-5f90b210.log</a><br>
<br>
Host logs:<br>
<br>
<a class="m_1524040804961939071moz-txt-link-freetext" href="http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3646/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host-0/_var_log/" target="_blank">http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3646/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host-0/_var_log/</a></div></div><div text="#000000" bgcolor="#FFFFFF"><div class="m_1524040804961939071moz-cite-prefix"><br>
<br>
<br>
On 11/06/2017 08:26 PM, Nir Soffer wrote:<br>
</div></div><div text="#000000" bgcolor="#FFFFFF">
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<div dir="ltr">On Mon, Nov 6, 2017 at 4:16 PM Yedidyah Bar
David <<a href="mailto:didi@redhat.com" target="_blank">didi@redhat.com</a>> wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On Mon,
Nov 6, 2017 at 1:57 PM, Dafna Ron <<a href="mailto:dron@redhat.com" target="_blank">dron@redhat.com</a>> wrote:<br>
> adding Didi.<br>
><br>
><br>
> On 11/06/2017 11:51 AM, Ala Hino wrote:<br>
><br>
> Suspected patch (<a href="https://gerrit.ovirt.org/#/c/83612/" rel="noreferrer" target="_blank">https://gerrit.ovirt.org/#/c/83612/</a>)
is about cold merge<br>
> and has nothing to do with host deploy.<br>
><br>
> On Mon, Nov 6, 2017 at 1:39 PM, Dafna Ron <<a href="mailto:dron@redhat.com" target="_blank">dron@redhat.com</a>> wrote:<br>
>><br>
>> Hi,<br>
>><br>
>> Test 002_bootstrap.verify_add_hosts failed.<br>
>><br>
>> I can see that we only tried to install one of the hosts (host-0), and that installation failed.<br>
>> The second host has no log, which means we did not try to deploy it.<br>
>><br>
>> The error suggests that ovirt-imageio-daemon failed to start. However,<br>
>> there is another message, about conflicting vdsm and libvirt configurations,<br>
>> that I think should be addressed.<br>
>><br>
>> Link to suspected patch: <a href="https://gerrit.ovirt.org/#/c/83612/" rel="noreferrer" target="_blank">https://gerrit.ovirt.org/#/c/83612/</a><br>
>><br>
>><br>
>> Link to Job:<br>
>> <a href="http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/</a><br>
>><br>
>><br>
>> Link to all logs:<br>
>> <a href="http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/artifact/" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/artifact/</a><br>
>><br>
>><br>
>> <a href="http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-engine/_var_log/ovirt-engine/host-deploy/ovirt-host-deploy-20171106025647-lago-basic-suite-master-host-0-5530ab1f.log" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-engine/_var_log/ovirt-engine/host-deploy/ovirt-host-deploy-20171106025647-lago-basic-suite-master-host-0-5530ab1f.log</a><br>
>><br>
>><br>
>> (Relevant) error snippet from the log:<br>
>><br>
>> <error><br>
>><br>
>><br>
>> 2017-11-06 02:56:46,526-0500 DEBUG otopi.plugins.ovirt_host_deploy.vdsm.packages plugin.execute:921 execute-output: ('/usr/bin/vdsm-tool', 'configure', '--force') stdout:<br>
>><br>
>> Checking configuration status...<br>
>><br>
>> abrt is not configured for vdsm<br>
>> WARNING: LVM local configuration: /etc/lvm/lvmlocal.conf is not based on vdsm configuration<br>
>> lvm requires configuration<br>
>> libvirt is not configured for vdsm yet<br>
>> FAILED: conflicting vdsm and libvirt-qemu tls
configuration.<br>
>> vdsm.conf with ssl=True requires the following
changes:<br>
>> libvirtd.conf: listen_tcp=0, auth_tcp="sasl",
listen_tls=1<br>
>> qemu.conf: spice_tls=1.<br>
>> multipath requires configuration<br>
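The vdsm-tool output above lists the libvirtd.conf and qemu.conf settings that vdsm expects when vdsm.conf has ssl=True. As a rough illustration only (this is not how vdsm-tool implements its check; `vdsm-tool configure` is what actually applies the settings), here is a Python sketch that compares a config file's flat `key = value` lines against those expected values. `REQUIRED`, `parse_conf` and `find_conflicts` are hypothetical names, and the parser is deliberately naive.

```python
import re

# Settings listed in the vdsm-tool output above for vdsm.conf with ssl=True.
REQUIRED = {
    "libvirtd.conf": {"listen_tcp": "0", "auth_tcp": '"sasl"', "listen_tls": "1"},
    "qemu.conf": {"spice_tls": "1"},
}

# Simple `key = value` assignment; values are kept as raw strings.
_ASSIGNMENT = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(.+?)\s*$")


def parse_conf(text):
    """Parse the flat `key = value` subset of a libvirt-style config file.
    Naive: everything after a '#' is dropped, even inside quotes."""
    values = {}
    for line in text.splitlines():
        match = _ASSIGNMENT.match(line.split("#", 1)[0])
        if match:
            values[match.group(1)] = match.group(2)
    return values


def find_conflicts(name, text):
    """Return (key, expected, actual) for each required setting that differs.
    A missing key counts as a conflict; libvirt's built-in defaults are ignored."""
    actual = parse_conf(text)
    return [
        (key, expected, actual.get(key))
        for key, expected in REQUIRED[name].items()
        if actual.get(key) != expected
    ]


sample = 'listen_tcp = 1\nauth_tcp = "sasl"\n# listen_tls = 1\n'
print(find_conflicts("libvirtd.conf", sample))
# [('listen_tcp', '0', '1'), ('listen_tls', '1', None)]
```

Treating a commented-out or missing key as a conflict keeps the sketch simple; a real check would also need to account for libvirt's defaults.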
>><br>
>><br>
>> 2017-11-06 02:56:47,551-0500 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:926 execute-output: ('/usr/bin/systemctl', 'start', 'ovirt-imageio-daemon.service') stderr:<br>
>> Job for ovirt-imageio-daemon.service failed because the control process exited with error code. See "systemctl status ovirt-imageio-daemon.service" and "journalctl -xe" for details.<br>
>><br>
>> 2017-11-06 02:56:47,552-0500 DEBUG otopi.context context._executeMethod:143 method exception<br>
>> Traceback (most recent call last):<br>
>> File "/tmp/ovirt-R4R8gZhaQI/pythonlib/otopi/context.py", line 133, in _executeMethod<br>
>> method['method']()<br>
>> File "/tmp/ovirt-R4R8gZhaQI/otopi-plugins/ovirt-host-deploy/vdsm/packages.py", line 179, in _start<br>
>> self.services.state('ovirt-imageio-daemon', True)<br>
>> File "/tmp/ovirt-R4R8gZhaQI/otopi-plugins/otopi/services/systemd.py", line 141, in state<br>
>> service=name,<br>
>> RuntimeError: Failed to start service 'ovirt-imageio-daemon'<br>
>> 2017-11-06 02:56:47,553-0500 ERROR otopi.context context._executeMethod:152 Failed to execute stage 'Closing up': Failed to start service 'ovirt-imageio-daemon'<br>
<br>
In /var/log/messages of the host [1], there is:<br>
<br>
Nov 6 02:56:47 lago-basic-suite-master-host-0 systemd: Starting oVirt ImageIO Daemon...<br>
Nov 6 02:56:47 lago-basic-suite-master-host-0 python: detected unhandled Python exception in '/usr/bin/ovirt-imageio-daemon'<br>
Nov 6 02:56:47 lago-basic-suite-master-host-0 python: can't communicate with ABRT daemon, is it running? [Errno 2] No such file or directory<br>
Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: Traceback (most recent call last):<br>
Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: File "/usr/bin/ovirt-imageio-daemon", line 14, in <module><br>
Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: server.main(sys.argv)<br>
Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: File "/usr/lib/python2.7/site-packages/ovirt_imageio_daemon/server.py", line 57, in main<br>
Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: start(config)<br>
Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: File "/usr/lib/python2.7/site-packages/ovirt_imageio_daemon/server.py", line 85, in start<br>
Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: WSGIRequestHandler)<br>
Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: File "/usr/lib64/python2.7/SocketServer.py", line 419, in __init__<br>
Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: self.server_bind()<br>
Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: File "/usr/lib64/python2.7/wsgiref/simple_server.py", line 48, in server_bind<br>
Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: HTTPServer.server_bind(self)<br>
Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: File "/usr/lib64/python2.7/BaseHTTPServer.py", line 108, in server_bind<br>
Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: SocketServer.TCPServer.server_bind(self)<br>
Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: File "/usr/lib64/python2.7/SocketServer.py", line 430, in server_bind<br>
Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: self.socket.bind(self.server_address)<br>
Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: File "/usr/lib64/python2.7/socket.py", line 224, in meth<br>
Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: return getattr(self._sock,name)(*args)<br>
Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: socket.error: [Errno 98] Address already in use<br>
Nov 6 02:56:47 lago-basic-suite-master-host-0 systemd: ovirt-imageio-daemon.service: main process exited, code=exited, status=1/FAILURE<br>
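The final `socket.error: [Errno 98] Address already in use` is EADDRINUSE: the kernel refused to bind the daemon's listening address because another socket already holds it. Python 2.7's `BaseHTTPServer.HTTPServer` sets `allow_reuse_address`, so a port left only in TIME_WAIT would normally not cause this; on Linux, an active listener still does. A minimal Python 3 sketch that reproduces the same errno with two local sockets (it does not use the daemon's code or its real port; `second_bind_errno` is a hypothetical helper):

```python
import errno
import socket


def second_bind_errno(host="127.0.0.1"):
    """Listen on a kernel-chosen port, then try to bind a second socket to the
    same address and port. Returns the second bind's errno, or None if it
    unexpectedly succeeded."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Mirror HTTPServer.allow_reuse_address (SO_REUSEADDR). On Linux this
    # still does not allow binding a port another socket is listening on.
    second.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    try:
        first.bind((host, 0))  # port 0: let the kernel pick a free port
        first.listen(1)
        port = first.getsockname()[1]
        try:
            second.bind((host, port))
        except OSError as e:  # socket.error is an alias of OSError in Python 3
            return e.errno
        return None
    finally:
        second.close()
        first.close()


err = second_bind_errno()
print(err, errno.errorcode.get(err))  # on Linux: 98 EADDRINUSE
```

The errno value is platform specific (98 on Linux), so comparing against `errno.EADDRINUSE` is more portable than checking for 98.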
<br>
ovirt-host-deploy stops it and then immediately tries to start it again:<br>
<br>
2017-11-06 02:56:47,203-0500 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:863 execute-result: ('/usr/bin/systemctl', 'stop', 'ovirt-imageio-daemon.service'), rc=0<br>
...<br>
2017-11-06 02:56:47,550-0500 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:863 execute-result: ('/usr/bin/systemctl', 'start', 'ovirt-imageio-daemon.service'), rc=1<br>
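The stop returned rc=0, yet the start about 350 ms later failed with "Address already in use", which suggests something was still listening on the daemon's port. A hypothetical diagnostic sketch (not something ovirt-host-deploy does) is to probe the port between the two steps; `wait_for_port_free` and its parameters are illustrative, and the daemon's real listening address and port are not shown in this thread, so `host` and `port` are assumptions the caller must supply.

```python
import socket
import time


def wait_for_port_free(port, host="127.0.0.1", timeout=10.0, interval=0.2):
    """Poll until a probe socket can bind (host, port), or until `timeout`
    seconds have passed. Returns True if the port became bindable."""
    deadline = time.monotonic() + timeout
    while True:
        probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # Same option the Python 2.7 HTTPServer uses, so connections lingering
        # in TIME_WAIT do not make the port look busy; an active listener does.
        probe.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            probe.bind((host, port))
            return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
        finally:
            probe.close()
```

Called after `systemctl stop`, a False result would point at a process still holding the port (which `ss -ltnp` can show) rather than at the daemon's own startup code.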
<br>
Also, the imageio daemon's log [2] looks a bit odd to me: it has five 'Starting' lines, but none of the other lines that I would expect from reading its source, and that I can see in another run that did finish successfully [3].<br>
<br>
Adding Idan, though I'm not sure it's a bug in the daemon.<br>
<br>
[1] <a href="http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host-0/_var_log/" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host-0/_var_log/</a><br>
<br>
[2] <a href="http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host-0/_var_log/ovirt-imageio-daemon/daemon.log" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host-0/_var_log/ovirt-imageio-daemon/daemon.log</a><br>
<br>
[3] <a href="http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3628/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host-0/_var_log/ovirt-imageio-daemon/daemon.log" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3628/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host-0/_var_log/ovirt-imageio-daemon/daemon.log</a></blockquote>
<div><br>
</div>
<div>It looks like the daemon is already running on this host; maybe host deploy</div>
<div>is trying to start the service twice?</div>
<div><br>
</div>
<div>We have not changed the startup code in a couple of years, so this must be some</div>
<div>change in another component.</div>
<div><br>
</div>
<div>This patch will make future issues easier to detect by logging any error</div>
<div>to the daemon log during startup:</div>
<div><a href="https://gerrit.ovirt.org/83670/" target="_blank">https://gerrit.ovirt.org/83670/</a><br>
</div>
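The patch above is the authoritative change. The following is only a minimal sketch of the general idea, assuming the daemon logs through Python's standard `logging` module: catch any exception raised during startup, log it with its traceback to the daemon's logger, and return a non-zero exit code. `start`, `main` and the logger name are hypothetical stand-ins, not the code in the patch.

```python
import logging

log = logging.getLogger("daemon")  # hypothetical logger name


def start(config):
    """Hypothetical stand-in for the daemon's startup code; here it fails the
    same way the daemon did in this run."""
    raise OSError(98, "Address already in use")


def main(args):
    config = {}  # hypothetical: the real daemon would load its config here
    try:
        start(config)
    except Exception:
        # Logger.exception() logs at ERROR level and includes the traceback,
        # so the failure ends up in the daemon log, not only in the journal.
        log.exception("Server failed to start")
        return 1
    return 0
```

An entry point would then exit with the returned code, for example `sys.exit(main(sys.argv))`, so systemd still sees the failure while the daemon log records why.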
<div><br>
</div>
<div>Nir</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
<br>
>><br>
>> </error><br>
>><br>
>><br>
><br>
><br>
<br>
<br>
<br>
--<br>
Didi<br>
_______________________________________________<br>
Devel mailing list<br>
<a href="mailto:Devel@ovirt.org" target="_blank">Devel@ovirt.org</a><br>
<a href="http://lists.ovirt.org/mailman/listinfo/devel" rel="noreferrer" target="_blank">http://lists.ovirt.org/mailman/listinfo/devel</a><br>
</blockquote>
</div>
</div>
</blockquote>
<p><br>
</p>
</div></blockquote></div>