[ovirt-devel] [ OST Failure Report ] [ oVirt Master ] [ 06-11-2017 ] [ 002_bootstrap.verify_add_hosts ]

Dafna Ron dron at redhat.com
Tue Nov 7 09:54:05 UTC 2017


We had the same failure this morning:

Failed build:

http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3646/

All Logs:

http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3646/artifact/

engine log:

http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3646/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-engine/_var_log/ovirt-engine/host-deploy/ovirt-host-deploy-20171107030411-lago-basic-suite-master-host-0-5f90b210.log

host logs:

http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3646/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host-0/_var_log/


On 11/06/2017 08:26 PM, Nir Soffer wrote:
> On Mon, Nov 6, 2017 at 4:16 PM Yedidyah Bar David <didi at redhat.com
> <mailto:didi at redhat.com>> wrote:
>
>     On Mon, Nov 6, 2017 at 1:57 PM, Dafna Ron <dron at redhat.com
>     <mailto:dron at redhat.com>> wrote:
>     > adding Didi.
>     >
>     >
>     > On 11/06/2017 11:51 AM, Ala Hino wrote:
>     >
>     > Suspected patch (https://gerrit.ovirt.org/#/c/83612/) is about
>     cold merge
>     > and has nothing to do with host deploy.
>     >
>     > On Mon, Nov 6, 2017 at 1:39 PM, Dafna Ron <dron at redhat.com
>     <mailto:dron at redhat.com>> wrote:
>     >>
>     >> Hi,
>     >>
>     >> We failed test 002_bootstrap.verify_add_hosts
>     >>
>     >> I can see we only tried to install one of the hosts (host-0)
>     and failed.
>     >> The second host has no log, which means we did not try to deploy it.
>     >>
>     >> The error suggests that ovirt-imageio-daemon failed to
>     start. However,
>     >> there is another message that I think should be addressed about
>     conflicting
>     >> vdsm and libvirt configurations.
>     >>
>     >> Link to suspected patches: https://gerrit.ovirt.org/#/c/83612/
>     >>
>     >>
>     >> Link to Job:
>     >> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/
>     >>
>     >>
>     >> Link to all logs:
>     >>
>     http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/artifact/
>     >>
>     >>
>     >>
>     http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-engine/_var_log/ovirt-engine/host-deploy/ovirt-host-deploy-20171106025647-lago-basic-suite-master-host-0-5530ab1f.log
>     >>
>     >>
>     >> (Relevant) error snippet from the log:
>     >>
>     >> <error>
>     >>
>     >>
>     >> 2017-11-06 02:56:46,526-0500 DEBUG
>     >> otopi.plugins.ovirt_host_deploy.vdsm.packages plugin.execute:921
>     >> execute-output: ('/usr/bin/vdsm-tool', 'configure', '--force')
>     stdout:
>     >>
>     >> Checking configuration status...
>     >>
>     >> abrt is not configured for vdsm
>     >> WARNING: LVM local configuration: /etc/lvm/lvmlocal.conf is not
>     based on
>     >> vdsm configuration
>     >> lvm requires configuration
>     >> libvirt is not configured for vdsm yet
>     >> FAILED: conflicting vdsm and libvirt-qemu tls configuration.
>     >> vdsm.conf with ssl=True requires the following changes:
>     >> libvirtd.conf: listen_tcp=0, auth_tcp="sasl", listen_tls=1
>     >> qemu.conf: spice_tls=1.
>     >> multipath requires configuration
>     >>
>     >>
>     >> 2017-11-06 02:56:47,551-0500 DEBUG
>     otopi.plugins.otopi.services.systemd
>     >> plugin.execute:926 execute-output: ('/usr/bin/systemctl', 'start',
>     >> 'ovirt-imageio-daemon.service') stderr:
>     >> Job for ovirt-imageio-daemon.service failed because the control
>     process
>     >> exited with error code. See "systemctl status
>     ovirt-imageio-daemon.service"
>     >> and "journalctl -xe" for details.
>     >>
>     >> 2017-11-06 02:56:47,552-0500 DEBUG otopi.context
>     >> context._executeMethod:143 method exception
>     >> Traceback (most recent call last):
>     >>   File "/tmp/ovirt-R4R8gZhaQI/pythonlib/otopi/context.py", line
>     133, in
>     >> _executeMethod
>     >>     method['method']()
>     >>   File
>     >>
>     "/tmp/ovirt-R4R8gZhaQI/otopi-plugins/ovirt-host-deploy/vdsm/packages.py",
>     >> line 179, in _start
>     >>     self.services.state('ovirt-imageio-daemon', True)
>     >>   File
>     "/tmp/ovirt-R4R8gZhaQI/otopi-plugins/otopi/services/systemd.py",
>     >> line 141, in state
>     >>     service=name,
>     >> RuntimeError: Failed to start service 'ovirt-imageio-daemon'
>     >> 2017-11-06 02:56:47,553-0500 ERROR otopi.context
>     >> context._executeMethod:152 Failed to execute stage 'Closing
>     up': Failed to
>     >> start service 'ovirt-imageio-daemon'
>
>     In /var/log/messages of the host [1], there is:
>
>     Nov  6 02:56:47 lago-basic-suite-master-host-0 systemd: Starting oVirt
>     ImageIO Daemon...
>     Nov  6 02:56:47 lago-basic-suite-master-host-0 python: detected
>     unhandled Python exception in '/usr/bin/ovirt-imageio-daemon'
>     Nov  6 02:56:47 lago-basic-suite-master-host-0 python: can't
>     communicate with ABRT daemon, is it running? [Errno 2] No such file or
>     directory
>     Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:
>     Traceback (most recent call last):
>     Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:
>     File "/usr/bin/ovirt-imageio-daemon", line 14, in <module>
>     Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:
>     server.main(sys.argv)
>     Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:
>     File
>     "/usr/lib/python2.7/site-packages/ovirt_imageio_daemon/server.py",
>     line 57, in main
>     Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:
>     start(config)
>     Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:
>     File
>     "/usr/lib/python2.7/site-packages/ovirt_imageio_daemon/server.py",
>     line 85, in start
>     Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:
>     WSGIRequestHandler)
>     Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:
>     File "/usr/lib64/python2.7/SocketServer.py", line 419, in __init__
>     Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:
>     self.server_bind()
>     Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:
>     File "/usr/lib64/python2.7/wsgiref/simple_server.py", line 48, in
>     server_bind
>     Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:
>     HTTPServer.server_bind(self)
>     Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:
>     File "/usr/lib64/python2.7/BaseHTTPServer.py", line 108, in
>     server_bind
>     Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:
>     SocketServer.TCPServer.server_bind(self)
>     Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:
>     File "/usr/lib64/python2.7/SocketServer.py", line 430, in server_bind
>     Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:
>     self.socket.bind(self.server_address)
>     Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:
>     File "/usr/lib64/python2.7/socket.py", line 224, in meth
>     Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:
>     return getattr(self._sock,name)(*args)
>     Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:
>     socket.error: [Errno 98] Address already in use
>     Nov  6 02:56:47 lago-basic-suite-master-host-0 systemd:
>     ovirt-imageio-daemon.service: main process exited, code=exited,
>     status=1/FAILURE
>
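For anyone who wants to see the same error outside of OST: the traceback above ends in socket.bind() raising [Errno 98] (EADDRINUSE), which is what the kernel returns when something is already listening on the requested address. Below is a minimal sketch that reproduces the condition on Linux by binding two sockets to the same loopback port. It is written for Python 3 (the daemon itself runs on python2.7), and bind_twice is a hypothetical helper name, not something from ovirt-imageio.

```python
import errno
import socket


def bind_twice(host="127.0.0.1"):
    """Bind two TCP sockets to the same port.

    Returns the errno raised by the second bind, or None if it
    unexpectedly succeeded. Hypothetical helper for illustration only.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Port 0 asks the kernel for any free port, so the sketch
        # does not depend on a specific port being available.
        first.bind((host, 0))
        first.listen(1)
        port = first.getsockname()[1]
        try:
            # Same address while the first socket is still listening.
            second.bind((host, port))
        except OSError as e:
            return e.errno
        return None
    finally:
        second.close()
        first.close()


if __name__ == "__main__":
    print(bind_twice() == errno.EADDRINUSE)
```

This only demonstrates the error condition. It does not tell us which process was holding the daemon's port on host-0 at the time.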
>     ovirt-host-deploy stops it, and immediately tries to start it:
>
>     2017-11-06 02:56:47,203-0500 DEBUG
>     otopi.plugins.otopi.services.systemd plugin.executeRaw:863
>     execute-result: ('/usr/bin/systemctl', 'stop',
>     'ovirt-imageio-daemon.service'), rc=0
>     ...
>     2017-11-06 02:56:47,550-0500 DEBUG
>     otopi.plugins.otopi.services.systemd plugin.executeRaw:863
>     execute-result: ('/usr/bin/systemctl', 'start',
>     'ovirt-imageio-daemon.service'), rc=1
>
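To put a number on "immediately", the two otopi timestamps above can be subtracted. A minimal sketch in Python 3, assuming the timestamp layout seen in these log lines (date, time, comma, milliseconds, UTC offset); the function names are hypothetical. Note that these are the times at which otopi logged each result, so the gap is only an approximation of the time between the stop returning and the start failing.

```python
from datetime import datetime, timedelta


def parse_otopi_timestamp(stamp):
    """Parse a timestamp such as '2017-11-06 02:56:47,203-0500'."""
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f%z")


def gap_in_ms(earlier, later):
    """Whole milliseconds between two otopi timestamps."""
    delta = parse_otopi_timestamp(later) - parse_otopi_timestamp(earlier)
    return delta // timedelta(milliseconds=1)


if __name__ == "__main__":
    stop_logged = "2017-11-06 02:56:47,203-0500"   # 'stop' result, rc=0
    start_logged = "2017-11-06 02:56:47,550-0500"  # 'start' result, rc=1
    print(gap_in_ms(stop_logged, start_logged))
```

For the two lines quoted above this gives 347 ms between the logged stop result and the logged start failure.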
>     Also, imageio-daemon's log [2] looks a bit odd to me: it has 5
>     'Starting' lines, but none of the other lines I would expect from
>     reading its source, and which I can see in another run that did
>     finish successfully [3].
>
>     Adding Idan, but not sure it's a bug in the daemon.
>
>     [1]
>     http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host-0/_var_log/
>
>     [2]
>     http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host-0/_var_log/ovirt-imageio-daemon/daemon.log
>
>     [3]
>     http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3628/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host-0/_var_log/ovirt-imageio-daemon/daemon.log
>
>
> Looks like the daemon is already running on this host - maybe host deploy
> is trying to start the service twice?
>
> We have not changed the startup code in a couple of years, so this must
> be caused by some change in another component.
>
> This patch will make it easier to detect future issues, logging any error
> to the daemon log during startup:
> https://gerrit.ovirt.org/83670/
>
> Nir
>  
>
>
>
>     >>
>     >> </error>
>     >>
>     >>
>     >
>     >
>
>
>
>     --
>     Didi
>
