[ovirt-devel] [ OST Failure Report ] [ oVirt Master ] [ 06-11-2017 ] [ 002_bootstrap.verify_add_hosts ]

Nir Soffer nsoffer at redhat.com
Mon Nov 6 20:26:10 UTC 2017


On Mon, Nov 6, 2017 at 4:16 PM Yedidyah Bar David <didi at redhat.com> wrote:

> On Mon, Nov 6, 2017 at 1:57 PM, Dafna Ron <dron at redhat.com> wrote:
> > adding Didi.
> >
> >
> > On 11/06/2017 11:51 AM, Ala Hino wrote:
> >
> > Suspected patch (https://gerrit.ovirt.org/#/c/83612/) is about cold merge
> > and has nothing to do with host deploy.
> >
> > On Mon, Nov 6, 2017 at 1:39 PM, Dafna Ron <dron at redhat.com> wrote:
> >>
> >> Hi,
> >>
> >> We failed test 002_bootstrap.verify_add_hosts
> >>
> >> I can see we only tried to install one of the hosts (host-0) and failed.
> >> The second host has no log, which means we did not try to deploy it.
> >>
> >> The error suggests that ovirt-imageio-daemon failed to start. However,
> >> there is another message that I think should be addressed, about
> >> conflicting vdsm and libvirt configurations.
> >>
> >> Link to suspected patches: https://gerrit.ovirt.org/#/c/83612/
> >>
> >>
> >> Link to Job:
> >> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/
> >>
> >>
> >> Link to all logs:
> >>
> >> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/artifact/
> >>
> >>
> >>
> >> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-engine/_var_log/ovirt-engine/host-deploy/ovirt-host-deploy-20171106025647-lago-basic-suite-master-host-0-5530ab1f.log
> >>
> >>
> >> (Relevant) error snippet from the log:
> >>
> >> <error>
> >>
> >> 2017-11-06 02:56:46,526-0500 DEBUG
> >> otopi.plugins.ovirt_host_deploy.vdsm.packages plugin.execute:921
> >> execute-output: ('/usr/bin/vdsm-tool', 'configure', '--force') stdout:
> >>
> >> Checking configuration status...
> >>
> >> abrt is not configured for vdsm
> >> WARNING: LVM local configuration: /etc/lvm/lvmlocal.conf is not based on
> >> vdsm configuration
> >> lvm requires configuration
> >> libvirt is not configured for vdsm yet
> >> FAILED: conflicting vdsm and libvirt-qemu tls configuration.
> >> vdsm.conf with ssl=True requires the following changes:
> >> libvirtd.conf: listen_tcp=0, auth_tcp="sasl", listen_tls=1
> >> qemu.conf: spice_tls=1.
> >> multipath requires configuration
> >>
> >>
> >> 2017-11-06 02:56:47,551-0500 DEBUG otopi.plugins.otopi.services.systemd
> >> plugin.execute:926 execute-output: ('/usr/bin/systemctl', 'start',
> >> 'ovirt-imageio-daemon.service') stderr:
> >> Job for ovirt-imageio-daemon.service failed because the control process
> >> exited with error code. See "systemctl status ovirt-imageio-daemon.service"
> >> and "journalctl -xe" for details.
> >>
> >> 2017-11-06 02:56:47,552-0500 DEBUG otopi.context
> >> context._executeMethod:143 method exception
> >> Traceback (most recent call last):
> >>   File "/tmp/ovirt-R4R8gZhaQI/pythonlib/otopi/context.py", line 133, in
> >> _executeMethod
> >>     method['method']()
> >>   File "/tmp/ovirt-R4R8gZhaQI/otopi-plugins/ovirt-host-deploy/vdsm/packages.py",
> >> line 179, in _start
> >>     self.services.state('ovirt-imageio-daemon', True)
> >>   File "/tmp/ovirt-R4R8gZhaQI/otopi-plugins/otopi/services/systemd.py",
> >> line 141, in state
> >>     service=name,
> >> RuntimeError: Failed to start service 'ovirt-imageio-daemon'
> >> 2017-11-06 02:56:47,553-0500 ERROR otopi.context
> >> context._executeMethod:152 Failed to execute stage 'Closing up': Failed to
> >> start service 'ovirt-imageio-daemon'
>
> In /var/log/messages of the host [1], there is:
>
> Nov  6 02:56:47 lago-basic-suite-master-host-0 systemd: Starting oVirt
> ImageIO Daemon...
> Nov  6 02:56:47 lago-basic-suite-master-host-0 python: detected
> unhandled Python exception in '/usr/bin/ovirt-imageio-daemon'
> Nov  6 02:56:47 lago-basic-suite-master-host-0 python: can't
> communicate with ABRT daemon, is it running? [Errno 2] No such file or
> directory
> Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:
> Traceback (most recent call last):
> Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:
> File "/usr/bin/ovirt-imageio-daemon", line 14, in <module>
> Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:
> server.main(sys.argv)
> Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:
> File "/usr/lib/python2.7/site-packages/ovirt_imageio_daemon/server.py",
> line 57, in main
> Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:
> start(config)
> Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:
> File "/usr/lib/python2.7/site-packages/ovirt_imageio_daemon/server.py",
> line 85, in start
> Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:
> WSGIRequestHandler)
> Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:
> File "/usr/lib64/python2.7/SocketServer.py", line 419, in __init__
> Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:
> self.server_bind()
> Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:
> File "/usr/lib64/python2.7/wsgiref/simple_server.py", line 48, in
> server_bind
> Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:
> HTTPServer.server_bind(self)
> Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:
> File "/usr/lib64/python2.7/BaseHTTPServer.py", line 108, in
> server_bind
> Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:
> SocketServer.TCPServer.server_bind(self)
> Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:
> File "/usr/lib64/python2.7/SocketServer.py", line 430, in server_bind
> Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:
> self.socket.bind(self.server_address)
> Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:
> File "/usr/lib64/python2.7/socket.py", line 224, in meth
> Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:
> return getattr(self._sock,name)(*args)
> Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon:
> socket.error: [Errno 98] Address already in use
> Nov  6 02:56:47 lago-basic-suite-master-host-0 systemd:
> ovirt-imageio-daemon.service: main process exited, code=exited,
> status=1/FAILURE
>
> ovirt-host-deploy stops it, and immediately tries to start it:
>
> 2017-11-06 02:56:47,203-0500 DEBUG
> otopi.plugins.otopi.services.systemd plugin.executeRaw:863
> execute-result: ('/usr/bin/systemctl', 'stop',
> 'ovirt-imageio-daemon.service'), rc=0
> ...
> 2017-11-06 02:56:47,550-0500 DEBUG
> otopi.plugins.otopi.services.systemd plugin.executeRaw:863
> execute-result: ('/usr/bin/systemctl', 'start',
> 'ovirt-imageio-daemon.service'), rc=1
>
> Also, imageio-daemon's log [2] looks a bit weird to me: it has 5 'Starting'
> lines, but none of the other lines I would expect from reading its source,
> and which I can see in another run that did finish successfully [3].
>
> Adding Idan, but not sure it's a bug in the daemon.
>
> [1]
> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host-0/_var_log/
>
> [2]
> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host-0/_var_log/ovirt-imageio-daemon/daemon.log
>
> [3]
> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3628/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host-0/_var_log/ovirt-imageio-daemon/daemon.log


Looks like the daemon is already running on this host - maybe host deploy
is trying to start the service twice?
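
To illustrate the failure mode (a minimal sketch, not the daemon's code):
binding a second socket to an address that another socket is already bound
to fails with EADDRINUSE, which is errno 98 on Linux, the same error as in
the traceback above. The helper name bind_twice and the ephemeral loopback
port are assumptions made for this demo, and the sketch targets Python 3,
while the daemon in the log runs on Python 2.7.

```python
import errno
import socket


def bind_twice():
    """Bind two sockets to the same address; return the errno of the second bind.

    Hypothetical helper for illustration only. Returns None if the second
    bind unexpectedly succeeds.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Port 0 lets the kernel pick a free port, so the demo does not
        # depend on any particular port being available.
        first.bind(("127.0.0.1", 0))
        first.listen(1)
        try:
            # Same (host, port) as the first socket, which is still open.
            second.bind(first.getsockname())
        except OSError as e:
            return e.errno
        return None
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    print(bind_twice() == errno.EADDRINUSE)
```

If a previous instance of the service still holds the port when the new
one starts, the new instance would hit the same error on bind.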

We have not changed the startup code in a couple of years, so this must be
caused by a change in another component.

This patch will make it easier to detect future issues, logging any error
to the daemon log during startup:
https://gerrit.ovirt.org/83670/
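
The general idea, as a rough sketch only (this is not the code from the
patch; the logger name "daemon-sketch" and the run() wrapper are
hypothetical names for illustration): wrap the startup call so that any
exception is written to the log with its traceback before the process
exits with a failure code.

```python
import logging

# Hypothetical logger name; a real daemon would use its own configured logger.
log = logging.getLogger("daemon-sketch")


def run(start):
    """Call start(); log any exception with its traceback.

    Returns 0 on success and 1 on failure, suitable as a process exit code.
    """
    try:
        start()
    except Exception:
        # log.exception() records the message at ERROR level and appends
        # the current traceback, so the failure lands in the daemon log
        # and not only in the system log.
        log.exception("Service failed to start")
        return 1
    return 0
```

With something like this, an "Address already in use" failure during
startup would show up in the daemon log next to the 'Starting' lines.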

Nir


>
>
> >>
> >> </error>
> >>
> >>
> >
> >
>
>
>
> --
> Didi