On Mon, Nov 6, 2017 at 1:57 PM, Dafna Ron <dron@redhat.com> wrote:
            > adding Didi.
            >
            >
            > On 11/06/2017 11:51 AM, Ala Hino wrote:
            >
            > Suspected patch (https://gerrit.ovirt.org/#/c/83612/) is about cold merge
            > and has nothing to do with host deploy.
            >
            > On Mon, Nov 6, 2017 at 1:39 PM, Dafna Ron <dron@redhat.com> wrote:
            >>
            >> Hi,
            >>
            >> We failed test 002_bootstrap.verify_add_hosts.
            >>
            >> I can see we only tried to install one of the hosts (host-0), and that failed.
            >> The second host has no log, which means we did not try to deploy it.
            >>
            >> The error suggests that ovirt-imageio-daemon failed to start. However,
            >> there is another message that I think should be addressed, about conflicting
            >> vdsm and libvirt configurations.
            >>
            >> Link to suspected patches: https://gerrit.ovirt.org/#/c/83612/
            >>
            >>
            >> Link to Job:
            >> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/
            >>
            >>
            >> Link to all logs:
            >> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/artifact/
            >>
            >>
            >> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-engine/_var_log/ovirt-engine/host-deploy/ovirt-host-deploy-20171106025647-lago-basic-suite-master-host-0-5530ab1f.log
            >>
            >>
            >> (Relevant) error snippet from the log:
            >>
            >> <error>
            >>
            >> 2017-11-06 02:56:46,526-0500 DEBUG otopi.plugins.ovirt_host_deploy.vdsm.packages plugin.execute:921 execute-output: ('/usr/bin/vdsm-tool', 'configure', '--force') stdout:
            >>
            >> Checking configuration status...
            >>
            >> abrt is not configured for vdsm
            >> WARNING: LVM local configuration: /etc/lvm/lvmlocal.conf is not based on vdsm configuration
            >> lvm requires configuration
            >> libvirt is not configured for vdsm yet
            >> FAILED: conflicting vdsm and libvirt-qemu tls configuration.
            >> vdsm.conf with ssl=True requires the following changes:
            >> libvirtd.conf: listen_tcp=0, auth_tcp="sasl", listen_tls=1
            >> qemu.conf: spice_tls=1.
            >> multipath requires configuration
            >>
            >>
            >> 2017-11-06 02:56:47,551-0500 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:926 execute-output: ('/usr/bin/systemctl', 'start', 'ovirt-imageio-daemon.service') stderr:
            >> Job for ovirt-imageio-daemon.service failed because the control process exited with error code. See "systemctl status ovirt-imageio-daemon.service" and "journalctl -xe" for details.
            >>
            >> 2017-11-06 02:56:47,552-0500 DEBUG otopi.context context._executeMethod:143 method exception
            >> Traceback (most recent call last):
            >>   File "/tmp/ovirt-R4R8gZhaQI/pythonlib/otopi/context.py", line 133, in _executeMethod
            >>     method['method']()
            >>   File "/tmp/ovirt-R4R8gZhaQI/otopi-plugins/ovirt-host-deploy/vdsm/packages.py", line 179, in _start
            >>     self.services.state('ovirt-imageio-daemon', True)
            >>   File "/tmp/ovirt-R4R8gZhaQI/otopi-plugins/otopi/services/systemd.py", line 141, in state
            >>     service=name,
            >> RuntimeError: Failed to start service 'ovirt-imageio-daemon'
            >> 2017-11-06 02:56:47,553-0500 ERROR otopi.context context._executeMethod:152 Failed to execute stage 'Closing up': Failed to start service 'ovirt-imageio-daemon'
            
            In /var/log/messages of the host [1], there is:
            
            Nov  6 02:56:47 lago-basic-suite-master-host-0 systemd: Starting oVirt ImageIO Daemon...
            Nov  6 02:56:47 lago-basic-suite-master-host-0 python: detected unhandled Python exception in '/usr/bin/ovirt-imageio-daemon'
            Nov  6 02:56:47 lago-basic-suite-master-host-0 python: can't communicate with ABRT daemon, is it running? [Errno 2] No such file or directory
            Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: Traceback (most recent call last):
            Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: File "/usr/bin/ovirt-imageio-daemon", line 14, in <module>
            Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: server.main(sys.argv)
            Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: File "/usr/lib/python2.7/site-packages/ovirt_imageio_daemon/server.py", line 57, in main
            Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: start(config)
            Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: File "/usr/lib/python2.7/site-packages/ovirt_imageio_daemon/server.py", line 85, in start
            Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: WSGIRequestHandler)
            Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: File "/usr/lib64/python2.7/SocketServer.py", line 419, in __init__
            Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: self.server_bind()
            Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: File "/usr/lib64/python2.7/wsgiref/simple_server.py", line 48, in server_bind
            Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: HTTPServer.server_bind(self)
            Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: File "/usr/lib64/python2.7/BaseHTTPServer.py", line 108, in server_bind
            Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: SocketServer.TCPServer.server_bind(self)
            Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: File "/usr/lib64/python2.7/SocketServer.py", line 430, in server_bind
            Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: self.socket.bind(self.server_address)
            Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: File "/usr/lib64/python2.7/socket.py", line 224, in meth
            Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: return getattr(self._sock,name)(*args)
            Nov  6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: socket.error: [Errno 98] Address already in use
            Nov  6 02:56:47 lago-basic-suite-master-host-0 systemd: ovirt-imageio-daemon.service: main process exited, code=exited, status=1/FAILURE
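
For reference, the final `socket.error: [Errno 98]` in the traceback above can be reproduced outside the daemon. This is only a minimal sketch in Python 3 (where `socket.error` is an alias of `OSError`); the helper name `try_bind`, the loopback address and the OS-chosen port are placeholders for illustration, not the daemon's real code or settings:

```python
import errno
import socket


def try_bind(host, port):
    """Try to bind a listening TCP socket.

    Returns None on success, or the errno value if bind/listen fails.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(1)
    except OSError as e:
        return e.errno
    finally:
        sock.close()
    return None


# First listener: let the OS pick a free port (placeholder for the real port).
first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
first.bind(("127.0.0.1", 0))
first.listen(1)
host, port = first.getsockname()

# A second bind to the same address fails while the first listener is alive.
busy_errno = try_bind(host, port)

# Once the first listener is closed, the same bind succeeds.
first.close()
free_errno = try_bind(host, port)
```

On Linux, `busy_errno` should equal `errno.EADDRINUSE` (98), which is consistent with some other process still holding the daemon's port at the time of the failed start.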
            
            ovirt-host-deploy stops it, and immediately tries to start it:
            
            2017-11-06 02:56:47,203-0500 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:863 execute-result: ('/usr/bin/systemctl', 'stop', 'ovirt-imageio-daemon.service'), rc=0
            ...
            2017-11-06 02:56:47,550-0500 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:863 execute-result: ('/usr/bin/systemctl', 'start', 'ovirt-imageio-daemon.service'), rc=1
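
One way to check whether something is still listening between such a stop and start is to poll the port. The logs do not establish that the old process was still holding the port, so this is a sketch of a diagnostic, not of what host-deploy does; `port_in_use` and `wait_until_free` are hypothetical helper names, and the host and port are whatever the daemon is configured to use:

```python
import socket
import time


def port_in_use(host, port):
    """Return True if something accepts TCP connections on (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        # connect_ex returns 0 on success and an errno value otherwise.
        return s.connect_ex((host, port)) == 0


def wait_until_free(host, port, timeout=5.0, interval=0.1):
    """Poll until nothing listens on (host, port).

    Returns True once the port is free, or False if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while port_in_use(host, port):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True
```

Calling `wait_until_free(host, port)` after the stop and before the start would show whether the address is released in time; a `False` result points at a process that is still bound to it.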
            
            Also, imageio-daemon's log [2] looks a bit odd to me: it has 5
            'Starting' lines, but none of the other lines I would expect
            from reading its source, and which I do see in another run
            that finished successfully [3].
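
To compare the two runs, the 'Starting' lines can be counted in each downloaded daemon.log. A small sketch, assuming only that the word appears literally in the line; `count_lines_containing` is a hypothetical helper and nothing else about the log format is assumed:

```python
def count_lines_containing(path, needle="Starting"):
    """Count the lines in a text file that contain the given substring."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return sum(1 for line in f if needle in line)
```

Running it on local copies of [2] and [3] gives the number of start attempts recorded in each log.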
            
            Adding Idan, but not sure it's a bug in the daemon.
            
            [1] http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host-0/_var_log/
            
            [2] http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host-0/_var_log/ovirt-imageio-daemon/daemon.log
            
            [3] http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3628/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host-0/_var_log/ovirt-imageio-daemon/daemon.log
          
          
          Looks like the daemon is already running on this host - maybe host deploy
          is trying to start the service twice?
          
          
          We have not changed the startup code for a couple of years, so this must
          be caused by a change in another component.
          
          
          This patch will make it easier to detect future issues, logging any error
          to the daemon log during startup:
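
As a rough illustration only (this is not the patch itself), the general idea of logging startup errors can be sketched as follows. The logger name `daemon`, the `run` wrapper and the `start(config)` signature are assumptions made for the example, not the daemon's actual code:

```python
import logging

log = logging.getLogger("daemon")


def run(start, config):
    """Call start(config); log any exception with its traceback, then re-raise."""
    log.info("Starting")
    try:
        start(config)
    except Exception:
        # log.exception records the message plus the full traceback,
        # so a failure such as a bind error ends up in the daemon's own log.
        log.exception("Startup failed")
        raise
```

With a wrapper like this, an error raised during startup, such as the bind failure above, would be written to the daemon log with its traceback instead of appearing only in /var/log/messages.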
          
          
          
          Nir
           
          
            
            >>
            >> </error>
            >>
            >>
            >
            >
            
            
            
            --
            Didi