
The difference between the two is this patch:
https://github.com/oVirt/ovirt-ansible/pull/62/files#diff-f7199e24fce211caec...

Could the libvirt-guests restart cause this issue? In any case, can you try removing ovirt-ansible-roles-1.1.1-0.1.master.20170830145950.el7.centos.noarch.rpm from tested and see whether it reproduces?

On Thu, Aug 31, 2017 at 8:55 AM, Barak Korren <bkorren@redhat.com> wrote:
On 30 August 2017 at 22:20, Martin Perina <mperina@redhat.com> wrote:
So we're back to square one. Another possible culprit may be Ansible: Vdsm is stopped two seconds after Ansible logs in to the host.
Aug 30 11:26:24 lago-basic-suite-master-host-0 systemd: Starting Session 10 of user root.
Aug 30 11:26:25 lago-basic-suite-master-host-0 python: ansible-setup Invoked with filter=* gather_subset=['all'] fact_path=/etc/ansible/facts.d gather_timeout=10
Aug 30 11:26:25 lago-basic-suite-master-host-0 python: ansible-command Invoked with warn=True executable=None _uses_shell=False _raw_params=bash -c "rpm -qi vdsm | grep -oE 'Version\\s+:\\s+[0-9\\.]+' | awk '{print $3}'" removes=None creates=None chdir=None
Aug 30 11:26:26 lago-basic-suite-master-host-0 python: ansible-systemd Invoked with no_block=False name=libvirt-guests enabled=True daemon_reload=False state=started user=False masked=None
Aug 30 11:26:26 lago-basic-suite-master-host-0 systemd: Reloading.
Aug 30 11:26:26 lago-basic-suite-master-host-0 systemd: Cannot add dependency job for unit lvm2-lvmetad.socket, ignoring: Unit is masked.
Aug 30 11:26:26 lago-basic-suite-master-host-0 systemd: Stopped MOM instance configured for VDSM purposes.
Aug 30 11:26:26 lago-basic-suite-master-host-0 systemd: Stopping Virtual Desktop Server Manager...
Could it be that it triggers a systemd reload that makes systemd croak on the vdsm-mom cycle?
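The two-second gap mentioned above can be checked mechanically rather than by eye. The following is only a rough sketch: the helper names (`parse`, `first_time`, `seconds_between`) are made up for illustration, the year 2017 is assumed because syslog timestamps carry none, and the sample is a subset of the lines quoted above.

```python
import re
from datetime import datetime

# Subset of the syslog lines quoted in this thread.
LOG = """\
Aug 30 11:26:24 lago-basic-suite-master-host-0 systemd: Starting Session 10 of user root.
Aug 30 11:26:26 lago-basic-suite-master-host-0 python: ansible-systemd Invoked with no_block=False name=libvirt-guests enabled=True daemon_reload=False state=started user=False masked=None
Aug 30 11:26:26 lago-basic-suite-master-host-0 systemd: Reloading.
Aug 30 11:26:26 lago-basic-suite-master-host-0 systemd: Stopping Virtual Desktop Server Manager...
"""

# "<Mon> <day> <HH:MM:SS> <host> <program>: <message>"
LINE_RE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) (\S+) ([^:]+): (.*)$")


def parse(line, year=2017):
    """Split one syslog line into (timestamp, host, program, message)."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    stamp, host, prog, msg = m.groups()
    # Syslog omits the year, so an assumed one is prepended before parsing.
    when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    return when, host, prog, msg


def first_time(entries, needle):
    """Timestamp of the first entry whose message contains `needle`."""
    for when, _host, _prog, msg in entries:
        if needle in msg:
            return when
    return None


def seconds_between(log_text, start_needle, end_needle):
    """Seconds from the first `start_needle` line to the first `end_needle` line."""
    entries = [e for e in map(parse, log_text.splitlines()) if e]
    start = first_time(entries, start_needle)
    end = first_time(entries, end_needle)
    if start is None or end is None:
        return None
    return (end - start).total_seconds()


gap = seconds_between(
    LOG, "Starting Session 10", "Stopping Virtual Desktop Server Manager"
)
print(gap)
```

Running the same function over /var/log/messages from a passing and a failing run would show whether the stop always follows the Ansible session this closely.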
We are not restarting VDSM within the ovirt-host-deploy Ansible role; the VDSM restart is performed in the host-deploy part, the same as in previous versions.
Within ovirt-host-deploy-firewalld we only enable and restart the firewalld service.
Comparing a successful add-host flow [1] to a failed one [2], we notice that in the failed add-host flow Ansible logs in twice (session 10 and session 11). Could it be somehow related? Notice that session 11 uses the OLD way (awk+grep based) to find vdsm's version.
Aug 30 05:55:53 lago-basic-suite-master-host-0 systemd-logind: New session 10 of user root.
Aug 30 05:55:53 lago-basic-suite-master-host-0 systemd: Starting Session 10 of user root.
Aug 30 05:55:53 lago-basic-suite-master-host-0 python: ansible-setup Invoked with filter=* gather_subset=['all'] fact_path=/etc/ansible/facts.d gather_timeout=10
Aug 30 05:55:54 lago-basic-suite-master-host-0 python: ansible-command Invoked with warn=True executable=None _uses_shell=False _raw_params=bash -c "rpm -qi vdsm | grep -oE 'Version\\s+:\\s+[0-9\\.]+' | awk '{print $3}'" removes=None creates=None chdir=None
Aug 30 05:55:54 lago-basic-suite-master-host-0 python: ansible-systemd Invoked with no_block=False name=libvirt-guests enabled=True daemon_reload=False state=started user=False masked=None
Aug 30 05:55:54 lago-basic-suite-master-host-0 systemd: Reloading.
Aug 30 05:55:55 lago-basic-suite-master-host-0 systemd: Cannot add dependency job for unit lvm2-lvmetad.socket, ignoring: Unit is masked.
Aug 30 05:55:55 lago-basic-suite-master-host-0 systemd: Stopped MOM instance configured for VDSM purposes.
Aug 30 05:55:55 lago-basic-suite-master-host-0 systemd: Stopping Virtual Desktop Server Manager...
Aug 30 05:55:55 lago-basic-suite-master-host-0 systemd: Starting Suspend Active Libvirt Guests...
Aug 30 05:55:55 lago-basic-suite-master-host-0 systemd: Started Suspend Active Libvirt Guests.
Aug 30 05:55:55 lago-basic-suite-master-host-0 journal: libvirt version: 2.0.0, package: 10.el7_3.9 (CentOS BuildSystem <http://bugs.centos.org>, 2017-05-25-20:52:28, c1bm.rdu2.centos.org)
Aug 30 05:55:55 lago-basic-suite-master-host-0 journal: hostname: lago-basic-suite-master-host-0.lago.local
Aug 30 05:55:55 lago-basic-suite-master-host-0 vdsmd_init_common.sh: vdsm: Running run_final_hooks
Aug 30 05:55:55 lago-basic-suite-master-host-0 journal: End of file while reading data: Input/output error
Aug 30 05:55:55 lago-basic-suite-master-host-0 systemd: Stopped Virtual Desktop Server Manager.
Aug 30 05:55:55 lago-basic-suite-master-host-0 python: ansible-command Invoked with warn=True executable=None _uses_shell=False _raw_params=bash -c "rpm -q vdsm --qf '%{VERSION}'" removes=None creates=None chdir=None
Aug 30 05:55:55 lago-basic-suite-master-host-0 python: ansible-systemd Invoked with no_block=False name=iptables enabled=False daemon_reload=False state=stopped user=False masked=None
[1]: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/2197/artifact/...
[2]: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/2151/artifact/...
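To compare the two runs without reading the logs side by side, something like the sketch below could count the login sessions and the two styles of vdsm version query in each messages file. It is only an illustration: `summarize` and the constant names are hypothetical, and the sample holds three lines copied from the excerpt above, so it does not include the "New session 11" line.

```python
import re

# Substrings that distinguish the two ways of querying the vdsm version.
OLD_QUERY = "rpm -qi vdsm"       # grep+awk based lookup
NEW_QUERY = "rpm -q vdsm --qf"   # query-format based lookup
SESSION_RE = re.compile(r"systemd-logind: New session (\d+) of user root")

# Three lines taken verbatim from the excerpt quoted above.
SAMPLE = r"""
Aug 30 05:55:53 lago-basic-suite-master-host-0 systemd-logind: New session 10 of user root.
Aug 30 05:55:54 lago-basic-suite-master-host-0 python: ansible-command Invoked with warn=True executable=None _uses_shell=False _raw_params=bash -c "rpm -qi vdsm | grep -oE 'Version\\s+:\\s+[0-9\\.]+' | awk '{print $3}'" removes=None creates=None chdir=None
Aug 30 05:55:55 lago-basic-suite-master-host-0 python: ansible-command Invoked with warn=True executable=None _uses_shell=False _raw_params=bash -c "rpm -q vdsm --qf '%{VERSION}'" removes=None creates=None chdir=None
"""


def summarize(log_text):
    """Collect session numbers and count old/new style version queries."""
    sessions = []
    old = new = 0
    for line in log_text.splitlines():
        m = SESSION_RE.search(line)
        if m:
            sessions.append(int(m.group(1)))
        if "ansible-command" in line:
            if OLD_QUERY in line:
                old += 1
            elif NEW_QUERY in line:
                new += 1
    return {
        "sessions": sessions,
        "old_version_queries": old,
        "new_version_queries": new,
    }


summary = summarize(SAMPLE)
print(summary)
```

Feeding it the full messages files from [1] and [2] would show at a glance whether the extra session and the old-style query appear only in the failed run.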
--
Barak Korren
RHV DevOps team, RHCE, RHCi
Red Hat EMEA
redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted