
On 30 August 2017 at 22:20, Martin Perina <mperina@redhat.com> wrote:
So we're back to square one. Another possible culprit may be Ansible: Vdsm is stopped two seconds after it logs in to the host.
Aug 30 11:26:24 lago-basic-suite-master-host-0 systemd: Starting Session 10 of user root.
Aug 30 11:26:25 lago-basic-suite-master-host-0 python: ansible-setup Invoked with filter=* gather_subset=['all'] fact_path=/etc/ansible/facts.d gather_timeout=10
Aug 30 11:26:25 lago-basic-suite-master-host-0 python: ansible-command Invoked with warn=True executable=None _uses_shell=False _raw_params=bash -c "rpm -qi vdsm | grep -oE 'Version\\s+:\\s+[0-9\\.]+' | awk '{print $3}'" removes=None creates=None chdir=None
Aug 30 11:26:26 lago-basic-suite-master-host-0 python: ansible-systemd Invoked with no_block=False name=libvirt-guests enabled=True daemon_reload=False state=started user=False masked=None
Aug 30 11:26:26 lago-basic-suite-master-host-0 systemd: Reloading.
Aug 30 11:26:26 lago-basic-suite-master-host-0 systemd: Cannot add dependency job for unit lvm2-lvmetad.socket, ignoring: Unit is masked.
Aug 30 11:26:26 lago-basic-suite-master-host-0 systemd: Stopped MOM instance configured for VDSM purposes.
Aug 30 11:26:26 lago-basic-suite-master-host-0 systemd: Stopping Virtual Desktop Server Manager...
Could it be that it triggers a systemd reload that makes systemd croak on the vdsm-mom cycle?
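To check the "two seconds" observation mechanically across several runs, the gap between the session start and the Vdsm stop can be computed from the syslog lines. The following is only a rough sketch: the helper names `parse_line` and `seconds_between` are made up for illustration, the year is assumed to be 2017 because syslog timestamps omit it, and the sample lines are copied from the excerpt above.

```python
from datetime import datetime

# Three lines copied from the log excerpt above.
LOG = """\
Aug 30 11:26:24 lago-basic-suite-master-host-0 systemd: Starting Session 10 of user root.
Aug 30 11:26:26 lago-basic-suite-master-host-0 systemd: Reloading.
Aug 30 11:26:26 lago-basic-suite-master-host-0 systemd: Stopping Virtual Desktop Server Manager...
"""


def parse_line(line, year=2017):
    """Split one syslog line into (timestamp, host, tag, message).

    Assumption: the year is passed in, since syslog does not record it.
    """
    stamp, rest = line[:15], line[16:]
    host, tag, message = rest.split(" ", 2)
    ts = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    return ts, host, tag.rstrip(":"), message


def seconds_between(log_text, first_marker, second_marker):
    """Seconds from the first line containing first_marker to the next
    line containing second_marker, or None if either is missing."""
    first = None
    for line in log_text.splitlines():
        if not line.strip():
            continue
        ts, _host, _tag, message = parse_line(line)
        if first is None:
            if first_marker in message:
                first = ts
        elif second_marker in message:
            return (ts - first).total_seconds()
    return None


gap = seconds_between(
    LOG, "Starting Session", "Stopping Virtual Desktop Server Manager"
)
print(gap)
```

Running the same function over the messages file of a passing and a failing run would show whether the stop always follows the Ansible login this closely.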
We are not restarting VDSM within the ovirt-host-deploy Ansible role; the VDSM restart is performed in the host-deploy part, the same as in previous versions.
Within ovirt-host-deploy-firewalld we only enable and restart the firewalld service.
Comparing a successful add-host flow [1] to a failed one [2], we notice that in the failed add-host flow Ansible logs in twice (session 10 and session 11). Could it be somehow related? Notice that Session 11 uses the OLD way (awk+grep based) to find vdsm's version.

Aug 30 05:55:53 lago-basic-suite-master-host-0 systemd-logind: New session 10 of user root.
Aug 30 05:55:53 lago-basic-suite-master-host-0 systemd: Starting Session 10 of user root.
Aug 30 05:55:53 lago-basic-suite-master-host-0 python: ansible-setup Invoked with filter=* gather_subset=['all'] fact_path=/etc/ansible/facts.d gather_timeout=10
Aug 30 05:55:54 lago-basic-suite-master-host-0 python: ansible-command Invoked with warn=True executable=None _uses_shell=False _raw_params=bash -c "rpm -qi vdsm | grep -oE 'Version\\s+:\\s+[0-9\\.]+' | awk '{print $3}'" removes=None creates=None chdir=None
Aug 30 05:55:54 lago-basic-suite-master-host-0 python: ansible-systemd Invoked with no_block=False name=libvirt-guests enabled=True daemon_reload=False state=started user=False masked=None
Aug 30 05:55:54 lago-basic-suite-master-host-0 systemd: Reloading.
Aug 30 05:55:55 lago-basic-suite-master-host-0 systemd: Cannot add dependency job for unit lvm2-lvmetad.socket, ignoring: Unit is masked.
Aug 30 05:55:55 lago-basic-suite-master-host-0 systemd: Stopped MOM instance configured for VDSM purposes.
Aug 30 05:55:55 lago-basic-suite-master-host-0 systemd: Stopping Virtual Desktop Server Manager...
Aug 30 05:55:55 lago-basic-suite-master-host-0 systemd: Starting Suspend Active Libvirt Guests...
Aug 30 05:55:55 lago-basic-suite-master-host-0 systemd: Started Suspend Active Libvirt Guests.
Aug 30 05:55:55 lago-basic-suite-master-host-0 journal: libvirt version: 2.0.0, package: 10.el7_3.9 (CentOS BuildSystem <http://bugs.centos.org>, 2017-05-25-20:52:28, c1bm.rdu2.centos.org)
Aug 30 05:55:55 lago-basic-suite-master-host-0 journal: hostname: lago-basic-suite-master-host-0.lago.local
Aug 30 05:55:55 lago-basic-suite-master-host-0 vdsmd_init_common.sh: vdsm: Running run_final_hooks
Aug 30 05:55:55 lago-basic-suite-master-host-0 journal: End of file while reading data: Input/output error
Aug 30 05:55:55 lago-basic-suite-master-host-0 systemd: Stopped Virtual Desktop Server Manager.
Aug 30 05:55:55 lago-basic-suite-master-host-0 python: ansible-command Invoked with warn=True executable=None _uses_shell=False _raw_params=bash -c "rpm -q vdsm --qf '%{VERSION}'" removes=None creates=None chdir=None
Aug 30 05:55:55 lago-basic-suite-master-host-0 python: ansible-systemd Invoked with no_block=False name=iptables enabled=False daemon_reload=False state=stopped user=False masked=None

[1]: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/2197/artifact/...
[2]: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/2151/artifact/...

--
Barak Korren
RHV DevOps team, RHCE, RHCi
Red Hat EMEA
redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted