
On Mon, May 28, 2018 at 9:00 AM, Piotr Kliczewski <pkliczew@redhat.com> wrote:
Simone,
What do you think about this failure?
Thanks, Piotr
On Mon, May 28, 2018 at 7:12 AM, Barak Korren <bkorren@redhat.com> wrote:
On 27 May 2018 at 14:59, Piotr Kliczewski <pkliczew@redhat.com> wrote:
Martin,
I can only see:
2018-05-25 13:57:44,255-04 ERROR [org.ovirt.engine.core.uutils.ssh.SSHDialog] (EE-ManagedThreadFactory-engine-Thread-1) [55a7b15b] SSH error running command root@lago-upgrade-from-release-suite-master-host-0:'umask 0077; MYTMP="$(TMPDIR="${OVIRT_TMPDIR}" mktemp -d -t ovirt-XXXXXXXXXX)"; trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null 2>&1; rm -fr \"${MYTMP}\" > /dev/null 2>&1" 0; tar --warning=no-timestamp -C "${MYTMP}" -x && "${MYTMP}"/ovirt-host-deploy DIALOG/dialect=str:machine DIALOG/customization=bool:True': TimeLimitExceededException: SSH session timeout host 'root@lago-upgrade-from-release-suite-master-host-0'
2018-05-25 13:57:44,259-04 ERROR [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (EE-ManagedThreadFactory-engine-Thread-1) [55a7b15b] Timeout during host lago-upgrade-from-release-suite-master-host-0 install: SSH session timeout host 'root@lago-upgrade-from-release-suite-master-host-0'
There are no additional logs. The SSH connection to the host timed out. Are we sure that this issue is caused by Ravi's change?
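For checking whether the same timeout shows up in other runs, a minimal sketch of a log scan follows. The regular expression is inferred only from the two engine log lines quoted above, so it is an assumption rather than a documented format; the group name `correlation_id` for the bracketed `[55a7b15b]` token, the function name `find_ssh_timeouts`, and the second sample line are all hypothetical.

```python
import re

# Layout inferred from the two quoted engine log lines (an assumption,
# not an official format spec):
#   <timestamp> <LEVEL> [<logger>] (<thread>) [<id>] <message>
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}[+-]\d{2})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[^\]]+)\]\s+"
    r"\((?P<thread>[^)]+)\)\s+"
    r"\[(?P<correlation_id>[^\]]*)\]\s+"
    r"(?P<message>.*)$"
)


def find_ssh_timeouts(lines):
    """Return parsed ERROR entries whose message mentions an SSH session timeout."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # continuation lines or lines in another layout
        entry = match.groupdict()
        if entry["level"] == "ERROR" and "SSH session timeout" in entry["message"]:
            hits.append(entry)
    return hits


sample = [
    # Second log line quoted in this thread.
    "2018-05-25 13:57:44,259-04 ERROR "
    "[org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] "
    "(EE-ManagedThreadFactory-engine-Thread-1) [55a7b15b] "
    "Timeout during host lago-upgrade-from-release-suite-master-host-0 install: "
    "SSH session timeout host 'root@lago-upgrade-from-release-suite-master-host-0'",
    # Made-up unrelated line, for illustration only.
    "2018-05-25 13:57:45,000-04 INFO [example.Logger] (thread-1) [abc] unrelated",
]

hits = find_ssh_timeouts(sample)
for hit in hits:
    print(hit["timestamp"], hit["logger"], hit["correlation_id"])
```

Feeding it the lines of an engine.log from each failing job would show whether every failure carries the same SSH session timeout.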
We have some quite strong circumstantial evidence:
- The issue has affected all engine patches since that patch in a similar fashion.
- The prior engine patch [1] passed successfully [2].
- Other subsequent OST runs without engine patches passed successfully as well [3].
[1]: https://gerrit.ovirt.org/c/91595/2
[2]: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7777/
[3]: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7778/
Please note - the issue is affecting a test that is run by an upgrade suite on the post-upgrade system. It has no effect on the basic suite. So it probably has to do with some behaviour that is specific to upgraded systems.
I will try to reproduce it later today in a dev env, but I agree with Piotr's investigation: the engine was not able to connect to the host using SSH, and that's why no host-deploy logs were fetched.
Thanks, Piotr
On Sun, May 27, 2018 at 11:21 AM, Martin Perina <mperina@redhat.com> wrote:
Adding also Piotr to the thread
On Sun, 27 May 2018, 08:46 Barak Korren, <bkorren@redhat.com> wrote:
Test failed: [ AddHost (in upgrade-from-release-suite) ]
Link to suspected patches: https://gerrit.ovirt.org/#/c/91445/5 - Disable TLS versions < 1.2 for hosts with cluster level>=4.1
Link to Job: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7776/
Link to all logs: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7776/artifact/exported-artifacts/upgrade-from-release-suit-master-el7/test_logs/upgrade-from-release-suite-master/post-002_bootstrap.py/
Error snippet from log:
From nosetests log: <error>
AssertionError: False != True after 1200 seconds
</error>
I am not finding a host deploy log in /var/log/ovirt-engine for some reason. This seems to have caused consistent failures in all other engine patches that followed it.
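For readers unfamiliar with this kind of message, "False != True after 1200 seconds" is what a poll-until-true assertion typically reports once its deadline passes. The sketch below is a hypothetical helper written for illustration; the name `assert_true_within`, its parameters, and the polling interval are assumptions and not the actual OST test code.

```python
import time


def assert_true_within(predicate, timeout, interval=1.0):
    """Poll predicate() until it returns True or `timeout` seconds elapse.

    Hypothetical helper for illustration; not the real OST implementation.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return
        if time.monotonic() >= deadline:
            raise AssertionError(f"False != True after {timeout} seconds")
        time.sleep(interval)


def host_is_up():
    # Stand-in for a real status check; here the host never comes up.
    return False


try:
    assert_true_within(host_is_up, timeout=0)
    message = None
except AssertionError as exc:
    message = str(exc)

print(message)
```

Under this reading, the test waited the full 1200 seconds for the host to reach the expected state, which fits a deploy that never got past the SSH connection.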
-- Barak Korren RHV DevOps team , RHCE, RHCi Red Hat EMEA redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted
-- Martin Perina Associate Manager, Software Engineering Red Hat Czech s.r.o.