On Tue, May 29, 2018 at 3:12 PM, Dafna Ron <dron@redhat.com> wrote:
Martin, do you have any updates? Please note that ovirt-engine has been broken for a few days, so perhaps we should stop merging or revert the original change?

Still looking at it, here are partial results:

1. New host installation: never reproduced; a 4.2 host always installs fine on a 4.2 engine.
2. Upgrade: never reproduced; upgrading both the 4.1 engine and host to 4.2 was always successful.
3. Reinstallation: once, the host remained stuck during reinstallation and the whole reinstallation failed due to a timeout.
    - That may be the issue seen in CI, but so far I don't have a reliable reproducer to debug why the host-deploy process on the host gets stuck.



On Tue, May 29, 2018 at 1:26 PM, Piotr Kliczewski <pkliczew@redhat.com> wrote:
+Martin

He is working on it.

Thanks,
Piotr

On Tue, May 29, 2018 at 2:22 PM, Dafna Ron <dron@redhat.com> wrote:
Hi Piotr,

Any update on this?

Thanks.
Dafna


On Mon, May 28, 2018 at 10:59 AM, Piotr Kliczewski <piotr.kliczewski@gmail.com> wrote:
On Mon, May 28, 2018 at 11:41 AM, Barak Korren <bkorren@redhat.com> wrote:
>
>
> On 28 May 2018 at 12:38, Piotr Kliczewski <piotr.kliczewski@gmail.com>
> wrote:
>>
>> On Mon, May 28, 2018 at 10:57 AM, Barak Korren <bkorren@redhat.com> wrote:
>> > Note: we're now seeing a very similar issue in the 4.2 branch as well
>> > that seems to have been introduced by the following patch:
>>
>> Can you point to specific job so we could take a look at the logs?
>
>
> Whoops, sorry, here:
> http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/2034/
>

Looks like the same issue:

2018-05-28 03:41:03,606-04 ERROR
[org.ovirt.engine.core.uutils.ssh.SSHDialog]
(EE-ManagedThreadFactory-engine-Thread-1) [1244c90f] SSH error running
command root@lago-upgrade-from-prevrelease-suite-4-2-host-0:'umask
0077; MYTMP="$(TMPDIR="${OVIRT_TMPDIR}" mktemp -d -t
ovirt-XXXXXXXXXX)"; trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null
2>&1; rm -fr \"${MYTMP}\" > /dev/null 2>&1" 0; tar
--warning=no-timestamp -C "${MYTMP}" -x &&
"${MYTMP}"/ovirt-host-deploy DIALOG/dialect=str:machine
DIALOG/customization=bool:True': TimeLimitExceededException: SSH
session timeout host
'root@lago-upgrade-from-prevrelease-suite-4-2-host-0'
2018-05-28 03:41:03,606-04 ERROR
[org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (VdsDeploy)
[1244c90f] Error during deploy dialog
2018-05-28 03:41:03,611-04 ERROR
[org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase]
(EE-ManagedThreadFactory-engine-Thread-1) [1244c90f] Timeout during
host lago-upgrade-from-prevrelease-suite-4-2-host-0 install: SSH
session timeout host
'root@lago-upgrade-from-prevrelease-suite-4-2-host-0'
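
For anyone checking other runs for the same signature, here is a minimal sketch that pulls SSH session timeouts out of an engine.log excerpt like the one above. The function names are hypothetical, and the record layout (each record starting with a timestamp such as "2018-05-28 03:41:03,606-04", with wrapped continuation lines) is assumed from the excerpts in this thread only.

```python
import re

# A record is assumed to start with a timestamp like "2018-05-28 03:41:03,606-04".
RECORD_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}[+-]\d{2}")
# Message text taken from the excerpts; \s+ tolerates line wrapping.
TIMEOUT_HOST = re.compile(r"SSH\s+session\s+timeout\s+host\s+'([^']+)'")


def join_records(lines):
    """Glue wrapped continuation lines back onto the record they belong to."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if RECORD_START.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line
    return records


def ssh_timeouts(lines):
    """Return (timestamp, host) pairs for records reporting an SSH session timeout."""
    hits = []
    for record in join_records(lines):
        match = TIMEOUT_HOST.search(record)
        if match:
            timestamp = " ".join(record.split(" ", 2)[:2])
            hits.append((timestamp, match.group(1)))
    return hits
```

Calling ssh_timeouts(open(path).read().splitlines()) on a saved log would list each timestamp together with the host the engine failed to reach. A record that mentions the timeout twice is reported once.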

>>
>>
>> >
>> > https://gerrit.ovirt.org/c/91638/2 - core: Enable only strong ciphers
>> > for
>> > 4.2 hosts
>> >
>> > On 28 May 2018 at 10:26, Barak Korren <bkorren@redhat.com> wrote:
>> >>
>> >>
>> >>
>> >> On 28 May 2018 at 10:19, Martin Perina <mperina@redhat.com> wrote:
>> >>>
>> >>>
>> >>>
>> >>> On Mon, May 28, 2018 at 9:00 AM, Piotr Kliczewski
>> >>> <pkliczew@redhat.com>
>> >>> wrote:
>> >>>>
>> >>>> Simone,
>> >>>>
>> >>>> What do you think about this failure?
>> >>>>
>> >>>> Thanks,
>> >>>> Piotr
>> >>>>
>> >>>> On Mon, May 28, 2018 at 7:12 AM, Barak Korren <bkorren@redhat.com>
>> >>>> wrote:
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> On 27 May 2018 at 14:59, Piotr Kliczewski <pkliczew@redhat.com>
>> >>>>> wrote:
>> >>>>>>
>> >>>>>> Martin,
>> >>>>>>
>> >>>>>> I can only see:
>> >>>>>>
>> >>>>>> 2018-05-25 13:57:44,255-04 ERROR
>> >>>>>> [org.ovirt.engine.core.uutils.ssh.SSHDialog]
>> >>>>>> (EE-ManagedThreadFactory-engine-Thread-1) [55a7b15b] SSH error
>> >>>>>> running
>> >>>>>> command root@lago-upgrade-from-release-suite-master-host-0:'umask
>> >>>>>> 0077;
>> >>>>>> MYTMP="$(TMPDIR="${OVIRT_TMPDIR}" mktemp -d -t ovirt-XXXXXXXXXX)";
>> >>>>>> trap
>> >>>>>> "chmod -R u+rwX \"${MYTMP}\" > /dev/null 2>&1; rm -fr \"${MYTMP}\"
>> >>>>>> >
>> >>>>>> /dev/null 2>&1" 0; tar --warning=no-timestamp -C "${MYTMP}" -x &&
>> >>>>>> "${MYTMP}"/ovirt-host-deploy DIALOG/dialect=str:machine
>> >>>>>> DIALOG/customization=bool:True': TimeLimitExceededException: SSH
>> >>>>>> session
>> >>>>>> timeout host 'root@lago-upgrade-from-release-suite-master-host-0'
>> >>>>>> 2018-05-25 13:57:44,259-04 ERROR
>> >>>>>> [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase]
>> >>>>>> (EE-ManagedThreadFactory-engine-Thread-1) [55a7b15b] Timeout during
>> >>>>>> host
>> >>>>>> lago-upgrade-from-release-suite-master-host-0 install: SSH session
>> >>>>>> timeout
>> >>>>>> host 'root@lago-upgrade-from-release-suite-master-host-0'
>> >>>>>>
>> >>>>>> There are no additional logs. SSH to the host timed out. Are we
>> >>>>>> sure that it is an issue caused by Ravi's change?
>> >>>>>
>> >>>>>
>> >>>>> We have some quite strong circumstantial evidence:
>> >>>>> - Issue had affected all engine patches since that patch in a
>> >>>>> similar
>> >>>>> fashion.
>> >>>>> - Prior engine patch [1] passed successfully [2]
>> >>>>> - Other subsequent OST runs without engine patches passed
>> >>>>> successfully
>> >>>>> as well [3].
>> >>>>>
>> >>>>> [1]: https://gerrit.ovirt.org/c/91595/2
>> >>>>> [2]:
>> >>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7777/
>> >>>>> [3]:
>> >>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7778/
>> >>>>>
>> >>>>>
>> >>>>> Please note - the issue is affecting a test that is run by an
>> >>>>> upgrade suite on the post-upgrade system. It has no effect on the
>> >>>>> basic suite. So it probably has to do with some behaviour that is
>> >>>>> specific to upgraded systems.
>> >>>
>> >>>
>> >>> I will try to reproduce later today in dev env, but I agree with
>> >>> Piotr's
>> >>> investigation, engine was not able to connect to the host using SSH
>> >>> and
>> >>> that's why no host-deploy logs were fetched.
>> >>
>> >>
>> >> Lago fetches the logs from the host too (and it can take them from the
>> >> VM image directly if the host is not responsive over SSH). Can we get
>> >> at the host-deploy logs that way?
>> >>
>> >>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> Thanks,
>> >>>>>> Piotr
>> >>>>>>
>> >>>>>> On Sun, May 27, 2018 at 11:21 AM, Martin Perina
>> >>>>>> <mperina@redhat.com>
>> >>>>>> wrote:
>> >>>>>>>
>> >>>>>>> Adding also Piotr to the thread
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On Sun, 27 May 2018, 08:46 Barak Korren, <bkorren@redhat.com>
>> >>>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>> Test failed: [ AddHost (in upgrade-from-release-suite) ]
>> >>>>>>>>
>> >>>>>>>> Link to suspected patches:
>> >>>>>>>> https://gerrit.ovirt.org/#/c/91445/5 - Disable TLS versions < 1.2
>> >>>>>>>> for hosts with cluster level>=4.1
>> >>>>>>>>
>> >>>>>>>> Link to Job:
>> >>>>>>>>
>> >>>>>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7776/
>> >>>>>>>>
>> >>>>>>>> Link to all logs:
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7776/artifact/exported-artifacts/upgrade-from-release-suit-master-el7/test_logs/upgrade-from-release-suite-master/post-002_bootstrap.py/
>> >>>>>>>>
>> >>>>>>>> Error snippet from log:
>> >>>>>>>>
>> >>>>>>>> From nosetests log:
>> >>>>>>>> <error>
>> >>>>>>>>
>> >>>>>>>> AssertionError: False != True after 1200 seconds
>> >>>>>>>>
>> >>>>>>>> </error>
>> >>>>>>>>
>> >>>>>>>> Not finding a host-deploy log in /var/log/ovirt-engine for some
>> >>>>>>>> reason.
>> >>>>>>>> This seems to have caused consistent failures in all other engine
>> >>>>>>>> patches that followed it.
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> --
>> >>>>>>>> Barak Korren
>> >>>>>>>> RHV DevOps team , RHCE, RHCi
>> >>>>>>>> Red Hat EMEA
>> >>>>>>>> redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted
>> >>>>>>
>> >>>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>
>> >>>>
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Martin Perina
>> >>> Associate Manager, Software Engineering
>> >>> Red Hat Czech s.r.o.
>> >>
>> >>
>> >>
>> >>
>> >
>> >
>> >
>> >
>> >
>> > _______________________________________________
>> > Devel mailing list -- devel@ovirt.org
>> > To unsubscribe send an email to devel-leave@ovirt.org
>> > Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>> > oVirt Code of Conduct:
>> > https://www.ovirt.org/community/about/community-guidelines/
>> > List Archives:
>> >
>> > https://lists.ovirt.org/archives/list/devel@ovirt.org/message/QIZ5L4FKII7X5FHQ4OXBBR2SLUIK5C74/
>> >
>
>
>
>
_______________________________________________
Devel mailing list -- devel@ovirt.org
To unsubscribe send an email to devel-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/devel@ovirt.org/message/RDK42TYJKMX3M2DNUFKZO7CGNNOYWMJI/






--
Martin Perina
Associate Manager, Software Engineering
Red Hat Czech s.r.o.