Martin, do you have any updates? Please note that ovirt-engine has been
broken for a few days, so perhaps we should stop merging or revert the
original change?
Still looking at it, here are partial results:
1. New host installation: never reproduced, a 4.2 host is always installed
fine on a 4.2 engine
2. Upgrade: never reproduced, upgrade of both 4.1 engine and host to 4.2
was always successful
3. Reinstallation: once it happened to me that the host remained stuck
during reinstallation and the whole reinstallation failed due to a timeout
- that may be the issue which can be seen in CI, but so far I don't
have a reliable reproducer to be able to debug why the host-deploy process
on the host gets stuck
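
For anyone who wants to check their own engine.log for the same symptom,
here is a minimal sketch, not an official tool. It assumes one log entry
per line in the format of the excerpts quoted further down in this thread
(timestamp, level, logger, thread, correlation id, message). The function
name and the regular expressions are my own assumptions for illustration.

```python
import re

# Line layout assumed from the engine.log excerpts quoted in this thread:
# "<date> <time>,<ms><tz> <LEVEL> [<logger>] (<thread>) [<correlation id>] <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})[+-]\d{2,4}\s+"
    r"(?P<level>[A-Z]+)\s+\[(?P<logger>[^\]]+)\]\s+\((?P<thread>[^)]+)\)\s+"
    r"\[(?P<corr>[^\]]*)\]\s+(?P<msg>.*)$"
)
# Message text taken from the VdsDeployBase error quoted in this thread.
TIMEOUT = re.compile(r"Timeout during host (?P<host>\S+) install")


def find_deploy_timeouts(lines):
    """Return (timestamp, correlation id, host) for each host-deploy timeout."""
    hits = []
    for line in lines:
        entry = LOG_LINE.match(line)
        if not entry or entry.group("level") != "ERROR":
            continue
        timeout = TIMEOUT.search(entry.group("msg"))
        if timeout:
            hits.append((entry.group("ts"), entry.group("corr"), timeout.group("host")))
    return hits


# Two entries copied from the 4.2 job log quoted in this thread.
sample = [
    "2018-05-28 03:41:03,606-04 ERROR "
    "[org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (VdsDeploy) "
    "[1244c90f] Error during deploy dialog",
    "2018-05-28 03:41:03,611-04 ERROR "
    "[org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] "
    "(EE-ManagedThreadFactory-engine-Thread-1) [1244c90f] Timeout during "
    "host lago-upgrade-from-prevrelease-suite-4-2-host-0 install: SSH "
    "session timeout host "
    "'root@lago-upgrade-from-prevrelease-suite-4-2-host-0'",
]

for hit in find_deploy_timeouts(sample):
    print(hit)
```

The correlation id in each hit can then be used to grep the rest of the
engine.log for everything that belongs to the same host-deploy attempt.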
On Tue, May 29, 2018 at 1:26 PM, Piotr Kliczewski <pkliczew(a)redhat.com>
wrote:
> +Martin
>
> He is working on it.
>
> Thanks,
> Piotr
>
> On Tue, May 29, 2018 at 2:22 PM, Dafna Ron <dron(a)redhat.com> wrote:
>
>> Hi Piotr,
>>
>> Any update on this?
>>
>> Thanks.
>> Dafna
>>
>>
>> On Mon, May 28, 2018 at 10:59 AM, Piotr Kliczewski <
>> piotr.kliczewski(a)gmail.com> wrote:
>>
>>> On Mon, May 28, 2018 at 11:41 AM, Barak Korren <bkorren(a)redhat.com>
>>> wrote:
>>> >
>>> >
>>> > On 28 May 2018 at 12:38, Piotr Kliczewski <piotr.kliczewski(a)gmail.com>
>>> > wrote:
>>> >>
>>> >> On Mon, May 28, 2018 at 10:57 AM, Barak Korren <bkorren(a)redhat.com>
>>> >> wrote:
>>> >> > Note: we're now seeing a very similar issue in the 4.2 branch as well
>>> >> > that seems to have been introduced by the following patch:
>>> >>
>>> >> Can you point to specific job so we could take a look at the logs?
>>> >
>>> >
>>> > Whoops, sorry, here:
>>> >
>>> > http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/2034/
>>> >
>>>
>>> Looks like the same issue:
>>>
>>> 2018-05-28 03:41:03,606-04 ERROR
>>> [org.ovirt.engine.core.uutils.ssh.SSHDialog]
>>> (EE-ManagedThreadFactory-engine-Thread-1) [1244c90f] SSH error running
>>> command root@lago-upgrade-from-prevrelease-suite-4-2-host-0:'umask
>>> 0077; MYTMP="$(TMPDIR="${OVIRT_TMPDIR}" mktemp -d -t
>>> ovirt-XXXXXXXXXX)"; trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null
>>> 2>&1; rm -fr \"${MYTMP}\" > /dev/null 2>&1" 0; tar
>>> --warning=no-timestamp -C "${MYTMP}" -x &&
>>> "${MYTMP}"/ovirt-host-deploy DIALOG/dialect=str:machine
>>> DIALOG/customization=bool:True': TimeLimitExceededException: SSH
>>> session timeout host
>>> 'root@lago-upgrade-from-prevrelease-suite-4-2-host-0'
>>> 2018-05-28 03:41:03,606-04 ERROR
>>> [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (VdsDeploy)
>>> [1244c90f] Error during deploy dialog
>>> 2018-05-28 03:41:03,611-04 ERROR
>>> [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase]
>>> (EE-ManagedThreadFactory-engine-Thread-1) [1244c90f] Timeout during
>>> host lago-upgrade-from-prevrelease-suite-4-2-host-0 install: SSH
>>> session timeout host
>>> 'root@lago-upgrade-from-prevrelease-suite-4-2-host-0'
>>>
>>> >>
>>> >>
>>> >> >
>>> >> >
>>> >> > https://gerrit.ovirt.org/c/91638/2 - core: Enable only strong ciphers
>>> >> > for 4.2 hosts
>>> >> >
>>> >> > On 28 May 2018 at 10:26, Barak Korren <bkorren(a)redhat.com> wrote:
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> On 28 May 2018 at 10:19, Martin Perina <mperina(a)redhat.com> wrote:
>>> >> >>>
>>> >> >>>
>>> >> >>>
>>> >> >>> On Mon, May 28, 2018 at 9:00 AM, Piotr Kliczewski
>>> >> >>> <pkliczew(a)redhat.com>
>>> >> >>> wrote:
>>> >> >>>>
>>> >> >>>> Simone,
>>> >> >>>>
>>> >> >>>> What do you think about this failure?
>>> >> >>>>
>>> >> >>>> Thanks,
>>> >> >>>> Piotr
>>> >> >>>>
>>> >> >>>> On Mon, May 28, 2018 at 7:12 AM, Barak Korren <
>>> bkorren(a)redhat.com>
>>> >> >>>> wrote:
>>> >> >>>>>
>>> >> >>>>>
>>> >> >>>>>
>>> >> >>>>> On 27 May 2018 at 14:59, Piotr Kliczewski <pkliczew(a)redhat.com>
>>> >> >>>>> wrote:
>>> >> >>>>>>
>>> >> >>>>>> Martin,
>>> >> >>>>>>
>>> >> >>>>>> I only can see:
>>> >> >>>>>>
>>> >> >>>>>> 2018-05-25 13:57:44,255-04 ERROR
>>> >> >>>>>> [org.ovirt.engine.core.uutils.ssh.SSHDialog]
>>> >> >>>>>> (EE-ManagedThreadFactory-engine-Thread-1) [55a7b15b] SSH error
>>> >> >>>>>> running command
>>> >> >>>>>> root@lago-upgrade-from-release-suite-master-host-0:'umask 0077;
>>> >> >>>>>> MYTMP="$(TMPDIR="${OVIRT_TMPDIR}" mktemp -d -t ovirt-XXXXXXXXXX)";
>>> >> >>>>>> trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null 2>&1; rm -fr
>>> >> >>>>>> \"${MYTMP}\" > /dev/null 2>&1" 0; tar --warning=no-timestamp -C
>>> >> >>>>>> "${MYTMP}" -x && "${MYTMP}"/ovirt-host-deploy
>>> >> >>>>>> DIALOG/dialect=str:machine DIALOG/customization=bool:True':
>>> >> >>>>>> TimeLimitExceededException: SSH session timeout host
>>> >> >>>>>> 'root@lago-upgrade-from-release-suite-master-host-0'
>>> >> >>>>>> 2018-05-25 13:57:44,259-04 ERROR
>>> >> >>>>>> [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase]
>>> >> >>>>>> (EE-ManagedThreadFactory-engine-Thread-1) [55a7b15b] Timeout during
>>> >> >>>>>> host lago-upgrade-from-release-suite-master-host-0 install: SSH
>>> >> >>>>>> session timeout host
>>> >> >>>>>> 'root@lago-upgrade-from-release-suite-master-host-0'
>>> >> >>>>>>
>>> >> >>>>>> There are no additional logs. The SSH session to the host timed
>>> >> >>>>>> out. Are we sure that this issue is caused by Ravi's change?
>>> >> >>>>>
>>> >> >>>>>
>>> >> >>>>> We have some quite strong circumstantial evidence:
>>> >> >>>>> - The issue has affected all engine patches since that patch in a
>>> >> >>>>> similar fashion.
>>> >> >>>>> - The prior engine patch [1] passed successfully [2].
>>> >> >>>>> - Other subsequent OST runs without engine patches passed
>>> >> >>>>> successfully as well [3].
>>> >> >>>>>
>>> >> >>>>> [1]: https://gerrit.ovirt.org/c/91595/2
>>> >> >>>>> [2]: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7777/
>>> >> >>>>> [3]: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7778/
>>> >> >>>>>
>>> >> >>>>>
>>> >> >>>>> Please note: the issue affects a test that is run by an upgrade
>>> >> >>>>> suite on the post-upgrade system. It has no effect on the basic
>>> >> >>>>> suite. So it probably has to do with some behaviour that is
>>> >> >>>>> specific to upgraded systems.
>>> >> >>>
>>> >> >>>
>>> >> >>> I will try to reproduce it later today in a dev env, but I agree
>>> >> >>> with Piotr's investigation: the engine was not able to connect to
>>> >> >>> the host using SSH, and that's why no host-deploy logs were fetched.
>>> >> >>
>>> >> >>
>>> >> >> Lago fetches the logs from the host too (and it can take them from
>>> >> >> the VM image directly if the host is not responsive over SSH). Can
>>> >> >> we get at the host-deploy logs that way?
>>> >> >>
>>> >> >>
>>> >> >>>>>
>>> >> >>>>>
>>> >> >>>>>
>>> >> >>>>>>
>>> >> >>>>>>
>>> >> >>>>>> Thanks,
>>> >> >>>>>> Piotr
>>> >> >>>>>>
>>> >> >>>>>> On Sun, May 27, 2018 at 11:21 AM, Martin Perina
>>> >> >>>>>> <mperina(a)redhat.com> wrote:
>>> >> >>>>>>>
>>> >> >>>>>>> Adding also Piotr to the thread
>>> >> >>>>>>>
>>> >> >>>>>>>
>>> >> >>>>>>> On Sun, 27 May 2018, 08:46 Barak Korren, <bkorren(a)redhat.com>
>>> >> >>>>>>> wrote:
>>> >> >>>>>>>>
>>> >> >>>>>>>> Test failed: [ AddHost (in upgrade-from-release-suite) ]
>>> >> >>>>>>>>
>>> >> >>>>>>>> Link to suspected patches:
>>> >> >>>>>>>> https://gerrit.ovirt.org/#/c/91445/5 - Disable TLS versions < 1.2
>>> >> >>>>>>>> for hosts with cluster level>=4.1
>>> >> >>>>>>>>
>>> >> >>>>>>>> Link to Job:
>>> >> >>>>>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7776/
>>> >> >>>>>>>>
>>> >> >>>>>>>> Link to all logs:
>>> >> >>>>>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7776/artifact/exported-artifacts/upgrade-from-release-suit-master-el7/test_logs/upgrade-from-release-suite-master/post-002_bootstrap.py/
>>> >> >>>>>>>>
>>> >> >>>>>>>> Error snippet from log:
>>> >> >>>>>>>>
>>> >> >>>>>>>> From nosetests log:
>>> >> >>>>>>>> <error>
>>> >> >>>>>>>>
>>> >> >>>>>>>> AssertionError: False != True after 1200 seconds
>>> >> >>>>>>>>
>>> >> >>>>>>>> </error>
>>> >> >>>>>>>>
>>> >> >>>>>>>> Not finding a host deploy log in /var/log/ovirt-engine for some
>>> >> >>>>>>>> reason.
>>> >> >>>>>>>> This seems to have caused consistent failures in all other engine
>>> >> >>>>>>>> patches that followed it.
>>> >> >>>>>>>>
>>> >> >>>>>>>>
>>> >> >>>>>>>> --
>>> >> >>>>>>>> Barak Korren
>>> >> >>>>>>>> RHV DevOps team , RHCE, RHCi
>>> >> >>>>>>>> Red Hat EMEA
>>> >> >>>>>>>>
>>> >> >>>>>>>> redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted
>>> >> >>>>>>
>>> >> >>>>>>
>>> >> >>>>>
>>> >> >>>>>
>>> >> >>>>>
>>> >> >>>>
>>> >> >>>>
>>> >> >>>
>>> >> >>>
>>> >> >>>
>>> >> >>> --
>>> >> >>> Martin Perina
>>> >> >>> Associate Manager, Software Engineering
>>> >> >>> Red Hat Czech s.r.o.
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >> > _______________________________________________
>>> >> > Devel mailing list -- devel(a)ovirt.org
>>> >> > To unsubscribe send an email to devel-leave(a)ovirt.org
>>> >> > Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>>> >> > oVirt Code of Conduct:
>>> >> > https://www.ovirt.org/community/about/community-guidelines/
>>> >> > List Archives:
>>> >> > https://lists.ovirt.org/archives/list/devel@ovirt.org/message/QIZ5L4FKII7X5FHQ4OXBBR2SLUIK5C74/
>>> >> >
>>> >
>>> >
>>> >
>>> >
>>> _______________________________________________
>>> Devel mailing list -- devel(a)ovirt.org
>>> To unsubscribe send an email to devel-leave(a)ovirt.org
>>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>>> oVirt Code of Conduct:
>>> https://www.ovirt.org/community/about/community-guidelines/
>>> List Archives:
>>> https://lists.ovirt.org/archives/list/devel@ovirt.org/message/RDK42TYJKMX3M2DNUFKZO7CGNNOYWMJI/
>>>
>>
>>
>
--
Martin Perina
Associate Manager, Software Engineering
Red Hat Czech s.r.o.