Master revert patches [1], [2] have been merged; 4.2 revert patches [3], [4] are
waiting to be merged.
We will repost the patches to master tomorrow and will continue to investigate
the mysterious host-deploy issue.
Btw, upgrade-from-prev-release on master [5] currently fails with:
18:59:31 + cp 'ovirt-system-tests/upgrade-from-prevrelease-suite-master/*.repo' exported-artifacts
18:59:31 cp: cannot stat 'ovirt-system-tests/upgrade-from-prevrelease-suite-master/*.repo': No such file or directory
18:59:31 POST BUILD TASK : FAILURE
So how can we test upgrade from 4.2 to master?
This is not the real issue; the real issue is:
00:00:19.190 /tmp/jenkins6944523151752956846.sh: line 4: ovirt-system-tests/upgrade-from-prevrelease-suite-master/extra_sources: No such file or directory
This happens because there is no 'upgrade-from-prevrelease-suite-master' suite;
the suite to use is 'upgrade-from-release-suite-master'.
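For reference, a job script could catch a wrong suite name up front instead of failing later on `extra_sources` or the `*.repo` copy. The following is a minimal sketch, assuming a bash job step; `OST_DIR`, `SUITE` and `check_suite` are hypothetical names, not taken from the actual manual job:

```shell
#!/bin/bash
# Hypothetical pre-flight check for a manual OST job step.
# OST_DIR, SUITE and check_suite are illustrative names, not the real job's.
OST_DIR="${OST_DIR:-ovirt-system-tests}"
SUITE="${SUITE:-upgrade-from-release-suite-master}"

check_suite() {
  # Use the explicit argument if given, otherwise the SUITE default.
  local suite="${1:-${SUITE}}"
  # Fail fast with one clear message instead of a later "No such file or directory".
  if [ ! -d "${OST_DIR}/${suite}" ]; then
    echo "unknown suite '${suite}': ${OST_DIR}/${suite} does not exist" >&2
    return 1
  fi
  echo "using suite ${suite}"
}
```

Calling `check_suite "${SUITE}" || exit 1` before the steps that read `extra_sources` or copy `*.repo` files would turn the two errors above into a single explicit message.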
Martin
[1] https://gerrit.ovirt.org/91741
[2] https://gerrit.ovirt.org/91742
[3] https://gerrit.ovirt.org/91744
[4] https://gerrit.ovirt.org/91745
[5] https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/2758/console
On Tue, May 29, 2018 at 3:42 PM, Barak Korren <bkorren(a)redhat.com> wrote:
>
>
> On 29 May 2018 at 16:30, Martin Perina <mperina(a)redhat.com> wrote:
>
>>
>>
>> On Tue, May 29, 2018 at 3:12 PM, Dafna Ron <dron(a)redhat.com> wrote:
>>
>>> Martin, do you have any updates? Please note that ovirt-engine has been
>>> broken for a few days, so perhaps we should stop merging or revert the
>>> original change?
>>>
>>
>> Still looking at it, here are partial results:
>>
>> 1. New host installation: never reproduced; a 4.2 host always installs
>> fine on a 4.2 engine
>> 2. Upgrade: never reproduced; upgrading both the 4.1 engine and host to
>> 4.2 always succeeded
>> 3. Reinstallation: once, the host got stuck during reinstallation and
>> the whole reinstallation failed due to a timeout
>> - that may be the issue seen in CI, but so far I don't have a
>> reliable reproducer to debug why the host-deploy process on the
>> host gets stuck
>>
>
> Did you try using OST locally? It reproduces consistently with the OST
> upgrade suite. You can also use the manual job and pass a URL to any engine
> build beyond the marked patch, but there you'll have the same issue as with
> the CQ job: you won't have logs...
>
> Note, AFAIK the process that happens there is:
> 1. The oVirt 4.1 release is installed.
> 2. engine-setup runs.
> 3. The repos are changed to the master repo.
> 4. The engine is upgraded.
> 5. Bootstrap is carried out (including AddHost, which fails).
>
>
>>
>>
>>>
>>> On Tue, May 29, 2018 at 1:26 PM, Piotr Kliczewski <pkliczew(a)redhat.com>
>>> wrote:
>>>
>>>> +Martin
>>>>
>>>> He is working on it.
>>>>
>>>> Thanks,
>>>> Piotr
>>>>
>>>> On Tue, May 29, 2018 at 2:22 PM, Dafna Ron <dron(a)redhat.com>
>>>> wrote:
>>>>
>>>>> Hi Piotr,
>>>>>
>>>>> Any update on this?
>>>>>
>>>>> Thanks.
>>>>> Dafna
>>>>>
>>>>>
>>>>> On Mon, May 28, 2018 at 10:59 AM, Piotr Kliczewski <piotr.kliczewski(a)gmail.com> wrote:
>>>>>
>>>>>> On Mon, May 28, 2018 at 11:41 AM, Barak Korren <bkorren(a)redhat.com>
>>>>>> wrote:
>>>>>> >
>>>>>> >
>>>>>> > On 28 May 2018 at 12:38, Piotr Kliczewski <piotr.kliczewski(a)gmail.com>
>>>>>> > wrote:
>>>>>> >>
>>>>>> >> On Mon, May 28, 2018 at 10:57 AM, Barak Korren <bkorren(a)redhat.com> wrote:
>>>>>> >> > Note: we're now seeing a very similar issue in the 4.2 branch as well
>>>>>> >> > that seems to have been introduced by the following patch:
>>>>>> >>
>>>>>> >> Can you point to a specific job so we can take a look at the logs?
>>>>>> >
>>>>>> >
>>>>>> > Whoops, sorry, here:
>>>>>> >
>>>>>> > http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/2034/
>>>>>> >
>>>>>>
>>>>>> Looks like the same issue:
>>>>>>
>>>>>> 2018-05-28 03:41:03,606-04 ERROR [org.ovirt.engine.core.uutils.ssh.SSHDialog] (EE-ManagedThreadFactory-engine-Thread-1) [1244c90f] SSH error running command root@lago-upgrade-from-prevrelease-suite-4-2-host-0:'umask 0077; MYTMP="$(TMPDIR="${OVIRT_TMPDIR}" mktemp -d -t ovirt-XXXXXXXXXX)"; trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null 2>&1; rm -fr \"${MYTMP}\" > /dev/null 2>&1" 0; tar --warning=no-timestamp -C "${MYTMP}" -x && "${MYTMP}"/ovirt-host-deploy DIALOG/dialect=str:machine DIALOG/customization=bool:True': TimeLimitExceededException: SSH session timeout host 'root@lago-upgrade-from-prevrelease-suite-4-2-host-0'
>>>>>> 2018-05-28 03:41:03,606-04 ERROR [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (VdsDeploy) [1244c90f] Error during deploy dialog
>>>>>> 2018-05-28 03:41:03,611-04 ERROR [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (EE-ManagedThreadFactory-engine-Thread-1) [1244c90f] Timeout during host lago-upgrade-from-prevrelease-suite-4-2-host-0 install: SSH session timeout host 'root@lago-upgrade-from-prevrelease-suite-4-2-host-0'
>>>>>>
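The command in the log above is easier to follow split onto lines. As a rough local illustration (not the engine's code), the sketch below pipes a tarball into the same shell fragment; it assumes GNU tar and GNU mktemp, and the stand-in `ovirt-host-deploy` script it packs is hypothetical. It shows that the bundle is unpacked into a private temporary directory that the exit trap removes when the shell exits.

```shell
#!/bin/bash
# Local re-creation of the bootstrap fragment seen in the SSHDialog log line.
# Assumption: the engine streams a tarball containing ovirt-host-deploy on the
# SSH session's stdin; here a hypothetical stand-in script replaces the real one.
set -eo pipefail

payload="$(mktemp -d)"
cat > "${payload}/ovirt-host-deploy" <<'EOF'
#!/bin/sh
echo "stand-in host-deploy called with: $*"
EOF
chmod +x "${payload}/ovirt-host-deploy"

# The single-quoted body below is the logged command, split onto lines.
out="$(tar -C "${payload}" -cf - ovirt-host-deploy | bash -c '
umask 0077
# Private temp dir; an empty OVIRT_TMPDIR makes GNU mktemp -t fall back to /tmp.
MYTMP="$(TMPDIR="${OVIRT_TMPDIR}" mktemp -d -t ovirt-XXXXXXXXXX)"
# Exit trap: the unpacked bundle is removed when this shell exits.
trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null 2>&1; rm -fr \"${MYTMP}\" > /dev/null 2>&1" 0
tar --warning=no-timestamp -C "${MYTMP}" -x &&
"${MYTMP}"/ovirt-host-deploy DIALOG/dialect=str:machine DIALOG/customization=bool:True
')"
rm -rf "${payload}"
echo "${out}"
```

In the real flow the stand-in is replaced by the actual host-deploy bundle and the pipe by the engine's SSH session, which is where the TimeLimitExceededException above was raised.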
>>>>>> >>
>>>>>> >>
>>>>>> >> >
>>>>>> >> >
>>>>>> >> > https://gerrit.ovirt.org/c/91638/2 - core: Enable only strong ciphers for 4.2 hosts
>>>>>> >> >
>>>>>> >> > On 28 May 2018 at 10:26, Barak Korren <bkorren(a)redhat.com>
>>>>>> >> > wrote:
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >> On 28 May 2018 at 10:19, Martin Perina <mperina(a)redhat.com>
>>>>>> >> >> wrote:
>>>>>> >> >>>
>>>>>> >> >>>
>>>>>> >> >>>
>>>>>> >> >>> On Mon, May 28, 2018 at 9:00 AM, Piotr Kliczewski
>>>>>> >> >>> <pkliczew(a)redhat.com>
>>>>>> >> >>> wrote:
>>>>>> >> >>>>
>>>>>> >> >>>> Simone,
>>>>>> >> >>>>
>>>>>> >> >>>> What do you think about this failure?
>>>>>> >> >>>>
>>>>>> >> >>>> Thanks,
>>>>>> >> >>>> Piotr
>>>>>> >> >>>>
>>>>>> >> >>>> On Mon, May 28, 2018 at 7:12 AM, Barak Korren <bkorren(a)redhat.com>
>>>>>> >> >>>> wrote:
>>>>>> >> >>>>>
>>>>>> >> >>>>>
>>>>>> >> >>>>>
>>>>>> >> >>>>> On 27 May 2018 at 14:59, Piotr Kliczewski <pkliczew(a)redhat.com>
>>>>>> >> >>>>> wrote:
>>>>>> >> >>>>>>
>>>>>> >> >>>>>> Martin,
>>>>>> >> >>>>>>
>>>>>> >> >>>>>> I can only see:
>>>>>> >> >>>>>>
>>>>>> >> >>>>>> 2018-05-25 13:57:44,255-04 ERROR [org.ovirt.engine.core.uutils.ssh.SSHDialog] (EE-ManagedThreadFactory-engine-Thread-1) [55a7b15b] SSH error running command root@lago-upgrade-from-release-suite-master-host-0:'umask 0077; MYTMP="$(TMPDIR="${OVIRT_TMPDIR}" mktemp -d -t ovirt-XXXXXXXXXX)"; trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null 2>&1; rm -fr \"${MYTMP}\" > /dev/null 2>&1" 0; tar --warning=no-timestamp -C "${MYTMP}" -x && "${MYTMP}"/ovirt-host-deploy DIALOG/dialect=str:machine DIALOG/customization=bool:True': TimeLimitExceededException: SSH session timeout host 'root@lago-upgrade-from-release-suite-master-host-0'
>>>>>> >> >>>>>> 2018-05-25 13:57:44,259-04 ERROR [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (EE-ManagedThreadFactory-engine-Thread-1) [55a7b15b] Timeout during host lago-upgrade-from-release-suite-master-host-0 install: SSH session timeout host 'root@lago-upgrade-from-release-suite-master-host-0'
>>>>>> >> >>>>>>
>>>>>> >> >>>>>> There are no additional logs, only the SSH timeout to the host. Are we
>>>>>> >> >>>>>> sure that this issue is caused by Ravi's change?
>>>>>> >> >>>>>
>>>>>> >> >>>>>
>>>>>> >> >>>>> We have some quite strong circumstantial evidence:
>>>>>> >> >>>>> - The issue has affected all engine patches since that patch in a
>>>>>> >> >>>>> similar fashion.
>>>>>> >> >>>>> - The prior engine patch [1] passed successfully [2].
>>>>>> >> >>>>> - Other subsequent OST runs without engine patches passed
>>>>>> >> >>>>> successfully as well [3].
>>>>>> >> >>>>>
>>>>>> >> >>>>> [1]: https://gerrit.ovirt.org/c/91595/2
>>>>>> >> >>>>> [2]: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7777/
>>>>>> >> >>>>> [3]: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7778/
>>>>>> >> >>>>>
>>>>>> >> >>>>>
>>>>>> >> >>>>> Please note: the issue affects a test that is run by an upgrade
>>>>>> >> >>>>> suite on the post-upgrade system. It has no effect on the basic
>>>>>> >> >>>>> suite, so it probably has to do with some behaviour that is specific
>>>>>> >> >>>>> to upgraded systems.
>>>>>> >> >>>
>>>>>> >> >>>
>>>>>> >> >>> I will try to reproduce it later today in a dev env, but I agree
>>>>>> >> >>> with Piotr's investigation: the engine was not able to connect to the
>>>>>> >> >>> host using SSH, and that's why no host-deploy logs were fetched.
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >> Lago fetches the logs from the host too (and it can take them from the
>>>>>> >> >> VM image directly if the host is not responsive over SSH). Can we get
>>>>>> >> >> at the host-deploy logs that way?
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >>>>>
>>>>>> >> >>>>>
>>>>>> >> >>>>>
>>>>>> >> >>>>>>
>>>>>> >> >>>>>>
>>>>>> >> >>>>>> Thanks,
>>>>>> >> >>>>>> Piotr
>>>>>> >> >>>>>>
>>>>>> >> >>>>>> On Sun, May 27, 2018 at 11:21 AM, Martin Perina
>>>>>> >> >>>>>> <mperina(a)redhat.com>
>>>>>> >> >>>>>> wrote:
>>>>>> >> >>>>>>>
>>>>>> >> >>>>>>> Adding Piotr to the thread as well
>>>>>> >> >>>>>>>
>>>>>> >> >>>>>>>
>>>>>> >> >>>>>>> On Sun, 27 May 2018, 08:46 Barak Korren, <bkorren(a)redhat.com>
>>>>>> >> >>>>>>> wrote:
>>>>>> >> >>>>>>>>
>>>>>> >> >>>>>>>> Test failed: [ AddHost (in upgrade-from-release-suite) ]
>>>>>> >> >>>>>>>>
>>>>>> >> >>>>>>>> Link to suspected patches:
>>>>>> >> >>>>>>>>
>>>>>> >> >>>>>>>> https://gerrit.ovirt.org/#/c/91445/5 - Disable TLS versions < 1.2
>>>>>> >> >>>>>>>> for hosts with cluster level>=4.1
>>>>>> >> >>>>>>>>
>>>>>> >> >>>>>>>> Link to Job:
>>>>>> >> >>>>>>>>
>>>>>> >> >>>>>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7776/
>>>>>> >> >>>>>>>>
>>>>>> >> >>>>>>>> Link to all logs:
>>>>>> >> >>>>>>>>
>>>>>> >> >>>>>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7776/artifact/exported-artifacts/upgrade-from-release-suit-master-el7/test_logs/upgrade-from-release-suite-master/post-002_bootstrap.py/
>>>>>> >> >>>>>>>>
>>>>>> >> >>>>>>>> Error snippet from log:
>>>>>> >> >>>>>>>>
>>>>>> >> >>>>>>>> From the nosetests log:
>>>>>> >> >>>>>>>> <error>
>>>>>> >> >>>>>>>>
>>>>>> >> >>>>>>>> AssertionError: False != True after 1200 seconds
>>>>>> >> >>>>>>>>
>>>>>> >> >>>>>>>> </error>
>>>>>> >> >>>>>>>>
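The 1200-second figure suggests the AddHost check is polled until a deadline rather than evaluated once. A minimal sketch of that pattern in bash follows; `assert_true_within`, `host_is_up` and `STATUS_FILE` are hypothetical names chosen for illustration and are not the OST test library's code:

```shell
#!/bin/bash
# Hypothetical shell analogue of a "wait until true or time out" assertion.
# It only illustrates the semantics behind "False != True after N seconds";
# it is not the OST/ovirtlago implementation.
assert_true_within() {
  local timeout="$1" interval="$2"
  shift 2
  local deadline=$(( $(date +%s) + timeout ))
  while true; do
    # Succeed as soon as the probe command succeeds.
    if "$@"; then
      return 0
    fi
    # Give up once the deadline has passed, reporting a similar message.
    if [ "$(date +%s)" -ge "${deadline}" ]; then
      echo "AssertionError: False != True after ${timeout} seconds" >&2
      return 1
    fi
    sleep "${interval}"
  done
}

# Hypothetical probe standing in for "the host reached the Up state".
host_is_up() {
  [ -f "${STATUS_FILE}" ] && grep -qx up "${STATUS_FILE}"
}
```

With a host that never leaves installation, `assert_true_within 1200 3 host_is_up` keeps polling for the full 1200 seconds before failing, which matches the shape of the error above.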
>>>>>> >> >>>>>>>> Not finding a host-deploy log in /var/log/ovirt-engine for some
>>>>>> >> >>>>>>>> reason. This patch seems to have caused consistent failures in all
>>>>>> >> >>>>>>>> other engine patches that followed it.
>>>>>> >> >>>>>>>>
>>>>>> >> >>>>>>>>
>>>>>> >> >>>>>>>> --
>>>>>> >> >>>>>>>> Barak Korren
>>>>>> >> >>>>>>>> RHV DevOps team , RHCE, RHCi
>>>>>> >> >>>>>>>> Red Hat EMEA
>>>>>> >> >>>>>>>>
>>>>>> >> >>>>>>>> redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted
>>>>>> >> >>>>>>
>>>>>> >> >>>>>>
>>>>>> >> >>>>>
>>>>>> >> >>>>>
>>>>>> >> >>>>>
>>>>>> >> >>>>> --
>>>>>> >> >>>>> Barak Korren
>>>>>> >> >>>>> RHV DevOps team , RHCE, RHCi
>>>>>> >> >>>>> Red Hat EMEA
>>>>>> >> >>>>>
>>>>>> >> >>>>
>>>>>> >> >>>>
>>>>>> >> >>>
>>>>>> >> >>>
>>>>>> >> >>>
>>>>>> >> >>> --
>>>>>> >> >>> Martin Perina
>>>>>> >> >>> Associate Manager, Software Engineering
>>>>>> >> >>> Red Hat Czech s.r.o.
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >> --
>>>>>> >> >> Barak Korren
>>>>>> >> >> RHV DevOps team , RHCE, RHCi
>>>>>> >> >> Red Hat EMEA
>>>>>> >> >>
>>>>>> >> >
>>>>>> >> >
>>>>>> >> >
>>>>>> >> >
>>>>>> >> > --
>>>>>> >> > Barak Korren
>>>>>> >> > RHV DevOps team , RHCE, RHCi
>>>>>> >> > Red Hat EMEA
>>>>>> >> >
>>>>>> >> >
>>>>>> >> > _______________________________________________
>>>>>> >> > Devel mailing list -- devel(a)ovirt.org
>>>>>> >> > To unsubscribe send an email to devel-leave(a)ovirt.org
>>>>>> >> > Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>>>>>> >> > oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
>>>>>> >> > List Archives: https://lists.ovirt.org/archives/list/devel@ovirt.org/message/QIZ5L4FKII7X5FHQ4OXBBR2SLUIK5C74/
>>>>>> >> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > --
>>>>>> > Barak Korren
>>>>>> > RHV DevOps team , RHCE, RHCi
>>>>>> > Red Hat EMEA
>>>>>> >
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>>
>> --
>> Martin Perina
>> Associate Manager, Software Engineering
>> Red Hat Czech s.r.o.
>>
>
>
>
> --
> Barak Korren
> RHV DevOps team , RHCE, RHCi
> Red Hat EMEA
>
>
--
Martin Perina
Associate Manager, Software Engineering
Red Hat Czech s.r.o.