On Wed, May 30, 2018 at 8:13 AM, Barak Korren <bkorren(a)redhat.com> wrote:
>
>
> On 29 May 2018 at 22:29, Martin Perina <mperina(a)redhat.com> wrote:
>
>> Master revert patches [1] and [2] have been merged; 4.2 revert patches [3]
>> and [4] are waiting to be merged.
>>
>> We will repost the patches to master tomorrow and will continue to
>> investigate the mysterious host-deploy issue.
>>
>> Btw, upgrade-from-prev-release on master [5] currently fails with:
>>
>> 18:59:31 + cp 'ovirt-system-tests/upgrade-from-prevrelease-suite-master/*.repo' exported-artifacts
>> 18:59:31 cp: cannot stat 'ovirt-system-tests/upgrade-from-prevrelease-suite-master/*.repo': No such file or directory
>> 18:59:31 POST BUILD TASK : FAILURE
>>
>> So how can we test upgrade from 4.2 to master?
>>
>
> This is not the real issue; the real issue is:
>
> *00:00:19.190* /tmp/jenkins6944523151752956846.sh: line 4: ovirt-system-tests/upgrade-from-prevrelease-suite-master/extra_sources: No such file or directory
>
>
>
> This is happening because there is no 'upgrade-from-prevrelease-suite-master'
> suite; the suite to be used is 'upgrade-from-release-suite-master'.
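The failure comes down to a suite directory that does not exist, which the job only notices when `cp` (or the `extra_sources` read) fails. A minimal Python sketch of an early check, assuming the job copies `*.repo` files from a suite directory into an artifacts directory; `copy_suite_repos` is a hypothetical helper, not part of the OST scripts:

```python
import glob
import os
import shutil


def copy_suite_repos(suite_dir: str, dest_dir: str) -> list:
    """Copy every *.repo file from suite_dir into dest_dir (hypothetical helper).

    Fails fast with a clear message when the suite directory itself is
    missing, instead of letting an unmatched pattern reach cp literally.
    """
    if not os.path.isdir(suite_dir):
        raise FileNotFoundError(f"suite directory does not exist: {suite_dir}")
    # glob.glob returns [] when nothing matches; an unmatched bash glob is
    # instead passed through as the literal '*.repo' unless nullglob is set.
    repo_files = sorted(glob.glob(os.path.join(suite_dir, "*.repo")))
    os.makedirs(dest_dir, exist_ok=True)
    copied = []
    for path in repo_files:
        shutil.copy(path, dest_dir)
        copied.append(os.path.basename(path))
    return copied
```

Called with the non-existent 'upgrade-from-prevrelease-suite-master' path, this raises immediately and names the missing directory.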
>
Yes, but looking at [6], we are testing an upgrade from 4.1 to master, is
that true? If so, how can this work? We support upgrades only between
consecutive versions, so it should not be possible to upgrade from
4.1 directly to master...
Well, I wonder where the patch to change that is; it should have been created
when 4.2 went GA...
So is this table in [7] valid?
*Target oVirt version which will be tested.*
ENGINE_VERSION   prev release   release
master           4.2            master
---              4.1            4.2
4.1              ---            4.1
It looks messed up... I guess we'll need to 'git blame'...
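The rule above (upgrades only between consecutive versions) can be written down as a one-step check. A minimal sketch, assuming the ordered list 4.1, 4.2, master, where treating master as the version directly after 4.2 is an assumption for illustration; `VERSION_ORDER`, `is_supported_upgrade` and `SUPPORTED_PATHS` are hypothetical names, not taken from OST or the Jenkins job:

```python
# Assumed release order for illustration only; "master" is treated as the
# version that directly follows the latest GA release (4.2).
VERSION_ORDER = ["4.1", "4.2", "master"]


def is_supported_upgrade(source: str, target: str) -> bool:
    """Return True only if target directly follows source in VERSION_ORDER."""
    if source not in VERSION_ORDER or target not in VERSION_ORDER:
        return False
    return VERSION_ORDER.index(target) - VERSION_ORDER.index(source) == 1


# Enumerate every pair and keep only the direct, one-step upgrades.
SUPPORTED_PATHS = [
    (src, dst)
    for src in VERSION_ORDER
    for dst in VERSION_ORDER
    if is_supported_upgrade(src, dst)
]
```

Under this ordering, the 4.1 to master path that [6] appears to exercise is rejected, while 4.2 to master is accepted.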
>
>>
>> Martin
>>
>>
>> [1] https://gerrit.ovirt.org/91741
>> [2] https://gerrit.ovirt.org/91742
>> [3] https://gerrit.ovirt.org/91744
>> [4] https://gerrit.ovirt.org/91745
>> [5] https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/2758/console
>>
>
[6] https://github.com/oVirt/ovirt-system-tests/blob/master/upgrade-from-release-suite-master/pre-reposync-config.repo
[7] https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/build?delay=0sec
>
>>
>> On Tue, May 29, 2018 at 3:42 PM, Barak Korren <bkorren(a)redhat.com> wrote:
>>
>>>
>>>
>>> On 29 May 2018 at 16:30, Martin Perina <mperina(a)redhat.com> wrote:
>>>
>>>>
>>>>
>>>> On Tue, May 29, 2018 at 3:12 PM, Dafna Ron <dron(a)redhat.com> wrote:
>>>>
>>>>> Martin, do you have any updates? Please note that ovirt-engine has
>>>>> been broken for a few days, so perhaps we should stop merging or revert
>>>>> the original change?
>>>>>
>>>>
>>>> Still looking at it; here are partial results:
>>>>
>>>> 1. New host installation: never reproduced; a 4.2 host is always
>>>> installed fine on a 4.2 engine.
>>>> 2. Upgrade: never reproduced; upgrading both a 4.1 engine and host to
>>>> 4.2 was always successful.
>>>> 3. Reinstallation: once, the host got stuck during reinstallation and
>>>> the whole reinstallation failed due to a timeout
>>>> - that may be the issue seen in CI, but so far I don't have a
>>>> reliable reproducer to debug why the host-deploy process on the
>>>> host gets stuck
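Since the reinstallation hang has been seen only once, one way to hunt for a reproducer is to repeat the operation many times under a hard timeout and count the hangs. A minimal sketch, assuming the operation can be triggered by some command line; `count_hangs` is a hypothetical helper, and any command passed to it is a stand-in, not an actual host-deploy invocation:

```python
import subprocess


def count_hangs(cmd: list, attempts: int, timeout_s: float) -> dict:
    """Run cmd repeatedly and classify each run as ok, failed, or hung.

    A run that exceeds timeout_s is killed by subprocess.run and counted as
    a hang, the same symptom CI reports as an SSH session timeout.
    """
    stats = {"ok": 0, "failed": 0, "hung": 0}
    for _ in range(attempts):
        try:
            result = subprocess.run(cmd, timeout=timeout_s,
                                    stdout=subprocess.DEVNULL,
                                    stderr=subprocess.DEVNULL)
        except subprocess.TimeoutExpired:
            stats["hung"] += 1
            continue
        stats["ok" if result.returncode == 0 else "failed"] += 1
    return stats
```

Separating "failed" from "hung" matters here: the CI symptom is a hang, not a non-zero exit.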
>>>>
>>>
>>> Did you try using OST locally? It reproduces consistently with the OST
>>> upgrade suite. You can also use the manual job and pass a URL to any engine
>>> build beyond the marked patch, but there you'll have the same issue as with
>>> the CQ job: you won't have logs...
>>>
>>> Note, the process that happens there is, AFAIK:
>>> 1. The oVirt 4.1 release is installed.
>>> 2. engine-setup runs.
>>> 3. The repos are changed to the master repo.
>>> 4. The engine is upgraded.
>>> 5. Bootstrap is carried out (including AddHost, which fails).
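The flow above is a fixed sequence in which only the last step fails, so the useful signal is which step breaks first. A minimal sketch of such a sequence runner; the step names mirror the list above, but `run_steps` and the stub callables are hypothetical and do not reflect how OST actually drives the suite:

```python
from typing import Callable, List, Optional, Tuple


def run_steps(steps: List[Tuple[str, Callable[[], bool]]]) -> Optional[str]:
    """Run steps in order; return the name of the first failing step, or None."""
    for name, step in steps:
        if not step():
            # Later steps depend on earlier ones, so stop at the first failure.
            return name
    return None


# Stub steps standing in for the real suite actions (hypothetical).
upgrade_flow = [
    ("install 4.1 release", lambda: True),
    ("engine-setup", lambda: True),
    ("switch repos to master", lambda: True),
    ("upgrade engine", lambda: True),
    ("bootstrap / AddHost", lambda: False),  # the step reported as failing
]
```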
>>>
>>>
>>>>
>>>>
>>>>>
>>>>> On Tue, May 29, 2018 at 1:26 PM, Piotr Kliczewski <
>>>>> pkliczew(a)redhat.com> wrote:
>>>>>
>>>>>> +Martin
>>>>>>
>>>>>> He is working on it.
>>>>>>
>>>>>> Thanks,
>>>>>> Piotr
>>>>>>
>>>>>> On Tue, May 29, 2018 at 2:22 PM, Dafna Ron <dron(a)redhat.com> wrote:
>>>>>>
>>>>>>> Hi Piotr,
>>>>>>>
>>>>>>> Any update on this?
>>>>>>>
>>>>>>> Thanks.
>>>>>>> Dafna
>>>>>>>
>>>>>>>
>>>>>>> On Mon, May 28, 2018 at 10:59 AM, Piotr Kliczewski <
>>>>>>> piotr.kliczewski(a)gmail.com> wrote:
>>>>>>>
>>>>>>>> On Mon, May 28, 2018 at 11:41 AM, Barak Korren <bkorren(a)redhat.com> wrote:
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > On 28 May 2018 at 12:38, Piotr Kliczewski <
>>>>>>>> piotr.kliczewski(a)gmail.com>
>>>>>>>> > wrote:
>>>>>>>> >>
>>>>>>>> >> On Mon, May 28, 2018 at 10:57 AM, Barak Korren <bkorren(a)redhat.com> wrote:
>>>>>>>> >> > Note: we're now seeing a very similar issue in the 4.2 branch as well,
>>>>>>>> >> > which seems to have been introduced by the following patch:
>>>>>>>> >>
>>>>>>>> >> Can you point to a specific job so we can take a look at the logs?
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > Whoops, sorry, here:
>>>>>>>> >
>>>>>>>> > http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/2034/
>>>>>>>> >
>>>>>>>>
>>>>>>>> Looks like the same issue:
>>>>>>>>
>>>>>>>> 2018-05-28 03:41:03,606-04 ERROR [org.ovirt.engine.core.uutils.ssh.SSHDialog] (EE-ManagedThreadFactory-engine-Thread-1) [1244c90f] SSH error running command root@lago-upgrade-from-prevrelease-suite-4-2-host-0:'umask 0077; MYTMP="$(TMPDIR="${OVIRT_TMPDIR}" mktemp -d -t ovirt-XXXXXXXXXX)"; trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null 2>&1; rm -fr \"${MYTMP}\" > /dev/null 2>&1" 0; tar --warning=no-timestamp -C "${MYTMP}" -x && "${MYTMP}"/ovirt-host-deploy DIALOG/dialect=str:machine DIALOG/customization=bool:True': TimeLimitExceededException: SSH session timeout host 'root@lago-upgrade-from-prevrelease-suite-4-2-host-0'
>>>>>>>> 2018-05-28 03:41:03,606-04 ERROR [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (VdsDeploy) [1244c90f] Error during deploy dialog
>>>>>>>> 2018-05-28 03:41:03,611-04 ERROR [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (EE-ManagedThreadFactory-engine-Thread-1) [1244c90f] Timeout during host lago-upgrade-from-prevrelease-suite-4-2-host-0 install: SSH session timeout host 'root@lago-upgrade-from-prevrelease-suite-4-2-host-0'
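The command in the log unpacks a tarball streamed over SSH into a private temporary directory (`umask 0077` plus `mktemp -d`), registers an exit `trap` that removes that directory, and then runs `ovirt-host-deploy` from it; the timeout means that last step never returned. A minimal Python sketch of the same pattern, meant only as a reading aid for the log and not the engine's code; `run_bundle` and its `runner` callback are hypothetical:

```python
import io
import os
import shutil
import tarfile
import tempfile
from typing import Callable


def run_bundle(tar_bytes: bytes, entry: str,
               runner: Callable[[str], int]) -> int:
    """Extract tar_bytes into a private temp dir, run entry, always clean up."""
    # mkdtemp creates the directory with mode 0o700, like umask 0077 + mktemp -d.
    workdir = tempfile.mkdtemp(prefix="ovirt-")
    try:
        with tarfile.open(fileobj=io.BytesIO(tar_bytes)) as tar:
            if hasattr(tarfile, "data_filter"):
                # Use the safer 'data' filter where this Python supports it.
                tar.extractall(workdir, filter="data")
            else:
                tar.extractall(workdir)
        # Equivalent of "${MYTMP}"/ovirt-host-deploy: run the bundle's entry
        # point. If this blocks, only the caller's timeout will notice.
        return runner(os.path.join(workdir, entry))
    finally:
        # Plays the role of the shell `trap ... 0`: remove the dir on any exit.
        shutil.rmtree(workdir, ignore_errors=True)
```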
>>>>>>>>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> >
>>>>>>>> >> >
>>>>>>>> >> > https://gerrit.ovirt.org/c/91638/2 - core: Enable only strong ciphers for 4.2 hosts
>>>>>>>> >> >
>>>>>>>> >> > On 28 May 2018 at 10:26, Barak Korren <bkorren(a)redhat.com> wrote:
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >> On 28 May 2018 at 10:19, Martin Perina <mperina(a)redhat.com> wrote:
>>>>>>>> >> >>>
>>>>>>>> >> >>>
>>>>>>>> >> >>>
>>>>>>>> >> >>> On Mon, May 28, 2018 at 9:00 AM, Piotr Kliczewski <pkliczew(a)redhat.com> wrote:
>>>>>>>> >> >>>>
>>>>>>>> >> >>>> Simone,
>>>>>>>> >> >>>>
>>>>>>>> >> >>>> What do you think about this failure?
>>>>>>>> >> >>>>
>>>>>>>> >> >>>> Thanks,
>>>>>>>> >> >>>> Piotr
>>>>>>>> >> >>>>
>>>>>>>> >> >>>> On Mon, May 28, 2018 at 7:12 AM, Barak Korren <bkorren(a)redhat.com> wrote:
>>>>>>>> >> >>>>>
>>>>>>>> >> >>>>>
>>>>>>>> >> >>>>>
>>>>>>>> >> >>>>> On 27 May 2018 at 14:59, Piotr Kliczewski <pkliczew(a)redhat.com> wrote:
>>>>>>>> >> >>>>>>
>>>>>>>> >> >>>>>> Martin,
>>>>>>>> >> >>>>>>
>>>>>>>> >> >>>>>> I can only see:
>>>>>>>> >> >>>>>>
>>>>>>>> >> >>>>>> 2018-05-25 13:57:44,255-04 ERROR [org.ovirt.engine.core.uutils.ssh.SSHDialog] (EE-ManagedThreadFactory-engine-Thread-1) [55a7b15b] SSH error running command root@lago-upgrade-from-release-suite-master-host-0:'umask 0077; MYTMP="$(TMPDIR="${OVIRT_TMPDIR}" mktemp -d -t ovirt-XXXXXXXXXX)"; trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null 2>&1; rm -fr \"${MYTMP}\" > /dev/null 2>&1" 0; tar --warning=no-timestamp -C "${MYTMP}" -x && "${MYTMP}"/ovirt-host-deploy DIALOG/dialect=str:machine DIALOG/customization=bool:True': TimeLimitExceededException: SSH session timeout host 'root@lago-upgrade-from-release-suite-master-host-0'
>>>>>>>> >> >>>>>> 2018-05-25 13:57:44,259-04 ERROR [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (EE-ManagedThreadFactory-engine-Thread-1) [55a7b15b] Timeout during host lago-upgrade-from-release-suite-master-host-0 install: SSH session timeout host 'root@lago-upgrade-from-release-suite-master-host-0'
>>>>>>>> >> >>>>>>
>>>>>>>> >> >>>>>> There are no additional logs; the SSH session to the host timed out.
>>>>>>>> >> >>>>>> Are we sure that this issue is caused by Ravi's change?
>>>>>>>> >> >>>>>
>>>>>>>> >> >>>>>
>>>>>>>> >> >>>>> We have some quite strong circumstantial evidence:
>>>>>>>> >> >>>>> - The issue has affected all engine patches since that patch
>>>>>>>> >> >>>>>   in a similar fashion.
>>>>>>>> >> >>>>> - The prior engine patch [1] passed successfully [2].
>>>>>>>> >> >>>>> - Other subsequent OST runs without engine patches passed
>>>>>>>> >> >>>>>   successfully as well [3].
>>>>>>>> >> >>>>>
>>>>>>>> >> >>>>> [1]: https://gerrit.ovirt.org/c/91595/2
>>>>>>>> >> >>>>> [2]: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7777/
>>>>>>>> >> >>>>> [3]: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7778/
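Part of the evidence above (the prior engine patch passed, every engine patch since the suspect one fails) amounts to bracketing the culprit in the ordered queue of tested changes. A minimal bisection sketch of that search, assuming results are monotonic (everything passes before the culprit and fails from it onward); `first_bad` and the `passes` callback are hypothetical and are not the change queue's actual implementation:

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(changes: Sequence[T], passes: Callable[[T], bool]) -> int:
    """Return the index of the first failing change, or len(changes) if none fail.

    Assumes monotonic results: once a change fails, all later ones fail too.
    Needs O(log n) calls to passes(), each of which would be one test run.
    """
    lo, hi = 0, len(changes)
    while lo < hi:
        mid = (lo + hi) // 2
        if passes(changes[mid]):
            lo = mid + 1  # culprit is after mid
        else:
            hi = mid      # mid fails, so the culprit is at or before mid
    return lo
```

The monotonicity assumption is exactly what the non-engine runs in [3] help confirm: if unrelated changes also failed, the bracket would not point at a single engine patch.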
>>>>>>>> >> >>>>>
>>>>>>>> >> >>>>>
>>>>>>>> >> >>>>> Please note: the issue affects a test that is run by an upgrade
>>>>>>>> >> >>>>> suite on the post-upgrade system. It has no effect on the basic
>>>>>>>> >> >>>>> suite, so it probably has to do with some behaviour that is
>>>>>>>> >> >>>>> specific to upgraded systems.
>>>>>>>> >> >>>
>>>>>>>> >> >>>
>>>>>>>> >> >>> I will try to reproduce it later today in a dev env, but I agree
>>>>>>>> >> >>> with Piotr's investigation: the engine was not able to connect to
>>>>>>>> >> >>> the host using SSH, and that's why no host-deploy logs were fetched.
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >> Lago fetches the logs from the host too (and it can take them from
>>>>>>>> >> >> the VM image directly if the host is not responsive over SSH); can
>>>>>>>> >> >> we get at the host-deploy logs that way?
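The suggestion above is a fallback: collect logs over SSH first and, if the host does not answer, read them from the VM disk image instead. A minimal sketch of that control flow; `collect_logs`, `via_ssh` and `via_image` are hypothetical callables, not Lago's API:

```python
from typing import Callable, List, Tuple


def collect_logs(paths: List[str],
                 via_ssh: Callable[[List[str]], List[str]],
                 via_image: Callable[[List[str]], List[str]]) -> Tuple[str, List[str]]:
    """Try SSH first; on any OSError (TimeoutError included), fall back to the image."""
    try:
        return "ssh", via_ssh(paths)
    except OSError:
        # An unresponsive host typically surfaces as a timeout or connection
        # error, but its disk image can still be read offline.
        return "image", via_image(paths)
```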
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >>>>>
>>>>>>>> >> >>>>>
>>>>>>>> >> >>>>>
>>>>>>>> >> >>>>>>
>>>>>>>> >> >>>>>>
>>>>>>>> >> >>>>>> Thanks,
>>>>>>>> >> >>>>>> Piotr
>>>>>>>> >> >>>>>>
>>>>>>>> >> >>>>>> On Sun, May 27, 2018 at 11:21 AM, Martin Perina <mperina(a)redhat.com> wrote:
>>>>>>>> >> >>>>>>>
>>>>>>>> >> >>>>>>> Adding Piotr to the thread as well
>>>>>>>> >> >>>>>>>
>>>>>>>> >> >>>>>>>
>>>>>>>> >> >>>>>>> On Sun, 27 May 2018, 08:46 Barak Korren, <bkorren(a)redhat.com> wrote:
>>>>>>>> >> >>>>>>>>
>>>>>>>> >> >>>>>>>> Test failed: [ AddHost (in upgrade-from-release-suite) ]
>>>>>>>> >> >>>>>>>>
>>>>>>>> >> >>>>>>>> Link to suspected patches:
>>>>>>>> >> >>>>>>>>
>>>>>>>> >> >>>>>>>> https://gerrit.ovirt.org/#/c/91445/5 - Disable TLS versions < 1.2 for hosts with cluster level>=4.1
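The suspected patch ties the minimum TLS version to the cluster level. The engine is written in Java, so this is not its code; it is only a Python illustration of the policy stated in the patch title, using the standard `ssl` module's `SSLContext.minimum_version` (Python 3.7+). Representing the cluster level as a `(major, minor)` tuple and the name `host_tls_context` are assumptions:

```python
import ssl
from typing import Tuple


def host_tls_context(cluster_level: Tuple[int, int]) -> ssl.SSLContext:
    """Client TLS context whose minimum protocol depends on the cluster level.

    Policy from the patch title: disable TLS versions < 1.2 for hosts with
    cluster level >= 4.1. Lower levels keep the library's default minimum.
    """
    ctx = ssl.create_default_context()
    if cluster_level >= (4, 1):
        ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```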
>>>>>>>> >> >>>>>>>>
>>>>>>>> >> >>>>>>>> Link to Job:
>>>>>>>> >> >>>>>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7776/
>>>>>>>> >> >>>>>>>>
>>>>>>>> >> >>>>>>>> Link to all logs:
>>>>>>>> >> >>>>>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7776/artifact/exported-artifacts/upgrade-from-release-suit-master-el7/test_logs/upgrade-from-release-suite-master/post-002_bootstrap.py/
>>>>>>>> >> >>>>>>>>
>>>>>>>> >> >>>>>>>> Error snippet from log:
>>>>>>>> >> >>>>>>>>
>>>>>>>> >> >>>>>>>> From nosetests log:
>>>>>>>> >> >>>>>>>> <error>
>>>>>>>> >> >>>>>>>>
>>>>>>>> >> >>>>>>>> AssertionError: False != True after 1200 seconds
>>>>>>>> >> >>>>>>>>
>>>>>>>> >> >>>>>>>> </error>
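The assertion message reads like the output of a polling helper that re-evaluates a condition until it becomes true or a deadline passes. A minimal sketch of such a helper that reproduces the message format seen in the log; `assert_true_within` is a hypothetical name, not the actual OST or ovirtlago function:

```python
import time
from typing import Callable


def assert_true_within(condition: Callable[[], bool], timeout_s: float,
                       interval_s: float = 3.0) -> None:
    """Poll condition() until it returns True, or fail after timeout_s seconds."""
    deadline = time.monotonic() + timeout_s
    result = condition()
    while not result:
        if time.monotonic() >= deadline:
            # Same shape as the log line: "False != True after 1200 seconds".
            raise AssertionError(f"{result} != True after {timeout_s} seconds")
        time.sleep(interval_s)
        result = condition()
```

In the AddHost case the condition would be something like "the host reached the Up state"; with host-deploy hanging on SSH it never does, so the test reports only the timeout and not its cause.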
>>>>>>>> >> >>>>>>>>
>>>>>>>> >> >>>>>>>> For some reason, no host-deploy log is found in /var/log/ovirt-engine.
>>>>>>>> >> >>>>>>>> This seems to have caused consistent failures in all other engine
>>>>>>>> >> >>>>>>>> patches that followed it.
>>>>>>>> >> >>>>>>>>
>>>>>>>> >> >>>>>>>>
>>>>>>>> >> >>>>>>>> --
>>>>>>>> >> >>>>>>>> Barak Korren
>>>>>>>> >> >>>>>>>> RHV DevOps team
, RHCE, RHCi
>>>>>>>> >> >>>>>>>> Red Hat EMEA
>>>>>>>> >> >>>>>>>>
redhat.com |
TRIED. TESTED. TRUSTED. |
>>>>>>>>
redhat.com/trusted
>>>>>>>> >> >>>>>>
>>>>>>>> >> >>>>>>
>>>>>>>> >> >>>>>
>>>>>>>> >> >>>>>
>>>>>>>> >> >>>>>
>>>>>>>> >> >>>>> --
>>>>>>>> >> >>>>> Barak Korren
>>>>>>>> >> >>>>> RHV DevOps team , RHCE,
RHCi
>>>>>>>> >> >>>>> Red Hat EMEA
>>>>>>>> >> >>>>>
>>>>>>>> >> >>>>
>>>>>>>> >> >>>>
>>>>>>>> >> >>>
>>>>>>>> >> >>>
>>>>>>>> >> >>>
>>>>>>>> >> >>> --
>>>>>>>> >> >>> Martin Perina
>>>>>>>> >> >>> Associate Manager, Software Engineering
>>>>>>>> >> >>> Red Hat Czech s.r.o.
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >> --
>>>>>>>> >> >> Barak Korren
>>>>>>>> >> >> RHV DevOps team , RHCE, RHCi
>>>>>>>> >> >> Red Hat EMEA
>>>>>>>> >> >>
>>>>>>>> >> >
>>>>>>>> >> >
>>>>>>>> >> >
>>>>>>>> >> >
>>>>>>>> >> > --
>>>>>>>> >> > Barak Korren
>>>>>>>> >> > RHV DevOps team , RHCE, RHCi
>>>>>>>> >> > Red Hat EMEA
>>>>>>>> >> >
>>>>>>>> >> >
>>>>>>>> >> >
_______________________________________________
>>>>>>>> >> > Devel mailing list -- devel(a)ovirt.org
>>>>>>>> >> > To unsubscribe send an email to
devel-leave(a)ovirt.org
>>>>>>>> >> > Privacy Statement:
https://www.ovirt.org/site/privacy-policy/
>>>>>>>> >> > oVirt Code of Conduct:
>>>>>>>> >> >
https://www.ovirt.org/community/about/community-guidelines/
>>>>>>>> >> > List Archives:
>>>>>>>> >> >
>>>>>>>> >> >
https://lists.ovirt.org/archives/list/devel@ovirt.org/messag
>>>>>>>> e/QIZ5L4FKII7X5FHQ4OXBBR2SLUIK5C74/
>>>>>>>> >> >
>>>>>>>> >
>>>>>>>> >
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > --
>>>>>>>> > Barak Korren
>>>>>>>> > RHV DevOps team , RHCE, RHCi
>>>>>>>> > Red Hat EMEA
>>>>>>>> >
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Martin Perina
>>>> Associate Manager, Software Engineering
>>>> Red Hat Czech s.r.o.
>>>>
>>>
>>>
>>>
>>> --
>>> Barak Korren
>>> RHV DevOps team , RHCE, RHCi
>>> Red Hat EMEA
>>>
>>>
>>
>>
>>
>> --
>> Martin Perina
>> Associate Manager, Software Engineering
>> Red Hat Czech s.r.o.
>>
>
>
>
> --
> Barak Korren
> RHV DevOps team , RHCE, RHCi
> Red Hat EMEA
>
>
--
Martin Perina
Associate Manager, Software Engineering
Red Hat Czech s.r.o.