On Wed, May 30, 2018 at 10:44 AM, Barak Korren <bkorren@redhat.com> wrote:


On 30 May 2018 at 10:36, Martin Perina <mperina@redhat.com> wrote:


On Wed, May 30, 2018 at 9:31 AM, Barak Korren <bkorren@redhat.com> wrote:


On 30 May 2018 at 10:24, Martin Perina <mperina@redhat.com> wrote:


On Wed, May 30, 2018 at 8:13 AM, Barak Korren <bkorren@redhat.com> wrote:


On 29 May 2018 at 22:29, Martin Perina <mperina@redhat.com> wrote:
Master revert patches [1] and [2] are merged; 4.2 revert patches [3] and [4] are waiting to be merged.

We will repost the patches to master tomorrow and will continue to investigate the mysterious host-deploy issue.

Btw, upgrade-from-prev-release on master [5] currently fails with:

18:59:31 + cp 'ovirt-system-tests/upgrade-from-prevrelease-suite-master/*.repo' exported-artifacts
18:59:31 cp: cannot stat 'ovirt-system-tests/upgrade-from-prevrelease-suite-master/*.repo': No such file or directory
18:59:31 POST BUILD TASK : FAILURE

So how can we test upgrade from 4.2 to master?

This is not the real issue; the real issue is:

00:00:19.190 /tmp/jenkins6944523151752956846.sh: line 4: ovirt-system-tests/upgrade-from-prevrelease-suite-master/extra_sources: No such file or directory


This is happening because there is no 'upgrade-from-prevrelease-suite-master', the suite to be used is 'upgrade-from-release-suite-master'.
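A job script could fail fast with a clearer message when the requested suite directory is missing, instead of failing later on `extra_sources` or `*.repo`. Below is a minimal sketch, assuming a local `ovirt-system-tests` checkout; the `check_suite` function and its messages are hypothetical and not part of the actual OST or Jenkins job code.

```shell
#!/bin/bash
# Hypothetical helper (not part of OST): verify that a suite directory exists
# before the job reads files such as extra_sources or *.repo from it.
#   $1 = path to an ovirt-system-tests checkout
#   $2 = suite name, e.g. upgrade-from-release-suite-master
check_suite() {
    local ost_dir="$1" suite="$2"
    if [[ -d "${ost_dir}/${suite}" ]]; then
        return 0
    fi
    echo "suite '${suite}' not found; available upgrade suites:" >&2
    local d
    # If nothing matches, the glob stays literal, hence the -d check.
    for d in "${ost_dir}"/upgrade-from-*; do
        if [[ -d "$d" ]]; then
            echo "  ${d##*/}" >&2
        fi
    done
    return 1
}
```

Usage would be something like `check_suite ovirt-system-tests upgrade-from-prevrelease-suite-master || exit 1`, which would have listed `upgrade-from-release-suite-master` as the suite that actually exists.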

Yes, but looking at [6], we are testing an upgrade from 4.1 to master; is that true? If so, how can this work? We only support upgrades between consecutive versions, so it should not be possible to upgrade from 4.1 directly to master...


Well, I wonder where the patch to change that is; it should have been created when 4.2 went GA...
 

So is this table in [7] valid?

Target oVirt version which will be tested.
ENGINE_VERSION   prev release   release
master           4.2            master
---              4.1            4.2
4.1              ---            4.1



It looks messed up... I guess we'll need to 'git blame'...


I don't think the table from the manual job is the best source of truth for which upgrade flows are running; it is a static table that might not be fully up to date
(we should probably update it). Instead, I suggest checking the OST git repo to see which suites are available:

Here is what we have now (from the OST repo):

upgrade-from-prevrelease-suite-4.2   -> upgrade from 4.1 -> 4.2
upgrade-from-release-suite-master    -> upgrade from 4.1 -> master


When master was moved to 4.2, we didn't make sure to add the missing flows:

4.2 stable -> 4.2 latest
4.2 stable -> master 

These are not added automatically; a maintainer needs to actively add them and also decide whether, and which of them, should be added to the CQ for verification.
Like any other suite we have, each one needs a maintainer from the relevant team to keep adding tests to it and to add new flows when needed.

For now, Daniel has sent a patch [1] to add one of the missing flows. I think Asaf from Sandro's team has also started working on one of the flows, but I'm not sure of the details.

[1] https://gerrit.ovirt.org/#/c/91783/




 

Right, IMO the table should look like this:
Target oVirt version which will be tested.
ENGINE_VERSION   prev release   release
master           4.2            master
4.2              4.1            4.2
4.1              ---            4.1


Yeah, but we need OST to reflect that first...

In any case, the 4.2 'from prev release' suite seems to be doing the right thing, so we still need to figure out how and why the issue discussed in this thread is affecting it.


 
And maybe even completely remove the last line, which enables a 4.1 upgrade from 4.1, as we are not going to release any further 4.1 versions...


Yeah, all the 4.1 suites have already been dropped from OST.
 

 





On Tue, May 29, 2018 at 3:42 PM, Barak Korren <bkorren@redhat.com> wrote:


On 29 May 2018 at 16:30, Martin Perina <mperina@redhat.com> wrote:


On Tue, May 29, 2018 at 3:12 PM, Dafna Ron <dron@redhat.com> wrote:
Martin, do you have any updates? Please note that ovirt-engine has been broken for a few days, so perhaps we should stop merging or revert the original change?

Still looking at it; here are partial results:

1. New host installation: never reproduced; a 4.2 host always installs fine on a 4.2 engine
2. Upgrade: never reproduced; upgrading both a 4.1 engine and host to 4.2 always succeeded
3. Reinstallation: once, the host got stuck during reinstallation and the whole reinstallation failed due to a timeout
    - that may be the issue seen in CI, but so far I don't have a reliable reproducer that would let me debug why the host-deploy process on the host gets stuck

Did you try using OST locally? It reproduces consistently with the OST upgrade suite. You can also use the manual job and pass it a URL to any engine build after the marked patch, but there you'll have the same problem as with the CQ job: you won't have logs...

Note that, AFAIK, the process there is:
1. The oVirt 4.1 release is installed.
2. engine-setup runs.
3. The repos are changed to the master repo.
4. The engine is upgraded.
5. Bootstrap is carried out (including the AddHost test that fails).
 



On Tue, May 29, 2018 at 1:26 PM, Piotr Kliczewski <pkliczew@redhat.com> wrote:
+Martin

He is working on it.

Thanks,
Piotr

On Tue, May 29, 2018 at 2:22 PM, Dafna Ron <dron@redhat.com> wrote:
Hi Piotr,

Any update on this?

Thanks.
Dafna


On Mon, May 28, 2018 at 10:59 AM, Piotr Kliczewski <piotr.kliczewski@gmail.com> wrote:
On Mon, May 28, 2018 at 11:41 AM, Barak Korren <bkorren@redhat.com> wrote:
>
>
> On 28 May 2018 at 12:38, Piotr Kliczewski <piotr.kliczewski@gmail.com> wrote:
>>
>> On Mon, May 28, 2018 at 10:57 AM, Barak Korren <bkorren@redhat.com> wrote:
>> > Note: we're now seeing a very similar issue in the 4.2 branch as well
>> > that seems to have been introduced by the following patch:
>>
>> Can you point to specific job so we could take a look at the logs?
>
>
> Whoops, sorry, here:
> http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/2034/
>

Looks like the same issue:

2018-05-28 03:41:03,606-04 ERROR
[org.ovirt.engine.core.uutils.ssh.SSHDialog]
(EE-ManagedThreadFactory-engine-Thread-1) [1244c90f] SSH error running
command root@lago-upgrade-from-prevrelease-suite-4-2-host-0:'umask
0077; MYTMP="$(TMPDIR="${OVIRT_TMPDIR}" mktemp -d -t
ovirt-XXXXXXXXXX)"; trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null
2>&1; rm -fr \"${MYTMP}\" > /dev/null 2>&1" 0; tar
--warning=no-timestamp -C "${MYTMP}" -x &&
"${MYTMP}"/ovirt-host-deploy DIALOG/dialect=str:machine
DIALOG/customization=bool:True': TimeLimitExceededException: SSH
session timeout host
'root@lago-upgrade-from-prevrelease-suite-4-2-host-0'
2018-05-28 03:41:03,606-04 ERROR
[org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (VdsDeploy)
[1244c90f] Error during deploy dialog
2018-05-28 03:41:03,611-04 ERROR
[org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase]
(EE-ManagedThreadFactory-engine-Thread-1) [1244c90f] Timeout during
host lago-upgrade-from-prevrelease-suite-4-2-host-0 install: SSH
session timeout host
'root@lago-upgrade-from-prevrelease-suite-4-2-host-0'
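The command the engine runs on the host over SSH (in the log above) is hard to read on one line. Below it is broken out with comments, wrapped in a hypothetical `run_bundle` function so it can be tried locally against a stub `ovirt-host-deploy`; the function name and the stub are illustrative assumptions, not engine code. Since `tar -x` is given no archive file, the bundle is read from stdin, which presumably means the engine streams it over the SSH channel; the log alone does not show whether the timeout hit during that transfer or during the `ovirt-host-deploy` dialog.

```shell
#!/bin/bash
# Hypothetical wrapper around the command from the engine log.
# The body runs in a subshell "( ... )" so the EXIT trap fires when the
# function returns, much as it fires when the SSH session's shell exits.
run_bundle() (
    # Files created from here on are accessible only by the owner.
    umask 0077
    # Private temp dir; honours OVIRT_TMPDIR, and GNU mktemp falls back
    # to /tmp when it is empty.
    MYTMP="$(TMPDIR="${OVIRT_TMPDIR}" mktemp -d -t ovirt-XXXXXXXXXX)"
    # On exit, make everything removable, then delete the temp dir.
    trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null 2>&1; rm -fr \"${MYTMP}\" > /dev/null 2>&1" 0
    # Unpack the bundle from stdin ("-f -" is explicit here; the original
    # relies on GNU tar's usual default archive being stdin), then start
    # the host-deploy dialog in machine-readable mode.
    tar --warning=no-timestamp -C "${MYTMP}" -x -f - &&
        "${MYTMP}"/ovirt-host-deploy DIALOG/dialect=str:machine \
            DIALOG/customization=bool:True
)
```

For example, `tar -C bundle-dir -cf - ovirt-host-deploy | run_bundle` unpacks a stub into a fresh temp dir, runs it, and leaves nothing behind.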

>>
>>
>> >
>> > https://gerrit.ovirt.org/c/91638/2 - core: Enable only strong ciphers for 4.2 hosts
>> >
>> > On 28 May 2018 at 10:26, Barak Korren <bkorren@redhat.com> wrote:
>> >>
>> >>
>> >>
>> >> On 28 May 2018 at 10:19, Martin Perina <mperina@redhat.com> wrote:
>> >>>
>> >>>
>> >>>
>> >>> On Mon, May 28, 2018 at 9:00 AM, Piotr Kliczewski <pkliczew@redhat.com> wrote:
>> >>>>
>> >>>> Simone,
>> >>>>
>> >>>> What do you think about this failure?
>> >>>>
>> >>>> Thanks,
>> >>>> Piotr
>> >>>>
>> >>>> On Mon, May 28, 2018 at 7:12 AM, Barak Korren <bkorren@redhat.com> wrote:
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> On 27 May 2018 at 14:59, Piotr Kliczewski <pkliczew@redhat.com> wrote:
>> >>>>>>
>> >>>>>> Martin,
>> >>>>>>
>> >>>>>> I can only see:
>> >>>>>>
>> >>>>>> 2018-05-25 13:57:44,255-04 ERROR
>> >>>>>> [org.ovirt.engine.core.uutils.ssh.SSHDialog]
>> >>>>>> (EE-ManagedThreadFactory-engine-Thread-1) [55a7b15b] SSH error running
>> >>>>>> command root@lago-upgrade-from-release-suite-master-host-0:'umask 0077;
>> >>>>>> MYTMP="$(TMPDIR="${OVIRT_TMPDIR}" mktemp -d -t ovirt-XXXXXXXXXX)";
>> >>>>>> trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null 2>&1; rm -fr \"${MYTMP}\" > /dev/null 2>&1" 0;
>> >>>>>> tar --warning=no-timestamp -C "${MYTMP}" -x &&
>> >>>>>> "${MYTMP}"/ovirt-host-deploy DIALOG/dialect=str:machine
>> >>>>>> DIALOG/customization=bool:True': TimeLimitExceededException: SSH
>> >>>>>> session timeout host 'root@lago-upgrade-from-release-suite-master-host-0'
>> >>>>>> 2018-05-25 13:57:44,259-04 ERROR
>> >>>>>> [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase]
>> >>>>>> (EE-ManagedThreadFactory-engine-Thread-1) [55a7b15b] Timeout during
>> >>>>>> host lago-upgrade-from-release-suite-master-host-0 install: SSH session
>> >>>>>> timeout host 'root@lago-upgrade-from-release-suite-master-host-0'
>> >>>>>>
>> >>>>>> There are no additional logs. The SSH session to the host timed out.
>> >>>>>> Are we sure this issue is caused by Ravi's change?
>> >>>>>
>> >>>>>
>> >>>>> We have some quite strong circumstantial evidence:
>> >>>>> - The issue has affected all engine patches since that patch in a
>> >>>>>   similar fashion.
>> >>>>> - The prior engine patch [1] passed successfully [2].
>> >>>>> - Other subsequent OST runs without engine patches passed
>> >>>>>   successfully as well [3].
>> >>>>>
>> >>>>> [1]: https://gerrit.ovirt.org/c/91595/2
>> >>>>> [2]: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7777/
>> >>>>> [3]: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7778/
>> >>>>>
>> >>>>>
>> >>>>> Please note: the issue affects a test that an upgrade suite runs on
>> >>>>> the post-upgrade system. It has no effect on the basic suite, so it
>> >>>>> probably has to do with some behaviour that is specific to upgraded
>> >>>>> systems.
>> >>>
>> >>>
>> >>> I will try to reproduce it later today in a dev env, but I agree with
>> >>> Piotr's investigation: the engine was not able to connect to the host
>> >>> using SSH, and that's why no host-deploy logs were fetched.
>> >>
>> >>
>> >> Lago fetches the logs from the host too (and it can take them from the VM
>> >> image directly if the host is not responsive over SSH); can we get at the
>> >> host-deploy logs that way?
>> >>
>> >>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> Thanks,
>> >>>>>> Piotr
>> >>>>>>
>> >>>>>> On Sun, May 27, 2018 at 11:21 AM, Martin Perina <mperina@redhat.com> wrote:
>> >>>>>>>
>> >>>>>>> Also adding Piotr to the thread
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On Sun, 27 May 2018, 08:46 Barak Korren, <bkorren@redhat.com> wrote:
>> >>>>>>>>
>> >>>>>>>> Test failed: [ AddHost (in upgrade-from-release-suite) ]
>> >>>>>>>>
>> >>>>>>>> Link to suspected patches:
>> >>>>>>>> https://gerrit.ovirt.org/#/c/91445/5 - Disable TLS versions < 1.2 for hosts with cluster level>=4.1
>> >>>>>>>>
>> >>>>>>>> Link to Job:
>> >>>>>>>>
>> >>>>>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7776/
>> >>>>>>>>
>> >>>>>>>> Link to all logs:
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7776/artifact/exported-artifacts/upgrade-from-release-suit-master-el7/test_logs/upgrade-from-release-suite-master/post-002_bootstrap.py/
>> >>>>>>>>
>> >>>>>>>> Error snippet from log:
>> >>>>>>>>
>> >>>>>>>> From nosetests log:
>> >>>>>>>> <error>
>> >>>>>>>>
>> >>>>>>>> AssertionError: False != True after 1200 seconds
>> >>>>>>>>
>> >>>>>>>> </error>
>> >>>>>>>>
>> >>>>>>>> For some reason, I'm not finding a host-deploy log in /var/log/ovirt-engine.
>> >>>>>>>> This patch seems to have caused consistent failures in all the other
>> >>>>>>>> engine patches that followed it.
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> --
>> >>>>>>>> Barak Korren
>> >>>>>>>> RHV DevOps team , RHCE, RHCi
>> >>>>>>>> Red Hat EMEA
>> >>>>>>>> redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted
>> >>>>>>
>> >>>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>
>> >>>>
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Martin Perina
>> >>> Associate Manager, Software Engineering
>> >>> Red Hat Czech s.r.o.
>> >>
>> >>
>> >>
>> >>
>> >
>> >
>> >
>> >
>> >
>> > _______________________________________________
>> > Devel mailing list -- devel@ovirt.org
>> > To unsubscribe send an email to devel-leave@ovirt.org
>> > Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>> > oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
>> > List Archives: https://lists.ovirt.org/archives/list/devel@ovirt.org/message/QIZ5L4FKII7X5FHQ4OXBBR2SLUIK5C74/
>> >
>
>
>
>
_______________________________________________
List Archives: https://lists.ovirt.org/archives/list/devel@ovirt.org/message/RDK42TYJKMX3M2DNUFKZO7CGNNOYWMJI/




























_______________________________________________
List Archives: https://lists.ovirt.org/archives/list/devel@ovirt.org/message/YBNK6RXDHHFPMPBEYSC2UP2C5AREXWHK/




--

Eyal edri


MANAGER

RHV DevOps

EMEA VIRTUALIZATION R&D


Red Hat EMEA

TRIED. TESTED. TRUSTED.
phone: +972-9-7692018
irc: eedri (on #tlv #rhev-dev #rhev-integ)