
On Tue, May 29, 2018 at 3:12 PM, Dafna Ron <dron@redhat.com> wrote:
Martin, do you have any updates? Please note that ovirt-engine has been broken for a few days, so perhaps we should stop merging or revert the original change?
Still looking at it; here are partial results:
1. New host installation: never reproduced. A 4.2 host is always installed fine on a 4.2 engine.
2. Upgrade: never reproduced. Upgrading both a 4.1 engine and host to 4.2 was always successful.
3. Reinstallation: it happened to me once that the host got stuck during reinstallation and the whole reinstallation failed due to a timeout. That may be the issue seen in CI, but so far I don't have a reliable reproducer to debug why the host-deploy process on the host gets stuck.
On Tue, May 29, 2018 at 1:26 PM, Piotr Kliczewski <pkliczew@redhat.com> wrote:
+Martin
He is working on it.
Thanks, Piotr
On Tue, May 29, 2018 at 2:22 PM, Dafna Ron <dron@redhat.com> wrote:
Hi Piotr,
Any update on this?
Thanks. Dafna
On Mon, May 28, 2018 at 10:59 AM, Piotr Kliczewski <piotr.kliczewski@gmail.com> wrote:
On Mon, May 28, 2018 at 11:41 AM, Barak Korren <bkorren@redhat.com> wrote:
On 28 May 2018 at 12:38, Piotr Kliczewski <piotr.kliczewski@gmail.com> wrote:
On Mon, May 28, 2018 at 10:57 AM, Barak Korren <bkorren@redhat.com> wrote:
> Note: we're now seeing a very similar issue in the 4.2 branch as well that
> seems to have been introduced by the following patch:
Can you point to specific job so we could take a look at the logs?
Whoops, sorry, here: http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/2034/
Looks like the same issue:
2018-05-28 03:41:03,606-04 ERROR [org.ovirt.engine.core.uutils.ssh.SSHDialog] (EE-ManagedThreadFactory-engine-Thread-1) [1244c90f] SSH error running command root@lago-upgrade-from-prevrelease-suite-4-2-host-0:'umask 0077; MYTMP="$(TMPDIR="${OVIRT_TMPDIR}" mktemp -d -t ovirt-XXXXXXXXXX)"; trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null 2>&1; rm -fr \"${MYTMP}\" > /dev/null 2>&1" 0; tar --warning=no-timestamp -C "${MYTMP}" -x && "${MYTMP}"/ovirt-host-deploy DIALOG/dialect=str:machine DIALOG/customization=bool:True': TimeLimitExceededException: SSH session timeout host 'root@lago-upgrade-from-prevrelease-suite-4-2-host-0'
2018-05-28 03:41:03,606-04 ERROR [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (VdsDeploy) [1244c90f] Error during deploy dialog
2018-05-28 03:41:03,611-04 ERROR [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (EE-ManagedThreadFactory-engine-Thread-1) [1244c90f] Timeout during host lago-upgrade-from-prevrelease-suite-4-2-host-0 install: SSH session timeout host 'root@lago-upgrade-from-prevrelease-suite-4-2-host-0'
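For anyone who wants to check other job logs for the same symptom, a rough sketch along these lines could pull the SSH session timeouts out of an engine.log. The regular expression is inferred only from the lines quoted above, it assumes one log entry per line, and the function name `find_ssh_timeouts` is made up for illustration; it is not part of oVirt or OST.

```python
import re

# Pattern inferred from the engine.log lines quoted in this thread
# (an assumption, not an official description of the log format):
#   <timestamp> ERROR [<logger>] (<thread>) [<correlation id>] <message>
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}[+-]\d{2})\s+"
    r"ERROR\s+\[(?P<logger>[^\]]+)\]\s+"
    r"\((?P<thread>[^)]+)\)\s+"
    r"\[(?P<corr>[^\]]*)\]\s+"
    r"(?P<msg>.*)$"
)


def find_ssh_timeouts(lines):
    """Return (timestamp, correlation id, logger) for each ERROR line
    whose message mentions an SSH session timeout."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and "SSH session timeout" in match.group("msg"):
            hits.append(
                (match.group("ts"), match.group("corr"), match.group("logger"))
            )
    return hits
```

Feeding it the lines of an engine.log (for example via `open(path)`) gives one tuple per timeout, and the correlation id can then be used to look up the rest of that deploy flow in the same log.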
> https://gerrit.ovirt.org/c/91638/2 - core: Enable only strong ciphers for
> 4.2 hosts
>
> On 28 May 2018 at 10:26, Barak Korren <bkorren@redhat.com> wrote:
>>
>> On 28 May 2018 at 10:19, Martin Perina <mperina@redhat.com> wrote:
>>>
>>> On Mon, May 28, 2018 at 9:00 AM, Piotr Kliczewski <pkliczew@redhat.com>
>>> wrote:
>>>>
>>>> Simone,
>>>>
>>>> What do you think about this failure?
>>>>
>>>> Thanks,
>>>> Piotr
>>>>
>>>> On Mon, May 28, 2018 at 7:12 AM, Barak Korren <bkorren@redhat.com>
>>>> wrote:
>>>>>
>>>>> On 27 May 2018 at 14:59, Piotr Kliczewski <pkliczew@redhat.com>
>>>>> wrote:
>>>>>>
>>>>>> Martin,
>>>>>>
>>>>>> I only can see:
>>>>>>
>>>>>> 2018-05-25 13:57:44,255-04 ERROR
>>>>>> [org.ovirt.engine.core.uutils.ssh.SSHDialog]
>>>>>> (EE-ManagedThreadFactory-engine-Thread-1) [55a7b15b] SSH error running
>>>>>> command root@lago-upgrade-from-release-suite-master-host-0:'umask 0077;
>>>>>> MYTMP="$(TMPDIR="${OVIRT_TMPDIR}" mktemp -d -t ovirt-XXXXXXXXXX)"; trap
>>>>>> "chmod -R u+rwX \"${MYTMP}\" > /dev/null 2>&1; rm -fr \"${MYTMP}\" >
>>>>>> /dev/null 2>&1" 0; tar --warning=no-timestamp -C "${MYTMP}" -x &&
>>>>>> "${MYTMP}"/ovirt-host-deploy DIALOG/dialect=str:machine
>>>>>> DIALOG/customization=bool:True': TimeLimitExceededException: SSH session
>>>>>> timeout host 'root@lago-upgrade-from-release-suite-master-host-0'
>>>>>> 2018-05-25 13:57:44,259-04 ERROR
>>>>>> [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase]
>>>>>> (EE-ManagedThreadFactory-engine-Thread-1) [55a7b15b] Timeout during host
>>>>>> lago-upgrade-from-release-suite-master-host-0 install: SSH session timeout
>>>>>> host 'root@lago-upgrade-from-release-suite-master-host-0'
>>>>>>
>>>>>> There are no additional logs. SSH to host timeout. Are we sure that it
>>>>>> is an issue caused by Ravi's change?
>>>>>
>>>>> We have some quite strong circumstantial evidence:
>>>>> - Issue had affected all engine patches since that patch in a similar
>>>>>   fashion.
>>>>> - Prior engine patch [1] passed successfully [2]
>>>>> - Other subsequent OST runs without engine patches passed successfully
>>>>>   as well [3].
>>>>>
>>>>> [1]: https://gerrit.ovirt.org/c/91595/2
>>>>> [2]: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7777/
>>>>> [3]: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7778/
>>>>>
>>>>> Please note - the issue is affecting a test that is run by an upgrade
>>>>> suite on the post-upgrade system. It has no effect on the basic suite.
>>>>> So it probably has to do with some behaviour that is specific to upgraded
>>>>> systems.
>>>
>>> I will try to reproduce later today in dev env, but I agree with Piotr's
>>> investigation, engine was not able to connect to the host using SSH and
>>> that's why no host-deploy logs were fetched.
>>
>> Lago fetches the logs from the host too (and it can take them from the VM
>> image directly if the host is not responsive over SSH). Can we get at the
>> host-deploy logs that way?
>>
>>>>>>
>>>>>> Thanks,
>>>>>> Piotr
>>>>>>
>>>>>> On Sun, May 27, 2018 at 11:21 AM, Martin Perina <mperina@redhat.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> Adding also Piotr to the thread
>>>>>>>
>>>>>>> On Sun, 27 May 2018, 08:46 Barak Korren, <bkorren@redhat.com> wrote:
>>>>>>>>
>>>>>>>> Test failed: [ AddHost (in upgrade-from-release-suite) ]
>>>>>>>>
>>>>>>>> Link to suspected patches:
>>>>>>>> https://gerrit.ovirt.org/#/c/91445/5 - Disable TLS versions < 1.2
>>>>>>>> for hosts with cluster level>=4.1
>>>>>>>>
>>>>>>>> Link to Job:
>>>>>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7776/
>>>>>>>>
>>>>>>>> Link to all logs:
>>>>>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7776/artifact/exported-artifacts/upgrade-from-release-suit-master-el7/test_logs/upgrade-from-release-suite-master/post-002_bootstrap.py/
>>>>>>>>
>>>>>>>> Error snippet from log:
>>>>>>>>
>>>>>>>> From nosetests log:
>>>>>>>> <error>
>>>>>>>>
>>>>>>>> AssertionError: False != True after 1200 seconds
>>>>>>>>
>>>>>>>> </error>
>>>>>>>>
>>>>>>>> Not finding a host deploy log in /var/log/ovirt-engine for some
>>>>>>>> reason.
>>>>>>>> This seems to have caused consistent failure in all other engine
>>>>>>>> patches that followed it.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Barak Korren
>>>>>>>> RHV DevOps team , RHCE, RHCi
>>>>>>>> Red Hat EMEA
>>>>>>>> redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted
>>>
>>> --
>>> Martin Perina
>>> Associate Manager, Software Engineering
>>> Red Hat Czech s.r.o.
>
> _______________________________________________
> Devel mailing list -- devel@ovirt.org
> To unsubscribe send an email to devel-leave@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/devel@ovirt.org/message/QIZ5L4FKII7X5FHQ4OXBBR2SLUIK5C74/