[ovirt-devel] [ OST Failure Report ] [ oVirt 4.2 ] [ 2018-04-04 ] [006_migrations.prepare_migration_attachments_ipv6]

Dan Kenigsberg danken at redhat.com
Wed Apr 25 06:06:39 UTC 2018


On Tue, Apr 24, 2018 at 10:27 PM, Ravi Shankar Nori <rnori at redhat.com> wrote:
>
>
> On Tue, Apr 24, 2018 at 10:46 AM, Ravi Shankar Nori <rnori at redhat.com>
> wrote:
>>
>>
>>
>> On Tue, Apr 24, 2018 at 10:29 AM, Dan Kenigsberg <danken at redhat.com>
>> wrote:
>>>
>>> On Tue, Apr 24, 2018 at 5:09 PM, Ravi Shankar Nori <rnori at redhat.com>
>>> wrote:
>>> >
>>> >
>>> > On Tue, Apr 24, 2018 at 9:47 AM, Dan Kenigsberg <danken at redhat.com>
>>> > wrote:
>>> >>
>>> >> On Tue, Apr 24, 2018 at 4:36 PM, Ravi Shankar Nori <rnori at redhat.com>
>>> >> wrote:
>>> >> >
>>> >> >
>>> >> > On Tue, Apr 24, 2018 at 9:24 AM, Martin Perina <mperina at redhat.com>
>>> >> > wrote:
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> On Tue, Apr 24, 2018 at 3:17 PM, Ravi Shankar Nori
>>> >> >> <rnori at redhat.com>
>>> >> >> wrote:
>>> >> >>>
>>> >> >>>
>>> >> >>>
>>> >> >>> On Tue, Apr 24, 2018 at 7:00 AM, Dan Kenigsberg
>>> >> >>> <danken at redhat.com>
>>> >> >>> wrote:
>>> >> >>>>
>>> >> >>>> Ravi's patch is in, but a similar problem remains, and the test
>>> >> >>>> cannot be put back into its place.
>>> >> >>>>
>>> >> >>>> It seems that while Vdsm was taken down, a couple of getCapsAsync
>>> >> >>>> requests queued up. At one point, the host resumed its connection
>>> >> >>>> before the requests had been cleared from the queue. After the
>>> >> >>>> host is up, the following tests resume, and at a pseudorandom
>>> >> >>>> point in time, an old getCapsAsync request times out and kills our
>>> >> >>>> connection.
>>> >> >>>>
>>> >> >>>> I believe that as long as ANY request is in flight, the monitoring
>>> >> >>>> lock should not be released, and the host should not be declared
>>> >> >>>> Up.
>>> >>
>>> >> Could you comment on this analysis ^^^?
>>> >>
>>> >
>>> > The HostMonitoring lock issue has been fixed by
>>> > https://gerrit.ovirt.org/#/c/90189/
>>>
>>> Is there still a chance that a host moves to Up while former
>>> getCapsAsync requests are still in flight?
>>>
>>
>> Should not happen. Is there a way to execute/reproduce the failing test in
>> a dev environment?
>>
>>>
>>> >
>>> >>
>>> >> >>>>
>>> >> >>>>
>>> >> >>>
>>> >> >>>
>>> >> >>> Hi Dan,
>>> >> >>>
>>> >> >>> Could I have the link to the job on Jenkins so I can look at the
>>> >> >>> logs?
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> http://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/346/
>>> >> >>
>>> >> >
>>> >> >
>>> >> > From the logs, the only VDS lock that is being released twice is the
>>> >> > VDS_FENCE lock. I opened a BZ [1] for it and will post a fix.
>>> >> >
>>> >> > [1] https://bugzilla.redhat.com/show_bug.cgi?id=1571300
>>> >>
>>> >> Can this possibly cause a surprise termination of host connection?
>>> >
>>> >
>>> > Not sure. From the logs, VDS_FENCE is the only other VDS lock that is
>>> > being released twice.
>>
>>
>
> It would be helpful if I could get the exact flow that is failing, and also
> the steps, if any, needed to reproduce the issue.

By now the logs of
http://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/346/
have been garbage-collected, so I cannot point you to the location in
the logs. Maybe Alona has a local copy. According to her analysis, the
issue manifests itself when setupNetworks follows a Vdsm restart.

Have you tried running OST with prepare_migration_attachments_ipv6
reintroduced? It should always pass.
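To make the invariant proposed earlier in the thread concrete (keep the
monitoring lock and do not declare the host Up while any getCapsAsync
request is still in flight), here is a minimal Python sketch. It is purely
illustrative: `HostMonitor` and its methods are hypothetical names, not the
ovirt-engine API, and the model assumes every request is reported as
finished exactly once, whether it gets a reply or times out.

```python
import threading


class HostMonitor:
    """Hypothetical model: the host may only move to Up once no request
    (e.g. a stale getCapsAsync queued while Vdsm was down) is in flight."""

    def __init__(self):
        self._cond = threading.Condition()
        self._in_flight = 0
        self.status = "Connecting"

    def request_sent(self):
        with self._cond:
            self._in_flight += 1

    def request_finished(self):
        # Assumption: called on reply *or* on timeout, so a queued request
        # is always accounted for before the host can be declared Up.
        with self._cond:
            if self._in_flight == 0:
                raise RuntimeError("request_finished without request_sent")
            self._in_flight -= 1
            if self._in_flight == 0:
                self._cond.notify_all()

    def try_mark_up(self):
        # Non-blocking: refuse the transition while anything is in flight.
        with self._cond:
            if self._in_flight:
                return False
            self.status = "Up"
            return True

    def wait_and_mark_up(self, timeout=None):
        # Blocking variant: wait until the in-flight count drops to zero.
        with self._cond:
            if not self._cond.wait_for(lambda: self._in_flight == 0, timeout):
                return False
            self.status = "Up"
            return True
```

In this model, two requests queued while Vdsm was down make `try_mark_up()`
refuse the transition until both have completed or timed out, so a stale
request can no longer time out after the host is already Up.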

Regards,
Dan.

