On Tue, Apr 24, 2018 at 10:27 PM, Ravi Shankar Nori <rnori(a)redhat.com> wrote:
On Tue, Apr 24, 2018 at 10:46 AM, Ravi Shankar Nori <rnori(a)redhat.com>
wrote:
>
> On Tue, Apr 24, 2018 at 10:29 AM, Dan Kenigsberg <danken(a)redhat.com>
> wrote:
>>
>> On Tue, Apr 24, 2018 at 5:09 PM, Ravi Shankar Nori <rnori(a)redhat.com>
>> wrote:
>> >
>> >
>> > On Tue, Apr 24, 2018 at 9:47 AM, Dan Kenigsberg <danken(a)redhat.com>
>> > wrote:
>> >>
>> >> On Tue, Apr 24, 2018 at 4:36 PM, Ravi Shankar Nori <rnori(a)redhat.com>
>> >> wrote:
>> >> >
>> >> >
>> >> > On Tue, Apr 24, 2018 at 9:24 AM, Martin Perina <mperina(a)redhat.com>
>> >> > wrote:
>> >> >>
>> >> >> On Tue, Apr 24, 2018 at 3:17 PM, Ravi Shankar Nori
>> >> >> <rnori(a)redhat.com>
>> >> >> wrote:
>> >> >>>
>> >> >>> On Tue, Apr 24, 2018 at 7:00 AM, Dan Kenigsberg
>> >> >>> <danken(a)redhat.com>
>> >> >>> wrote:
>> >> >>>>
>> >> >>>> Ravi's patch is in, but a similar problem remains, and the test
>> >> >>>> cannot be put back into its place.
>> >> >>>>
>> >> >>>> It seems that while Vdsm was taken down, a couple of getCapsAsync
>> >> >>>> requests queued up. At one point, the host resumed its connection,
>> >> >>>> before the requests had been cleared from the queue. After the
>> >> >>>> host is up, the following tests resume, and at a pseudorandom
>> >> >>>> point in time, an old getCapsAsync request times out and kills our
>> >> >>>> connection.
>> >> >>>>
>> >> >>>> I believe that as long as ANY request is in flight, the monitoring
>> >> >>>> lock should not be released, and the host should not be declared
>> >> >>>> as up.
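To make the quoted proposal concrete: the idea is to keep a per-host count of
in-flight requests and to refuse to release the monitoring lock, or to declare
the host Up, until that count drops back to zero. The sketch below only
illustrates that gating logic; the class and method names are hypothetical and
do not come from the actual HostMonitoring/VdsManager code.

    import java.util.concurrent.atomic.AtomicInteger;
    import java.util.concurrent.locks.ReentrantLock;

    // Sketch only: tracks requests still in flight for one host and refuses
    // to declare the host Up while anything from the old connection could
    // still time out and tear the new connection down.
    class HostRequestTracker {
        private final AtomicInteger inFlight = new AtomicInteger(0);
        private final ReentrantLock monitoringLock = new ReentrantLock();

        void requestSent() {
            inFlight.incrementAndGet();
        }

        // Must be called on a normal reply AND on a timeout, otherwise the
        // counter never returns to zero after a vdsm restart.
        void requestCompleted() {
            inFlight.decrementAndGet();
        }

        boolean tryDeclareUp(Runnable declareUp) {
            if (inFlight.get() > 0) {
                return false;               // keep the host out of Up for now
            }
            monitoringLock.lock();
            try {
                if (inFlight.get() > 0) {   // re-check under the lock
                    return false;
                }
                declareUp.run();
                return true;
            } finally {
                monitoringLock.unlock();
            }
        }
    }

The decrement on timeout is the important part for the failure described
above: the stale getCapsAsync requests queued while Vdsm was down would keep
the host out of Up until they are either answered or expired.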
>> >>
>> >> Would you relate to this analysis ^^^ ?
>> >>
>> >
>> > The HostMonitoring lock issue has been fixed by
>> > https://gerrit.ovirt.org/#/c/90189/
>>
>> Is there still a chance that a host moves to Up while former
>> getCapsAsync requests are still in-flight?
>>
>
> That should not happen. Is there a way to execute/reproduce the failing test
> in a dev env?
>
>>
>> >
>> >>
>> >> >>>>
>> >> >>>>
>> >> >>>
>> >> >>>
>> >> >>> Hi Dan,
>> >> >>>
>> >> >>> Can I have the link to the job on jenkins so I can look at the
>> >> >>> logs?
>> >> >>
>> >> >> http://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/346/
>> >> >>
>> >> >
>> >> >
>> >> > From the logs, the only VDS lock that is being released twice is the
>> >> > VDS_FENCE lock. Opened a BZ [1] for it. Will post a fix.
>> >> >
>> >> > [1] https://bugzilla.redhat.com/show_bug.cgi?id=1571300
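As an aside on the double release: one way to make such a bug fail fast,
instead of only showing up as a duplicate "released" line in the log, is to
guard every lock key with an owner map. This is an illustrative sketch only;
GuardedLockRegistry and its methods are hypothetical names, not the engine's
actual lock manager API.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Sketch only: makes a second release of the same lock key (e.g. a
    // per-VDS fence lock) throw, instead of silently "freeing" a lock that
    // another flow may already have re-acquired.
    class GuardedLockRegistry {
        private final Map<String, Thread> held = new ConcurrentHashMap<>();

        void acquire(String lockKey) {
            Thread previous = held.putIfAbsent(lockKey, Thread.currentThread());
            if (previous != null) {
                throw new IllegalStateException(
                    "Lock " + lockKey + " is already held by " + previous.getName());
            }
        }

        void release(String lockKey) {
            Thread owner = held.remove(lockKey);
            if (owner == null) {
                // The double-release case seen in the logs above.
                throw new IllegalStateException(
                    "Lock " + lockKey + " released twice (or never acquired)");
            }
        }
    }

With a guard like this, the second release of the VDS_FENCE lock would surface
as a stack trace at the offending call site, which should make the incorrect
flow easier to pin down than the duplicate log line alone.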
>> >>
>> >> Can this possibly cause a surprise termination of the host connection?
>> >
>> >
>> > Not sure; from the logs, VDS_FENCE is the only other VDS lock that is
>> > being released.
>
>
> It would be helpful if I could get the exact flow that is failing and also
> the steps, if any, needed to reproduce the issue.

The logs of the failing run have been garbage-collected, so I cannot point
you to the location in the logs. Maybe Alona has a local copy. According to
her analysis, the issue manifests itself when setupNetworks follows a vdsm
restart.
Have you tried running OST with prepare_migration_attachments_ipv6
reintroduced? It should always pass.
Regards,
Dan.