On Tue, Apr 24, 2018 at 10:46 AM, Ravi Shankar Nori <rnori(a)redhat.com>
wrote:
On Tue, Apr 24, 2018 at 10:29 AM, Dan Kenigsberg <danken(a)redhat.com>
wrote:
> On Tue, Apr 24, 2018 at 5:09 PM, Ravi Shankar Nori <rnori(a)redhat.com>
> wrote:
> >
> >
> > On Tue, Apr 24, 2018 at 9:47 AM, Dan Kenigsberg <danken(a)redhat.com>
> > wrote:
> >>
> >> On Tue, Apr 24, 2018 at 4:36 PM, Ravi Shankar Nori <rnori(a)redhat.com>
> >> wrote:
> >> >
> >> >
> >> > On Tue, Apr 24, 2018 at 9:24 AM, Martin Perina <mperina(a)redhat.com>
> >> > wrote:
> >> >>
> >> >>
> >> >>
> >> >> On Tue, Apr 24, 2018 at 3:17 PM, Ravi Shankar Nori <rnori(a)redhat.com>
> >> >> wrote:
> >> >>>
> >> >>>
> >> >>>
> >> >>> On Tue, Apr 24, 2018 at 7:00 AM, Dan Kenigsberg <danken(a)redhat.com>
> >> >>> wrote:
> >> >>>>
> >> >>>> Ravi's patch is in, but a similar problem remains, and the test
> >> >>>> cannot be put back into its place.
> >> >>>>
> >> >>>> It seems that while Vdsm was taken down, a couple of getCapsAsync
> >> >>>> requests queued up. At one point, the host resumed its connection
> >> >>>> before the requests had been cleared from the queue. After the host
> >> >>>> is up, the following tests resume, and at a pseudorandom point in
> >> >>>> time, an old getCapsAsync request times out and kills our connection.
> >> >>>>
> >> >>>> I believe that as long as ANY request is in flight, the monitoring
> >> >>>> lock should not be released, and the host should not be declared as
> >> >>>> up.
> >>
> >> Could you comment on this analysis ^^^ ?
> >>
> >
> > The HostMonitoring lock issue has been fixed by
> > https://gerrit.ovirt.org/#/c/90189/
>
> Is there still a chance that a host moves to Up while former
> getCapsAsync requests are still in-flight?
>
>
That should not happen. Is there a way to run and reproduce the failing test
in a dev environment?
> >
> >>
> >> >>>>
> >> >>>>
> >> >>>
> >> >>>
> >> >>> Hi Dan,
> >> >>>
> >> >>> Can I have the link to the job on Jenkins so I can look at the logs?
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> http://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/346/
> >> >>
> >> >
> >> >
> >> > From the logs, the only VDS lock that is being released twice is the
> >> > VDS_FENCE lock. I opened a BZ [1] for it and will post a fix.
> >> >
> >> > [1] https://bugzilla.redhat.com/show_bug.cgi?id=1571300
> >>
> >> Can this possibly cause a surprise termination of host connection?
> >
> >
> > Not sure. From the logs, VDS_FENCE is the only other VDS lock that is
> > being released.
>
It would be helpful if I could get the exact flow that is failing, and also
the steps, if any, needed to reproduce the issue.
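
For reference, the rule proposed earlier in the thread (do not declare the host
up while ANY request is still in flight) could be modeled roughly as below.
This is only an illustrative sketch, not the engine code: HostMonitor, the
request ids, and the status strings are all hypothetical names, and the real
HostMonitoring lock is not modeled here.

```python
import threading


class HostMonitor:
    """Hypothetical model: a host may be declared Up only when no
    previously sent request is still awaiting a reply or a timeout."""

    def __init__(self):
        self._lock = threading.Lock()   # guards the two fields below
        self._in_flight = set()         # ids of requests not yet finished
        self.status = "Connecting"

    def request_sent(self, request_id):
        with self._lock:
            self._in_flight.add(request_id)

    def request_finished(self, request_id):
        # Called on a reply, an error, or a timeout.
        with self._lock:
            self._in_flight.discard(request_id)

    def try_mark_up(self):
        # Refuse to move to Up while any old request is pending, so a
        # late timeout cannot hit a host already considered healthy.
        with self._lock:
            if self._in_flight:
                return False
            self.status = "Up"
            return True


monitor = HostMonitor()
monitor.request_sent("caps-1")
monitor.request_sent("caps-2")
first = monitor.try_mark_up()       # refused: two requests still pending
monitor.request_finished("caps-1")
monitor.request_finished("caps-2")
second = monitor.try_mark_up()      # allowed: queue has drained
```

Whether the engine should wait for stale requests to drain or cancel them
outright on reconnect is a separate design question this sketch does not settle.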