On Tue, Apr 24, 2018 at 5:09 PM, Ravi Shankar Nori <rnori(a)redhat.com> wrote:
On Tue, Apr 24, 2018 at 9:47 AM, Dan Kenigsberg <danken(a)redhat.com> wrote:
>
> On Tue, Apr 24, 2018 at 4:36 PM, Ravi Shankar Nori <rnori(a)redhat.com>
> wrote:
> >
> >
> > On Tue, Apr 24, 2018 at 9:24 AM, Martin Perina <mperina(a)redhat.com>
> > wrote:
> >>
> >>
> >>
> >> On Tue, Apr 24, 2018 at 3:17 PM, Ravi Shankar Nori <rnori(a)redhat.com>
> >> wrote:
> >>>
> >>>
> >>>
> >>> On Tue, Apr 24, 2018 at 7:00 AM, Dan Kenigsberg <danken(a)redhat.com>
> >>> wrote:
> >>>>
> >>>> Ravi's patch is in, but a similar problem remains, and the test
> >>>> cannot
> >>>> be put back into its place.
> >>>>
> >>>> It seems that while Vdsm was taken down, a couple of getCapsAsync
> >>>> requests queued up. At one point, the host resumed its connection
> >>>> before the requests had been cleared from the queue. After the host
> >>>> is up, the following tests resume, and at a pseudorandom point in
> >>>> time, an old getCapsAsync request times out and kills our connection.
> >>>>
> >>>> I believe that as long as ANY request is in flight, the monitoring
> >>>> lock should not be released, and the host should not be declared as
> >>>> up.
>
> Could you comment on this analysis ^^^ ?
>
The HostMonitoring lock issue has been fixed by
https://gerrit.ovirt.org/#/c/90189/
Is there still a chance that a host moves to Up while earlier
getCapsAsync requests are still in flight?
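
To make the proposal quoted above concrete (do not declare the host Up while any request is still in flight), here is a minimal sketch in Python. It is not oVirt engine code: the class InFlightGate and its methods are hypothetical names invented for illustration, and it only assumes that every request reports completion exactly once, whether by reply, error, or timeout.

```python
import threading


class InFlightGate:
    """Counts outstanding requests (hypothetical helper, not oVirt code)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = 0

    def request_sent(self):
        # Call when a request (e.g. a capabilities query) is queued or sent.
        with self._lock:
            self._in_flight += 1

    def request_finished(self):
        # Call exactly once per request: on reply, on error, or on timeout.
        with self._lock:
            if self._in_flight == 0:
                raise RuntimeError("no request in flight")
            self._in_flight -= 1

    def may_declare_up(self):
        # The host may be declared Up only when nothing is outstanding.
        with self._lock:
            return self._in_flight == 0


gate = InFlightGate()
gate.request_sent()
gate.request_sent()
gate.request_finished()
still_blocked = not gate.may_declare_up()  # one request is still pending
gate.request_finished()
now_allowed = gate.may_declare_up()        # queue drained
```

With a gate like this, a stale request that times out late would still be counted, so the state change to Up would wait for it instead of racing with it.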
>
> >>>>
> >>>>
> >>>
> >>>
> >>> Hi Dan,
> >>>
> >>> Can I have the link to the job on Jenkins so I can look at the logs?
> >>
> >>
> >>
> >>
> >> http://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/346/
> >>
> >
> >
> > From the logs, the only VDS lock that is being released twice is the
> > VDS_FENCE lock. I opened a BZ [1] for it and will post a fix.
> >
> > [1] https://bugzilla.redhat.com/show_bug.cgi?id=1571300
>
> Can this possibly cause a surprise termination of host connection?
Not sure. From the logs, VDS_FENCE is the only other VDS lock that is being
released