On Tue, Apr 24, 2018 at 10:29 AM, Dan Kenigsberg <danken(a)redhat.com> wrote:
On Tue, Apr 24, 2018 at 5:09 PM, Ravi Shankar Nori <rnori(a)redhat.com>
wrote:
>
>
> On Tue, Apr 24, 2018 at 9:47 AM, Dan Kenigsberg <danken(a)redhat.com>
> wrote:
>>
>> On Tue, Apr 24, 2018 at 4:36 PM, Ravi Shankar Nori <rnori(a)redhat.com>
>> wrote:
>> >
>> >
>> > On Tue, Apr 24, 2018 at 9:24 AM, Martin Perina <mperina(a)redhat.com>
>> > wrote:
>> >>
>> >>
>> >>
>> >> On Tue, Apr 24, 2018 at 3:17 PM, Ravi Shankar Nori <rnori(a)redhat.com>
>> >> wrote:
>> >>>
>> >>>
>> >>>
>> >>> On Tue, Apr 24, 2018 at 7:00 AM, Dan Kenigsberg <danken(a)redhat.com>
>> >>> wrote:
>> >>>>
>> >>>> Ravi's patch is in, but a similar problem remains, and the test
>> >>>> cannot be put back into its place.
>> >>>>
>> >>>> It seems that while Vdsm was taken down, a couple of getCapsAsync
>> >>>> requests queued up. At one point, the host resumed its connection
>> >>>> before the requests had been cleared from the queue. After the host
>> >>>> is up, the following tests resume, and at a pseudorandom point in
>> >>>> time, an old getCapsAsync request times out and kills our connection.
>> >>>>
>> >>>> I believe that as long as ANY request is in flight, the monitoring
>> >>>> lock should not be released, and the host should not be declared as
>> >>>> up.
>>
>> Could you comment on this analysis ^^^ ?
>>
>
> The HostMonitoring lock issue has been fixed by
> https://gerrit.ovirt.org/#/c/90189/

Is there still a chance that a host moves to Up while former
getCapsAsync requests are still in-flight?

Should not happen. Is there a way to execute/reproduce the failing test on
a Dev env?
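
For illustration, the invariant asked about here (a host is not declared Up, and the monitoring lock is not released, while earlier requests are still in flight) could be sketched as below. This is a minimal Python sketch under stated assumptions, not the engine's actual code: `InFlightGate` and its methods are hypothetical names, and the real handling of getCapsAsync may differ.

```python
import threading


class InFlightGate:
    """Hypothetical helper: counts outstanding requests for one host."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = 0

    def request_sent(self):
        # Record that a request has been queued or sent to the host.
        with self._lock:
            self._in_flight += 1

    def request_done(self):
        # Call on a reply, an error, or a timeout of a request.
        with self._lock:
            if self._in_flight == 0:
                raise RuntimeError("request_done() without request_sent()")
            self._in_flight -= 1

    def may_declare_up(self):
        # The host may be declared Up only when nothing is pending.
        with self._lock:
            return self._in_flight == 0


gate = InFlightGate()
gate.request_sent()   # first request queued while the host is down
gate.request_sent()   # second request queued
gate.request_done()   # one of them completes
blocked = not gate.may_declare_up()   # one request is still pending
gate.request_done()   # the stale request times out and is cleared
ready = gate.may_declare_up()
```

With such a gate, a stale request that times out later can no longer affect a host already marked Up, because the host would not have been marked Up until that request had been accounted for.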
>
>>
>> >>>>
>> >>>>
>> >>>
>> >>>
>> >>> Hi Dan,
>> >>>
>> >>> Can I have the link to the job on Jenkins so I can look at the logs?
>> >>
>> >>
>> >>
>> >>
>> >> http://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/346/
>> >>
>> >
>> >
>> > From the logs, the only VDS lock that is being released twice is the
>> > VDS_FENCE lock. Opened a BZ [1] for it. Will post a fix.
>> >
>> > [1] https://bugzilla.redhat.com/show_bug.cgi?id=1571300
>>
>> Can this possibly cause a surprise termination of host connection?
>
>
> Not sure, from the logs VDS_FENCE is the only other VDS lock that is being
> released