On Tue, Apr 24, 2018 at 9:47 AM, Dan Kenigsberg <danken@redhat.com> wrote:
On Tue, Apr 24, 2018 at 4:36 PM, Ravi Shankar Nori <rnori@redhat.com> wrote:
>
>
> On Tue, Apr 24, 2018 at 9:24 AM, Martin Perina <mperina@redhat.com> wrote:
>>
>>
>>
>> On Tue, Apr 24, 2018 at 3:17 PM, Ravi Shankar Nori <rnori@redhat.com>
>> wrote:
>>>
>>>
>>>
>>> On Tue, Apr 24, 2018 at 7:00 AM, Dan Kenigsberg <danken@redhat.com>
>>> wrote:
>>>>
>>>> Ravi's patch is in, but a similar problem remains, and the test cannot
>>>> be put back into its place.
>>>>
>>>> It seems that while Vdsm was taken down, a couple of getCapsAsync
>>>> requests queued up. At one point, the host resumed its connection,
>>>> before the requests had been cleared from the queue. After the host is
>>>> up, the following tests resume, and at a pseudorandom point in time,
>>>> an old getCapsAsync request times out and kills our connection.
>>>>
>>>> I believe that as long as ANY request is in flight, the monitoring
>>>> lock should not be released, and the host should not be declared as
>>>> up.

Could you comment on this analysis ^^^ ?


The HostMonitoring lock issue has been fixed by https://gerrit.ovirt.org/#/c/90189/
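
To make the invariant Dan proposes above concrete, here is a minimal sketch. It is not the engine code and not the change in the gerrit link; MonitoringGate and all of its method names are hypothetical, and the real HostMonitoring logic may be structured quite differently. It only illustrates the rule "do not release the monitoring lock while any request is still in flight":

```python
import threading


class MonitoringGate:
    """Hypothetical guard, for illustration only (not an oVirt class)."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._in_flight = 0    # requests sent but not yet answered or timed out
        self._locked = False   # stands in for the per-host monitoring lock

    def acquire(self):
        """Take the monitoring lock; return False if it is already held."""
        with self._mutex:
            if self._locked:
                return False
            self._locked = True
            return True

    def request_sent(self):
        """Record that a request (e.g. a capabilities query) was sent."""
        with self._mutex:
            self._in_flight += 1

    def request_done(self):
        """Call once per request on reply, error, or timeout."""
        with self._mutex:
            if self._in_flight == 0:
                raise RuntimeError("request_done() without request_sent()")
            self._in_flight -= 1

    def try_release(self):
        """Release the lock only if nothing is in flight; True on release."""
        with self._mutex:
            if not self._locked or self._in_flight > 0:
                return False
            self._locked = False
            return True


gate = MonitoringGate()
gate.acquire()
gate.request_sent()
gate.request_sent()          # two requests queued while the host was down
print(gate.try_release())    # refused: requests are still pending
gate.request_done()
gate.request_done()          # both requests answered or timed out
print(gate.try_release())    # now the lock can be released
```

With a counter like this, a stale request that times out late would still be accounted for before the host is declared up, instead of firing at an arbitrary point during the following tests.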
 
>>>>
>>>>
>>>
>>>
>>> Hi Dan,
>>>
>>> Can I have the link to the job on Jenkins so I can look at the logs?
>>
>>
>> http://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/346/
>>
>
>
> From the logs, the only VDS lock that is being released twice is the
> VDS_FENCE lock. I opened a BZ [1] for it and will post a fix.
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1571300

Can this possibly cause an unexpected termination of the host connection?

Not sure, from the logs VDS_FENCE is the only other VDS lock that is being released