
On Tue, Apr 24, 2018 at 4:36 PM, Ravi Shankar Nori <rnori@redhat.com> wrote:
On Tue, Apr 24, 2018 at 9:24 AM, Martin Perina <mperina@redhat.com> wrote:
On Tue, Apr 24, 2018 at 3:17 PM, Ravi Shankar Nori <rnori@redhat.com> wrote:
On Tue, Apr 24, 2018 at 7:00 AM, Dan Kenigsberg <danken@redhat.com> wrote:
Ravi's patch is in, but a similar problem remains, and the test cannot be put back into its place.
It seems that while Vdsm was taken down, a couple of getCapsAsync requests queued up. At one point the host resumed its connection before those requests had been cleared from the queue. Once the host is up, the subsequent tests resume, and at a pseudorandom point in time an old getCapsAsync request times out and kills our connection.
I believe that as long as ANY request is in flight, the monitoring lock should not be released, and the host should not be declared as up.
Do you agree with this analysis ^^^?
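A minimal sketch of the rule proposed above, assuming the engine can track every request it has sent to a host until that request either gets a response or times out. `HostMonitor`, `request_sent`, `request_finished` and `try_declare_up` are hypothetical names for illustration, not oVirt engine or Vdsm APIs:

```python
import threading


class HostMonitor:
    """Hypothetical sketch: refuse to declare a host Up while any request
    sent to it (e.g. a queued getCapsAsync) is still pending."""

    def __init__(self):
        self._lock = threading.Lock()
        # Ids of requests that were sent but have neither answered nor timed out.
        self._in_flight = set()
        self.status = "Connecting"

    def request_sent(self, request_id):
        with self._lock:
            self._in_flight.add(request_id)

    def request_finished(self, request_id):
        # Called on a response OR on a timeout: either way the request
        # is no longer in flight. discard() tolerates unknown ids.
        with self._lock:
            self._in_flight.discard(request_id)

    def try_declare_up(self):
        # Only move the host to Up once every stale request has drained.
        with self._lock:
            if self._in_flight:
                return False
            self.status = "Up"
            return True
```

With two queued getCapsAsync requests, `try_declare_up()` keeps returning `False` until both have answered or timed out, so no request sent while Vdsm was down can still be pending once the host is Up.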
Hi Dan,
Could you send me the link to the Jenkins job so I can look at the logs?
http://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/346/
From the logs, the only VDS lock that is released twice is the VDS_FENCE lock. I opened a BZ [1] for it and will post a fix.
Could this cause the unexpected termination of the host connection?