
On Tue, Apr 24, 2018 at 10:27 PM, Ravi Shankar Nori <rnori@redhat.com> wrote:
On Tue, Apr 24, 2018 at 10:46 AM, Ravi Shankar Nori <rnori@redhat.com> wrote:
On Tue, Apr 24, 2018 at 10:29 AM, Dan Kenigsberg <danken@redhat.com> wrote:
On Tue, Apr 24, 2018 at 5:09 PM, Ravi Shankar Nori <rnori@redhat.com> wrote:
On Tue, Apr 24, 2018 at 9:47 AM, Dan Kenigsberg <danken@redhat.com> wrote:
On Tue, Apr 24, 2018 at 4:36 PM, Ravi Shankar Nori <rnori@redhat.com> wrote:
On Tue, Apr 24, 2018 at 9:24 AM, Martin Perina <mperina@redhat.com> wrote:
> On Tue, Apr 24, 2018 at 3:17 PM, Ravi Shankar Nori <rnori@redhat.com> wrote:
>> On Tue, Apr 24, 2018 at 7:00 AM, Dan Kenigsberg <danken@redhat.com> wrote:
>>> Ravi's patch is in, but a similar problem remains, and the test cannot
>>> be put back into its place.
>>>
>>> It seems that while Vdsm was taken down, a couple of getCapsAsync
>>> requests queued up. At one point, the host resumed its connection,
>>> before the requests had been cleared from the queue. After the host is
>>> up, the following tests resume, and at a pseudorandom point in time,
>>> an old getCapsAsync request times out and kills our connection.
>>>
>>> I believe that as long as ANY request is in flight, the monitoring
>>> lock should not be released, and the host should not be declared as
>>> up.
Could you respond to this analysis ^^^ ?
The HostMonitoring lock issue has been fixed by https://gerrit.ovirt.org/#/c/90189/
Is there still a chance that a host moves to Up while former getCapsAsync requests are still in flight?
Should not happen. Is there a way to execute/reproduce the failing test on Dev env?
>> Hi Dan,
>>
>> Can I have the link to the job on jenkins so I can look at the logs
>
> http://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/346/
From the logs, the only VDS lock that is being released twice is the VDS_FENCE lock. I opened a BZ [1] for it and will post a fix.
Can this possibly cause a surprise termination of host connection?
Not sure. From the logs, VDS_FENCE is the only other VDS lock that is being released.
It would be helpful if I could get the exact flow that is failing, and also the steps, if any, needed to reproduce the issue.
By now the logs of http://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/346/ have been garbage-collected, so I cannot point you to the location in the logs. Maybe Alona has a local copy. According to her analysis, the issue manifests itself when setupNetworks follows a vdsm restart.

Have you tried running OST with prepare_migration_attachments_ipv6 reintroduced? It should always pass.

Regards,
Dan.
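
For reference, the invariant proposed earlier in the thread (do not release the monitoring lock, and do not declare the host up, while any request is still in flight) can be sketched roughly as below. This is only an illustration under stated assumptions, not oVirt engine code: the class `HostMonitor`, its methods, and the status strings are all hypothetical, and the real engine is written in Java. The sketch also guards against releasing the lock twice.

```python
import threading


class HostMonitor:
    """Hypothetical model of a per-host monitoring lock (not oVirt code)."""

    def __init__(self):
        self._guard = threading.Lock()
        self._in_flight = 0               # requests sent but not yet answered or timed out
        self._release_requested = False   # monitoring cycle has asked to finish
        self.lock_held = True             # monitoring lock is held during the check
        self.status = "Connecting"

    def request_sent(self):
        """Record a request (e.g. a getCapsAsync call) as in flight."""
        with self._guard:
            self._in_flight += 1

    def request_finished(self):
        """Call when a request gets a response, an error, or a timeout."""
        with self._guard:
            self._in_flight -= 1
            self._maybe_release()

    def monitoring_done(self):
        """The monitoring cycle wants to release the lock and mark the host Up."""
        with self._guard:
            self._release_requested = True
            self._maybe_release()

    def _maybe_release(self):
        # Release only once, and only when no request is still pending.
        if self._release_requested and self._in_flight == 0 and self.lock_held:
            self.lock_held = False
            self.status = "Up"


monitor = HostMonitor()
monitor.request_sent()       # stale request queued while Vdsm was down
monitor.request_sent()       # fresh request after the reconnect
monitor.request_finished()   # fresh request is answered
monitor.monitoring_done()
print(monitor.status)        # still "Connecting": one request is in flight
monitor.request_finished()   # stale request finally times out
print(monitor.status)        # now "Up"
```

With this shape, a late timeout of an old request is accounted for before the host is reported as up, instead of arriving after later tests have already resumed.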