Ravi/Piotr, so what's the connection between the non-blocking threads, the jsonrpc-java connection closing and the failure of this network test? Does it mean that the non-blocking threads change just revealed a jsonrpc-java issue which we hadn't noticed before?
And did the test really work with the code prior to the non-blocking threads changes, or are we missing something else?


On Wed, 25 Apr 2018, 18:21 Ravi Shankar Nori, <rnori@redhat.com> wrote:


On Wed, Apr 25, 2018 at 10:57 AM, Martin Perina <mperina@redhat.com> wrote:


On Tue, Apr 24, 2018 at 3:28 PM, Dan Kenigsberg <danken@redhat.com> wrote:
On Tue, Apr 24, 2018 at 4:17 PM, Ravi Shankar Nori <rnori@redhat.com> wrote:
>
>
> On Tue, Apr 24, 2018 at 7:00 AM, Dan Kenigsberg <danken@redhat.com> wrote:
>>
>> Ravi's patch is in, but a similar problem remains, and the test cannot
>> be put back into its place.
>>
>> It seems that while Vdsm was taken down, a couple of getCapsAsync
>> requests queued up. At one point, the host resumed its connection,
>> before the requests had been cleared from the queue. After the host is
>> up, the following tests resume, and at a pseudorandom point in time,
>> an old getCapsAsync request times out and kills our connection.
>>
>> I believe that as long as ANY request is in flight, the monitoring
>> lock should not be released, and the host should not be declared as
>> up.
>>
>>
>
>
> Hi Dan,
>
> Can I have the link to the job on Jenkins so I can look at the logs?

We disabled a network test that started failing after getCapsAsync was merged.
Please own its re-introduction to OST: https://gerrit.ovirt.org/#/c/90264/

Its most recent failure
http://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/346/
has been discussed by Alona and Piotr over IRC.

So https://bugzilla.redhat.com/1571768 was created to cover the issue discovered during Alona's and Piotr's conversation. But after further discussion we found out that this issue is not related to the non-blocking thread changes in engine 4.2, and that this behavior has existed since the beginning of vdsm-jsonrpc-java. Ravi will continue to verify the fix for BZ1571768, along with the other locking changes he has already posted, to see if they help the network OST to succeed.

But the fix for BZ1571768 is too dangerous for 4.2.3. Let's fix it on master first and see whether it introduces any regressions. If it doesn't, we can backport it to 4.2.4.



--
Martin Perina
Associate Manager, Software Engineering
Red Hat Czech s.r.o.

I posted a vdsm-jsonrpc-java patch [1] for BZ 1571768 [2], which fixes the OST issue seen when enabling 006_migrations.prepare_migration_attachments_ipv6.