On 12 Apr 2018, at 17:15, Dafna Ron <dron@redhat.com> wrote:

> Hi,
>
> We are failing randomly on test 006_migrations.migrate_vm with what seems
> to be the same issue.
>
> The VM seems to be migrated successfully, but the engine thinks the
> migration failed and re-issues it, getting a "VM already exists" response.
>
> I don't think this is an issue with the test but rather a regression, so I
> opened a bug:

I don't think so. I've heard someone removed a test in between migrating
A->B and B->A? If that's the case, that is the real issue: you can't migrate
back to A without waiting for A to be cleared out properly.
https://gerrit.ovirt.org/#/c/90166/ should fix it.

> https://bugzilla.redhat.com/show_bug.cgi?id=1566594
>
> Thanks,
> Dafna

On Wed, Apr 11, 2018 at 1:52 PM, Milan Zamazal <mzamazal@redhat.com> wrote:

Arik Hadas <ahadas@redhat.com> writes:

> On Wed, Apr 11, 2018 at 12:45 PM, Alona Kaplan <alkaplan@redhat.com> wrote:
>
>> On Tue, Apr 10, 2018 at 6:52 PM, Gal Ben Haim <gbenhaim@redhat.com> wrote:
>>
>>> I'm seeing the same error in [1], during 006_migrations.migrate_vm.
>>>
>>> [1] http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1650/
>>
>> Seems like another bug. The migration failed since, for some reason, the
>> VM is already defined on the destination host.
>>
>> 2018-04-10 11:08:08,685-0400 ERROR (jsonrpc/0) [api] FINISH create
>> error=Virtual machine already exists (api:129)
>> Traceback (most recent call last):
>>   File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 122, in method
>>     ret = func(*args, **kwargs)
>>   File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 191, in create
>>     raise exception.VMExists()
>> VMExists: Virtual machine already exists
>
> Milan, Francesco, could it be that because of [1], which appears on the
> destination host right after shutting down the VM, it remained defined on
> that host?

I can't see any destroy call in the logs after the successful preceding
migration from the given host. That would explain the "VMExists" error.
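
For illustration only (this is not the actual OST test nor the linked patch),
the test could wait for the source host to forget the VM before migrating it
back. The sketch below assumes vdsm's Python client ('vdsm.client.connect' and
the 'Host.getVMList' verb); 'migrate' in the usage comment is a hypothetical
placeholder for whatever the suite uses to start a migration.

    import time

    from vdsm import client  # assumption: vdsm's Python client API

    def wait_until_vm_gone(host_addr, vm_id, timeout=120, interval=2):
        """Poll the source host until it no longer lists vm_id."""
        cli = client.connect(host_addr, 54321)
        deadline = time.time() + timeout
        while time.time() < deadline:
            if vm_id not in cli.Host.getVMList():
                return
            time.sleep(interval)
        raise AssertionError('VM %s is still defined on %s after %s seconds'
                             % (vm_id, host_addr, timeout))

    # Hypothetical usage in the migration test:
    #   migrate(vm_id, src=HOST_A, dst=HOST_B)
    #   wait_until_vm_gone(HOST_A, vm_id)   # let A clean up before going back
    #   migrate(vm_id, src=HOST_B, dst=HOST_A)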

> [1] 2018-04-10 11:01:40,005-0400 ERROR (libvirt/events) [vds] Error running
> VM callback (clientIF:683)
>
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/vdsm/clientIF.py", line 646, in dispatchLibvirtEvents
>     v.onLibvirtLifecycleEvent(event, detail, None)
>
> AttributeError: 'NoneType' object has no attribute 'onLibvirtLifecycleEvent'

That means that a lifecycle event for an unknown VM has arrived, in this case
apparently a destroy event, following the destroy call after the failed
incoming migration. The reported AttributeError is a minor bug, already fixed
in master, so it's most likely unrelated to the discussed problem.
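
For reference, the guard that avoids this AttributeError has roughly the shape
sketched below. This is not the actual vdsm code (the real fix is the one
already merged to master); the failing call comes from the traceback above,
while the function name and the container lookup are illustrative assumptions.

    import logging

    log = logging.getLogger('vds')

    # Sketch only: not vdsm's actual clientIF code.
    def dispatch_lifecycle_event(vm_container, dom_uuid, event, detail):
        """Look the VM up and ignore events for VMs vdsm no longer knows,
        instead of calling a method on None."""
        v = vm_container.get(dom_uuid)
        if v is None:
            # The VM is already gone (e.g. destroyed after a failed incoming
            # migration); just log and drop the late lifecycle event.
            log.debug('Ignoring lifecycle event %s for unknown VM %s',
                      event, dom_uuid)
            return
        v.onLibvirtLifecycleEvent(event, detail, None)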

>>> On Tue, Apr 10, 2018 at 4:14 PM, Alona Kaplan <alkaplan@redhat.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Looking at the log, it seems that the new GetCapabilitiesAsync is
>>>> responsible for the mess.
>>>>
>>>> - 08:29:47 - the engine loses connectivity to host
>>>>   'lago-basic-suite-4-2-host-0'.
>>>>
>>>> - Every 3 seconds a getCapabilitiesAsync request is sent to the host
>>>>   (unsuccessfully).
>>>>
>>>>   * Before each "getCapabilitiesAsync" the monitoring lock is taken
>>>>     (VdsManager.refreshImpl).
>>>>
>>>>   * "getCapabilitiesAsync" immediately fails and throws
>>>>     'VDSNetworkException: java.net.ConnectException: Connection refused'.
>>>>     The exception is caught by
>>>>     'GetCapabilitiesAsyncVDSCommand.executeVdsBrokerCommand', which calls
>>>>     'onFailure' of the callback and re-throws the exception:
>>>>
>>>>         catch (Throwable t) {
>>>>             getParameters().getCallback().onFailure(t);
>>>>             throw t;
>>>>         }
>>>>
>>>>   * The 'onFailure' of the callback releases the "monitoringLock"
>>>>     ('postProcessRefresh() -> afterRefreshTreatment() -> if (!succeeded)
>>>>     lockManager.releaseLock(monitoringLock);').
>>>>
>>>>   * 'VdsManager.refreshImpl' catches the network exception, marks
>>>>     'releaseLock = true' and tries to release the already released lock.
>>>>     The following warning is printed to the log:
>>>>
>>>>         WARN [org.ovirt.engine.core.bll.lock.InMemoryLockManager]
>>>>         (EE-ManagedThreadFactory-engineScheduled-Thread-53) [] Trying to
>>>>         release exclusive lock which does not exist, lock key:
>>>>         'ecf53d69-eb68-4b11-8df2-c4aa4e19bd93VDS_INIT'
>>>>
>>>> - 08:30:51 - a successful getCapabilitiesAsync is sent.
>>>>
>>>> - 08:32:55 - the failing test starts (Setup Networks for setting ipv6).
>>>>
>>>>   * SetupNetworks takes the monitoring lock.
>>>>
>>>> - 08:33:00 - ResponseTracker cleans the getCapabilitiesAsync requests from
>>>>   4 minutes ago out of its queue and prints a VDSNetworkException: Vds
>>>>   timeout occurred.
>>>>
>>>>   * When the first request is removed from the queue
>>>>     ('ResponseTracker.remove()'), 'Callback.onFailure' is invoked (for the
>>>>     second time) -> the monitoring lock is released (the lock taken by
>>>>     SetupNetworks!).
>>>>
>>>>   * The other requests removed from the queue also try to release the
>>>>     monitoring lock, but there is nothing to release. The following
>>>>     warning is printed:
>>>>
>>>>         WARN [org.ovirt.engine.core.bll.lock.InMemoryLockManager]
>>>>         (EE-ManagedThreadFactory-engineScheduled-Thread-14) [] Trying to
>>>>         release exclusive lock which does not exist, lock key:
>>>>         'ecf53d69-eb68-4b11-8df2-c4aa4e19bd93VDS_INIT'
>>>>
>>>> - 08:33:00 - SetupNetworks fails on timeout, ~4 seconds after it started.
>>>>   Why? I'm not 100% sure, but I guess the root cause is the late
>>>>   processing of the 'getCapabilitiesAsync' failure, which loses the
>>>>   monitoring lock, combined with the multiple processing of that failure.
>>>>
>>>> Ravi, the 'getCapabilitiesAsync' failure is handled twice and the lock is
>>>> released (or release is attempted) three times. Please share your opinion
>>>> on how it should be fixed.
>>>>
>>>> Thanks,
>>>>
>>>> Alona.
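
Regarding the double handling described above: one possible shape for a fix is
to make the failure callback fire at most once, so that whichever path reports
the failure first (the immediate VDSNetworkException, refreshImpl's cleanup, or
the ResponseTracker timeout) is the only one allowed to release the monitoring
lock. The engine code in question is Java; the sketch below only illustrates
that pattern and is not the actual engine classes.

    import threading

    # Illustrative pattern only; the real engine code is Java.
    class OneShotFailureCallback(object):
        """Wrap a failure handler so it runs at most once, no matter how many
        code paths try to report the same failure."""

        def __init__(self, on_failure):
            self._on_failure = on_failure
            self._lock = threading.Lock()
            self._fired = False

        def onFailure(self, error):
            with self._lock:
                if self._fired:
                    # A later path lost the race: do nothing, and in particular
                    # do not release the monitoring lock, which may now be held
                    # by someone else (e.g. SetupNetworks).
                    return
                self._fired = True
            self._on_failure(error)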

>>>> On Sun, Apr 8, 2018 at 1:21 PM, Dan Kenigsberg <danken@redhat.com> wrote:
>>>>
>>>>> On Sun, Apr 8, 2018 at 9:21 AM, Edward Haas <ehaas@redhat.com> wrote:
>>>>>
>>>>>> On Sun, Apr 8, 2018 at 9:15 AM, Eyal Edri <eedri@redhat.com> wrote:
>>>>>>
>>>>>>> Was already done by Yaniv - https://gerrit.ovirt.org/#/c/89851.
>>>>>>> Is it still failing?
>>>>>>>
>>>>>>> On Sun, Apr 8, 2018 at 8:59 AM, Barak Korren <bkorren@redhat.com> wrote:
>>>>>>>
>>>>>>>> On 7 April 2018 at 00:30, Dan Kenigsberg <danken@redhat.com> wrote:
>>>>>>>> > No, I am afraid that we have not managed to understand why setting
>>>>>>>> > an ipv6 address took the host off the grid. We shall continue
>>>>>>>> > researching this next week.
>>>>>>>> >
>>>>>>>> > Edy, https://gerrit.ovirt.org/#/c/88637/ is already 4 weeks old,
>>>>>>>> > but could it possibly be related (I really doubt that)?
>>>>>>>> >
>>>>>>
>>>>>> Sorry, but I do not see how this problem is related to VDSM.
>>>>>> There is nothing that indicates that there is a VDSM problem.
>>>>>>
>>>>>> Has the RPC connection between Engine and VDSM failed?
>>>>>
>>>>> Further up the thread, Piotr noticed that (at least in one failure of
>>>>> this test) the Vdsm host lost connectivity to its storage and the Vdsm
>>>>> process was restarted. However, this does not seem to happen in all
>>>>> cases where this test fails.
>>> --
>>> Gal Ben Haim
>>> RHV DevOps
_______________________________________________
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel