<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Dec 19, 2017 at 12:20 AM, Michal Skrivanek <span dir="ltr">&lt;<a href="mailto:michal.skrivanek@redhat.com" target="_blank">michal.skrivanek@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div class="gmail-HOEnZb"><div class="gmail-h5"><br>
&gt; On 18 Dec 2017, at 13:21, Milan Zamazal &lt;<a href="mailto:mzamazal@redhat.com">mzamazal@redhat.com</a>&gt; wrote:<br>
&gt;<br>
&gt; Yedidyah Bar David &lt;<a href="mailto:didi@redhat.com">didi@redhat.com</a>&gt; writes:<br>
&gt;<br>
&gt;&gt; On Mon, Dec 18, 2017 at 10:17 AM, Code Review &lt;<a href="mailto:gerrit@ovirt.org">gerrit@ovirt.org</a>&gt; wrote:<br>
&gt;&gt;&gt; Jenkins CI posted comments on this change.<br>
&gt;&gt;&gt;<br>
&gt;&gt;<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; Patch set 3:Continuous-Integration -1<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; Build Failed<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; <a href="http://jenkins.ovirt.org/job/ovirt-system-tests_master_check-patch-el7-x86_64/2882/" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/<wbr>ovirt-system-tests_master_<wbr>check-patch-el7-x86_64/2882/</a><br>
&gt;&gt;&gt; : FAILURE<br>
&gt;&gt;<br>
&gt;&gt; Console output of above job says:<br>
&gt;&gt;<br>
&gt;&gt; 08:13:34   # migrate_vm:<br>
&gt;&gt; 08:16:37     * Collect artifacts:<br>
&gt;&gt; 08:16:40     * Collect artifacts: Success (in 0:00:03)<br>
&gt;&gt; 08:16:40   # migrate_vm: Success (in 0:03:06)<br>
&gt;&gt; 08:16:40   # Results located at<br>
&gt;&gt; /dev/shm/ost/deployment-basic-<wbr>suite-master/default/006_<wbr>migrations.py.junit.xml<br>
&gt;&gt; 08:16:40 @ Run test: 006_migrations.py: Success (in 0:03:50)<br>
&gt;&gt; 08:16:40 Error occured, aborting<br>
&gt;&gt;<br>
&gt;&gt; The file 006_migrations.py.junit.xml [1] says:<br>
&gt;&gt;<br>
&gt;&gt; &lt;failure type=&quot;exceptions.<wbr>AssertionError&quot; message=&quot;False != True after<br>
&gt;&gt; 180 seconds&quot;&gt;<br>
&gt;<br>
&gt; Reading the logs, I can see the VM migrates normally and seems to be<br>
&gt; reported to Engine correctly.  When Engine receives end-of-migration<br>
&gt; event, it sends Destroy to the source (which is correct), calls dumpxmls<br>
&gt; on the destination in the meantime (looks fine to me) and then calls<br>
<br>
</div></div>It looks like a race: the getAllVmStats result reporting the VM as Down (statusTime: 4296271980) was being processed while a Down/MigrationSucceeded event (notify_time: 4296273170) arrived at about the same time.<br>
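<div><br></div><div>A toy Python model of this race, offered as a sketch only. It assumes, as the analysis further down in this thread suggests, that the engine holds a per-VM lock while analyzing a polling result, that an event which finds the VM locked is not processed, and that the polled report carried no exit_reason yet. VmReport, VmMonitor, on_poll and on_event are hypothetical names, not ovirt-engine or VDSM APIs. The callback passed to on_poll makes the event arrive deterministically while the polling result is being analyzed.</div>
<pre>
```python
import threading
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VmReport:
    # Hypothetical report; field names follow the ones used in this thread.
    status: str
    exit_status: str
    exit_reason: Optional[str]


class VmMonitor:
    """Toy model: one lock per VM so the same VM is never analyzed twice at once."""

    def __init__(self):
        self.lock = threading.Lock()
        self.actions: List[str] = []

    def _analyze(self, report, source):
        # Simplified hand-over check: only MigrationSucceeded hands the VM over.
        if report.exit_reason == "MigrationSucceeded":
            self.actions.append(source + ": hand over VM to destination")
        else:
            self.actions.append(source + ": abort migration, destroy VM on destination")

    def on_poll(self, report, during_analysis):
        with self.lock:
            during_analysis()  # the Down event arrives while the VM is locked
            self._analyze(report, "poll")

    def on_event(self, report):
        # Assumption: an event that finds the VM locked is not processed.
        if not self.lock.acquire(blocking=False):
            self.actions.append("event: skipped, VM locked by monitoring")
            return
        try:
            self._analyze(report, "event")
        finally:
            self.lock.release()


monitor = VmMonitor()
polled = VmReport("Down", "Normal", None)  # exit_reason not set (yet?)
event = VmReport("Down", "Normal", "MigrationSucceeded")
monitor.on_poll(polled, lambda: monitor.on_event(event))
for action in monitor.actions:
    print(action)
# event: skipped, VM locked by monitoring
# poll: abort migration, destroy VM on destination
```
</pre>
<div>The skipped event comes first, followed by the abort decision taken from the polling result, which matches the sequence described in this thread. Delivered on its own, the same event would have led to a hand-over.</div>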
Unfortunately, vdsm.log is not at DEBUG level, so there is very little information about what exactly VDSM sent out and why.<br>
@infra: can you enable DEBUG log level for VDSM by default? </blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
<br>
It does look like a race to me. Does it reproduce? </blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
<div><div class="gmail-h5"><br>
&gt; Destroy on the destination, which is weird and I don&#39;t understand why<br>
&gt; the Destroy is invoked.<br>
&gt;<br>
&gt; Arik, would you like to take a look?  Maybe I overlooked something or<br>
&gt; maybe there&#39;s a bug.  The logs are at<br>
&gt; <a href="http://jenkins.ovirt.org/job/ovirt-system-tests_master_check-patch-el7-x86_64/2882/artifact/exported-artifacts/basic-suite-master__logs/test_logs/basic-suite-master/post-006_migrations.py/" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/<wbr>ovirt-system-tests_master_<wbr>check-patch-el7-x86_64/2882/<wbr>artifact/exported-artifacts/<wbr>basic-suite-master__logs/test_<wbr>logs/basic-suite-master/post-<wbr>006_migrations.py/</a><br>
&gt; and the interesting things happen around 2017-12-18 03:13:43,758-05.<br></div></div></blockquote><div><br></div><div>So it looks like this:</div><div>1. The engine polls the VMs on the source host.</div><div>2. Right after #1, we get the Down event with the proper exit reason (MigrationSucceeded), but the engine doesn&#39;t process it because the VM is locked by the monitoring while it processes that polling result (to prevent two simultaneous analyses of the same VM).</div><div>3. The polling result is a VM in status Down and, most probably, exit_status=Normal.</div><div>4. The engine decides to abort the migration, so the monitoring thread of the source host destroys the VM on the destination host.</div><div><br></div><div>Unfortunately, we don&#39;t have the exit_reason that was returned by the polling.</div><div>However, the only explanation I can think of is that it was different from MigrationSucceeded, because otherwise we would have handed the VM over to the destination host rather than aborting the migration [1].</div><div>That part of the code changed recently in [2]. We used to hand the VM over when we got the following from the source host:</div><div>status = Down + exit_status = Normal</div><div>And in the database: previous_status = MigrationFrom</div><div>But after that change, we require:</div><div>status = Down + exit_status = Normal <b>+ exit_reason = MigrationSucceeded</b><br></div><div>And in the database: previous_status = MigrationFrom</div><div><br></div><div>Long story short: is it possible that VDSM had set the VM status to Down and exit_status to Normal, but had not (yet?) updated
exit_reason to MigrationSucceeded?</div><div><br></div><div>[1] <a href="https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/vdsbroker/src/main/java/org/ovirt/engine/core/vdsbroker/monitoring/VmAnalyzer.java#L291">https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/vdsbroker/src/main/java/org/ovirt/engine/core/vdsbroker/monitoring/VmAnalyzer.java#L291</a></div><div>[2] <a href="https://gerrit.ovirt.org/#/c/84387/">https://gerrit.ovirt.org/#/c/84387/</a></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div><div class="gmail-h5">
&gt;<br>
&gt;&gt; Can someone please have a look? Thanks.<br>
&gt;&gt;<br>
&gt;&gt; As a side note, if this is indeed what caused the job to fail, it&#39;s<br>
&gt;&gt; misleading to say &quot;migrate_vm: Success&quot;.<br>
&gt;&gt;<br>
&gt;&gt; [1]<br>
&gt;&gt; <a href="http://jenkins.ovirt.org/job/ovirt-system-tests_master_check-patch-el7-x86_64/2882/artifact/exported-artifacts/basic-suite-master__logs/006_migrations.py.junit.xml" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/<wbr>ovirt-system-tests_master_<wbr>check-patch-el7-x86_64/2882/<wbr>artifact/exported-artifacts/<wbr>basic-suite-master__logs/006_<wbr>migrations.py.junit.xml</a><br>
&gt;&gt;<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; To view, visit change 85177. To unsubscribe, visit settings.<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; Gerrit-Project: ovirt-system-tests<br>
&gt;&gt;&gt; Gerrit-Branch: master<br>
&gt;&gt;&gt; Gerrit-MessageType: comment<br>
&gt;&gt;&gt; Gerrit-Change-Id: I7eb386744a2a2faf0acd734e0ba44<wbr>be22dd590b5<br>
&gt;&gt;&gt; Gerrit-Change-Number: 85177<br>
&gt;&gt;&gt; Gerrit-PatchSet: 3<br>
&gt;&gt;&gt; Gerrit-Owner: Yedidyah Bar David &lt;<a href="mailto:didi@redhat.com">didi@redhat.com</a>&gt;<br>
&gt;&gt;&gt; Gerrit-Reviewer: Dafna Ron &lt;<a href="mailto:dron@redhat.com">dron@redhat.com</a>&gt;<br>
&gt;&gt;&gt; Gerrit-Reviewer: Eyal Edri &lt;<a href="mailto:eedri@redhat.com">eedri@redhat.com</a>&gt;<br>
&gt;&gt;&gt; Gerrit-Reviewer: Jenkins CI<br>
&gt;&gt;&gt; Gerrit-Reviewer: Sandro Bonazzola &lt;<a href="mailto:sbonazzo@redhat.com">sbonazzo@redhat.com</a>&gt;<br>
&gt;&gt;&gt; Gerrit-Reviewer: Yedidyah Bar David &lt;<a href="mailto:didi@redhat.com">didi@redhat.com</a>&gt;<br>
&gt;&gt;&gt; Gerrit-Comment-Date: Mon, 18 Dec 2017 08:17:11 +0000<br>
&gt;&gt;&gt; Gerrit-HasComments: No<br>
</div></div>&gt; ______________________________<wbr>_________________<br>
&gt; Devel mailing list<br>
&gt; <a href="mailto:Devel@ovirt.org">Devel@ovirt.org</a><br>
&gt; <a href="http://lists.ovirt.org/mailman/listinfo/devel" rel="noreferrer" target="_blank">http://lists.ovirt.org/<wbr>mailman/listinfo/devel</a><br>
&gt;<br>
&gt;<br>
<br>
</blockquote></div><br></div></div>
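<div><div><br></div><div>The change in the hand-over condition described in the reply above (before and after change 84387) can be sketched as two hypothetical Python predicates. The function names are made up for illustration, and the real logic lives in VmAnalyzer.java and is written in Java, so this only restates the rule as described in the thread. previous_status stands for the value stored in the database; the other fields come from the source host&#39;s report.</div>
<pre>
```python
def hands_over_before(status, exit_status, previous_status):
    # Hand-over rule before change 84387, as described in the reply above.
    return (status == "Down" and exit_status == "Normal"
            and previous_status == "MigrationFrom")


def hands_over_after(status, exit_status, exit_reason, previous_status):
    # After change 84387, exit_reason must also be MigrationSucceeded.
    return (hands_over_before(status, exit_status, previous_status)
            and exit_reason == "MigrationSucceeded")


# Hypothetical polled report: Down/Normal already set, exit_reason not yet.
polled = dict(status="Down", exit_status="Normal", previous_status="MigrationFrom")
print(hands_over_before(**polled))                   # True: hand over
print(hands_over_after(exit_reason=None, **polled))  # False: abort migration
```
</pre>
<div>Under these assumptions, a report that is Down/Normal but has no exit_reason yet would have been handed over under the old rule and leads to an aborted migration under the new one.</div></div>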