<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Jan 17, 2018 at 9:41 PM, Milan Zamazal <span dir="ltr"><<a href="mailto:mzamazal@redhat.com" target="_blank">mzamazal@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><span class="gmail-">Dafna Ron <<a href="mailto:dron@redhat.com">dron@redhat.com</a>> writes:<br>
<br>
> We had a failure in test 006_migrations.migrate_vm<br>
</span>> <<a href="http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4842/testReport/junit/%28root%29/006_migrations/migrate_vm/" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/<wbr>ovirt-master_change-queue-<wbr>tester/4842/testReport/junit/%<wbr>28root%29/006_migrations/<wbr>migrate_vm/</a>>.<br>
<span class="gmail-">><br>
> the migration failed with reason "VMExists"<br>
<br>
> There are two migrations in 006_migrations.migrate_vm. The first one
> succeeded, but if I'm looking correctly into the logs, Engine didn't
> send Destroy to the source host after the migration had finished. Then
> the second migration gets rejected by Vdsm, because Vdsm still keeps the
> former Vm object instance in Down status.
>
> Since the test succeeds most of the time, it looks like some timing
> issue or border case. Arik, is it a known problem? If not, would you
> like to look into the logs, whether you can see what's happening?

Your analysis is correct. That's a nice one, actually!

The statistics monitoring cycles of both hosts, host-0 and host-1, were scheduled in a way that they are executed almost at the same time [1].

Now, at 6:46:34 the VM was migrated from host-1 to host-0.
At 6:46:42 the migration succeeded - we got events from both hosts, but only processed the one from the destination, so the VM switched to Up.
The next statistics monitoring cycle was triggered at 6:46:44 - again, the report of that VM from the source host was skipped because we processed the one from the destination.
At 6:46:59, in the next statistics monitoring cycle, it happened again - the report of the VM from the source host was skipped.
The next migration was triggered at 6:47:05 - the engine never managed to process any report from the source host, so the VM remained Down there.

The probability of this happening is extremely low.
However, I think we can make a little tweak to the monitoring code to avoid it:
"If we get the VM as Down on an unexpected host (that is, not the host we expect the VM to run on), do not lock the VM."
It should be safe since we don't update anything in this scenario. A rough sketch of that condition follows after the log excerpt below.

[1] For instance:

2018-01-15 06:46:44,905-05 ... GetAllVmStatsVDSCommand ... VdsIdVDSCommandParametersBase:{hostId='873a4d36-55fe-4be1-acb7-8de9c9123eb2'})
2018-01-15 06:46:44,932-05 ... GetAllVmStatsVDSCommand ... VdsIdVDSCommandParametersBase:{hostId='31f09289-ec6c-42ff-a745-e82e8ac8e6b9'})
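
To make the idea concrete, here is a minimal, self-contained sketch of that check. Everything below is illustrative only: the class, enum, and method names are placeholders I made up, not the actual ovirt-engine monitoring code, and the host IDs are simply the two values from the log excerpt in [1].

import java.util.UUID;

/**
 * Sketch only: placeholder names, not the real ovirt-engine classes.
 * Idea: when a host reports a VM as Down and that host is not the one the
 * engine expects the VM to run on, the report can be handled without taking
 * the per-VM lock, since nothing gets updated in that case.
 */
public class UnexpectedDownReportCheck {

    enum VmStatus { UP, DOWN, MIGRATING }

    /** True when the monitoring cycle may skip the per-VM lock for this report. */
    static boolean canSkipVmLock(VmStatus reportedStatus,
                                 UUID reportingHostId,
                                 UUID expectedRunHostId) {
        boolean unexpectedHost = !reportingHostId.equals(expectedRunHostId);
        return reportedStatus == VmStatus.DOWN && unexpectedHost;
    }

    public static void main(String[] args) {
        // Host IDs taken from the GetAllVmStatsVDSCommand lines in [1]; which of
        // the two was source and which destination does not matter here, they are
        // just two distinct host IDs for the illustration.
        UUID expectedHost = UUID.fromString("873a4d36-55fe-4be1-acb7-8de9c9123eb2");
        UUID otherHost    = UUID.fromString("31f09289-ec6c-42ff-a745-e82e8ac8e6b9");

        // A Down report from a host the VM is not expected to run on may skip the lock.
        System.out.println(canSkipVmLock(VmStatus.DOWN, otherHost, expectedHost));    // true

        // A Down report from the expected host still goes through the locked path.
        System.out.println(canSkipVmLock(VmStatus.DOWN, expectedHost, expectedHost)); // false

        // Reports in any other status are unaffected by the tweak.
        System.out.println(canSkipVmLock(VmStatus.UP, otherHost, expectedHost));      // false
    }
}

The real change would of course live inside the engine's VM monitoring path rather than in a standalone class; the point is only that the condition is cheap to evaluate and, since such a report doesn't update anything, skipping the lock shouldn't lose any information.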