
19 Jan
2018
11:46 a.m.
> On 18 Jan 2018, at 17:36, Arik Hadas <ahadas@redhat.com> wrote:
>
> On Wed, Jan 17, 2018 at 9:41 PM, Milan Zamazal <mzamazal@redhat.com> wrote:
> > Dafna Ron <dron@redhat.com> writes:
> >
> > > We had a failure in test 006_migrations.migrate_vm
> > > <http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4842/testReport/junit/%28root%29/006_migrations/migrate_vm/>.
> > >
> > > the migration failed with reason "VMExists"
> >
> > There are two migrations in 006_migrations.migrate_vm. The first one
> > succeeded, but if I'm looking correctly into the logs, Engine didn't
> > send Destroy to the source host after the migration had finished. Then
> > the second migration gets rejected by Vdsm, because Vdsm still keeps the
> > former Vm object instance in Down status.
> >
> > Since the test succeeds most of the time, it looks like some timing
> > issue or border case. Arik, is it a known problem? If not, would you
> > like to look into the logs, whether you can see what's happening?
>
> Your analysis is correct. That's a nice one actually!
>
> The statistics monitoring cycles of both hosts host-0 and host-1 were
> scheduled in a way that they are executed almost at the same time [1].
>
> Now, at 6:46:34 the VM was migrated from host-1 to host-0.
> At 6:46:42 the migration succeeded - we got events from both hosts, but
> only processed the one from the destination so the VM switched to Up.
> The next statistics monitoring cycle was triggered at 6:46:44 - again,
> the report of that VM from the source host was skipped because we
> processed the one from the destination.
> At 6:46:59, in the next statistics monitoring cycle, it happened again -
> the report of the VM from the source host was skipped.
> The next migration was triggered at 6:47:05 - the engine didn't manage
> to process any report from the source host, so the VM remained Down
> there.
>
> The probability of this happening is extremely low.

Why wasn't the migration rerun?

> However, I think we can make a little tweak to the monitoring code to
> avoid this:
> "If we get the VM as Down on an unexpected host (that is, not the host
> we expect the VM to run on), do not lock the VM"
> It should be safe since we don't update anything in this scenario.
>
> [1] For instance:
> 2018-01-15 06:46:44,905-05 ... GetAllVmStatsVDSCommand ... VdsIdVDSCommandParametersBase:{hostId='873a4d36-55fe-4be1-acb7-8de9c9123eb2'})
> 2018-01-15 06:46:44,932-05 ... GetAllVmStatsVDSCommand ... VdsIdVDSCommandParametersBase:{hostId='31f09289-ec6c-42ff-a745-e82e8ac8e6b9'})
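To make the race and the proposed tweak concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions, not the engine's monitoring code: the names VmMonitor, handle_report, lock_free_down and simulate are hypothetical, the per-VM lock is modelled with a plain threading.Lock, and "sending Destroy" is reduced to recording the host name. The destination's report is simulated by holding the lock while the source's Down report arrives.

```python
import threading


class VmMonitor:
    """Toy model of per-VM report handling (hypothetical, not oVirt engine code)."""

    def __init__(self, run_on_host, lock_free_down=False):
        self.run_on_host = run_on_host        # host the VM is expected to run on
        self.lock_free_down = lock_free_down  # enables the tweak proposed in the thread
        self.vm_lock = threading.Lock()       # stands in for the per-VM monitoring lock
        self.destroy_sent_to = []             # hosts asked to destroy a leftover VM

    def handle_report(self, host, status):
        """Return True if the report was processed, False if it was skipped."""
        unexpected_down = status == "Down" and host != self.run_on_host
        if unexpected_down and self.lock_free_down:
            # Tweak: nothing about the VM is updated here, so no lock is taken.
            self.destroy_sent_to.append(host)
            return True
        if not self.vm_lock.acquire(blocking=False):
            # Another report for the same VM is being processed: skip this one.
            return False
        try:
            if unexpected_down:
                self.destroy_sent_to.append(host)
            return True
        finally:
            self.vm_lock.release()


def simulate(lock_free_down):
    """Deliver the source's Down report while the destination's report holds the lock."""
    monitor = VmMonitor(run_on_host="host-0", lock_free_down=lock_free_down)
    with monitor.vm_lock:  # destination (host-0) report in progress
        processed = monitor.handle_report("host-1", "Down")
    return processed, monitor.destroy_sent_to


print(simulate(lock_free_down=False))  # (False, [])
print(simulate(lock_free_down=True))   # (True, ['host-1'])
```

Without the tweak, the Down report from host-1 is dropped every time it collides with the report from host-0, which matches the sequence at 6:46:42, 6:46:44 and 6:46:59 described above. With the tweak, the leftover VM on the source is handled even when the two monitoring cycles coincide.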