--Apple-Mail=_D554CC52-6154-4DEA-A96A-56848A52349A
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
charset=us-ascii
On 18 Jan 2018, at 17:36, Arik Hadas <ahadas(a)redhat.com> wrote:

On Wed, Jan 17, 2018 at 9:41 PM, Milan Zamazal <mzamazal(a)redhat.com> wrote:
Dafna Ron <dron(a)redhat.com> writes:

> We had a failure in test 006_migrations.migrate_vm
> <http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4842/testReport/junit/%28root%29/006_migrations/migrate_vm/>.
>
> the migration failed with reason "VMExists"

There are two migrations in 006_migrations.migrate_vm. The first one
succeeded, but if I'm looking correctly into the logs, Engine didn't
send Destroy to the source host after the migration had finished. Then
the second migration gets rejected by Vdsm, because Vdsm still keeps the
former Vm object instance in Down status.

Since the test succeeds most of the time, it looks like some timing
issue or border case. Arik, is it a known problem? If not, would you
like to look into the logs, whether you can see what's happening?

Your analysis is correct. That's a nice one actually!

The statistics monitoring cycles of both hosts host-0 and host-1 were
scheduled in a way that they are executed almost at the same time [1].

Now, at 6:46:34 the VM was migrated from host-1 to host-0.
At 6:46:42 the migration succeeded - we got events from both hosts, but
only processed the one from the destination so the VM switched to Up.
The next statistics monitoring cycle was triggered at 6:46:44 - again,
the report of that VM from the source host was skipped because we
processed the one from the destination.
At 6:46:59, in the next statistics monitoring cycle, it happened again -
the report of the VM from the source host was skipped.
The next migration was triggered at 6:47:05 - the engine didn't manage
to process any report from the source host, so the VM remained Down
there.
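To make the sequence above concrete, here is a minimal sketch of how reports can keep getting skipped. It is not the engine's code: the VmMonitor class, its handle_overlapping method, and the assumption that the engine takes a non-blocking per-VM lock (so a report that finds the VM locked is simply dropped) are all hypothetical, introduced only for illustration.

```python
import threading


class VmMonitor:
    """Toy model (hypothetical): one non-blocking lock for a single VM."""

    def __init__(self):
        self._lock = threading.Lock()
        self.processed = []  # reports that were analyzed
        self.skipped = []    # reports dropped because the VM was locked

    def handle_overlapping(self, reports):
        """Handle reports that arrive while the first one is still in progress."""
        acquired = False
        for host, status in reports:
            # Non-blocking attempt: a report that finds the VM locked is dropped.
            if self._lock.acquire(blocking=False):
                acquired = True
                self.processed.append((host, status))
            else:
                self.skipped.append((host, status))
        if acquired:
            self._lock.release()


monitor = VmMonitor()
# Three rounds: the events at 6:46:42 and the cycles at 6:46:44 and 6:46:59.
for _ in range(3):
    monitor.handle_overlapping([("host-0", "Up"), ("host-1", "Down")])

source_seen = any(host == "host-1" for host, _ in monitor.processed)
print(source_seen)           # False
print(len(monitor.skipped))  # 3
```

In this model the Down report from host-1 is never analyzed as long as the destination's report wins the lock in every round, which matches the state the second migration ran into.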
The probability of this happening is extremely low.

Why wasn't the migration rerun?

However, I think we can make a little tweak to the monitoring code to
avoid this:
"If we get the VM as Down on an unexpected host (that is, not the host
we expect the VM to run on), do not lock the VM"
It should be safe since we don't update anything in this scenario.

[1] For instance:
2018-01-15 06:46:44,905-05 ... GetAllVmStatsVDSCommand ... VdsIdVDSCommandParametersBase:{hostId='873a4d36-55fe-4be1-acb7-8de9c9123eb2'})
2018-01-15 06:46:44,932-05 ... GetAllVmStatsVDSCommand ... VdsIdVDSCommandParametersBase:{hostId='31f09289-ec6c-42ff-a745-e82e8ac8e6b9'})
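The proposed rule ("do not lock the VM" for a Down report from an unexpected host) could look roughly like the sketch below. This is only an illustration under stated assumptions: needs_vm_lock and handle_report are hypothetical names, the status strings are placeholders, and the real monitoring code is not shown in this thread.

```python
def needs_vm_lock(reported_status, reporting_host, expected_host):
    """Hypothetical check: no lock for Down reported by an unexpected host."""
    return not (reported_status == "Down" and reporting_host != expected_host)


def handle_report(vm_locked, reported_status, reporting_host, expected_host):
    """Return what happens to one report under the proposed rule."""
    if not needs_vm_lock(reported_status, reporting_host, expected_host):
        # Nothing about the VM is updated here, so the lock is not taken.
        return "processed without lock"
    if vm_locked:
        return "skipped"
    return "processed with lock"


# The VM is expected on host-0 and is locked by the destination's report.
print(handle_report(True, "Down", "host-1", "host-0"))  # processed without lock
print(handle_report(True, "Up", "host-1", "host-0"))    # skipped
print(handle_report(False, "Up", "host-0", "host-0"))   # processed with lock
```

With such a rule the Down report from the source host would no longer depend on winning the lock against the destination's report.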
_______________________________________________
Devel mailing list
Devel(a)ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel
--Apple-Mail=_D554CC52-6154-4DEA-A96A-56848A52349A--