Re: [ovirt-devel] Subject: [ OST Failure Report ] [ oVirt Master ] [ Jan 15th 2018 ] [ 006_migrations.migrate_vm ]

19 Jan 2018


      On Fri, Jan 19, 2018 at 12:46 PM, Michal Skrivanek <
michal.skrivanek@redhat.com> wrote:
...
On 18 Jan 2018, at 17:36, Arik Hadas <ahadas@redhat.com> wrote:
On Wed, Jan 17, 2018 at 9:41 PM, Milan Zamazal <mzamazal@redhat.com>
wrote:
...
Dafna Ron <dron@redhat.com> writes:
...
We had a failure in test 006_migrations.migrate_vm
<http://jenkins.ovirt.org/job/ovirt-master_change-queue-test
er/4842/testReport/junit/%28root%29/006_migrations/migrate_vm/>.
the migration failed with reason "VMExists"
There are two migrations in 006_migrations.migrate_vm.  The first one
succeeded, but if I'm looking correctly into the logs, Engine didn't
send Destroy to the source host after the migration had finished.  Then
the second migration gets rejected by Vdsm, because Vdsm still keeps the
former Vm object instance in Down status.
Since the test succeeds most of the time, it looks like some timing
issue or border case.  Arik, is it a known problem?  If not, would you
like to look into the logs, whether you can see what's happening?
Your analysis is correct. That's a nice one actually!
The statistics monitoring cycles of both hosts host-0 and host-1 were
scheduled in a way that they are executed almost at the same time [1].
Now, at 6:46:34 the VM was migrated from host-1 to host-0.
At 6:46:42 the migration succeeded - we got events from both hosts, but
only processed the one from the destination so the VM switched to Up.
The next statistics monitoring cycle was triggered at 6:46:44 - again, the
report of that VM from the source host was skipped because we processed the
one from the destination.
At 6:46:59, in the next statistics monitoring cycle, it happened again -
the report of the VM from the source host was skipped.
The next migration was triggered at 6:47:05 - the engine didn't manage to
process any report from the source host, so the VM remained Down there.
The probability of this to happen is extremely low.
Why wasn't the migration rerun?
Good question, because a migration to a particular host (MigrateVmToServer)
was requested.
In this particular case, it seems that there are only two hosts defined so
changing it to MigrateVm wouldn't make any difference though.
...
However, I think we can make a little tweak to the monitoring code to
avoid this:
"If we get the VM as Down on an unexpected host (that is, not the host we
expect the VM to run on), do not lock the VM"
It should be safe since we don't update anything in this scenario.
[1] For instance:
2018-01-15 06:46:44,905-05 ... GetAllVmStatsVDSCommand ...
VdsIdVDSCommandParametersBase:{hostId='873a4d36-55fe-4be1-
acb7-8de9c9123eb2'})
2018-01-15 06:46:44,932-05 ... GetAllVmStatsVDSCommand ...
VdsIdVDSCommandParametersBase:{hostId='31f09289-ec6c-42ff-
a745-e82e8ac8e6b9'})
_______________________________________________
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel