[ovirt-users] VMs stuck in migrating state
nicolas at devels.es
Fri Mar 2 14:25:17 UTC 2018
Hi Milan,
On 2018-03-02 14:10, Milan Zamazal wrote:
> nicolas at devels.es writes:
>
>> We're running 4.1.9 and during the weekend we had a storage issue that
>> seemed to leave some hosts in a strange state. One of the hosts has
>> almost all of its VMs migrating (although it doesn't seem to actually
>> be migrating them) and the migration cannot be cancelled.
>>
>> When I click on one of those machines and select 'Cancel migration',
>> I see this in the ovirt-engine log:
>>
>> 2018-02-26 08:52:07,588Z INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.CancelMigrateVDSCommand] (org.ovirt.thread.pool-6-thread-36) [887dfbf9-dece-4f7b-90a8-dac02b849b7f] HostName = host2.domain.com
>> 2018-02-26 08:52:07,588Z ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.CancelMigrateVDSCommand] (org.ovirt.thread.pool-6-thread-36) [887dfbf9-dece-4f7b-90a8-dac02b849b7f] Command 'CancelMigrateVDSCommand(HostName = host2.domain.com, CancelMigrationVDSParameters:{runAsync='true', hostId='e63b9146-10c4-47ad-bd6c-f053a8c5b4eb', vmId='26d37e43-32e2-4e55-9c1e-1438518d5021'})' execution failed: VDSGenericException: VDSErrorException: Failed to CancelMigrateVDS, error = Migration process cancelled, code = 82
>>
>> On the vdsm side I see:
>>
>> 2018-02-26 08:56:19,396+0000 INFO (jsonrpc/0) [vdsm.api] START migrateCancel() from=::ffff:10.X.X.X,54654, flow_id=874d36d7-63f5-4b71-8a4d-6d9f3ec65858 (api:46)
>> 2018-02-26 08:56:19,398+0000 INFO (jsonrpc/0) [vdsm.api] FINISH migrateCancel return={'status': {'message': 'Migration process cancelled', 'code': 82}, 'progress': 0} from=::ffff:10.X.X.X,54654, flow_id=874d36d7-63f5-4b71-8a4d-6d9f3ec65858 (api:52)
>>
>> So there's no error in the vdsm log.
>
> Interesting. The messages above indicate that an attempt was made to
> migrate the VM, but the migration was temporarily rejected on the
> destination due to the number of already running incoming migrations
> (the limit is 2 incoming migrations by default). Later, Vdsm was asked
> to cancel the outgoing migration and it successfully set a migration
> canceling flag. However, the action was reported to Engine as an
> error, due to hitting the incoming migration limit on the destination.
> Maybe it's a bug, I'm not sure; the result is some minor confusion.
> Normally it shouldn't matter: the migration should be canceled shortly
> afterwards anyway and Engine should be informed about that.
>
> However, the migration apparently wasn't canceled here. I can't say
> what happened without the complete Vdsm log. One possible reason is
> that the migration has been waiting for the completion of another
> migration outgoing from the source (only one outgoing migration at a
> time is allowed by default). In any case, it seems the migration
> either wasn't actually started at all, or its setup had just begun and
> was never completely finished.
>
I'm attaching the log. Basically the storage backend was restarted by
fencing and then this issue happened. This was on 26/02 at about 08:52
(log time).
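
In case it helps with debugging, the same cancel can also be driven
through the engine REST API instead of the web UI. Below is only a rough
sketch using the Python SDK (ovirtsdk4); the URL, credentials and CA
path are placeholders, and the exact method name is from memory, so
please double-check before relying on it:

# Sketch (assumption): cancel the migration of the VM from the engine
# log above via the API instead of the web UI.
import ovirtsdk4 as sdk

connection = sdk.Connection(
    url='https://engine.domain.com/ovirt-engine/api',   # placeholder URL
    username='admin@internal',                           # placeholder user
    password='***',
    ca_file='/etc/pki/ovirt-engine/ca.pem',              # placeholder path
)
vm_service = connection.system_service().vms_service().vm_service(
    '26d37e43-32e2-4e55-9c1e-1438518d5021')
vm_service.cancel_migration()  # should map to POST .../vms/<id>/cancelmigration
connection.close()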
>> I already tried restarting ovirt-engine but it didn't work.
>
> Here the problem is clearly on the Vdsm side.
>
>> Could someone shed some light on how to clear the migrating state for
>> these machines? All of them seem to be running on the same host.
>
> Did the VMs get unblocked in the meantime?
No, they didn't. They're still in a "Migrating" state.
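
For reference, one way to see which VMs the engine still reports as
migrating is a small script against the API; again just a sketch with
the Python SDK (ovirtsdk4), with placeholder connection details and the
status constant written from memory:

# Sketch (assumption): list every VM whose status the engine reports as
# 'migrating'.
import ovirtsdk4 as sdk
import ovirtsdk4.types as types

connection = sdk.Connection(
    url='https://engine.domain.com/ovirt-engine/api',   # placeholder URL
    username='admin@internal',
    password='***',
    ca_file='/etc/pki/ovirt-engine/ca.pem',              # placeholder path
)
for vm in connection.system_service().vms_service().list():
    if vm.status == types.VmStatus.MIGRATING:
        print(vm.name, vm.id)
connection.close()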
> I can't know the actual state of the given VMs without seeing the
> complete Vdsm log, so it's difficult to give good advice. I think a
> Vdsm restart on the given host would help, BUT it's generally not a
> very good idea to restart Vdsm if any real migration, outgoing or
> incoming, is running on the host. VMs that aren't actually being
> migrated at all (despite being reported as migrating) should simply
> return to Up state after the restart, but VMs with a real migration
> action pending might return to Up state without proper cleanup,
> resulting in a different kind of mess or maybe something even worse
> (things should improve in oVirt 4.2, but it's still good to avoid
> Vdsm restarts while migrations are running).
>
I assume this is not a real migration, as it has been in this state for
several days. Would you advise restarting vdsm in this case, then?
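
If a restart is the way to go, my plan would be to first double-check on
that host that no domain actually has a live migration job in progress,
and only then run `systemctl restart vdsmd`. A rough sketch of that
check with the libvirt Python bindings (read-only connection; whether
this works as-is on a vdsm host is an assumption on my part):

# Sketch (assumption): check that no running domain on this host has an
# active job (e.g. a live migration) before restarting vdsmd.
import libvirt

conn = libvirt.openReadOnly('qemu:///system')
for dom in conn.listAllDomains(libvirt.VIR_CONNECT_LIST_DOMAINS_ACTIVE):
    job = dom.jobInfo()  # first field is the job type
    if job[0] != libvirt.VIR_DOMAIN_JOB_NONE:
        print('%s still has an active job (type %d)' % (dom.name(), job[0]))
conn.close()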
Thank you.
> Regards,
> Milan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vdsm.log.20.xz
Type: application/x-xz
Size: 963208 bytes
Desc: not available
URL: <http://lists.ovirt.org/pipermail/users/attachments/20180302/cd436252/attachment.xz>