oVirt 3.6: Node went into Error state while migrations were happening

On the node in question, the VM state metadata isn't coming across. The engine shows VMs in an "Unknown" state (some are actually up and some are down), some shown as "Migrating", and 9 migration tasks that are hung indefinitely. We tried to bring up some of the VMs that had a state of "Down", but that ended up putting them in the "Wait for Launch" state, even though those VMs are actually started.
Right now, my plan is to attempt a restart of vdsmd on the node in question, just to get the node back to a working state. There are a total of 9 nodes in our cluster, but right now we can't manage any VMs on the affected node.
Is there a way in 3.6 to cancel the hung tasks? I'm worried that if vdsmd is restarted on the node, the tasks might be re-attempted. I really need them to be forgotten if possible.
Ideally, I want all "Unknown" VMs to return to either "Up" or "Down" (depending on whether the VM is actually up or down), the "Wait for Launch" VMs to go to "Up", and all the "Migrating" VMs to go to "Up" or "Down" (I think only one is actually down).
I'm concerned that any attempt to manually manipulate the state in the oVirt management head's database will be moot, because the node will be queried for state, and that state will override anything I attempt to do.
Thoughts?
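The last concern can be made concrete with a toy model. This is a hypothetical sketch, not oVirt's monitoring code: it assumes the engine periodically copies each VM's state from what the host reports, which is exactly the behaviour the poster fears. The function names and VM names are made up for illustration.

```python
# Hypothetical toy model, NOT oVirt's implementation. It assumes the engine
# refreshes VM state from the host's report on every monitoring cycle.


def desired_state(actually_running: bool) -> str:
    """Target from the wish list: every stale status (Unknown,
    Wait for Launch, Migrating) collapses to the VM's real state."""
    return "Up" if actually_running else "Down"


def monitoring_cycle(engine_db: dict, host_report: dict) -> dict:
    """Assumed behaviour: each VM the host reports on gets its state
    replaced by the host's value, discarding manual DB edits."""
    refreshed = dict(engine_db)
    for vm, state in host_report.items():
        refreshed[vm] = state
    return refreshed


if __name__ == "__main__":
    # Whether each VM is really running (made-up names).
    running = {"vm1": True, "vm2": True, "vm3": False}

    # A manual "fix" applied directly to the engine database.
    engine_db = {vm: desired_state(r) for vm, r in running.items()}

    # If the host still reports a hung migration for vm3, the edit is lost.
    host_report = {"vm1": "Up", "vm2": "Up", "vm3": "Migrating"}
    print(monitoring_cycle(engine_db, host_report))
    # prints {'vm1': 'Up', 'vm2': 'Up', 'vm3': 'Migrating'}
```

Under this assumption, editing the database only sticks once the host itself stops reporting the stale state, which is why fixing the node side comes first.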

On Mon, Jul 8, 2019 at 21:44, Christopher Cox <ccox@endlessnow.com> wrote:
Hi, please note that 3.6 reached End of Life a long time ago. While someone may still be able to help with this specific issue, I would recommend planning an upgrade as soon as practical.
_______________________________________________ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-leave@ovirt.org Privacy Statement: https://www.ovirt.org/site/privacy-policy/ oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/Q3AG2HTDIUWLZI...
-- Sandro Bonazzola, MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV, Red Hat EMEA <https://www.redhat.com/>, sbonazzo@redhat.com