[Users] VM stuck in state Not Responding

Hi List,

in my test lab the iSCSI SAN crashed and caused some mess. My cluster has 3 hosts running VMs. The SPM node was fenced and automatically shut down due to the storage crash. All VMs running on the other 2 hosts were paused. I recovered the storage and powered on the fenced node. All VMs were restarted or came back to life except one. Since this incident I am no longer able to start or stop it. It is stuck in state "Not Responding" and it seems I cannot revive it anymore. The engine only offers the stop and shutdown operations, but neither works.

The following is logged when trying to stop it:

2012-09-28 12:29:08,415 INFO [org.ovirt.engine.core.bll.StopVmCommand] (pool-3-thread-50) [49165a9b] Running command: StopVmCommand internal: false. Entities affected : ID: 0e95f511-62c5-438c-91fe-01c206ceb78f Type: VM
2012-09-28 12:29:08,416 WARN [org.ovirt.engine.core.bll.VmOperationCommandBase] (pool-3-thread-50) [49165a9b] Strange, according to the status "NotResponding" virtual machine "0e95f511-62c5-438c-91fe-01c206ceb78f" should be running in a host but it isnt.
2012-09-28 12:29:08,420 ERROR [org.ovirt.engine.core.bll.StopVmCommand] (pool-3-thread-50) [49165a9b] Transaction rolled-back for command: org.ovirt.engine.core.bll.StopVmCommand.

and when trying to shut it down:

2012-09-28 12:30:16,213 INFO [org.ovirt.engine.core.bll.ShutdownVmCommand] (pool-3-thread-48) [42788145] Running command: ShutdownVmCommand internal: false. Entities affected : ID: 0e95f511-62c5-438c-91fe-01c206ceb78f Type: VM
2012-09-28 12:30:16,214 WARN [org.ovirt.engine.core.bll.VmOperationCommandBase] (pool-3-thread-48) [42788145] Strange, according to the status "NotResponding" virtual machine "0e95f511-62c5-438c-91fe-01c206ceb78f" should be running in a host but it isnt.
2012-09-28 12:30:16,218 ERROR [org.ovirt.engine.core.bll.ShutdownVmCommand] (pool-3-thread-48) [42788145] Transaction rolled-back for command: org.ovirt.engine.core.bll.ShutdownVmCommand.

Is there anything I can do to reset that stuck state and bring the VM back to life?

Best regards
Patrick

--
Lobster LOGsuite GmbH, Münchner Straße 15a, D-82319 Starnberg
HRB 178831, Amtsgericht München
Geschäftsführer: Dr. Martin Fischer, Rolf Henrich

On 09/28/2012 12:34 PM, Patrick Hurrelmann wrote:
Hi List,
in my test lab the iSCSI SAN crashed and caused some mess. My cluster has 3 hosts running VMs. The SPM node was fenced and automatically shut down due to the storage crash. All VMs running on the other 2 hosts were paused. I recovered the storage and powered on the fenced node. All VMs were restarted or came back to life except one. Since this incident I am no longer able to start or stop it. It is stuck in state "Not Responding" and it seems I cannot revive it anymore. The engine only offers the stop and shutdown operations, but neither works.
The following is logged when trying to stop it:
2012-09-28 12:29:08,415 INFO [org.ovirt.engine.core.bll.StopVmCommand] (pool-3-thread-50) [49165a9b] Running command: StopVmCommand internal: false. Entities affected : ID: 0e95f511-62c5-438c-91fe-01c206ceb78f Type: VM
2012-09-28 12:29:08,416 WARN [org.ovirt.engine.core.bll.VmOperationCommandBase] (pool-3-thread-50) [49165a9b] Strange, according to the status "NotResponding" virtual machine "0e95f511-62c5-438c-91fe-01c206ceb78f" should be running in a host but it isnt.
2012-09-28 12:29:08,420 ERROR [org.ovirt.engine.core.bll.StopVmCommand] (pool-3-thread-50) [49165a9b] Transaction rolled-back for command: org.ovirt.engine.core.bll.StopVmCommand.
and when trying to shutdown:
2012-09-28 12:30:16,213 INFO [org.ovirt.engine.core.bll.ShutdownVmCommand] (pool-3-thread-48) [42788145] Running command: ShutdownVmCommand internal: false. Entities affected : ID: 0e95f511-62c5-438c-91fe-01c206ceb78f Type: VM
2012-09-28 12:30:16,214 WARN [org.ovirt.engine.core.bll.VmOperationCommandBase] (pool-3-thread-48) [42788145] Strange, according to the status "NotResponding" virtual machine "0e95f511-62c5-438c-91fe-01c206ceb78f" should be running in a host but it isnt.
2012-09-28 12:30:16,218 ERROR [org.ovirt.engine.core.bll.ShutdownVmCommand] (pool-3-thread-48) [42788145] Transaction rolled-back for command: org.ovirt.engine.core.bll.ShutdownVmCommand.
Is there anything I can do to reset that stuck state and bring the VM back to life?
Best regards Patrick
try moving all VMs from that host (migrate them to the other hosts), then fence it (or shut it down manually, then right-click and confirm the shutdown) to try to release the VM from it.
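For reference, the same steps can also be driven through the engine's REST API instead of the web admin UI. This is only a sketch with placeholder credentials, hostname and IDs (adjust to your setup and double-check the actions against your API version):

# migrate a VM away from the affected host (the engine picks a target if none is specified)
curl -k -u admin@internal:password -H "Content-Type: application/xml" \
  -X POST -d '<action/>' \
  'https://your-engine/api/vms/<vm-id>/migrate'

# once the host is empty, fence (restart) it
curl -k -u admin@internal:password -H "Content-Type: application/xml" \
  -X POST -d '<action><fence_type>restart</fence_type></action>' \
  'https://your-engine/api/hosts/<host-id>/fence'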

Is there anything I can do to reset that stuck state and bring the VM back to life?
Best regards Patrick
try moving all VMs from that host (migrate them to the other hosts), then fence it (or shut it down manually, then right-click and confirm the shutdown) to try to release the VM from it.
In the web interface it is shown with the icon for stopped VMs and an empty host field, but its status is "Not Responding". So the stuck VM is not assigned to any host? All 3 hosts and the engine itself have already been rebooted since the storage crash (the hosts one by one, each put into maintenance first).

Regards
Patrick

--
Lobster LOGsuite GmbH, Münchner Straße 15a, D-82319 Starnberg
HRB 178831, Amtsgericht München
Geschäftsführer: Dr. Martin Fischer, Rolf Henrich

On 09/28/2012 03:04 PM, Patrick Hurrelmann wrote:
Is there anything I can do to reset that stuck state and bring the VM back to life?
Best regards Patrick
try moving all VMs from that host (migrate them to the other hosts), then fence it (or shut it down manually, then right-click and confirm the shutdown) to try to release the VM from it.
In the web interface it is shown with the icon for stopped VMs and an empty host field, but its status is "Not Responding". So the stuck VM is not assigned to any host? All 3 hosts and the engine itself have already been rebooted since the storage crash (the hosts one by one, each put into maintenance first).
The shortest solution is for you to change the status of the VM in the db to unlock it, but it would be nice to try to understand why this specific VM got into this state, so we can fix the bug.
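For the record, a minimal sketch of that database change, assuming the default engine database name ("engine"), the vm_dynamic table and status code 0 for Down as used by oVirt 3.1-era engines; stop ovirt-engine and take a backup first, and verify the table and status values against your exact version:

# on the engine machine
service ovirt-engine stop
pg_dump -U postgres engine > engine-backup.sql

# mark the stuck VM as Down so the engine lets you start it again
psql -U postgres engine -c \
  "UPDATE vm_dynamic SET status = 0 WHERE vm_guid = '0e95f511-62c5-438c-91fe-01c206ceb78f';"

service ovirt-engine start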

On 28.09.2012 15:10, Itamar Heim wrote:
On 09/28/2012 03:04 PM, Patrick Hurrelmann wrote:
Is there anything I can do to reset that stuck state and bring the VM back to life?
Best regards Patrick
try moving all VMs from that host (migrate them to the other hosts), then fence it (or shut it down manually, then right-click and confirm the shutdown) to try to release the VM from it.
In the web interface it is shown with the icon for stopped VMs and an empty host field, but its status is "Not Responding". So the stuck VM is not assigned to any host? All 3 hosts and the engine itself have already been rebooted since the storage crash (the hosts one by one, each put into maintenance first).
The shortest solution is for you to change the status of the VM in the db to unlock it, but it would be nice to try to understand why this specific VM got into this state, so we can fix the bug.
Yes, sure. If there is more information that I can provide to help, please let me know. My setup is based on CentOS 6.3 using dreyou's repo.
On hosts: vdsm 4.10.0-0.42.13.el6
On engine: ovirt-engine 3.1.0-3.19.el6

Regards
Patrick

--
Lobster LOGsuite GmbH, Münchner Straße 15a, D-82319 Starnberg
HRB 178831, Amtsgericht München
Geschäftsführer: Dr. Martin Fischer, Rolf Henrich

On 28.09.2012 15:10, Itamar Heim wrote:
On 09/28/2012 03:04 PM, Patrick Hurrelmann wrote:
Is there anything I can do to reset that stuck state and bring the VM back to life?
Best regards Patrick
try moving all VMs from that host (migrate them to the other hosts), then fence it (or shut it down manually, then right-click and confirm the shutdown) to try to release the VM from it.
In the web interface it is shown with the icon for stopped VMs and an empty host field, but its status is "Not Responding". So the stuck VM is not assigned to any host? All 3 hosts and the engine itself have already been rebooted since the storage crash (the hosts one by one, each put into maintenance first).
The shortest solution is for you to change the status of the VM in the db to unlock it, but it would be nice to try to understand why this specific VM got into this state, so we can fix the bug.
Yes, sure. If there is more information that I can provide to help, please let me know. My setup is based on CentOS 6.3 using dreyou's repo.
On hosts: vdsm 4.10.0-0.42.13.el6
On engine: ovirt-engine 3.1.0-3.19.el6

Regards
Patrick

On 09/28/2012 04:20 PM, Patrick Hurrelmann wrote:
Please attach also the vdsm log (from the host running that orphan VM) and the engine log.
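For anyone following along: on a standard vdsm/ovirt-engine installation of this vintage those logs are typically found at the locations below (verify against your own packaging, e.g. dreyou's builds):

# on the host that last ran the stuck VM
less /var/log/vdsm/vdsm.log

# on the engine machine
less /var/log/ovirt-engine/engine.log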
participants (3)
- Itamar Heim
- Patrick Hurrelmann
- Roy Golan