HostedEngine VM Paused after power failure

Hello Users,

We have an oVirt (4.4) environment that had 2 hosts in the cluster. We suffered a power failure that caused the servers to be offline for some time. Once power was restored, one of the hosts in the cluster had lost its OS RAID and is not accessible.

The other server has the HostedEngine VM on it, but in a paused state. I have tried to manually start the VM with the hosted-engine CLI tool, but it indicates that HostedEngine is running on another host.

Is there any manual intervention I can perform here to start the HostedEngine on the second, active host server?

Thank you,
Ian Easter
DevOps Engineer
TelVue Support
https://www.telvue.com/support/

If the engine VM is in a paused state, ssh into the host where it is paused and try:

    virsh -c qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf resume HostedEngine
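Before resuming, it may help to ask libvirt why the VM is paused, since a VM paused on a storage error would likely pause again until the storage is healthy. The following is a minimal sketch, not an official oVirt procedure. It assumes `virsh domstate --reason` prints output of the form `state (reason)`; the exact reason strings depend on the libvirt version, and the helper names are hypothetical.

```python
import subprocess

# Connection URI and domain name are taken from the reply above.
VIRSH_URI = "qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf"
DOMAIN = "HostedEngine"


def virsh_cmd(*args):
    """Build a virsh argument list; a list avoids shell globbing of the '?'."""
    return ["virsh", "-c", VIRSH_URI, *args]


def parse_domstate(output):
    """Split 'state (reason)' into (state, reason); reason is '' if absent."""
    text = output.strip()
    state, sep, rest = text.partition(" (")
    reason = rest[:-1] if sep and rest.endswith(")") else ""
    return state, reason


def resume_if_paused(run=subprocess.run):
    """Resume the engine VM only if libvirt reports it as paused.

    Returns (state, reason, resumed). `run` is injectable so the logic
    can be exercised without a real virsh binary.
    """
    result = run(virsh_cmd("domstate", DOMAIN, "--reason"),
                 capture_output=True, text=True, check=True)
    state, reason = parse_domstate(result.stdout)
    if state != "paused":
        return state, reason, False
    run(virsh_cmd("resume", DOMAIN), check=True)
    return state, reason, True
```

Calling `resume_if_paused()` on the host runs the same `resume` command as above, but only after recording the pause reason, which is worth including in any follow-up to the list.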
_______________________________________________ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-leave@ovirt.org Privacy Statement: https://www.ovirt.org/privacy-policy.html oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/VFE52Q3ILURK2C...

Attempting to resume or start the VM doesn't yield any results.

Here is the status of the VM:

Host ID : 1
Host timestamp : 115601
Score : 3400
Engine status : {"vm": "up", "health": "bad", "detail": "Paused", "reason": "bad vm status"}
Hostname :
Local maintenance : False
stopped : False
crc32 : 68efbf40
conf_on_shared_storage : True
local_conf_timestamp : 115601
Status up-to-date : True
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=115601 (Tue Feb 9 18:25:48 2021)
    host-id=1
    score=3400
    vm_conf_refresh_time=115601 (Tue Feb 9 18:25:48 2021)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineStarting
    stopped=False

Here is a chunk in agent.log that is a bit perplexing. I'm not too sure what it means that the VM doesn't exist. Storage is correctly mounted, everything looks fully operational. I can see the HostedEngine disk available to the Host.

MainThread::INFO::2021-02-09 18:08:13,843::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineDown (score: 3400)
MainThread::INFO::2021-02-09 18:08:23,864::states::467::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine down and local host has best score (3400), attempting to start engine VM
MainThread::INFO::2021-02-09 18:08:23,894::brokerlink::73::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineDown-EngineStart) sent? ignored
MainThread::INFO::2021-02-09 18:08:23,983::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStart (score: 3400)
MainThread::INFO::2021-02-09 18:08:24,000::hosted_engine::895::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_clean_vdsm_state) Ensuring VDSM state is clear for engine VM
MainThread::INFO::2021-02-09 18:08:24,005::hosted_engine::907::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_clean_vdsm_state) Vdsm state for VM clean
MainThread::INFO::2021-02-09 18:08:24,005::hosted_engine::853::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm) Starting vm using `/usr/sbin/hosted-engine --vm-start`
MainThread::INFO::2021-02-09 18:08:24,519::hosted_engine::862::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm) stdout: VM in WaitForLaunch
MainThread::INFO::2021-02-09 18:08:24,519::hosted_engine::863::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm) stderr: Command VM.getStats with args {'vmID': '74b3c839-c89c-4857-ada0-95715672348a'} failed: (code=1, message=Virtual machine does not exist: {'vmId': '74b3c839-c89c-4857-ada0-95715672348a'})
MainThread::INFO::2021-02-09 18:08:24,519::hosted_engine::875::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm) Engine VM started on localhost
MainThread::INFO::2021-02-09 18:08:24,552::brokerlink::73::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineStart-EngineStarting) sent? ignored
MainThread::INFO::2021-02-09 18:08:24,565::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStarting (score: 3400)
MainThread::INFO::2021-02-09 18:08:34,585::states::736::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) VM is powering up..
MainThread::INFO::2021-02-09 18:08:34,590::state_decorators::99::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Timeout set to Tue Feb 9 18:18:34 2021 while transitioning <class 'ovirt_hosted_engine_ha.agent.states.EngineStarting'> -> <class 'ovirt_hosted_engine_ha.agent.states.EngineStarting'>
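When the status and agent.log are long, a small script can pull out the two things that matter in the output above: the engine health JSON and the sequence of HA agent state transitions. This is only an illustrative sketch; the function names are hypothetical, and the parsing assumes the line formats exactly as they appear in this message.

```python
import json
import re

# Matches the "(Old-New)" pair in brokerlink state_transition notifications.
TRANSITION = re.compile(r"state_transition \((\w+)-(\w+)\)")


def engine_status(vm_status_text):
    """Return the 'Engine status' JSON as a dict, or None if not present."""
    for line in vm_status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Engine status":
            return json.loads(value.strip())
    return None


def state_transitions(agent_log_text):
    """Return (old_state, new_state) tuples in the order they were logged."""
    return TRANSITION.findall(agent_log_text)


# Sample input copied from the status and log excerpts in this thread.
STATUS_SAMPLE = (
    'Engine status : {"vm": "up", "health": "bad", '
    '"detail": "Paused", "reason": "bad vm status"}'
)
LOG_SAMPLE = (
    "MainThread::INFO::2021-02-09 18:08:23,894::brokerlink::73::"
    "ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, "
    "was notification of state_transition (EngineDown-EngineStart) sent? ignored\n"
    "MainThread::INFO::2021-02-09 18:08:24,552::brokerlink::73::"
    "ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, "
    "was notification of state_transition (EngineStart-EngineStarting) sent? ignored\n"
)

status = engine_status(STATUS_SAMPLE)
print(status["detail"], status["health"])
for old, new in state_transitions(LOG_SAMPLE):
    print(old, "->", new)
```

Run against the full agent.log, the transition list shows whether the agent keeps cycling through the same states, which is easier to spot than in the raw log.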

I've seen this happen with the VM disk itself becoming corrupt. If you try to read the contents of the file, and it gives you "Input/Output Error", then it is not good news. I've been testing oVirt recently, and these issues alone are preventing me from using it full time.

I cannot help further, unfortunately, as I have no idea how to fix it. So best I can say is, hopefully someone else chimes in and helps both of us.

-phunyguy
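The read test described above can be sketched as a sequential read that reports where the first "Input/Output Error" (EIO) occurs. This is a rough illustration under assumptions: the function name is hypothetical, the path to the engine disk image or volume is not given in this thread and must be supplied by the reader, and a clean read only shows the storage is readable, not that the guest filesystem is intact.

```python
import errno


def first_io_error(path, chunk_size=1024 * 1024):
    """Read `path` sequentially in read-only mode.

    Returns the byte offset at which the first EIO was raised,
    or None if the whole file could be read.
    """
    offset = 0
    # buffering=0 gives raw reads, so an error maps to the chunk being read.
    with open(path, "rb", buffering=0) as f:
        while True:
            try:
                chunk = f.read(chunk_size)
            except OSError as exc:
                if exc.errno == errno.EIO:
                    return offset
                raise  # other errors (permissions, etc.) are not masked
            if not chunk:
                return None
            offset += len(chunk)
```

A result of `None` means every chunk was readable; an integer is the approximate offset of the first unreadable region.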
participants (4): Edward Berger, Ian Easter, ieaster@telvue.com, Robert Tongue