Robert,

I understand the frustration.  The recovery feels brutal, but the monolithic design and the dense ecosystem are understandable given the purpose it serves.

I am able to mount the raw disk image for the HostedEngine VM cleanly without any errors and it seems to check out, so I don't believe there is any corruption.
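
A clean mount does not necessarily touch every block, so a full sequential read is a slightly stronger check for the "Input/Output Error" case you described. Below is a minimal sketch of that check, assuming the image is reachable as a regular file path on the host. The function name and chunk size are my own choices for illustration, not anything oVirt ships.

```python
import errno


def first_unreadable_offset(path, chunk_size=1024 * 1024):
    """Read the file from start to end.

    Returns the offset of the chunk whose read raised an Input/Output
    error (EIO), or None if the whole file was read without one.
    Any other OSError is re-raised unchanged.
    """
    offset = 0
    # buffering=0 gives raw reads, so the offset matches what was requested.
    with open(path, "rb", buffering=0) as image:
        while True:
            try:
                chunk = image.read(chunk_size)
            except OSError as exc:
                if exc.errno == errno.EIO:
                    return offset
                raise
            if not chunk:
                return None  # reached end of file with no read errors
            offset += len(chunk)
```

A result of None only says that every byte could be read. It says nothing about whether the filesystem inside the image is consistent.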

Everything appears to operate as expected, and then the VM seems to snag somewhere during startup.  I'm trying to track down the hiccup so I can clear it out of the way and let the VM boot.  My experience digging into and troubleshooting these components is a bit limited.

Additional snippet from agent.log:
MainThread::INFO::2021-02-09 21:00:07,357::hosted_engine::863::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm) stderr: Command VM.getStats with args {'vmID': '74b3c839-c89c-4857-ada0-95715672348a'} failed:
(code=1, message=Virtual machine does not exist: {'vmId': '74b3c839-c89c-4857-ada0-95715672348a'})

MainThread::INFO::2021-02-09 21:00:07,357::hosted_engine::875::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm) Engine VM started on localhost
MainThread::INFO::2021-02-09 21:00:07,389::brokerlink::73::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineStart-EngineStarting) sent? ignored
MainThread::INFO::2021-02-09 21:00:07,406::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStarting (score: 3400)
MainThread::INFO::2021-02-09 21:00:17,427::states::740::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Another host already took over..


Thank you,
Ian Easter





On Tue, Feb 9, 2021 at 6:31 PM Robert Tongue <phunyguy@neverserio.us> wrote:
I've seen this happen when the VM disk itself becomes corrupt.  If you try to read the contents of the file and get an "Input/Output Error", that is not good news.  I've been testing oVirt recently, and these issues alone are preventing me from using it full time.  Unfortunately I cannot help further, as I have no idea how to fix it.  The best I can say is that hopefully someone else chimes in and helps both of us.

-phunyguy

From: ieaster@telvue.com <ieaster@telvue.com>
Sent: Tuesday, February 9, 2021 6:25 PM
To: users@ovirt.org <users@ovirt.org>
Subject: [ovirt-users] Re: HostedEngine VM Paused after power failure
 
Attempting to resume or start the VM doesn't yield any results.

Here is the status of the VM:
Host ID                            : 1
Host timestamp                     : 115601
Score                              : 3400
Engine status                      : {"vm": "up", "health": "bad", "detail": "Paused", "reason": "bad vm status"}
Hostname                           :
Local maintenance                  : False
stopped                            : False
crc32                              : 68efbf40
conf_on_shared_storage             : True
local_conf_timestamp               : 115601
Status up-to-date                  : True
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=115601 (Tue Feb  9 18:25:48 2021)
        host-id=1
        score=3400
        vm_conf_refresh_time=115601 (Tue Feb  9 18:25:48 2021)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineStarting
        stopped=False
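
The "Engine status" field in the output above is a JSON object, so it can be picked apart in a script instead of by eye. This is a rough sketch that assumes the status text has already been captured into a string. The helper name `engine_status` is hypothetical, and the sample is trimmed from the output above.

```python
import json


def engine_status(vm_status_text):
    """Return the 'Engine status' field as a dict, or None if it is absent."""
    for line in vm_status_text.splitlines():
        # Split on the first colon only; the JSON value has colons of its own.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Engine status":
            return json.loads(value)
    return None


sample = """\
Host ID                            : 1
Host timestamp                     : 115601
Score                              : 3400
Engine status                      : {"vm": "up", "health": "bad", "detail": "Paused", "reason": "bad vm status"}
"""

status = engine_status(sample)
print(status["vm"], status["health"], status["detail"])  # prints: up bad Paused
```

The combination is the odd part: the agent reports the VM as "up" while its health is "bad" and its detail is "Paused".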


Here is a chunk from agent.log that is a bit perplexing.  I'm not sure what it means that the VM doesn't exist.  Storage is correctly mounted and everything looks fully operational.  I can see the HostedEngine disk available to the host.

MainThread::INFO::2021-02-09 18:08:13,843::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineDown (score: 3400)
MainThread::INFO::2021-02-09 18:08:23,864::states::467::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine down and local host has best score (3400), attempting to start engine VM
MainThread::INFO::2021-02-09 18:08:23,894::brokerlink::73::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineDown-EngineStart) sent? ignored
MainThread::INFO::2021-02-09 18:08:23,983::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStart (score: 3400)
MainThread::INFO::2021-02-09 18:08:24,000::hosted_engine::895::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_clean_vdsm_state) Ensuring VDSM state is clear for engine VM
MainThread::INFO::2021-02-09 18:08:24,005::hosted_engine::907::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_clean_vdsm_state) Vdsm state for VM clean
MainThread::INFO::2021-02-09 18:08:24,005::hosted_engine::853::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm) Starting vm using `/usr/sbin/hosted-engine --vm-start`
MainThread::INFO::2021-02-09 18:08:24,519::hosted_engine::862::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm) stdout: VM in WaitForLaunch

MainThread::INFO::2021-02-09 18:08:24,519::hosted_engine::863::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm) stderr: Command VM.getStats with args {'vmID': '74b3c839-c89c-4857-ada0-95715672348a'} failed:
(code=1, message=Virtual machine does not exist: {'vmId': '74b3c839-c89c-4857-ada0-95715672348a'})

MainThread::INFO::2021-02-09 18:08:24,519::hosted_engine::875::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm) Engine VM started on localhost
MainThread::INFO::2021-02-09 18:08:24,552::brokerlink::73::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineStart-EngineStarting) sent? ignored
MainThread::INFO::2021-02-09 18:08:24,565::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStarting (score: 3400)
MainThread::INFO::2021-02-09 18:08:34,585::states::736::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) VM is powering up..
MainThread::INFO::2021-02-09 18:08:34,590::state_decorators::99::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Timeout set to Tue Feb  9 18:18:34 2021 while transitioning <class 'ovirt_hosted_engine_ha.agent.states.EngineStarting'> -> <class 'ovirt_hosted_engine_ha.agent.states.EngineStarting'>
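
To follow the sequence in the chunk above without reading every line, the "Current state" entries can be pulled out with a short script. This is a sketch only. It assumes every agent.log record uses the "::"-separated layout seen here (thread, level, timestamp, module, line number, logger, message). The names `STATE_RE` and `state_history` are made up for illustration.

```python
import re
from datetime import datetime

# Matches messages such as "Current state EngineDown (score: 3400)".
STATE_RE = re.compile(r"Current state (\w+) \(score: (\d+)\)")


def state_history(log_text):
    """Return a list of (timestamp, state, score) for each reported state."""
    history = []
    for line in log_text.splitlines():
        # At most 6 splits, so a "::" inside the message stays intact.
        parts = line.split("::", 6)
        if len(parts) != 7:
            continue  # blank or continuation lines, e.g. "(code=1, message=...)"
        match = STATE_RE.search(parts[6])
        if match:
            timestamp = datetime.strptime(parts[2], "%Y-%m-%d %H:%M:%S,%f")
            history.append((timestamp, match.group(1), int(match.group(2))))
    return history


sample_log = """\
MainThread::INFO::2021-02-09 18:08:13,843::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineDown (score: 3400)
MainThread::INFO::2021-02-09 18:08:23,983::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStart (score: 3400)
(code=1, message=Virtual machine does not exist: {'vmId': '74b3c839-c89c-4857-ada0-95715672348a'})
MainThread::INFO::2021-02-09 18:08:24,565::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStarting (score: 3400)
"""

for when, state, score in state_history(sample_log):
    print(when.time(), state, score)
```

Run against the full log, this should make it easier to see how long the agent sits in EngineStarting before it gives up or restarts the cycle.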
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/UDKODQL5A4NNIWJMONVYTFIGC3256URS/