Hi!
For the last few days I have been having a problem with the Hosted Engine: it keeps restarting.
Sometimes after a few minutes, sometimes after a few hours.
I haven't made any changes to oVirt or the network in that time.
The version is 4.4.10.7-1.el8 (this was also the installation version).
Here are the logs:
Agent.log------------------------------
MainThread::INFO::2022-08-21
09:48:36,200::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop)
Current state EngineUp (score: 2440)
MainThread::INFO::2022-08-21
09:48:36,200::hosted_engine::525::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop)
Best remote host
node3.ovirt.example.org (id: 3, score: 2440)
MainThread::ERROR::2022-08-21
09:48:46,212::states::398::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
Host
node3.ovirt.example.org (id 3) score is significantly better than local score,
shutting down VM on this host
MainThread::INFO::2022-08-21
09:48:46,641::brokerlink::73::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
Success, was notification of state_transition (EngineUp-EngineStop) sent? ignored
MainThread::INFO::2022-08-21
09:48:46,706::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop)
Current state EngineStop (score: 3400)
MainThread::INFO::2022-08-21
09:48:46,706::hosted_engine::525::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop)
Best remote host
node3.ovirt.example.org (id: 3, score: 3400)
MainThread::INFO::2022-08-21
09:48:56,714::hosted_engine::934::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_engine_vm)
Shutting down vm using `/usr/sbin/hosted-engine --vm-shutdown`
MainThread::INFO::2022-08-21
09:48:56,871::hosted_engine::941::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_engine_vm)
stdout:
MainThread::INFO::2022-08-21
09:48:56,871::hosted_engine::942::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_engine_vm)
stderr:
MainThread::ERROR::2022-08-21
09:48:56,871::hosted_engine::950::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_engine_vm)
Engine VM stopped on localhost
MainThread::INFO::2022-08-21
09:48:56,880::state_decorators::102::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check)
Timeout set to Sun Aug 21 09:53:56 2022 while transitioning <class
'ovirt_hosted_engine_ha.agent.states.EngineStop'> -> <class
'ovirt_hosted_engine_ha.agent.states.EngineStop'>
MainThread::INFO::2022-08-21
09:48:56,959::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop)
Current state EngineStop (score: 3400)
MainThread::INFO::2022-08-21
09:49:06,977::states::537::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
Engine vm not running on local host
MainThread::INFO::2022-08-21
09:49:06,983::state_decorators::95::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check)
Timeout cleared while transitioning <class
'ovirt_hosted_engine_ha.agent.states.EngineStop'> -> <class
'ovirt_hosted_engine_ha.agent.states.EngineDown'>
MainThread::INFO::2022-08-21
09:49:07,173::brokerlink::73::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
Success, was notification of state_transition (EngineStop-EngineDown) sent? ignored
MainThread::INFO::2022-08-21
09:49:07,795::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop)
Current state EngineDown (score: 3400)
MainThread::INFO::2022-08-21
09:49:16,811::states::472::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
Engine down and local host has best score (3400), attempting to start engine VM
MainThread::INFO::2022-08-21
09:49:16,998::brokerlink::73::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
Success, was notification of state_transition (EngineDown-EngineStart) sent? ignored
MainThread::INFO::2022-08-21
09:49:17,179::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop)
Current state EngineStart (score: 3400)
MainThread::INFO::2022-08-21
09:49:17,195::hosted_engine::895::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_clean_vdsm_state)
Ensuring VDSM state is clear for engine VM
MainThread::INFO::2022-08-21
09:49:17,200::hosted_engine::915::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_clean_vdsm_state)
Cleaning state for non-running VM
MainThread::INFO::2022-08-21
09:49:18,211::hosted_engine::907::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_clean_vdsm_state)
Vdsm state for VM clean
MainThread::INFO::2022-08-21
09:49:18,212::hosted_engine::853::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm)
Starting vm using `/usr/sbin/hosted-engine --vm-start`
MainThread::INFO::2022-08-21
09:49:18,814::hosted_engine::862::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm)
stdout: VM in WaitForLaunch
MainThread::INFO::2022-08-21
09:49:18,814::hosted_engine::863::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm)
stderr: Command VM.getStats with args {'vmID':
'cc7931ff-8124-4724-9242-abea2ab5bf42'} failed:
(code=1, message=Virtual machine does not exist: {'vmId':
'cc7931ff-8124-4724-9242-abea2ab5bf42'})
MainThread::INFO::2022-08-21
09:49:18,814::hosted_engine::875::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm)
Engine VM started on localhost
MainThread::INFO::2022-08-21
09:49:18,999::brokerlink::73::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
Success, was notification of state_transition (EngineStart-EngineStarting) sent? ignored
MainThread::INFO::2022-08-21
09:49:19,008::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop)
Current state EngineStarting (score: 3400)
MainThread::INFO::2022-08-21
09:49:29,027::states::741::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
VM is powering up..
MainThread::INFO::2022-08-21
09:49:29,033::state_decorators::102::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check)
Timeout set to Sun Aug 21 09:59:29 2022 while transitioning <class
'ovirt_hosted_engine_ha.agent.states.EngineStarting'> -> <class
'ovirt_hosted_engine_ha.agent.states.EngineStarting'>
MainThread::INFO::2022-08-21
09:49:29,109::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop)
Current state EngineStarting (score: 3400)
MainThread::INFO::2022-08-21
09:49:38,121::states::741::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
VM is powering up..
MainThread::INFO::2022-08-21
09:49:38,195::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop)
Current state EngineStarting (score: 3400)
MainThread::INFO::2022-08-21
09:49:48,218::state_decorators::95::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check)
Timeout cleared while transitioning <class
'ovirt_hosted_engine_ha.agent.states.EngineStarting'> -> <class
'ovirt_hosted_engine_ha.agent.states.EngineUp'>
MainThread::INFO::2022-08-21
09:49:48,403::brokerlink::73::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
Success, was notification of state_transition (EngineStarting-EngineUp) sent? ignored
MainThread::INFO::2022-08-21
09:49:48,713::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop)
Current state EngineUp (score: 3400)
MainThread::INFO::2022-08-21
09:49:58,725::states::406::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
Engine vm running on localhost
Broker.log------------------------------
Thread-4::INFO::2022-08-21
09:47:59,342::cpu_load_no_engine::142::cpu_load_no_engine.CpuLoadNoEngine::(calculate_load)
System load total=0.0241, engine=0.0013, non-engine=0.0228
Thread-3::INFO::2022-08-21 09:48:01,311::mem_free::51::mem_free.MemFree::(action) memFree:
96106
Thread-5::INFO::2022-08-21
09:48:05,612::engine_health::246::engine_health.EngineHealth::(_result_from_stats) VM is
up on this host with healthy engine
Thread-2::INFO::2022-08-21 09:48:08,591::mgmt_bridge::65::mgmt_bridge.MgmtBridge::(action)
Found bridge ovirtmgmt in up state
Thread-1::WARNING::2022-08-21 09:48:10,352::network::121::network.Network::(_dns) DNS
query failed:
; <<>> DiG 9.11.36-RedHat-9.11.36-3.el8 <<>> +tries=1 +time=5
+tcp
;; global options: +cmd
;; connection timed out; no servers could be reached
Thread-1::WARNING::2022-08-21 09:48:10,352::network::92::network.Network::(action) Failed
to verify network status, (4 out of 5)
Thread-3::INFO::2022-08-21 09:48:11,389::mem_free::51::mem_free.MemFree::(action) memFree:
96089
Thread-5::INFO::2022-08-21
09:48:15,707::engine_health::246::engine_health.EngineHealth::(_result_from_stats) VM is
up on this host with healthy engine
Thread-2::INFO::2022-08-21 09:48:18,662::mgmt_bridge::65::mgmt_bridge.MgmtBridge::(action)
Found bridge ovirtmgmt in up state
Thread-1::WARNING::2022-08-21 09:48:18,879::network::121::network.Network::(_dns) DNS
query failed:
; <<>> DiG 9.11.36-RedHat-9.11.36-3.el8 <<>> +tries=1 +time=5
+tcp
;; global options: +cmd
;; connection timed out; no servers could be reached
Thread-3::INFO::2022-08-21 09:48:21,467::mem_free::51::mem_free.MemFree::(action) memFree:
96072
Thread-1::WARNING::2022-08-21 09:48:24,904::network::121::network.Network::(_dns) DNS
query failed:
; <<>> DiG 9.11.36-RedHat-9.11.36-3.el8 <<>> +tries=1 +time=5
+tcp
;; global options: +cmd
;; connection timed out; no servers could be reached
Thread-5::INFO::2022-08-21
09:48:25,808::engine_health::246::engine_health.EngineHealth::(_result_from_stats) VM is
up on this host with healthy engine
Thread-2::INFO::2022-08-21 09:48:28,740::mgmt_bridge::65::mgmt_bridge.MgmtBridge::(action)
Found bridge ovirtmgmt in up state
Thread-1::WARNING::2022-08-21 09:48:30,416::network::121::network.Network::(_dns) DNS
query failed:
; <<>> DiG 9.11.36-RedHat-9.11.36-3.el8 <<>> +tries=1 +time=5
+tcp
;; global options: +cmd
;; connection timed out; no servers could be reached
Thread-1::WARNING::2022-08-21 09:48:30,416::network::92::network.Network::(action) Failed
to verify network status, (2 out of 5)
Thread-3::INFO::2022-08-21 09:48:31,545::mem_free::51::mem_free.MemFree::(action) memFree:
96064
Thread-5::INFO::2022-08-21
09:48:35,909::engine_health::246::engine_health.EngineHealth::(_result_from_stats) VM is
up on this host with healthy engine
Thread-1::WARNING::2022-08-21 09:48:35,940::network::121::network.Network::(_dns) DNS
query failed:
; <<>> DiG 9.11.36-RedHat-9.11.36-3.el8 <<>> +tries=1 +time=5
+tcp
;; global options: +cmd
;; connection timed out; no servers could be reached
Thread-1::WARNING::2022-08-21 09:48:37,480::network::92::network.Network::(action) Failed
to verify network status, (4 out of 5)
Thread-2::INFO::2022-08-21 09:48:38,809::mgmt_bridge::65::mgmt_bridge.MgmtBridge::(action)
Found bridge ovirtmgmt in up state
Thread-3::INFO::2022-08-21 09:48:41,623::mem_free::51::mem_free.MemFree::(action) memFree:
96014
Thread-1::INFO::2022-08-21 09:48:42,549::network::88::network.Network::(action)
Successfully verified network status
Thread-5::INFO::2022-08-21
09:48:46,011::engine_health::246::engine_health.EngineHealth::(_result_from_stats) VM is
up on this host with healthy engine
Listener::ERROR::2022-08-21
09:48:46,639::notifications::42::ovirt_hosted_engine_ha.broker.notifications.Notifications::(send_email)
(530, b'5.7.1 Authentication required', 'alerts@example.org.hr')
At first I thought it was related to these bugs:
https://lists.ovirt.org/archives/list/users@ovirt.org/thread/2HTD5WR43M5M...
https://bugzilla.redhat.com/show_bug.cgi?id=1984356
But that bug should already be fixed in this oVirt version.
I tried monitoring the network, but this error keeps happening even when the network load is low.
I also ran continuous dig and ping commands on VMs running on the same host as the Hosted
Engine, and did not see any network problems, not even a single dropped connection.
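For what it's worth, the dig loop I used looked roughly like this (just a sketch; the dig options mirror the ones visible in broker.log, while the 10-second interval and the log path are arbitrary, and I ran an equivalent ping loop as well):

    # repeat the same DNS probe the broker uses and note every failure
    while true; do
        out=$(dig +tries=1 +time=5 +tcp 2>&1)
        if echo "$out" | grep -q "no servers could be reached"; then
            echo "$(date '+%F %T') DNS probe failed" >> /tmp/dns-probe.log
        fi
        sleep 10
    done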
Any solutions or next steps I should try?