[ovirt-users] ovirt-ha-agent service errors

Vadim ovirt at qip.ru
Mon Aug 21 11:29:25 UTC 2017


Hi all,

This is a fresh oVirt 4.1.4 install on two hosts, with hosted-engine deployed on both.
The Gluster volume is replica 3, made up of the two VDSM hosts and one VM running under ESXi.

Only one VM is running, the one for the hosted engine (HE).

Sometimes I see errors like these from ha-agent:

# service ovirt-ha-agent status
Redirecting to /bin/systemctl status  ovirt-ha-agent.service
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2017-08-17 11:43:44 MSK; 3 days ago
 Main PID: 2534 (ovirt-ha-agent)
   CGroup: /system.slice/ovirt-ha-agent.service
           └─2534 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon

Aug 21 00:29:11 kvm03 ovirt-ha-agent[2534]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM has bad health status, timeout in 300 seconds
Aug 21 00:48:32 kvm03 ovirt-ha-agent[2534]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM has bad health status, timeout in 300 seconds
Aug 21 01:12:05 kvm03 ovirt-ha-agent[2534]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM has bad health status, timeout in 300 seconds
Aug 21 02:12:09 kvm03 ovirt-ha-agent[2534]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM has bad health status, timeout in 300 seconds
Aug 21 03:55:08 kvm03 ovirt-ha-agent[2534]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM has bad health status, timeout in 300 seconds
Aug 21 08:14:05 kvm03 ovirt-ha-agent[2534]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM has bad health status, timeout in 300 seconds
Aug 21 08:25:06 kvm03 ovirt-ha-agent[2534]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM has bad health status, timeout in 300 seconds
Aug 21 08:46:05 kvm03 ovirt-ha-agent[2534]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM has bad health status, timeout in 300 seconds
Aug 21 09:20:06 kvm03 ovirt-ha-agent[2534]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM has bad health status, timeout in 300 seconds
Aug 21 09:21:40 kvm03 ovirt-ha-agent[2534]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM has bad health status, timeout in 300 seconds
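
To check whether these errors follow a pattern, the timestamps in the status output above can be turned into intervals. This is a minimal Python sketch, not part of oVirt: the names `error_times` and `gaps_seconds` are made up for illustration, and the year has to be passed in because journal lines do not carry one.

```python
import re
from datetime import datetime

# Matches journal lines such as:
# "Aug 21 00:29:11 kvm03 ovirt-ha-agent[2534]: ovirt-ha-agent ... ERROR Engine VM has bad health status, ..."
JOURNAL_RE = re.compile(
    r"^(\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+\S+\s+ovirt-ha-agent\[\d+\]:\s+(.*)$"
)


def error_times(lines, year):
    """Return the timestamps of 'bad health status' ERROR lines.

    `year` is an assumption supplied by the caller, since the journal
    prefix only contains month, day and time.
    """
    times = []
    for line in lines:
        match = JOURNAL_RE.match(line.strip())
        if not match:
            continue
        stamp, message = match.groups()
        if "ERROR" in message and "bad health status" in message:
            times.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    return times


def gaps_seconds(times):
    """Return the number of seconds between each pair of consecutive timestamps."""
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]


# First three error lines from the status output, shortened to the relevant part.
SAMPLE = [
    "Aug 21 00:29:11 kvm03 ovirt-ha-agent[2534]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM has bad health status, timeout in 300 seconds",
    "Aug 21 00:48:32 kvm03 ovirt-ha-agent[2534]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM has bad health status, timeout in 300 seconds",
    "Aug 21 01:12:05 kvm03 ovirt-ha-agent[2534]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM has bad health status, timeout in 300 seconds",
]

if __name__ == "__main__":
    print(gaps_seconds(error_times(SAMPLE, 2017)))
```

For the three sample lines the gaps are 1161 and 1413 seconds, roughly 19 and 24 minutes, so at least in this excerpt the errors do not look strictly periodic.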


In the agent log:

MainThread::INFO::2017-08-21 09:21:40,314::ovf_store::103::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) Found OVF_STORE: imgUUID:c7b33006-e7c7-4e39-8d80-2301149ac8f9, volUUID:184f9e45-ab1b-44b8-8a68-238042dba1a7
MainThread::INFO::2017-08-21 09:21:40,594::ovf_store::103::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) Found OVF_STORE: imgUUID:e7a3c173-9f87-4f6c-a807-63118b9b7cb2, volUUID:92317b81-1bb0-43e6-b029-8931aa5d0af0
MainThread::INFO::2017-08-21 09:21:40,716::ovf_store::112::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) Extracting Engine VM OVF from the OVF_STORE
MainThread::INFO::2017-08-21 09:21:40,749::ovf_store::119::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) OVF_STORE volume path: /rhev/data-center/mnt/glusterSD/localhost:_ovha/7b6badfb-4986-4983-9f62-ae55da33d15e/images/e7a3c173-9f87-4f6c-a807-63118b9b7cb2/92317b81-1bb0-43e6-b029-8931aa5d0af0
MainThread::INFO::2017-08-21 09:21:40,787::config::431::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Found an OVF for HE VM, trying to convert
MainThread::INFO::2017-08-21 09:21:40,792::config::436::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Got vm.conf from OVF_STORE
MainThread::ERROR::2017-08-21 09:21:40,792::states::606::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine VM has bad health status, timeout in 300 seconds
MainThread::INFO::2017-08-21 09:21:40,792::states::430::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine vm running on localhost
MainThread::INFO::2017-08-21 09:21:40,796::state_decorators::88::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Timeout cleared while transitioning <class 'ovirt_hosted_engine_ha.agent.states.EngineUpBadHealth'> -> <class 'ovirt_hosted_engine_ha.agent.states.EngineUp'>
MainThread::INFO::2017-08-21 09:21:40,800::brokerlink::111::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1503296500.8 type=state_transition detail=EngineUpBadHealth-EngineUp hostname='kvm03'
MainThread::INFO::2017-08-21 09:21:40,853::brokerlink::121::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineUpBadHealth-EngineUp) sent? sent
MainThread::INFO::2017-08-21 09:21:40,853::hosted_engine::604::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Initializing VDSM
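
The agent log lines above are `::`-separated, so the ERROR records and the state transitions reported through the broker can be pulled out with a short script. This is only a sketch based on the line layout visible in this excerpt; the field names in `Record` and the helper names `parse_line` and `summarize` are my own labels, not anything defined by ovirt-hosted-engine-ha.

```python
import re
from collections import namedtuple

# Field names are illustrative labels for the seven "::"-separated parts
# seen in the excerpt above.
Record = namedtuple("Record", "thread level timestamp module lineno logger message")

# Matches the broker notification text, e.g.
# "type=state_transition detail=EngineUpBadHealth-EngineUp"
TRANSITION_RE = re.compile(r"type=state_transition detail=(\w+)-(\w+)")


def parse_line(line):
    """Split one agent log line into a Record, or return None if it does not fit."""
    parts = line.rstrip("\n").split("::", 6)
    if len(parts) != 7:
        return None
    return Record(*parts)


def summarize(lines):
    """Collect (timestamp, message) for ERROR lines and
    (timestamp, from_state, to_state) for state-transition notifications."""
    errors, transitions = [], []
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue
        if record.level == "ERROR":
            errors.append((record.timestamp, record.message))
        match = TRANSITION_RE.search(record.message)
        if match:
            transitions.append((record.timestamp, match.group(1), match.group(2)))
    return errors, transitions


# Three lines copied from the agent log above.
SAMPLE_LOG = [
    "MainThread::ERROR::2017-08-21 09:21:40,792::states::606::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine VM has bad health status, timeout in 300 seconds",
    "MainThread::INFO::2017-08-21 09:21:40,800::brokerlink::111::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1503296500.8 type=state_transition detail=EngineUpBadHealth-EngineUp hostname='kvm03'",
    "MainThread::INFO::2017-08-21 09:21:40,853::brokerlink::121::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineUpBadHealth-EngineUp) sent? sent",
]

if __name__ == "__main__":
    found_errors, found_transitions = summarize(SAMPLE_LOG)
    for timestamp, message in found_errors:
        print("ERROR     ", timestamp, message)
    for timestamp, old_state, new_state in found_transitions:
        print("TRANSITION", timestamp, old_state, "->", new_state)
```

Run over the full agent log, this shows how quickly each EngineUpBadHealth state clears back to EngineUp; in the excerpt above the error and the transition to EngineUp are logged within the same second.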


There are no errors on the dashboard in the web UI.
How can I find the cause of this error?

--
Thanks,
Vadim

