I have a two-host, self-hosted engine cluster/datacenter. I call the hosts Node1 and Node2.
They are running CentOS 7, and I have been applying updates regularly by putting each host
into maintenance mode and installing the updates via the web GUI.
Node1 is restarting at odd times, causing the VMs on that node to start up on Node2 (which is
by design). However, the restarts are becoming more frequent and I am having a hard time
figuring out what is causing them. My current plan is simply to rebuild Node1 by
reinstalling CentOS 7 and using the cluster tools to import it. Below is a snippet from
/var/log/ovirt-hosted-engine-ha/agent.log on Node1.
MainThread::INFO::2018-09-10 10:04:25,907::hosted_engine::244::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: node1.***.com
MainThread::INFO::2018-09-10 10:04:34,200::hosted_engine::522::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
MainThread::INFO::2018-09-10 10:04:34,202::brokerlink::77::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor ping, options {'addr': '192.168.4.1'}
MainThread::ERROR::2018-09-10 10:04:34,203::hosted_engine::538::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors
MainThread::ERROR::2018-09-10 10:04:34,203::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
    return action(he)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
    return he.start_monitoring()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 413, in start_monitoring
    self._initialize_broker()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 535, in _initialize_broker
    m.get('options', {}))
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 83, in start_monitor
    .format(type, options, e))
RequestError: Failed to start monitor ping, options {'addr': '192.168.4.1'}: [Errno 2] No such file or directory
MainThread::ERROR::2018-09-10 10:04:34,204::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::INFO::2018-09-10 10:04:34,204::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
MainThread::INFO::2018-09-10 10:04:44,545::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 2.2.16 started
MainThread::INFO::2018-09-10 10:04:44,583::hosted_engine::244::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: node1.***.com
MainThread::INFO::2018-09-10 10:04:44,744::hosted_engine::522::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
MainThread::INFO::2018-09-10 10:04:44,745::brokerlink::77::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor ping, options {'addr': '192.168.4.1'}
MainThread::ERROR::2018-09-10 10:04:44,746::hosted_engine::538::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors
MainThread::ERROR::2018-09-10 10:04:44,746::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
    return action(he)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
    return he.start_monitoring()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 413, in start_monitoring
    self._initialize_broker()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 535, in _initialize_broker
    m.get('options', {}))
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 83, in start_monitor
    .format(type, options, e))
RequestError: Failed to start monitor ping, options {'addr': '192.168.4.1'}: [Errno 2] No such file or directory
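
The same failure repeats every ten seconds or so in the snippet. To see how often each error recurs across the whole file, something like the following could tally the ERROR entries. This is only a rough sketch: the entry layout is inferred from the snippet above, it assumes each entry sits on a single line in the actual file, and the names ENTRY and summarize_errors are my own, not part of oVirt.

```python
import re
import sys
from collections import Counter

# Assumed entry layout, inferred from the log snippet (not from oVirt docs):
# Thread::LEVEL::YYYY-MM-DD HH:MM:SS,mmm::module::lineno::logger::(function) message
ENTRY = re.compile(
    r"^(?P<thread>\w+)::(?P<level>[A-Z]+)::"
    r"(?P<ts>\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2},\d{3})::"
    r"(?P<module>\w+)::(?P<line>\d+)::(?P<logger>[\w.]+)::"
    r"\((?P<func>\w+)\)\s*(?P<msg>.*)$"
)


def summarize_errors(lines):
    """Count ERROR entries, keyed by (function, message).

    Lines that do not match the assumed layout (for example the
    continuation lines of a traceback) are skipped.
    """
    counts = Counter()
    for line in lines:
        match = ENTRY.match(line.rstrip("\n"))
        if match and match.group("level") == "ERROR":
            counts[(match.group("func"), match.group("msg"))] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize_errors.py /var/log/ovirt-hosted-engine-ha/agent.log
    with open(sys.argv[1]) as handle:
        for (func, msg), n in summarize_errors(handle).most_common():
            print("{:6d}  {}  {}".format(n, func, msg))
```

Running it against agent.log on both nodes should show whether the "Failed to start necessary monitors" error is specific to Node1 or appears on Node2 as well.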