Ping works in both directions, engine to host and host to engine.
Digging further into the logs, I found the errors below in various files:
###################
VDSM <host> command Get Host Capabilities failed: Not enough resources:
{'reason': 'Too many tasks', 'resource': 'jsonrpc',
'current_tasks': 80}
############
May 9 03:54:04 <host> vdsm[26934]: WARN Worker blocked: <Worker name=jsonrpc/4
running <Task <JsonRpcTask {'params': {u'volumeName':
u'vm_gv0'}, 'jsonrpc': '2.0', 'method':
u'GlusterVolume.healInfo', 'id':
u'f4e56ab9-6916-4938-821a-1b9aab2ef162'} at 0x7fb886fd8dd0> timeout=60,
duration=7980 at 0x7fb886edc910> task#=14247 at 0x7fb8a4035450>, traceback:#012File:
"/usr/lib64/python2.7/threading.py", line 785, in __bootstrap#012
self.__bootstrap_inner()#012File: "/usr/lib64/python2.7/threading.py", line 812,
in __bootstrap_inner#012 self.run()#012File:
"/usr/lib64/python2.7/threading.py", line 765, in run#012
self.__target(*self.__args, **self.__kwargs)#012File:
"/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", line 194, in
run#012 ret = func(*args, **kwargs)#012File:
"/usr/lib/python2.7/site-packages/vdsm/executor.py", line 301, in _run#012
self._execute_task()#012File:
"/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in
_execute_task#012 task()#012File:
"/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in __call__#012
self._callable()#012File:
"/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 523, in
__call__#012 self._handler(self._ctx, self._req)#012File:
"/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 566, in
_serveRequest#012 response = self._handle_request(req, ctx)#012File:
"/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 606, in
_handle_request#012 res = method(**params)#012File:
"/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 197, in
_dynamicMethod#012 result = fn(*methodArgs)#012File:
"/usr/lib/python2.7/site-packages/vdsm/gluster/apiwrapper.py", line 129, in
healInfo#012 return self._gluster.volumeHealInfo(volumeName)#012File:
"/usr/lib/python2.7/site-packages/vdsm/gluster/api.py", line 90, in wrapper#012
rv = func(*args, **kwargs)#012File:
"/usr/lib/python2.7/site-packages/vdsm/gluster/api.py", line 776, in
volumeHealInfo#012 return {'healInfo':
self.svdsmProxy.glusterVolumeHealInfo(volumeName)}#012File:
"/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 55, in
__call__#012 return callMethod()#012File:
"/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 53, in
<lambda>#012 **kwargs)#012File: "<string>", line 2, in
glusterVolumeHealInfo#012File:
"/usr/lib64/python2.7/multiprocessing/managers.py", line 759, in _callmethod#012
kind, result = conn.recv()
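
The `#012` sequences in the message above look like the usual syslog escaping of an embedded newline (octal 012 is LF), which would explain why the whole traceback sits on one line. Below is a minimal sketch, assuming that escaping, for making such a line readable and pulling out the blocked method, timeout and duration. The helper names `decode_syslog_escapes` and `blocked_worker_summary` are hypothetical and not part of VDSM.

```python
import re

def decode_syslog_escapes(line):
    # Turn "#NNN" octal escapes (e.g. "#012" for newline) back into characters.
    # Caveat: a literal "#" followed by three octal digits is rewritten too.
    return re.sub(r"#([0-7]{3})", lambda m: chr(int(m.group(1), 8)), line)

def blocked_worker_summary(line):
    # Return (method, timeout, duration) from a "Worker blocked" line, or None.
    method = re.search(r"'method':\s*u?'([^']+)'", line)
    timing = re.search(r"timeout=(\d+),\s*duration=(\d+)", line)
    if not (method and timing):
        return None
    return method.group(1), int(timing.group(1)), int(timing.group(2))

# Sample shortened from the message above, with the mail wrapping undone.
sample = (
    "WARN Worker blocked: <Worker name=jsonrpc/4 running <Task <JsonRpcTask "
    "{'jsonrpc': '2.0', 'method': u'GlusterVolume.healInfo'} at 0x7fb886fd8dd0> "
    "timeout=60, duration=7980 at 0x7fb886edc910> task#=14247, "
    'traceback:#012File: "/usr/lib64/python2.7/threading.py", line 785, '
    "in __bootstrap#012 self.__bootstrap_inner()"
)

print(blocked_worker_summary(sample))
print(decode_syslog_escapes(sample))
```

For the line quoted above this reports `GlusterVolume.healInfo` with timeout=60 and duration=7980, so the worker has been stuck in the Gluster heal-info call far past its timeout.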
#########
cat /var/log/messages | grep 'database connection failed'
May 9 07:25:59 <host> ovs-vsctl:
ovs|00001|db_ctl_base|ERR|unix:/var/run/openvswitch/db.sock: database connection failed
(No such file or directory)
#######
/var/log/ovirt-hosted-engine-ha/agent.log
MainThread::ERROR::2020-05-09
11:32:33,089::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to
restart agent
MainThread::INFO::2020-05-09
11:32:33,089::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting
down
MainThread::INFO::2020-05-09
11:32:43,926::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run)
ovirt-hosted-engine-ha agent 2.2.16 started
MainThread::INFO::2020-05-09
11:32:43,984::hosted_engine::244::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname)
Found certificate common name: <hostname>
MainThread::ERROR::2020-05-09
11:33:49,369::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback
(most recent call last):
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
line 131, in _run_agent
return action(he)
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
line 55, in action_proper
return he.start_monitoring()
File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
line 412, in start_monitoring
self._initialize_vdsm()
File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
line 569, in _initialize_vdsm
logger=self._log
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/util.py",
line 468, in connect_vdsm_json_rpc
__vdsm_json_rpc_connect(logger, timeout)
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/util.py",
line 411, in __vdsm_json_rpc_connect
timeout=VDSM_MAX_RETRY * VDSM_DELAY
RuntimeError: Couldn't connect to VDSM within 60 seconds
MainThread::ERROR::2020-05-09
11:33:49,371::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to
restart agent
MainThread::INFO::2020-05-09
11:33:49,371::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting
down
MainThread::INFO::2020-05-09
11:34:00,216::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run)
ovirt-hosted-engine-ha agent 2.2.16 started
MainThread::INFO::2020-05-09
11:34:00,326::hosted_engine::244::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname)
Found certificate common name: <hostname>
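
To see how often the agent is cycling, the restart timestamps can be pulled out of agent.log. This is only a sketch, assuming each entry sits on one line in the `thread::LEVEL::timestamp::module::lineno::logger::(function) message` layout seen above (the mail client wrapped the lines here). `AGENT_LINE` and `restart_times` are hypothetical names.

```python
import re

# One agent.log entry; traceback continuation lines will not match and are skipped.
AGENT_LINE = re.compile(
    r"^(?P<thread>[^:]+)::(?P<level>[A-Z]+)::"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::"
    r"(?P<module>\w+)::(?P<lineno>\d+)::(?P<logger>[\w.]+)::"
    r"\((?P<func>\w+)\) (?P<msg>.*)$"
)

def restart_times(lines):
    # Collect timestamps of ERROR entries announcing an agent restart.
    times = []
    for line in lines:
        m = AGENT_LINE.match(line.rstrip("\n"))
        if m and m.group("level") == "ERROR" and "Trying to restart agent" in m.group("msg"):
            times.append(m.group("ts"))
    return times

# Entries taken from the excerpt above, with the wrapping undone.
sample = [
    "MainThread::ERROR::2020-05-09 11:32:33,089::agent::145::"
    "ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent",
    "MainThread::INFO::2020-05-09 11:32:33,089::agent::89::"
    "ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down",
    "RuntimeError: Couldn't connect to VDSM within 60 seconds",
    "MainThread::ERROR::2020-05-09 11:33:49,371::agent::145::"
    "ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent",
]

print(restart_times(sample))
```

On the real file this could be run as `restart_times(open("/var/log/ovirt-hosted-engine-ha/agent.log"))`. In the excerpt the two restarts are a little over a minute apart, which matches the 60-second VDSM connection timeout in the traceback.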