Hi All,

I have a three-server oVirt 4.1 self-hosted engine setup with Gluster replica 3.

One of the hosts was suddenly reported as unresponsive, and at the same time the following was logged in /var/log/messages:

ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=glusterfs sd_uuid=ad7b9e2a-7ae3-46ad-9429-5f5ef452eac8'#012Traceback (most recent call last):#012  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle#012    data)#012  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 299, in _dispatch#012    .set_storage_domain(client, sd_type, **options)#012  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 66, in set_storage_domain#012    self._backends[client].connect()#012  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 462, in connect#012    self._dom_type)#012  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 107, in get_domain_path#012    " in {1}".format(sd_uuid, parent))#012BackendFailureException: path to storage domain ad7b9e2a-7ae3-46ad-9429-5f5ef452eac8 not found in /rhev/data-center/mnt/glusterSD
Jan 15 11:04:56 v1 journal: vdsm root ERROR failed to retrieve Hosted Engine HA info#012Traceback (most recent call last):#012  File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 231, in _getHaInfo#012    stats = instance.get_all_stats()#012  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 103, in get_all_stats#012    self._configure_broker_conn(broker)#012  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 180, in _configure_broker_conn#012    dom_type=dom_type)#012  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 177, in set_storage_domain#012    .format(sd_type, options, e))#012RequestError: Failed to set storage domain FilesystemBackend, options {'dom_type': 'glusterfs', 'sd_uuid': 'ad7b9e2a-7ae3-46ad-9429-5f5ef452eac8'}: Request failed: <class 'ovirt_hosted_engine_ha.lib.storage_backends.BackendFailureException'>
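
The `#012` sequences in the two entries above are syslog's octal escape for a newline character, which is why each traceback appears on a single line. A minimal sketch for expanding them into readable multi-line tracebacks follows; `expand_syslog_escapes` is a hypothetical helper name, and the sketch assumes that every `#` followed by three octal digits in the line is such an escape, which may not hold for arbitrary log content.

```python
import re

def expand_syslog_escapes(line):
    """Replace syslog-style #ooo octal escapes (e.g. #012 = newline) with the real character.

    Assumption: any '#' followed by exactly three octal digits is an escape.
    """
    return re.sub(r"#([0-7]{3})", lambda m: chr(int(m.group(1), 8)), line)

if __name__ == "__main__":
    # Shortened sample taken from the log entry above.
    sample = ('Traceback (most recent call last):#012  File "listener.py", '
              'line 166, in handle#012    data)')
    print(expand_syslog_escapes(sample))
```

Piping the relevant lines of /var/log/messages through such a function makes it easier to see that both tracebacks end in the same `BackendFailureException`.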



In the VDSM logs I see the following logged continuously:
[jsonrpc.JsonRpcServer] RPC call VM.getStats failed (error 1) in 0.00 seconds (__init__:539)

No errors were seen on the Gluster side in the same time frame.

Any hints on what is causing this issue? It looks like a storage access problem, but Gluster was up and the volumes were OK. The VMs running on this setup are Windows 10 and Windows 2016, 64-bit.
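
Since the broker error says the storage domain path was not found in /rhev/data-center/mnt/glusterSD, one thing that could be checked on the affected host is whether a directory named after the storage domain UUID is actually visible under that mount root. The following is only a sketch, not the broker's own lookup code: `find_domain_path` is a hypothetical helper, and it assumes the domain directory sits exactly one level below the mount root (that is, `<mount root>/<mounted volume>/<sd_uuid>`).

```python
import glob
import os

def find_domain_path(sd_uuid, parent="/rhev/data-center/mnt/glusterSD"):
    """Return the first directory named sd_uuid one level below parent, or None.

    Assumption: layout is <parent>/<mounted volume>/<sd_uuid>.
    """
    for path in glob.glob(os.path.join(parent, "*", sd_uuid)):
        if os.path.isdir(path):
            return path
    return None

if __name__ == "__main__":
    # UUID taken from the log entries above.
    found = find_domain_path("ad7b9e2a-7ae3-46ad-9429-5f5ef452eac8")
    print(found if found else "storage domain directory not found")
```

If this prints "storage domain directory not found" on the unresponsive host but a path on the other two, the Gluster volume is probably not mounted (or not readable) on that host even though the volume itself is healthy.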


Thanks,
Alex