Hi Denis,

I receive permission denied as below:

gluster volume heal engine split-brain latest-mtime /e1c80750-b880-495e-9609-b8bc7760d101/ha_agent
Healing /e1c80750-b880-495e-9609-b8bc7760d101/ha_agent failed: Operation not permitted.
Volume heal failed.

When I shut down host3, no split brain is reported by the remaining two hosts. When I power up host3, I receive the mentioned split brain and host3 logs the following at ovirt-hosted-engine-ha/agent.log:

MainThread::INFO::2017-06-23 16:18:06,067::hosted_engine::594::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed set the storage domain: 'Failed to set storage domain VdsmBackend, options {'hosted-engine.lockspace': '7B22696D6167655F75756964223A202238323132626637382D663933332D346465652D616333372D346265633734353035366235222C202270617468223A206E756C6C2C2022766F6C756D655F75756964223A202236323739303162652D666261332D346263342D393037632D39393135613833363263353737227D', 'sp_uuid': '00000000-0000-0000-0000-000000000000', 'dom_type': 'glusterfs', 'hosted-engine.metadata': '7B22696D6167655F75756964223A202263353930633034372D613462322D346539312D613832362D643438623961643537323330222C202270617468223A206E756C6C2C2022766F6C756D655F75756964223A202230353166653865612D333339632D346134302D383438382D38633531313866643837323838227D', 'sd_uuid': 'e1c80750-b880-495e-9609-b8bc7760d101'}: Request failed: <type 'exceptions.OSError'>'. Waiting '5's before the next attempt

and the following at /var/log/messages:

Jun 23 16:19:43 v2 journal: vdsm root ERROR failed to retrieve Hosted Engine HA info#012Traceback (most recent call last):#012 File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 231, in _getHaInfo#012 stats = instance.get_all_stats()#012 File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 105, in get_all_stats#012 stats = broker.get_stats_from_storage(service)#012 File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 233, in get_stats_from_storage#012 result = self._checked_communicate(request)#012 File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 261, in _checked_communicate#012 .format(message or response))#012RequestError: Request failed: failed to read metadata: [Errno 5] Input/output error: '/rhev/data-center/mnt/glusterSD/10.100.100.1:_engine/e1c80750-b880-495e-9609-b8bc7760d101/ha_agent/hosted-engine.metadata'

Thanx

On Fri, Jun 23, 2017 at 6:05 PM, Denis Chaplygin <dchaplyg@redhat.com> wrote:

Hello Abi,

On Fri, Jun 23, 2017 at 4:47 PM, Abi Askushi <rightkicktech@gmail.com> wrote:

Hi All,

I have a 3 node ovirt 4.1 setup. I lost one node due to raid controller issues. Upon restoration I have the following split brain, although the hosts have mounted the storage domains:

gluster volume heal engine info split-brain
Brick gluster0:/gluster/engine/brick
/e1c80750-b880-495e-9609-b8bc7760d101/ha_agent
Status: Connected
Number of entries in split-brain: 1
Brick gluster1:/gluster/engine/brick
/e1c80750-b880-495e-9609-b8bc7760d101/ha_agent
Status: Connected
Number of entries in split-brain: 1
Brick gluster2:/gluster/engine/brick
/e1c80750-b880-495e-9609-b8bc7760d101/ha_agent
Status: Connected
Number of entries in split-brain: 1

It is definitely on the gluster side. You could try to use

gluster volume heal engine split-brain latest-mtime /e1c80750-b880-495e-9609-b8bc7760d101/ha_agent

I also added gluster developers to that thread, so they may provide you with better advice.
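
To re-check the state after each heal attempt, the per-brick counters in the heal info output quoted above can be summed with a small helper. This is only a sketch: the function name count_split_brain is made up for illustration, and it assumes the output keeps the "Number of entries in split-brain: N" line format shown in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of gluster): read heal-info output on stdin
# and print the sum of all "Number of entries in split-brain" counters.
count_split_brain() {
    awk -F': *' '
        /^Number of entries in split-brain:/ { total += $2 }
        END { print total + 0 }   # prints 0 when no counter line was seen
    '
}
```

It would be fed from the command already used in this thread, for example: gluster volume heal engine info split-brain | count_split_brain. A result of 0 would mean no brick reports an entry in split-brain any more.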