Hi Alexei,
>> 1.2 All bricks healed (gluster volume heal data info summary) and no split-brain
>
>
>
> gluster volume heal data info
>
> Brick node-msk-gluster203:/opt/gluster/data
> Status: Connected
> Number of entries: 0
>
> Brick node-msk-gluster205:/opt/gluster/data
> <gfid:18c78043-0943-48f8-a4fe-9b23e2ba3404>
> <gfid:b6f7d8e7-1746-471b-a49d-8d824db9fd72>
> <gfid:6db6a49e-2be2-4c4e-93cb-d76c32f8e422>
> <gfid:e39cb2a8-5698-4fd2-b49c-102e5ea0a008>
> <gfid:5fad58f8-4370-46ce-b976-ac22d2f680ee>
> <gfid:7d0b4104-6ad6-433f-9142-7843fd260c70>
> <gfid:706cd1d9-f4c9-4c89-aa4c-42ca91ab827e>
> Status: Connected
> Number of entries: 7
>
> Brick node-msk-gluster201:/opt/gluster/data
> <gfid:18c78043-0943-48f8-a4fe-9b23e2ba3404>
> <gfid:b6f7d8e7-1746-471b-a49d-8d824db9fd72>
> <gfid:6db6a49e-2be2-4c4e-93cb-d76c32f8e422>
> <gfid:e39cb2a8-5698-4fd2-b49c-102e5ea0a008>
> <gfid:5fad58f8-4370-46ce-b976-ac22d2f680ee>
> <gfid:7d0b4104-6ad6-433f-9142-7843fd260c70>
> <gfid:706cd1d9-f4c9-4c89-aa4c-42ca91ab827e>
> Status: Connected
> Number of entries: 7
>
The data needs healing.
Run: gluster volume heal data full
If it still doesn't heal (check again in 5 minutes), go to /rhev/data-center/mnt/glusterSD/msk-gluster-facility.xxxx:_data
and run 'find . -exec stat {} \;' (without the quotes), so every file gets read through the mount and healing is triggered.
As I understand it, the oVirt Hosted Engine is running and can be started on all nodes except one.
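For clarity, this is the sequence I would try (the volume name 'data' and the mount path are taken from your output above; adjust them if yours differ):

gluster volume heal data full
gluster volume heal data info summary         # check again after ~5 minutes
cd /rhev/data-center/mnt/glusterSD/msk-gluster-facility.xxxx:_data
find . -exec stat {} \; > /dev/null           # reading every file through the mount triggers self-heal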
>>
>> 2. Go to the problematic host and check the mount point is there
>
>
>
> No mount point on problematic node /rhev/data-center/mnt/glusterSD/msk-gluster-facility.xxxx:_data
> If I create a mount point manually, it is deleted after the node is activated.
>
> Other nodes can mount this volume without problems. Only this node have connection problems after update.
>
> Here is a part of the log at the time of activation of the node:
>
> vdsm log
>
> 2019-03-18 16:46:00,548+0300 INFO (jsonrpc/5) [vds] Setting Hosted Engine HA local maintenance to False (API:1630)
> 2019-03-18 16:46:00,549+0300 INFO (jsonrpc/5) [jsonrpc.JsonRpcServer] RPC call Host.setHaMaintenanceMode succeeded in 0.00 seconds (__init__:573)
> 2019-03-18 16:46:00,581+0300 INFO (jsonrpc/7) [vdsm.api] START connectStorageServer(domType=7, spUUID=u'5a5cca91-01f8-01af-0297-00000000025f', conList=[{u'id': u'5799806e-7969-45da-b17d-b47a63e6a8e4', u'connection': u'msk-gluster-facility.xxxx:/data', u'iqn': u'', u'user': u'', u'tpgt': u'1', u'vfs_type': u'glusterfs', u'password': '********', u'port': u''}], options=None) from=::ffff:10.77.253.210,56630, flow_id=81524ed, task_id=5f353993-95de-480d-afea-d32dc94fd146 (api:46)
> 2019-03-18 16:46:00,621+0300 INFO (jsonrpc/7) [storage.StorageServer.MountConnection] Creating directory u'/rhev/data-center/mnt/glusterSD/msk-gluster-facility.xxxx:_data' (storageServer:167)
> 2019-03-18 16:46:00,622+0300 INFO (jsonrpc/7) [storage.fileUtils] Creating directory: /rhev/data-center/mnt/glusterSD/msk-gluster-facility.xxxx:_data mode: None (fileUtils:197)
> 2019-03-18 16:46:00,622+0300 WARN (jsonrpc/7) [storage.StorageServer.MountConnection] gluster server u'msk-gluster-facility.xxxx' is not in bricks ['node-msk-gluster203', 'node-msk-gluster205', 'node-msk-gluster201'], possibly mounting duplicate servers (storageServer:317)
This seems very strange. As you have hidden the hostname, I'm not sure which one this is.
Check that DNS resolution works from all hosts and that this host's hostname is resolvable.
Also check whether it is in the peer list.
Try to manually mount the gluster volume:
mount -t glusterfs msk-gluster-facility.xxxx:/data /mnt
Is this a second FQDN/IP of this server?
If so, gluster can accept the additional address via 'gluster peer probe IP2'.
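For example (the FQDN below is the masked one from your log; substitute the real name), you could verify resolution and peer membership before the test mount:

getent hosts msk-gluster-facility.xxxx    # confirm the name resolves on the problematic host
gluster peer status                       # all peers should show 'Peer in Cluster (Connected)'
mount -t glusterfs msk-gluster-facility.xxxx:/data /mnt
umount /mnt                               # unmount again after the test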
>> 2.1. Check permissions (should be vdsm:kvm) and fix with chown -R if needed
>> 2.2. Check the OVF_STORE from the logs that it exists
>
>
> How can i do this?
Go to /rhev/data-center/mnt/glusterSD/host_engine and use find inside the domain UUID directory to look for files that are not owned by vdsm:kvm.
I usually run 'chown -R vdsm:kvm 823xx-xxxx-yyyy-zzz' and it fixes any ownership misconfiguration.
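As a rough sketch (the <domain_UUID> below is only a placeholder for your storage domain's UUID directory):

cd /rhev/data-center/mnt/glusterSD/host_engine
find <domain_UUID> ! -user vdsm -o ! -group kvm    # list anything with wrong ownership
chown -R vdsm:kvm <domain_UUID>                    # reset ownership if something turns up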
Best Regards,
Strahil Nikolov