Hi Alexei,
> 1.2 All bricks healed (gluster volume heal data info summary) and
> no split-brain
gluster volume heal data info
Brick node-msk-gluster203:/opt/gluster/data
Status: Connected
Number of entries: 0
Brick node-msk-gluster205:/opt/gluster/data
<gfid:18c78043-0943-48f8-a4fe-9b23e2ba3404>
<gfid:b6f7d8e7-1746-471b-a49d-8d824db9fd72>
<gfid:6db6a49e-2be2-4c4e-93cb-d76c32f8e422>
<gfid:e39cb2a8-5698-4fd2-b49c-102e5ea0a008>
<gfid:5fad58f8-4370-46ce-b976-ac22d2f680ee>
<gfid:7d0b4104-6ad6-433f-9142-7843fd260c70>
<gfid:706cd1d9-f4c9-4c89-aa4c-42ca91ab827e>
Status: Connected
Number of entries: 7
Brick node-msk-gluster201:/opt/gluster/data
<gfid:18c78043-0943-48f8-a4fe-9b23e2ba3404>
<gfid:b6f7d8e7-1746-471b-a49d-8d824db9fd72>
<gfid:6db6a49e-2be2-4c4e-93cb-d76c32f8e422>
<gfid:e39cb2a8-5698-4fd2-b49c-102e5ea0a008>
<gfid:5fad58f8-4370-46ce-b976-ac22d2f680ee>
<gfid:7d0b4104-6ad6-433f-9142-7843fd260c70>
<gfid:706cd1d9-f4c9-4c89-aa4c-42ca91ab827e>
Status: Connected
Number of entries: 7
Data needs healing.
Run: gluster volume heal data full
If it still doesn't heal (check in 5 min), go to
/rhev/data-center/mnt/glusterSD/msk-gluster-facility.xxxx:_data
and run 'find . -exec stat {} \;' without the quotes.
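For example, roughly (a sketch assuming the volume is named 'data' and the oVirt mount point above; adjust paths to your setup):

gluster volume heal data full
# wait ~5 minutes, then re-check the pending entries
gluster volume heal data info
# if entries remain, stat every file through the FUSE mount to trigger healing
cd /rhev/data-center/mnt/glusterSD/msk-gluster-facility.xxxx:_data
find . -exec stat {} \; > /dev/null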
As I understand it, the oVirt Hosted Engine is running and can be started on all nodes
except one.
>
> 2. Go to the problematic host and check the mount point is there
There is no mount point on the problematic node:
/rhev/data-center/mnt/glusterSD/msk-gluster-facility.xxxx:_data
If I create the mount point manually, it is deleted after the node is activated.
Other nodes can mount this volume without problems; only this node has connection
problems after the update.
Here is part of the log from the time the node was activated:
vdsm log
2019-03-18 16:46:00,548+0300 INFO (jsonrpc/5) [vds] Setting Hosted Engine HA local
maintenance to False (API:1630)
2019-03-18 16:46:00,549+0300 INFO (jsonrpc/5) [jsonrpc.JsonRpcServer] RPC call
Host.setHaMaintenanceMode succeeded in 0.00 seconds (__init__:573)
2019-03-18 16:46:00,581+0300 INFO (jsonrpc/7) [vdsm.api] START
connectStorageServer(domType=7, spUUID=u'5a5cca91-01f8-01af-0297-00000000025f',
conList=[{u'id': u'5799806e-7969-45da-b17d-b47a63e6a8e4',
u'connection': u'msk-gluster-facility.xxxx:/data', u'iqn':
u'', u'user': u'', u'tpgt': u'1',
u'vfs_type': u'glusterfs', u'password': '********',
u'port': u''}], options=None) from=::ffff:10.77.253.210,56630,
flow_id=81524ed, task_id=5f353993-95de-480d-afea-d32dc94fd146 (api:46)
2019-03-18 16:46:00,621+0300 INFO (jsonrpc/7) [storage.StorageServer.MountConnection]
Creating directory
u'/rhev/data-center/mnt/glusterSD/msk-gluster-facility.xxxx:_data'
(storageServer:167)
2019-03-18 16:46:00,622+0300 INFO (jsonrpc/7) [storage.fileUtils] Creating directory:
/rhev/data-center/mnt/glusterSD/msk-gluster-facility.xxxx:_data mode: None
(fileUtils:197)
2019-03-18 16:46:00,622+0300 WARN (jsonrpc/7) [storage.StorageServer.MountConnection]
gluster server u'msk-gluster-facility.xxxx' is not in bricks
['node-msk-gluster203', 'node-msk-gluster205',
'node-msk-gluster201'], possibly mounting duplicate servers (storageServer:317)
This seems very strange. As you have hidden the hostname, I'm not sure which one this
is.
Check that DNS resolution works from all hosts and that this host's hostname is
resolvable.
Also check if it is in the peer list.
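Something like this (a quick sketch; run the gluster commands on any node that is part of the trusted pool):

# on every oVirt host, the storage FQDN must resolve
getent hosts msk-gluster-facility.xxxx
# on a gluster node, every peer should be connected and listed under the expected hostname
gluster peer status
gluster pool list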
Try to manually mount the gluster volume:
mount -t glusterfs msk-gluster-facility.xxxx:/data /mnt
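If the manual mount works, check what the client sees and unmount again; if it fails, the FUSE client log (named after the mount path, for /mnt usually /var/log/glusterfs/mnt.log) should show why:

df -h /mnt        # should list msk-gluster-facility.xxxx:/data
umount /mnt
# on failure, inspect the client log:
tail -n 50 /var/log/glusterfs/mnt.log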
Is this a second FQDN/IP of this server?
If so, gluster will accept it once you run 'gluster peer probe IP2'.
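For example (IP2 stands in for the additional address; purely illustrative):

gluster peer probe IP2
# the extra name/IP should now show up for that peer (e.g. under "Other names")
gluster peer status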
> 2.1. Check permissions (should be vdsm:kvm) and fix with chown -R
> if needed
> 2.2. Check the OVF_STORE from the logs that it exists
How can I do this?
Go to /rhev/data-center/mnt/glusterSD/host_engine and use find inside the domain UUID
directory for files that are not owned by vdsm:kvm.
I usually run 'chown -R vdsm:kvm 823xx-xxxx-yyyy-zzz' and it fixes any ownership
misconfiguration.
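A rough sketch of what I mean (the UUID below is just a placeholder for your storage domain directory):

cd /rhev/data-center/mnt/glusterSD/host_engine
# list anything inside the domain that is not owned by vdsm:kvm
find 823xx-xxxx-yyyy-zzz \( ! -user vdsm -o ! -group kvm \) -ls
# fix ownership recursively
chown -R vdsm:kvm 823xx-xxxx-yyyy-zzz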
Best Regards,
Strahil Nikolov