Hello everyone,
I'm running an oVirt cluster (4.3.10.4-1.el7) on several physical nodes with CentOS
7.9.2009; the Hosted Engine runs as a virtual machine on one of these nodes. For storage,
I'm running GlusterFS 6.7 on three separate physical storage nodes (also CentOS 7). Gluster
hosts three volumes, of type "Replicate" or "Distributed-Replicate".
I recently updated the system packages and upgraded GlusterFS from 6.7 to 6.10 on the first
storage node (storage1), and "gluster volume heal <volname> info" now shows a potential
split-brain situation for one of the three volumes:
Brick storage1:/data/glusterfs/nvme/brick1/brick
Status: Connected
Number of entries: 0
Brick storage2:/data/glusterfs/nvme/brick1/brick
/c32d664d-69ba-4c3f-8ea1-240133963815/dom_md/ids
/
/.shard/.remove_me
Status: Connected
Number of entries: 3
Brick storage3:/data/glusterfs/nvme/brick1/brick
/c32d664d-69ba-4c3f-8ea1-240133963815/dom_md/ids
/
/.shard/.remove_me
Status: Connected
Number of entries: 3
I compared checksums and the "dom_md/ids" file has a different md5sum on each of the three
nodes. Triggering a heal on the volume does nothing and the entries remain. The heal info
for the other two volumes shows no entries. The exact commands I used are shown below.
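For reference, these are roughly the commands I used (the volume name "nvme" is taken from
the storage domain path mentioned below):

  # on each storage node, checksum the file directly on the brick
  md5sum /data/glusterfs/nvme/brick1/brick/c32d664d-69ba-4c3f-8ea1-240133963815/dom_md/ids

  # attempted heal, run from one of the storage nodes
  gluster volume heal nvme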
The affected Gluster volume (type: replicate) is mounted as a Storage Domain in oVirt using
the path "storage1:/nvme" and is used to store the root partitions of all virtual machines;
the virtual machines were running at the time of the upgrade and reboot of storage1. The
volume has three bricks, one on each storage node.
For the upgrade I followed the steps described at
https://docs.gluster.org/en/latest/Upgrade-Guide/generic-upgrade-procedure/: I stopped and
killed all Gluster-related services, upgraded both the system and Gluster packages, and
rebooted storage1.
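Concretely, the steps on storage1 were approximately the following:

  systemctl stop glusterd
  killall glusterfs glusterfsd glusterd   # make sure no brick/self-heal processes survive
  yum update                              # pulls in the OS updates and GlusterFS 6.10
  reboot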
Is this a split-brain situation, and if so, how can I resolve it? I would be very grateful
for any help.
Please let me know if you require any additional information.
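For example, I can post the extended attributes (trusted.afr.*) of the affected file from
each brick; I would collect them with something like the following, run on each of the
three storage nodes against the respective brick path:

  getfattr -d -m . -e hex /data/glusterfs/nvme/brick1/brick/c32d664d-69ba-4c3f-8ea1-240133963815/dom_md/ids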
Best regards