Gluster heal succeeds but a directory doesn't heal

Hello list,

I have a problem that stems from some hangs in oVirt during upgrade procedures. I have a Gluster-based self-hosted engine deployment, with "glen" as the Gluster volume for the hosted engine.

This is the situation I'm facing:

[root@ovirt-node3 master]# gluster volume heal glen info
Brick ovirt-node2.ovirt:/brickhe/glen
/3577c21e-f757-4405-97d1-0f827c9b4e22/master/tasks
Status: Connected
Number of entries: 1

Brick ovirt-node3.ovirt:/brickhe/glen
/3577c21e-f757-4405-97d1-0f827c9b4e22/master/tasks
Status: Connected
Number of entries: 1

Brick ovirt-node4.ovirt:/dati/glen <- arbiter
/3577c21e-f757-4405-97d1-0f827c9b4e22/master/tasks
Status: Connected
Number of entries: 1

So, as the manual suggests, I issued a heal operation:

[root@ovirt-node3 master]# gluster volume heal glen
Launching heal operation to perform index self heal on volume glen has been successful
Use heal info commands to check status.

The heal operation produces no results: a subsequent heal info reports the same entries I started with.

This is the situation in the log files:

[root@ovirt-node2 ~]# less /var/log/glusterfs/glfsheal-glen.log
<- no errors

[root@ovirt-node3 ~]# less /var/log/glusterfs/glfsheal-glen.log
<- inside the log I have error entries:
[2022-06-20 07:33:05.891367 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2620:client4_0_lookup_cbk] 0-glen-client-2: remote operation failed. [{path=<gfid:44d74dba-19e8-47a3-89e8-f4a6cb37d5ec>}, {gfid=44d74dba-19e8-47a3-89e8-f4a6cb37d5ec}, {errno=2}, {error=No such file or directory}]

[root@ovirt-node4 ~]# less /var/log/glusterfs/glfsheal-glen.log
<- same kind of errors
[2022-06-20 07:27:10.486822 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2620:client4_0_lookup_cbk] 0-glen-client-1: remote operation failed. [{path=<gfid:b7b1fec5-8246-46eb-afde-ba06f52897d2>}, {gfid=b7b1fec5-8246-46eb-afde-ba06f52897d2}, {errno=2}, {error=No such file or directory}]

On the nodes the glen volume is correctly mounted, and:

[root@ovirt-node2 localhost:_glen]# ls -l 3577c21e-f757-4405-97d1-0f827c9b4e22/master/tasks/
[root@ovirt-node3 localhost:_glen]# ls -l 3577c21e-f757-4405-97d1-0f827c9b4e22/master/tasks/
[root@ovirt-node4 localhost:_glen]# ls -l 3577c21e-f757-4405-97d1-0f827c9b4e22/master/tasks/

all return no files. Issuing an ls on the brick source:

[root@ovirt-node2 glen]# ls -l 3577c21e-f757-4405-97d1-0f827c9b4e22/master/tasks/
total 0
[root@ovirt-node3 glen]# ls -l 3577c21e-f757-4405-97d1-0f827c9b4e22/master/tasks/
total 0
drwxr-xr-x. 2 vdsm kvm 156 Jun 9 17:20 ccb6fd19-1b67-42b9-a032-31e12d62ed0e
[root@ovirt-node4 glen]# ls -l 3577c21e-f757-4405-97d1-0f827c9b4e22/master/tasks/
total 0

So it turns out there is some difference between the bricks: only the brick on ovirt-node3 contains the ccb6fd19-1b67-42b9-a032-31e12d62ed0e directory.

Can you please help me address this issue?

Thank you
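The glfsheal log entries above report lookups by GFID failing with errno 2. On a brick, GlusterFS keeps a handle for each GFID under the .glusterfs directory, in subdirectories named after the first two and the next two hex characters of the GFID. A minimal sketch, assuming that layout, for checking on each brick host whether the handle is present; the function name gfid_to_backend_path is hypothetical, and the brick path and GFID are taken from the output above:

```shell
# Hypothetical helper: build the handle path of a GFID under a brick root.
# Assumed layout: <brick>/.glusterfs/<hex chars 1-2>/<hex chars 3-4>/<full gfid>
gfid_to_backend_path() {
    brick=${1%/}    # brick root with any trailing slash removed
    gfid=$2
    d1=$(printf '%s' "$gfid" | cut -c1-2)
    d2=$(printf '%s' "$gfid" | cut -c3-4)
    printf '%s/.glusterfs/%s/%s/%s\n' "$brick" "$d1" "$d2" "$gfid"
}

# Run on a brick host. For a directory the handle is normally a symlink,
# for a regular file a hard link, so test for both.
p=$(gfid_to_backend_path /brickhe/glen 44d74dba-19e8-47a3-89e8-f4a6cb37d5ec)
if [ -e "$p" ] || [ -L "$p" ]; then
    ls -ld "$p"
else
    echo "missing on this brick: $p"
fi
```

Running the same check on all three brick hosts (using /dati/glen on the arbiter) would show which bricks lack the handle that the failed lookup refers to.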

My environment is ovirt-host-4.5.0-3.el8.x86_64 and glusterfs-server-10.2-1.el8s.x86_64.

Can anyone point me to somewhere I can read some in-depth troubleshooting documentation for GlusterFS? I cannot find a "quick" manual.

I've done something but the problem remains:

[root@ovirt-node2 ~]# gluster volume heal glen info
Brick ovirt-node2.ovirt:/brickhe/glen
/3577c21e-f757-4405-97d1-0f827c9b4e22/master/tasks
Status: Connected
Number of entries: 1

Brick ovirt-node3.ovirt:/brickhe/glen
/3577c21e-f757-4405-97d1-0f827c9b4e22/images
Status: Connected
Number of entries: 1

Brick ovirt-node4.ovirt:/dati/glen
/3577c21e-f757-4405-97d1-0f827c9b4e22/master/tasks
/3577c21e-f757-4405-97d1-0f827c9b4e22/images
Status: Connected
Number of entries: 2

And I cannot get the heal to take effect:

[root@ovirt-node2 ~]# gluster volume heal glen full
Launching heal operation to perform full self heal on volume glen has been successful
Use heal info commands to check status.

[root@ovirt-node2 ~]# gluster volume heal glen split-brain source-brick ovirt-node3.ovirt:/brickhe/glen
'source-brick' option used on a directory (gfid:95e5075e-720b-4bc0-affe-81d1792e09a6). Performing conservative merge.
Healing gfid:95e5075e-720b-4bc0-affe-81d1792e09a6 failed:Is a directory.
Lookup failed on gfid:75441538-fc18-4da3-9da7-e1c59a84d950:No such file or directory.
Status: Connected
Number of healed entries: 0
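In the heal info output above, each brick lists a different set of pending entries. A minimal sketch for turning that output into one line per entry and brick, so it is easier to see which bricks flag which path. The function name heal_entries_by_brick is hypothetical, and the parsing assumes the output format shown in this thread ("Brick host:path" lines followed by entry lines):

```shell
# Hypothetical helper: read `gluster volume heal <VOL> info` output on stdin
# and print "<entry><TAB><brick>" for every pending entry, sorted by entry.
heal_entries_by_brick() {
    awk '
        /^Brick /          { brick = $2; next }      # remember the current brick
        /^\// || /^<gfid:/ { print $0 "\t" brick }   # entry lines are paths or <gfid:...>
    ' | LC_ALL=C sort
}

# Usage on a node of the cluster:
#   gluster volume heal glen info | heal_entries_by_brick
```

An entry reported by only some of the bricks shows which bricks hold pending changes for it, which is a starting point for deciding where to inspect the brick directories directly.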

Have you tried using the gluster ML?
https://lists.gluster.org/mailman/listinfo/gluster-users

- Gilboa

On Tue, Jun 28, 2022 at 11:20 AM Diego Ercolani <diego.ercolani@ssis.sm> wrote:
> I've done something but the problem remains: [...]

Cross posted here: https://lists.gluster.org/pipermail/gluster-users/2022-June/039957.html
participants (2)
- Diego Ercolani
- Gilboa Davara