
Hey folks,

in our production setup with 3 nodes (HCI) we took one host down (maintenance: stop gluster, power off via ssh/oVirt engine). Once it was back up, Gluster had 2k healing entries that went down to 2 in a matter of 10 minutes. Those two give me a headache:

[root@node03:~] # gluster vol heal ssd_storage info
Brick node01:/gluster_bricks/ssd_storage/ssd_storage
<gfid:a121e4fb-0984-4e41-94d7-8f0c4f87f4b6>
<gfid:6f8817dc-3d92-46bf-aa65-a5d23f97490e>
Status: Connected
Number of entries: 2

Brick node02:/gluster_bricks/ssd_storage/ssd_storage
Status: Connected
Number of entries: 0

Brick node03:/gluster_bricks/ssd_storage/ssd_storage
<gfid:a121e4fb-0984-4e41-94d7-8f0c4f87f4b6>
<gfid:6f8817dc-3d92-46bf-aa65-a5d23f97490e>
Status: Connected
Number of entries: 2

No paths, only gfids. We took down node2, so it does not have the file:

[root@node01:~] # md5sum /gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6
75c4941683b7eabc223fc9d5f022a77c  /gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6

[root@node02:~] # md5sum /gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6
md5sum: /gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6: No such file or directory

[root@node03:~] # md5sum /gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6
75c4941683b7eabc223fc9d5f022a77c  /gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6

The two existing copies are md5-identical. Their extended attributes are identical, too:

[root@node01:~] # getfattr -d -m . -e hex /gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6
getfattr: Removing leading '/' from absolute path names
# file: gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.ssd_storage-client-1=0x0000004f0000000100000000
trusted.gfid=0xa121e4fb09844e4194d78f0c4f87f4b6
trusted.gfid2path.d4cf876a215b173f=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f38366461303238392d663734662d343230302d393238342d3637386537626437363139352e31323030
trusted.glusterfs.mdata=0x010000000000000000000000005e349b1e000000001139aa2a000000005e349b1e000000001139aa2a000000005e34994900000000304a5eb2

getfattr: Removing leading '/' from absolute path names
# file: gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.ssd_storage-client-1=0x0000004f0000000100000000
trusted.gfid=0xa121e4fb09844e4194d78f0c4f87f4b6
trusted.gfid2path.d4cf876a215b173f=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f38366461303238392d663734662d343230302d393238342d3637386537626437363139352e31323030
trusted.glusterfs.mdata=0x010000000000000000000000005e349b1e000000001139aa2a000000005e349b1e000000001139aa2a000000005e34994900000000304a5eb2

Now, I don't dare simply proceed without some advice. Does anyone have a clue on how to resolve this issue? File #2 is identical to this one, from a problem point of view.

Have a great weekend!
-Chris.

--
with kind regards, mit freundlichen Gruessen,

Christian Reiss
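For what it's worth, the telling xattr in the dump above is trusted.afr.ssd_storage-client-1. In AFR (replicate) volumes that value is three big-endian 32-bit pending-operation counters (data, metadata, entry), and a non-zero counter against client-1 — which, by the usual brick-order convention, would be the second brick, i.e. node02 here (worth double-checking in the volume's client graph) — is exactly why the heal daemon still lists these gfids. A minimal decoder sketch (hypothetical helper name, just illustrating the byte layout):

```python
def decode_afr_pending(hex_value: str) -> tuple[int, int, int]:
    """Split a trusted.afr.* xattr value into its (data, metadata, entry)
    pending-operation counters, each a big-endian 32-bit integer."""
    raw = bytes.fromhex(hex_value.removeprefix("0x"))
    return (
        int.from_bytes(raw[0:4], "big"),   # pending data operations
        int.from_bytes(raw[4:8], "big"),   # pending metadata operations
        int.from_bytes(raw[8:12], "big"),  # pending entry (directory) operations
    )

# Value from the getfattr output above:
print(decode_afr_pending("0x0000004f0000000100000000"))  # (79, 1, 0)
```

So node01 and node03 each record 79 un-synced data operations and 1 metadata operation against that brick, while trusted.afr.dirty is all zeros — consistent with node02 simply missing the file rather than holding a conflicting copy.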

I have run into this exact issue before and resolved it by simply syncing over the missing files and running a heal on the volume (it can take a little time to correct).

On Fri, Jan 31, 2020 at 7:05 PM Christian Reiss <email@christian-reiss.de> wrote:
Christian Reiss
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/FGIQFIRC6QYN4A...

On February 1, 2020 1:34:30 AM GMT+02:00, Jayme <jaymef@gmail.com> wrote:
There is an active thread in gluster-users, so it would be nice to mention this there.

About the sync, you can find the paths via:

1. Mount:
mount -t glusterfs -o aux-gfid-mount vm1:test /mnt/testvol

2. Find the path of the files:
getfattr -n trusted.glusterfs.pathinfo -e text /mnt/testvol/.gfid/<GFID>

I bet it's the same file that causes me problems. Just verify the contents and you will see that one of them is newer -> just rsync it to the bricks on node01 & node03 and run 'gluster volume heal <volume> full'.

Best Regards,
Strahil Nikolov
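If the aux-gfid-mount route is inconvenient, the brick-side location of a gfid can also be computed directly, since every file on a brick is hard-linked under .glusterfs/<first two hex chars>/<next two>/<gfid> — which is exactly the path shape seen in the md5sum commands earlier in this thread. A small sketch (hypothetical helper, not part of any Gluster tooling):

```python
def gfid_to_brick_path(brick_root: str, gfid: str) -> str:
    """Return the .glusterfs hard-link location of a gfid inside a brick.

    Gluster stores every file under .glusterfs/XX/YY/<gfid>, where XX and
    YY are the first two and next two hex characters of the gfid string.
    """
    # Accept the "<gfid:...>" form printed by 'gluster vol heal ... info'.
    g = gfid.strip("<>").removeprefix("gfid:")
    return f"{brick_root}/.glusterfs/{g[:2]}/{g[2:4]}/{g}"

# GFID from the heal info output above:
print(gfid_to_brick_path("/gluster_bricks/ssd_storage/ssd_storage",
                         "<gfid:a121e4fb-0984-4e41-94d7-8f0c4f87f4b6>"))
# /gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6
```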

Hey Strahil,

thanks for your answer.

On 01/02/2020 08:18, Strahil Nikolov wrote:
There is an active thread in gluster-users, so it would be nice to mention this there.
About the sync, you can find the paths via: 1. Mount mount -t glusterfs -o aux-gfid-mount vm1:test /mnt/testvol 2. Find the path of files: getfattr -n trusted.glusterfs.pathinfo -e text /mnt/testvol/.gfid/<GFID>
I bet it's the same file that causes me problems. Just verify the contents and you will see that one of them is newer -> just rsync it to the bricks node01 & node03 and run 'gluster volume heal <volume> full.'
I did a cross-post to gluster-users just now. You are right, the brick files have a slightly different timestamp:

[root@node01:~] # stat /gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6
  File: ‘/gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6’
  Size: 67108864   Blocks: 54576   IO Block: 4096   regular file
Device: fd09h/64777d   Inode: 16152829909   Links: 2
Access: (0660/-rw-rw----)  Uid: ( 0/ root)  Gid: ( 0/ root)
Context: system_u:object_r:glusterd_brick_t:s0
Access: 2020-01-31 22:16:57.812620635 +0100
Modify: 2020-02-01 07:19:24.183045141 +0100
Change: 2020-02-01 07:19:24.186045203 +0100
 Birth: -

[root@node03:~] # stat /gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6
  File: ‘/gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6’
  Size: 67108864   Blocks: 54576   IO Block: 4096   regular file
Device: fd09h/64777d   Inode: 16154259424   Links: 2
Access: (0660/-rw-rw----)  Uid: ( 0/ root)  Gid: ( 0/ root)
Context: system_u:object_r:glusterd_brick_t:s0
Access: 2020-01-31 22:16:57.811800217 +0100
Modify: 2020-02-01 07:19:24.180939487 +0100
Change: 2020-02-01 07:19:24.184939586 +0100
 Birth: -

Contents (getfattr, md5) are still identical.

I am unsure about your suggested rsync, though:

- node1 has the file,
- node2 does not,
- node3 has the file.

So I can rsync node1 to node2 and node3, or node3 to node1 and node2. Syncing to node1 and node3 can't be done, as node2 does not have the file. Can I do the rsync on a live, running Gluster?

--
with kind regards, mit freundlichen Gruessen,

Christian Reiss
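As an aside — since heal info printed only gfids, the original file name can also be recovered without mounting anything: the trusted.gfid2path.* value in the earlier getfattr dump is just a hex-encoded "<parent-directory-gfid>/<basename>" string. A quick decoding sketch (hypothetical helper name):

```python
def decode_gfid2path(hex_value: str) -> tuple[str, str]:
    """Decode a trusted.gfid2path.* xattr value, which is the hex
    encoding of "<parent-directory-gfid>/<basename>"."""
    text = bytes.fromhex(hex_value.removeprefix("0x")).decode("utf-8")
    parent_gfid, basename = text.split("/", 1)
    return parent_gfid, basename

# Value from the getfattr output earlier in the thread:
parent, name = decode_gfid2path(
    "0x62653331383633382d653861302d346336642d393737642d3761393337616138"
    "343830362f38366461303238392d663734662d343230302d393238342d363738"
    "6537626437363139352e31323030")
print(parent)  # be318638-e8a0-4c6d-977d-7a937aa84806
print(name)    # 86da0289-f74f-4200-9284-678e7bd76195.1200
```

That yields the parent directory's gfid plus the file's basename; for directories, the .glusterfs/XX/YY/<gfid> entry is a symlink back into the tree, so repeating the lookup on the parent gfid walks you back to the full path.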

Hi! I did it with a working Gluster: just copy the missing files over from one of the hosts and start a heal on the volume after this. But the main thing I don't understand is why this happens at all. I have seen it many times after maintenance of one host, for example.

On February 1, 2020 12:00:43 PM GMT+02:00, asm@pioner.kz wrote:
Definitely a bug - but I'm not sure if it's FUSE or server-side.

Best Regards,
Strahil Nikolov

On February 1, 2020 10:53:59 AM GMT+02:00, Christian Reiss <email@christian-reiss.de> wrote:
Hm... Just copy the file to node2, as it is missing. In my case ovirt2 had a newer file than 1 & 3. Yes, you can sync the files - just place them in the same folder.

Best Regards,
Strahil Nikolov
participants (4)

- asm@pioner.kz
- Christian Reiss
- Jayme
- Strahil Nikolov