Gluster volume engine stuck in healing with 1 unsynched entry & HostedEngine paused

Hello everyone, any help with the following problem would be greatly appreciated. In my lab, the day before yesterday, we had power issues: a UPS went offline, which caused a power outage of the NFS/DNS server I have set up to serve oVirt with ISOs and to act as a DNS server (our other DNS servers run as VMs within the oVirt environment). We found a broadcast storm on the switch the oVirt nodes are connected to (due to a faulty NIC on the aforementioned UPS), and later on we had to re-establish several of the virtual connections as well. The above led to one of the hosts becoming NonResponsive, two machines becoming unresponsive and three VMs shutting down.

The oVirt environment, version 4.3.5.2, is a replica 2 + arbiter 1 environment and runs GlusterFS with the recommended volumes data, engine and vmstore. So far, whenever there was some kind of problem, oVirt was usually able to resolve it on its own. This time, however, after we recovered from the above state, the data and vmstore volumes healed successfully, but the engine volume got stuck in the healing process (Up, unsynched entries, needs healing), and in the web GUI I see that the HostedEngine VM is paused due to a storage I/O error, while the output of the virsh list --all command shows that the HostedEngine is running. How is that happening?

I tried to manually trigger the healing process for the volume with gluster volume heal engine, but nothing changed. The command gluster volume heal engine info shows the following:

[root@ov-no3 ~]# gluster volume heal engine info
Brick ov-no1.ariadne-t.local:/gluster_bricks/engine/engine
Status: Connected
Number of entries: 0

Brick ov-no2.ariadne-t.local:/gluster_bricks/engine/engine
/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7
Status: Connected
Number of entries: 1

Brick ov-no3.ariadne-t.local:/gluster_bricks/engine/engine
/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7
Status: Connected
Number of entries: 1

This morning I came upon this Reddit post https://www.reddit.com/r/gluster/comments/fl3yb7/entries_stuck_in_heal_pendi... where it seems that after a graceful reboot of one of the oVirt hosts, Gluster came back online once it completed the appropriate healing processes. The thing is, from what I have read, when there are unsynched entries in Gluster a host cannot be put into maintenance mode so that it can be rebooted, correct? Should I try to restart the glusterd service? Could someone tell me what I should do?

Thank you all for your time and help, Maria Souvalioti
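For anyone hitting the same state, a minimal set of read-only checks before changing anything would be roughly the following (standard oVirt and Gluster CLI; a sketch, not a full procedure):

hosted-engine --vm-status                  (agent view of the HostedEngine VM and its storage)
gluster peer status                        (are all three peers connected?)
gluster volume status engine               (are all bricks and self-heal daemons online?)
gluster volume heal engine info summary    (heal pending / split-brain / possibly healing counts per brick)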

Hello again, I am back with a brief description of the situation I am in, and questions about the recovery.

oVirt environment: 4.3.5.2 Hyperconverged
GlusterFS: Replica 2 + Arbiter 1
GlusterFS volumes: data, engine, vmstore

The current situation is the following:
- The cluster is in Global Maintenance.
- The engine volume is up with the comment (in the web GUI): Up, unsynched entries, needs healing.
- The HostedEngine VM is paused due to a storage I/O error (web GUI), while the output of the virsh list --all command shows that the HostedEngine is running.

I tried to issue the gluster heal command (gluster volume heal engine) but nothing changed.

I have the following questions:
1. Should I restart the glusterd service? Where from? Is it enough if glusterd is restarted on one host, or should it be restarted on the other two as well?
2. Should the node that was NonResponsive and came back be rebooted or not? It seems alright now and in good health.
3. Should the HostedEngine be restored with engine-backup, or is it not necessary?
4. Could the loss of the DNS server for the oVirt hosts lead to an unresponsive host? The nsswitch file on the oVirt hosts and engine has DNS defined as: hosts: files dns myhostname
5. How can we recover/rectify the situation above?

Thanks for your help, Maria Souvalioti

+Gobinda Das <godas@redhat.com>, +Satheesaran Sundaramoorthi <satheesaran@redhat.com>, maybe you can help here.
-- Sandro Bonazzola, Manager, Software Engineering, EMEA R&D RHV, Red Hat EMEA <https://www.redhat.com/>, sbonazzo@redhat.com

On Mon, Mar 1, 2021, 15:20 <souvaliotimaria@mail.com> wrote:
Hello again,
I am back with a brief description of the situation I am in, and questions about the recovery.
oVirt environment: 4.3.5.2 Hyperconverged GlusterFS: Replica 2 + Arbiter 1 GlusterFS volumes: data, engine, vmstore
The current situation is the following:
- The Cluster is in Global Maintenance.
- The volume engine is up with comment (in the Web GUI) : Up, unsynched entries, needs healing.
- The VM HostedEngine is paused due to a storage I/O error (Web GUI) while the output of virsh list --all command shows that the HostedEngine is running.
I tried to issue the gluster heal command (gluster volume heal engine) but nothing changed.
I have the following questions:
1. Should I restart the glusterd service? Where from? Is it enough if the glusterd is restarted on one host or should it be restarted on the other two as well?
It sounds like a Gluster split-brain. I would start from there. Can you check the status by listing the split-brain entries?
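For example, with the engine volume from this thread, the split-brain entries can be listed with the standard heal-info variant:

gluster volume heal engine info split-brain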
2. Should the node that was NonResponsive and came back, be rebooted or not? It seems alright now and in good health.
3. Should the HostedEngine be restored with engine-backup or is it not necessary?
4. Could the loss of the DNS server for the oVirt hosts lead to an unresponsive host? The nsswitch file on the ovirt hosts and engine, has the DNS defined as: hosts: files dns myhostname
If you have opted for DNS liveness checks, it could be.
5. How can we recover/rectify the situation above?
I would start by checking for Gluster split-brains and ensuring that all hosts have connectivity on the storage domain network (ping, jumbo frames if enabled). 99% of my similar issues have been caused by Gluster splits. The fact that the engine is shown as paused and that you can still access the web UI makes me think you have a split-brain issue.
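A quick connectivity check between hosts on the storage network could look like the following sketch, run against each peer's storage-network address (hostnames taken from this thread; the -s value assumes a 9000-byte MTU, i.e. payload = MTU minus 28 bytes of IP/ICMP headers, so drop the second command if jumbo frames are not in use):

ping -c 3 ov-no2.ariadne-t.local
ping -c 3 -M do -s 8972 ov-no2.ariadne-t.local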
Thanks for your help, Maria Souvalioti


On Wed, Mar 3, 2021, 19:13 <souvaliotimaria@mail.com> wrote:
Hello,
Thank you very much for your reply.
I get the following from the below gluster commands:
[root@ov-no1 ~]# gluster volume heal engine info split-brain Brick ov-no1.ariadne-t.local:/gluster_bricks/engine/engine Status: Connected Number of entries in split-brain: 0
Brick ov-no2.ariadne-t.local:/gluster_bricks/engine/engine Status: Connected Number of entries in split-brain: 0
Brick ov-no3.ariadne-t.local:/gluster_bricks/engine/engine Status: Connected Number of entries in split-brain: 0
[root@ov-no1 ~]# gluster volume heal engine info summary Brick ov-no1.ariadne-t.local:/gluster_bricks/engine/engine Status: Connected Total Number of entries: 1 Number of entries in heal pending: 1 Number of entries in split-brain: 0 Number of entries possibly healing: 0
Brick ov-no2.ariadne-t.local:/gluster_bricks/engine/engine Status: Connected Total Number of entries: 1 Number of entries in heal pending: 1 Number of entries in split-brain: 0 Number of entries possibly healing: 0
Brick ov-no3.ariadne-t.local:/gluster_bricks/engine/engine Status: Connected Total Number of entries: 1 Number of entries in heal pending: 1 Number of entries in split-brain: 0 Number of entries possibly healing: 0
[root@ov-no1 ~]# gluster volume info Volume Name: data Type: Replicate Volume ID: 6c7bb2e4-ed35-4826-81f6-34fcd2d0a984 Status: Started Snapshot Count: 0 Number of Bricks: 1 x (2 + 1) = 3 Transport-type: tcp Bricks: Brick1: ov-no1.ariadne-t.local:/gluster_bricks/data/data Brick2: ov-no2.ariadne-t.local:/gluster_bricks/data/data Brick3: ov-no3.ariadne-t.local:/gluster_bricks/data/data (arbiter) Options Reconfigured: performance.client-io-threads: on nfs.disable: on transport.address-family: inet performance.strict-o-direct: on performance.quick-read: off performance.read-ahead: off performance.io-cache: off performance.low-prio-threads: 32 network.remote-dio: off cluster.eager-lock: enable cluster.quorum-type: auto cluster.server-quorum-type: server cluster.data-self-heal-algorithm: full cluster.locking-scheme: granular cluster.shd-max-threads: 8 cluster.shd-wait-qlength: 10000 features.shard: on user.cifs: off cluster.choose-local: off client.event-threads: 4 server.event-threads: 4 network.ping-timeout: 30 storage.owner-uid: 36 storage.owner-gid: 36 cluster.granular-entry-heal: enable
Volume Name: engine Type: Replicate Volume ID: 7173c827-309f-4e84-a0da-6b2b8eb50264 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: ov-no1.ariadne-t.local:/gluster_bricks/engine/engine Brick2: ov-no2.ariadne-t.local:/gluster_bricks/engine/engine Brick3: ov-no3.ariadne-t.local:/gluster_bricks/engine/engine Options Reconfigured: performance.client-io-threads: on nfs.disable: on transport.address-family: inet performance.strict-o-direct: on performance.quick-read: off performance.read-ahead: off performance.io-cache: off performance.low-prio-threads: 32 network.remote-dio: off cluster.eager-lock: enable cluster.quorum-type: auto cluster.server-quorum-type: server cluster.data-self-heal-algorithm: full cluster.locking-scheme: granular cluster.shd-max-threads: 8 cluster.shd-wait-qlength: 10000 features.shard: on user.cifs: off cluster.choose-local: off client.event-threads: 4 server.event-threads: 4 network.ping-timeout: 30 storage.owner-uid: 36 storage.owner-gid: 36 cluster.granular-entry-heal: enable
Volume Name: vmstore Type: Replicate Volume ID: 29992fc1-3e09-4360-b651-4449fcd32767 Status: Started Snapshot Count: 0 Number of Bricks: 1 x (2 + 1) = 3 Transport-type: tcp Bricks: Brick1: ov-no1.ariadne-t.local:/gluster_bricks/vmstore/vmstore Brick2: ov-no2.ariadne-t.local:/gluster_bricks/vmstore/vmstore Brick3: ov-no3.ariadne-t.local:/gluster_bricks/vmstore/vmstore (arbiter) Options Reconfigured: performance.client-io-threads: on nfs.disable: on transport.address-family: inet performance.strict-o-direct: on performance.quick-read: off performance.read-ahead: off performance.io-cache: off performance.low-prio-threads: 32 network.remote-dio: off cluster.eager-lock: enable cluster.quorum-type: auto cluster.server-quorum-type: server cluster.data-self-heal-algorithm: full cluster.locking-scheme: granular cluster.shd-max-threads: 8 cluster.shd-wait-qlength: 10000 features.shard: on user.cifs: off cluster.choose-local: off client.event-threads: 4 server.event-threads: 4 network.ping-timeout: 30 storage.owner-uid: 36 storage.owner-gid: 36 cluster.granular-entry-heal: enable
[root@ov-no1 ~]# gluster volume heal engine info Brick ov-no1.ariadne-t.local:/gluster_bricks/engine/engine Status: Connected Number of entries: 0
Brick ov-no2.ariadne-t.local:/gluster_bricks/engine/engine
/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7 Status: Connected Number of entries: 1
Brick ov-no3.ariadne-t.local:/gluster_bricks/engine/engine
/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7 Status: Connected Number of entries: 1
However, when checking the contents of the above entry on each host, I get the following output, which shows that the file on the third host has a different date (the current date):
[root@ov-no1 ~]# ls /rhev/data-center/mnt/glusterSD/ov-no1.ariadne-t.local\:_engine/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/ total 4.6G drwxr-xr-x. 2 vdsm kvm 149 Sep 11 2019 . drwxr-xr-x. 8 vdsm kvm 8.0K Sep 11 2019 .. -rw-rw----. 1 vdsm kvm 100G Dec 30 13:20 a48555f4-be23-4467-8a54-400ae7baf9d7 -rw-rw----. 1 vdsm kvm 1.0M Feb 24 20:50 a48555f4-be23-4467-8a54-400ae7baf9d7.lease -rw-r--r--. 1 vdsm kvm 321 Sep 11 2019 a48555f4-be23-4467-8a54-400ae7baf9d7.meta
[root@ov-no2 ~]# ls /rhev/data-center/mnt/glusterSD/ov-no1.ariadne-t.local\:_engine/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/ total 4.6G drwxr-xr-x. 2 vdsm kvm 149 Sep 11 2019 . drwxr-xr-x. 8 vdsm kvm 8.0K Sep 11 2019 .. -rw-rw----. 1 vdsm kvm 100G Dec 30 13:20 a48555f4-be23-4467-8a54-400ae7baf9d7 -rw-rw----. 1 vdsm kvm 1.0M Feb 24 20:50 a48555f4-be23-4467-8a54-400ae7baf9d7.lease -rw-r--r--. 1 vdsm kvm 321 Sep 11 2019 a48555f4-be23-4467-8a54-400ae7baf9d7.meta
[root@ov-no3 ~]# ls /rhev/data-center/mnt/glusterSD/ov-no1.ariadne-t.local\:_engine/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/ total 4.6G drwxr-xr-x. 2 vdsm kvm 149 Sep 11 2019 . drwxr-xr-x. 8 vdsm kvm 8.0K Sep 11 2019 .. -rw-rw----. 1 vdsm kvm 100G Mar 3 18:13 a48555f4-be23-4467-8a54-400ae7baf9d7 -rw-rw----. 1 vdsm kvm 1.0M Feb 24 20:50 a48555f4-be23-4467-8a54-400ae7baf9d7.lease -rw-r--r--. 1 vdsm kvm 321 Sep 11 2019 a48555f4-be23-4467-8a54-400ae7baf9d7.meta
Also, the stat command on each host gives the following:
[root@ov-no1 ~]# stat /rhev/data-center/mnt/glusterSD/ov-no1.ariadne-t.local\:_engine/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7 File: ‘/rhev/data-center/mnt/glusterSD/ov-no1.ariadne-t.local:_engine/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7’ Size: 107374182400 Blocks: 9569291 IO Block: 131072 regular file Device: 29h/41d Inode: 10220711633933694927 Links: 1 Access: (0660/-rw-rw----) Uid: ( 36/ vdsm) Gid: ( 36/ kvm) Context: system_u:object_r:fusefs_t:s0 Access: 2019-09-11 19:08:58.012200046 +0300 Modify: 2020-12-30 13:20:39.794315096 +0200 Change: 2020-12-30 13:20:39.794315096 +0200 Birth: -
[root@ov-no2 ~]# stat /rhev/data-center/mnt/glusterSD/ov-no1.ariadne-t.local\:_engine/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7 File: ‘/rhev/data-center/mnt/glusterSD/ov-no1.ariadne-t.local:_engine/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7’ Size: 107374182400 Blocks: 9569291 IO Block: 131072 regular file Device: 29h/41d Inode: 10220711633933694927 Links: 1 Access: (0660/-rw-rw----) Uid: ( 36/ vdsm) Gid: ( 36/ kvm) Context: system_u:object_r:fusefs_t:s0 Access: 2019-09-11 19:08:58.012200046 +0300 Modify: 2020-12-30 13:20:39.794315096 +0200 Change: 2020-12-30 13:20:39.794315096 +0200 Birth: -
[root@ov-no3 ~]# stat /rhev/data-center/mnt/glusterSD/ov-no1.ariadne-t.local\:_engine/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7 File: ‘/rhev/data-center/mnt/glusterSD/ov-no1.ariadne-t.local:_engine/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7’ Size: 107374182400 Blocks: 9569291 IO Block: 131072 regular file Device: 29h/41d Inode: 10220711633933694927 Links: 1 Access: (0660/-rw-rw----) Uid: ( 36/ vdsm) Gid: ( 36/ kvm) Context: system_u:object_r:fusefs_t:s0 Access: 2020-10-02 03:02:51.104699119 +0300 Modify: 2021-03-03 18:23:07.122575696 +0200 Change: 2021-03-03 18:23:07.122575696 +0200 Birth: -
Should I use the gluster volume heal <VOLNAME> split-brain source-brick <HOSTNAME:BRICKNAME> command to initiate the healing process?
Did you try to heal with latest-mtime?

I tried only the simple heal because I wasn't sure whether I'd mess up the Gluster volume more than it already is. I will try latest-mtime in a couple of hours, because the system is a production system and I have to do it after office hours. I will come back with an update. Thank you very much for your help!


On Thu, Mar 4, 2021 at 8:59 PM <souvaliotimaria@mail.com> wrote:
Hello again, I've tried to heal the brick with latest-mtime, but I get the following:
gluster volume heal engine split-brain latest-mtime /80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7
Healing /80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7 failed: File not in split-brain.
Volume heal failed.
You can try to run ls in the directory where the file with the pending heal resides. This might trigger the healing process for that file.
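For instance, on a host where the engine volume is mounted, something along these lines (a sketch; the path is the image directory already shown earlier in the thread):

ls -l /rhev/data-center/mnt/glusterSD/ov-no1.ariadne-t.local\:_engine/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/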
Should I try the solution described in this question, where we manually remove the conflicting entry, triggering the heal operations?
https://lists.ovirt.org/archives/list/users@ovirt.org/thread/RPYIMSQCBYVQ654...

If it's a VM image, just use dd to read the whole file: dd if=VM_image of=/dev/null bs=10M status=progress
Best Regards, Strahil Nikolov
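A concrete form of that for the engine image reported by the heal info above would be roughly the following (a sketch; run from a host where the engine volume is FUSE-mounted, since reading through the mount is what prompts the client-side heal check):

dd if=/rhev/data-center/mnt/glusterSD/ov-no1.ariadne-t.local\:_engine/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7 of=/dev/null bs=10M status=progress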

Thank you for your reply. I'm trying that right now and I see it triggered the self-healing process. I will come back with an update. Best regards.

Also check the status of the file on each brick with the getfattr command (see https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/) and provide the output.
Best Regards, Strahil Nikolov
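For instance, run directly against the brick path on each node (rather than the FUSE mount), using the brick and file paths that appear elsewhere in this thread; treat the exact invocation as a sketch:

getfattr -d -m . -e hex /gluster_bricks/engine/engine/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7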

The command getfattr -n replica.split-brain-status <file> gives the following:

[root@ov-no1 ~]# getfattr -n replica.split-brain-status /rhev/data-center/mnt/glusterSD/ov-no1.ariadne-t.local\:_engine/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7
getfattr: Removing leading '/' from absolute path names
# file: rhev/data-center/mnt/glusterSD/ov-no1.ariadne-t.local:_engine/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7
replica.split-brain-status="The file is not under data or metadata split-brain"

And the getfattr -d -m . -e hex <file> command gives:

[root@ov-no1 ~]# getfattr -d -m . -e hex /rhev/data-center/mnt/glusterSD/ov-no1.ariadne-t.local\:_engine/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7
getfattr: Removing leading '/' from absolute path names
# file: rhev/data-center/mnt/glusterSD/ov-no1.ariadne-t.local:_engine/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7
security.selinux=0x73797374656d5f753a6f626a6563745f723a6675736566735f743a733000

Also, from what I can tell, in the GUI the brick seems to still be in the healing process (since I ran the dd command yesterday), as the counters in the self-heal info field change over time.

Thank you for your help

The output of the command 'getfattr -d -m . -e hex file' seems quite weird. Is it the same on all nodes?
Best Regards, Strahil Nikolov
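One way to compare the three nodes in a single pass, assuming root SSH between the hosts is available (hostnames and paths taken from this thread; purely a convenience sketch):

for h in ov-no1.ariadne-t.local ov-no2.ariadne-t.local ov-no3.ariadne-t.local; do
    echo "=== $h ==="
    ssh root@"$h" getfattr -d -m . -e hex /gluster_bricks/engine/engine/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7
done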

Sorry, I ran the getfattr command wrongly. I ran it again as
getfattr -d -m . -e hex /gluster_bricks/engine/engine/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7
on each node and got different results for the following attributes:

- trusted.afr.dirty: it is 0x000003940000000000000000 on node 1, and 0x000000000000000000000000 on the other two.
- trusted.afr.engine-client-0: it is 0x0000043a0000000000000000 on nodes 2 and 3, but node 1 doesn't have it at all.
- trusted.afr.engine-client-2: it is 0x000000000000000000000000 on node 1 and 0x000004440000000000000000 on node 2. Node 3 doesn't have this entry at all.

Hope this helps. Thanks for your help

The output of the getfattr command on the nodes was the following:

Node1:
[root@ov-no1 ~]# getfattr -d -m . -e hex /gluster_bricks/engine/engine/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7
getfattr: Removing leading '/' from absolute path names
# file: gluster_bricks/engine/engine/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000003940000000000000000
trusted.afr.engine-client-2=0x000000000000000000000000
trusted.gfid=0x3fafabf3d0cd4b9a8dd743145451f7cf
trusted.gfid2path.06f4f1065c7ed193=0x36313936323032302d386431342d343261372d613565332d3233346365656635343035632f61343835353566342d626532332d343436372d386135342d343030616537626166396437
trusted.glusterfs.mdata=0x010000000000000000000000005fec6287000000002f584958000000005fec6287000000002f584958000000005d791c1a0000000000ba286e
trusted.glusterfs.shard.block-size=0x0000000004000000
trusted.glusterfs.shard.file-size=0x00000019000000000000000000000000000000000092040b0000000000000000

Node2:
[root@ov-no2 ~]# getfattr -d -m . -e hex /gluster_bricks/engine/engine/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7
getfattr: Removing leading '/' from absolute path names
# file: gluster_bricks/engine/engine/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.engine-client-0=0x0000043a0000000000000000
trusted.afr.engine-client-2=0x000000000000000000000000
trusted.gfid=0x3fafabf3d0cd4b9a8dd743145451f7cf
trusted.gfid2path.06f4f1065c7ed193=0x36313936323032302d386431342d343261372d613565332d3233346365656635343035632f61343835353566342d626532332d343436372d386135342d343030616537626166396437
trusted.glusterfs.mdata=0x010000000000000000000000005fec6287000000002f584958000000005fec6287000000002f584958000000005d791c1a0000000000ba286e
trusted.glusterfs.shard.block-size=0x0000000004000000
trusted.glusterfs.shard.file-size=0x00000019000000000000000000000000000000000092040b0000000000000000

Node3:
[root@ov-no3 ~]# getfattr -d -m . -e hex /gluster_bricks/engine/engine/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7
getfattr: Removing leading '/' from absolute path names
# file: gluster_bricks/engine/engine/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.engine-client-0=0x000004440000000000000000
trusted.gfid=0x3fafabf3d0cd4b9a8dd743145451f7cf
trusted.gfid2path.06f4f1065c7ed193=0x36313936323032302d386431342d343261372d613565332d3233346365656635343035632f61343835353566342d626532332d343436372d386135342d343030616537626166396437
trusted.glusterfs.mdata=0x010000000000000000000000005fec6287000000002f584958000000005fec6287000000002f584958000000005d791c1a0000000000ba286e
trusted.glusterfs.shard.block-size=0x0000000004000000
trusted.glusterfs.shard.file-size=0x00000019000000000000000000000000000000000092040b0000000000000000

It seems to me that ov-no1 didn't update the file properly. What was the output of the gluster volume heal command?
Best Regards, Strahil Nikolov

The gluster volume heal engine command didn't output anything in the CLI.

The gluster volume heal engine info command gives:

# gluster volume heal engine info
Brick ov-no1.ariadne-t.local:/gluster_bricks/engine/engine
Status: Connected
Number of entries: 0

Brick ov-no2.ariadne-t.local:/gluster_bricks/engine/engine
/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7
Status: Connected
Number of entries: 1

Brick ov-no3.ariadne-t.local:/gluster_bricks/engine/engine
/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7
Status: Connected
Number of entries: 1

And gluster volume heal engine info summary gives:

# gluster volume heal engine info summary
Brick ov-no1.ariadne-t.local:/gluster_bricks/engine/engine
Status: Connected
Total Number of entries: 1
Number of entries in heal pending: 1
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick ov-no2.ariadne-t.local:/gluster_bricks/engine/engine
Status: Connected
Total Number of entries: 1
Number of entries in heal pending: 1
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick ov-no3.ariadne-t.local:/gluster_bricks/engine/engine
Status: Connected
Total Number of entries: 1
Number of entries in heal pending: 1
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Also, I found the following warning message in the logs, which has been repeating itself since the problem started:

[2021-03-10 10:08:11.646824] W [MSGID: 114061] [client-common.c:2644:client_pre_fsync_v2] 0-engine-client-0: (3fafabf3-d0cd-4b9a-8dd7-43145451f7cf) remote_fd is -1. EBADFD [File descriptor in bad state]

And from what I see in the logs, the healing process still seems to be trying to fix the volume:

[2021-03-10 10:47:34.820229] I [MSGID: 108026] [afr-self-heal-common.c:1741:afr_log_selfheal] 0-engine-replicate-0: Completed data selfheal on 3fafabf3-d0cd-4b9a-8dd7-43145451f7cf. sources=1 [2] sinks=0
The message "I [MSGID: 108026] [afr-self-heal-common.c:1741:afr_log_selfheal] 0-engine-replicate-0: Completed data selfheal on 3fafabf3-d0cd-4b9a-8dd7-43145451f7cf. sources=1 [2] sinks=0 " repeated 8 times between [2021-03-10 10:47:34.820229] and [2021-03-10 10:48:00.088805]
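To watch whether those heal counters actually move over time, a simple convenience sketch using the same summary command already shown above would be:

watch -n 60 gluster volume heal engine info summary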

It seems that the affected file can be moved away on ov-no1.ariadne-t.local, as the other 2 bricks "blame" the entry on ov-no1.ariadne-t.local. After that, you will need to run "gluster volume heal <VOLUME_NAME> full" to trigger the heal.

Best Regards, Strahil Nikolov
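A hedged sketch of what that could look like, on ov-no1 only (not a tested procedure: the file path is the one from this thread, the .glusterfs hard-link path is derived from the trusted.gfid value 0x3fafabf3d0cd4b9a8dd743145451f7cf shown above, and the rename assumes /gluster_bricks/engine is the brick's own mount point, as in a standard oVirt Gluster deployment, so the stale copy stays on the same filesystem; double-check everything before running it):

# on ov-no1.ariadne-t.local only
mv /gluster_bricks/engine/engine/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7 /gluster_bricks/engine/a48555f4-stale-copy
rm /gluster_bricks/engine/engine/.glusterfs/3f/af/3fafabf3-d0cd-4b9a-8dd7-43145451f7cf
gluster volume heal engine full

Once gluster volume heal engine info reports 0 entries on all bricks, the stale copy can be removed to reclaim space.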
It seems to me that ov-no1 didn't update the file properly.
What was the output of the gluster volume heal command ? Best Regards, Strahil Nikolov
The output of the getfattr command on the nodes was the following:
Node1: [root@ov-no1 ~]# getfattr -d -m . -e hex /gluster_bricks/engine/engine/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7 getfattr: Removing leading '/' from absolute path names # file: gluster_bricks/engine/engine/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7 security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000 trusted.afr.dirty=0x000003940000000000000000 trusted.afr.engine-client-2=0x000000000000000000000000 trusted.gfid=0x3fafabf3d0cd4b9a8dd743145451f7cf trusted.gfid2path.06f4f1065c7ed193=0x36313936323032302d386431342d343261372d613565332d3233346365656635343035632f61343835353566342d626532332d343436372d386135342d343030616537626166396437 trusted.glusterfs.mdata=0x010000000000000000000000005fec6287000000002f584958000000005fec6287000000002f584958000000005d791c1a0000000000ba286e trusted.glusterfs.shard.block-size=0x0000000004000000 trusted.glusterfs.shard.file-size=0x00000019000000000000000000000000000000000092040b0000000000000000
Node2: [root@ov-no2 ~]# getfattr -d -m . -e hex /gluster_bricks/engine/engine/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7 getfattr: Removing leading '/' from absolute path names # file: gluster_bricks/engine/engine/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7 security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000 trusted.afr.dirty=0x000000000000000000000000 trusted.afr.engine-client-0=0x0000043a0000000000000000 trusted.afr.engine-client-2=0x000000000000000000000000 trusted.gfid=0x3fafabf3d0cd4b9a8dd743145451f7cf trusted.gfid2path.06f4f1065c7ed193=0x36313936323032302d386431342d343261372d613565332d3233346365656635343035632f61343835353566342d626532332d343436372d386135342d343030616537626166396437 trusted.glusterfs.mdata=0x010000000000000000000000005fec6287000000002f584958000000005fec6287000000002f584958000000005d791c1a0000000000ba286e trusted.glusterfs.shard.block-size=0x0000000004000000 trusted.glusterfs.shard.file-size=0x00000019000000000000000000000000000000000092040b0000000000000000
Node3: [root@ov-no3 ~]# getfattr -d -m . -e hex /gluster_bricks/engine/engine/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7 getfattr: Removing leading '/' from absolute path names # file: gluster_bricks/engine/engine/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7 security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000 trusted.afr.dirty=0x000000000000000000000000 trusted.afr.engine-client-0=0x000004440000000000000000 trusted.gfid=0x3fafabf3d0cd4b9a8dd743145451f7cf trusted.gfid2path.06f4f1065c7ed193=0x36313936323032302d386431342d343261372d613565332d3233346365656635343035632f61343835353566342d626532332d343436372d386135342d343030616537626166396437 trusted.glusterfs.mdata=0x010000000000000000000000005fec6287000000002f584958000000005fec6287000000002f584958000000005d791c1a0000000000ba286e trusted.glusterfs.shard.block-size=0x0000000004000000 trusted.glusterfs.shard.file-size=0x00000019000000000000000000000000000000000092040b0000000000000000

Should I delete the file and restart glusterd on the ov-no1 server? Thank you very much.
On 3/10/21 10:21 AM, Strahil Nikolov via Users wrote:
It seems to me that ov-no1 didn't update the file properly.
What was the output of the gluster volume heal command?
Best Regards, Strahil Nikolov

Just move it away (to be on the safe side) and trigger a full heal.
Best Regards, Strahil Nikolov
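A hedged sketch of what "move it away and trigger a full heal" could look like in practice, assuming ov-no1 is indeed the brick with the stale copy; the quarantine directory is made up for illustration, and nothing below was actually run in this thread:

# Run on ov-no1 (assumed stale copy): move the file out of the brick instead of deleting it.
BRICK=/gluster_bricks/engine/engine
FILE=$BRICK/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7
mkdir -p /root/afr-quarantine
mv "$FILE" /root/afr-quarantine/
# The brick also keeps a hard link to the same inode under .glusterfs, keyed by the gfid
# shown above (trusted.gfid 3fafabf3-d0cd-4b9a-8dd7-43145451f7cf); it normally has to be
# moved aside as well so self-heal can recreate a clean copy from the good bricks.
mv "$BRICK/.glusterfs/3f/af/3fafabf3-d0cd-4b9a-8dd7-43145451f7cf" /root/afr-quarantine/gfid-link
# Then, from any node, ask the self-heal daemons for a full crawl of the volume:
gluster volume heal engine full

Moving rather than deleting keeps a way back if the heal does not repopulate the file as expected.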

Thank you. I have tried that and it didn't work, as the system sees that the file is not in split-brain. I have also tried a force heal and a full heal, and still nothing. I always end up with the entry stuck in the unsynched state.
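For readers hitting the same wall, a hedged checklist of standard GlusterFS CLI calls (generic commands, not steps taken in this thread) that can help show why an entry stays pending even though it is not in split-brain:

gluster volume heal engine info summary               # per-brick counts of entries pending heal
gluster volume heal engine info split-brain           # confirms whether any entry is actually in split-brain
gluster volume status engine                          # check that the Self-heal Daemon is online on every node
gluster volume get engine cluster.self-heal-daemon    # should be enabled for automatic healing
# The self-heal daemon log on each node (/var/log/glusterfs/glustershd.log) usually
# says why a particular gfid could not be healed.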
participants (5)
- Alex K
- Maria Souvalioti
- Sandro Bonazzola
- souvaliotimaria@mail.com
- Strahil Nikolov