[ovirt-users] Re: Gluster volume engine stuck in healing with 1 unsynched entry & HostedEngine paused

1 Mar 2021

Hello again,

I am back with a brief description of my current situation and some questions about recovery.

oVirt environment: 4.3.5.2 Hyperconverged
GlusterFS: Replica 2 + Arbiter 1
GlusterFS volumes: data, engine, vmstore

The current situation is the following:

- The cluster is in Global Maintenance.

- The engine volume is up, with this status in the Web GUI: Up, unsynched entries, needs healing.

- The HostedEngine VM is paused due to a storage I/O error according to the Web GUI, while the output of the virsh list --all command shows the HostedEngine as running.

I ran the gluster heal command (gluster volume heal engine), but nothing changed.
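
To make the heal state easier to compare between runs, here is a minimal sketch that pulls the per-brick pending entry counts out of the text printed by gluster volume heal engine info. The sample text, host names and brick paths below are made up for illustration and are not output from my cluster, and the function name is my own. It assumes the usual layout of a "Brick ..." line followed later by a "Number of entries: N" line.

```python
# Hypothetical sample, NOT real output from the cluster described here.
SAMPLE = """\
Brick host1.example.com:/gluster_bricks/engine/engine
Status: Connected
Number of entries: 0

Brick host2.example.com:/gluster_bricks/engine/engine
/some/pending/file
Status: Connected
Number of entries: 1

Brick host3.example.com:/gluster_bricks/engine/engine
Status: Transport endpoint is not connected
Number of entries: -
"""


def pending_heal_entries(heal_info_text):
    """Map each brick to its pending entry count (None if not numeric)."""
    counts = {}
    brick = None
    for raw in heal_info_text.splitlines():
        line = raw.strip()
        if line.startswith("Brick "):
            # Remember which brick the following lines belong to.
            brick = line[len("Brick "):]
        elif line.startswith("Number of entries:") and brick is not None:
            value = line.split(":", 1)[1].strip()
            # An unreachable brick reports "-" instead of a number.
            counts[brick] = int(value) if value.isdigit() else None
    return counts


if __name__ == "__main__":
    for brick, count in pending_heal_entries(SAMPLE).items():
        print(brick, count)
```

On a real host the text could be obtained with subprocess.run(["gluster", "volume", "heal", "engine", "info"], capture_output=True, text=True).stdout and passed to the same function.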

I have the following questions:

1. Should I restart the glusterd service? If so, from which host? Is it enough to restart glusterd on one host, or should it be restarted on the other two as well?

2. Should the node that was NonResponsive and then came back be rebooted? It seems to be healthy now.

3. Should the HostedEngine be restored with engine-backup, or is that not necessary?

4. Could the loss of the DNS server used by the oVirt hosts lead to an unresponsive host?
The nsswitch.conf file on the oVirt hosts and on the engine defines host lookups as:
hosts:      files dns myhostname

5. How can we recover/rectify the situation above?
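
Regarding the nsswitch line in question 4: as far as I understand, the sources on a hosts: line are tried from left to right, so /etc/hosts (files) is consulted before DNS. A small sketch that extracts that order from the line quoted above (the function name is my own, for illustration; bracketed action items such as [NOTFOUND=return] are simply dropped):

```python
import re


def hosts_lookup_order(nsswitch_text):
    """Return the name-service sources of the 'hosts:' line, in lookup order."""
    for raw in nsswitch_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if line.startswith("hosts:"):
            sources = line[len("hosts:"):]
            # Remove bracketed action items, keeping only the source names.
            sources = re.sub(r"\[[^\]]*\]", " ", sources)
            return sources.split()
    return []


if __name__ == "__main__":
    print(hosts_lookup_order("hosts:      files dns myhostname"))
```

With this order, host names listed in /etc/hosts should still resolve while the DNS server is down; names that are only in DNS would not.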

Thanks for your help,
Maria Souvalioti


souvaliotimaria＠mail.com