Hello again,
I am back with a brief description of the situation I am in, and questions about the
recovery.
oVirt environment: 4.3.5.2 Hyperconverged
GlusterFS: Replica 2 + Arbiter 1
GlusterFS volumes: data, engine, vmstore
The current situation is the following:
- The Cluster is in Global Maintenance.
- The engine volume is up, with the following comment in the Web GUI: "Up, unsynced
entries, needs healing".
- The HostedEngine VM is paused due to a storage I/O error according to the Web GUI, while
the output of the virsh list --all command shows the HostedEngine as running.
I issued the gluster heal command (gluster volume heal engine), but nothing changed.
I have the following questions:
1. Should I restart the glusterd service? If so, on which host? Is it enough to restart
glusterd on one host, or should it be restarted on the other two as well?
2. Should the node that was NonResponsive and came back be rebooted? It seems to be
healthy now.
3. Should the HostedEngine be restored with engine-backup or is it not necessary?
4. Could the loss of the DNS server for the oVirt hosts lead to an unresponsive host?
The nsswitch.conf file on the oVirt hosts and on the engine defines host lookup as:
hosts: files dns myhostname
5. How can we recover/rectify the situation above?
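Regarding question 4, this is how I read the lookup order on that line. It is only a minimal sketch, assuming the usual glibc behaviour that the sources on the hosts line are tried from left to right; the function name hosts_lookup_order is my own and not part of any oVirt or glibc tool.

```python
import re


def hosts_lookup_order(nsswitch_text):
    """Return the sources listed for the 'hosts' database, in the order written.

    Hypothetical helper for illustration only. It assumes sources are tried
    left to right and ignores bracketed action items such as [NOTFOUND=return].
    """
    for raw_line in nsswitch_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line.startswith("hosts:"):
            continue
        sources = line[len("hosts:"):]
        # Remove action items in square brackets; only source names remain.
        sources = re.sub(r"\[[^\]]*\]", " ", sources)
        return sources.split()
    return []


# The line from our hosts and engine, copied from above.
SAMPLE = "hosts: files dns myhostname\n"
order = hosts_lookup_order(SAMPLE)
print(order)  # ['files', 'dns', 'myhostname']
```

If that reading is right, names present in /etc/hosts (the files source) would still resolve without the DNS server, and only names missing from that file would depend on dns.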
Thanks for your help,
Maria Souvalioti