On March 16, 2020 4:47:01 PM GMT+02:00, Dario Pilori <d.pilori(a)inrim.it> wrote:
Dear all,
after a switch failure, our three-host oVirt hyperconverged setup has
strange issues with the Gluster replica 3 volume that contains the
hosted-engine VM.
Basically, after a host is cleanly rebooted (not after every reboot;
it happens quite randomly), the hosted engine starts on that host but
is immediately paused. On the other hosts, it runs perfectly.
After some digging in the documentation, I realized that this is caused
by a storage issue. However, the Gluster volume reports as healthy,
and forcing a heal does not fix the problem.
The only solution (or rather, workaround) is to reset the brick on the
faulty host and reformat the brick's XFS file system.
This leaves me with some questions: Why does the volume report as
healthy when it clearly is not? Which commands should I use to detect
Gluster issues? And why is this happening?
Any suggestions are appreciated.
Regards,
Dario
You will need to give some information about the environment:
- Gluster version
- Gluster op-version
- File system of the Gluster bricks
Have you tried writing to the Gluster volume?
Is there anything in the Gluster brick logs (/var/log/glusterfs/bricks/<mountpoint>.log)?
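The information requested above can be gathered on each host with a small script. What follows is only a sketch under stated assumptions: it shells out to the standard `gluster` CLI (`gluster --version`, `gluster volume get all cluster.op-version`, `gluster volume status <vol> detail`, `gluster volume heal <vol> info`). It assumes the default brick log directory `/var/log/glusterfs/bricks`. The probe file name `.gluster-write-probe` is hypothetical, and so are the volume name and mount path you pass in.

```python
import os
import shutil
import subprocess
from pathlib import Path

# Assumption: default brick log directory on a standard GlusterFS install.
BRICK_LOG_DIR = Path("/var/log/glusterfs/bricks")
# Hypothetical name for the throwaway file used by the write test.
PROBE_NAME = ".gluster-write-probe"


def diagnostic_commands(volume: str) -> list[list[str]]:
    """Gluster CLI invocations covering the version, op-version, brick
    file system and heal state of `volume`."""
    return [
        ["gluster", "--version"],
        ["gluster", "volume", "get", "all", "cluster.op-version"],
        ["gluster", "volume", "info", volume],
        # "detail" prints per-brick information, including the file system type.
        ["gluster", "volume", "status", volume, "detail"],
        ["gluster", "volume", "heal", volume, "info"],
        ["gluster", "volume", "heal", volume, "info", "split-brain"],
    ]


def run_diagnostics(volume: str, timeout: float = 60) -> dict[str, str]:
    """Run each command; map the command line to stdout (or stderr on failure)."""
    results: dict[str, str] = {}
    for cmd in diagnostic_commands(volume):
        key = " ".join(cmd)
        if shutil.which(cmd[0]) is None:
            results[key] = "skipped: gluster CLI not found on this host"
            continue
        try:
            proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            results[key] = f"timed out after {timeout}s"
            continue
        results[key] = proc.stdout if proc.returncode == 0 else proc.stderr
    return results


def write_test(mount_point: Path) -> bool:
    """Write 4 KiB of random data to the mounted volume, fsync it, read it
    back and compare; the probe file is removed afterwards."""
    probe = mount_point / PROBE_NAME
    payload = os.urandom(4096)
    try:
        with probe.open("wb") as fh:
            fh.write(payload)
            fh.flush()
            os.fsync(fh.fileno())
        return probe.read_bytes() == payload
    except OSError:
        return False
    finally:
        try:
            probe.unlink(missing_ok=True)
        except OSError:
            pass


def tail_brick_logs(lines: int = 50, log_dir: Path = BRICK_LOG_DIR) -> dict[str, list[str]]:
    """Return the last `lines` lines of every *.log file in `log_dir`."""
    tails: dict[str, list[str]] = {}
    if not log_dir.is_dir():
        return tails
    for log in sorted(log_dir.glob("*.log")):
        try:
            tails[log.name] = log.read_text(errors="replace").splitlines()[-lines:]
        except OSError as exc:
            tails[log.name] = [f"unreadable: {exc}"]
    return tails
```

For example, run `run_diagnostics("engine")` and `write_test(Path("/path/to/fuse/mount"))` as root on the host whose engine VM pauses. Both "engine" and the mount path are placeholders for your hosted-engine volume and its client mount. Running the same script on all three hosts and comparing the output makes differences between bricks easier to spot.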
Best Regards,
Strahil Nikolov