
Dear all,

after a switch failure, our three-host oVirt hyperconverged setup has strange issues with the Gluster replica 3 volume that contains the hosted-engine VM.

After a host is cleanly rebooted (not after every reboot; it happens quite randomly), the hosted-engine VM starts on that host but is immediately paused. On the other hosts it runs perfectly. After some digging in the documentation, I realized that this is due to a storage issue. However, the Gluster volume is reported as healthy, and forcing a heal does not fix the problem. The only solution (or rather workaround) I have found is to reset the brick on the faulty host and re-format the brick's XFS file system.

This leaves me with some questions:

- Why is the volume health reported as OK when it is clearly not OK?
- If the reported health cannot be trusted, which commands do I need to use to detect Gluster issues?
- Why is this situation happening?

Any suggestion is appreciated.

Regards,
Dario

-- 
Dario Pilori, PhD
I.N.Ri.M. - Istituto Nazionale di Ricerca Metrologica
Sistemi Informatici
Strada delle Cacce, 91 - 10135 - Torino - Italy
Ph: +39 011 3919 459