
oVirt may have started as a vSphere 'look-alike', but it has since graduated to a Nutanix 'clone', at least in terms of marketing. IMHO that means the default 3-node hyperconverged oVirt setup (2 replicas and 1 arbiter) deserves special love when it comes to documenting failure scenarios. 3-node HCI is supposed to protect you against the long-term effects of any single point of failure. There is no protection against the loss of dynamic state or session data, but stateless services should recover or resume: that's what it's all about. Sadly, what I find missing in the oVirt and Gluster documentation is an SOP (standard operating procedure) to follow on a late-night/early-morning on-call wakeup when one of those three HCI nodes has failed, either dramatically or via a 'brown-out', e.g. where only the storage part was actually lost. My impression is that the oVirt and Gluster teams are barely talking, but in HCI that's fatal. And I sure can't find those recovery procedures, not even in the commercial RH documents. So please, either add them or show me where I missed them.