oVirt may have started as a vSphere 'look-alike', but it graduated to a Nutanix
'clone', at least in terms of marketing.
IMHO that means the 3-node hyperconverged default oVirt setup (2 replicas and 1 arbiter)
deserves special love in terms of documenting failure scenarios.
3-node HCI is supposed to protect you against the long-term effects of any single point of
failure. There is no protection against the loss of dynamic state/session data, but
stateless services should recover or resume: that's what it's all about.
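To make the "any single failure" claim concrete, here is a toy sketch in Python. It assumes plain majority quorum over three nodes; the node names and function names are made up for illustration. It is not Gluster's actual quorum logic, and it does not model the arbiter brick holding only metadata.

```python
from itertools import combinations

# Hypothetical host names, for illustration only.
NODES = ("node1", "node2", "node3")


def has_quorum(alive, total=len(NODES)):
    """Return True if a strict majority of nodes is still reachable."""
    return len(alive) > total / 2


def survivable(failures):
    """Map every combination of `failures` lost nodes to whether quorum holds."""
    result = {}
    for failed in combinations(NODES, failures):
        alive = [n for n in NODES if n not in failed]
        result[failed] = has_quorum(alive)
    return result


if __name__ == "__main__":
    for k in (1, 2):
        for failed, ok in survivable(k).items():
            state = "quorum kept" if ok else "quorum lost"
            print(f"lost {', '.join(failed)}: {state}")
```

Under that simplified model, every single-node loss keeps quorum and every double loss drops it, which is exactly why the single-failure case is the one that needs a documented procedure.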
Sadly, what I find missing in the oVirt and Gluster documentation is an SOP (standard
operating procedure) to follow on a late-night/early-morning on-call wakeup when one of
those three HCI nodes has failed, either dramatically or via a 'brown-out', e.g. where
only the storage part was actually lost.
My impression is that the oVirt and Gluster teams barely talk to each other, and in HCI
that's fatal.
And I sure can't find those recovery procedures, not even in the commercial Red Hat
documentation.
So please, either add them or show me where I missed them.