Hello,
I manage two oVirt clusters that are not associated in any way; each has
its own management engine running ovirt-engine-3.5.3.1-1. The servers are
Dell 6xx series, power management is configured using the idrac5
settings, and each cluster is a pair of hypervisors.
Both engines are in a datacenter that had an electrical issue; each
cluster is at a different, unrelated location. The problem was caused by
a downed switch: the engines themselves kept running but no longer had
connectivity to their respective clusters. Once the switch was replaced
(about 30 minutes of downtime) and connectivity resumed, both engines
chose to fence one of their two "unresponsive" hypervisors by sending an
iDRAC command to power it down.
For some reason, the downed hypervisor on Cluster1 received an iDRAC
power-up command 8 minutes later. When I logged into the engine, the
guests that had been running on the powered-down host were in the "off"
state, and I simply powered them back on.
The downed hypervisor on Cluster2 stayed off and remained unresponsive
according to the engine, but the VMs that had been running on it were in
an unknown state. I had to power on the host and confirm the "host has
been rebooted" dialog before the cluster would release these guests so
they could be booted again.
My question is: is it normal for the engine to fence one or more hosts
when it loses connectivity to all the hypervisors in a cluster? Does a
cluster need at least 3 hosts to avoid falling into this mode? I'd like
to know what I can troubleshoot, or how I can avoid this kind of issue,
if the engine is temporarily disconnected from the hypervisors and then,
once connectivity returns, kills guests that were running fine.
Thanks in advance,
Marty