
Hello,

I manage 2 oVirt clusters that are not associated in any way. Each has its own management engine running ovirt-engine-3.5.3.1-1. The servers are Dell 6xx series, power management is configured using the idrac5 settings, and each cluster is a pair of hypervisors. Both engines are in a datacenter that had an electrical issue; each cluster is at a different, unrelated location.

The problem was caused by a downed switch: the engines kept running but lost connectivity to their respective clusters. Once the switch was replaced (about 30 minutes of downtime) and connectivity resumed, both engines chose to fence one of the two "unresponsive" hypervisors by sending an iDRAC command to power it down.

For some reason, the downed hypervisor in Cluster1 received an iDRAC power-up command 8 minutes later. When I logged into the engine, the guests that had been running on the powered-down host were in the "off" state. I simply powered them back on.

The downed hypervisor in Cluster2 stayed off and was unresponsive according to the engine, but the VMs that had been running on it were in an unknown state. I had to power on the host and confirm the "host has been rebooted" dialog before the cluster would free these guests to be booted again.

My questions: is it normal for the engine to fence one or more hosts when it loses connectivity to all the hypervisors in the cluster? Is a minimum of 3 hosts per cluster required to keep it from falling into this mode? I'd like to know what I can troubleshoot, or how I can avoid a repeat, should the engine be temporarily disconnected from the hypervisors and then, on regaining connectivity, kill guests that were running fine.

Thanks in advance,
Marty