
Hello,

I am worried about the high availability approach taken by RHEV/oVirt. I have read the thread titled "Some thoughts on enhancing High Availability in oVirt", and I can't help but feel that oVirt is missing basic HA while its developers are considering adding (in my opinion unneeded) complexity with service monitoring.

It all comes down to fencing. Picture this: 3 HP hypervisors running RHEV/oVirt with iLO fencing. Say hypervisor A runs 10 VMs, all of which are set to be highly available. Now suppose that hypervisor A has a power failure or an iLO failure (I've seen it happen more than once with a batch of HP DL380 G6s). Because RHEV cannot fence the hypervisor while its iLO is unresponsive, those 10 HA VMs that were halted are NOT moved to other hypervisors automatically.

I suggest that oVirt concentrate on supporting multiple fencing devices as a development priority. SCSI persistent reservation based fencing would be an ideal secondary, if not primary, fencing device. It would be easy for users to set up, as SANs generally support it, and it is proven to work well, as seen in Red Hat clusters.

I brought up this point about fencing being a single point of failure in RHEV with a Red Hat employee (Mark Wagner) during the RHEV virtual event, but he said that it is not. I don't see how it isn't: a single loose iLO cable and the VMs are stuck until there is manual intervention.

Any thoughts?

-xrx
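
P.S. To make the multiple-fencing-device idea concrete, here is a rough Python sketch of the fallback behaviour I have in mind. It is not oVirt or RHEV code. Every name in it (`fence_host`, `ilo_fence`, `scsi_pr_fence`, `hypervisor-a`) is hypothetical, and the two agents are stand-ins that only simulate a dead iLO and a successful SCSI persistent reservation fence.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical type: a fence agent takes a host name and returns True only
# when it has positively confirmed that the host is cut off.
FenceAgent = Tuple[str, Callable[[str], bool]]


def fence_host(host: str, agents: Sequence[FenceAgent]) -> Optional[str]:
    """Try each fencing device in order; return the name of the first that succeeds."""
    for name, fence in agents:
        try:
            if fence(host):
                return name
        except Exception:
            # An unreachable device (e.g. a dead iLO) counts as a failed attempt.
            continue
    # No device confirmed the fence, so restarting the VMs would not be safe.
    return None


def ilo_fence(host: str) -> bool:
    # Stand-in for a power-management call; simulates an unresponsive iLO.
    raise TimeoutError(f"iLO on {host} did not respond")


def scsi_pr_fence(host: str) -> bool:
    # Stand-in for revoking the host's SCSI persistent reservation on shared storage.
    return True


if __name__ == "__main__":
    agents = [("ilo", ilo_fence), ("scsi-pr", scsi_pr_fence)]
    used = fence_host("hypervisor-a", agents)
    if used is None:
        print("fencing failed; HA VMs stay down until an admin intervenes")
    else:
        print(f"fenced via {used}; safe to restart HA VMs elsewhere")
```

With only the iLO agent in the list, `fence_host` returns `None`, which is the stuck situation described above. With a secondary device configured, the HA VMs can be restarted as soon as any one device confirms the fence.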