
On 09/20/12 18:13, Itamar Heim wrote:
On 09/20/2012 05:09 PM, Patrick Hurrelmann wrote:
On 20.09.2012 16:01, Itamar Heim wrote:
Power management is configured for both nodes. But this might be the problem: we use the integrated IPMI over LAN power management, and if I pull the plug on the machine, the power management becomes unavailable too.
Could this be the problem?
Yes: there is no auto recovery if it can't be verified that the node was fenced. For your tests, maybe power off the machine as opposed to "no power"?
Ugh, this is ugly. I'm evaluating oVirt currently myself and have already suffered from a dead PSU that took down IPMI as well. I really don't want to imagine what happens if the host with SPM goes down due to a power failure :/ Is there really no other way? I guess multiple fence devices are not possible right now. E.g. first try to fence via IPMI and if that fails pull the plug via APC MasterSwitch. Any thoughts?
SPM would be down until you manually confirm shutdown in this case. A missing SPM doesn't affect running VMs on NFS/posix/local domains; on block storage (iSCSI/FC) it affects only thinly provisioned VMs.
question, if no power, would the APC still work? why not just use it to fence instead of IPMI?
(and helping us close the gap on support for multiple fence devices would be great)
I have brought this issue up before (2012-03-03, "oVirt/RHEV fencing; a single point of failure"). This power yanking thing is a very common test, and it is embarrassing that RHEV fails. Not only would the SPM not move over (making thin provisioning too dangerous to have), but highly available VMs would not auto restart either, making high availability a joke compared to other products.

One solution, in addition to multiple fencing devices, would be to have an option to fence using SCSI persistent reservations. This wouldn't help Marc, as he's using NFS storage, but it would help in other cases when SANs are used instead of a NAS.

Also, would APC fencing work if there are redundant power supplies? I know RHCS/RHEL HA supports this, but does oVirt?

There were talks of fencing being superseded by sanlock; is that happening? If fencing is going to stick around for some time, I think multiple fencing methods and SCSI fencing should be given priority.

Marc: You don't need Java. Someone else should be able to give you a better method, but a hack would be to replace an existing agent on the oVirt node directly, e.g. /sbin/fence_apc, with your own Python script (perhaps get inspiration from fence_ipdu). Maybe even try scripting hard-coded, unrelated backup fencing methods. Test it on the command line, and if it works, type "persist /sbin/fence_apc" and select & test APC power management from the GUI. Repeat for all hypervisors, and again whenever they are upgraded or reinstalled.

This is obviously neither supported nor recommended, I imagine, but having only IPMI may arguably be worse.

-xrx
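To make the "hard-coded backup fencing methods" hack a bit more concrete, here is a minimal Python sketch of the fallback logic only: try each fencing command in order and stop at the first one that exits 0. The command paths in the usage note are hypothetical placeholders, not real agent invocations, and this sketch does not cover how oVirt actually passes options to the agent it calls, so treat it as a starting point rather than a working replacement.

```python
import subprocess
from typing import Callable, Optional, Sequence, Tuple

# A runner takes an argv list and returns an exit code (0 means success).
Runner = Callable[[Sequence[str]], int]


def run_command(argv: Sequence[str], timeout: int = 30) -> int:
    """Run one fencing command; report failure if it is missing or hangs."""
    try:
        return subprocess.run(list(argv), timeout=timeout).returncode
    except (OSError, subprocess.TimeoutExpired):
        # Missing binary, permission problem, or unreachable device.
        return 1


def fence_with_fallback(
    methods: Sequence[Tuple[str, Sequence[str]]],
    runner: Runner = run_command,
) -> Optional[str]:
    """Try each (name, argv) in order; return the name that succeeded."""
    for name, argv in methods:
        if runner(argv) == 0:
            return name
    # No method could confirm the fence; the caller must treat this as failure.
    return None
```

Usage would look something like the following, with the placeholder commands swapped for whatever you have tested by hand on the command line:

```python
methods = [
    ("ipmi", ["/path/to/primary-fence-placeholder"]),  # hypothetical
    ("apc", ["/path/to/backup-fence-placeholder"]),    # hypothetical
]
used = fence_with_fallback(methods)
raise SystemExit(0 if used else 1)
```

Returning a nonzero exit status when every method fails matters: reporting success without a confirmed fence is exactly the situation fencing is meant to prevent.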