[Users] SPM not selected after host failed

xrx xrx-ml at xrx.me
Sat Sep 22 15:09:00 UTC 2012


On 09/20/12 18:13, Itamar Heim wrote:
> On 09/20/2012 05:09 PM, Patrick Hurrelmann wrote:
>> On 20.09.2012 16:01, Itamar Heim wrote:
>>>> Power management is configured for both nodes. But this might be the
>>>> problem: we use the integrated IPMI over LAN power management - and
>>>> if I pull the plug on the machine the power management becomes
>>>> unavailable, too.
>>>>
>>>> Could this be the problem?
>>>
>>> yes... no auto recovery if can't verify node was fenced.
>>> for your tests, maybe power off the machine for your tests as opposed to
>>> "no power"?
>>
>> Ugh, this is ugly. I'm evaluating oVirt currently myself and have
>> already suffered from a dead PSU that took down IPMI as well. I really
>> don't want to imagine what happens if the host with SPM goes down due to
>> a power failure :/ Is there really no other way? I guess multiple fence
>> devices are not possible right now. E.g. first try to fence via IPMI and
>> if that fails pull the plug via APC MasterSwitch. Any thoughts?
>
> SPM would be down until you manually confirm shutdown in this case.
> SPM doesn't affect running VMs on NFS/posix/local domains, and only 
> thinly provisioned VMs on block storage (iscsi/FC).
>
> question, if no power, would the APC still work?
> why not just use it to fence instead of IPMI?
>
> (and helping us close the gap on support for multiple fence devices 
> would be great)

I have brought this issue up before (2012-03-03, "oVirt/RHEV fencing; a
single point of failure"). Yanking the power cord is a very common test,
and it is embarrassing that RHEV fails it. Not only would the SPM not
move over (making thin provisioning too dangerous to use), but highly
available VMs would not restart automatically either, making high
availability a joke compared to other products.
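The failover rule Itamar describes can be modelled in a few lines. This is a minimal, hypothetical sketch (the names FenceResult and recovery_action are mine, not oVirt engine code), assuming the engine only moves the SPM role and restarts highly available VMs once the old host is proven down, either by a fence agent or by an administrator's manual confirmation:

```python
from enum import Enum


class FenceResult(Enum):
    CONFIRMED_OFF = "confirmed_off"  # agent reported the host is powered off
    FAILED = "failed"                # agent reachable, but the operation failed
    UNREACHABLE = "unreachable"      # fence device itself is down (e.g. IPMI lost power)


def recovery_action(result: FenceResult, manually_confirmed: bool = False) -> str:
    """Hypothetical model: fail over (move SPM, restart HA VMs) only once the
    old host is known to be down."""
    if result is FenceResult.CONFIRMED_OFF or manually_confirmed:
        return "failover"
    # Without proof the host is dead it might still be writing to storage,
    # so restarting its VMs elsewhere could corrupt disk images.
    return "wait_for_manual_confirmation"


# Pulling the plug also kills onboard IPMI, so the fence attempt is UNREACHABLE:
print(recovery_action(FenceResult.UNREACHABLE))                           # wait_for_manual_confirmation
print(recovery_action(FenceResult.UNREACHABLE, manually_confirmed=True))  # failover
```

The point of the model is that an unreachable fence device and a failed fence look the same to the engine: neither proves the host is off, so neither allows automatic recovery.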

One solution, in addition to multiple fencing devices, would be an
option to fence using SCSI persistent reservations. This wouldn't help
Marc, as he's using NFS storage, but it would help in other cases where
SANs are used instead of a NAS.
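To show why SCSI fencing does not depend on the failed host having power, here is a toy, in-memory model of a SCSI-3 persistent reservation of type "Write Exclusive, Registrants Only" (the class and method names are hypothetical; real tools such as fence_scsi issue the corresponding SCSI commands to the array). Each node registers a key; fencing a node means a surviving node preempts (removes) the victim's key, after which the array itself refuses the victim's writes. Because the operation goes to the storage array, it still succeeds when the failed host is completely dark:

```python
from typing import Set


class ToyScsiLun:
    """Toy model of a LUN under a "Write Exclusive, Registrants Only"
    persistent reservation: registered initiators may write, others may not."""

    def __init__(self) -> None:
        self.registered_keys: Set[str] = set()

    def register(self, key: str) -> None:
        self.registered_keys.add(key)

    def preempt(self, own_key: str, victim_key: str) -> None:
        # Only a registered node may remove another node's registration.
        if own_key not in self.registered_keys:
            raise PermissionError("caller is not registered")
        self.registered_keys.discard(victim_key)

    def can_write(self, key: str) -> bool:
        # Enforcement happens on the array, not on the (possibly dead) host.
        return key in self.registered_keys


lun = ToyScsiLun()
lun.register("node1")
lun.register("node2")
lun.preempt("node2", "node1")  # node2 fences node1 without touching node1
```

After the preempt, node1 can no longer write even if it later comes back with stale state, which is exactly the guarantee the engine needs before failing over.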

Also, would APC fencing work if there are redundant power supplies? I
know RHCS/RHEL HA supports this, but does oVirt?
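With redundant power supplies, the order of operations matters: if each outlet is power-cycled on its own, the host keeps running on its other supply. A minimal sketch of the safe ordering, assuming two hypothetical callables, set_outlet(outlet, on) to switch one PDU outlet and outlet_is_on(outlet) to read its state back (neither is a real fence agent API):

```python
from typing import Callable, Sequence


def reboot_dual_psu_host(
    outlets: Sequence[str],
    set_outlet: Callable[[str, bool], bool],   # hypothetical: switch one outlet
    outlet_is_on: Callable[[str], bool],       # hypothetical: read outlet state
) -> bool:
    # Switch off every outlet feeding the host before turning any back on;
    # cycling them one at a time would never actually remove power.
    for outlet in outlets:
        if not set_outlet(outlet, False):
            return False
    # Only report a successful fence if every outlet is verifiably off.
    if any(outlet_is_on(o) for o in outlets):
        return False
    # Powering back on is best effort: the fence already succeeded above.
    for outlet in outlets:
        set_outlet(outlet, True)
    return True
```

The verification step is what matters for oVirt: a fence is only useful to the engine if it can prove that every supply lost power.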

There has been talk of fencing being superseded by sanlock; is that
happening? If fencing is going to stick around for some time, I think
multiple fencing methods and SCSI fencing should be given priority.
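The reason leases could replace power fencing is that exclusion comes from a timeout rather than from an external device. Below is a toy lease with explicit timestamps (an illustration of the idea only, not sanlock's on-disk algorithm). As I understand sanlock's design, a watchdog resets a host that can no longer renew its leases, so a second host that takes over after the timeout cannot overlap with a still-running first host; the toy does not model the watchdog:

```python
from typing import Optional


class ToyLease:
    """Toy time-limited lease; `now` is passed in to keep it deterministic."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.owner: Optional[str] = None
        self.renewed_at = 0.0

    def acquire(self, host: str, now: float) -> bool:
        # The owner may renew at any time; anyone else must wait until the
        # owner has failed to renew for longer than the timeout.
        expired = now - self.renewed_at > self.timeout
        if self.owner is None or self.owner == host or expired:
            self.owner = host
            self.renewed_at = now
            return True
        return False


lease = ToyLease(timeout=10)
lease.acquire("host-a", now=0)   # host-a holds the lease, e.g. as SPM
lease.acquire("host-b", now=5)   # refused: host-a's lease is still valid
lease.acquire("host-b", now=11)  # granted: host-a stopped renewing
```

No IPMI or PDU is involved, which is why a dead PSU taking out the management controller would not block recovery under such a scheme.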

Marc: You don't need Java. Someone else may be able to give you a
better method, but a hack would be to replace an existing agent on the
oVirt node directly, e.g. /sbin/fence_apc, with your own Python script
(perhaps taking inspiration from fence_ipdu). You could even try
scripting hard-coded, unrelated backup fencing methods. Test it on the
command line, and if it works, type "persist /sbin/fence_apc" and select
and test APC power management from the GUI. Repeat this for all
hypervisors, and again whenever they are upgraded or reinstalled. This
is obviously not supported, nor, I imagine, recommended, but having only
IPMI is arguably worse.
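A rough sketch of such a replacement script, assuming the engine passes options to the agent as "name=value" lines on stdin (as the fence-agents tools accept) and that exit status 0 means success. Everything specific here is hypothetical: the agent paths, the renamed original agent, and the override addresses and outlet are placeholders you would replace with your own:

```python
#!/usr/bin/env python3
"""Hypothetical fence agent wrapper: try several fencing methods in order."""
import subprocess
import sys
from typing import Callable, Dict, List, Optional, Tuple

Options = Dict[str, str]
Method = Callable[[Options], bool]


def parse_stdin_options(text: str) -> Options:
    # "name=value" per line; blank lines and '#' comments are ignored.
    opts: Options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        opts[name.strip()] = value.strip()
    return opts


def run_agent(path: str, overrides: Optional[Options] = None) -> Method:
    # Wrap a real agent binary; overrides replace the engine-supplied values
    # (e.g. a different ipaddr for the backup device). Exit status 0 counts
    # as success; the "status" action uses other exit codes, not handled here.
    def method(opts: Options) -> bool:
        merged = {**opts, **(overrides or {})}
        stdin = "".join(f"{k}={v}\n" for k, v in merged.items())
        try:
            proc = subprocess.run([path], input=stdin, text=True,
                                  capture_output=True, timeout=60)
        except (OSError, subprocess.TimeoutExpired):
            return False
        return proc.returncode == 0
    return method


def fence_with_fallback(opts: Options, methods: List[Tuple[str, Method]]) -> int:
    for name, method in methods:
        if method(opts):
            print(f"fenced via {name}", file=sys.stderr)
            return 0
    return 1  # every method failed: the host must not be treated as fenced


def main() -> int:
    # Placeholder wiring: paths, addresses and outlet number are hypothetical.
    options = parse_stdin_options(sys.stdin.read())
    return fence_with_fallback(options, [
        ("ipmi", run_agent("/usr/sbin/fence_ipmilan", {"ipaddr": "192.0.2.20"})),
        ("apc", run_agent("/sbin/fence_apc.orig", {"ipaddr": "192.0.2.10", "port": "3"})),
    ])
```

When installed in place of the agent, the file would end with the usual `if __name__ == "__main__": sys.exit(main())` guard. Returning non-zero when every method fails is deliberate: reporting a fence that did not happen is worse than not fencing at all.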


-xrx
