[Users] SPM not selected after host failed
xrx
xrx-ml at xrx.me
Sat Sep 22 15:09:00 UTC 2012
On 09/20/12 18:13, Itamar Heim wrote:
> On 09/20/2012 05:09 PM, Patrick Hurrelmann wrote:
>> On 20.09.2012 16:01, Itamar Heim wrote:
>>>> Power management is configured for both nodes. But this might be the
>>>> problem: we use the integrated IPMI over LAN power management - and
>>>> if I pull the plug on the machine the power management becomes
>>>> unavailable, too.
>>>>
>>>> Could this be the problem?
>>>
>>> yes... no auto recovery if can't verify node was fenced.
>>> for your tests, maybe power off the machine for your tests as
>>> opposed to
>>> "no power"?
>>
>> Ugh, this is ugly. I'm evaluating oVirt currently myself and have
>> already suffered from a dead PSU that took down IPMI as well. I really
>> don't want to imagine what happens if the host with SPM goes down due to
>> a power failure :/ Is there really no other way? I guess multiple fence
>> devices are not possible right now. E.g. first try to fence via IPMI and
>> if that fails pull the plug via APC MasterSwitch. Any thoughts?
>
> SPM would be down until you manually confirm shutdown in this case.
> SPM doesn't affect running VMs on NFS/posix/local domains, and only
> thinly provisioned VMs on block storage (iscsi/FC).
>
> question, if no power, would the APC still work?
> why not just use it to fence instead of IPMI?
>
> (and helping us close the gap on support for multiple fence devices
> would be great)

I have brought this issue up before (2012-03-03, "oVirt/RHEV fencing; a
single point of failure"). Pulling the power is a very common test, and
it is embarrassing that RHEV fails it. Not only does the SPM not move
over (which makes thin provisioning too dangerous to use), but highly
available VMs are not restarted automatically either, which makes high
availability a joke compared to other products.

One solution, in addition to multiple fencing devices, would be to have
an option to fence using SCSI persistent reservations. This wouldn't
help Marc, as he's using NFS storage, but it would help in other cases
where a SAN is used instead of a NAS.

Also, would APC fencing work if there are redundant power supplies? I
know RHCS/RHEL HA supports this, but does oVirt?

There was talk of fencing being superseded by sanlock. Is that
happening? If fencing is going to stick around for some time, I think
multiple fencing methods and SCSI fencing should be given priority.

Marc: You don't need Java. Someone else should be able to give you a
better method, but a hack would be to replace an existing agent on the
oVirt node directly, e.g. /sbin/fence_apc, with your own Python script
(perhaps get inspiration from fence_ipdu). Maybe even try scripting
hard-coded, unrelated backup fencing methods. Test it on the command
line, and if it works, type "persist /sbin/fence_apc", then select and
test APC power management from the GUI. Repeat for all hypervisors, and
again whenever they are upgraded or reinstalled. I imagine this is
neither supported nor recommended, but having only IPMI is arguably
worse.
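
To make the hack a bit more concrete, here is a rough, untested-on-a-node
sketch of what such a replacement script could be built from. It assumes
the agent receives its options as "key=value" lines on stdin (the usual
fence agent convention, but check how your node actually invokes it), and
that each backup method is just another agent that exits 0 on success.
The function names are mine, and it is written for a current Python 3,
so it would need adapting to whatever Python the node ships.

```python
import subprocess


def parse_agent_options(text):
    """Parse 'key=value' lines (assumed stdin format) into a dict."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


def run_agent(path, options, timeout=60):
    """Run one fence agent with options on stdin; True if it exits 0."""
    payload = "".join(f"{k}={v}\n" for k, v in options.items())
    try:
        result = subprocess.run(
            [path], input=payload, text=True,
            timeout=timeout, capture_output=True,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Missing agent or a hung device counts as a failed attempt.
        return False
    return result.returncode == 0


def fence_with_fallback(methods):
    """Try each (name, callable) in order; return the first name that
    succeeds, or None if every method fails."""
    for name, attempt in methods:
        try:
            if attempt():
                return name
        except Exception:
            # One broken method must not block the next one.
            continue
    return None
```

A wrapper would read sys.stdin, call parse_agent_options() on it, build
one option dict per device (the backup device's address and credentials
would have to be hard coded, since the engine only passes one set), and
exit 0 only if fence_with_fallback() returns a name.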
-xrx