[Users] SPM not selected after host failed

Hello all,

we are currently in the process of evaluating oVirt as a basis for our new virtualization environment. As far as our evaluation has progressed, it seems to be the way to go, but when testing the high availability features I ran into a serious problem:

Our testing setup looks like this: two hosts on Dell R210 and R210II machines, plus a separate machine running the management application in JBoss and providing storage space through NFS. Under normal conditions everything works fine: I can migrate machines between the two nodes, I can add a third node, access everything by VNC, monitor the VMs really nicely, and the power management feature of the R210s works just fine.

Then, when simulating the loss of a host by pulling the plug on the machine (yes, that is kind of a crude check), some things seem to go terribly wrong: the system detects the host being unresponsive and assumes it is down. But that host happens to be the SPM, and the other one does not take over this function. This leaves the whole cluster in an unresponsive state and my datacenter is gone. I tracked down the problem in the log files to the point where the engine tries to move the SPM role to another node:

2012-09-20 07:54:40,836 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-60) SPM selection - vds seems as spm node03
2012-09-20 07:54:40,837 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-60) spm vds is non responsive, stopping spm selection.
2012-09-20 07:54:44,344 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand] (QuartzScheduler_Worker-51) XML RPC error in command GetCapabilitiesVDS ( Vds: node03 ), the error was: java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException, NoRouteToHostException: Keine Route zum Zielrechner [German: "no route to host"]
2012-09-20 07:54:47,345 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand] (QuartzScheduler_Worker-47) XML RPC error in command GetCapabilitiesVDS ( Vds: node03 ), the error was: java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException, NoRouteToHostException: Keine Route zum Zielrechner
2012-09-20 07:54:50,869 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-69) hostFromVds::selectedVds - node04, spmStatus Free, storage pool ingenit
2012-09-20 07:54:50,892 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-69) SPM Init: could not find reported vds or not up - pool:ingenit vds_spm_id: 2
2012-09-20 07:54:50,905 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-69) SPM selection - vds seems as spm node03

As far as I understand these logs, the engine detects node03 being unresponsive and starts electing a new SPM, but does not find node04. That is strange, as that host is online, pingable, and worked just fine as part of the cluster.

What I can do to remedy the situation is to use the management interface to set "Confirm Host has been rebooted" and switch the host into maintenance mode after that. Then the responsive node takes over, and the VMs are migrated, too.

Has anyone experienced a similar problem? Is this by design, and is killing off the SPM just a bad coincidence that always requires manual intervention? I would hope not :-)

I tried to google some answers, but aside from a thread in May that did not help I came up empty.

Thanks in advance for all the help...
Kind regards from Germany,
Marc

--
________________________________________________________________________
Dipl.-Inform. Marc-Christian Schröer                schroeer@ingenit.com
Geschäftsführer / CEO
------------------------------------------------------------------------
ingenit GmbH & Co. KG                         Tel. +49 (0)231 58 698-120
Emil-Figge-Strasse 76-80                      Fax. +49 (0)231 58 698-121
D-44227 Dortmund                              www.ingenit.com

Registergericht: Amtsgericht Dortmund, HRA 13 914
Gesellschafter : Thomas Klute, Marc-Christian Schröer
________________________________________________________________________

On 09/20/2012 09:02 AM, "Marc-Christian Schröer | ingenit GmbH & Co. KG" wrote:
Hello all,
we are currently in the process of evaluating oVirt as a basis for our new virtualization environment. As far as our evaluation has progressed, it seems to be the way to go, but when testing the high availability features I ran into a serious problem:
Our testing setup looks like this: two hosts on Dell R210 and R210II machines, plus a separate machine running the management application in JBoss and providing storage space through NFS. Under normal conditions everything works fine: I can migrate machines between the two nodes, I can add a third node, access everything by VNC, monitor the VMs really nicely, and the power management feature of the R210s works just fine.
Then, when simulating the loss of a host by pulling the plug on the machine (yes, that is kind of a crude check), some things seem to go terribly wrong: the system detects the host being unresponsive and assumes it is down. But that host happens to be the SPM, and the other one does not take over this function. This leaves the whole cluster in an unresponsive state and my datacenter is gone. I tracked down the problem in the log files to the point where the engine tries to move the SPM role to another node:
2012-09-20 07:54:40,836 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-60) SPM selection - vds seems as spm node03
2012-09-20 07:54:40,837 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-60) spm vds is non responsive, stopping spm selection.
2012-09-20 07:54:44,344 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand] (QuartzScheduler_Worker-51) XML RPC error in command GetCapabilitiesVDS ( Vds: node03 ), the error was: java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException, NoRouteToHostException: Keine Route zum Zielrechner
2012-09-20 07:54:47,345 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand] (QuartzScheduler_Worker-47) XML RPC error in command GetCapabilitiesVDS ( Vds: node03 ), the error was: java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException, NoRouteToHostException: Keine Route zum Zielrechner
2012-09-20 07:54:50,869 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-69) hostFromVds::selectedVds - node04, spmStatus Free, storage pool ingenit
2012-09-20 07:54:50,892 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-69) SPM Init: could not find reported vds or not up - pool:ingenit vds_spm_id: 2
2012-09-20 07:54:50,905 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-69) SPM selection - vds seems as spm node03
As far as I understand these logs, the engine detects node03 being unresponsive and starts electing a new SPM, but does not find node04. That is strange, as that host is online, pingable, and worked just fine as part of the cluster.
What I can do to remedy the situation is to use the management interface to set "Confirm Host has been rebooted" and switch the host into maintenance mode after that. Then the responsive node takes over, and the VMs are migrated, too.
Has anyone experienced a similar problem? Is this by design, and is killing off the SPM just a bad coincidence that always requires manual intervention? I would hope not :-)
I tried to google some answers, but aside from a thread in May that did not help I came up empty.
Thanks in advance for all the help...
Kind regards from Germany, Marc
Is power management configured on both hosts? Since the non-responsive node happened to be the SPM, it must be fenced. The engine should do this automatically (and this is what you did manually via 'Confirm Host has been rebooted'), but the engine can only do it automatically if power management is configured on both hosts.

On 20.09.2012 15:34, Itamar Heim wrote:

Hello Itamar,

thank you for your answer.
Is power management configured on both hosts? Since the non-responsive node happened to be the SPM, it must be fenced. The engine should do this automatically (and this is what you did manually via 'Confirm Host has been rebooted'), but the engine can only do it automatically if power management is configured on both hosts.
Power management is configured for both nodes. But this might be the problem: we use the integrated IPMI over LAN power management, and if I pull the plug on the machine the power management becomes unavailable, too.

Could this be the problem?

Kind regards,
Marc

--
________________________________________________________________________
Dipl.-Inform. Marc-Christian Schröer                schroeer@ingenit.com
Geschäftsführer / CEO
------------------------------------------------------------------------
ingenit GmbH & Co. KG                         Tel. +49 (0)231 58 698-120
Emil-Figge-Strasse 76-80                      Fax. +49 (0)231 58 698-121
D-44227 Dortmund                              www.ingenit.com

Registergericht: Amtsgericht Dortmund, HRA 13 914
Gesellschafter : Thomas Klute, Marc-Christian Schröer
________________________________________________________________________

On 09/20/2012 04:55 PM, "Marc-Christian Schröer | ingenit GmbH & Co. KG" wrote:
On 20.09.2012 15:34, Itamar Heim wrote:
Hello Itamar,
thank you for your answer.
Is power management configured on both hosts? Since the non-responsive node happened to be the SPM, it must be fenced. The engine should do this automatically (and this is what you did manually via 'Confirm Host has been rebooted'), but the engine can only do it automatically if power management is configured on both hosts.
Power management is configured for both nodes. But this might be the problem: we use the integrated IPMI over LAN power management, and if I pull the plug on the machine the power management becomes unavailable, too.
Could this be the problem?
Yes... there is no auto recovery if the engine can't verify the node was fenced. For your tests, maybe power the machine off, as opposed to cutting the power entirely? You could use APM, but with no power, even APM won't reply to fencing.
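For what it's worth, a quick way to see this failure mode from the engine machine, assuming ipmitool is installed (the BMC address and credentials below are placeholders): with the host on AC power the BMC answers, and with the plug pulled the very same query just times out.

ipmitool -I lanplus -H 192.0.2.103 -U admin -P secret chassis power status

That timeout is exactly what the engine runs into when it tries to fence, so it cannot confirm the host is really down.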
Kind regards, Marc

On 20.09.2012 16:01, Itamar Heim wrote:
Power management is configured for both nodes. But this might be the problem: we use the integrated IPMI over LAN power management, and if I pull the plug on the machine the power management becomes unavailable, too.
Could this be the problem?
Yes... there is no auto recovery if the engine can't verify the node was fenced. For your tests, maybe power the machine off, as opposed to cutting the power entirely?
Ugh, this is ugly. I'm currently evaluating oVirt myself and have already suffered from a dead PSU that took down IPMI as well. I really don't want to imagine what happens if the host holding the SPM goes down due to a power failure :/

Is there really no other way? I guess multiple fence devices are not possible right now, e.g. first try to fence via IPMI and, if that fails, pull the plug via an APC MasterSwitch. Any thoughts?

Regards
Patrick

--
Lobster LOGsuite GmbH, Münchner Straße 15a, D-82319 Starnberg
HRB 178831, Amtsgericht München
Geschäftsführer: Dr. Martin Fischer, Rolf Henrich

On 09/20/2012 05:09 PM, Patrick Hurrelmann wrote:
On 20.09.2012 16:01, Itamar Heim wrote:
Power management is configured for both nodes. But this might be the problem: we use the integrated IPMI over LAN power management, and if I pull the plug on the machine the power management becomes unavailable, too.
Could this be the problem?
Yes... there is no auto recovery if the engine can't verify the node was fenced. For your tests, maybe power the machine off, as opposed to cutting the power entirely?
Ugh, this is ugly. I'm currently evaluating oVirt myself and have already suffered from a dead PSU that took down IPMI as well. I really don't want to imagine what happens if the host holding the SPM goes down due to a power failure :/ Is there really no other way? I guess multiple fence devices are not possible right now, e.g. first try to fence via IPMI and, if that fails, pull the plug via an APC MasterSwitch. Any thoughts?
The SPM would be down until you manually confirm the shutdown in this case. Note that the SPM doesn't affect running VMs on NFS/POSIX/local domains; it only affects thinly provisioned VMs on block storage (iSCSI/FC).

Question: if there is no power, would the APC still work? Why not just use it to fence instead of IPMI?

(And helping us close the gap on support for multiple fence devices would be great.)

On 20.09.2012 16:13, Itamar Heim wrote:
On 09/20/2012 05:09 PM, Patrick Hurrelmann wrote:
On 20.09.2012 16:01, Itamar Heim wrote:
Power management is configured for both nodes. But this might be the problem: we use the integrated IPMI over LAN power management, and if I pull the plug on the machine the power management becomes unavailable, too.
Could this be the problem?
Yes... there is no auto recovery if the engine can't verify the node was fenced. For your tests, maybe power the machine off, as opposed to cutting the power entirely?
Ugh, this is ugly. I'm currently evaluating oVirt myself and have already suffered from a dead PSU that took down IPMI as well. I really don't want to imagine what happens if the host holding the SPM goes down due to a power failure :/ Is there really no other way? I guess multiple fence devices are not possible right now, e.g. first try to fence via IPMI and, if that fails, pull the plug via an APC MasterSwitch. Any thoughts?
The SPM would be down until you manually confirm the shutdown in this case. Note that the SPM doesn't affect running VMs on NFS/POSIX/local domains; it only affects thinly provisioned VMs on block storage (iSCSI/FC).

Question: if there is no power, would the APC still work? Why not just use it to fence instead of IPMI?

(And helping us close the gap on support for multiple fence devices would be great.)
Ok, maybe I wasn't precise enough. By power failure I actually meant a broken PSU in the server, and I won't be running any local/NFS storage, only iSCSI. But you're right that in such a situation fencing via the APC would be sufficient. I was mixing up my different environments: my lab only has IPMI right now, while the live environment will most likely have an APC as well.

Regards
Patrick

--
Lobster LOGsuite GmbH, Münchner Straße 15a, D-82319 Starnberg
HRB 178831, Amtsgericht München
Geschäftsführer: Dr. Martin Fischer, Rolf Henrich

On 09/20/12 18:13, Itamar Heim wrote:
On 09/20/2012 05:09 PM, Patrick Hurrelmann wrote:
On 20.09.2012 16:01, Itamar Heim wrote:
Power management is configured for both nodes. But this might be the problem: we use the integrated IPMI over LAN power management, and if I pull the plug on the machine the power management becomes unavailable, too.
Could this be the problem?
Yes... there is no auto recovery if the engine can't verify the node was fenced. For your tests, maybe power the machine off, as opposed to cutting the power entirely?
Ugh, this is ugly. I'm currently evaluating oVirt myself and have already suffered from a dead PSU that took down IPMI as well. I really don't want to imagine what happens if the host holding the SPM goes down due to a power failure :/ Is there really no other way? I guess multiple fence devices are not possible right now, e.g. first try to fence via IPMI and, if that fails, pull the plug via an APC MasterSwitch. Any thoughts?
The SPM would be down until you manually confirm the shutdown in this case. Note that the SPM doesn't affect running VMs on NFS/POSIX/local domains; it only affects thinly provisioned VMs on block storage (iSCSI/FC).

Question: if there is no power, would the APC still work? Why not just use it to fence instead of IPMI?

(And helping us close the gap on support for multiple fence devices would be great.)
I have brought this issue up before (2012-03-03, "oVirt/RHEV fencing; a single point of failure"). This power-yanking thing is a very common test, and it is embarrassing that RHEV fails it. Not only would the SPM not move over (making thin provisioning too dangerous to have), but highly available VMs would not auto-restart either, making high availability a joke compared to other products.

One solution, in addition to multiple fencing devices, would be an option to fence using SCSI persistent reservations. This wouldn't help Marc, as he's using NFS storage, but it would help in other cases where a SAN is used instead of a NAS. Also, would APC fencing work if there are redundant power supplies? I know RHCS/RHEL HA supports this, but does oVirt? There was talk of fencing being superseded by sanlock; is that happening? If fencing is going to stick around for some time, I think multiple fencing methods and SCSI fencing should be given priority.

Marc: You don't need Java. Someone else should be able to give you a better method, but a hack would be to replace an existing agent on the oVirt node directly, e.g. /sbin/fence_apc, with your own Python script (perhaps get inspiration from fence_ipdu). Maybe even try scripting hard-coded, unrelated backup fencing methods. Test it on the command line, and if it works, type "persist /sbin/fence_apc" and select & test APC power management from the GUI. Repeat for all hypervisors and whenever they are upgraded/reinstalled. This is obviously neither supported nor recommended, I imagine; but having only IPMI may arguably be worse.

-xrx
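To make that hack a bit more concrete, here is a rough, untested sketch of such a wrapper in Python. It assumes the common fence-agent convention of receiving key=value options on stdin, one per line; the agent paths, PDU address, and outlet number are placeholders to adjust per node, and older fence-agents versions pass "option" instead of "action" for the operation key.

#!/usr/bin/python
# Hypothetical wrapper dropped in place of /sbin/fence_apc, per the hack
# described above: try the host's own IPMI first, then fall back to a
# hard-coded PDU outlet. Untested sketch; paths, addresses, and option
# names are placeholders and vary between fence-agents versions.
import subprocess
import sys

def run_agent(path, opts):
    # Real fence agents read key=value options from stdin, one per line.
    payload = "".join("%s=%s\n" % item for item in opts.items())
    proc = subprocess.Popen([path], stdin=subprocess.PIPE)
    proc.communicate(payload)
    return proc.returncode

def main():
    # Collect the options the engine passes on stdin.
    opts = {}
    for line in sys.stdin:
        line = line.strip()
        if line and "=" in line:
            key, value = line.split("=", 1)
            opts[key] = value

    # First attempt: the node's own IPMI interface.
    if run_agent("/usr/sbin/fence_ipmilan", opts) == 0:
        sys.exit(0)

    # Fallback: a hard-coded PDU outlet feeding this host (placeholders;
    # a real PDU needs its own credentials, not the IPMI ones).
    pdu_opts = dict(opts)
    pdu_opts["ipaddr"] = "192.0.2.50"  # PDU address
    pdu_opts["port"] = "3"             # outlet number for this host
    sys.exit(run_agent("/usr/sbin/fence_apc_snmp", pdu_opts))

if __name__ == "__main__":
    main()

Testing from the command line, as suggested above, would then look something like: echo -e "action=reboot\nipaddr=192.0.2.103\nlogin=admin\npasswd=secret" | /sbin/fence_apc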

On 20.09.2012 16:01, Itamar Heim wrote:

Thanks again.
Yes... there is no auto recovery if the engine can't verify the node was fenced. For your tests, maybe power the machine off, as opposed to cutting the power entirely?
So I figured I could use our Eaton/Raritan metered PDUs to allow fencing the designated SPM nodes, but then realized that controlling PDUs via SNMP is not supported by oVirt. Any chance that is going to change? Or can you point me to the Java class where I can add this?

Kind regards,
Marc

PS: Sent the reply to Itamar directly. Sorry about that...

On 09/21/2012 01:01 PM, "Marc-Christian Schröer | ingenit GmbH & Co. KG" wrote:
On 20.09.2012 16:01, Itamar Heim wrote:
Thanks again.
Yes... there is no auto recovery if the engine can't verify the node was fenced. For your tests, maybe power the machine off, as opposed to cutting the power entirely?
So I figured I could use our Eaton/Raritan metered PDUs to allow fencing the designated SPM nodes, but then realized that controlling PDUs via SNMP is not supported by oVirt. Any chance that is going to change? Or can you point me to the Java class where I can add this?
Lon - do we have any fence agent scripts for SNMP-based PDUs?

Yeah, several. Most APC units, IBM iPDU, some Eaton devices, and several others are all SNMP controlled. Marek wrote a library that makes implementing new agents (if needed) very easy.

Typically, the only things that need to change are:

* the OIDs used to retrieve/set power status
* the metadata (which is used to generate man pages, config info, etc.)

-- Lon

----- Original Message -----
On 09/21/2012 01:01 PM, "Marc-Christian Schröer | ingenit GmbH & Co. KG" wrote:
On 20.09.2012 16:01, Itamar Heim wrote:
Thanks again.
Yes... there is no auto recovery if the engine can't verify the node was fenced. For your tests, maybe power the machine off, as opposed to cutting the power entirely?
So I figured I could use our Eaton/Raritan metered PDUs to allow fencing the designated SPM nodes, but then realized that controlling PDUs via SNMP is not supported by oVirt. Any chance that is going to change? Or can you point me to the Java class where I can add this?
Lon - do we have any fence agent scripts for SNMP-based PDUs?
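As an illustration of how small such an agent really is, here is a rough standalone sketch that drives a PDU outlet with the net-snmp command-line tools. It is deliberately not based on Marek's actual library (whose API is not reproduced here); the control OID and state values are placeholders that must come from your PDU's MIB, and a real agent would also need the status/monitor actions and the metadata Lon mentions.

#!/usr/bin/python
# Illustrative standalone SNMP fence agent sketch -- a real agent should
# be built on the fence-agents library mentioned above. The OID and the
# state values below are placeholders; take the outlet-control entries
# from your PDU's MIB instead.
import subprocess
import sys

OUTLET_OID = ".1.3.6.1.4.1.99999.2.%s"  # placeholder outlet-control OID
STATE_ON, STATE_OFF = "1", "2"          # placeholder outlet state values

def set_outlet(host, community, outlet, state):
    # Equivalent to: snmpset -v1 -c <community> <host> <oid> i <state>
    return subprocess.call(["snmpset", "-v1", "-c", community, host,
                            OUTLET_OID % outlet, "i", state])

def main():
    # Fence agents receive key=value options on stdin, one per line.
    opts = dict(line.strip().split("=", 1)
                for line in sys.stdin if "=" in line)
    host = opts["ipaddr"]
    community = opts.get("community", "private")
    outlet = opts["port"]
    # Older fence-agents versions pass "option" rather than "action".
    action = opts.get("action", opts.get("option", "reboot"))

    if action == "on":
        sys.exit(set_outlet(host, community, outlet, STATE_ON))
    if action == "off":
        sys.exit(set_outlet(host, community, outlet, STATE_OFF))
    if action == "reboot":
        set_outlet(host, community, outlet, STATE_OFF)
        sys.exit(set_outlet(host, community, outlet, STATE_ON))
    sys.exit(1)  # unsupported action

if __name__ == "__main__":
    main()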

On 09/24/2012 04:32 PM, Lon Hohberger wrote:
Yeah, several.
Most APC units, IBM iPDU, some Eaton devices, and several others are all SNMP controlled. Marek wrote a library that makes implementing new agents (if needed) very easy.
Typically, the only things that need to change are:

* the OIDs used to retrieve/set power status
* the metadata (which is used to generate man pages, config info, etc.)
Marc-Christian - did you check whether any of the existing fence agent scripts already provide the fencing you need for your PDUs? If they do, it would be mostly a configuration change to add them to the list of fencing scripts that appear in the oVirt UI (same if you add your own based on Marek's library).
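If I remember the mechanism correctly (worth verifying with engine-config --list on your build, so treat the key names as assumptions), the list of fence types offered in the UI is driven by engine configuration keys such as VdsFenceType, with VdsFenceOptionMapping covering per-agent option names:

# show the fence agents the engine currently offers
engine-config -g VdsFenceType
# append your agent to the existing list (agent name here is illustrative),
# then restart the engine service
engine-config -s VdsFenceType="<existing list>,eaton_snmp"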
-- Lon
----- Original Message -----
On 09/21/2012 01:01 PM, "Marc-Christian Schröer | ingenit GmbH & Co. KG" wrote:
On 20.09.2012 16:01, Itamar Heim wrote:
Thanks again.
Yes... there is no auto recovery if the engine can't verify the node was fenced. For your tests, maybe power the machine off, as opposed to cutting the power entirely?
So I figured I could use our Eaton/Raritan metered PDUs to allow fencing the designated SPM nodes, but then realized that controlling PDUs via SNMP is not supported by oVirt. Any chance that is going to change? Or can you point me to the Java class where I can add this?
Lon - do we have any fence agent scripts for SNMP-based PDUs?
