[Users] fencing: HP ilo100 status does NMI, reboots computer

Ted Miller tmiller at hcjb.org
Wed Jan 22 15:31:22 UTC 2014


I am having trouble getting fencing to work on my HP DL180 G6 servers.  They 
have ilo100 controllers.  The documentation mentions IPMI compliance, but 
fencing does not behave correctly.

The ipmilan driver gets a response, but it is the wrong response.  A status 
request results in the NMI line being asserted.  The kernel logs an NMI it 
cannot explain, and a few minutes later the machine panics and reboots, much 
as if someone had pressed a reset button (which these servers don't have).
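
One way to narrow this down would be to send the same kind of status query by 
hand, outside of oVirt, and watch whether the NMI still shows up.  Below is a 
minimal sketch that assumes ipmitool is installed on the proxy host.  The BMC 
address, user name and password file are hypothetical placeholders, and 
whether the ilo100 wants the "lan" or "lanplus" interface is also an 
assumption that would need to be checked.

```python
import subprocess


def build_status_command(bmc_host, user, password_file, lanplus=False):
    """Build the ipmitool argv for a chassis power status query.

    bmc_host, user and password_file are placeholders; nothing here is
    specific to the ilo100.
    """
    interface = "lanplus" if lanplus else "lan"
    return [
        "ipmitool",
        "-I", interface,        # IPMI-over-LAN interface (v1.5 or v2.0)
        "-H", bmc_host,         # address of the management controller
        "-U", user,
        "-f", password_file,    # read the password from a file
        "chassis", "power", "status",
    ]


def query_status(bmc_host, user, password_file, lanplus=False, timeout=30):
    """Run the query and return (exit code, output text)."""
    cmd = build_status_command(bmc_host, user, password_file, lanplus)
    result = subprocess.run(cmd, capture_output=True, text=True,
                            timeout=timeout)
    return result.returncode, result.stdout.strip()


# Example call (hypothetical address and credentials):
# code, text = query_status("s1-bmc.example", "admin", "/root/ipmi.pw")
```

If the NMI appears in the kernel log right after a manual query like this, 
the problem is in how the controller handles the request rather than in 
oVirt itself.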

Here are some log excerpts:

16:33
	just after re-running re-install from engine, which ended:
From oVirt GUI "Events" tab:
Host s1 installed
State was set to up for host s1.
Host s3 from cluster Default was chosen as a proxy to execute Status 
command on Host s1
Host s1 power management was verified successfully
16:34
	on ssh screen:
Message from syslogd at s1 at Jan 21 16:34:14 ...
  kernel:Uhhuh. NMI received for unknown reason 31 on CPU 0.

Message from syslogd at s1 at Jan 21 16:34:14 ...
  kernel:Do you have a strange power saving mode enabled?

Message from syslogd at s1 at Jan 21 16:34:14 ...
  kernel:Dazed and confused, but trying to continue

from IPMI web interface event log:
Generic 	01/21/2014 	21:34:15 	Gen ID 0x21 	Bus Uncorrectable Error 	Assertion
Generic 	01/21/2014 	21:34:15 	IOH_NMI_DETECT 	State Asserted 	Assertion

From oVirt GUI "Events" tab:
Host s1 is non responsive
Host s3 from cluster Default was chosen as a proxy to execute Restart command 
on Host s1
Host s3 from cluster Default was chosen as a proxy to execute Stop command on 
Host s1
Host s3 from cluster Default was chosen as a proxy to execute Status command 
on Host s1
Host s1 was stopped by engine
Manual fence for host s1 was started
Host s3 from cluster Default was chosen as a proxy to execute Status command 
on Host s1
Host s3 from cluster Default was chosen as a proxy to execute Start command 
on Host s1
Host s3 from cluster Default was chosen as a proxy to execute Status command 
on Host s1
Host s1 was started by engine
Host s1 is rebooting
State was set to up for host s1.
Host s3 from cluster Default was chosen as a proxy to execute Status command 
on Host s1
16:41
	saw kernel panic output on remote KVM terminal
computer rebooted itself


I have searched for ilo100, but found nothing related to oVirt, so I am 
clueless as to what the "correct" driver for this hardware is.

So far I have seen this mostly on server1 (s1), but that is also the one I 
have cycled up and down most often.

I have also seen cases where the commands are apparently issued too fast 
(these servers are fairly slow booting).  For example, I found that one 
server was powered down when the boot process had only gotten to the stage 
where the RAID controller screen was up, so it had not had time to complete 
the boot that was already in progress.
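
The timing problem above amounts to a missing grace period after power-on.  
As a rough sketch of the idea only (this is not how the oVirt engine is 
implemented, and the five-minute value is an assumption, not something 
measured on these servers), a guard like this would hold off further power 
commands while a boot is still in progress:

```python
from datetime import datetime, timedelta


def within_boot_grace(last_power_on, now, grace=timedelta(minutes=5)):
    """Return True if the host was powered on too recently to fence again.

    last_power_on is None when no power-on has been recorded, in which
    case there is nothing to wait for.
    """
    if last_power_on is None:
        return False
    return now - last_power_on < grace


# Example: power-on issued at 16:35, next command considered at 16:37.
started = datetime(2014, 1, 21, 16, 35)
print(within_boot_grace(started, datetime(2014, 1, 21, 16, 37)))
```

With a check like this, a Stop or Restart arriving while the RAID controller 
screen is still up would be deferred instead of cutting the boot short.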

Ted Miller
Elkhart, IN, USA
