[Users] fencing: HP ilo100 status does NMI, reboots computer
Ted Miller
tmiller at hcjb.org
Wed Jan 22 15:50:38 UTC 2014
On 1/22/2014 10:31 AM, Ted Miller wrote:
> I am having trouble getting fencing to work on my HP DL180 G6 servers.
> They have iLO100 controllers. The documentation says they are IPMI
> compliant, but in practice there are problems.
>
> The ipmilan driver gets a response, but it is the wrong one: a status
> request results in the NMI line being asserted, which (in standard PC
> architecture) has much the same effect as pressing the reset button
> (which these servers don't have).
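>
> To separate the fence agent from the BMC firmware, it might be worth
> issuing the same status query directly with ipmitool (which, as far as
> I know, is what the ipmilan agent wraps). A minimal Python sketch; the
> BMC address and credentials below are placeholders, not my real values:
>
>     #!/usr/bin/env python3
>     # Query chassis power status directly, bypassing the fence agent,
>     # to see whether a bare IPMI status query is enough to assert NMI.
>     # BMC address and credentials are placeholders.
>     import subprocess
>
>     cmd = [
>         "ipmitool",
>         "-I", "lanplus",    # the interface that lanplus=1 selects
>         "-H", "10.0.0.50",  # placeholder: iLO100 BMC address
>         "-U", "admin",      # placeholder username
>         "-P", "secret",     # placeholder password
>         "chassis", "power", "status",
>     ]
>     result = subprocess.run(cmd, capture_output=True, text=True)
>     print(result.stdout.strip() or result.stderr.strip())
>
> If the bare ipmitool query also asserts NMI, the problem is in the
> iLO100 firmware itself rather than in the oVirt fencing path.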
>
> Here are some log excerpts:
>
> 16:33
> just after re-running Reinstall from the engine, which ended with:
>
> *From oVirt GUI "Events" tab:*
> Host s1 installed
> State was set to up for host s1.
> Host s3 from cluster Default was chosen as a proxy to execute Status
> command on Host s1
> Host s1 power management was verified successfully
> 16:34
> *On SSH screen:*
> Message from syslogd at s1 at Jan 21 16:34:14 ...
> kernel:Uhhuh. NMI received for unknown reason 31 on CPU 0.
>
> Message from syslogd at s1 at Jan 21 16:34:14 ...
> kernel:Do you have a strange power saving mode enabled?
>
> Message from syslogd at s1 at Jan 21 16:34:14 ...
> kernel:Dazed and confused, but trying to continue
>
> *From IPMI web interface event log:*
> Generic 01/21/2014 21:34:15 Gen ID 0x21 Bus Uncorrectable Error Assertion
> Generic 01/21/2014 21:34:15 IOH_NMI_DETECT State Asserted Assertion
>
> *From oVirt GUI "Events" tab:*
> Host s1 is non responsive
> Host s3 from cluster Default was chosen as a proxy to execute Restart
> command on Host s1
> Host s3 from cluster Default was chosen as a proxy to execute Stop command
> on Host s1
> Host s3 from cluster Default was chosen as a proxy to execute Status
> command on Host s1
> Host s1 was stopped by engine
> Manual fence for host s1 was started
> Host s3 from cluster Default was chosen as a proxy to execute Status
> command on Host s1
> Host s3 from cluster Default was chosen as a proxy to execute Start command
> on Host s1
> Host s3 from cluster Default was chosen as a proxy to execute Status
> command on Host s1
> Host s1 was started by engine
> Host s1 is rebooting
> State was set to up for host s1.
> Host s3 from cluster Default was chosen as a proxy to execute Status
> command on Host s1
> 16:41
> Saw kernel panic output on the remote KVM terminal; the computer
> rebooted itself.
>
>
> I have searched for ilo100 but found nothing related to oVirt, so I am
> clueless as to what the "correct" driver for this hardware is.
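>
> As a first step, it might help to list which fence agents are even
> installed on one of the hosts, so I know what is available to try
> against this BMC. A trivial sketch (assuming the conventional
> /usr/sbin install location for the fence-agents package):
>
>     # List installed fence agents; /usr/sbin is the conventional
>     # install location for fence-agents (an assumption here).
>     import glob
>
>     for agent in sorted(glob.glob("/usr/sbin/fence_*")):
>         print(agent)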
>
> So far I have seen this mostly on server1 (s1), but that is also the one I
> have cycled up and down most often.
>
> I have also seen cases where the commands are apparently issued too
> quickly (these servers boot fairly slowly). For example, I found that one
> server was powered down while the boot process was still at the RAID
> controller screen, so it had not had time to complete the boot that was
> already in progress.
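>
> If fence_ipmilan honors the standard fence-agents power_wait option
> (an assumption on my part; I have not verified it on these hosts),
> then adding something like
>
>     lanplus=1,power_wait=60
>
> to the host's power management Options field in the engine should make
> the agent pause after each power operation, which might keep the engine
> from acting again while a slow boot is still in progress.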
>
> Ted Miller
Here's what happened between when I logged out last night and logged in again
this morning, according to the Engine web admin:
2014-Jan-22, 10:36
User admin at internal logged in.
2014-Jan-22, 10:20
Host s1 power management was verified successfully.
2014-Jan-22, 10:20
Host s3 from cluster Default was chosen as a proxy to execute Status command
on Host s1.
2014-Jan-22, 10:20
State was set to Up for host s1.
2014-Jan-22, 10:18
Host s1 is rebooting.
2014-Jan-22, 10:18
Host s1 was started by Engine.
2014-Jan-22, 10:18
Host s3 from cluster Default was chosen as a proxy to execute Status command
on Host s1.
2014-Jan-22, 10:18
Host s3 from cluster Default was chosen as a proxy to execute Start command
on Host s1.
2014-Jan-22, 10:18
Host s3 from cluster Default was chosen as a proxy to execute Status command
on Host s1.
2014-Jan-22, 10:18
Manual fence for host s1 was started.
2014-Jan-22, 10:18
Host s1 was stopped by Engine.
2014-Jan-22, 10:18
Host s3 from cluster Default was chosen as a proxy to execute Status command
on Host s1.
2014-Jan-22, 10:18
Host s3 from cluster Default was chosen as a proxy to execute Stop command
on Host s1.
2014-Jan-22, 10:18
Host s3 from cluster Default was chosen as a proxy to execute Restart
command on Host s1.
2014-Jan-22, 10:18
Host s1 is non responsive.
2014-Jan-21, 17:59
Host s1 power management was verified successfully.
2014-Jan-21, 17:59
Host s3 from cluster Default was chosen as a proxy to execute Status command
on Host s1.
2014-Jan-21, 17:59
State was set to Up for host s1.
2014-Jan-21, 17:57
Host s1 is rebooting.
2014-Jan-21, 17:57
Host s1 was started by Engine.
2014-Jan-21, 17:57
Host s3 from cluster Default was chosen as a proxy to execute Status command
on Host s1.
2014-Jan-21, 17:57
Host s3 from cluster Default was chosen as a proxy to execute Status command
on Host s1.
2014-Jan-21, 17:56
Host s3 from cluster Default was chosen as a proxy to execute Start command
on Host s1.
2014-Jan-21, 17:56
Host s3 from cluster Default was chosen as a proxy to execute Status command
on Host s1.
2014-Jan-21, 17:56
Manual fence for host s1 was started.
2014-Jan-21, 17:56
Host s1 was stopped by Engine.
2014-Jan-21, 17:56
Host s3 from cluster Default was chosen as a proxy to execute Status command
on Host s1.
2014-Jan-21, 17:56
Host s3 from cluster Default was chosen as a proxy to execute Stop command
on Host s1.
2014-Jan-21, 17:56
Host s3 from cluster Default was chosen as a proxy to execute Restart
command on Host s1.
2014-Jan-21, 17:56
Host s1 is non responsive.
2014-Jan-21, 16:56
User admin at internal logged out.
I will supply logs if you can tell me which ones you need and where to find them.
Ted Miller
Elkhart, IN, USA