[Users] fencing: HP ilo100 status does NMI, reboots computer
Ted Miller
tmiller at hcjb.org
Wed Jan 22 15:50:38 UTC 2014
On 1/22/2014 10:31 AM, Ted Miller wrote:
> I am having trouble getting fencing to work on my HP DL180 G6 servers.
> They have iLO100 controllers. The documentation says they are IPMI
> compliant, but in practice there are problems.
>
> The ipmilan driver gets a response, but it is the wrong one: a status
> request results in the NMI line being asserted, which (in standard PC
> architecture) has much the same effect as pressing the reset button
> (which these servers don't have).
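>
> To separate the fence agent from the BMC firmware, it might be worth
> issuing the same status query directly with ipmitool (which, as far as
> I know, is what the ipmilan agent wraps). A minimal Python sketch; the
> BMC address and credentials below are placeholders, not my real values:
>
>     #!/usr/bin/env python3
>     # Query chassis power status directly, bypassing the fence agent,
>     # to see whether a bare IPMI status query is enough to assert NMI.
>     # BMC address and credentials are placeholders.
>     import subprocess
>
>     cmd = [
>         "ipmitool",
>         "-I", "lanplus",    # the interface that lanplus=1 selects
>         "-H", "10.0.0.50",  # placeholder: iLO100 BMC address
>         "-U", "admin",      # placeholder username
>         "-P", "secret",     # placeholder password
>         "chassis", "power", "status",
>     ]
>     result = subprocess.run(cmd, capture_output=True, text=True)
>     print(result.stdout.strip() or result.stderr.strip())
>
> If the bare ipmitool query also asserts NMI, the problem is in the
> iLO100 firmware itself rather than in the oVirt fencing path.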
>
> Here are some log excerpts:
>
> 16:33
> just after re-running Reinstall from the engine, which ended with:
>
> *From oVirt GUI "Events" tab:*
> Host s1 installed
> State was set to up for host s1.
> Host s3 from cluster Default was chosen as a proxy to execute Status
> command on Host s1
> Host s1 power management was verified successfully
> 16:34
> *On SSH screen:*
> Message from syslogd at s1 at Jan 21 16:34:14 ...
> kernel:Uhhuh. NMI received for unknown reason 31 on CPU 0.
>
> Message from syslogd at s1 at Jan 21 16:34:14 ...
> kernel:Do you have a strange power saving mode enabled?
>
> Message from syslogd at s1 at Jan 21 16:34:14 ...
> kernel:Dazed and confused, but trying to continue
>
> *From IPMI web interface event log:*
> Generic 01/21/2014 21:34:15 Gen ID 0x21 Bus Uncorrectable Error Assertion
> Generic 01/21/2014 21:34:15 IOH_NMI_DETECT State Asserted Assertion
>
> *From oVirt GUI "Events" tab:*
> Host s1 is non responsive
> Host s3 from cluster Default was chosen as a proxy to execute Restart
> command on Host s1
> Host s3 from cluster Default was chosen as a proxy to execute Stop command
> on Host s1
> Host s3 from cluster Default was chosen as a proxy to execute Status
> command on Host s1
> Host s1 was stopped by engine
> Manual fence for host s1 was started
> Host s3 from cluster Default was chosen as a proxy to execute Status
> command on Host s1
> Host s3 from cluster Default was chosen as a proxy to execute Start command
> on Host s1
> Host s3 from cluster Default was chosen as a proxy to execute Status
> command on Host s1
> Host s1 was started by engine
> Host s1 is rebooting
> State was set to up for host s1.
> Host s3 from cluster Default was chosen as a proxy to execute Status
> command on Host s1
> 16:41
> Saw kernel panic output on the remote KVM terminal; the computer
> rebooted itself.
>
>
> I have searched for ilo100 but found nothing related to oVirt, so I am
> clueless as to what the "correct" driver for this hardware is.
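>
> As a first step, it might help to list which fence agents are even
> installed on one of the hosts, so I know what is available to try
> against this BMC. A trivial sketch (assuming the conventional
> /usr/sbin install location for the fence-agents package):
>
>     # List installed fence agents; /usr/sbin is the conventional
>     # install location for fence-agents (an assumption here).
>     import glob
>
>     for agent in sorted(glob.glob("/usr/sbin/fence_*")):
>         print(agent)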
>
> So far I have seen this mostly on server1 (s1), but that is also the one I
> have cycled up and down most often.
>
> I have also seen cases where the commands are apparently issued too
> quickly (these servers boot fairly slowly). For example, I found that one
> server was powered down while the boot process was still at the RAID
> controller screen, so it had not had time to complete the boot that was
> already in progress.
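>
> If fence_ipmilan honors the standard fence-agents power_wait option
> (an assumption on my part; I have not verified it on these hosts),
> then adding something like
>
>     lanplus=1,power_wait=60
>
> to the host's power management Options field in the engine should make
> the agent pause after each power operation, which might keep the engine
> from acting again while a slow boot is still in progress.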
>
> Ted Miller
Here's what happened between when I logged out last night and logged in again
this morning, according to the Engine web admin:
2014-Jan-22, 10:36
User admin at internal logged in.
2014-Jan-22, 10:20
Host s1 power management was verified successfully.
2014-Jan-22, 10:20
Host s3 from cluster Default was chosen as a proxy to execute Status command
on Host s1.
2014-Jan-22, 10:20
State was set to Up for host s1.
2014-Jan-22, 10:18
Host s1 is rebooting.
2014-Jan-22, 10:18
Host s1 was started by Engine.
2014-Jan-22, 10:18
Host s3 from cluster Default was chosen as a proxy to execute Status command
on Host s1.
2014-Jan-22, 10:18
Host s3 from cluster Default was chosen as a proxy to execute Start command
on Host s1.
2014-Jan-22, 10:18
Host s3 from cluster Default was chosen as a proxy to execute Status command
on Host s1.
2014-Jan-22, 10:18
Manual fence for host s1 was started.
2014-Jan-22, 10:18
Host s1 was stopped by Engine.
2014-Jan-22, 10:18
Host s3 from cluster Default was chosen as a proxy to execute Status command
on Host s1.
2014-Jan-22, 10:18
Host s3 from cluster Default was chosen as a proxy to execute Stop command
on Host s1.
2014-Jan-22, 10:18
Host s3 from cluster Default was chosen as a proxy to execute Restart
command on Host s1.
2014-Jan-22, 10:18
Host s1 is non responsive.
2014-Jan-21, 17:59
Host s1 power management was verified successfully.
2014-Jan-21, 17:59
Host s3 from cluster Default was chosen as a proxy to execute Status command
on Host s1.
2014-Jan-21, 17:59
State was set to Up for host s1.
2014-Jan-21, 17:57
Host s1 is rebooting.
2014-Jan-21, 17:57
Host s1 was started by Engine.
2014-Jan-21, 17:57
Host s3 from cluster Default was chosen as a proxy to execute Status command
on Host s1.
2014-Jan-21, 17:57
Host s3 from cluster Default was chosen as a proxy to execute Status command
on Host s1.
2014-Jan-21, 17:56
Host s3 from cluster Default was chosen as a proxy to execute Start command
on Host s1.
2014-Jan-21, 17:56
Host s3 from cluster Default was chosen as a proxy to execute Status command
on Host s1.
2014-Jan-21, 17:56
Manual fence for host s1 was started.
2014-Jan-21, 17:56
Host s1 was stopped by Engine.
2014-Jan-21, 17:56
Host s3 from cluster Default was chosen as a proxy to execute Status command
on Host s1.
2014-Jan-21, 17:56
Host s3 from cluster Default was chosen as a proxy to execute Stop command
on Host s1.
2014-Jan-21, 17:56
Host s3 from cluster Default was chosen as a proxy to execute Restart
command on Host s1.
2014-Jan-21, 17:56
Host s1 is non responsive.
2014-Jan-21, 16:56
User admin at internal logged out.
I will supply logs if you can tell me which ones you need and where to find them.
Ted Miller
Elkhart, IN, USA