--------------040009030705040405030507
Content-Type: text/plain; charset="ISO-8859-1"; format=flowed
Content-Transfer-Encoding: 7bit
I am having trouble getting fencing to work on my HP DL180 G6 servers. They
have ilo100 controllers. The documentation mentions IPMI compliance, but
there are problems.
The ipmilan driver gets a response, but it is the wrong response. A status
request appears to result in the NMI line being asserted. An NMI is not
strictly a reset, but here the effect ends up much like pressing a reset
button (which these servers don't have): the host goes non-responsive and
later panics and reboots.
Here are some log excerpts:
16:33
just after re-running re-install from engine, which ended:
*From oVirt GUI "Events" tab:*
Host s1 installed
State was set to up for host s1.
Host s3 from cluster Default was *chosen* as a proxy to execute Status
command on Host s1
Host s1 power management was verified successfully
16:34
*on ssh screen:*
Message from syslogd@s1 at Jan 21 16:34:14 ...
kernel:Uhhuh. NMI received for unknown reason 31 on CPU 0.
Message from syslogd@s1 at Jan 21 16:34:14 ...
kernel:Do you have a strange power saving mode enabled?
Message from syslogd@s1 at Jan 21 16:34:14 ...
kernel:Dazed and confused, but trying to continue
*from IPMI web interface event log:*
Generic  01/21/2014  21:34:15  Gen ID 0x21     Bus Uncorrectable Error  Assertion
Generic  01/21/2014  21:34:15  IOH_NMI_DETECT  State Asserted           Assertion

*From oVirt GUI "Events" tab:*
Host s1 is non responsive
Host s3 from cluster Default was chosen as a proxy to execute Restart command
on Host s1
Host s3 from cluster Default was chosen as a proxy to execute Stop command on
Host s1
Host s3 from cluster Default was chosen as a proxy to execute Status command
on Host s1
Host s1 was stopped by engine
Manual fence for host s1 was started
Host s3 from cluster Default was chosen as a proxy to execute Status command
on Host s1
Host s3 from cluster Default was chosen as a proxy to execute Start command
on Host s1
Host s3 from cluster Default was chosen as a proxy to execute Status command
on Host s1
Host s1 was started by engine
Host s1 is rebooting
State was set to up for host s1.
Host s3 from cluster Default was chosen as a proxy to execute Status command
on Host s1
16:41
saw kernel panic output on remote KVM terminal
computer rebooted itself
I have searched for ilo100 but found nothing related to oVirt, so I am
clueless as to which driver is the "correct" one for this hardware.
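
One way to narrow down whether the controller or the fence agent is at fault
would be to issue the same status query by hand from the proxy host, outside
oVirt. The following is only a minimal sketch, assuming ipmitool is installed
on that host. The function names, the address and the user name are
placeholders of mine, not anything taken from oVirt or from the logs above.
It passes the password through the IPMI_PASSWORD environment variable (the
ipmitool -E option) so it does not show up in the process list.

```python
import os
import subprocess


def build_status_command(host, user, interface="lan"):
    """Return the ipmitool argument list for a chassis power status query.

    interface is "lan" (IPMI v1.5) or "lanplus" (IPMI v2.0 / RMCP+).
    """
    if interface not in ("lan", "lanplus"):
        raise ValueError("interface must be 'lan' or 'lanplus'")
    return [
        "ipmitool",
        "-I", interface,
        "-H", host,
        "-U", user,
        "-E",  # take the password from the IPMI_PASSWORD environment variable
        "chassis", "power", "status",
    ]


def query_status(host, user, password, interface="lan", timeout=30):
    """Run the query and return (exit code, stdout, stderr)."""
    env = dict(os.environ, IPMI_PASSWORD=password)
    result = subprocess.run(
        build_status_command(host, user, interface),
        env=env,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout.strip(), result.stderr.strip()


if __name__ == "__main__":
    # Placeholder address and credentials; replace with the real BMC values.
    for iface in ("lan", "lanplus"):
        print(iface, query_status("192.0.2.10", "admin", "secret", iface))
```

Running it once with "lan" and once with "lanplus", while watching the IPMI
event log, should show whether a plain status query alone is enough to
produce the NMI entries.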
So far I have seen this mostly on server1 (s1), but that is also the one I
have cycled up and down most often.
I have also seen cases where the power commands are apparently issued too
quickly for these servers, which boot fairly slowly. For example, I found
one server powered down while its boot had only reached the RAID controller
screen, so the boot that was already in progress never had time to
complete.
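
To illustrate the pacing I have in mind, here is a rough sketch of a power
cycle that polls the reported power state between commands and then allows a
boot grace period before anything else is sent. It is not how the oVirt
engine works internally; wait_for_power_state, power_cycle and the
boot_grace value are hypothetical names and numbers chosen for the example,
and the power_off, power_on and get_status callables stand in for whatever
actually talks to the controller.

```python
import time


def wait_for_power_state(get_status, wanted, timeout=300.0, interval=5.0,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns `wanted` or `timeout` seconds pass.

    Returns True if the wanted state was seen, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if get_status() == wanted:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def power_cycle(power_off, power_on, get_status, boot_grace=300.0,
                sleep=time.sleep, **wait_kwargs):
    """Power a host off and on again, waiting for each state change.

    After the host reports "on", sleep for boot_grace seconds so that a
    slow-booting server is not interrupted by a further power command.
    """
    power_off()
    if not wait_for_power_state(get_status, "off", sleep=sleep, **wait_kwargs):
        return False
    power_on()
    if not wait_for_power_state(get_status, "on", sleep=sleep, **wait_kwargs):
        return False
    sleep(boot_grace)  # hypothetical grace period; tune to the hardware
    return True
```

The sleep and clock parameters exist only so the timing can be replaced in a
dry run without waiting in real time.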
Ted Miller
Elkhart, IN, USA
--------------040009030705040405030507--