This is a multi-part message in MIME format.
--------------8B684CC5B15E6B654F94892C
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
I did set up power management, but because the second node is off, it
is not working correctly. I will now install a VM on a different server,
just to use it as a proxy.
But do you think this could be the reason?
On 25.11.2017 at 20:36, Charles Kozler wrote:
Did you set up fencing?
I've also seen this behavior with a stressed CPU and the NMI watchdog
in the BIOS rebooting a server, but that was on FreeBSD. I have not
seen it on Linux.
On Nov 25, 2017 2:07 PM, "Jonathan Baecker" <jonbae77@gmail.com> wrote:
Hello community,
Yesterday evening one of our nodes was rebooted, but I have not
found out why. The engine only reports this:
24.11.2017 22:01:43 Storage Pool Manager runs on Host onode-1 (Address: onode-1.worknet.lan).
24.11.2017 21:58:50 Failed to verify Host onode-1 power management.
24.11.2017 21:58:50 Status of host onode-1 was set to Up.
24.11.2017 21:58:41 Successfully refreshed the capabilities of host onode-1.
24.11.2017 21:58:37 VDSM onode-1 command GetCapabilitiesVDS failed: Client close
24.11.2017 21:58:37 VDSM onode-1 command HSMGetAllTasksStatusesVDS failed: Not SPM: ()
24.11.2017 21:58:22 Host onode-1 is rebooting.
24.11.2017 21:58:22 Kdump flow is not in progress on host onode-1.
24.11.2017 21:57:51 Host onode-1 is non responsive.
24.11.2017 21:57:51 VM playout was set to the Unknown status.
24.11.2017 21:57:51 VM gogs was set to the Unknown status.
24.11.2017 21:57:51 VM Windows2008 was set to the Unknown status.
[...]
There is no crash report, and no relevant errors in dmesg.
Does the engine send a reboot command to the node when it gets no
response? Is there any other way to find out why the node rebooted?
The node is connected to a UPS, and all other servers were running
fine...
At the time the reboot happened, I had a large video compression job
running in one of the VMs, so maybe the CPUs were a bit stressed, but
they are not overcommitted.
Regards
Jonathan
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
--------------8B684CC5B15E6B654F94892C--