Did you set up fencing?
I've also seen a stressed CPU combined with the NMI watchdog in the BIOS
reboot a server, but that was on FreeBSD. I have not seen it on Linux.
On Nov 25, 2017 2:07 PM, "Jonathan Baecker" <jonbae77(a)gmail.com> wrote:
Hello community,
Yesterday evening one of our nodes rebooted, but I have not found out
why. The engine only reports this:
24.11.2017 22:01:43 Storage Pool Manager runs on Host onode-1 (Address:
onode-1.worknet.lan).
24.11.2017 21:58:50 Failed to verify Host onode-1 power management.
24.11.2017 21:58:50 Status of host onode-1 was set to Up.
24.11.2017 21:58:41 Successfully refreshed the capabilities of host
onode-1.
24.11.2017 21:58:37 VDSM onode-1 command GetCapabilitiesVDS failed: Client
close
24.11.2017 21:58:37 VDSM onode-1 command HSMGetAllTasksStatusesVDS failed:
Not SPM: ()
24.11.2017 21:58:22 Host onode-1 is rebooting.
24.11.2017 21:58:22 Kdump flow is not in progress on host onode-1.
24.11.2017 21:57:51 Host onode-1 is non responsive.
24.11.2017 21:57:51 VM playout was set to the Unknown status.
24.11.2017 21:57:51 VM gogs was set to the Unknown status.
24.11.2017 21:57:51 VM Windows2008 was set to the Unknown status.
[...]
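The engine lists events newest first, so the sequence is easier to follow when sorted oldest first with the offset from the first event. A minimal sketch, assuming the "DD.MM.YYYY HH:MM:SS message" layout shown above; the names EVENTS and parse_events are made up for illustration, and only three of the reported lines are included:

```python
from datetime import datetime

# Three event lines copied from the engine report above.
EVENTS = """\
24.11.2017 21:58:50 Status of host onode-1 was set to Up.
24.11.2017 21:58:22 Host onode-1 is rebooting.
24.11.2017 21:57:51 Host onode-1 is non responsive.
"""

def parse_events(text):
    """Return (timestamp, message) tuples sorted oldest first."""
    events = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The timestamp is the first 19 characters, followed by one space.
        stamp, message = line[:19], line[20:]
        events.append((datetime.strptime(stamp, "%d.%m.%Y %H:%M:%S"), message))
    return sorted(events)

timeline = parse_events(EVENTS)
start = timeline[0][0]
for when, message in timeline:
    # Seconds elapsed since the earliest event in the list.
    offset = int((when - start).total_seconds())
    print(f"+{offset:>3}s  {message}")
```

For these three lines, the "rebooting" event comes 31 seconds after the host was marked non responsive, and the host is reported Up again 59 seconds after that first event.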
There is no crash report, and there are no relevant errors in dmesg.
Does the engine send a reboot command to the node when it gets no
response? Is there any other way to find out why the node rebooted?
The node is connected to a UPS, and all other servers were running fine...
At the time of the reboot, I had a large video compression job running
in one of the VMs, so maybe the CPUs were a bit stressed, but they are
not overcommitted.
Regards
Jonathan
_______________________________________________
Users mailing list
Users(a)ovirt.org
http://lists.ovirt.org/mailman/listinfo/users