Did you set up fencing?

I've also seen this behavior when a stressed CPU tripped the NMI watchdog in the BIOS and it rebooted the server, but that was on FreeBSD. I have not seen it on Linux.

On Nov 25, 2017 2:07 PM, "Jonathan Baecker" <jonbae77@gmail.com> wrote:

Hello community,

Yesterday evening one of our nodes rebooted, and I have not been able to find out why. The engine only reports this:

24.11.2017 22:01:43 Storage Pool Manager runs on Host onode-1 (Address: onode-1.worknet.lan).
24.11.2017 21:58:50 Failed to verify Host onode-1 power management.
24.11.2017 21:58:50 Status of host onode-1 was set to Up.
24.11.2017 21:58:41 Successfully refreshed the capabilities of host onode-1.
24.11.2017 21:58:37 VDSM onode-1 command GetCapabilitiesVDS failed: Client close
24.11.2017 21:58:37 VDSM onode-1 command HSMGetAllTasksStatusesVDS failed: Not SPM: ()
24.11.2017 21:58:22 Host onode-1 is rebooting.
24.11.2017 21:58:22 Kdump flow is not in progress on host onode-1.
24.11.2017 21:57:51 Host onode-1 is non responsive.
24.11.2017 21:57:51 VM playout was set to the Unknown status.
24.11.2017 21:57:51 VM gogs was set to the Unknown status.
24.11.2017 21:57:51 VM Windows2008 was set to the Unknown status.
[...]
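The engine lists events newest first, which makes the sequence easy to misread. A small Python sketch can put them in chronological order and print the seconds elapsed since the host was marked non-responsive. This is not an oVirt tool. `EVENTS` just holds the lines quoted above, and `parse` and `timeline` are hypothetical helper names. The sketch covers only the shown lines, not the elided `[...]` part.

```python
from datetime import datetime

# Engine events copied from the log above (the engine lists them newest first).
EVENTS = """\
24.11.2017 22:01:43 Storage Pool Manager runs on Host onode-1 (Address: onode-1.worknet.lan).
24.11.2017 21:58:50 Failed to verify Host onode-1 power management.
24.11.2017 21:58:50 Status of host onode-1 was set to Up.
24.11.2017 21:58:41 Successfully refreshed the capabilities of host onode-1.
24.11.2017 21:58:37 VDSM onode-1 command GetCapabilitiesVDS failed: Client close
24.11.2017 21:58:37 VDSM onode-1 command HSMGetAllTasksStatusesVDS failed: Not SPM: ()
24.11.2017 21:58:22 Host onode-1 is rebooting.
24.11.2017 21:58:22 Kdump flow is not in progress on host onode-1.
24.11.2017 21:57:51 Host onode-1 is non responsive.
24.11.2017 21:57:51 VM playout was set to the Unknown status.
24.11.2017 21:57:51 VM gogs was set to the Unknown status.
24.11.2017 21:57:51 VM Windows2008 was set to the Unknown status.
"""


def parse(text):
    """Return (timestamp, message) tuples in chronological order."""
    events = []
    for line in text.strip().splitlines():
        date, time, message = line.split(" ", 2)
        stamp = datetime.strptime(f"{date} {time}", "%d.%m.%Y %H:%M:%S")
        events.append((stamp, message))
    events.reverse()                  # newest-first -> oldest-first
    events.sort(key=lambda e: e[0])   # stable sort keeps same-second order
    return events


def timeline(events):
    """Return (seconds since first event, message) pairs."""
    start = events[0][0]
    return [(int((t - start).total_seconds()), msg) for t, msg in events]


if __name__ == "__main__":
    for offset, msg in timeline(parse(EVENTS)):
        print(f"+{offset:4d}s  {msg}")
```

In this excerpt, "Host onode-1 is rebooting." appears 31 seconds after "Host onode-1 is non responsive." and immediately after the kdump check.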

There is no crash report, and no relevant errors in dmesg.

Does the engine send a reboot command to the node when it gets no response? Is there any other way to find out why the node rebooted? The node is connected to a UPS, and all other servers were running fine...

At the time of the reboot I had a large video compression job running in one of the VMs, so the CPUs may have been somewhat stressed, but they are not overcommitted.


Regards

Jonathan


_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users