
Did you set up fencing? I've also seen this behavior with a stressed CPU and the NMI watchdog in the BIOS rebooting a server, but that was on FreeBSD. I have not seen it on Linux.
On Nov 25, 2017 2:07 PM, "Jonathan Baecker" <jonbae77@gmail.com> wrote:
Hello community,
yesterday evening one of our nodes was rebooted, but I have not found out why. The engine only reports this:
24.11.2017 22:01:43 Storage Pool Manager runs on Host onode-1 (Address: onode-1.worknet.lan).
24.11.2017 21:58:50 Failed to verify Host onode-1 power management.
24.11.2017 21:58:50 Status of host onode-1 was set to Up.
24.11.2017 21:58:41 Successfully refreshed the capabilities of host onode-1.
24.11.2017 21:58:37 VDSM onode-1 command GetCapabilitiesVDS failed: Client close
24.11.2017 21:58:37 VDSM onode-1 command HSMGetAllTasksStatusesVDS failed: Not SPM: ()
24.11.2017 21:58:22 Host onode-1 is rebooting.
24.11.2017 21:58:22 Kdump flow is not in progress on host onode-1.
24.11.2017 21:57:51 Host onode-1 is non responsive.
24.11.2017 21:57:51 VM playout was set to the Unknown status.
24.11.2017 21:57:51 VM gogs was set to the Unknown status.
24.11.2017 21:57:51 VM Windows2008 was set to the Unknown status.
[...]
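The engine lists events newest first, so the order of the failure is easier to follow once they are sorted by time. The following is only a minimal sketch, assuming the events are pasted one per line as `DD.MM.YYYY HH:MM:SS message`; the function name `parse_events` and the `EVENTS` constant are made up for illustration and are not part of oVirt.

```python
from datetime import datetime

# A subset of the engine events quoted in this mail, one per line.
EVENTS = """\
24.11.2017 21:58:50 Status of host onode-1 was set to Up.
24.11.2017 21:58:22 Host onode-1 is rebooting.
24.11.2017 21:57:51 Host onode-1 is non responsive.
"""


def parse_events(text):
    """Return (timestamp, message) tuples sorted oldest first."""
    events = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The first two space-separated fields are the date and the time.
        date, time, message = line.split(" ", 2)
        stamp = datetime.strptime(f"{date} {time}", "%d.%m.%Y %H:%M:%S")
        events.append((stamp, message))
    return sorted(events, key=lambda event: event[0])


if __name__ == "__main__":
    events = parse_events(EVENTS)
    start = events[0][0]
    for stamp, message in events:
        # Seconds elapsed since the earliest event in the list.
        delta = int((stamp - start).total_seconds())
        print(f"+{delta:>3}s  {message}")
```

For the three lines used here, the host is reported as rebooting 31 seconds after it was flagged as non responsive, and as Up again 59 seconds after that first event.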
There is no crash report, and no relevant errors in dmesg.
Does the engine send a reboot command to the node when it gets no response? Is there any other way to find out why the node rebooted? The node is connected to a UPS, and all the other servers were running fine...
At the time of the reboot, a large video compression job was running in one of the VMs, so the CPUs may have been somewhat stressed, but they are not overcommitted.
Regards
Jonathan
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users