
Hello community,
Yesterday evening one of our nodes rebooted, and I have not been able to find out why. The engine only reports this:
24.11.2017 22:01:43 Storage Pool Manager runs on Host onode-1 (Address: onode-1.worknet.lan).
24.11.2017 21:58:50 Failed to verify Host onode-1 power management.
24.11.2017 21:58:50 Status of host onode-1 was set to Up.
24.11.2017 21:58:41 Successfully refreshed the capabilities of host onode-1.
24.11.2017 21:58:37 VDSM onode-1 command GetCapabilitiesVDS failed: Client close
24.11.2017 21:58:37 VDSM onode-1 command HSMGetAllTasksStatusesVDS failed: Not SPM: ()
24.11.2017 21:58:22 Host onode-1 is rebooting.
24.11.2017 21:58:22 Kdump flow is not in progress on host onode-1.
24.11.2017 21:57:51 Host onode-1 is non responsive.
24.11.2017 21:57:51 VM playout was set to the Unknown status.
24.11.2017 21:57:51 VM gogs was set to the Unknown status.
24.11.2017 21:57:51 VM Windows2008 was set to the Unknown status.
[...]
There is no crash report and no relevant errors in dmesg.
Does the engine send a reboot command to the node when it gets no response? Is there any other way to find out why the node rebooted? The node is connected to a UPS, and all other servers were running fine...
At the time of the reboot, I had a fairly large video compression job running in one of the VMs, so the CPUs may have been under some stress, but they are not overcommitted.
Regards
Jonathan
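The engine events in the report are listed newest first, which makes the sequence harder to follow. A minimal Python sketch can parse and sort them oldest first and measure the gaps between events. It assumes every line has the `DD.MM.YYYY HH:MM:SS message` shape shown in the report, leaves out the elided `[...]` entries, and the names `EVENTS`, `parse_events` and `gap_seconds` are hypothetical helpers, not part of any oVirt tool.

```python
from datetime import datetime

# Hypothetical helper data: the engine events copied from the report
# (newest first, as reported); the elided "[...]" entries are left out.
EVENTS = """\
24.11.2017 22:01:43 Storage Pool Manager runs on Host onode-1 (Address: onode-1.worknet.lan).
24.11.2017 21:58:50 Failed to verify Host onode-1 power management.
24.11.2017 21:58:50 Status of host onode-1 was set to Up.
24.11.2017 21:58:41 Successfully refreshed the capabilities of host onode-1.
24.11.2017 21:58:37 VDSM onode-1 command GetCapabilitiesVDS failed: Client close
24.11.2017 21:58:37 VDSM onode-1 command HSMGetAllTasksStatusesVDS failed: Not SPM: ()
24.11.2017 21:58:22 Host onode-1 is rebooting.
24.11.2017 21:58:22 Kdump flow is not in progress on host onode-1.
24.11.2017 21:57:51 Host onode-1 is non responsive.
24.11.2017 21:57:51 VM playout was set to the Unknown status.
24.11.2017 21:57:51 VM gogs was set to the Unknown status.
24.11.2017 21:57:51 VM Windows2008 was set to the Unknown status.
"""


def parse_events(text):
    """Parse 'DD.MM.YYYY HH:MM:SS message' lines into (datetime, message), oldest first."""
    events = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split off the date and time; the rest of the line is the message.
        date_part, time_part, message = line.split(" ", 2)
        stamp = datetime.strptime(f"{date_part} {time_part}", "%d.%m.%Y %H:%M:%S")
        events.append((stamp, message))
    # sorted() is stable, so events sharing a timestamp keep their input order.
    return sorted(events, key=lambda event: event[0])


def gap_seconds(events, earlier, later):
    """Seconds between the first events whose messages contain `earlier` and `later`."""
    def first(substring):
        for stamp, message in events:
            if substring in message:
                return stamp
        raise ValueError(f"no event contains {substring!r}")
    return int((first(later) - first(earlier)).total_seconds())


if __name__ == "__main__":
    events = parse_events(EVENTS)
    start = events[0][0]
    # Print each event with its offset from the earliest one.
    for stamp, message in events:
        offset = int((stamp - start).total_seconds())
        print(f"+{offset:3d}s  {stamp:%H:%M:%S}  {message}")
```

For this log, the `is rebooting` event comes 31 seconds after `is non responsive`, and the host is reported `Up` again 59 seconds after it was marked non-responsive.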

Did you set up fencing?
I've also seen this behavior with a stressed CPU and the NMI watchdog in the BIOS rebooting a server, but that was on FreeBSD. I have not seen it on Linux.
(In reply to Jonathan Baecker <jonbae77@gmail.com>, Nov 25, 2017 2:07 PM)
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users

I did set up power management, but because the second node is off, it is not working correctly. I will now install a VM on a different server, just to use it as a proxy.
But do you think this could be the reason?
On 25.11.2017 at 20:36, Charles Kozler wrote:
Did you set up fencing?
I've also seen this behavior with a stressed CPU and the NMI watchdog in the BIOS rebooting a server, but that was on FreeBSD. I have not seen it on Linux.
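Jonathan's point, that power management cannot work correctly while the only other node is switched off, can be illustrated with a simplified model of fence-proxy selection. This is a sketch under the assumption that the engine needs a different host in status Up to run the fence agent, searched first in the target's cluster and then in its data center; it is not oVirt's actual implementation. The `Host` class, `pick_fence_proxy`, the host names `onode-2` and `proxy-host`, and the `Default` cluster and data center names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Host:
    # Hypothetical, minimal description of a host for this model.
    name: str
    cluster: str
    datacenter: str
    status: str  # e.g. "Up", "Down", "NonResponsive"


def pick_fence_proxy(target, hosts, preferences=("cluster", "dc")):
    """Return a host that could act as fence proxy for `target`, or None.

    Simplified, hypothetical model: the proxy must be a different host in
    status "Up"; scopes are tried in order ("cluster" = same cluster,
    "dc" = same data center).
    """
    for scope in preferences:
        for host in hosts:
            # The target cannot fence itself, and only running hosts qualify.
            if host.name == target.name or host.status != "Up":
                continue
            if scope == "cluster" and host.cluster == target.cluster:
                return host
            if scope == "dc" and host.datacenter == target.datacenter:
                return host
    return None


if __name__ == "__main__":
    node1 = Host("onode-1", "Default", "Default", "NonResponsive")
    node2 = Host("onode-2", "Default", "Default", "Down")  # the switched-off second node
    print(pick_fence_proxy(node1, [node1, node2]))  # prints None
    extra = Host("proxy-host", "Default", "Default", "Up")  # hypothetical extra host
    print(pick_fence_proxy(node1, [node1, node2, extra]).name)  # prints proxy-host
```

In this model, a two-node setup with one node down leaves no usable proxy for the other, which matches the "Failed to verify Host onode-1 power management" event; adding any other running host in the same cluster or data center changes that.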
Participants (2): Charles Kozler, Jonathan Baecker