<div dir="ltr">I have done a bit more investigating on this matter. If I restart the node from within oVirt using the power management option "restart", then the node restarts and vdsmd DOES NOT start. If I go into the DRAC and issue the command to power cycle the machine, then the machine restarts and vdsmd DOES start. I can run the following command from another node in the cluster: <div>fence_drac5 -a 192.168.200.105 -l root -p &lt;password&gt; -x -o reboot</div><div>and the node restarts and vdsmd DOES start.</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Jan 25, 2015 at 1:56 AM, ILanit Stein <span dir="ltr">&lt;<a href="mailto:istein@redhat.com" target="_blank">istein@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Rob,<br>
<br>
Thanks for this report.<br>
<br>
Would you please provide these logs, covering the time frame in which the host failure occurred:<br>
1. oVirt Engine: /var/log/ovirt-engine/engine.log<br>
2. host: /var/log/vdsm/vdsm.log<br>
<br>
If it is reproducible, please add this info as well.<br>
<br>
You can also check the vdsm service status on the host while it is reported as Non Responsive,<br>
by running 'service vdsmd status' on the host.<br>
Some problem may have prevented the vdsm service from coming up on the host.<br>
<br>
Ilanit.<br>
<br>
----- Original Message -----<br>
From: "Rob Abshear" &lt;<a href="mailto:rabshear@citytwist.net">rabshear@citytwist.net</a>&gt;<br>
To: <a href="mailto:users@ovirt.org">users@ovirt.org</a><br>
Sent: Friday, January 23, 2015 9:22:42 PM<br>
Subject: [ovirt-users] Host remains Non-Responsive after reboot<br>
<br>
<br>
I am running oVirt Engine Version 3.5.0.1-1.el6. I have 4 hosts in the cluster. Each host has a drac5 and it is configured and working. I am trying to simulate a node failure. I am running one HA VM on one of the hosts for testing. I simulate the failure by powering off the host with the VM running.<br>
<br>
Here is what is happening.<br>
<br>
<br>
* Host is powered off<br>
* ~4 minutes pass and the host is recognized as not responding<br>
* Automatic fence runs and the VM migrates. Another host in the cluster is chosen as a proxy to execute Status command on the host.<br>
* Same host is chosen as proxy to execute Start command on the host.<br>
* Same host is chosen as proxy to execute Status command on the host.<br>
* The host DOES physically start.<br>
* The host never shows status of UP.<br>
* I select “confirm host has been rebooted” and I see a manual fence start.<br>
* Host stays non-responsive.<br>
* I put the host in maintenance and then activate it.<br>
* Host still non-responsive<br>
* I put the host in maintenance and do a reinstall<br>
* Reinstall finishes and host becomes UP<br>
<br>
So, everything seems to go fine with the HA functionality, but the host never recovers without being reinstalled. Please let me know which logs you need to look at to help me out with this.<br>
<br>
Thanks<br>
<br>
<br>
_______________________________________________<br>
Users mailing list<br>
<a href="mailto:Users@ovirt.org">Users@ovirt.org</a><br>
<a href="http://lists.ovirt.org/mailman/listinfo/users" target="_blank">http://lists.ovirt.org/mailman/listinfo/users</a><br>
</blockquote></div><br></div>