[ovirt-users] Host remains Non-Responsive after reboot

ILanit Stein istein at redhat.com
Tue Jan 27 02:05:18 EST 2015


It might be a bug.
Would you please attach the logs I mentioned below?
They can provide more details on the failure.
Adding Eli, who may want to give some input on this issue.

Thanks,
Ilanit.

----- Original Message -----
From: "Rob Abshear" <rabshear at citytwist.net>
To: "ILanit Stein" <istein at redhat.com>
Cc: users at ovirt.org
Sent: Monday, January 26, 2015 9:43:14 PM
Subject: Re: [ovirt-users] Host remains Non-Responsive after reboot

I have done a bit more investigating on this matter.  If I restart the node
from within oVirt using the power management option "restart", then the
node restarts and vdsmd DOES NOT start.  If I go into the DRAC and issue
the command to power cycle the machine, then the machine restarts and vdsmd
DOES start.  I can run the following command from another node in the
cluster:
fence_drac5 -a 192.168.200.105 -l root -p <password> -x -o reboot
and the node restarts and vdsmd DOES start.
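
To compare the three restart paths above, it may help to cut the requested
logs (engine.log and vdsm.log) down to the minutes around each restart before
attaching them. The following is only a minimal sketch: the function name
lines_in_window and the 10-minute default are hypothetical, and it assumes
each log entry carries a "YYYY-MM-DD HH:MM:SS" timestamp somewhere on its
first line, which should be checked against the actual log files.

```python
import re
from datetime import datetime, timedelta

# Assumption: each log entry contains a "YYYY-MM-DD HH:MM:SS" timestamp
# somewhere on its first line (not necessarily at the start of the line).
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def lines_in_window(lines, center, minutes=10):
    """Yield the lines whose timestamp is within +/- `minutes` of `center`.

    Lines without a parsable timestamp (for example traceback continuation
    lines) follow the decision made for the most recent timestamped line.
    """
    start = center - timedelta(minutes=minutes)
    end = center + timedelta(minutes=minutes)
    keep = False
    for line in lines:
        match = TIMESTAMP.search(line)
        if match:
            try:
                stamp = datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S")
            except ValueError:
                stamp = None  # looked like a timestamp but is not a valid date
            if stamp is not None:
                keep = start <= stamp <= end
        if keep:
            yield line
```

For example, opening /var/log/vdsm/vdsm.log and passing the file object
together with datetime(2015, 1, 23, 21, 0, 0) (a made-up time, to be replaced
with the actual moment of the restart) would yield only the entries from the
20 minutes around that point.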

On Sun, Jan 25, 2015 at 1:56 AM, ILanit Stein <istein at redhat.com> wrote:

> Hi Rob,
>
> Thanks for this report.
>
> Would you please provide these logs, covering the time frame in which the
> host failure occurred:
> 1. oVirt Engine: /var/log/ovirt-engine/engine.log
> 2. host: /var/log/vdsm/vdsm.log
>
> If it is reproducible, please add this info as well.
>
> You can also check the vdsm service status on the host, while the host is
> reported as Non Responsive,
> by running 'service vdsmd status' on the host.
> Some problem might have prevented the vdsm service from
> coming up on the host.
>
> Ilanit.
>
> ----- Original Message -----
> From: "Rob Abshear" <rabshear at citytwist.net>
> To: users at ovirt.org
> Sent: Friday, January 23, 2015 9:22:42 PM
> Subject: [ovirt-users] Host remains Non-Responsive after reboot
>
>
> I am running oVirt Engine Version 3.5.0.1-1.el6. I have 4 hosts in the
> cluster. Each host has a drac5 and it is configured and working. I am
> trying to simulate a node failure. I am running one HA VM on one of the
> hosts for testing. I simulate the failure by powering off the host with the
> VM running.
>
> Here is what is happening.
>
>
>     * Host is powered off
>     * ~4 minutes pass and the host is recognized as not responding
>     * Automatic fence runs and the VM migrates. Another host in the cluster
> is chosen as a proxy to execute Status command on the host.
>     * Same host is chosen as proxy to execute Start command on the host.
>     * Same host is chosen as proxy to execute Status command on the host.
>     * The host DOES physically start.
>     * The host never shows status of UP.
>     * I select “confirm host has been rebooted” and I see a manual fence
> start.
>     * Host stays non-responsive.
>     * I put the host in maintenance and then activate it.
>     * Host still non-responsive
>     * I put the host in maintenance and do a reinstall
>     * Reinstall finishes and host becomes UP
>
> So, everything seems to go fine with the HA functionality, but the host
> never recovers without being reinstalled. Please let me know which logs you
> need to look at to help me out with this.
>
> Thanks
>
> _______________________________________________
> Users mailing list
> Users at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>

