[ovirt-users] Host remains Non-Responsive after reboot

Piotr Kliczewski pkliczew at redhat.com
Tue Jan 27 15:57:19 UTC 2015





----- Original Message -----
> From: "Eli Mesika" <emesika at redhat.com>
> To: "Piotr Kliczewski" <pkliczew at redhat.com>
> Cc: "Artyom Lukianov" <alukiano at redhat.com>, users at ovirt.org, rabshear at citytwist.net, "ILanit Stein"
> <istein at redhat.com>
> Sent: Tuesday, January 27, 2015 4:39:26 PM
> Subject: Re: [ovirt-users] Host remains Non-Responsive after reboot
> 
> 
> 
> ----- Original Message -----
> > From: "ILanit Stein" <istein at redhat.com>
> > To: "Artyom Lukianov" <alukiano at redhat.com>, "Eli Mesika"
> > <emesika at redhat.com>
> > Cc: users at ovirt.org, rabshear at citytwist.net
> > Sent: Tuesday, January 27, 2015 5:19:12 PM
> > Subject: Fwd: [ovirt-users] Host remains Non-Responsive after reboot
> > 
> > 
> > Hi Guys,
> > 
> > Can you please look into this?
> 
> Hi
> From the logs I can clearly see that the host was turned on at 2015-01-26
> 11:56:51,191.
> However, there is a STOMP exception at 2015-01-26 11:56:53,544 and a
> connection timeout at 2015-01-26 11:56:53,553 that might be related.
> 
> Piotr, can you please have a look?
> 

Sure. Can you please send me the logs?
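
While collecting them, it may help to cut engine.log and vdsm.log down to the
window around the reboot. The following is only a minimal sketch: it assumes
every log entry carries a timestamp in the form quoted above
(2015-01-26 11:56:51,191) somewhere in the line, and the function name is
hypothetical, not something shipped with oVirt or vdsm.

```python
import re
from datetime import datetime

# Matches timestamps such as "2015-01-26 11:56:51,191" anywhere in a line.
TS_RE = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")
TS_FMT = "%Y-%m-%d %H:%M:%S,%f"


def lines_in_window(lines, start, end):
    """Yield the lines whose timestamp falls within [start, end].

    Lines without a timestamp (for example stack trace continuations)
    are kept or dropped together with the last timestamped line.
    """
    keep = False
    for line in lines:
        match = TS_RE.search(line)
        if match:
            stamp = datetime.strptime(match.group(0), TS_FMT)
            keep = start <= stamp <= end
        if keep:
            yield line
```

For this report, a window from roughly 11:50 to 12:05 on 2015-01-26 would cover
the shutdown, the exception, and the manual vdsmd restart; pass the open log
file as `lines` and two `datetime` values as `start` and `end`.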

> 
> > 
> > Thanks,
> > Ilanit.
> > ----- Forwarded Message -----
> > From: "Rob Abshear" <rabshear at citytwist.net>
> > To: "ILanit Stein" <istein at redhat.com>
> > Sent: Tuesday, January 27, 2015 3:05:56 PM
> > Subject: Re: [ovirt-users] Host remains Non-Responsive after reboot
> > 
> > > Here are the logs you requested.  The shutdown of the node was at 11:53
> > > and vdsmd was manually restarted at 12:01 to get the node back online.
> > 
> > On Tue, Jan 27, 2015 at 2:05 AM, ILanit Stein <istein at redhat.com> wrote:
> > 
> > > > It might be a bug.
> > > > Would you please attach the logs I mentioned below,
> > > > which could provide more details on the failure?
> > > > Adding Eli, who may want to give some input on this issue.
> > >
> > > Thanks,
> > > Ilanit.
> > >
> > > ----- Original Message -----
> > > From: "Rob Abshear" <rabshear at citytwist.net>
> > > To: "ILanit Stein" <istein at redhat.com>
> > > Cc: users at ovirt.org
> > > Sent: Monday, January 26, 2015 9:43:14 PM
> > > Subject: Re: [ovirt-users] Host remains Non-Responsive after reboot
> > >
> > > > I have done a bit more investigating on this matter.  If I restart the
> > > > node from within oVirt using the power management option "restart",
> > > > then the node restarts and vdsmd DOES NOT start.  If I go into the DRAC
> > > > and issue the command to power cycle the machine, then the machine
> > > > restarts and vdsmd DOES start.  I can run the following command from
> > > > another node in the cluster:
> > > > fence_drac5 -a 192.168.200.105 -l root -p <password> -x -o reboot
> > > > and the node restarts and vdsmd DOES start.
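
To repeat the manual fence test the same way each time, the command above could
be wrapped in a short script. This is a rough sketch only: it assumes
fence_drac5 is on the PATH of the node acting as proxy and uses only the
options that appear in the command above; the helper names are hypothetical and
the password has to be supplied by the caller.

```python
import subprocess


def fence_drac5_command(address, login, password, action="reboot"):
    """Build the argument list for the fence_drac5 call quoted above.

    Only the options from that command are used:
    -a (address), -l (login), -p (password), -x, -o (action).
    """
    return ["fence_drac5", "-a", address, "-l", login,
            "-p", password, "-x", "-o", action]


def run_fence(address, login, password, action="reboot"):
    """Run the fence agent and return its exit code."""
    return subprocess.call(fence_drac5_command(address, login, password, action))
```

After each run, 'service vdsmd status' on the rebooted node shows whether vdsmd
came up.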
> > >
> > > On Sun, Jan 25, 2015 at 1:56 AM, ILanit Stein <istein at redhat.com> wrote:
> > >
> > > > Hi Rob,
> > > >
> > > > Thanks for this report.
> > > >
> > > > > Would you please provide these logs, covering the time frame in
> > > > > which the host failure occurred:
> > > > 1. oVirt Engine: /var/log/ovirt-engine/engine.log
> > > > 2. host: /var/log/vdsm/vdsm.log
> > > >
> > > > If it is reproducible, please add this info as well.
> > > >
> > > > > You can also check the vdsm service status on the host while it is
> > > > > reported as Non Responsive, by running 'service vdsmd status' on the
> > > > > host. There might be some problem that prevented the vdsm service
> > > > > from coming up on the host.
> > > >
> > > > Ilanit.
> > > >
> > > > ----- Original Message -----
> > > > From: "Rob Abshear" <rabshear at citytwist.net>
> > > > To: users at ovirt.org
> > > > Sent: Friday, January 23, 2015 9:22:42 PM
> > > > Subject: [ovirt-users] Host remains Non-Responsive after reboot
> > > >
> > > >
> > > > I am running oVirt Engine Version 3.5.0.1-1.el6. I have 4 hosts in the
> > > > cluster. Each host has a drac5 and it is configured and working. I am
> > > > trying to simulate a node failure. I am running one HA VM on one of the
> > > > > hosts for testing. I simulate the failure by powering off the host with
> > > > > the VM running.
> > > >
> > > > Here is what is happening.
> > > >
> > > >
> > > >     * Host is powered off
> > > >     * ~4 minutes pass and the host is recognized as not responding
> > > > >     * Automatic fence runs and the VM migrates. Another host in the
> > > > >       node is chosen as a proxy to execute Status command on the host.
> > > > >     * Same host is chosen as proxy to execute Start command on the
> > > > >       host.
> > > > >     * Same host is chosen as proxy to execute Status command on the
> > > > >       host.
> > > > >     * The host DOES physically start.
> > > > >     * The host never shows status of UP.
> > > > >     * I select “confirm host has been rebooted” and I see a manual
> > > > >       fence start.
> > > >     * Host stays non-responsive.
> > > >     * I put the host in maintenance and then activate it.
> > > >     * Host still non-responsive
> > > >     * I put the host in maintenance and do a reinstall
> > > >     * Reinstall finishes and host becomes UP
> > > >
> > > > So, everything seems to go fine with the HA functionality, but the host
> > > > > never recovers without being reinstalled. Please let me know which logs
> > > > > you need to look at to help me out with this.
> > > >
> > > > Thanks
> > > >
> > > >
> > > >         Sent with Mixmax
> > > >
> > > > _______________________________________________
> > > > Users mailing list
> > > > Users at ovirt.org
> > > > http://lists.ovirt.org/mailman/listinfo/users
> > > >
> > >
> > 
> 


