[ovirt-users] Host remains Non-Responsive after reboot

Rob Abshear rabshear at citytwist.net
Tue Jan 27 20:15:38 UTC 2015


Yeah. There would have been a lot of connection issues because I was doing
a lot of testing and reconfiguring. The only part that's really
applicable to this issue is the period you mentioned, from 11:54 to 12:01.
I did run 'service vdsmd status' after the host came back up, and the
service was not running. I start the service manually, it comes up
without error, and the node comes back online. Is it normal operation for
the host to recover automatically if it can, including starting vdsmd? One
of my colleagues thinks that perhaps we are experiencing normal
operation, but I can't imagine that the host wouldn't come back completely
if it's able to.
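One thing worth ruling out on an EL6 host like this one (SysV init) is whether vdsmd is enabled for the runlevel the host boots into. The DRAC power-cycle result reported further down suggests it is, but it is cheap to confirm. Below is a minimal sketch, assuming the usual `chkconfig --list` output format (`vdsmd  0:off 1:off 2:on 3:on ...`); the function name `runlevel_state` is hypothetical and not part of vdsm or oVirt.

```shell
#!/bin/sh
# Hypothetical helper: given one line of `chkconfig --list <service>` output,
# e.g. "vdsmd  0:off 1:off 2:on 3:on 4:on 5:on 6:off", print the state
# ("on"/"off") for the requested runlevel, or "unknown" if it is not listed.
runlevel_state() {
    line=$1    # one line of chkconfig --list output
    level=$2   # runlevel to look up, e.g. 3
    # $line is deliberately unquoted so it splits into whitespace-separated
    # fields ("vdsmd", "0:off", "1:off", ...).
    for field in $line; do
        case $field in
            "$level":*)
                echo "${field#*:}"   # strip the "N:" prefix, leaving on/off
                return 0
                ;;
        esac
    done
    echo "unknown"
    return 1
}
```

On the host, something like `runlevel_state "$(chkconfig --list vdsmd)" 3` (substituting the runlevel printed by `runlevel`) should print `on` if vdsmd is set to start at boot. If it does, a vdsmd that is still down after an oVirt-initiated restart points elsewhere, and the 'service vdsmd status' output and vdsm.log are the next places to look.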

On Tue, Jan 27, 2015 at 3:05 PM, Piotr Kliczewski <
piotr.kliczewski at gmail.com> wrote:

> Looking at the logs, I can see that the connection was lost at 2015-01-26
> 09:24:43,213
> and I can see a good number of reconnection attempts, which ended with a
> timeout or 'no route to host'.
> The connection was recovered at 2015-01-26 09:28:56,292.
>
> vdsm.log does not contain the above connection loss (it starts at 2015-01-26
> 10:01:02,208).
>
> It was lost again at 2015-01-26 11:54:58,741 and it was recovered at
> 2015-01-26 12:01:47,752.
>
> I checked the vdsm logs and I can see a really weird lack of logs:
>
> JsonRpc (StompReactor)::DEBUG::2015-01-26
> 11:52:35,893::stompReactor::98::Broker.StompAdapter::(handle_frame)
> Handling message <StompFMainThread::INFO::2015-01-26
> 12:01:45,183::vdsm::131::vds::(run) (PID: 7021) I am the actual vdsm
> 4.16.10-8.gitc937927.el6 love005.ovt.visionamics.com
> (2.6.32-504.3.3.el6.x86_64)
> MainThread::DEBUG::2015-01-26
> 12:01:45,184::resourceManager::421::Storage.ResourceManager::(registerNamespace)
> Registering namespace 'Storage'
>
> which covers the period with no connection from the engine's perspective.
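Gaps like the 11:52:35 to 12:01:45 hole above can be found mechanically rather than by eye. The following is a rough sketch, assuming each vdsm.log entry carries a `YYYY-MM-DD HH:MM:SS,mmm` timestamp on one line (the excerpt above is wrapped for email). It works at one-second resolution, ignoring the milliseconds, and skips lines without a timestamp; `find_log_gaps` is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Hypothetical helper: report gaps longer than THRESHOLD seconds between
# consecutive timestamped entries in a log such as /var/log/vdsm/vdsm.log.
# Usage: find_log_gaps THRESHOLD FILE
# Resolution is one second (the ",mmm" milliseconds are ignored), and lines
# without a "YYYY-MM-DD HH:MM:SS" timestamp are skipped.
find_log_gaps() {
    awk -v threshold="$1" '
    # Day count from an arbitrary epoch; only differences are used. Jan and Feb
    # are treated as months 13 and 14 of the previous year so the leap day
    # falls at the end of the counting year.
    function days(y, m, d) {
        if (m <= 2) { y--; m += 12 }
        return 365 * y + int(y / 4) - int(y / 100) + int(y / 400) + int((153 * (m - 3) + 2) / 5) + d
    }
    match($0, /[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9]/) {
        ts = substr($0, RSTART, RLENGTH)              # e.g. "2015-01-26 11:52:35"
        day = days(substr(ts, 1, 4) + 0, substr(ts, 6, 2) + 0, substr(ts, 9, 2) + 0)
        t = day * 86400 + substr(ts, 12, 2) * 3600 + substr(ts, 15, 2) * 60 + substr(ts, 18, 2)
        if (seen && t - prev > threshold)
            printf "gap of %d s: %s -> %s\n", t - prev, prev_ts, ts
        prev = t; prev_ts = ts; seen = 1
    }' "$2"
}
```

For instance, `find_log_gaps 60 /var/log/vdsm/vdsm.log` on the affected host would list each stretch of more than a minute with no vdsm output; fed just the three entries excerpted above, it reports a 550-second gap from 11:52:35 to 12:01:45.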
>
> Usually when there are connectivity issues we see timeouts in the logs,
> but here there are 'no route to host' errors as well,
> which suggests networking issues.
>
> @Dan - Do you know what caused the lack of logs in vdsm?
> @ILanit - What vdsm version do you use?
>
> On Tue, Jan 27, 2015 at 4:57 PM, Piotr Kliczewski <pkliczew at redhat.com>
> wrote:
> >
> >
> >
> >
> > ----- Original Message -----
> >> From: "Eli Mesika" <emesika at redhat.com>
> >> To: "Piotr Kliczewski" <pkliczew at redhat.com>
> >> Cc: "Artyom Lukianov" <alukiano at redhat.com>, users at ovirt.org,
> rabshear at citytwist.net, "ILanit Stein"
> >> <istein at redhat.com>
> >> Sent: Tuesday, January 27, 2015 4:39:26 PM
> >> Subject: Re: [ovirt-users] Host remains Non-Responsive after reboot
> >>
> >>
> >>
> >> ----- Original Message -----
> >> > From: "ILanit Stein" <istein at redhat.com>
> >> > To: "Artyom Lukianov" <alukiano at redhat.com>, "Eli Mesika"
> >> > <emesika at redhat.com>
> >> > Cc: users at ovirt.org, rabshear at citytwist.net
> >> > Sent: Tuesday, January 27, 2015 5:19:12 PM
> >> > Subject: Fwd: [ovirt-users] Host remains Non-Responsive after reboot
> >> >
> >> >
> >> > Hi Guys,
> >> >
> >> > Can you please look into this?
> >>
> >> Hi
> >> From the logs I can clearly see that the host was turned on at 2015-01-26
> >> 11:56:51,191.
> >> However, there is a STOMP exception at 2015-01-26 11:56:53,544 and a
> >> connection timeout at 2015-01-26 11:56:53,553 that might be related.
> >>
> >> Piotr, can you please have a look?
> >>
> >
> > Sure. Can you please send me the logs?
> >
> >>
> >> >
> >> > Thanks,
> >> > Ilanit.
> >> > ----- Forwarded Message -----
> >> > From: "Rob Abshear" <rabshear at citytwist.net>
> >> > To: "ILanit Stein" <istein at redhat.com>
> >> > Sent: Tuesday, January 27, 2015 3:05:56 PM
> >> > Subject: Re: [ovirt-users] Host remains Non-Responsive after reboot
> >> >
> >> > Here are the logs you requested. The shutdown of the node was at 11:53
> >> > and vdsmd was manually restarted at 12:01 to get the node back online.
> >> >
> >> > On Tue, Jan 27, 2015 at 2:05 AM, ILanit Stein <istein at redhat.com>
> >> > wrote:
> >> >
> >> > > It might be a bug.
> >> > > Would you please attach the logs I mentioned below,
> >> > > which can give more details on the failure?
> >> > > Adding Eli, who may want to give some input on this issue.
> >> > >
> >> > > Thanks,
> >> > > Ilanit.
> >> > >
> >> > > ----- Original Message -----
> >> > > From: "Rob Abshear" <rabshear at citytwist.net>
> >> > > To: "ILanit Stein" <istein at redhat.com>
> >> > > Cc: users at ovirt.org
> >> > > Sent: Monday, January 26, 2015 9:43:14 PM
> >> > > Subject: Re: [ovirt-users] Host remains Non-Responsive after reboot
> >> > >
> >> > > I have done a bit more investigating on this matter. If I restart the
> >> > > node from within oVirt using the power management option "restart",
> >> > > then the node restarts and vdsmd DOES NOT start. If I go into the DRAC
> >> > > and issue the command to power cycle the machine, then the machine
> >> > > restarts and vdsmd DOES start. I can run the following command from
> >> > > another node in the cluster:
> >> > > fence_drac5 -a 192.168.200.105 -l root -p <password> -x -o reboot
> >> > > and the node restarts and vdsmd DOES start.
> >> > >
> >> > > On Sun, Jan 25, 2015 at 1:56 AM, ILanit Stein <istein at redhat.com>
> >> > > wrote:
> >> > >
> >> > > > Hi Rob,
> >> > > >
> >> > > > Thanks for this report.
> >> > > >
> >> > > > Would you please provide these logs from the time frame in which the
> >> > > > host failure occurred:
> >> > > > 1. oVirt Engine: /var/log/ovirt-engine/engine.log
> >> > > > 2. host: /var/log/vdsm/vdsm.log
> >> > > >
> >> > > > If it is reproducible, please add this info as well.
> >> > > >
> >> > > > You can also check the vdsm service status on the host while the host
> >> > > > is reported as Non Responsive, by running 'service vdsmd status' on
> >> > > > the host.
> >> > > > There might be some problem that prevented the vdsm service from
> >> > > > coming up on the host.
> >> > > >
> >> > > > Ilanit.
> >> > > >
> >> > > > ----- Original Message -----
> >> > > > From: "Rob Abshear" <rabshear at citytwist.net>
> >> > > > To: users at ovirt.org
> >> > > > Sent: Friday, January 23, 2015 9:22:42 PM
> >> > > > Subject: [ovirt-users] Host remains Non-Responsive after reboot
> >> > > >
> >> > > >
> >> > > > I am running oVirt Engine Version 3.5.0.1-1.el6. I have 4 hosts in
> >> > > > the cluster. Each host has a DRAC5, and it is configured and working.
> >> > > > I am trying to simulate a node failure. I am running one HA VM on one
> >> > > > of the hosts for testing. I simulate the failure by powering off the
> >> > > > host with the VM running.
> >> > > >
> >> > > > Here is what is happening.
> >> > > >
> >> > > >
> >> > > >     * Host is powered off.
> >> > > >     * ~4 minutes pass and the host is recognized as not responding.
> >> > > >     * Automatic fence runs and the VM migrates. Another host in the
> >> > > >       cluster is chosen as a proxy to execute the Status command on
> >> > > >       the host.
> >> > > >     * The same host is chosen as proxy to execute the Start command
> >> > > >       on the host.
> >> > > >     * The same host is chosen as proxy to execute the Status command
> >> > > >       on the host.
> >> > > >     * The host DOES physically start.
> >> > > >     * The host never shows status of UP.
> >> > > >     * I select “confirm host has been rebooted” and I see a manual
> >> > > >       fence start.
> >> > > >     * Host stays non-responsive.
> >> > > >     * I put the host in maintenance and then activate it.
> >> > > >     * Host is still non-responsive.
> >> > > >     * I put the host in maintenance and do a reinstall.
> >> > > >     * Reinstall finishes and host becomes UP.
> >> > > >
> >> > > > So, everything seems to go fine with the HA functionality, but the
> >> > > > host never recovers without being reinstalled. Please let me know
> >> > > > which logs you need to look at to help me out with this.
> >> > > >
> >> > > > Thanks
> >> > > >
> >> > > > _______________________________________________
> >> > > > Users mailing list
> >> > > > Users at ovirt.org
> >> > > > http://lists.ovirt.org/mailman/listinfo/users
> >> > > >
> >> > >
> >> >
> >>
>