[ovirt-users] [EXTERNAL] Re: Host stuck unresponsive after Network Outage

Alan Griffiths apgriffiths79 at gmail.com
Wed Jul 19 14:47:14 UTC 2017


Are there other failed services?

systemctl --state=failed
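
A rough sketch of how that check could be narrowed down to vdsmd: list the failed units by name, then see whether any of them sits in vdsmd's dependency tree. The helper names below are made up for illustration and are not part of oVirt or vdsm. The systemctl options are standard, but the sketch assumes the usual plain output layout with the unit name in the first column.

```shell
# Hypothetical helpers for illustration only.

# Print the names of all failed units, one per line.
# Assumes the unit name is the first column of the plain listing.
failed_units() {
    systemctl list-units --state=failed --no-legend --plain | awk '{ print $1 }'
}

# Print the failed units that appear in the dependency tree of the
# given unit (defaults to vdsmd.service).
failed_deps() {
    unit="${1:-vdsmd.service}"
    deps="$(systemctl list-dependencies --plain "$unit" | awk '{ print $1 }')"
    for u in $(failed_units); do
        # Exact, whole-line match against the dependency list.
        printf '%s\n' "$deps" | grep -Fxq -- "$u" && printf '%s\n' "$u"
    done
    return 0
}
```

Running failed_deps with no argument on the host should print any failed unit that vdsmd.service pulls in. If it prints nothing, the "Dependency failed" message may be coming from a unit that is stuck or timed out rather than one marked as failed.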

On 19 July 2017 at 15:40, Anthony.Fillmore <Anthony.Fillmore at target.com>
wrote:

> Hey Alan,
>
>
>
> rpcbind is running on my box, so it looks like there is no issue there.  Any
> other ideas on what could be keeping vdsmd dead?  I even uninstalled all
> oVirt-related components from the host and reinstalled the host through oVirt
> (just short of fully removing the host from oVirt and re-adding it, which I
> want to avoid).  The reinstall ends up timing out when it attempts to start
> VDSM; checking the logs, I can see the service is dead at that point.
>
>
>
> Thanks,
>
> Tony
>
>
>
> *From:* Alan Griffiths [mailto:apgriffiths79 at gmail.com]
> *Sent:* Wednesday, July 19, 2017 4:14 AM
> *To:* Anthony.Fillmore <Anthony.Fillmore at target.com>
> *Cc:* Pavel Gashev <Pax at acronis.com>; users at ovirt.org; Brandon.Markgraf <
> Brandon.Markgraf at target.com>; Sandeep.Mendiratta <
> Sandeep.Mendiratta at target.com>
> *Subject:* Re: [ovirt-users] [EXTERNAL] Re: Host stuck unresponsive after
> Network Outage
>
>
>
> Is rpcbind running? This is a dependency for vdsmd.
>
>
>
> I've seen issues where rpcbind will not start on boot if IPv6 is disabled.
> The solution for me was to rebuild the initramfs, i.e. run "dracut -f".
>
>
>
> On 18 July 2017 at 18:13, Anthony.Fillmore <Anthony.Fillmore at target.com>
> wrote:
>
> [boxname ~]# systemctl status -l vdsm-network
> ● vdsm-network.service - Virtual Desktop Server Manager network restoration
>    Loaded: loaded (/usr/lib/systemd/system/vdsm-network.service; enabled; vendor preset: enabled)
>    Active: activating (start) since Tue 2017-07-18 10:42:57 CDT; 1h 29min ago
>   Process: 8216 ExecStartPre=/usr/bin/vdsm-tool --vvverbose --append --logfile=/var/log/vdsm/upgrade.log upgrade-unified-persistence (code=exited, status=0/SUCCESS)
>  Main PID: 8231 (vdsm-tool)
>    CGroup: /system.slice/vdsm-network.service
>            ├─8231 /usr/bin/python /usr/bin/vdsm-tool restore-nets
>            └─8240 /usr/bin/python /usr/share/vdsm/vdsm-restore-net-config
>
> Jul 18 10:42:57 t0894bmh1001.stores.target.com systemd[1]: Starting Virtual Desktop Server Manager network restoration...
>
>
>
> Thanks,
>
> Tony
>
> *From:* Pavel Gashev [mailto:Pax at acronis.com]
> *Sent:* Tuesday, July 18, 2017 11:17 AM
> *To:* Anthony.Fillmore <Anthony.Fillmore at target.com>; users at ovirt.org
> *Cc:* Brandon.Markgraf <Brandon.Markgraf at target.com>; Sandeep.Mendiratta <
> Sandeep.Mendiratta at target.com>
> *Subject:* [EXTERNAL] Re: [ovirt-users] Host stuck unresponsive after
> Network Outage
>
>
>
> Anthony,
>
>
>
> Output of “systemctl status -l vdsm-network” would help.
>
>
>
>
>
> *From: *<users-bounces at ovirt.org> on behalf of "Anthony.Fillmore" <
> Anthony.Fillmore at target.com>
> *Date: *Tuesday, 18 July 2017 at 18:13
> *To: *"users at ovirt.org" <users at ovirt.org>
> *Cc: *"Brandon.Markgraf" <Brandon.Markgraf at target.com>,
> "Sandeep.Mendiratta" <Sandeep.Mendiratta at target.com>
> *Subject: *[ovirt-users] Host stuck unresponsive after Network Outage
>
>
>
> Hey Ovirt Users and Team,
>
>
>
> I have a host that I am unable to recover after a network outage.  The host
> is stuck in unresponsive mode, even though it is on the network, reachable
> over SSH, and seems to be healthy.  I’ve tried several things to recover the
> host in oVirt, but have had no success so far.  I’d like to reach out to
> the community before blowing away and rebuilding the host.
>
>
>
> *Environment*: I have an oVirt Engine server with about 26 Datacenters, with
> 2 to 3 hosts per Datacenter.  The Engine server is hosted centrally, and
> the hosts are bare-metal and distributed throughout my environment.
> oVirt Engine is version 4.0.6.
>
>
>
> *What I’ve tried: *I put the host into maintenance mode and rebooted it.
> After confirming the reboot, I tried to activate the host, but it went back
> to unresponsive.  I then attempted a reinstall, which fails.
>
>
>
> *Checking from the host perspective, I can see the following problems: *
>
>
>
> [boxname~]# systemctl status vdsmd
> ● vdsmd.service - Virtual Desktop Server Manager
>    Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
>    Active: inactive (dead)
>
> Jul 14 12:34:28 boxname systemd[1]: Dependency failed for Virtual Desktop Server Manager.
> Jul 14 12:34:28 boxname systemd[1]: Job vdsmd.service/start failed with result 'dependency'.
>
>
>
> *Going a bit deeper, the results of journalctl -xe: *
>
>
>
> [root@boxname ~]# journalctl -xe
> -- Defined-By: systemd
> -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
> --
> -- Unit libvirtd.service has begun shutting down.
> Jul 18 09:07:31 boxname systemd[1]: Stopped Virtualization daemon.
> -- Subject: Unit libvirtd.service has finished shutting down
> -- Defined-By: systemd
> -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
> --
> -- Unit libvirtd.service has finished shutting down.
> Jul 18 09:07:31 boxname systemd[1]: Reloading.
> Jul 18 09:07:31 boxname systemd[1]: Binding to IPv6 address not available since kernel does not support IPv6.
> Jul 18 09:07:31 boxname systemd[1]: [/usr/lib/systemd/system/rpcbind.socket:6] Failed to parse address value, ignoring: [::
> Jul 18 09:07:31 boxname systemd[1]: Started Auxiliary vdsm service for running helper functions as root.
> -- Subject: Unit supervdsmd.service has finished start-up
> -- Defined-By: systemd
> -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
> --
> -- Unit supervdsmd.service has finished starting up.
> --
> -- The start-up result is done.
> Jul 18 09:07:31 boxname systemd[1]: Starting Auxiliary vdsm service for running helper functions as root...
> -- Subject: Unit supervdsmd.service has begun start-up
> -- Defined-By: systemd
> -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
> --
> -- Unit supervdsmd.service has begun starting up.
> Jul 18 09:07:31 boxname systemd[1]: Starting Virtualization daemon...
> -- Subject: Unit libvirtd.service has begun start-up
> -- Defined-By: systemd
> -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
> --
> -- Unit libvirtd.service has begun starting up.
> Jul 18 09:07:32 boxname systemd[1]: Started Virtualization daemon.
> -- Subject: Unit libvirtd.service has finished start-up
> -- Defined-By: systemd
> -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
> --
> -- Unit libvirtd.service has finished starting up.
> --
> -- The start-up result is done.
> Jul 18 09:07:32 boxname systemd[1]: Starting Virtual Desktop Server Manager network restoration...
> -- Subject: Unit vdsm-network.service has begun start-up
> -- Defined-By: systemd
> -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
> --
> -- Unit vdsm-network.service has begun starting up.
> lines 2751-2797/2797 (END)
>
>
>
> Does the community have suggestions on what can be done next to recover
> this host within oVirt?  I can provide additional log dumps as needed;
> please let me know what you need to assist further.
>
>
>
> Thank you,
>
> Tony
>
>
>
>