What happens if you run "/usr/bin/vdsm-tool restore-nets" manually?
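
Since the service has been sitting in "activating" for hours, a manual run may hang too. A minimal sketch for bounding it with a time limit follows. The `run_limited` helper and the 120s limit are assumptions for illustration, not part of vdsm; `timeout` is the GNU coreutils command, which exits with status 124 when the limit is hit.

```shell
# Hypothetical helper: run a command under a time limit and report how it ended.
run_limited() {
    limit="$1"; shift
    rc=0
    timeout "$limit" "$@" || rc=$?
    if [ "$rc" -eq 124 ]; then
        # GNU timeout uses exit status 124 when the limit expired
        echo "timed out after $limit"
    else
        echo "exit status $rc"
    fi
    return "$rc"
}

# On the affected host, as root (the 120s limit is an arbitrary choice):
#   run_limited 120s /usr/bin/vdsm-tool restore-nets
```

If the command times out, that suggests restore-nets itself is blocking rather than systemd mishandling the unit.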

On 19 July 2017 at 16:22, Anthony.Fillmore <Anthony.Fillmore@target.com> wrote:

All services are active and running except vdsm-network.service, whose last status entry is “activating”:

 

[root@t0894bmh1001 vdsm.conf.d]# systemctl status -l vdsm-network.service -l

● vdsm-network.service - Virtual Desktop Server Manager network restoration

   Loaded: loaded (/usr/lib/systemd/system/vdsm-network.service; enabled; vendor preset: enabled)

   Active: activating (start) since Tue 2017-07-18 10:42:57 CDT; 23h ago

  Process: 8216 ExecStartPre=/usr/bin/vdsm-tool --vvverbose --append --logfile=/var/log/vdsm/upgrade.log upgrade-unified-persistence (code=exited, status=0/SUCCESS)

Main PID: 8231 (vdsm-tool)

   CGroup: /system.slice/vdsm-network.service

           ├─8231 /usr/bin/python /usr/bin/vdsm-tool restore-nets

           └─8240 /usr/bin/python /usr/share/vdsm/vdsm-restore-net-config

From: Alan Griffiths [mailto:apgriffiths79@gmail.com]
Sent: Wednesday, July 19, 2017 10:13 AM
To: Anthony.Fillmore <Anthony.Fillmore@target.com>
Cc: Pavel Gashev <Pax@acronis.com>; users@ovirt.org; Brandon.Markgraf <Brandon.Markgraf@target.com>; Sandeep.Mendiratta <Sandeep.Mendiratta@target.com>
Subject: Re: [ovirt-users] [EXTERNAL] Re: Host stuck unresponsive after Network Outage

 

Looking at vdsmd.service on one of my 4.0 hosts.

 

Requires=multipathd.service libvirtd.service time-sync.target \

         iscsid.service rpcbind.service supervdsmd.service sanlock.service \

         vdsm-network.service

 

Are all these services present and running?
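
One way to answer this in a single pass is to loop over the units from the Requires= line quoted above and print the state of each. This is only a sketch: `report_units` is a hypothetical helper name, and the unit list is copied from the 4.0 host above, so it may differ on other versions.

```shell
# Hypothetical helper: print the systemd active state of each unit given.
report_units() {
    for unit in "$@"; do
        # is-active prints the state (active, inactive, activating, failed, ...)
        # and returns non-zero for anything but active, hence the "|| true".
        state=$(systemctl is-active "$unit" 2>/dev/null) || true
        printf '%-24s %s\n' "$unit" "${state:-unknown}"
    done
}

# Units taken from the Requires= line of vdsmd.service quoted above.
report_units multipathd.service libvirtd.service time-sync.target \
             iscsid.service rpcbind.service supervdsmd.service \
             sanlock.service vdsm-network.service
```

Any unit that does not report "active" is a candidate for the failed dependency.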

 

 

On 19 July 2017 at 16:05, Anthony.Fillmore <Anthony.Fillmore@target.com> wrote:

Are the vdsm.conf or mom.conf files in /etc/vdsm relevant in this situation?

 

From: Anthony.Fillmore
Sent: Wednesday, July 19, 2017 9:57 AM
To: 'Alan Griffiths' <apgriffiths79@gmail.com>
Cc: Pavel Gashev <Pax@acronis.com>; users@ovirt.org; Brandon.Markgraf <Brandon.Markgraf@target.com>; Sandeep.Mendiratta <Sandeep.Mendiratta@target.com>
Subject: RE: [ovirt-users] [EXTERNAL] Re: Host stuck unresponsive after Network Outage

 

[boxname ~]# systemctl | grep -i dead
mom-vdsm.service                     start MOM instance configured for VDSM purposes
vdsmd.service                        start Virtual Desktop Server Manager

[boxname ~]# systemctl | grep -i exited
blk-availability.service             Availability of block devices
iptables.service                     IPv4 firewall with iptables
kdump.service                        Crash recovery kernel arming
kmod-static-nodes.service            Create list of required static device nodes for the current kernel
lvm2-monitor.service                 Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling
lvm2-pvscan@253:3.service            LVM2 PV scan on device 253:3
lvm2-pvscan@253:4.service            LVM2 PV scan on device 253:4
lvm2-pvscan@8:3.service              LVM2 PV scan on device 8:3
network.service                      LSB: Bring up/down networking
openvswitch-nonetwork.service        Open vSwitch Internal Unit
openvswitch.service                  Open vSwitch
rhel-dmesg.service                   Dump dmesg to /var/log/dmesg
rhel-import-state.service            Import network configuration from initramfs
rhel-readonly.service                Configure read-only root support
systemd-journal-flush.service        Flush Journal to Persistent Storage
systemd-modules-load.service         Load Kernel Modules
systemd-random-seed.service          Load/Save Random Seed
systemd-readahead-collect.service    Collect Read-Ahead Data
systemd-readahead-replay.service     Replay Read-Ahead Data
systemd-remount-fs.service           Remount Root and Kernel File Systems
systemd-sysctl.service               Apply Kernel Variables
systemd-tmpfiles-setup-dev.service   Create Static Device Nodes in /dev
systemd-tmpfiles-setup.service       Create Volatile Files and Directories
systemd-udev-trigger.service         udev Coldplug all Devices
systemd-update-utmp.service          Update UTMP about System Boot/Shutdown
systemd-user-sessions.service        Permit User Sessions
systemd-vconsole-setup.service       Setup Virtual Console
vdsm-network-init.service            Virtual Desktop Server Manager network IP+link restoration

 

From: Alan Griffiths [mailto:apgriffiths79@gmail.com]
Sent: Wednesday, July 19, 2017 9:47 AM
To: Anthony.Fillmore <Anthony.Fillmore@target.com>
Cc: Pavel Gashev <Pax@acronis.com>; users@ovirt.org; Brandon.Markgraf <Brandon.Markgraf@target.com>; Sandeep.Mendiratta <Sandeep.Mendiratta@target.com>
Subject: Re: [ovirt-users] [EXTERNAL] Re: Host stuck unresponsive after Network Outage

 

Are there other failed services?

 

systemctl --state=failed
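
A unit that is stuck in "activating" is not in the failed state, so it would not appear in that listing. A rough sketch that catches both cases by filtering the full unit list follows. `not_settled` is a hypothetical helper name; with `--plain --no-legend` the columns are assumed to be UNIT, LOAD, ACTIVE, SUB, DESCRIPTION.

```shell
# Hypothetical filter: keep units whose ACTIVE column is failed or activating.
not_settled() {
    awk '$3 == "failed" || $3 == "activating" { print $1, $3 }'
}

# List every loaded unit without the header or bullet markers, then filter.
systemctl list-units --all --plain --no-legend 2>/dev/null | not_settled
```

`systemctl list-jobs` is also worth a look, since it shows start jobs that are still queued or running.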

 

On 19 July 2017 at 15:40, Anthony.Fillmore <Anthony.Fillmore@target.com> wrote:

Hey Alan,

 

rpcbind is running on my box, so it looks like there is no issue there. Any other ideas on what could be keeping vdsmd dead? I even uninstalled all oVirt-related components from the host and attempted a reinstall of the host through oVirt (just short of fully removing the host from oVirt and re-adding it, which I want to avoid). The reinstall ends up timing out when it attempts to start VDSM; the logs show the service is dead at that point.

 

Thanks,

Tony

 

From: Alan Griffiths [mailto:apgriffiths79@gmail.com]
Sent: Wednesday, July 19, 2017 4:14 AM
To: Anthony.Fillmore <Anthony.Fillmore@target.com>
Cc: Pavel Gashev <Pax@acronis.com>; users@ovirt.org; Brandon.Markgraf <Brandon.Markgraf@target.com>; Sandeep.Mendiratta <Sandeep.Mendiratta@target.com>
Subject: Re: [ovirt-users] [EXTERNAL] Re: Host stuck unresponsive after Network Outage

 

Is rpcbind running? This is a dependency for vdsmd.

 

I've seen issues where rpcbind will not start on boot if IPv6 is disabled. The solution for me was to rebuild the initramfs with "dracut -f".
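
A quick way to see whether this applies is to check how IPv6 is disabled on the host. The sketch below assumes the usual /proc path; `ipv6_status` is a hypothetical helper name. If the kernel has no IPv6 support at all, that path is most likely missing entirely.

```shell
# Hypothetical helper: report the IPv6 state from the kernel's sysctl file.
# An alternate file can be passed as the first argument.
ipv6_status() {
    f="${1:-/proc/sys/net/ipv6/conf/all/disable_ipv6}"
    if [ ! -e "$f" ]; then
        echo "ipv6 not available in kernel"
    elif [ "$(cat "$f")" = "1" ]; then
        echo "ipv6 disabled via sysctl"
    else
        echo "ipv6 enabled"
    fi
}

ipv6_status

# Then, on the affected host as root:
#   systemctl status -l rpcbind.socket rpcbind.service
#   dracut -f    # rebuild the initramfs, as described above
```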

 

On 18 July 2017 at 18:13, Anthony.Fillmore <Anthony.Fillmore@target.com> wrote:

[boxname ~]# systemctl status -l vdsm-network

● vdsm-network.service - Virtual Desktop Server Manager network restoration

   Loaded: loaded (/usr/lib/systemd/system/vdsm-network.service; enabled; vendor preset: enabled)

   Active: activating (start) since Tue 2017-07-18 10:42:57 CDT; 1h 29min ago

  Process: 8216 ExecStartPre=/usr/bin/vdsm-tool --vvverbose --append --logfile=/var/log/vdsm/upgrade.log upgrade-unified-persistence (code=exited, status=0/SUCCESS)

Main PID: 8231 (vdsm-tool)

   CGroup: /system.slice/vdsm-network.service

           ├─8231 /usr/bin/python /usr/bin/vdsm-tool restore-nets

           └─8240 /usr/bin/python /usr/share/vdsm/vdsm-restore-net-config

 

Jul 18 10:42:57 t0894bmh1001.stores.target.com systemd[1]: Starting Virtual Desktop Server Manager network restoration...

 

Thanks,

Tony

From: Pavel Gashev [mailto:Pax@acronis.com]
Sent: Tuesday, July 18, 2017 11:17 AM
To: Anthony.Fillmore <Anthony.Fillmore@target.com>; users@ovirt.org
Cc: Brandon.Markgraf <Brandon.Markgraf@target.com>; Sandeep.Mendiratta <Sandeep.Mendiratta@target.com>
Subject: [EXTERNAL] Re: [ovirt-users] Host stuck unresponsive after Network Outage

 

Anthony,

 

Output of “systemctl status -l vdsm-network” would help.

 

 

From: <users-bounces@ovirt.org> on behalf of "Anthony.Fillmore" <Anthony.Fillmore@target.com>
Date: Tuesday, 18 July 2017 at 18:13
To: "users@ovirt.org" <users@ovirt.org>
Cc: "Brandon.Markgraf" <Brandon.Markgraf@target.com>, "Sandeep.Mendiratta" <Sandeep.Mendiratta@target.com>
Subject: [ovirt-users] Host stuck unresponsive after Network Outage

 

Hey oVirt Users and Team,

 

I have a host that I am unable to recover after a network outage. The host is stuck in unresponsive mode, even though it is on the network, reachable over SSH, and seems to be healthy. I’ve tried several things to recover the host in oVirt, but have had no success so far. I’d like to reach out to the community before blowing away and rebuilding the host.

 

Environment: I have an Ovengine server with about 26 Datacenters, with 2 to 3 hosts per Datacenter. My Ovengine server is hosted centrally, and my hosts are bare-metal and distributed throughout my environment. Ovengine is version 4.0.6.

 

What I’ve tried: put the host into maintenance mode and rebooted it. Confirmed the host was rebooted and tried to activate it, but it goes back to unresponsive. Attempted a reinstall, which fails.

 

Checking from the host perspective, I can see the following problems:

 

[boxname~]# systemctl status vdsmd

● vdsmd.service - Virtual Desktop Server Manager

   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled)

   Active: inactive (dead)

 

Jul 14 12:34:28 boxname systemd[1]: Dependency failed for Virtual Desktop Server Manager.

Jul 14 12:34:28 boxname systemd[1]: Job vdsmd.service/start failed with result 'dependency'.

 

Going a bit deeper, the results of journalctl -xe:

 

[root@boxname ~]# journalctl -xe

-- Defined-By: systemd

-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel

--

-- Unit libvirtd.service has begun shutting down.

Jul 18 09:07:31 boxname systemd[1]: Stopped Virtualization daemon.

-- Subject: Unit libvirtd.service has finished shutting down

-- Defined-By: systemd

-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel

--

-- Unit libvirtd.service has finished shutting down.

Jul 18 09:07:31 boxname systemd[1]: Reloading.

Jul 18 09:07:31 boxname systemd[1]: Binding to IPv6 address not available since kernel does not support IPv6.

Jul 18 09:07:31 boxname systemd[1]: [/usr/lib/systemd/system/rpcbind.socket:6] Failed to parse address value, ignoring: [::

Jul 18 09:07:31 boxname systemd[1]: Started Auxiliary vdsm service for running helper functions as root.

-- Subject: Unit supervdsmd.service has finished start-up

-- Defined-By: systemd

-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel

--

-- Unit supervdsmd.service has finished starting up.

--

-- The start-up result is done.

Jul 18 09:07:31 boxname systemd[1]: Starting Auxiliary vdsm service for running helper functions as root...

-- Subject: Unit supervdsmd.service has begun start-up

-- Defined-By: systemd

-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel

--

-- Unit supervdsmd.service has begun starting up.

Jul 18 09:07:31 boxname systemd[1]: Starting Virtualization daemon...

-- Subject: Unit libvirtd.service has begun start-up

-- Defined-By: systemd

-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel

--

-- Unit libvirtd.service has begun starting up.

Jul 18 09:07:32 boxname systemd[1]: Started Virtualization daemon.

-- Subject: Unit libvirtd.service has finished start-up

-- Defined-By: systemd

-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel

--

-- Unit libvirtd.service has finished starting up.

--

-- The start-up result is done.

Jul 18 09:07:32 boxname systemd[1]: Starting Virtual Desktop Server Manager network restoration...

-- Subject: Unit vdsm-network.service has begun start-up

-- Defined-By: systemd

-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel

--

-- Unit vdsm-network.service has begun starting up.


 

Does the community have suggestions on what can be done next to recover this host within oVirt? I can provide additional log dumps as needed; please let me know what you need to assist further.

 

Thank you,

Tony

 


_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users