Looking at vdsmd.service on one of my 4.0 hosts, I see these dependencies:
Requires=multipathd.service libvirtd.service time-sync.target \
iscsid.service rpcbind.service supervdsmd.service sanlock.service \
vdsm-network.service
Are all these services present and running?
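A minimal sketch of how the question above could be checked in one pass, assuming Python 3 is available on the host. It parses the Requires= line (including the backslash continuations) and asks systemd about each unit. The helper names `parse_requires` and `is_active` are hypothetical, introduced only for illustration; `systemctl show -p Requires vdsmd.service` should report the same list without any parsing.

```python
import subprocess
from typing import List

# The Requires= line quoted above, with its backslash continuations.
SAMPLE_UNIT_TEXT = (
    "Requires=multipathd.service libvirtd.service time-sync.target \\\n"
    "iscsid.service rpcbind.service supervdsmd.service sanlock.service \\\n"
    "vdsm-network.service\n"
)


def parse_requires(unit_text: str) -> List[str]:
    """Return the unit names listed on Requires= lines (hypothetical helper)."""
    # Join lines that end in a backslash, as systemd unit files allow.
    joined = unit_text.replace("\\\n", " ")
    units: List[str] = []
    for line in joined.splitlines():
        line = line.strip()
        if line.startswith("Requires="):
            units.extend(line[len("Requires="):].split())
    return units


def is_active(unit: str) -> bool:
    """True if `systemctl is-active --quiet` exits with status 0 for the unit."""
    result = subprocess.run(["systemctl", "is-active", "--quiet", unit])
    return result.returncode == 0
```

On the affected host, `for unit in parse_requires(SAMPLE_UNIT_TEXT): print(unit, is_active(unit))` would print one line per dependency; any unit reported as False is a candidate for the "Dependency failed" message.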
On 19 July 2017 at 16:05, Anthony.Fillmore <Anthony.Fillmore(a)target.com>
wrote:
Are the vdsm.conf or mom.conf files in /etc/vdsm relevant in this
situation?
*From:* Anthony.Fillmore
*Sent:* Wednesday, July 19, 2017 9:57 AM
*To:* 'Alan Griffiths' <apgriffiths79(a)gmail.com>
*Cc:* Pavel Gashev <Pax(a)acronis.com>; users(a)ovirt.org; Brandon.Markgraf <
Brandon.Markgraf(a)target.com>; Sandeep.Mendiratta <
Sandeep.Mendiratta(a)target.com>
*Subject:* RE: [ovirt-users] [EXTERNAL] Re: Host stuck unresponsive after
Network Outage
[boxname ~]# systemctl | grep -i dead
mom-vdsm.service                    start MOM instance configured for VDSM purposes
vdsmd.service                       start Virtual Desktop Server Manager
[boxname ~]# systemctl | grep -i exited
blk-availability.service            Availability of block devices
iptables.service                    IPv4 firewall with iptables
kdump.service                       Crash recovery kernel arming
kmod-static-nodes.service           Create list of required static device nodes for the current kernel
lvm2-monitor.service                Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling
lvm2-pvscan@253:3.service           LVM2 PV scan on device 253:3
lvm2-pvscan@253:4.service           LVM2 PV scan on device 253:4
lvm2-pvscan@8:3.service             LVM2 PV scan on device 8:3
network.service                     LSB: Bring up/down networking
openvswitch-nonetwork.service       Open vSwitch Internal Unit
openvswitch.service                 Open vSwitch
rhel-dmesg.service                  Dump dmesg to /var/log/dmesg
rhel-import-state.service           Import network configuration from initramfs
rhel-readonly.service               Configure read-only root support
systemd-journal-flush.service       Flush Journal to Persistent Storage
systemd-modules-load.service        Load Kernel Modules
systemd-random-seed.service         Load/Save Random Seed
systemd-readahead-collect.service   Collect Read-Ahead Data
systemd-readahead-replay.service    Replay Read-Ahead Data
systemd-remount-fs.service          Remount Root and Kernel File Systems
systemd-sysctl.service              Apply Kernel Variables
systemd-tmpfiles-setup-dev.service  Create Static Device Nodes in /dev
systemd-tmpfiles-setup.service      Create Volatile Files and Directories
systemd-udev-trigger.service        udev Coldplug all Devices
systemd-update-utmp.service         Update UTMP about System Boot/Shutdown
systemd-user-sessions.service       Permit User Sessions
systemd-vconsole-setup.service      Setup Virtual Console
vdsm-network-init.service           Virtual Desktop Server Manager network IP+link restoration
*From:* Alan Griffiths [mailto:apgriffiths79@gmail.com]
*Sent:* Wednesday, July 19, 2017 9:47 AM
*To:* Anthony.Fillmore <Anthony.Fillmore(a)target.com>
*Cc:* Pavel Gashev <Pax(a)acronis.com>; users(a)ovirt.org; Brandon.Markgraf <
Brandon.Markgraf(a)target.com>; Sandeep.Mendiratta <
Sandeep.Mendiratta(a)target.com>
*Subject:* Re: [ovirt-users] [EXTERNAL] Re: Host stuck unresponsive after
Network Outage
Are there other failed services?
systemctl --state=failed
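For anyone scripting this check across several hosts, here is a rough sketch of collecting the failed unit names, assuming Python 3 on the host. The function names are hypothetical, and the sample listing in the code is invented to show the expected column layout; it is not output from the host in this thread.

```python
import subprocess
from typing import List

# Invented sample in the `systemctl list-units --plain --no-legend` layout
# (UNIT LOAD ACTIVE SUB DESCRIPTION). The unit names are placeholders.
SAMPLE_LISTING = (
    "example-a.service loaded failed failed Example service A\n"
    "● example-b.service loaded failed failed Example service B\n"
)


def failed_units(listing: str) -> List[str]:
    """Extract unit names from a failed-units listing (hypothetical helper)."""
    units: List[str] = []
    for line in listing.splitlines():
        fields = line.split()
        if not fields:
            continue
        # Some systemd versions prefix failed units with a bullet; skip it.
        if fields[0] == "●" and len(fields) > 1:
            units.append(fields[1])
        else:
            units.append(fields[0])
    return units


def list_failed() -> List[str]:
    """Run systemctl and return the names of units in the failed state."""
    result = subprocess.run(
        ["systemctl", "list-units", "--state=failed",
         "--plain", "--no-legend", "--no-pager"],
        capture_output=True, text=True, check=False,
    )
    return failed_units(result.stdout)
```

Note that a unit that never started because a dependency failed is reported as inactive (dead) rather than failed, so an empty result here does not rule out a dependency problem.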
On 19 July 2017 at 15:40, Anthony.Fillmore <Anthony.Fillmore(a)target.com>
wrote:
Hey Alan,
Rpcbind is running on my box, so there looks to be no issue there. Any other ideas
on what could be keeping vdsmd dead? I even uninstalled all oVirt-related
components from the host and attempted a reinstall of the host through oVirt
(just short of fully removing the host from oVirt and re-adding it,
which I want to avoid). The reinstall times out when it
attempts to start VDSM; the logs show the service is dead at that
point.
Thanks,
Tony
*From:* Alan Griffiths [mailto:apgriffiths79@gmail.com]
*Sent:* Wednesday, July 19, 2017 4:14 AM
*To:* Anthony.Fillmore <Anthony.Fillmore(a)target.com>
*Cc:* Pavel Gashev <Pax(a)acronis.com>; users(a)ovirt.org; Brandon.Markgraf <
Brandon.Markgraf(a)target.com>; Sandeep.Mendiratta <
Sandeep.Mendiratta(a)target.com>
*Subject:* Re: [ovirt-users] [EXTERNAL] Re: Host stuck unresponsive after
Network Outage
Is rpcbind running? It is a dependency of vdsmd.
I've seen issues where rpcbind will not start on boot if IPv6 is disabled.
The solution for me was to rebuild the initramfs with "dracut -f".
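Before rebuilding the initramfs, it may help to confirm that IPv6 really is disabled on the host. The sketch below assumes Python 3 and two common indicators: the `ipv6.disable=1` kernel command-line option and the absence of `/proc/net/if_inet6`. How IPv6 was disabled on the host in this thread is not stated, so both checks are assumptions, and the function names are hypothetical.

```python
import os


def ipv6_disabled_on_cmdline(cmdline: str) -> bool:
    """True if the kernel command line contains the ipv6.disable=1 option."""
    return "ipv6.disable=1" in cmdline.split()


def kernel_has_ipv6(proc_root: str = "/proc") -> bool:
    """True if the running kernel exposes net/if_inet6 under proc_root.

    Assumption: this file is present only when the IPv6 stack is active.
    """
    return os.path.exists(os.path.join(proc_root, "net", "if_inet6"))


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Return the kernel command line, or an empty string if unreadable."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""
```

On the host, `ipv6_disabled_on_cmdline(read_cmdline())` and `kernel_has_ipv6()` give a quick yes/no; `systemctl status rpcbind.socket rpcbind.service` then shows whether rpcbind came up regardless.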
On 18 July 2017 at 18:13, Anthony.Fillmore <Anthony.Fillmore(a)target.com>
wrote:
[boxname ~]# systemctl status -l vdsm-network
● vdsm-network.service - Virtual Desktop Server Manager network restoration
   Loaded: loaded (/usr/lib/systemd/system/vdsm-network.service; enabled; vendor preset: enabled)
   Active: activating (start) since Tue 2017-07-18 10:42:57 CDT; 1h 29min ago
  Process: 8216 ExecStartPre=/usr/bin/vdsm-tool --vvverbose --append --logfile=/var/log/vdsm/upgrade.log upgrade-unified-persistence (code=exited, status=0/SUCCESS)
 Main PID: 8231 (vdsm-tool)
   CGroup: /system.slice/vdsm-network.service
           ├─8231 /usr/bin/python /usr/bin/vdsm-tool restore-nets
           └─8240 /usr/bin/python /usr/share/vdsm/vdsm-restore-net-config
Jul 18 10:42:57 t0894bmh1001.stores.target.com systemd[1]: Starting Virtual Desktop Server Manager network restoration...
Thanks,
Tony
*From:* Pavel Gashev [mailto:Pax@acronis.com]
*Sent:* Tuesday, July 18, 2017 11:17 AM
*To:* Anthony.Fillmore <Anthony.Fillmore(a)target.com>; users(a)ovirt.org
*Cc:* Brandon.Markgraf <Brandon.Markgraf(a)target.com>; Sandeep.Mendiratta <
Sandeep.Mendiratta(a)target.com>
*Subject:* [EXTERNAL] Re: [ovirt-users] Host stuck unresponsive after
Network Outage
Anthony,
Output of “systemctl status -l vdsm-network” would help.
*From: *<users-bounces(a)ovirt.org> on behalf of "Anthony.Fillmore" <
Anthony.Fillmore(a)target.com>
*Date: *Tuesday, 18 July 2017 at 18:13
*To: *"users(a)ovirt.org" <users(a)ovirt.org>
*Cc: *"Brandon.Markgraf" <Brandon.Markgraf(a)target.com>,
"Sandeep.Mendiratta" <Sandeep.Mendiratta(a)target.com>
*Subject: *[ovirt-users] Host stuck unresponsive after Network Outage
Hey Ovirt Users and Team,
I have a host that I am unable to recover after a network outage. The host
is stuck in unresponsive mode, even though it is on the network, reachable
over SSH, and seems to be healthy. I’ve tried several things to recover the
host in oVirt, but have had no success so far. I’d like to reach out to
the community before blowing away and rebuilding the host.
*Environment*: I have an Ovengine server with about 26 Datacenters, with
2 to 3 hosts per Datacenter. My Ovengine server is hosted centrally, with
my hosts being bare-metal and distributed throughout my environment.
Ovengine is version 4.0.6.
*What I’ve tried: *Put the host into maintenance mode and rebooted it.
Confirmed the host was rebooted and tried to activate it; it goes back to
unresponsive. Attempted a reinstall, which fails.
*Checking from the host perspective, I can see the following problems: *
[boxname ~]# systemctl status vdsmd
● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
   Active: inactive (dead)
Jul 14 12:34:28 boxname systemd[1]: Dependency failed for Virtual Desktop Server Manager.
Jul 14 12:34:28 boxname systemd[1]: Job vdsmd.service/start failed with result 'dependency'.
*Going a bit deeper, the results of journalctl -xe: *
[root@boxname ~]# journalctl -xe
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit libvirtd.service has begun shutting down.
Jul 18 09:07:31 boxname systemd[1]: Stopped Virtualization daemon.
-- Subject: Unit libvirtd.service has finished shutting down
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit libvirtd.service has finished shutting down.
Jul 18 09:07:31 boxname systemd[1]: Reloading.
Jul 18 09:07:31 boxname systemd[1]: Binding to IPv6 address not available since kernel does not support IPv6.
Jul 18 09:07:31 boxname systemd[1]: [/usr/lib/systemd/system/rpcbind.socket:6] Failed to parse address value, ignoring: [::
Jul 18 09:07:31 boxname systemd[1]: Started Auxiliary vdsm service for running helper functions as root.
-- Subject: Unit supervdsmd.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit supervdsmd.service has finished starting up.
--
-- The start-up result is done.
Jul 18 09:07:31 boxname systemd[1]: Starting Auxiliary vdsm service for running helper functions as root...
-- Subject: Unit supervdsmd.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit supervdsmd.service has begun starting up.
Jul 18 09:07:31 boxname systemd[1]: Starting Virtualization daemon...
-- Subject: Unit libvirtd.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit libvirtd.service has begun starting up.
Jul 18 09:07:32 boxname systemd[1]: Started Virtualization daemon.
-- Subject: Unit libvirtd.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit libvirtd.service has finished starting up.
--
-- The start-up result is done.
Jul 18 09:07:32 boxname systemd[1]: Starting Virtual Desktop Server Manager network restoration...
-- Subject: Unit vdsm-network.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit vdsm-network.service has begun starting up.
Does the community have suggestions on what can be done next to recover
this host within oVirt? I can provide additional log dumps as needed;
please let me know what you need to assist further.
Thank you,
Tony
_______________________________________________
Users mailing list
Users(a)ovirt.org
http://lists.ovirt.org/mailman/listinfo/users