On Fri, Sep 04, 2015 at 10:26:39AM +0200, Patrick Hurrelmann wrote:
Hi all,
I just updated my existing oVirt 3.5.3 installation (iSCSI hosted-engine on
CentOS 7.1). The engine update went fine. Updating the hosts succeeds until the
first reboot. After a reboot the host does not come up again. It is missing all
network configuration. All network cfgs in /etc/sysconfig/network-scripts are
missing except ifcfg-lo. The host boots up without working networking. Using
IPMI and config backups, I was able to restore the lost network configs. Once
these are restored and the host is rebooted again, all seems to be back to normal.
This has now happened to 2 updated hosts (this installation has a total of 4
hosts, so 2 more to debug/try). I'm happy to assist in further debugging.
Before updating the second host, I gathered some information. All these hosts
have 3 physical nics. One is used for the ovirtmgmt bridge and the other 2 are
used for iSCSI storage vlans.
ifcfgs before update:
/etc/sysconfig/network-scripts/ifcfg-em1
# Generated by VDSM version 4.16.20-0.el7.centos
DEVICE=em1
HWADDR=d0:67:e5:f0:e5:c6
BRIDGE=ovirtmgmt
ONBOOT=yes
NM_CONTROLLED=no
/etc/sysconfig/network-scripts/ifcfg-lo
DEVICE=lo
IPADDR=127.0.0.1
NETMASK=255.0.0.0
NETWORK=127.0.0.0
# If you're having problems with gated making 127.0.0.0/8 a martian,
# you can change this to something else (255.255.255.255, for example)
BROADCAST=127.255.255.255
ONBOOT=yes
NAME=loopback
/etc/sysconfig/network-scripts/ifcfg-ovirtmgmt
# Generated by VDSM version 4.16.20-0.el7.centos
DEVICE=ovirtmgmt
TYPE=Bridge
DELAY=0
STP=off
ONBOOT=yes
IPADDR=1.2.3.16
NETMASK=255.255.255.0
GATEWAY=1.2.3.11
BOOTPROTO=none
DEFROUTE=yes
NM_CONTROLLED=no
HOTPLUG=no
/etc/sysconfig/network-scripts/ifcfg-p4p1
# Generated by VDSM version 4.16.20-0.el7.centos
DEVICE=p4p1
HWADDR=68:05:ca:01:bc:0c
ONBOOT=no
IPADDR=4.5.7.102
NETMASK=255.255.255.0
BOOTPROTO=none
MTU=9000
DEFROUTE=no
NM_CONTROLLED=no
/etc/sysconfig/network-scripts/ifcfg-p3p1
# Generated by VDSM version 4.16.20-0.el7.centos
DEVICE=p3p1
HWADDR=68:05:ca:18:86:45
ONBOOT=no
IPADDR=4.5.6.102
NETMASK=255.255.255.0
BOOTPROTO=none
MTU=9000
DEFROUTE=no
NM_CONTROLLED=no
ip link before update:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN mode DEFAULT
link/ether 46:50:22:7a:f3:9d brd ff:ff:ff:ff:ff:ff
3: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovirtmgmt state
UP mode DEFAULT qlen 1000
link/ether d0:67:e5:f0:e5:c6 brd ff:ff:ff:ff:ff:ff
4: p3p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc pfifo_fast state UP mode
DEFAULT qlen 1000
link/ether 68:05:ca:18:86:45 brd ff:ff:ff:ff:ff:ff
5: p4p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc pfifo_fast state UP mode
DEFAULT qlen 1000
link/ether 68:05:ca:01:bc:0c brd ff:ff:ff:ff:ff:ff
7: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
mode DEFAULT
link/ether d0:67:e5:f0:e5:c6 brd ff:ff:ff:ff:ff:ff
8: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT
link/ether ce:0f:16:49:a7:da brd ff:ff:ff:ff:ff:ff
vdsm files before update:
/var/lib/vdsm
/var/lib/vdsm/bonding-defaults.json
/var/lib/vdsm/netconfback
/var/lib/vdsm/netconfback/ifcfg-ovirtmgmt
/var/lib/vdsm/netconfback/ifcfg-em1
/var/lib/vdsm/netconfback/route-ovirtmgmt
/var/lib/vdsm/netconfback/rule-ovirtmgmt
/var/lib/vdsm/netconfback/ifcfg-p4p1
/var/lib/vdsm/netconfback/ifcfg-p3p1
/var/lib/vdsm/persistence
/var/lib/vdsm/persistence/netconf
/var/lib/vdsm/persistence/netconf.1416666697752319079
/var/lib/vdsm/persistence/netconf.1416666697752319079/nets
/var/lib/vdsm/persistence/netconf.1416666697752319079/nets/san1
/var/lib/vdsm/persistence/netconf.1416666697752319079/nets/san2
/var/lib/vdsm/persistence/netconf.1416666697752319079/nets/ovirtmgmt
/var/lib/vdsm/upgrade
/var/lib/vdsm/upgrade/upgrade-unified-persistence
/var/lib/vdsm/transient
The files in /var/lib/vdsm/netconfback each contained only a comment:
# original file did not exist
This is quite peculiar. Do you know when these were created?
Have you made any networking changes on 3.5.3 just before boot?
/var/lib/vdsm/persistence/netconf.1416666697752319079/nets/ovirtmgmt
{"nic": "em1", "netmask": "255.255.255.0",
"bootproto": "none", "ipaddr": "1.2.3.16",
"gateway": "1.2.3.11"}
/var/lib/vdsm/persistence/netconf.1416666697752319079/nets/san1
{"nic": "p3p1", "netmask": "255.255.255.0",
"ipaddr": "4.5.6.102", "bridged": "false",
"mtu": "9000"}
/var/lib/vdsm/persistence/netconf.1416666697752319079/nets/san2
{"nic": "p4p1", "netmask": "255.255.255.0",
"ipaddr": "4.5.7.102", "bridged": "false",
"mtu": "9000"}
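To cross-check the persisted definitions above against what is actually on
disk, something like the following could be run before and after the update.
This is only a minimal sketch: the helper name missing_ifcfg_files is made up
for illustration and is not part of VDSM. It assumes that the directory passed
in has a nets/ subdirectory holding one JSON object per network with an
optional "nic" key (as in the three files shown), and it ignores bonds and
VLANs.

```python
import json
import os


def missing_ifcfg_files(netconf_dir, ifcfg_dir):
    """Return {network: nic} for persisted networks whose nic has no ifcfg file.

    Hypothetical helper, not part of VDSM. Assumes netconf_dir/nets holds one
    JSON file per network, each with an optional "nic" key.
    """
    nets_dir = os.path.join(netconf_dir, "nets")
    missing = {}
    for net_name in sorted(os.listdir(nets_dir)):
        with open(os.path.join(nets_dir, net_name)) as f:
            attrs = json.load(f)
        nic = attrs.get("nic")
        if nic is None:
            continue  # nic-less (e.g. bonded) network: out of scope here
        if not os.path.exists(os.path.join(ifcfg_dir, "ifcfg-" + nic)):
            missing[net_name] = nic
    return missing
```

On a host, it would be called with the persistence directory from the listing
above (for example /var/lib/vdsm/persistence/netconf.1416666697752319079) and
/etc/sysconfig/network-scripts. An empty result means every persisted nic
still has its ifcfg file.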
After update and reboot, no ifcfg scripts are left. Only interface lo is up.
Syslog does not seem to contain anything suspicious before the reboot.
Have you tweaked vdsm.conf in any way? In particular did you set
net_persistence?
Log excerpts from bootup:
Sep 3 17:27:23 vhm-prd-02 network: Bringing up loopback interface: [ OK ]
Sep 3 17:27:23 vhm-prd-02 systemd-ovirt-ha-agent: Starting ovirt-ha-agent: [ OK ]
Sep 3 17:27:23 vhm-prd-02 systemd: Started oVirt Hosted Engine High Availability
Monitoring Agent.
Sep 3 17:27:23 vhm-prd-02 kernel: IPv6: ADDRCONF(NETDEV_UP): em1: link is not ready
Sep 3 17:27:23 vhm-prd-02 kernel: device em1 entered promiscuous mode
Sep 3 17:27:23 vhm-prd-02 network: Bringing up interface em1: [ OK ]
Sep 3 17:27:23 vhm-prd-02 kernel: IPv6: ADDRCONF(NETDEV_UP): ovirtmgmt: link is not
ready
Sep 3 17:27:25 vhm-prd-02 avahi-daemon[778]: Joining mDNS multicast group on interface
ovirtmgmt.IPv4 with address 1.2.3.16.
Sep 3 17:27:25 vhm-prd-02 avahi-daemon[778]: New relevant interface ovirtmgmt.IPv4 for
mDNS.
Sep 3 17:27:25 vhm-prd-02 avahi-daemon[778]: Registering new address record for 1.2.3.16
on ovirtmgmt.IPv4.
Sep 3 17:27:26 vhm-prd-02 kernel: tg3 0000:03:00.0 em1: Link is up at 1000 Mbps, full
duplex
Sep 3 17:27:26 vhm-prd-02 kernel: tg3 0000:03:00.0 em1: Flow control is off for TX and
off for RX
Sep 3 17:27:26 vhm-prd-02 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): em1: link becomes
ready
Sep 3 17:27:26 vhm-prd-02 kernel: ovirtmgmt: port 1(em1) entered forwarding state
Sep 3 17:27:26 vhm-prd-02 kernel: ovirtmgmt: port 1(em1) entered forwarding state
Sep 3 17:27:26 vhm-prd-02 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): ovirtmgmt: link becomes
ready
Sep 3 17:27:26 vhm-prd-02 network: Bringing up interface ovirtmgmt: [ OK ]
Sep 3 17:27:26 vhm-prd-02 systemd: Started LSB: Bring up/down networking.
Sep 3 17:27:26 vhm-prd-02 systemd: Starting Network.
Sep 3 17:27:26 vhm-prd-02 systemd: Reached target Network.
So ovirtmgmt and em1 were restored and initialized just fine (p3p1 and p4p1
should have been started, too, but engine configured them as ONBOOT=no).
Further in messages (full log is attached):
Would you also attach your post-boot supervdsm.log?
Sep 3 17:27:26 vhm-prd-02 systemd: Starting Virtual Desktop Server Manager network
restoration...
Sep 3 17:27:26 vhm-prd-02 systemd: Started OSAD daemon.
Sep 3 17:27:27 vhm-prd-02 systemd: Started Terminate Plymouth Boot Screen.
Sep 3 17:27:27 vhm-prd-02 systemd: Started Wait for Plymouth Boot Screen to Quit.
Sep 3 17:27:27 vhm-prd-02 systemd: Starting Serial Getty on ttyS1...
Sep 3 17:27:27 vhm-prd-02 systemd: Started Serial Getty on ttyS1.
Sep 3 17:27:27 vhm-prd-02 systemd: Starting Getty on tty1...
Sep 3 17:27:27 vhm-prd-02 systemd: Started Getty on tty1.
Sep 3 17:27:27 vhm-prd-02 systemd: Starting Login Prompts.
Sep 3 17:27:27 vhm-prd-02 systemd: Reached target Login Prompts.
Sep 3 17:27:27 vhm-prd-02 iscsid: iSCSI daemon with pid=1300 started!
Sep 3 17:27:27 vhm-prd-02 avahi-daemon[778]: Registering new address record for
fe80::d267:e5ff:fef0:e5c6 on ovirtmgmt.*.
Sep 3 17:27:27 vhm-prd-02 kdumpctl: kexec: loaded kdump kernel
Sep 3 17:27:27 vhm-prd-02 kdumpctl: Starting kdump: [OK]
Sep 3 17:27:27 vhm-prd-02 systemd: Started Crash recovery kernel arming.
Sep 3 17:27:27 vhm-prd-02 avahi-daemon[778]: Registering new address record for
fe80::d267:e5ff:fef0:e5c6 on em1.*.
Sep 3 17:27:27 vhm-prd-02 avahi-daemon[778]: Withdrawing address record for 1.2.3.16 on
ovirtmgmt.
Sep 3 17:27:27 vhm-prd-02 avahi-daemon[778]: Leaving mDNS multicast group on interface
ovirtmgmt.IPv4 with address 1.2.3.16.
Sep 3 17:27:27 vhm-prd-02 avahi-daemon[778]: Interface ovirtmgmt.IPv4 no longer relevant
for mDNS.
Sep 3 17:27:27 vhm-prd-02 kernel: ovirtmgmt: port 1(em1) entered disabled state
Sep 3 17:27:27 vhm-prd-02 avahi-daemon[778]: Withdrawing address record for
fe80::d267:e5ff:fef0:e5c6 on ovirtmgmt.
Sep 3 17:27:28 vhm-prd-02 avahi-daemon[778]: Withdrawing address record for
fe80::d267:e5ff:fef0:e5c6 on em1.
Sep 3 17:27:28 vhm-prd-02 kernel: device em1 left promiscuous mode
Sep 3 17:27:28 vhm-prd-02 kernel: ovirtmgmt: port 1(em1) entered disabled state
Sep 3 17:27:28 vhm-prd-02 avahi-daemon[778]: Withdrawing workstation service for
ovirtmgmt.
Sep 3 17:27:28 vhm-prd-02 vdsm-tool: Traceback (most recent call last):
Sep 3 17:27:28 vhm-prd-02 vdsm-tool: File
"/usr/share/vdsm/vdsm-restore-net-config", line 345, in <module>
Sep 3 17:27:28 vhm-prd-02 vdsm-tool: restore(args)
Sep 3 17:27:28 vhm-prd-02 vdsm-tool: File
"/usr/share/vdsm/vdsm-restore-net-config", line 314, in restore
Sep 3 17:27:28 vhm-prd-02 vdsm-tool: unified_restoration()
Sep 3 17:27:28 vhm-prd-02 vdsm-tool: File
"/usr/share/vdsm/vdsm-restore-net-config", line 93, in unified_restoration
Sep 3 17:27:28 vhm-prd-02 vdsm-tool: setupNetworks(nets, bonds, connectivityCheck=False,
_inRollback=True)
Sep 3 17:27:28 vhm-prd-02 vdsm-tool: File "/usr/share/vdsm/network/api.py",
line 642, in setupNetworks
Sep 3 17:27:28 vhm-prd-02 vdsm-tool: implicitBonding=False, _netinfo=_netinfo)
Sep 3 17:27:28 vhm-prd-02 vdsm-tool: File "/usr/share/vdsm/network/api.py",
line 213, in wrapped
Sep 3 17:27:28 vhm-prd-02 vdsm-tool: ret = func(**attrs)
Sep 3 17:27:28 vhm-prd-02 vdsm-tool: File "/usr/share/vdsm/network/api.py",
line 429, in delNetwork
Sep 3 17:27:28 vhm-prd-02 vdsm-tool: netEnt.remove()
Sep 3 17:27:28 vhm-prd-02 vdsm-tool: File "/usr/share/vdsm/network/models.py",
line 100, in remove
Sep 3 17:27:28 vhm-prd-02 vdsm-tool: self.configurator.removeNic(self)
Sep 3 17:27:28 vhm-prd-02 vdsm-tool: File
"/usr/share/vdsm/network/configurators/ifcfg.py", line 215, in removeNic
Sep 3 17:27:28 vhm-prd-02 vdsm-tool: self.configApplier.removeNic(nic.name)
Sep 3 17:27:28 vhm-prd-02 vdsm-tool: File
"/usr/share/vdsm/network/configurators/ifcfg.py", line 657, in removeNic
Sep 3 17:27:28 vhm-prd-02 vdsm-tool: with open(cf) as nicFile:
Sep 3 17:27:28 vhm-prd-02 vdsm-tool: IOError: [Errno 2] No such file or directory:
u'/etc/sysconfig/network-scripts/ifcfg-p4p1'
Sep 3 17:27:28 vhm-prd-02 vdsm-tool: Traceback (most recent call last):
Sep 3 17:27:28 vhm-prd-02 vdsm-tool: File "/usr/bin/vdsm-tool", line 219, in
main
Sep 3 17:27:28 vhm-prd-02 vdsm-tool: return
tool_command[cmd]["command"](*args)
Sep 3 17:27:28 vhm-prd-02 vdsm-tool: File
"/usr/lib/python2.7/site-packages/vdsm/tool/restore_nets.py", line 40, in
restore_command
Sep 3 17:27:28 vhm-prd-02 vdsm-tool: exec_restore(cmd)
Sep 3 17:27:28 vhm-prd-02 vdsm-tool: File
"/usr/lib/python2.7/site-packages/vdsm/tool/restore_nets.py", line 53, in
exec_restore
Sep 3 17:27:28 vhm-prd-02 vdsm-tool: raise EnvironmentError('Failed to restore the
persisted networks')
Sep 3 17:27:28 vhm-prd-02 vdsm-tool: EnvironmentError: Failed to restore the persisted
networks
Sep 3 17:27:28 vhm-prd-02 systemd: vdsm-network.service: main process exited,
code=exited, status=1/FAILURE
Sep 3 17:27:28 vhm-prd-02 systemd: Failed to start Virtual Desktop Server Manager
network restoration.
Sep 3 17:27:28 vhm-prd-02 systemd: Dependency failed for Virtual Desktop Server
Manager.
Sep 3 17:27:28 vhm-prd-02 systemd:
Sep 3 17:27:28 vhm-prd-02 systemd: Unit vdsm-network.service entered failed state.
Sep 3 17:27:33 vhm-prd-02 systemd: Started Postfix Mail Transport Agent.
Sep 3 17:27:33 vhm-prd-02 systemd: Starting Multi-User System.
Sep 3 17:27:33 vhm-prd-02 systemd: Reached target Multi-User System.
Sep 3 17:27:33 vhm-prd-02 systemd: Starting Update UTMP about System Runlevel
Changes...
Sep 3 17:27:33 vhm-prd-02 systemd: Starting Stop Read-Ahead Data Collection 10s After
Completed Startup.
Sep 3 17:27:33 vhm-prd-02 systemd: Started Stop Read-Ahead Data Collection 10s After
Completed Startup.
Sep 3 17:27:33 vhm-prd-02 systemd: Started Update UTMP about System Runlevel Changes.
Sep 3 17:27:33 vhm-prd-02 systemd: Startup finished in 2.964s (kernel) + 2.507s (initrd)
+ 15.996s (userspace) = 21.468s.
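The first traceback above ends in removeNic opening ifcfg-p4p1, which no
longer exists, and the resulting IOError appears to abort the whole
restoration. Purely as an illustration of that failure mode (this is not
VDSM's code, and read_ifcfg_if_present is a hypothetical name), a removal step
can treat a missing file as "nothing to clean up" instead of raising:

```python
import errno


def read_ifcfg_if_present(path):
    """Return the lines of an ifcfg file, or None if the file does not exist.

    Hypothetical helper for illustration; not taken from VDSM.
    """
    try:
        with open(path) as f:
            return f.readlines()
    except IOError as e:  # IOError is an alias of OSError on Python 3
        if e.errno == errno.ENOENT:
            return None  # file already gone: nothing to remove
        raise  # permission problems and the like should still surface
```

Only ENOENT is swallowed. Any other error is re-raised so that real problems
are not hidden.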
So, as I have two more hosts that need updating, I'm happy to assist in
bisecting and debugging this update issue. Suggestions and help are very
welcome.
Thanks for this important report. I assume that calling
vdsClient -s 0 setSafeNetworkConfig
on the host before the upgrade would make your problems go away. However,
please do not do that yet: your assistance in debugging this further is
important.
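
Before updating the two remaining hosts, it may be worth saving a copy of the
ifcfg files so that the IPMI recovery described above is quicker if the
problem reproduces. A minimal sketch follows; the function name backup_ifcfg
and the choice of backup destination are assumptions for illustration, not
something oVirt provides.

```python
import glob
import os
import shutil
import time


def backup_ifcfg(src_dir, dest_root):
    """Copy every ifcfg-* file from src_dir into a new timestamped directory.

    Hypothetical helper; returns the path of the backup directory it created.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = os.path.join(dest_root, "ifcfg-backup-" + stamp)
    os.makedirs(dest)  # fails if a backup with the same timestamp exists
    for path in sorted(glob.glob(os.path.join(src_dir, "ifcfg-*"))):
        shutil.copy2(path, dest)  # copy2 also preserves mode and timestamps
    return dest
```

It would be called with /etc/sysconfig/network-scripts as src_dir and any
directory that survives the update as dest_root.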