On 06.09.2015 11:30, Dan Kenigsberg wrote:
On Fri, Sep 04, 2015 at 10:26:39AM +0200, Patrick Hurrelmann wrote:
> Hi all,
>
> I just updated my existing oVirt 3.5.3 installation (iSCSI hosted-engine on
> CentOS 7.1). The engine update went fine. Updating the hosts succeeds until the
> first reboot. After a reboot the host does not come up again. It is missing all
> network configuration. All network cfgs in /etc/sysconfig/network-scripts are
> missing except ifcfg-lo. The host boots up without working networking. Using
> IPMI and config backups, I was able to restore the lost network configs. Once
> these are restored and the host is rebooted again all seems to be back to good.
> This has now happened to 2 updated hosts (this installation has a total of 4
> hosts, so 2 more to debug/try). I'm happy to assist in further debugging.
>
> Before updating the second host, I gathered some information. All these hosts
> have 3 physical nics. One is used for the ovirtmgmt bridge and the other 2 are
> used for iSCSI storage vlans.
>
> ifcfgs before update:
>
> /etc/sysconfig/network-scripts/ifcfg-em1
> # Generated by VDSM version 4.16.20-0.el7.centos
> DEVICE=em1
> HWADDR=d0:67:e5:f0:e5:c6
> BRIDGE=ovirtmgmt
> ONBOOT=yes
> NM_CONTROLLED=no
>
> /etc/sysconfig/network-scripts/ifcfg-lo
> DEVICE=lo
> IPADDR=127.0.0.1
> NETMASK=255.0.0.0
> NETWORK=127.0.0.0
> # If you're having problems with gated making 127.0.0.0/8 a martian,
> # you can change this to something else (255.255.255.255, for example)
> BROADCAST=127.255.255.255
> ONBOOT=yes
> NAME=loopback
>
> /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt
> # Generated by VDSM version 4.16.20-0.el7.centos
> DEVICE=ovirtmgmt
> TYPE=Bridge
> DELAY=0
> STP=off
> ONBOOT=yes
> IPADDR=1.2.3.16
> NETMASK=255.255.255.0
> GATEWAY=1.2.3.11
> BOOTPROTO=none
> DEFROUTE=yes
> NM_CONTROLLED=no
> HOTPLUG=no
>
> /etc/sysconfig/network-scripts/ifcfg-p4p1
> # Generated by VDSM version 4.16.20-0.el7.centos
> DEVICE=p4p1
> HWADDR=68:05:ca:01:bc:0c
> ONBOOT=no
> IPADDR=4.5.7.102
> NETMASK=255.255.255.0
> BOOTPROTO=none
> MTU=9000
> DEFROUTE=no
> NM_CONTROLLED=no
>
> /etc/sysconfig/network-scripts/ifcfg-p3p1
> # Generated by VDSM version 4.16.20-0.el7.centos
> DEVICE=p3p1
> HWADDR=68:05:ca:18:86:45
> ONBOOT=no
> IPADDR=4.5.6.102
> NETMASK=255.255.255.0
> BOOTPROTO=none
> MTU=9000
> DEFROUTE=no
> NM_CONTROLLED=no
>
>
> ip link before update:
> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT
> link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> 2: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN mode DEFAULT
> link/ether 46:50:22:7a:f3:9d brd ff:ff:ff:ff:ff:ff
> 3: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovirtmgmt state UP mode DEFAULT qlen 1000
> link/ether d0:67:e5:f0:e5:c6 brd ff:ff:ff:ff:ff:ff
> 4: p3p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
> link/ether 68:05:ca:18:86:45 brd ff:ff:ff:ff:ff:ff
> 5: p4p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
> link/ether 68:05:ca:01:bc:0c brd ff:ff:ff:ff:ff:ff
> 7: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT
> link/ether d0:67:e5:f0:e5:c6 brd ff:ff:ff:ff:ff:ff
> 8: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT
> link/ether ce:0f:16:49:a7:da brd ff:ff:ff:ff:ff:ff
>
> vdsm files before update:
> /var/lib/vdsm
> /var/lib/vdsm/bonding-defaults.json
> /var/lib/vdsm/netconfback
> /var/lib/vdsm/netconfback/ifcfg-ovirtmgmt
> /var/lib/vdsm/netconfback/ifcfg-em1
> /var/lib/vdsm/netconfback/route-ovirtmgmt
> /var/lib/vdsm/netconfback/rule-ovirtmgmt
> /var/lib/vdsm/netconfback/ifcfg-p4p1
> /var/lib/vdsm/netconfback/ifcfg-p3p1
> /var/lib/vdsm/persistence
> /var/lib/vdsm/persistence/netconf
> /var/lib/vdsm/persistence/netconf.1416666697752319079
> /var/lib/vdsm/persistence/netconf.1416666697752319079/nets
> /var/lib/vdsm/persistence/netconf.1416666697752319079/nets/san1
> /var/lib/vdsm/persistence/netconf.1416666697752319079/nets/san2
> /var/lib/vdsm/persistence/netconf.1416666697752319079/nets/ovirtmgmt
> /var/lib/vdsm/upgrade
> /var/lib/vdsm/upgrade/upgrade-unified-persistence
> /var/lib/vdsm/transient
>
>
> Files in /var/lib/vdsm/netconfback each contained only a comment:
> # original file did not exist
This is quite peculiar. Do you know when these were created?
Have you made any networking changes on 3.5.3 just before boot?
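For readers following along: a backup file that holds only that comment implies a backup/restore scheme where "the original did not exist" is recorded explicitly, so restoration knows to delete the file rather than rewrite it. A minimal sketch of that semantics (illustrative names only, not vdsm's actual code):

```python
import os

# Marker written when a path is backed up but had no original file
# (mirrors the netconfback contents shown above; this sketch is an
# assumption about the behaviour, not vdsm's implementation).
MISSING_MARKER = "# original file did not exist\n"

def backup(path, backup_dir):
    """Record the pre-change state of an ifcfg file."""
    dest = os.path.join(backup_dir, os.path.basename(path))
    if os.path.exists(path):
        with open(path) as src, open(dest, "w") as dst:
            dst.write(src.read())
    else:
        with open(dest, "w") as dst:
            dst.write(MISSING_MARKER)

def restore(path, backup_dir):
    """Return the file to its recorded state."""
    dest = os.path.join(backup_dir, os.path.basename(path))
    with open(dest) as f:
        content = f.read()
    if content == MISSING_MARKER:
        # File did not exist before the change: remove it.
        if os.path.exists(path):
            os.remove(path)
    else:
        with open(path, "w") as f:
            f.write(content)
```

If every backup carries the marker, a restore pass would delete every ifcfg file, which is consistent with the empty network-scripts directory seen after reboot.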
> /var/lib/vdsm/persistence/netconf.1416666697752319079/nets/ovirtmgmt
> {"nic": "em1", "netmask": "255.255.255.0", "bootproto": "none", "ipaddr": "1.2.3.16", "gateway": "1.2.3.11"}
>
> /var/lib/vdsm/persistence/netconf.1416666697752319079/nets/san1
> {"nic": "p3p1", "netmask": "255.255.255.0", "ipaddr": "4.5.6.102", "bridged": "false", "mtu": "9000"}
>
> /var/lib/vdsm/persistence/netconf.1416666697752319079/nets/san2
> {"nic": "p4p1", "netmask": "255.255.255.0", "ipaddr": "4.5.7.102", "bridged": "false", "mtu": "9000"}
>
>
> After update and reboot, no ifcfg scripts are left. Only interface lo is up.
> Syslog does not seem to contain anything suspicious before the reboot.
Have you tweaked vdsm.conf in any way? In particular did you set
net_persistence?
> Log excerpts from bootup:
>
> Sep 3 17:27:23 vhm-prd-02 network: Bringing up loopback interface: [ OK ]
> Sep 3 17:27:23 vhm-prd-02 systemd-ovirt-ha-agent: Starting ovirt-ha-agent: [ OK ]
> Sep 3 17:27:23 vhm-prd-02 systemd: Started oVirt Hosted Engine High Availability Monitoring Agent.
> Sep 3 17:27:23 vhm-prd-02 kernel: IPv6: ADDRCONF(NETDEV_UP): em1: link is not ready
> Sep 3 17:27:23 vhm-prd-02 kernel: device em1 entered promiscuous mode
> Sep 3 17:27:23 vhm-prd-02 network: Bringing up interface em1: [ OK ]
> Sep 3 17:27:23 vhm-prd-02 kernel: IPv6: ADDRCONF(NETDEV_UP): ovirtmgmt: link is not ready
> Sep 3 17:27:25 vhm-prd-02 avahi-daemon[778]: Joining mDNS multicast group on interface ovirtmgmt.IPv4 with address 1.2.3.16.
> Sep 3 17:27:25 vhm-prd-02 avahi-daemon[778]: New relevant interface ovirtmgmt.IPv4 for mDNS.
> Sep 3 17:27:25 vhm-prd-02 avahi-daemon[778]: Registering new address record for 1.2.3.16 on ovirtmgmt.IPv4.
> Sep 3 17:27:26 vhm-prd-02 kernel: tg3 0000:03:00.0 em1: Link is up at 1000 Mbps, full duplex
> Sep 3 17:27:26 vhm-prd-02 kernel: tg3 0000:03:00.0 em1: Flow control is off for TX and off for RX
> Sep 3 17:27:26 vhm-prd-02 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): em1: link becomes ready
> Sep 3 17:27:26 vhm-prd-02 kernel: ovirtmgmt: port 1(em1) entered forwarding state
> Sep 3 17:27:26 vhm-prd-02 kernel: ovirtmgmt: port 1(em1) entered forwarding state
> Sep 3 17:27:26 vhm-prd-02 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): ovirtmgmt: link becomes ready
> Sep 3 17:27:26 vhm-prd-02 network: Bringing up interface ovirtmgmt: [ OK ]
> Sep 3 17:27:26 vhm-prd-02 systemd: Started LSB: Bring up/down networking.
> Sep 3 17:27:26 vhm-prd-02 systemd: Starting Network.
> Sep 3 17:27:26 vhm-prd-02 systemd: Reached target Network.
>
> So ovirtmgmt and em1 were restored and initialized just fine (p3p1 and p4p1
> should have been started, too, but engine configured them as ONBOOT=no).
>
> Further in messages (full log is attached):
would you also attach your post-boot supervdsm.log?
> Sep 3 17:27:26 vhm-prd-02 systemd: Starting Virtual Desktop Server Manager network restoration...
> Sep 3 17:27:26 vhm-prd-02 systemd: Started OSAD daemon.
> Sep 3 17:27:27 vhm-prd-02 systemd: Started Terminate Plymouth Boot Screen.
> Sep 3 17:27:27 vhm-prd-02 systemd: Started Wait for Plymouth Boot Screen to Quit.
> Sep 3 17:27:27 vhm-prd-02 systemd: Starting Serial Getty on ttyS1...
> Sep 3 17:27:27 vhm-prd-02 systemd: Started Serial Getty on ttyS1.
> Sep 3 17:27:27 vhm-prd-02 systemd: Starting Getty on tty1...
> Sep 3 17:27:27 vhm-prd-02 systemd: Started Getty on tty1.
> Sep 3 17:27:27 vhm-prd-02 systemd: Starting Login Prompts.
> Sep 3 17:27:27 vhm-prd-02 systemd: Reached target Login Prompts.
> Sep 3 17:27:27 vhm-prd-02 iscsid: iSCSI daemon with pid=1300 started!
> Sep 3 17:27:27 vhm-prd-02 avahi-daemon[778]: Registering new address record for fe80::d267:e5ff:fef0:e5c6 on ovirtmgmt.*.
> Sep 3 17:27:27 vhm-prd-02 kdumpctl: kexec: loaded kdump kernel
> Sep 3 17:27:27 vhm-prd-02 kdumpctl: Starting kdump: [OK]
> Sep 3 17:27:27 vhm-prd-02 systemd: Started Crash recovery kernel arming.
> Sep 3 17:27:27 vhm-prd-02 avahi-daemon[778]: Registering new address record for fe80::d267:e5ff:fef0:e5c6 on em1.*.
> Sep 3 17:27:27 vhm-prd-02 avahi-daemon[778]: Withdrawing address record for 1.2.3.16 on ovirtmgmt.
> Sep 3 17:27:27 vhm-prd-02 avahi-daemon[778]: Leaving mDNS multicast group on interface ovirtmgmt.IPv4 with address 1.2.3.16.
> Sep 3 17:27:27 vhm-prd-02 avahi-daemon[778]: Interface ovirtmgmt.IPv4 no longer relevant for mDNS.
> Sep 3 17:27:27 vhm-prd-02 kernel: ovirtmgmt: port 1(em1) entered disabled state
> Sep 3 17:27:27 vhm-prd-02 avahi-daemon[778]: Withdrawing address record for fe80::d267:e5ff:fef0:e5c6 on ovirtmgmt.
> Sep 3 17:27:28 vhm-prd-02 avahi-daemon[778]: Withdrawing address record for fe80::d267:e5ff:fef0:e5c6 on em1.
> Sep 3 17:27:28 vhm-prd-02 kernel: device em1 left promiscuous mode
> Sep 3 17:27:28 vhm-prd-02 kernel: ovirtmgmt: port 1(em1) entered disabled state
> Sep 3 17:27:28 vhm-prd-02 avahi-daemon[778]: Withdrawing workstation service for ovirtmgmt.
> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: Traceback (most recent call last):
> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: File "/usr/share/vdsm/vdsm-restore-net-config", line 345, in <module>
> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: restore(args)
> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: File "/usr/share/vdsm/vdsm-restore-net-config", line 314, in restore
> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: unified_restoration()
> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: File "/usr/share/vdsm/vdsm-restore-net-config", line 93, in unified_restoration
> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: setupNetworks(nets, bonds, connectivityCheck=False, _inRollback=True)
> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: File "/usr/share/vdsm/network/api.py", line 642, in setupNetworks
> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: implicitBonding=False, _netinfo=_netinfo)
> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: File "/usr/share/vdsm/network/api.py", line 213, in wrapped
> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: ret = func(**attrs)
> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: File "/usr/share/vdsm/network/api.py", line 429, in delNetwork
> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: netEnt.remove()
> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: File "/usr/share/vdsm/network/models.py", line 100, in remove
> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: self.configurator.removeNic(self)
> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: File "/usr/share/vdsm/network/configurators/ifcfg.py", line 215, in removeNic
> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: self.configApplier.removeNic(nic.name)
> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: File "/usr/share/vdsm/network/configurators/ifcfg.py", line 657, in removeNic
> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: with open(cf) as nicFile:
> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: IOError: [Errno 2] No such file or directory: u'/etc/sysconfig/network-scripts/ifcfg-p4p1'
> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: Traceback (most recent call last):
> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: File "/usr/bin/vdsm-tool", line 219, in main
> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: return tool_command[cmd]["command"](*args)
> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: File "/usr/lib/python2.7/site-packages/vdsm/tool/restore_nets.py", line 40, in restore_command
> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: exec_restore(cmd)
> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: File "/usr/lib/python2.7/site-packages/vdsm/tool/restore_nets.py", line 53, in exec_restore
> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: raise EnvironmentError('Failed to restore the persisted networks')
> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: EnvironmentError: Failed to restore the persisted networks
> Sep 3 17:27:28 vhm-prd-02 systemd: vdsm-network.service: main process exited, code=exited, status=1/FAILURE
> Sep 3 17:27:28 vhm-prd-02 systemd: Failed to start Virtual Desktop Server Manager network restoration.
> Sep 3 17:27:28 vhm-prd-02 systemd: Dependency failed for Virtual Desktop Server Manager.
> Sep 3 17:27:28 vhm-prd-02 systemd:
> Sep 3 17:27:28 vhm-prd-02 systemd: Unit vdsm-network.service entered failed state.
> Sep 3 17:27:33 vhm-prd-02 systemd: Started Postfix Mail Transport Agent.
> Sep 3 17:27:33 vhm-prd-02 systemd: Starting Multi-User System.
> Sep 3 17:27:33 vhm-prd-02 systemd: Reached target Multi-User System.
> Sep 3 17:27:33 vhm-prd-02 systemd: Starting Update UTMP about System Runlevel Changes...
> Sep 3 17:27:33 vhm-prd-02 systemd: Starting Stop Read-Ahead Data Collection 10s After Completed Startup.
> Sep 3 17:27:33 vhm-prd-02 systemd: Started Stop Read-Ahead Data Collection 10s After Completed Startup.
> Sep 3 17:27:33 vhm-prd-02 systemd: Started Update UTMP about System Runlevel Changes.
> Sep 3 17:27:33 vhm-prd-02 systemd: Startup finished in 2.964s (kernel) + 2.507s (initrd) + 15.996s (userspace) = 21.468s.
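The traceback above bottoms out in `with open(cf) as nicFile:` on an ifcfg path that no longer exists, and that single IOError aborts the whole restoration. A tolerant variant of just that failing step (illustrative only; whether skipping is the right fix for vdsm is exactly what this thread is trying to establish):

```python
import os

def read_nic_ifcfg(path):
    """Sketch of the step that fails in removeNic above: restoring a
    NIC's default configuration first reads its existing ifcfg file.
    If the file is already gone, as on these hosts, a plain open()
    raises IOError and unified restoration aborts. Treating an absent
    file as 'already removed' lets restoration continue."""
    if not os.path.exists(path):
        return None  # nothing to read; nothing to remove
    with open(path) as nic_file:
        return nic_file.read()
```

Note that the underlying puzzle remains: the ifcfg files should have been present at this point, so making the read tolerant would mask the deletion rather than explain it.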
>
> So, as I have two more hosts, that need updating, I'm happy to assist in
> bisecting and debugging this update issue. Suggestions and help are very
> welcome.
Thanks for this important report. I assume that calling
    vdsClient -s 0 setSafeNetworkConfig
on the host before upgrade would make your problems go away. Please do
not do that yet - your assistance in debugging this further is
important.
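Until the root cause is found, a simple safety net for the remaining hosts is to snapshot the ifcfg scripts before upgrading, so they can be copied back over IPMI if restoration wipes them again. A small sketch (not an official procedure; paths and names are illustrative):

```python
import glob
import os
import shutil
import time

def snapshot_ifcfg(src="/etc/sysconfig/network-scripts",
                   dst_root="/root/ifcfg-backup"):
    """Copy all ifcfg-* scripts into a timestamped backup directory,
    preserving mtimes, and return that directory's path."""
    dst = os.path.join(dst_root, time.strftime("%Y%m%d-%H%M%S"))
    os.makedirs(dst)
    for path in glob.glob(os.path.join(src, "ifcfg-*")):
        shutil.copy2(path, dst)  # copy2 keeps the timestamps noted below
    return dst
```

This is the same manual recovery already used on the first two hosts, just prepared in advance.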
Hi Dan,
From backups I could extract the pre-update timestamps of the files in
/var/lib/vdsm/netconfback:
ifcfg-em1 2015-08-10 16:40:19
ifcfg-ovirtmgmt 2015-08-10 16:40:19
ifcfg-p3p1 2015-08-10 16:40:25
ifcfg-p4p1 2015-08-10 16:40:22
route-ovirtmgmt 2015-08-10 16:40:20
rule-ovirtmgmt 2015-08-10 16:40:20
The ifcfg-scripts had the same corresponding timestamps:
ifcfg-em1 2015-08-10 16:40:19
ifcfg-lo 2015-01-15 09:57:03
ifcfg-ovirtmgmt 2015-08-10 16:40:19
ifcfg-p3p1 2015-08-10 16:40:25
ifcfg-p4p1 2015-08-10 16:40:22
The attached supervdsm.log contains everything from network configuration
done on 2015-08-10 till vdsm update on 2015-09-03 at 17:20 and the reboot
performed afterwards.
The vdsm.conf on these hosts is not tweaked or touched in any way. They
all contain the defaults as configured by engine-setup/deploy:
[vars]
ssl = true
[addresses]
management_port = 54321
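Since the file is plain INI, the absence of a net_persistence override (per Dan's question above) can be confirmed programmatically; with only the sections shown, vdsm falls back to its built-in default for the persistence model. A quick illustrative check:

```python
# Sketch: verify whether vdsm.conf overrides net_persistence.
# The path and option name follow the thread; the checker itself
# is just standard-library INI parsing.
try:
    import configparser               # Python 3
except ImportError:
    import ConfigParser as configparser  # Python 2, as on these CentOS 7 hosts

def net_persistence_setting(conf_path="/etc/vdsm/vdsm.conf"):
    cp = configparser.ConfigParser()
    cp.read(conf_path)
    if cp.has_option("vars", "net_persistence"):
        return cp.get("vars", "net_persistence")
    return None  # not set; vdsm uses its compiled-in default
```

On these hosts this returns None, matching the untouched configuration above.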
Thanks and best regards
Patrick
--
Lobster SCM GmbH, Hindenburgstraße 15, D-82343 Pöcking
HRB 178831, Amtsgericht München
Geschäftsführer: Dr. Martin Fischer, Rolf Henrich