[ovirt-users] Host loses all network configuration on update to oVirt 3.5.4

Patrick Hurrelmann patrick.hurrelmann at lobster.de
Mon Sep 7 09:47:48 UTC 2015


On 06.09.2015 11:30, Dan Kenigsberg wrote:
> On Fri, Sep 04, 2015 at 10:26:39AM +0200, Patrick Hurrelmann wrote:
>> Hi all,
>>
>> I just updated my existing oVirt 3.5.3 installation (iSCSI hosted-engine on
>> CentOS 7.1). The engine update went fine. Updating the hosts succeeds until the
>> first reboot. After a reboot the host does not come up again. It is missing all
>> network configuration. All network cfgs in /etc/sysconfig/network-scripts are
>> missing except ifcfg-lo. The host boots up without working networking. Using
>> IPMI and config backups, I was able to restore the lost network configs. Once
>> these are restored and the host is rebooted again all seems to be back to good.
>> This has now happened to 2 updated hosts (this installation has a total of 4
>> hosts, so 2 more to debug/try). I'm happy to assist in further debugging.
>>
>> Before updating the second host, I gathered some information. All these hosts
>> have 3 physical nics. One is used for the ovirtmgmt bridge and the other 2 are
>> used for iSCSI storage vlans.
>>
>> ifcfgs before update:
>>
>> /etc/sysconfig/network-scripts/ifcfg-em1
>> # Generated by VDSM version 4.16.20-0.el7.centos
>> DEVICE=em1
>> HWADDR=d0:67:e5:f0:e5:c6
>> BRIDGE=ovirtmgmt
>> ONBOOT=yes
>> NM_CONTROLLED=no
>>
>> /etc/sysconfig/network-scripts/ifcfg-lo
>> DEVICE=lo
>> IPADDR=127.0.0.1
>> NETMASK=255.0.0.0
>> NETWORK=127.0.0.0
>> # If you're having problems with gated making 127.0.0.0/8 a martian,
>> # you can change this to something else (255.255.255.255, for example)
>> BROADCAST=127.255.255.255
>> ONBOOT=yes
>> NAME=loopback
>>
>> /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt
>> # Generated by VDSM version 4.16.20-0.el7.centos
>> DEVICE=ovirtmgmt
>> TYPE=Bridge
>> DELAY=0
>> STP=off
>> ONBOOT=yes
>> IPADDR=1.2.3.16
>> NETMASK=255.255.255.0
>> GATEWAY=1.2.3.11
>> BOOTPROTO=none
>> DEFROUTE=yes
>> NM_CONTROLLED=no
>> HOTPLUG=no
>>
>> /etc/sysconfig/network-scripts/ifcfg-p4p1
>> # Generated by VDSM version 4.16.20-0.el7.centos
>> DEVICE=p4p1
>> HWADDR=68:05:ca:01:bc:0c
>> ONBOOT=no
>> IPADDR=4.5.7.102
>> NETMASK=255.255.255.0
>> BOOTPROTO=none
>> MTU=9000
>> DEFROUTE=no
>> NM_CONTROLLED=no
>>
>> /etc/sysconfig/network-scripts/ifcfg-p3p1
>> # Generated by VDSM version 4.16.20-0.el7.centos
>> DEVICE=p3p1
>> HWADDR=68:05:ca:18:86:45
>> ONBOOT=no
>> IPADDR=4.5.6.102
>> NETMASK=255.255.255.0
>> BOOTPROTO=none
>> MTU=9000
>> DEFROUTE=no
>> NM_CONTROLLED=no
>>
>> /etc/sysconfig/network-scripts/ifcfg-lo
>>
>>
>> ip link before update:
>> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT
>>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>> 2: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN mode DEFAULT
>>     link/ether 46:50:22:7a:f3:9d brd ff:ff:ff:ff:ff:ff
>> 3: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovirtmgmt state UP mode DEFAULT qlen 1000
>>     link/ether d0:67:e5:f0:e5:c6 brd ff:ff:ff:ff:ff:ff
>> 4: p3p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
>>     link/ether 68:05:ca:18:86:45 brd ff:ff:ff:ff:ff:ff
>> 5: p4p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
>>     link/ether 68:05:ca:01:bc:0c brd ff:ff:ff:ff:ff:ff
>> 7: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT
>>     link/ether d0:67:e5:f0:e5:c6 brd ff:ff:ff:ff:ff:ff
>> 8: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT
>>     link/ether ce:0f:16:49:a7:da brd ff:ff:ff:ff:ff:ff
>>
>> vdsm files before update:
>> /var/lib/vdsm
>> /var/lib/vdsm/bonding-defaults.json
>> /var/lib/vdsm/netconfback
>> /var/lib/vdsm/netconfback/ifcfg-ovirtmgmt
>> /var/lib/vdsm/netconfback/ifcfg-em1
>> /var/lib/vdsm/netconfback/route-ovirtmgmt
>> /var/lib/vdsm/netconfback/rule-ovirtmgmt
>> /var/lib/vdsm/netconfback/ifcfg-p4p1
>> /var/lib/vdsm/netconfback/ifcfg-p3p1
>> /var/lib/vdsm/persistence
>> /var/lib/vdsm/persistence/netconf
>> /var/lib/vdsm/persistence/netconf.1416666697752319079
>> /var/lib/vdsm/persistence/netconf.1416666697752319079/nets
>> /var/lib/vdsm/persistence/netconf.1416666697752319079/nets/san1
>> /var/lib/vdsm/persistence/netconf.1416666697752319079/nets/san2
>> /var/lib/vdsm/persistence/netconf.1416666697752319079/nets/ovirtmgmt
>> /var/lib/vdsm/upgrade
>> /var/lib/vdsm/upgrade/upgrade-unified-persistence
>> /var/lib/vdsm/transient
>>
>>
>> The files in /var/lib/vdsm/netconfback each contained only a comment:
>> # original file did not exist
> This is quite peculiar. Do you know when these were created?
> Have you made any networking changes on 3.5.3 just before boot?
>
>> /var/lib/vdsm/persistence/netconf.1416666697752319079/nets/ovirtmgmt
>> {"nic": "em1", "netmask": "255.255.255.0", "bootproto": "none", "ipaddr": "1.2.3.16", "gateway": "1.2.3.11"}
>>
>> /var/lib/vdsm/persistence/netconf.1416666697752319079/nets/san1
>> {"nic": "p3p1", "netmask": "255.255.255.0", "ipaddr": "4.5.6.102", "bridged": "false", "mtu": "9000"}
>>
>> /var/lib/vdsm/persistence/netconf.1416666697752319079/nets/san2
>> {"nic": "p4p1", "netmask": "255.255.255.0", "ipaddr": "4.5.7.102", "bridged": "false", "mtu": "9000"}
>>
>>
>> After update and reboot, no ifcfg scripts are left. Only interface lo is up.
>> Syslog does not seem to contain anything suspicious before the reboot.
>
> Have you tweaked vdsm.conf in any way? In particular did you set
> net_persistence?
>
>> Log excerpts from bootup:
>>
>> Sep  3 17:27:23 vhm-prd-02 network: Bringing up loopback interface:  [  OK  ]
>> Sep  3 17:27:23 vhm-prd-02 systemd-ovirt-ha-agent: Starting ovirt-ha-agent: [  OK  ]
>> Sep  3 17:27:23 vhm-prd-02 systemd: Started oVirt Hosted Engine High Availability Monitoring Agent.
>> Sep  3 17:27:23 vhm-prd-02 kernel: IPv6: ADDRCONF(NETDEV_UP): em1: link is not ready
>> Sep  3 17:27:23 vhm-prd-02 kernel: device em1 entered promiscuous mode
>> Sep  3 17:27:23 vhm-prd-02 network: Bringing up interface em1:  [  OK  ]
>> Sep  3 17:27:23 vhm-prd-02 kernel: IPv6: ADDRCONF(NETDEV_UP): ovirtmgmt: link is not ready
>> Sep  3 17:27:25 vhm-prd-02 avahi-daemon[778]: Joining mDNS multicast group on interface ovirtmgmt.IPv4 with address 1.2.3.16.
>> Sep  3 17:27:25 vhm-prd-02 avahi-daemon[778]: New relevant interface ovirtmgmt.IPv4 for mDNS.
>> Sep  3 17:27:25 vhm-prd-02 avahi-daemon[778]: Registering new address record for 1.2.3.16 on ovirtmgmt.IPv4.
>> Sep  3 17:27:26 vhm-prd-02 kernel: tg3 0000:03:00.0 em1: Link is up at 1000 Mbps, full duplex
>> Sep  3 17:27:26 vhm-prd-02 kernel: tg3 0000:03:00.0 em1: Flow control is off for TX and off for RX
>> Sep  3 17:27:26 vhm-prd-02 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): em1: link becomes ready
>> Sep  3 17:27:26 vhm-prd-02 kernel: ovirtmgmt: port 1(em1) entered forwarding state
>> Sep  3 17:27:26 vhm-prd-02 kernel: ovirtmgmt: port 1(em1) entered forwarding state
>> Sep  3 17:27:26 vhm-prd-02 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): ovirtmgmt: link becomes ready
>> Sep  3 17:27:26 vhm-prd-02 network: Bringing up interface ovirtmgmt:  [  OK  ]
>> Sep  3 17:27:26 vhm-prd-02 systemd: Started LSB: Bring up/down networking.
>> Sep  3 17:27:26 vhm-prd-02 systemd: Starting Network.
>> Sep  3 17:27:26 vhm-prd-02 systemd: Reached target Network.
>>
>> So ovirtmgmt and em1 were restored and initialized just fine (p3p1 and p4p1
>> should have been started, too, but engine configured them as ONBOOT=no).
>>
>> Further in messages (full log is attached):
> would you also attach your post-boot supervdsm.log?
>
>> Sep  3 17:27:26 vhm-prd-02 systemd: Starting Virtual Desktop Server Manager network restoration...
>> Sep  3 17:27:26 vhm-prd-02 systemd: Started OSAD daemon.
>> Sep  3 17:27:27 vhm-prd-02 systemd: Started Terminate Plymouth Boot Screen.
>> Sep  3 17:27:27 vhm-prd-02 systemd: Started Wait for Plymouth Boot Screen to Quit.
>> Sep  3 17:27:27 vhm-prd-02 systemd: Starting Serial Getty on ttyS1...
>> Sep  3 17:27:27 vhm-prd-02 systemd: Started Serial Getty on ttyS1.
>> Sep  3 17:27:27 vhm-prd-02 systemd: Starting Getty on tty1...
>> Sep  3 17:27:27 vhm-prd-02 systemd: Started Getty on tty1.
>> Sep  3 17:27:27 vhm-prd-02 systemd: Starting Login Prompts.
>> Sep  3 17:27:27 vhm-prd-02 systemd: Reached target Login Prompts.
>> Sep  3 17:27:27 vhm-prd-02 iscsid: iSCSI daemon with pid=1300 started!
>> Sep  3 17:27:27 vhm-prd-02 avahi-daemon[778]: Registering new address record for fe80::d267:e5ff:fef0:e5c6 on ovirtmgmt.*.
>> Sep  3 17:27:27 vhm-prd-02 kdumpctl: kexec: loaded kdump kernel
>> Sep  3 17:27:27 vhm-prd-02 kdumpctl: Starting kdump: [OK]
>> Sep  3 17:27:27 vhm-prd-02 systemd: Started Crash recovery kernel arming.
>> Sep  3 17:27:27 vhm-prd-02 avahi-daemon[778]: Registering new address record for fe80::d267:e5ff:fef0:e5c6 on em1.*.
>> Sep  3 17:27:27 vhm-prd-02 avahi-daemon[778]: Withdrawing address record for 1.2.3.16 on ovirtmgmt.
>> Sep  3 17:27:27 vhm-prd-02 avahi-daemon[778]: Leaving mDNS multicast group on interface ovirtmgmt.IPv4 with address 1.2.3.16.
>> Sep  3 17:27:27 vhm-prd-02 avahi-daemon[778]: Interface ovirtmgmt.IPv4 no longer relevant for mDNS.
>> Sep  3 17:27:27 vhm-prd-02 kernel: ovirtmgmt: port 1(em1) entered disabled state
>> Sep  3 17:27:27 vhm-prd-02 avahi-daemon[778]: Withdrawing address record for fe80::d267:e5ff:fef0:e5c6 on ovirtmgmt.
>> Sep  3 17:27:28 vhm-prd-02 avahi-daemon[778]: Withdrawing address record for fe80::d267:e5ff:fef0:e5c6 on em1.
>> Sep  3 17:27:28 vhm-prd-02 kernel: device em1 left promiscuous mode
>> Sep  3 17:27:28 vhm-prd-02 kernel: ovirtmgmt: port 1(em1) entered disabled state
>> Sep  3 17:27:28 vhm-prd-02 avahi-daemon[778]: Withdrawing workstation service for ovirtmgmt.
>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: Traceback (most recent call last):
>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: File "/usr/share/vdsm/vdsm-restore-net-config", line 345, in <module>
>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: restore(args)
>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: File "/usr/share/vdsm/vdsm-restore-net-config", line 314, in restore
>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: unified_restoration()
>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: File "/usr/share/vdsm/vdsm-restore-net-config", line 93, in unified_restoration
>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: setupNetworks(nets, bonds, connectivityCheck=False, _inRollback=True)
>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: File "/usr/share/vdsm/network/api.py", line 642, in setupNetworks
>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: implicitBonding=False, _netinfo=_netinfo)
>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: File "/usr/share/vdsm/network/api.py", line 213, in wrapped
>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: ret = func(**attrs)
>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: File "/usr/share/vdsm/network/api.py", line 429, in delNetwork
>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: netEnt.remove()
>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: File "/usr/share/vdsm/network/models.py", line 100, in remove
>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: self.configurator.removeNic(self)
>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: File "/usr/share/vdsm/network/configurators/ifcfg.py", line 215, in removeNic
>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: self.configApplier.removeNic(nic.name)
>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: File "/usr/share/vdsm/network/configurators/ifcfg.py", line 657, in removeNic
>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: with open(cf) as nicFile:
>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: IOError: [Errno 2] No such file or directory: u'/etc/sysconfig/network-scripts/ifcfg-p4p1'
>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: Traceback (most recent call last):
>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: File "/usr/bin/vdsm-tool", line 219, in main
>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: return tool_command[cmd]["command"](*args)
>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: File "/usr/lib/python2.7/site-packages/vdsm/tool/restore_nets.py", line 40, in restore_command
>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: exec_restore(cmd)
>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: File "/usr/lib/python2.7/site-packages/vdsm/tool/restore_nets.py", line 53, in exec_restore
>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: raise EnvironmentError('Failed to restore the persisted networks')
>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: EnvironmentError: Failed to restore the persisted networks
>> Sep  3 17:27:28 vhm-prd-02 systemd: vdsm-network.service: main process exited, code=exited, status=1/FAILURE
>> Sep  3 17:27:28 vhm-prd-02 systemd: Failed to start Virtual Desktop Server Manager network restoration.
>> Sep  3 17:27:28 vhm-prd-02 systemd: Dependency failed for Virtual Desktop Server Manager.
>> Sep  3 17:27:28 vhm-prd-02 systemd:
>> Sep  3 17:27:28 vhm-prd-02 systemd: Unit vdsm-network.service entered failed state.
>> Sep  3 17:27:33 vhm-prd-02 systemd: Started Postfix Mail Transport Agent.
>> Sep  3 17:27:33 vhm-prd-02 systemd: Starting Multi-User System.
>> Sep  3 17:27:33 vhm-prd-02 systemd: Reached target Multi-User System.
>> Sep  3 17:27:33 vhm-prd-02 systemd: Starting Update UTMP about System Runlevel Changes...
>> Sep  3 17:27:33 vhm-prd-02 systemd: Starting Stop Read-Ahead Data Collection 10s After Completed Startup.
>> Sep  3 17:27:33 vhm-prd-02 systemd: Started Stop Read-Ahead Data Collection 10s After Completed Startup.
>> Sep  3 17:27:33 vhm-prd-02 systemd: Started Update UTMP about System Runlevel Changes.
>> Sep  3 17:27:33 vhm-prd-02 systemd: Startup finished in 2.964s (kernel) + 2.507s (initrd) + 15.996s (userspace) = 21.468s.
>>
>> So, as I have two more hosts that need updating, I'm happy to assist in
>> bisecting and debugging this update issue. Suggestions and help are very
>> welcome.
> Thanks for this important report. I assume that calling
>
>   vdsClient -s 0 setSafeNetworkConfig
>
> on the host before upgrade would make your problems go away. Please do
> not do that yet, though: your assistance in debugging this further is
> important.
Hi Dan,

From backups I could extract the pre-update timestamps of the files in
/var/lib/vdsm/netconfback:
ifcfg-em1       2015-08-10 16:40:19
ifcfg-ovirtmgmt 2015-08-10 16:40:19
ifcfg-p3p1      2015-08-10 16:40:25
ifcfg-p4p1      2015-08-10 16:40:22
route-ovirtmgmt 2015-08-10 16:40:20
rule-ovirtmgmt  2015-08-10 16:40:20
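
Since the quoted report notes that each file in /var/lib/vdsm/netconfback
contained only the comment "# original file did not exist", the following is a
minimal sketch of how such marker-only backups could be listed. The function
name marker_only_backups is hypothetical, and matching the marker text exactly
is an assumption; this is not part of vdsm.

```python
import os

# Marker text as quoted in this thread; matching it exactly is an assumption.
MARKER = "# original file did not exist"


def marker_only_backups(backup_dir):
    """Return sorted names of files in backup_dir whose only content is MARKER."""
    names = []
    for name in sorted(os.listdir(backup_dir)):
        path = os.path.join(backup_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path) as handle:
            # Ignore blank lines and surrounding whitespace.
            lines = [line.strip() for line in handle if line.strip()]
        if lines == [MARKER]:
            names.append(name)
    return names
```

Calling marker_only_backups("/var/lib/vdsm/netconfback") on a host would then
return the names of the backups that hold no real configuration.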

The ifcfg-scripts had the same corresponding timestamps:
ifcfg-em1       2015-08-10 16:40:19
ifcfg-lo        2015-01-15 09:57:03
ifcfg-ovirtmgmt 2015-08-10 16:40:19
ifcfg-p3p1      2015-08-10 16:40:25
ifcfg-p4p1      2015-08-10 16:40:22
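
The comparison of the two timestamp lists above can be sketched as follows. The
helper names mtimes and differing_mtimes are hypothetical, and the sketch
assumes both directories are readable (here they came from backups); it
compares modification times at whole-second resolution.

```python
import os


def mtimes(directory, prefix="ifcfg-"):
    """Map file name to modification time (whole seconds) for matching files."""
    result = {}
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if name.startswith(prefix) and os.path.isfile(path):
            result[name] = int(os.path.getmtime(path))
    return result


def differing_mtimes(live_dir, backup_dir):
    """Return sorted names present in both directories whose mtimes differ."""
    live = mtimes(live_dir)
    backup = mtimes(backup_dir)
    common = set(live) & set(backup)
    return sorted(name for name in common if live[name] != backup[name])
```

An empty result for /etc/sysconfig/network-scripts versus
/var/lib/vdsm/netconfback would match the observation above; files that exist
on only one side, such as ifcfg-lo, are ignored.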

The attached supervdsm.log contains everything from the network configuration
done on 2015-08-10 until the vdsm update on 2015-09-03 at 17:20 and the reboot
performed afterwards.

The vdsm.conf files on these hosts have not been tweaked or touched in any way.
They all contain the defaults as configured by engine-setup/deploy:

[vars]
ssl = true

[addresses]
management_port = 54321
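
As a minimal sketch of the check asked about earlier in the thread (whether
net_persistence is set), the configuration text can be parsed with Python 3's
standard configparser. The function name net_persistence_override is
hypothetical, and looking for the option in the [vars] section is an
assumption.

```python
import configparser


def net_persistence_override(conf_text):
    """Return the net_persistence value set in [vars], or None if it is unset."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # has_option() returns False when the section or the option is missing.
    if parser.has_option("vars", "net_persistence"):
        return parser.get("vars", "net_persistence")
    return None


# The configuration content quoted in this message.
DEFAULT_CONF = """\
[vars]
ssl = true

[addresses]
management_port = 54321
"""

print(net_persistence_override(DEFAULT_CONF))  # prints None
```

For the content shown here the function returns None, which is consistent with
the option not having been set explicitly.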

Thanks and best regards
Patrick

-- 
Lobster SCM GmbH, Hindenburgstraße 15, D-82343 Pöcking
HRB 178831, Amtsgericht München
Geschäftsführer: Dr. Martin Fischer, Rolf Henrich

-------------- next part --------------
A non-text attachment was scrubbed...
Name: supervdsm.log
Type: text/x-log
Size: 151902 bytes
Desc: not available
URL: <http://lists.ovirt.org/pipermail/users/attachments/20150907/273cf334/attachment-0001.bin>

