[ovirt-users] Host loses all network configuration on update to oVirt 3.5.4

Michael Burman mburman at redhat.com
Wed Sep 9 11:24:03 EDT 2015


Hi Ondra and all,

I tested your patch and it seems to work. I will explain my test and steps -->
Note - no upgrade was involved here yet.

1) Clean RHEL 7.1 host on 3.5.3 (vt15.3) with vdsm-4.16.20-1.el7ev.x86_64
2) Installed it in the latest RHEV-M 3.5.4 (vt16.9)
3) Configured some networks via Setup Networks (SN)
[root@orchid-vds2 ~]# brctl show
bridge name     bridge id               STP enabled     interfaces
;vdsmdummy;             8000.000000000000       no
rhevm           8000.001a647a9462       no              enp4s0
t1              8000.0015173dcdce       no              ens1f0
t2              8000.0015173dcdcf       no              ens1f1.151
vdsm_net                8000.001a647a9464       no              enp6s0

4) Removed the ifcfg files for the networks 't1', 't2' and 'vdsm_net' (a sketch of this step follows the list)
5) Applied Ondra's patch
6) Rebooted server
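
A minimal shell sketch of step 4 above, assuming the standard ifcfg location shown elsewhere in this thread:

rm /etc/sysconfig/network-scripts/ifcfg-t1 \
   /etc/sysconfig/network-scripts/ifcfg-t2 \
   /etc/sysconfig/network-scripts/ifcfg-vdsm_net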

Result:
Host is UP in RHEV-M; all networks were restored and recovered, as were their ifcfg files.

[root@orchid-vds2 ~]# tree /var/lib/vdsm/netconfback/ifcfg-*
/var/lib/vdsm/netconfback/ifcfg-enp4s0 [error opening dir]
/var/lib/vdsm/netconfback/ifcfg-enp6s0 [error opening dir]
/var/lib/vdsm/netconfback/ifcfg-ens1f0 [error opening dir]
/var/lib/vdsm/netconfback/ifcfg-ens1f1 [error opening dir]
/var/lib/vdsm/netconfback/ifcfg-ens1f1.151 [error opening dir]
/var/lib/vdsm/netconfback/ifcfg-rhevm [error opening dir]
/var/lib/vdsm/netconfback/ifcfg-t1 [error opening dir]
/var/lib/vdsm/netconfback/ifcfg-t2 [error opening dir]
/var/lib/vdsm/netconfback/ifcfg-vdsm_net [error opening dir]

0 directories, 0 files


- Let me know if this is good enough for you guys, or if you want me to run an upgrade from here to the latest 3.5.4 (vdsm 4.16.26).
- Attaching vdsm logs, if needed.

Kind regards,
Michael B 

----- Original Message -----
From: "Ondřej Svoboda" <osvoboda at redhat.com>
To: "Patrick Hurrelmann" <patrick.hurrelmann at lobster.de>
Cc: "Dan Kenigsberg" <danken at redhat.com>, "oVirt Mailing List" <users at ovirt.org>, "Michael Burman" <mburman at redhat.com>
Sent: Wednesday, September 9, 2015 2:19:23 PM
Subject: Re: [ovirt-users] Host loses all network configuration on update to oVirt 3.5.4

Hi everyone,

it turns out that ifcfg files can be lost even in this very simple scenario:

1) Install/upgrade to VDSM 4.16.21/oVirt 3.5.4
2) Set up a network over eth0
   vdsClient -s 0 setupNetworks 'networks={pokus:{nic:eth0,bootproto:dhcp,blockingdhcp:true,bridged:false}}'
3) Persist the configuration (declare it safe)
   vdsClient -s 0 setSafeNetworkConfig
4) Add a placeholder in /var/lib/vdsm/netconfback/ifcfg-eth0 (a one-liner for this step follows the list) with:
# original file did not exist
5) Reboot
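
Step 4 can be done with a one-liner (the path and marker comment are exactly those used by VDSM's backup mechanism, as quoted further down in this thread):

echo '# original file did not exist' > /var/lib/vdsm/netconfback/ifcfg-eth0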

I created a fix [1] and prepared backports to the 3.6 [2] and 3.5 [3] branches (so it will appear in 3.5.5), and linked it to https://bugzilla.redhat.com/show_bug.cgi?id=1256252

Patrick, to apply the patch you can also run the two commands below and paste the diff into patch's standard input (the line after "nicFile.writelines(l)" is a single space, so please re-add it if it gets eaten by e-mail goblins):

cd /usr/share/vdsm/
patch -p1

diff --git vdsm/network/configurators/ifcfg.py vdsm/network/configurators/ifcfg.py
index 161a3b2..8332224 100644
--- vdsm/network/configurators/ifcfg.py
+++ vdsm/network/configurators/ifcfg.py
@@ -647,11 +647,21 @@ class ConfigWriter(object):
     def removeNic(self, nic):
         cf = netinfo.NET_CONF_PREF + nic
         self._backup(cf)
-        with open(cf) as nicFile:
-            hwlines = [line for line in nicFile if line.startswith('HWADDR=')]
+        try:
+            with open(cf) as nicFile:
+                hwlines = [line for line in nicFile if line.startswith(
+                    'HWADDR=')]
+        except IOError as e:
+            logging.warning("%s couldn't be read (errno %s)", cf, e.errno)
+            try:
+                hwlines = ['HWADDR=%s\n' % netinfo.gethwaddr(nic)]
+            except IOError as e:
+                logging.exception("couldn't determine hardware address of %s "
+                                  "(errno %s)", nic, e.errno)
+                hwlines = []
         l = [self.CONFFILE_HEADER + '\n', 'DEVICE=%s\n' % nic, 'ONBOOT=yes\n',
              'MTU=%s\n' % netinfo.DEFAULT_MTU] + hwlines
-        l += 'NM_CONTROLLED=no\n'
+        l.append('NM_CONTROLLED=no\n')
         with open(cf, 'w') as nicFile:
             nicFile.writelines(l)
 
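
For clarity, the core of the fix: when the nic's ifcfg file is unreadable, the HWADDR line is regenerated from the kernel instead of the restoration aborting. A standalone sketch of the same fallback, assuming netinfo.gethwaddr() reads the address from sysfs (the helper name here is hypothetical, not vdsm's API):

import logging

def hwaddr_lines(nic, conf_file):
    """Extract the HWADDR= line from an ifcfg file, or rebuild it from sysfs."""
    try:
        with open(conf_file) as nic_file:
            return [line for line in nic_file if line.startswith('HWADDR=')]
    except IOError as e:
        logging.warning("%s couldn't be read (errno %s)", conf_file, e.errno)
        try:
            # Ask the kernel for the MAC directly, as netinfo.gethwaddr() does.
            with open('/sys/class/net/%s/address' % nic) as addr_file:
                return ['HWADDR=%s\n' % addr_file.read().strip()]
        except IOError:
            logging.exception("couldn't determine hardware address of %s", nic)
            return []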

Michael, will you please give it a try as well?

Thanks,
Ondra

[1] https://gerrit.ovirt.org/#/c/45893/
[2] https://gerrit.ovirt.org/#/c/45932/
[3] https://gerrit.ovirt.org/#/c/45933/

----- Original Message -----
> From: "Patrick Hurrelmann" <patrick.hurrelmann at lobster.de>
> To: "Dan Kenigsberg" <danken at redhat.com>
> Cc: "oVirt Mailing List" <users at ovirt.org>
> Sent: Monday, September 7, 2015 2:46:05 PM
> Subject: Re: [ovirt-users] Host loses all network configuration on update to oVirt 3.5.4
> 
> On 07.09.2015 14:44, Patrick Hurrelmann wrote:
> > On 07.09.2015 13:54, Dan Kenigsberg wrote:
> >> On Mon, Sep 07, 2015 at 11:47:48AM +0200, Patrick Hurrelmann wrote:
> >>> On 06.09.2015 11:30, Dan Kenigsberg wrote:
> >>>> On Fri, Sep 04, 2015 at 10:26:39AM +0200, Patrick Hurrelmann wrote:
> >>>>> Hi all,
> >>>>>
> >>>>> I just updated my existing oVirt 3.5.3 installation (iSCSI
> >>>>> hosted-engine on
> >>>>> CentOS 7.1). The engine update went fine. Updating the hosts succeeds
> >>>>> until the
> >>>>> first reboot. After a reboot the host does not come up again. It is
> >>>>> missing all
> >>>>> network configuration. All network cfgs in
> >>>>> /etc/sysconfig/network-scripts are
> >>>>> missing except ifcfg-lo. The host boots up without working networking.
> >>>>> Using
> >>>>> IPMI and config backups, I was able to restore the lost network
> >>>>> configs. Once
> >>>>> these are restored and the host is rebooted again all seems to be back
> >>>>> to good.
> >>>>> This has now happend to 2 updated hosts (this installation has a total
> >>>>> of 4
> >>>>> hosts, so 2 more to debug/try). I'm happy to assist in furter
> >>>>> debugging.
> >>>>>
> >>>>> Before updating the second host, I gathered some information. All these
> >>>>> hosts
> >>>>> have 3 physical nics. One is used for the ovirtmgmt bridge and the
> >>>>> other 2 are
> >>>>> used for iSCSI storage vlans.
> >>>>>
> >>>>> ifcfgs before update:
> >>>>>
> >>>>> /etc/sysconfig/network-scripts/ifcfg-em1
> >>>>> # Generated by VDSM version 4.16.20-0.el7.centos
> >>>>> DEVICE=em1
> >>>>> HWADDR=d0:67:e5:f0:e5:c6
> >>>>> BRIDGE=ovirtmgmt
> >>>>> ONBOOT=yes
> >>>>> NM_CONTROLLED=no
> >>>> /etc/sysconfig/network-scripts/ifcfg-lo
> >>>>> DEVICE=lo
> >>>>> IPADDR=127.0.0.1
> >>>>> NETMASK=255.0.0.0
> >>>>> NETWORK=127.0.0.0
> >>>>> # If you're having problems with gated making 127.0.0.0/8 a martian,
> >>>>> # you can change this to something else (255.255.255.255, for example)
> >>>>> BROADCAST=127.255.255.255
> >>>>> ONBOOT=yes
> >>>>> NAME=loopback
> >>>>>
> >>>>> /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt
> >>>>> # Generated by VDSM version 4.16.20-0.el7.centos
> >>>>> DEVICE=ovirtmgmt
> >>>>> TYPE=Bridge
> >>>>> DELAY=0
> >>>>> STP=off
> >>>>> ONBOOT=yes
> >>>>> IPADDR=1.2.3.16
> >>>>> NETMASK=255.255.255.0
> >>>>> GATEWAY=1.2.3.11
> >>>>> BOOTPROTO=none
> >>>>> DEFROUTE=yes
> >>>>> NM_CONTROLLED=no
> >>>>> HOTPLUG=no
> >>>>>
> >>>>> /etc/sysconfig/network-scripts/ifcfg-p4p1
> >>>>> # Generated by VDSM version 4.16.20-0.el7.centos
> >>>>> DEVICE=p4p1
> >>>>> HWADDR=68:05:ca:01:bc:0c
> >>>>> ONBOOT=no
> >>>>> IPADDR=4.5.7.102
> >>>>> NETMASK=255.255.255.0
> >>>>> BOOTPROTO=none
> >>>>> MTU=9000
> >>>>> DEFROUTE=no
> >>>>> NM_CONTROLLED=no
> >>>>>
> >>>>> /etc/sysconfig/network-scripts/ifcfg-p3p1
> >>>>> # Generated by VDSM version 4.16.20-0.el7.centos
> >>>>> DEVICE=p3p1
> >>>>> HWADDR=68:05:ca:18:86:45
> >>>>> ONBOOT=no
> >>>>> IPADDR=4.5.6.102
> >>>>> NETMASK=255.255.255.0
> >>>>> BOOTPROTO=none
> >>>>> MTU=9000
> >>>>> DEFROUTE=no
> >>>>> NM_CONTROLLED=no
> >>>>>
> >>>>> /etc/sysconfig/network-scripts/ifcfg-lo
> >>>>>
> >>>>>
> >>>>> ip link before update:
> >>>>> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
> >>>>> mode DEFAULT
> >>>>>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> >>>>> 2: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN
> >>>>> mode DEFAULT
> >>>>>     link/ether 46:50:22:7a:f3:9d brd ff:ff:ff:ff:ff:ff
> >>>>> 3: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master
> >>>>> ovirtmgmt state UP mode DEFAULT qlen 1000
> >>>>>     link/ether d0:67:e5:f0:e5:c6 brd ff:ff:ff:ff:ff:ff
> >>>>> 4: p3p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc pfifo_fast
> >>>>> state UP mode DEFAULT qlen 1000
> >>>>>     link/ether 68:05:ca:18:86:45 brd ff:ff:ff:ff:ff:ff
> >>>>> 5: p4p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc pfifo_fast
> >>>>> state UP mode DEFAULT qlen 1000
> >>>>>     link/ether 68:05:ca:01:bc:0c brd ff:ff:ff:ff:ff:ff
> >>>>> 7: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
> >>>>> state UP mode DEFAULT
> >>>>>     link/ether d0:67:e5:f0:e5:c6 brd ff:ff:ff:ff:ff:ff
> >>>>> 8: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
> >>>>> mode DEFAULT
> >>>>>     link/ether ce:0f:16:49:a7:da brd ff:ff:ff:ff:ff:ff
> >>>>>
> >>>>> vdsm files before update:
> >>>>> /var/lib/vdsm
> >>>>> /var/lib/vdsm/bonding-defaults.json
> >>>>> /var/lib/vdsm/netconfback
> >>>>> /var/lib/vdsm/netconfback/ifcfg-ovirtmgmt
> >>>>> /var/lib/vdsm/netconfback/ifcfg-em1
> >>>>> /var/lib/vdsm/netconfback/route-ovirtmgmt
> >>>>> /var/lib/vdsm/netconfback/rule-ovirtmgmt
> >>>>> /var/lib/vdsm/netconfback/ifcfg-p4p1
> >>>>> /var/lib/vdsm/netconfback/ifcfg-p3p1
> >>>>> /var/lib/vdsm/persistence
> >>>>> /var/lib/vdsm/persistence/netconf
> >>>>> /var/lib/vdsm/persistence/netconf.1416666697752319079
> >>>>> /var/lib/vdsm/persistence/netconf.1416666697752319079/nets
> >>>>> /var/lib/vdsm/persistence/netconf.1416666697752319079/nets/san1
> >>>>> /var/lib/vdsm/persistence/netconf.1416666697752319079/nets/san2
> >>>>> /var/lib/vdsm/persistence/netconf.1416666697752319079/nets/ovirtmgmt
> >>>>> /var/lib/vdsm/upgrade
> >>>>> /var/lib/vdsm/upgrade/upgrade-unified-persistence
> >>>>> /var/lib/vdsm/transient
> >>>>>
> >>>>>
> >>>>> File in /var/lib/vdsm/netconfback each only contained a comment:
> >>>>> # original file did not exist
> >>>> This is quite peculiar. Do you know when these were created?
> >>>> Have you made any networking changes on 3.5.3 just before boot?
> >>>>
> >>>>> /var/lib/vdsm/persistence/netconf.1416666697752319079/nets/ovirtmgmt
> >>>>> {"nic": "em1", "netmask": "255.255.255.0", "bootproto": "none",
> >>>>> "ipaddr": "1.2.3.16", "gateway": "1.2.3.11"}
> >>>>>
> >>>>> /var/lib/vdsm/persistence/netconf.1416666697752319079/nets/san1
> >>>>> {"nic": "p3p1", "netmask": "255.255.255.0", "ipaddr": "4.5.6.102",
> >>>>> "bridged": "false", "mtu": "9000"}
> >>>>>
> >>>>> /var/lib/vdsm/persistence/netconf.1416666697752319079/nets/san2
> >>>>> {"nic": "p4p1", "netmask": "255.255.255.0", "ipaddr": "4.5.7.102",
> >>>>> "bridged": "false", "mtu": "9000"}
> >>>>>
> >>>>>
> >>>>> After update and reboot, no ifcfg scripts are left. Only interface lo
> >>>>> is up.
> >>>>> Syslog does not seem to contain anything suspicious before the
> >>>>> reboot.
> >>>> Have you tweaked vdsm.conf in any way? In particular did you set
> >>>> net_persistence?
> >>>>
> >>>>> Log excerpts from bootup:
> >>>>>
> >>>>> Sep  3 17:27:23 vhm-prd-02 network: Bringing up loopback interface:  [
> >>>>> OK  ]
> >>>>> Sep  3 17:27:23 vhm-prd-02 systemd-ovirt-ha-agent: Starting
> >>>>> ovirt-ha-agent: [  OK  ]
> >>>>> Sep  3 17:27:23 vhm-prd-02 systemd: Started oVirt Hosted Engine High
> >>>>> Availability Monitoring Agent.
> >>>>> Sep  3 17:27:23 vhm-prd-02 kernel: IPv6: ADDRCONF(NETDEV_UP): em1: link
> >>>>> is not ready
> >>>>> Sep  3 17:27:23 vhm-prd-02 kernel: device em1 entered promiscuous mode
> >>>>> Sep  3 17:27:23 vhm-prd-02 network: Bringing up interface em1:  [  OK
> >>>>> ]
> >>>>> Sep  3 17:27:23 vhm-prd-02 kernel: IPv6: ADDRCONF(NETDEV_UP):
> >>>>> ovirtmgmt: link is not ready
> >>>>> Sep  3 17:27:25 vhm-prd-02 avahi-daemon[778]: Joining mDNS multicast
> >>>>> group on interface ovirtmgmt.IPv4 with address 1.2.3.16.
> >>>>> Sep  3 17:27:25 vhm-prd-02 avahi-daemon[778]: New relevant interface
> >>>>> ovirtmgmt.IPv4 for mDNS.
> >>>>> Sep  3 17:27:25 vhm-prd-02 avahi-daemon[778]: Registering new address
> >>>>> record for 1.2.3.16 on ovirtmgmt.IPv4.
> >>>>> Sep  3 17:27:26 vhm-prd-02 kernel: tg3 0000:03:00.0 em1: Link is up at
> >>>>> 1000 Mbps, full duplex
> >>>>> Sep  3 17:27:26 vhm-prd-02 kernel: tg3 0000:03:00.0 em1: Flow control
> >>>>> is off for TX and off for RX
> >>>>> Sep  3 17:27:26 vhm-prd-02 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): em1:
> >>>>> link becomes ready
> >>>>> Sep  3 17:27:26 vhm-prd-02 kernel: ovirtmgmt: port 1(em1) entered
> >>>>> forwarding state
> >>>>> Sep  3 17:27:26 vhm-prd-02 kernel: ovirtmgmt: port 1(em1) entered
> >>>>> forwarding state
> >>>>> Sep  3 17:27:26 vhm-prd-02 kernel: IPv6: ADDRCONF(NETDEV_CHANGE):
> >>>>> ovirtmgmt: link becomes ready
> >>>>> Sep  3 17:27:26 vhm-prd-02 network: Bringing up interface ovirtmgmt:  [
> >>>>> OK  ]
> >>>>> Sep  3 17:27:26 vhm-prd-02 systemd: Started LSB: Bring up/down
> >>>>> networking.
> >>>>> Sep  3 17:27:26 vhm-prd-02 systemd: Starting Network.
> >>>>> Sep  3 17:27:26 vhm-prd-02 systemd: Reached target Network.
> >>>>>
> >>>>> So ovirtmgmt and em1 were restored and initialized just fine (p3p1 and
> >>>>> p4p1
> >>>>> should have been started, too, but engine configured them as
> >>>>> ONBOOT=no).
> >>>>>
> >>>>> Further in messages (full log is attached):
> >>>> would you also attach your post-boot supervdsm.log?
> >>>>
> >>>>> Sep  3 17:27:26 vhm-prd-02 systemd: Starting Virtual Desktop Server
> >>>>> Manager network restoration...
> >>>>> Sep  3 17:27:26 vhm-prd-02 systemd: Started OSAD daemon.
> >>>>> Sep  3 17:27:27 vhm-prd-02 systemd: Started Terminate Plymouth Boot
> >>>>> Screen.
> >>>>> Sep  3 17:27:27 vhm-prd-02 systemd: Started Wait for Plymouth Boot
> >>>>> Screen to Quit.
> >>>>> Sep  3 17:27:27 vhm-prd-02 systemd: Starting Serial Getty on ttyS1...
> >>>>> Sep  3 17:27:27 vhm-prd-02 systemd: Started Serial Getty on ttyS1.
> >>>>> Sep  3 17:27:27 vhm-prd-02 systemd: Starting Getty on tty1...
> >>>>> Sep  3 17:27:27 vhm-prd-02 systemd: Started Getty on tty1.
> >>>>> Sep  3 17:27:27 vhm-prd-02 systemd: Starting Login Prompts.
> >>>>> Sep  3 17:27:27 vhm-prd-02 systemd: Reached target Login Prompts.
> >>>>> Sep  3 17:27:27 vhm-prd-02 iscsid: iSCSI daemon with pid=1300 started!
> >>>>> Sep  3 17:27:27 vhm-prd-02 avahi-daemon[778]: Registering new address
> >>>>> record for fe80::d267:e5ff:fef0:e5c6 on ovirtmgmt.*.
> >>>>> Sep  3 17:27:27 vhm-prd-02 kdumpctl: kexec: loaded kdump kernel
> >>>>> Sep  3 17:27:27 vhm-prd-02 kdumpctl: Starting kdump: [OK]
> >>>>> Sep  3 17:27:27 vhm-prd-02 systemd: Started Crash recovery kernel
> >>>>> arming.
> >>>>> Sep  3 17:27:27 vhm-prd-02 avahi-daemon[778]: Registering new address
> >>>>> record for fe80::d267:e5ff:fef0:e5c6 on em1.*.
> >>>>> Sep  3 17:27:27 vhm-prd-02 avahi-daemon[778]: Withdrawing address
> >>>>> record for 1.2.3.16 on ovirtmgmt.
> >>>>> Sep  3 17:27:27 vhm-prd-02 avahi-daemon[778]: Leaving mDNS multicast
> >>>>> group on interface ovirtmgmt.IPv4 with address 1.2.3.16.
> >>>>> Sep  3 17:27:27 vhm-prd-02 avahi-daemon[778]: Interface ovirtmgmt.IPv4
> >>>>> no longer relevant for mDNS.
> >>>>> Sep  3 17:27:27 vhm-prd-02 kernel: ovirtmgmt: port 1(em1) entered
> >>>>> disabled state
> >>>>> Sep  3 17:27:27 vhm-prd-02 avahi-daemon[778]: Withdrawing address
> >>>>> record for fe80::d267:e5ff:fef0:e5c6 on ovirtmgmt.
> >>>>> Sep  3 17:27:28 vhm-prd-02 avahi-daemon[778]: Withdrawing address
> >>>>> record for fe80::d267:e5ff:fef0:e5c6 on em1.
> >>>>> Sep  3 17:27:28 vhm-prd-02 kernel: device em1 left promiscuous mode
> >>>>> Sep  3 17:27:28 vhm-prd-02 kernel: ovirtmgmt: port 1(em1) entered
> >>>>> disabled state
> >>>>> Sep  3 17:27:28 vhm-prd-02 avahi-daemon[778]: Withdrawing workstation
> >>>>> service for ovirtmgmt.
> >>>>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: Traceback (most recent call
> >>>>> last):
> >>>>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: File
> >>>>> "/usr/share/vdsm/vdsm-restore-net-config", line 345, in <module>
> >>>>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: restore(args)
> >>>>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: File
> >>>>> "/usr/share/vdsm/vdsm-restore-net-config", line 314, in restore
> >>>>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: unified_restoration()
> >>>>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: File
> >>>>> "/usr/share/vdsm/vdsm-restore-net-config", line 93, in
> >>>>> unified_restoration
> >>>>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: setupNetworks(nets, bonds,
> >>>>> connectivityCheck=False, _inRollback=True)
> >>>>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: File
> >>>>> "/usr/share/vdsm/network/api.py", line 642, in setupNetworks
> >>>>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: implicitBonding=False,
> >>>>> _netinfo=_netinfo)
> >>>>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: File
> >>>>> "/usr/share/vdsm/network/api.py", line 213, in wrapped
> >>>>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: ret = func(**attrs)
> >>>>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: File
> >>>>> "/usr/share/vdsm/network/api.py", line 429, in delNetwork
> >>>>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: netEnt.remove()
> >>>>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: File
> >>>>> "/usr/share/vdsm/network/models.py", line 100, in remove
> >>>>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: self.configurator.removeNic(self)
> >>>>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: File
> >>>>> "/usr/share/vdsm/network/configurators/ifcfg.py", line 215, in
> >>>>> removeNic
> >>>>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool:
> >>>>> self.configApplier.removeNic(nic.name)
> >>>>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: File
> >>>>> "/usr/share/vdsm/network/configurators/ifcfg.py", line 657, in
> >>>>> removeNic
> >>>>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: with open(cf) as nicFile:
> >>>>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: IOError: [Errno 2] No such file
> >>>>> or directory: u'/etc/sysconfig/network-scripts/ifcfg-p4p1'
> >>>>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: Traceback (most recent call
> >>>>> last):
> >>>>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: File "/usr/bin/vdsm-tool", line
> >>>>> 219, in main
> >>>>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: return
> >>>>> tool_command[cmd]["command"](*args)
> >>>>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: File
> >>>>> "/usr/lib/python2.7/site-packages/vdsm/tool/restore_nets.py", line 40,
> >>>>> in restore_command
> >>>>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: exec_restore(cmd)
> >>>>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: File
> >>>>> "/usr/lib/python2.7/site-packages/vdsm/tool/restore_nets.py", line 53,
> >>>>> in exec_restore
> >>>>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: raise EnvironmentError('Failed to
> >>>>> restore the persisted networks')
> >>>>> Sep  3 17:27:28 vhm-prd-02 vdsm-tool: EnvironmentError: Failed to
> >>>>> restore the persisted networks
> >>>>> Sep  3 17:27:28 vhm-prd-02 systemd: vdsm-network.service: main process
> >>>>> exited, code=exited, status=1/FAILURE
> >>>>> Sep  3 17:27:28 vhm-prd-02 systemd: Failed to start Virtual Desktop
> >>>>> Server Manager network restoration.
> >>>>> Sep  3 17:27:28 vhm-prd-02 systemd: Dependency failed for Virtual
> >>>>> Desktop Server Manager.
> >>>>> Sep  3 17:27:28 vhm-prd-02 systemd:
> >>>>> Sep  3 17:27:28 vhm-prd-02 systemd: Unit vdsm-network.service entered
> >>>>> failed state.
> >>>>> Sep  3 17:27:33 vhm-prd-02 systemd: Started Postfix Mail Transport
> >>>>> Agent.
> >>>>> Sep  3 17:27:33 vhm-prd-02 systemd: Starting Multi-User System.
> >>>>> Sep  3 17:27:33 vhm-prd-02 systemd: Reached target Multi-User System.
> >>>>> Sep  3 17:27:33 vhm-prd-02 systemd: Starting Update UTMP about System
> >>>>> Runlevel Changes...
> >>>>> Sep  3 17:27:33 vhm-prd-02 systemd: Starting Stop Read-Ahead Data
> >>>>> Collection 10s After Completed Startup.
> >>>>> Sep  3 17:27:33 vhm-prd-02 systemd: Started Stop Read-Ahead Data
> >>>>> Collection 10s After Completed Startup.
> >>>>> Sep  3 17:27:33 vhm-prd-02 systemd: Started Update UTMP about System
> >>>>> Runlevel Changes.
> >>>>> Sep  3 17:27:33 vhm-prd-02 systemd: Startup finished in 2.964s (kernel)
> >>>>> + 2.507s (initrd) + 15.996s (userspace) = 21.468s.
> >>>>>
> >>>>> So, as I have two more hosts that need updating, I'm happy to assist
> >>>>> in
> >>>>> bisecting and debugging this update issue. Suggestions and help are
> >>>>> very
> >>>>> welcome.
> >>>> Thanks for this important report. I assume that calling
> >>>>
> >>>>   vdsClient -s 0 setSafeNetworkConfig
> >>>>
> >>>> on the host before upgrade would make your problems go away, please do
> >>>> not do that yet - your assistance in debugging this further is
> >>>> important.
> >>> Hi Dan,
> >>>
> >>> From backups I could extract the pre-update timestamps of the files in
> >>> /var/lib/vdsm/netconfback:
> >>> ifcfg-em1       2015-08-10 16:40:19
> >>> ifcfg-ovirtmgmt 2015-08-10 16:40:19
> >>> ifcfg-p3p1      2015-08-10 16:40:25
> >>> ifcfg-p4p1      2015-08-10 16:40:22
> >>> route-ovirtmgmt 2015-08-10 16:40:20
> >>> rule-ovirtmgmt  2015-08-10 16:40:20
> >>>
> >>> The ifcfg-scripts had the same corresponding timestamps:
> >>> ifcfg-em1       2015-08-10 16:40:19
> >>> ifcfg-lo        2015-01-15 09:57:03
> >>> ifcfg-ovirtmgmt 2015-08-10 16:40:19
> >>> ifcfg-p3p1      2015-08-10 16:40:25
> >>> ifcfg-p4p1      2015-08-10 16:40:22
> >> Do you recall what has been done on 2015-08-10?
> >> Was your 3.5.3 host rebooted ever since?
> > I just tried to reconstruct the happenings on 2015-08-10 and it seems
> > that, in fact, the network configuration was not touched. I was misled by
> > the dates. At that date/time a new kernel and some other CentOS rpms were
> > installed (the whole cluster was updated one by one). A reboot of this
> > specific host was initiated after the update, at 2015-08-10 16:40:04. The
> > timestamps from my previous email seem to fall _within_ that bootup
> > process. So yes, the host has been rebooted since the update to 3.5.3
> > (which happened on 2015-06-15).
> >
> > Reboots since 2015-06-15:
> > reboot   system boot  3.10.0-229.11.1. Mon Aug 10 16:56 - 14:34 (27+21:37)
> > reboot   system boot  3.10.0-229.7.2.e Mon Jul 27 17:48 - 16:53 (13+23:05)
> > reboot   system boot  3.10.0-229.7.2.e Wed Jun 24 16:46 - 17:46 (33+00:59)
> > reboot   system boot  3.10.0-229.4.2.e Mon Jun 15 18:10 - 16:44 (8+22:34)
> Wrong reboot list. The correct reboots for this host are:
> reboot   system boot  3.10.0-229.11.1. Thu Sep  3 17:42 - 13:58 (3+20:16)
> reboot   system boot  3.10.0-229.11.1. Thu Sep  3 17:27 - 17:40  (00:12)
> reboot   system boot  3.10.0-229.11.1. Mon Aug 10 16:40 - 17:23 (24+00:43)
> reboot   system boot  3.10.0-229.7.2.e Mon Jul 27 16:52 - 16:33 (13+23:40)
> reboot   system boot  3.10.0-229.7.2.e Thu Jul  9 11:10 - 16:49 (18+05:38)
> reboot   system boot  3.10.0-229.4.2.e Wed Jun 17 17:27 - 11:07 (21+17:40)
> reboot   system boot  3.10.0-229.4.2.e Mon Jun 15 17:22 - 17:23 (2+00:01)
> 
> > I checked the 2 remaining hosts (still 3.5.3) and neither has any
> > different content in /var/lib/vdsm/netconfback. Again, only single-line
> > comments:
> > # original file did not exist
> >
> > My other production oVirt 3.4 hosts don't even have these. The directory
> > /var/lib/vdsm/netconfback is empty on those.
> >
> > What should/could I check on the remaining 2 hosts prior to the update?
> > Try syncing the network configuration and verifying the contents of
> > /var/lib/vdsm/netconfback?
> >
> >> If the networks have been configured on the host back then, but never
> >> persisted, any reboot (regardless of upgrade) would cause their removal.
> >>
> >> Vdsm should be more robust in handling missing ifcfg files; but that's a
> >> second-order bug:
> >>
> >>     1256252     Vdsm should recover ifcfg files in case they no
> >>     longer exist and recover all networks on the server
> >>
> >> I'd like to first understand how come you have these placeholders left
> >> behind.
> >>
> >>> The attached supervdsm.log contains everything from the network
> >>> configuration done on 2015-08-10 until the vdsm update on 2015-09-03 at
> >>> 17:20 and the reboot performed afterwards.
> >> Thanks. Maybe Ido could find further hints inside it.
> 
> 
> --
> Lobster SCM GmbH, Hindenburgstraße 15, D-82343 Pöcking
> HRB 178831, Amtsgericht München
> Geschäftsführer: Dr. Martin Fischer, Rolf Henrich
> 
> _______________________________________________
> Users mailing list
> Users at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
> 

-- 
Michael Burman
RedHat Israel, RHEV-M QE Network Team

Mobile: 054-5355725
IRC: mburman
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vdsm logs.tar.gz
Type: application/x-compressed-tar
Size: 283385 bytes
Desc: not available
URL: <http://lists.ovirt.org/pipermail/users/attachments/20150909/0fb0e175/attachment-0001.bin>

