Hi everyone,
it turns out that ifcfg files can be lost even in this very simple scenario:
1) Install/upgrade to VDSM 4.16.21/oVirt 3.5.4
2) Setup a network over eth0
vdsClient -s 0 setupNetworks
'networks={pokus:{nic:eth0,bootproto:dhcp,blockingdhcp:true,bridged:false}}'
3) Persist the configuration (declare it safe)
vdsClient -s 0 setSafeNetworkConfig
4) Add a placeholder in /var/lib/vdsm/netconfback/ifcfg-eth0 with:
# original file did not exist
5) Reboot
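For context on step 4: that placeholder is what the backup step leaves behind when the original file did not exist, and on restore it causes the live file to be deleted. A minimal illustrative sketch of those rollback semantics (the function and constant names are mine, not vdsm's; the real logic lives in vdsm's ConfigWriter):

```python
import os

# Marker text taken from the report above; helper names below are mine.
MISSING_MARKER = "# original file did not exist"

def restore_backups(backup_dir, conf_dir):
    """Sketch of the rollback semantics: a real backup is copied back,
    while a placeholder backup means "delete the live file"."""
    for name in os.listdir(backup_dir):
        with open(os.path.join(backup_dir, name)) as f:
            content = f.read()
        target = os.path.join(conf_dir, name)
        if content.strip() == MISSING_MARKER:
            # The file did not exist at backup time, so rollback removes
            # it - this is how a stale placeholder wipes an ifcfg file
            # that is actually in use.
            if os.path.exists(target):
                os.remove(target)
        else:
            with open(target, "w") as f:
                f.write(content)
```

So a leftover placeholder for eth0 plus a reboot is enough to lose ifcfg-eth0.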
I created a fix [1] and prepared backports to the 3.6 [2] and 3.5 [3] branches (so
that it will appear in 3.5.5), and linked them to the bug.
Patrick, to apply the patch you can also run the two commands below and paste
the diff into patch's standard input (the line after "nicFile.writelines(l)"
is a single space, so please re-add it if it gets eaten by e-mail goblins):
cd /usr/share/vdsm/
patch -p1
diff --git vdsm/network/configurators/ifcfg.py vdsm/network/configurators/ifcfg.py
index 161a3b2..8332224 100644
--- vdsm/network/configurators/ifcfg.py
+++ vdsm/network/configurators/ifcfg.py
@@ -647,12 +647,22 @@ class ConfigWriter(object):
     def removeNic(self, nic):
         cf = netinfo.NET_CONF_PREF + nic
         self._backup(cf)
-        with open(cf) as nicFile:
-            hwlines = [line for line in nicFile if line.startswith('HWADDR=')]
+        try:
+            with open(cf) as nicFile:
+                hwlines = [line for line in nicFile if line.startswith(
+                    'HWADDR=')]
+        except IOError as e:
+            logging.warning("%s couldn't be read (errno %s)", cf, e.errno)
+            try:
+                hwlines = ['HWADDR=%s\n' % netinfo.gethwaddr(nic)]
+            except IOError as e:
+                logging.exception("couldn't determine hardware address of %s "
+                                  "(errno %s)", nic, e.errno)
+                hwlines = []
         l = [self.CONFFILE_HEADER + '\n', 'DEVICE=%s\n' % nic,
              'ONBOOT=yes\n',
              'MTU=%s\n' % netinfo.DEFAULT_MTU] + hwlines
-        l += 'NM_CONTROLLED=no\n'
+        l.append('NM_CONTROLLED=no\n')
         with open(cf, 'w') as nicFile:
             nicFile.writelines(l)
 
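For anyone who wants to see the intended behaviour without vdsm installed, the fallback the patch adds can be exercised standalone. A sketch (the function name is mine, and the sysfs read is my assumption about what netinfo.gethwaddr() boils down to):

```python
import os

def read_hwaddr_lines(cf, nic, sysfs_root="/sys"):
    """Sketch of the patched fallback: prefer the HWADDR= line from the
    ifcfg file; if the file is gone, ask the kernel via sysfs; return an
    empty list if both fail, so removeNic can still rewrite the file."""
    try:
        with open(cf) as nic_file:
            return [line for line in nic_file if line.startswith("HWADDR=")]
    except IOError:
        pass
    try:
        # Stand-in for netinfo.gethwaddr(nic)
        with open(os.path.join(sysfs_root, "class/net", nic, "address")) as f:
            return ["HWADDR=%s\n" % f.read().strip()]
    except IOError:
        return []
```

The point of the change is only that a missing ifcfg file no longer aborts the whole network restoration.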
Michael, will you please give it a try as well?
Thanks,
Ondra
[1]
From: "Patrick Hurrelmann"
<patrick.hurrelmann(a)lobster.de>
To: "Dan Kenigsberg" <danken(a)redhat.com>
Cc: "oVirt Mailing List" <users(a)ovirt.org>
Sent: Monday, September 7, 2015 2:46:05 PM
Subject: Re: [ovirt-users] Host loses all network configuration on update to oVirt 3.5.4
On 07.09.2015 14:44, Patrick Hurrelmann wrote:
> On 07.09.2015 13:54, Dan Kenigsberg wrote:
>> On Mon, Sep 07, 2015 at 11:47:48AM +0200, Patrick Hurrelmann wrote:
>>> On 06.09.2015 11:30, Dan Kenigsberg wrote:
>>>> On Fri, Sep 04, 2015 at 10:26:39AM +0200, Patrick Hurrelmann wrote:
>>>>> Hi all,
>>>>>
>>>>> I just updated my existing oVirt 3.5.3 installation (iSCSI
>>>>> hosted-engine on CentOS 7.1). The engine update went fine.
>>>>> Updating the hosts succeeds until the first reboot. After a
>>>>> reboot the host does not come up again. It is missing all network
>>>>> configuration. All network cfgs in /etc/sysconfig/network-scripts
>>>>> are missing except ifcfg-lo. The host boots up without working
>>>>> networking. Using IPMI and config backups, I was able to restore
>>>>> the lost network configs. Once these are restored and the host is
>>>>> rebooted again, all seems to be back to good.
>>>>> This has now happened to 2 updated hosts (this installation has a
>>>>> total of 4 hosts, so 2 more to debug/try). I'm happy to assist in
>>>>> further debugging.
>>>>>
>>>>> Before updating the second host, I gathered some information. All
>>>>> these hosts have 3 physical nics. One is used for the ovirtmgmt
>>>>> bridge and the other 2 are used for iSCSI storage vlans.
>>>>> ifcfgs before update:
>>>>>
>>>>> /etc/sysconfig/network-scripts/ifcfg-em1
>>>>> # Generated by VDSM version 4.16.20-0.el7.centos
>>>>> DEVICE=em1
>>>>> HWADDR=d0:67:e5:f0:e5:c6
>>>>> BRIDGE=ovirtmgmt
>>>>> ONBOOT=yes
>>>>> NM_CONTROLLED=no
>>>>> /etc/sysconfig/network-scripts/ifcfg-lo
>>>>> DEVICE=lo
>>>>> IPADDR=127.0.0.1
>>>>> NETMASK=255.0.0.0
>>>>> NETWORK=127.0.0.0
>>>>> # If you're having problems with gated making 127.0.0.0/8 a
>>>>> # martian, you can change this to something else (255.255.255.255,
>>>>> # for example)
>>>>> BROADCAST=127.255.255.255
>>>>> ONBOOT=yes
>>>>> NAME=loopback
>>>>>
>>>>> /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt
>>>>> # Generated by VDSM version 4.16.20-0.el7.centos
>>>>> DEVICE=ovirtmgmt
>>>>> TYPE=Bridge
>>>>> DELAY=0
>>>>> STP=off
>>>>> ONBOOT=yes
>>>>> IPADDR=1.2.3.16
>>>>> NETMASK=255.255.255.0
>>>>> GATEWAY=1.2.3.11
>>>>> BOOTPROTO=none
>>>>> DEFROUTE=yes
>>>>> NM_CONTROLLED=no
>>>>> HOTPLUG=no
>>>>>
>>>>> /etc/sysconfig/network-scripts/ifcfg-p4p1
>>>>> # Generated by VDSM version 4.16.20-0.el7.centos
>>>>> DEVICE=p4p1
>>>>> HWADDR=68:05:ca:01:bc:0c
>>>>> ONBOOT=no
>>>>> IPADDR=4.5.7.102
>>>>> NETMASK=255.255.255.0
>>>>> BOOTPROTO=none
>>>>> MTU=9000
>>>>> DEFROUTE=no
>>>>> NM_CONTROLLED=no
>>>>>
>>>>> /etc/sysconfig/network-scripts/ifcfg-p3p1
>>>>> # Generated by VDSM version 4.16.20-0.el7.centos
>>>>> DEVICE=p3p1
>>>>> HWADDR=68:05:ca:18:86:45
>>>>> ONBOOT=no
>>>>> IPADDR=4.5.6.102
>>>>> NETMASK=255.255.255.0
>>>>> BOOTPROTO=none
>>>>> MTU=9000
>>>>> DEFROUTE=no
>>>>> NM_CONTROLLED=no
>>>>>
>>>>>
>>>>> ip link before update:
>>>>> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state
>>>>> UNKNOWN mode DEFAULT
>>>>> link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>>>>> 2: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state
>>>>> DOWN mode DEFAULT
>>>>> link/ether 46:50:22:7a:f3:9d brd ff:ff:ff:ff:ff:ff
>>>>> 3: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq
>>>>> master ovirtmgmt state UP mode DEFAULT qlen 1000
>>>>> link/ether d0:67:e5:f0:e5:c6 brd ff:ff:ff:ff:ff:ff
>>>>> 4: p3p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc
>>>>> pfifo_fast state UP mode DEFAULT qlen 1000
>>>>> link/ether 68:05:ca:18:86:45 brd ff:ff:ff:ff:ff:ff
>>>>> 5: p4p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc
>>>>> pfifo_fast state UP mode DEFAULT qlen 1000
>>>>> link/ether 68:05:ca:01:bc:0c brd ff:ff:ff:ff:ff:ff
>>>>> 7: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
>>>>> noqueue state UP mode DEFAULT
>>>>> link/ether d0:67:e5:f0:e5:c6 brd ff:ff:ff:ff:ff:ff
>>>>> 8: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state
>>>>> DOWN mode DEFAULT
>>>>> link/ether ce:0f:16:49:a7:da brd ff:ff:ff:ff:ff:ff
>>>>>
>>>>> vdsm files before update:
>>>>> /var/lib/vdsm
>>>>> /var/lib/vdsm/bonding-defaults.json
>>>>> /var/lib/vdsm/netconfback
>>>>> /var/lib/vdsm/netconfback/ifcfg-ovirtmgmt
>>>>> /var/lib/vdsm/netconfback/ifcfg-em1
>>>>> /var/lib/vdsm/netconfback/route-ovirtmgmt
>>>>> /var/lib/vdsm/netconfback/rule-ovirtmgmt
>>>>> /var/lib/vdsm/netconfback/ifcfg-p4p1
>>>>> /var/lib/vdsm/netconfback/ifcfg-p3p1
>>>>> /var/lib/vdsm/persistence
>>>>> /var/lib/vdsm/persistence/netconf
>>>>> /var/lib/vdsm/persistence/netconf.1416666697752319079
>>>>> /var/lib/vdsm/persistence/netconf.1416666697752319079/nets
>>>>> /var/lib/vdsm/persistence/netconf.1416666697752319079/nets/san1
>>>>> /var/lib/vdsm/persistence/netconf.1416666697752319079/nets/san2
>>>>> /var/lib/vdsm/persistence/netconf.1416666697752319079/nets/ovirtmgmt
>>>>> /var/lib/vdsm/upgrade
>>>>> /var/lib/vdsm/upgrade/upgrade-unified-persistence
>>>>> /var/lib/vdsm/transient
>>>>>
>>>>>
>>>>> File in /var/lib/vdsm/netconfback each only contained a comment:
>>>>> # original file did not exist
>>>> This is quite peculiar. Do you know when these were created?
>>>> Have you made any networking changes on 3.5.3 just before boot?
>>>>
>>>>> /var/lib/vdsm/persistence/netconf.1416666697752319079/nets/ovirtmgmt
>>>>> {"nic": "em1", "netmask": "255.255.255.0", "bootproto": "none",
>>>>> "ipaddr": "1.2.3.16", "gateway": "1.2.3.11"}
>>>>>
>>>>> /var/lib/vdsm/persistence/netconf.1416666697752319079/nets/san1
>>>>> {"nic": "p3p1", "netmask": "255.255.255.0", "ipaddr": "4.5.6.102",
>>>>> "bridged": "false", "mtu": "9000"}
>>>>>
>>>>> /var/lib/vdsm/persistence/netconf.1416666697752319079/nets/san2
>>>>> {"nic": "p4p1", "netmask": "255.255.255.0", "ipaddr": "4.5.7.102",
>>>>> "bridged": "false", "mtu": "9000"}
>>>>>
>>>>>
>>>>> After update and reboot, no ifcfg scripts are left. Only
>>>>> interface lo is up.
>>>>> Syslog does not seem to contain anything suspicious before the
>>>>> reboot.
>>>> Have you tweaked vdsm.conf in any way? In particular did you set
>>>> net_persistence?
>>>>
>>>>> Log excerpts from bootup:
>>>>>
>>>>> Sep 3 17:27:23 vhm-prd-02 network: Bringing up loopback interface:
>>>>> [ OK ]
>>>>> Sep 3 17:27:23 vhm-prd-02 systemd-ovirt-ha-agent: Starting
>>>>> ovirt-ha-agent: [ OK ]
>>>>> Sep 3 17:27:23 vhm-prd-02 systemd: Started oVirt Hosted Engine
>>>>> High Availability Monitoring Agent.
>>>>> Sep 3 17:27:23 vhm-prd-02 kernel: IPv6: ADDRCONF(NETDEV_UP): em1:
>>>>> link is not ready
>>>>> Sep 3 17:27:23 vhm-prd-02 kernel: device em1 entered promiscuous
>>>>> mode
>>>>> Sep 3 17:27:23 vhm-prd-02 network: Bringing up interface em1:
>>>>> [ OK ]
>>>>> Sep 3 17:27:23 vhm-prd-02 kernel: IPv6: ADDRCONF(NETDEV_UP):
>>>>> ovirtmgmt: link is not ready
>>>>> Sep 3 17:27:25 vhm-prd-02 avahi-daemon[778]: Joining mDNS
>>>>> multicast group on interface ovirtmgmt.IPv4 with address 1.2.3.16.
>>>>> Sep 3 17:27:25 vhm-prd-02 avahi-daemon[778]: New relevant
>>>>> interface ovirtmgmt.IPv4 for mDNS.
>>>>> Sep 3 17:27:25 vhm-prd-02 avahi-daemon[778]: Registering new
>>>>> address record for 1.2.3.16 on ovirtmgmt.IPv4.
>>>>> Sep 3 17:27:26 vhm-prd-02 kernel: tg3 0000:03:00.0 em1: Link is up
>>>>> at 1000 Mbps, full duplex
>>>>> Sep 3 17:27:26 vhm-prd-02 kernel: tg3 0000:03:00.0 em1: Flow
>>>>> control is off for TX and off for RX
>>>>> Sep 3 17:27:26 vhm-prd-02 kernel: IPv6: ADDRCONF(NETDEV_CHANGE):
>>>>> em1: link becomes ready
>>>>> Sep 3 17:27:26 vhm-prd-02 kernel: ovirtmgmt: port 1(em1) entered
>>>>> forwarding state
>>>>> Sep 3 17:27:26 vhm-prd-02 kernel: ovirtmgmt: port 1(em1) entered
>>>>> forwarding state
>>>>> Sep 3 17:27:26 vhm-prd-02 kernel: IPv6: ADDRCONF(NETDEV_CHANGE):
>>>>> ovirtmgmt: link becomes ready
>>>>> Sep 3 17:27:26 vhm-prd-02 network: Bringing up interface
>>>>> ovirtmgmt: [ OK ]
>>>>> Sep 3 17:27:26 vhm-prd-02 systemd: Started LSB: Bring up/down
>>>>> networking.
>>>>> Sep 3 17:27:26 vhm-prd-02 systemd: Starting Network.
>>>>> Sep 3 17:27:26 vhm-prd-02 systemd: Reached target Network.
>>>>>
>>>>> So ovirtmgmt and em1 were restored and initialized just fine (p3p1
>>>>> and p4p1 should have been started, too, but engine configured them
>>>>> as ONBOOT=no).
>>>>>
>>>>> Further in messages (full log is attached):
>>>> would you also attach your post-boot supervdsm.log?
>>>>
>>>>> Sep 3 17:27:26 vhm-prd-02 systemd: Starting Virtual Desktop Server
>>>>> Manager network restoration...
>>>>> Sep 3 17:27:26 vhm-prd-02 systemd: Started OSAD daemon.
>>>>> Sep 3 17:27:27 vhm-prd-02 systemd: Started Terminate Plymouth Boot
>>>>> Screen.
>>>>> Sep 3 17:27:27 vhm-prd-02 systemd: Started Wait for Plymouth Boot
>>>>> Screen to Quit.
>>>>> Sep 3 17:27:27 vhm-prd-02 systemd: Starting Serial Getty on
>>>>> ttyS1...
>>>>> Sep 3 17:27:27 vhm-prd-02 systemd: Started Serial Getty on ttyS1.
>>>>> Sep 3 17:27:27 vhm-prd-02 systemd: Starting Getty on tty1...
>>>>> Sep 3 17:27:27 vhm-prd-02 systemd: Started Getty on tty1.
>>>>> Sep 3 17:27:27 vhm-prd-02 systemd: Starting Login Prompts.
>>>>> Sep 3 17:27:27 vhm-prd-02 systemd: Reached target Login Prompts.
>>>>> Sep 3 17:27:27 vhm-prd-02 iscsid: iSCSI daemon with pid=1300
>>>>> started!
>>>>> Sep 3 17:27:27 vhm-prd-02 avahi-daemon[778]: Registering new
>>>>> address record for fe80::d267:e5ff:fef0:e5c6 on ovirtmgmt.*.
>>>>> Sep 3 17:27:27 vhm-prd-02 kdumpctl: kexec: loaded kdump kernel
>>>>> Sep 3 17:27:27 vhm-prd-02 kdumpctl: Starting kdump: [OK]
>>>>> Sep 3 17:27:27 vhm-prd-02 systemd: Started Crash recovery kernel
>>>>> arming.
>>>>> Sep 3 17:27:27 vhm-prd-02 avahi-daemon[778]: Registering new
>>>>> address record for fe80::d267:e5ff:fef0:e5c6 on em1.*.
>>>>> Sep 3 17:27:27 vhm-prd-02 avahi-daemon[778]: Withdrawing address
>>>>> record for 1.2.3.16 on ovirtmgmt.
>>>>> Sep 3 17:27:27 vhm-prd-02 avahi-daemon[778]: Leaving mDNS
>>>>> multicast group on interface ovirtmgmt.IPv4 with address 1.2.3.16.
>>>>> Sep 3 17:27:27 vhm-prd-02 avahi-daemon[778]: Interface
>>>>> ovirtmgmt.IPv4 no longer relevant for mDNS.
>>>>> Sep 3 17:27:27 vhm-prd-02 kernel: ovirtmgmt: port 1(em1) entered
>>>>> disabled state
>>>>> Sep 3 17:27:27 vhm-prd-02 avahi-daemon[778]: Withdrawing address
>>>>> record for fe80::d267:e5ff:fef0:e5c6 on ovirtmgmt.
>>>>> Sep 3 17:27:28 vhm-prd-02 avahi-daemon[778]: Withdrawing address
>>>>> record for fe80::d267:e5ff:fef0:e5c6 on em1.
>>>>> Sep 3 17:27:28 vhm-prd-02 kernel: device em1 left promiscuous mode
>>>>> Sep 3 17:27:28 vhm-prd-02 kernel: ovirtmgmt: port 1(em1) entered
>>>>> disabled state
>>>>> Sep 3 17:27:28 vhm-prd-02 avahi-daemon[778]: Withdrawing
>>>>> workstation service for ovirtmgmt.
>>>>> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: Traceback (most recent call
>>>>> last):
>>>>> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: File
>>>>> "/usr/share/vdsm/vdsm-restore-net-config", line 345, in <module>
>>>>> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: restore(args)
>>>>> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: File
>>>>> "/usr/share/vdsm/vdsm-restore-net-config", line 314, in restore
>>>>> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: unified_restoration()
>>>>> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: File
>>>>> "/usr/share/vdsm/vdsm-restore-net-config", line 93, in
>>>>> unified_restoration
>>>>> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: setupNetworks(nets, bonds,
>>>>> connectivityCheck=False, _inRollback=True)
>>>>> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: File
>>>>> "/usr/share/vdsm/network/api.py", line 642, in setupNetworks
>>>>> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: implicitBonding=False,
>>>>> _netinfo=_netinfo)
>>>>> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: File
>>>>> "/usr/share/vdsm/network/api.py", line 213, in wrapped
>>>>> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: ret = func(**attrs)
>>>>> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: File
>>>>> "/usr/share/vdsm/network/api.py", line 429, in delNetwork
>>>>> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: netEnt.remove()
>>>>> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: File
>>>>> "/usr/share/vdsm/network/models.py", line 100, in remove
>>>>> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: self.configurator.removeNic(self)
>>>>> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: File
>>>>> "/usr/share/vdsm/network/configurators/ifcfg.py", line 215, in
>>>>> removeNic
>>>>> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: self.configApplier.removeNic(nic.name)
>>>>> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: File
>>>>> "/usr/share/vdsm/network/configurators/ifcfg.py", line 657, in
>>>>> removeNic
>>>>> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: with open(cf) as nicFile:
>>>>> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: IOError: [Errno 2] No such
>>>>> file or directory: u'/etc/sysconfig/network-scripts/ifcfg-p4p1'
>>>>> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: Traceback (most recent call
>>>>> last):
>>>>> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: File "/usr/bin/vdsm-tool",
>>>>> line 219, in main
>>>>> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: return
>>>>> tool_command[cmd]["command"](*args)
>>>>> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: File
>>>>> "/usr/lib/python2.7/site-packages/vdsm/tool/restore_nets.py",
>>>>> line 40, in restore_command
>>>>> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: exec_restore(cmd)
>>>>> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: File
>>>>> "/usr/lib/python2.7/site-packages/vdsm/tool/restore_nets.py",
>>>>> line 53, in exec_restore
>>>>> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: raise EnvironmentError('Failed
>>>>> to restore the persisted networks')
>>>>> Sep 3 17:27:28 vhm-prd-02 vdsm-tool: EnvironmentError: Failed to
>>>>> restore the persisted networks
>>>>> Sep 3 17:27:28 vhm-prd-02 systemd: vdsm-network.service: main
>>>>> process exited, code=exited, status=1/FAILURE
>>>>> Sep 3 17:27:28 vhm-prd-02 systemd: Failed to start Virtual Desktop
>>>>> Server Manager network restoration.
>>>>> Sep 3 17:27:28 vhm-prd-02 systemd: Dependency failed for Virtual
>>>>> Desktop Server Manager.
>>>>> Sep 3 17:27:28 vhm-prd-02 systemd:
>>>>> Sep 3 17:27:28 vhm-prd-02 systemd: Unit vdsm-network.service
>>>>> entered failed state.
>>>>> Sep 3 17:27:33 vhm-prd-02 systemd: Started Postfix Mail Transport
>>>>> Agent.
>>>>> Sep 3 17:27:33 vhm-prd-02 systemd: Starting Multi-User System.
>>>>> Sep 3 17:27:33 vhm-prd-02 systemd: Reached target Multi-User
>>>>> System.
>>>>> Sep 3 17:27:33 vhm-prd-02 systemd: Starting Update UTMP about
>>>>> System Runlevel Changes...
>>>>> Sep 3 17:27:33 vhm-prd-02 systemd: Starting Stop Read-Ahead Data
>>>>> Collection 10s After Completed Startup.
>>>>> Sep 3 17:27:33 vhm-prd-02 systemd: Started Stop Read-Ahead Data
>>>>> Collection 10s After Completed Startup.
>>>>> Sep 3 17:27:33 vhm-prd-02 systemd: Started Update UTMP about
>>>>> System Runlevel Changes.
>>>>> Sep 3 17:27:33 vhm-prd-02 systemd: Startup finished in 2.964s
>>>>> (kernel) + 2.507s (initrd) + 15.996s (userspace) = 21.468s.
>>>>>
>>>>> So, as I have two more hosts that need updating, I'm happy to
>>>>> assist in bisecting and debugging this update issue. Suggestions
>>>>> and help are very welcome.
>>>> Thanks for this important report. I assume that calling
>>>>
>>>> vdsClient -s 0 setSafeNetworkConfig
>>>>
>>>> on the host before upgrade would make your problems go away. Please
>>>> do not do that yet - your assistance in debugging this further is
>>>> important.
>>> Hi Dan,
>>>
>>> From backups I could extract the pre-update timestamps of the files
>>> in /var/lib/vdsm/netconfback:
>>> ifcfg-em1 2015-08-10 16:40:19
>>> ifcfg-ovirtmgmt 2015-08-10 16:40:19
>>> ifcfg-p3p1 2015-08-10 16:40:25
>>> ifcfg-p4p1 2015-08-10 16:40:22
>>> route-ovirtmgmt 2015-08-10 16:40:20
>>> rule-ovirtmgmt 2015-08-10 16:40:20
>>>
>>> The ifcfg-scripts had the same corresponding timestamps:
>>> ifcfg-em1 2015-08-10 16:40:19
>>> ifcfg-lo 2015-01-15 09:57:03
>>> ifcfg-ovirtmgmt 2015-08-10 16:40:19
>>> ifcfg-p3p1 2015-08-10 16:40:25
>>> ifcfg-p4p1 2015-08-10 16:40:22
>> Do you recall what has been done on 2015-08-10?
>> Was your 3.5.3 host rebooted ever since?
> I just tried to reconstruct the happenings on 2015-08-10 and it seems
> that in fact the network configuration was not touched. I was misled
> by the dates. At that date/time an updated kernel and some more CentOS
> rpms were installed (the whole cluster was updated one by one). A
> reboot on this specific host was initiated after the update at
> 2015-08-10 16:40:04. The timestamps from my previous email seem still
> to be _within_ the bootup process. So yes, the host was rebooted since
> the update to 3.5.3 (that happened on 2015-06-15).
>
> Reboots since 2015-06-15:
> reboot system boot 3.10.0-229.11.1. Mon Aug 10 16:56 - 14:34 (27+21:37)
> reboot system boot 3.10.0-229.7.2.e Mon Jul 27 17:48 - 16:53 (13+23:05)
> reboot system boot 3.10.0-229.7.2.e Wed Jun 24 16:46 - 17:46 (33+00:59)
> reboot system boot 3.10.0-229.4.2.e Mon Jun 15 18:10 - 16:44 (8+22:34)
Wrong reboot list. The correct reboots for this host are:
reboot system boot 3.10.0-229.11.1. Thu Sep 3 17:42 - 13:58 (3+20:16)
reboot system boot 3.10.0-229.11.1. Thu Sep 3 17:27 - 17:40 (00:12)
reboot system boot 3.10.0-229.11.1. Mon Aug 10 16:40 - 17:23 (24+00:43)
reboot system boot 3.10.0-229.7.2.e Mon Jul 27 16:52 - 16:33 (13+23:40)
reboot system boot 3.10.0-229.7.2.e Thu Jul 9 11:10 - 16:49 (18+05:38)
reboot system boot 3.10.0-229.4.2.e Wed Jun 17 17:27 - 11:07 (21+17:40)
reboot system boot 3.10.0-229.4.2.e Mon Jun 15 17:22 - 17:23 (2+00:01)
> I checked the 2 remaining hosts (still 3.5.3) and both do not have any
> different content in /var/lib/vdsm/netconfback. Again only single-line
> comments:
> # original file did not exist
>
> My other productive oVirt 3.4 hosts don't even have these. The
> directory /var/lib/vdsm/netconfback is empty on those.
>
> What should/could I check on the remaining 2 hosts prior to the update?
> Try syncing the network-configuration and verify the contents in
> /var/lib/vdsm/netconfback?
>
>> If the networks have been configured on the host back then, but never
>> persisted, any reboot (regardless of upgrade) would cause their removal.
>>
>> Vdsm should be more robust in handling missing ifcfg; but that's a
>> second-order bug:
>>
>> 1256252 Vdsm should recover ifcfg files in case they are no
>> longer exist and recover all networks on the server
>>
>> I'd like to first understand how come you have these placeholders left
>> behind.
>>
>>> The attached supervdsm.log contains everything from network configuration
>>> done on 2015-08-10 till vdsm update on 2015-09-03 at 17:20 and the reboot
>>> performed afterwards.
>> Thanks. Maybe Ido could find further hints inside it.
--
Lobster SCM GmbH, Hindenburgstraße 15, D-82343 Pöcking
HRB 178831, Amtsgericht München
Geschäftsführer: Dr. Martin Fischer, Rolf Henrich
_______________________________________________
Users mailing list
Users(a)ovirt.org
http://lists.ovirt.org/mailman/listinfo/users