
On Thu, Mar 3, 2016 at 2:54 AM, David LeVene <David.LeVene@blackboard.com> wrote:
Hi,
Thanks for the quick responses & help.. answers in-line at the end of this email.
Cheers David
-----Original Message-----
From: Edward Haas [mailto:edwardh@redhat.com]
Sent: Wednesday, March 02, 2016 20:05
To: David LeVene <David.LeVene@blackboard.com>; Dan Kenigsberg <danken@redhat.com>
Cc: users@ovirt.org
Subject: Re: [ovirt-users] 3.6 looses network on reboot
Hi Dan,
I missed the email as the subject line changed!
So we use and run IPv6 in our network - not sure if this is related. The addresses are handed out via SLAAC, so that would be where the IPv6 address is coming from.
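For what it's worth, a quick way to confirm the extra address really is SLAAC-derived rather than coming from any config file is to check the address flags and the autoconf sysctls on the interface (just a sketch; substitute enp6s0 if the ovirtmgmt bridge isn't up):

# ip -6 addr show dev ovirtmgmt
# sysctl net.ipv6.conf.ovirtmgmt.accept_ra net.ipv6.conf.ovirtmgmt.autoconf

An address installed from router advertisements typically shows the "dynamic" flag and a finite valid_lft, while a statically configured one shows "forever".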
My memory is a bit sketchy... but I think if I remove the vmfex/SR-IOV vNIC and only run with the one vNIC, it works fine; it's when I bring the second NIC into play with SR-IOV that the issues arise.
Answers inline.
-----Original Message-----
From: Dan Kenigsberg [mailto:danken@redhat.com]
Sent: Tuesday, March 01, 2016 00:28
To: David LeVene <David.LeVene@blackboard.com>
Cc: edwardh@redhat.com; users@ovirt.org
Subject: Re: [ovirt-users] 3.6 looses network on reboot
This sounds very bad. Changing the subject, so the wider, more problematic issue is visible.
Did any other user see this behavior?
On 03/02/2016 01:36 AM, David LeVene wrote:
On Mon, Feb 29, 2016 at 06:27:46AM +0000, David LeVene wrote:
Hi Dan,
Answers as follows;
# rpm -qa | grep -i vdsm
vdsm-jsonrpc-4.17.18-1.el7.noarch
vdsm-hook-vmfex-4.17.18-1.el7.noarch
vdsm-infra-4.17.18-1.el7.noarch
vdsm-4.17.18-1.el7.noarch
vdsm-python-4.17.18-1.el7.noarch
vdsm-yajsonrpc-4.17.18-1.el7.noarch
vdsm-cli-4.17.18-1.el7.noarch
vdsm-xmlrpc-4.17.18-1.el7.noarch
vdsm-hook-vmfex-dev-4.17.18-1.el7.noarch
There was in this folder the ifcfg-ovirtmgmt bridge setup, and also route-ovirtmgmt & rule-ovirtmgmt, but they were removed after the reboot.
# ls -althr | grep ifcfg
-rw-r--r--. 1 root root 254 Sep 16 21:21 ifcfg-lo
-rw-r--r--. 1 root root 120 Feb 25 14:07 ifcfg-enp7s0f0
-rw-rw-r--. 1 root root 174 Feb 25 14:40 ifcfg-enp6s0
I think I modified ifcfg-enp6s0 to get networking up again (e.g. it was set to bridge, but the bridge wasn't configured). It was a few days ago; if it's important I can reboot the box again to see what state it comes up in.
# cat ifcfg-enp6s0
BOOTPROTO="none"
IPADDR="10.80.10.117"
NETMASK="255.255.255.0"
GATEWAY="10.80.10.1"
DEVICE="enp6s0"
HWADDR="00:25:b5:00:0b:4f"
ONBOOT=yes
PEERDNS=yes
PEERROUTES=yes
MTU=1500
# cat ifcfg-enp7s0f0
# Generated by VDSM version 4.17.18-1.el7
DEVICE=enp7s0f0
ONBOOT=yes
MTU=1500
HWADDR=00:25:b5:00:0b:0f
NM_CONTROLLED=no
# find /var/lib/vdsm/persistence
/var/lib/vdsm/persistence
/var/lib/vdsm/persistence/netconf
/var/lib/vdsm/persistence/netconf.1456371473833165545
/var/lib/vdsm/persistence/netconf.1456371473833165545/nets
/var/lib/vdsm/persistence/netconf.1456371473833165545/nets/ovirtmgmt
# cat /var/lib/vdsm/persistence/netconf.1456371473833165545/nets/ovirtmgmt
{
    "nic": "enp6s0",
    "ipaddr": "10.80.10.117",
    "mtu": "1500",
    "netmask": "255.255.255.0",
    "STP": "no",
    "bridged": "true",
    "gateway": "10.80.10.1",
    "defaultRoute": true
}
Supervdsm log is attached.
Have you edited ifcfg-ovirtmgmt manually? Nope.
Can you somehow reproduce it, and share its content? Yeah, I should be able to reproduce it - just gotta fix it first (create the networking manually and get VDSM on-line). Also it's a side project/investigation at the moment, so time isn't on my side...
Would it help if I take an sosreport before and after? I don't mind emailing these directly to you.
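For reference, the manual recovery was roughly the following (from memory, so treat it as a sketch rather than the exact commands; it assumes the ifcfg-enp6s0 shown above):

# ifdown enp6s0 && ifup enp6s0
# systemctl restart supervdsmd vdsmd
# systemctl status vdsmd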
Do you have NetworkManager running? which version? NM is disabled, but the version is...
# rpm -q NetworkManager
NetworkManager-1.0.6-27.el7.x86_64
# systemctl status NetworkManager.service
● NetworkManager.service - Network Manager
   Loaded: loaded (/usr/lib/systemd/system/NetworkManager.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
It seems that Vdsm has two bugs: on boot, initscripts end up setting an IPv6 address that Vdsm never requested.
As mentioned above, this would have come from SLAAC, which we have set up in our network.
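If it would help to rule SLAAC out while the fixes land, one option is to stop the host from autoconfiguring the bridge at all - just a sketch, and I haven't checked whether vdsm 3.6 preserves these settings across a setupNetworks run:

# sysctl -w net.ipv6.conf.ovirtmgmt.accept_ra=0
# sysctl -w net.ipv6.conf.ovirtmgmt.autoconf=0

The initscripts equivalent would be IPV6_AUTOCONF=no (or IPV6INIT=no) in ifcfg-ovirtmgmt, but since vdsm regenerates that file, the sysctl route is probably less fragile.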
restore-net::INFO::2016-02-25 14:14:58,024::vdsm-restore-net-config::261::root::(_find_changed_or_missing) ovirtmgmt is different or missing from persistent configuration.
current: {'nic': 'enp6s0', 'dhcpv6': False, 'ipaddr': '10.80.10.117', 'mtu': '1500', 'netmask': '255.255.255.0', 'bootproto': 'none', 'stp': False, 'bridged': True, 'ipv6addr': ['2400:7d00:110:3:225:b5ff:fe00:b4f/64'], 'gateway': '10.80.10.1', 'defaultRoute': True},
persisted: {u'nic': u'enp6s0', 'dhcpv6': False, u'ipaddr': u'10.80.10.117', u'mtu': '1500', u'netmask': u'255.255.255.0', 'bootproto': 'none', 'stp': False, u'bridged': True, u'gateway': u'10.80.10.1', u'defaultRoute': True}
Then, Vdsm tries to drop the unsolicited address, but fails. Both must be fixed ASAP.
restore-net::ERROR::2016-02-25 14:14:59,490::__init__::58::root::(__exit__) Failed rollback transaction last known good network.
Traceback (most recent call last):
  File "/usr/share/vdsm/network/api.py", line 918, in setupNetworks
    keep_bridge=keep_bridge)
  File "/usr/share/vdsm/network/api.py", line 222, in wrapped
    ret = func(**attrs)
  File "/usr/share/vdsm/network/api.py", line 502, in _delNetwork
    configurator.removeQoS(net_ent)
  File "/usr/share/vdsm/network/configurators/__init__.py", line 122, in removeQoS
    qos.remove_outbound(top_device)
  File "/usr/share/vdsm/network/configurators/qos.py", line 60, in remove_outbound
    device, pref=_NON_VLANNED_ID if vlan_tag is None else vlan_tag)
  File "/usr/share/vdsm/network/tc/filter.py", line 31, in delete
    _wrapper.process_request(command)
  File "/usr/share/vdsm/network/tc/_wrapper.py", line 38, in process_request
    raise TrafficControlException(retcode, err, command)
TrafficControlException: (None, 'Message truncated', ['/usr/sbin/tc', 'filter', 'del', 'dev', 'enp6s0', 'pref', '5000'])
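To see what the QoS rollback is tripping over, it may be worth dumping the tc state on the nic and re-running the failing command by hand (a diagnostic sketch, using the same device and pref as in the traceback above):

# tc qdisc show dev enp6s0
# tc filter show dev enp6s0
# /usr/sbin/tc filter del dev enp6s0 pref 5000

Running it interactively should at least show whether there is actually a filter at pref 5000 left to delete.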
Regards, Dan.
Hi David,
You have encountered two issues: the first with IPv6, which we do not fully support in 3.6, and the second an unmanaged failure during network setup on boot. We are going to back-port both fixes very soon.
Can you check our patches? They should resolve the problem we saw in the log: https://gerrit.ovirt.org/#/c/54237 (based on oVirt-3.6.3)
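If it is easier than applying the diff by hand, the change can usually be pulled straight from gerrit - a sketch only, assuming anonymous https access to the vdsm project; the exact refspec (the patchset number in particular) is shown under the Download link on the change page:

# git clone https://gerrit.ovirt.org/vdsm && cd vdsm
# git fetch https://gerrit.ovirt.org/vdsm refs/changes/37/54237/1 && git checkout FETCH_HEAD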
I've manually applied the patch to the node I was testing on, and the networking now comes on-line correctly. Now I'm encountering a gluster issue: cannot find master domain.
Please attach vdsm logs showing the gluster connection attempts. You should also have interesting logs in /var/log/glusterfs/ - there should be a log for each gluster connection (server:/path).
Nir
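Roughly where to look, as a sketch - the exact glusterfs log file name depends on the mount path, so the glob below is only a guess:

# grep -iE 'master domain|connectStorageServer' /var/log/vdsm/vdsm.log | tail -n 50
# ls -l /var/log/glusterfs/
# tail -n 100 /var/log/glusterfs/*glusterSD*.log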