[ovirt-users] 3.6 looses network on reboot
David LeVene
David.LeVene at blackboard.com
Wed Mar 2 19:54:25 EST 2016
Hi,
Thanks for the quick responses and help. Answers are in-line at the end of this email.
Cheers
David
-----Original Message-----
From: Edward Haas [mailto:edwardh at redhat.com]
Sent: Wednesday, March 02, 2016 20:05
To: David LeVene <David.LeVene at blackboard.com>; Dan Kenigsberg <danken at redhat.com>
Cc: users at ovirt.org
Subject: Re: [ovirt-users] 3.6 looses network on reboot
On 03/02/2016 01:36 AM, David LeVene wrote:
> Hi Dan,
>
> I missed the email as the subject line changed!
>
> We run IPv6 in our network; not sure if this is related. The addresses are handed out via SLAAC, so that would be where the IPv6 address is coming from.
>
> My memory is a bit sketchy... but I think if I remove the vmfex/SRIOV vNIC and only run with the one vNIC it works fine; it's when I bring the second NIC into play with SRIOV that the issues arise.
>
> Answers inline.
>
> -----Original Message-----
> From: Dan Kenigsberg [mailto:danken at redhat.com]
> Sent: Tuesday, March 01, 2016 00:28
> To: David LeVene <David.LeVene at blackboard.com>
> Cc: edwardh at redhat.com; users at ovirt.org
> Subject: Re: [ovirt-users] 3.6 looses network on reboot
>
> This sounds very bad. Changing the subject, so the wider, more problematic issue is visible.
>
> Did any other user see this behavior?
>
> On Mon, Feb 29, 2016 at 06:27:46AM +0000, David LeVene wrote:
>> Hi Dan,
>>
>> Answers as follows;
>>
>> # rpm -qa | grep -i vdsm
>> vdsm-jsonrpc-4.17.18-1.el7.noarch
>> vdsm-hook-vmfex-4.17.18-1.el7.noarch
>> vdsm-infra-4.17.18-1.el7.noarch
>> vdsm-4.17.18-1.el7.noarch
>> vdsm-python-4.17.18-1.el7.noarch
>> vdsm-yajsonrpc-4.17.18-1.el7.noarch
>> vdsm-cli-4.17.18-1.el7.noarch
>> vdsm-xmlrpc-4.17.18-1.el7.noarch
>> vdsm-hook-vmfex-dev-4.17.18-1.el7.noarch
>>
>>
>> This folder had an ifcfg-ovirtmgmt bridge setup, and also route-ovirtmgmt & rule-ovirtmgmt, but they were removed after the reboot.
>>
>> # ls -althr | grep ifcfg
>> -rw-r--r--. 1 root root 254 Sep 16 21:21 ifcfg-lo
>> -rw-r--r--. 1 root root 120 Feb 25 14:07 ifcfg-enp7s0f0
>> -rw-rw-r--. 1 root root 174 Feb 25 14:40 ifcfg-enp6s0
>>
>> I think I modified ifcfg-enp6s0 to get networking up again (e.g. it was set to use the bridge, but the bridge wasn't configured). It was a few days ago. If it's important I can reboot the box again to see what state it comes up with.
>>
>> # cat ifcfg-enp6s0
>> BOOTPROTO="none"
>> IPADDR="10.80.10.117"
>> NETMASK="255.255.255.0"
>> GATEWAY="10.80.10.1"
>> DEVICE="enp6s0"
>> HWADDR="00:25:b5:00:0b:4f"
>> ONBOOT=yes
>> PEERDNS=yes
>> PEERROUTES=yes
>> MTU=1500
>>
>> # cat ifcfg-enp7s0f0
>> # Generated by VDSM version 4.17.18-1.el7
>> DEVICE=enp7s0f0
>> ONBOOT=yes
>> MTU=1500
>> HWADDR=00:25:b5:00:0b:0f
>> NM_CONTROLLED=no
>>
>> # find /var/lib/vdsm/persistence
>> /var/lib/vdsm/persistence
>> /var/lib/vdsm/persistence/netconf
>> /var/lib/vdsm/persistence/netconf.1456371473833165545
>> /var/lib/vdsm/persistence/netconf.1456371473833165545/nets
>> /var/lib/vdsm/persistence/netconf.1456371473833165545/nets/ovirtmgmt
>>
>> # cat /var/lib/vdsm/persistence/netconf.1456371473833165545/nets/ovirtmgmt
>> {
>> "nic": "enp6s0",
>> "ipaddr": "10.80.10.117",
>> "mtu": "1500",
>> "netmask": "255.255.255.0",
>> "STP": "no",
>> "bridged": "true",
>> "gateway": "10.80.10.1",
>> "defaultRoute": true
>> }
>>
>> Supervdsm log is attached.
>
> Have you edited ifcfg-ovirtmgmt manually?
> Nope
>
> Can you somehow reproduce it, and share its content?
> Yea, I should be able to reproduce it - just gotta fix it first (create the networking manually and get VDSM on-line). Also it’s a side project/investigation at the moment so time isn't on my side...
>
> Would it help if I take an sosreport before and after? I don't mind emailing these directly to you.
>
> Do you have NetworkManager running? which version?
> NM is disabled, but the version is...
> # rpm -q NetworkManager
> NetworkManager-1.0.6-27.el7.x86_64
> # systemctl status NetworkManager.service
> ● NetworkManager.service - Network Manager
>    Loaded: loaded (/usr/lib/systemd/system/NetworkManager.service; disabled; vendor preset: enabled)
>    Active: inactive (dead)
>
> It seems that Vdsm has two bugs: on boot, initscripts end up setting an
> ipv6 address that Vdsm never requested
>
> As mentioned above, this would have come from SLAAC, which we have set up
> in our network.
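The SLAAC explanation can be checked against data already quoted in this thread. The sketch below assumes the advertised prefix is 2400:7d00:110:3::/64 (inferred from the address in the restore-net log) and that the host builds its interface identifier with modified EUI-64 from the HWADDR in ifcfg-enp6s0. The function name slaac_eui64_address is made up for illustration and is not part of Vdsm.

```python
import ipaddress


def slaac_eui64_address(prefix, mac):
    """Combine a /64 prefix with the modified EUI-64 identifier of a MAC."""
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    # Modified EUI-64: flip the universal/local bit of the first octet
    # and insert ff:fe between the two halves of the MAC.
    octets[0] ^= 0x02
    eui64 = octets[:3] + [0xFF, 0xFE] + octets[3:]
    interface_id = int.from_bytes(bytes(eui64), "big")

    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("this sketch only handles /64 prefixes")
    return ipaddress.IPv6Address(int(network.network_address) | interface_id)


if __name__ == "__main__":
    # Prefix is an assumption; MAC is the HWADDR from ifcfg-enp6s0.
    print(slaac_eui64_address("2400:7d00:110:3::/64", "00:25:b5:00:0b:4f"))
```

With those inputs the result is 2400:7d00:110:3:225:b5ff:fe00:b4f, the same address that appears under 'ipv6addr' in the log, which is consistent with the address being auto-configured from a router advertisement rather than requested by Vdsm.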
>
> restore-net::INFO::2016-02-25 14:14:58,024::vdsm-restore-net-config::261::root::(_find_changed_or_missing)
> ovirtmgmt is different or missing from persistent
> configuration. current: {'nic': 'enp6s0', 'dhcpv6': False, 'ipaddr':
> '10.80.10.117', 'mtu': '1500', 'netmask': '255.255.255.0',
> 'bootproto': 'none', 'stp': False, 'bridged': True, 'ipv6addr':
> ['2400:7d00:110:3:225:b5ff:fe00:b4f/64'], 'gateway': '10.80.10.1',
> 'defaultRoute': True}, persisted: {u'nic': u'enp6s0', 'dhcpv6': False,
> u'ipaddr': u'10.80.10.117', u'mtu': '1500', u'netmask':
> u'255.255.255.0', 'bootproto': 'none', 'stp': False, u'bridged': True,
> u'gateway': u'10.80.10.1', u'defaultRoute': True}
>
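To make the mismatch in the log line above easier to see, here is a minimal sketch that compares the two dictionaries copied from it. This is not Vdsm's actual _find_changed_or_missing code; diff_network_config is a hypothetical helper written only for this comparison.

```python
def diff_network_config(current, persisted):
    """Return {key: (current_value, persisted_value)} for differing keys.

    A key that is absent on one side is reported with None on that side.
    """
    changed = {}
    for key in sorted(set(current) | set(persisted)):
        cur = current.get(key)
        per = persisted.get(key)
        if cur != per:
            changed[key] = (cur, per)
    return changed


# Values copied from the restore-net INFO line in this thread.
current = {
    'nic': 'enp6s0', 'dhcpv6': False, 'ipaddr': '10.80.10.117',
    'mtu': '1500', 'netmask': '255.255.255.0', 'bootproto': 'none',
    'stp': False, 'bridged': True,
    'ipv6addr': ['2400:7d00:110:3:225:b5ff:fe00:b4f/64'],
    'gateway': '10.80.10.1', 'defaultRoute': True,
}
persisted = {
    'nic': 'enp6s0', 'dhcpv6': False, 'ipaddr': '10.80.10.117',
    'mtu': '1500', 'netmask': '255.255.255.0', 'bootproto': 'none',
    'stp': False, 'bridged': True,
    'gateway': '10.80.10.1', 'defaultRoute': True,
}

if __name__ == "__main__":
    for key, (cur, per) in diff_network_config(current, persisted).items():
        print(key, "current:", cur, "persisted:", per)
```

The only differing key is 'ipv6addr', present in the running configuration and absent from the persisted one. Everything else matches, so the restore is triggered solely by the auto-configured IPv6 address.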
>
> Then, Vdsm tries to drop the
> unsolicited address, but fails. Both must be fixed ASAP.
>
> restore-net::ERROR::2016-02-25 14:14:59,490::__init__::58::root::(__exit__) Failed rollback transaction last known good network.
> Traceback (most recent call last):
> File "/usr/share/vdsm/network/api.py", line 918, in setupNetworks
> keep_bridge=keep_bridge)
> File "/usr/share/vdsm/network/api.py", line 222, in wrapped
> ret = func(**attrs)
> File "/usr/share/vdsm/network/api.py", line 502, in _delNetwork
> configurator.removeQoS(net_ent)
> File "/usr/share/vdsm/network/configurators/__init__.py", line 122, in removeQoS
> qos.remove_outbound(top_device)
> File "/usr/share/vdsm/network/configurators/qos.py", line 60, in remove_outbound
> device, pref=_NON_VLANNED_ID if vlan_tag is None else vlan_tag)
> File "/usr/share/vdsm/network/tc/filter.py", line 31, in delete
> _wrapper.process_request(command)
> File "/usr/share/vdsm/network/tc/_wrapper.py", line 38, in process_request
> raise TrafficControlException(retcode, err, command)
> TrafficControlException: (None, 'Message truncated',
> ['/usr/sbin/tc', 'filter', 'del', 'dev', 'enp6s0', 'pref', '5000'])
>
> Regards,
> Dan.
>
Hi David,
You have encountered two issues: the first with IPv6, which we do not fully support in 3.6, and the second with an unmanaged failure during network setup on boot.
We are going to back-port both fixes very soon.
Can you check our patches? They should resolve the problem we saw in the
log: https://gerrit.ovirt.org/#/c/54237 (based on oVirt-3.6.3)
-- I've manually applied the patch to the node that I was testing on and the networking comes on-line correctly. Now I'm encountering a Gluster issue where the master domain cannot be found.
Without the fixes, as a workaround, I would suggest (if possible) to disable IPv6 on your host boot line and check if all works out for you.
-- OK, but as I can manually apply the patch it's good for now. Do you know which version this is expected to land in? I won't perform an ovirt/vdsm update until it's part of the upstream RPMs.
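For anyone trying the boot-line workaround instead of the patch, a small check like the one below can confirm whether the option took effect after a reboot. It assumes that "disable IPv6 on your host boot line" means the kernel parameter ipv6.disable=1, which is not spelled out in this thread. The function name and the sample command lines are hypothetical.

```python
def ipv6_disabled_on_cmdline(cmdline):
    """Return True if the kernel command line carries ipv6.disable=1."""
    # Kernel parameters are whitespace-separated tokens.
    return "ipv6.disable=1" in cmdline.split()


if __name__ == "__main__":
    # Hypothetical command lines, for illustration only.
    print(ipv6_disabled_on_cmdline("ro quiet ipv6.disable=1"))
    print(ipv6_disabled_on_cmdline("ro quiet"))
```

On a running host the string to test would be the contents of /proc/cmdline.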
Do you need IPv6 connectivity? If so, you'll need to use a vdsm hook or another interface that is not controlled by oVirt.
-- Ideally I'd prefer not to have it, but the way our network has been configured some hosts are IPv6 only, so at a min the guests need it.. the hypervisors not so much.
-- I've now hit an issue with it not starting up the master storage Gluster domain. As it's a separate issue I'll review the mailing lists & create a new item if it's related. I've attached the supervdsm.log in case you can save me some time and point me in the right direction!
Thanks
Edy.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: supervdsm.zip
Type: application/x-zip-compressed
Size: 58005 bytes
Desc: supervdsm.zip
URL: <http://lists.ovirt.org/pipermail/users/attachments/20160303/193d9ab0/attachment-0001.bin>