[ovirt-users] 3.6 loses network on reboot
Nir Soffer
nsoffer at redhat.com
Thu Mar 3 07:06:11 UTC 2016
On Thu, Mar 3, 2016 at 2:54 AM, David LeVene <David.LeVene at blackboard.com> wrote:
> Hi,
>
> Thanks for the quick responses and help. Answers are inline at the end of
> this email.
>
> Cheers
> David
>
> -----Original Message-----
> From: Edward Haas [mailto:edwardh at redhat.com]
> Sent: Wednesday, March 02, 2016 20:05
> To: David LeVene <David.LeVene at blackboard.com>; Dan Kenigsberg <danken at redhat.com>
> Cc: users at ovirt.org
> Subject: Re: [ovirt-users] 3.6 loses network on reboot
>
> On 03/02/2016 01:36 AM, David LeVene wrote:
> > Hi Dan,
> >
> > I missed the email as the subject line changed!
> >
> > We use and run IPv6 in our network; I'm not sure whether this is related.
> > The addresses are handed out via SLAAC, so that is where the IPv6 address
> > is coming from.
> >
> > My memory is a bit sketchy, but I think that if I remove the vmfex/SR-IOV
> > vNIC and run with only one vNIC it works fine; the issues arise when I
> > bring the second NIC into play with SR-IOV.
> >
> > Answers inline.
> >
> > -----Original Message-----
> > From: Dan Kenigsberg [mailto:danken at redhat.com]
> > Sent: Tuesday, March 01, 2016 00:28
> > To: David LeVene <David.LeVene at blackboard.com>
> > Cc: edwardh at redhat.com; users at ovirt.org
> > Subject: Re: [ovirt-users] 3.6 loses network on reboot
> >
> > This sounds very bad. I'm changing the subject so that the wider, more
> > problematic issue is visible.
> >
> > Did any other user see this behavior?
> >
> > On Mon, Feb 29, 2016 at 06:27:46AM +0000, David LeVene wrote:
> >> Hi Dan,
> >>
> >> Answers as follows:
> >>
> >> # rpm -qa | grep -i vdsm
> >> vdsm-jsonrpc-4.17.18-1.el7.noarch
> >> vdsm-hook-vmfex-4.17.18-1.el7.noarch
> >> vdsm-infra-4.17.18-1.el7.noarch
> >> vdsm-4.17.18-1.el7.noarch
> >> vdsm-python-4.17.18-1.el7.noarch
> >> vdsm-yajsonrpc-4.17.18-1.el7.noarch
> >> vdsm-cli-4.17.18-1.el7.noarch
> >> vdsm-xmlrpc-4.17.18-1.el7.noarch
> >> vdsm-hook-vmfex-dev-4.17.18-1.el7.noarch
> >>
> >>
> >> This folder contained an ifcfg-ovirtmgmt bridge setup, as well as
> >> route-ovirtmgmt and rule-ovirtmgmt, but they were removed after the reboot.
> >>
> >> # ls -althr | grep ifcfg
> >> -rw-r--r--. 1 root root 254 Sep 16 21:21 ifcfg-lo
> >> -rw-r--r--. 1 root root 120 Feb 25 14:07 ifcfg-enp7s0f0
> >> -rw-rw-r--. 1 root root 174 Feb 25 14:40 ifcfg-enp6s0
> >>
> >> I think I modified ifcfg-enp6s0 to get networking up again (e.g. it was
> >> set to use the bridge, but the bridge wasn't configured). That was a few
> >> days ago; if it's important, I can reboot the box again to see what state
> >> it comes up in.
> >>
> >> # cat ifcfg-enp6s0
> >> BOOTPROTO="none"
> >> IPADDR="10.80.10.117"
> >> NETMASK="255.255.255.0"
> >> GATEWAY="10.80.10.1"
> >> DEVICE="enp6s0"
> >> HWADDR="00:25:b5:00:0b:4f"
> >> ONBOOT=yes
> >> PEERDNS=yes
> >> PEERROUTES=yes
> >> MTU=1500
> >>
> >> # cat ifcfg-enp7s0f0
> >> # Generated by VDSM version 4.17.18-1.el7
> >> DEVICE=enp7s0f0
> >> ONBOOT=yes
> >> MTU=1500
> >> HWADDR=00:25:b5:00:0b:0f
> >> NM_CONTROLLED=no
> >>
> >> # find /var/lib/vdsm/persistence
> >> /var/lib/vdsm/persistence
> >> /var/lib/vdsm/persistence/netconf
> >> /var/lib/vdsm/persistence/netconf.1456371473833165545
> >> /var/lib/vdsm/persistence/netconf.1456371473833165545/nets
> >> /var/lib/vdsm/persistence/netconf.1456371473833165545/nets/ovirtmgmt
> >>
> >> # cat /var/lib/vdsm/persistence/netconf.1456371473833165545/nets/ovirtmgmt
> >> {
> >> "nic": "enp6s0",
> >> "ipaddr": "10.80.10.117",
> >> "mtu": "1500",
> >> "netmask": "255.255.255.0",
> >> "STP": "no",
> >> "bridged": "true",
> >> "gateway": "10.80.10.1",
> >> "defaultRoute": true
> >> }
> >>
> >> Supervdsm log is attached.
> >
> > Have you edited ifcfg-ovirtmgmt manually?
> > Nope
> >
> > Can you somehow reproduce it, and share its content?
> > Yes, I should be able to reproduce it; I just have to fix it first (create
> > the networking manually and get VDSM online). It's also a side
> > project/investigation at the moment, so time isn't on my side.
> >
> > Would it help if I took an sosreport before and after? I don't mind
> > emailing these directly to you.
> >
> > Do you have NetworkManager running? Which version?
> > NM is disabled, but the version is...
> > # rpm -q NetworkManager
> > NetworkManager-1.0.6-27.el7.x86_64
> > # systemctl status NetworkManager.service
> > ● NetworkManager.service - Network Manager
> >    Loaded: loaded (/usr/lib/systemd/system/NetworkManager.service; disabled; vendor preset: enabled)
> >    Active: inactive (dead)
> >
> > It seems that Vdsm has two bugs: on boot, initscripts end up setting an
> > IPv6 address that Vdsm never requested
> >
> > As mentioned above, this would have come from SLAAC, which we have set up
> > in our network.
> >
> > restore-net::INFO::2016-02-25 14:14:58,024::vdsm-restore-net-config::261::root::(_find_changed_or_missing) ovirtmgmt is different or missing from persistent configuration.
> > current: {'nic': 'enp6s0', 'dhcpv6': False, 'ipaddr': '10.80.10.117', 'mtu': '1500', 'netmask': '255.255.255.0', 'bootproto': 'none', 'stp': False, 'bridged': True, 'ipv6addr': ['2400:7d00:110:3:225:b5ff:fe00:b4f/64'], 'gateway': '10.80.10.1', 'defaultRoute': True},
> > persisted: {u'nic': u'enp6s0', 'dhcpv6': False, u'ipaddr': u'10.80.10.117', u'mtu': '1500', u'netmask': u'255.255.255.0', 'bootproto': 'none', 'stp': False, u'bridged': True, u'gateway': u'10.80.10.1', u'defaultRoute': True}
> >
> >
> > Then, Vdsm tries to drop the unsolicited address, but fails. Both must be
> > fixed ASAP.
> >
> > restore-net::ERROR::2016-02-25 14:14:59,490::__init__::58::root::(__exit__) Failed rollback transaction last known good network.
> > Traceback (most recent call last):
> >   File "/usr/share/vdsm/network/api.py", line 918, in setupNetworks
> >     keep_bridge=keep_bridge)
> >   File "/usr/share/vdsm/network/api.py", line 222, in wrapped
> >     ret = func(**attrs)
> >   File "/usr/share/vdsm/network/api.py", line 502, in _delNetwork
> >     configurator.removeQoS(net_ent)
> >   File "/usr/share/vdsm/network/configurators/__init__.py", line 122, in removeQoS
> >     qos.remove_outbound(top_device)
> >   File "/usr/share/vdsm/network/configurators/qos.py", line 60, in remove_outbound
> >     device, pref=_NON_VLANNED_ID if vlan_tag is None else vlan_tag)
> >   File "/usr/share/vdsm/network/tc/filter.py", line 31, in delete
> >     _wrapper.process_request(command)
> >   File "/usr/share/vdsm/network/tc/_wrapper.py", line 38, in process_request
> >     raise TrafficControlException(retcode, err, command)
> > TrafficControlException: (None, 'Message truncated', ['/usr/sbin/tc', 'filter', 'del', 'dev', 'enp6s0', 'pref', '5000'])
> >
> > Regards,
> > Dan.
> >
>
> Hi David,
>
> You have encountered two issues: the first with IPv6, which we do not fully
> support in 3.6, and the second with an unhandled failure during network
> setup on boot.
> We are going to backport both fixes very soon.
>
> Can you check our patches? They should resolve the problem we saw in the
> log: https://gerrit.ovirt.org/#/c/54237 (based on oVirt-3.6.3)
>
> -- I've manually applied the patch to the node I was testing on, and the
> networking comes online correctly. Now I'm encountering a gluster issue
> where it cannot find the master domain.
>
Please attach vdsm logs showing the gluster connection attempts.
You should also have interesting logs in /var/log/glusterfs/; there should
be a log for each gluster connection (server:/path).
Nir
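To make the restore-net INFO line quoted earlier in this thread concrete, here is a minimal Python 3 sketch (not Vdsm code; diff_config and eui64_interface_id are hypothetical helpers written for illustration). It diffs the "current" and "persisted" dicts copied from that log, with the Python 2 u'' prefixes dropped, and then checks whether the extra IPv6 address looks SLAAC-derived, i.e. whether its low 64 bits equal the modified EUI-64 identifier built from enp6s0's HWADDR 00:25:b5:00:0b:4f. It assumes the host used EUI-64 interface identifiers rather than privacy or stable-privacy ones.

```python
import ipaddress

# Dicts copied from the restore-net INFO log line in this thread
# (Python 2 u'' string prefixes dropped).
current = {
    'nic': 'enp6s0', 'dhcpv6': False, 'ipaddr': '10.80.10.117',
    'mtu': '1500', 'netmask': '255.255.255.0', 'bootproto': 'none',
    'stp': False, 'bridged': True,
    'ipv6addr': ['2400:7d00:110:3:225:b5ff:fe00:b4f/64'],
    'gateway': '10.80.10.1', 'defaultRoute': True,
}
persisted = {
    'nic': 'enp6s0', 'dhcpv6': False, 'ipaddr': '10.80.10.117',
    'mtu': '1500', 'netmask': '255.255.255.0', 'bootproto': 'none',
    'stp': False, 'bridged': True,
    'gateway': '10.80.10.1', 'defaultRoute': True,
}


def diff_config(cur, saved):
    """Hypothetical helper: map each differing key to (current, persisted)."""
    keys = sorted(set(cur) | set(saved))
    return {k: (cur.get(k), saved.get(k))
            for k in keys if cur.get(k) != saved.get(k)}


def eui64_interface_id(mac):
    """Hypothetical helper: modified EUI-64 interface ID for a 48-bit MAC."""
    octets = [int(part, 16) for part in mac.split(':')]
    octets[0] ^= 0x02  # invert the universal/local bit
    eui = octets[:3] + [0xFF, 0xFE] + octets[3:]  # insert FF:FE in the middle
    return int.from_bytes(bytes(eui), 'big')


diff = diff_config(current, persisted)
print(diff)
# -> {'ipv6addr': (['2400:7d00:110:3:225:b5ff:fe00:b4f/64'], None)}

extra = ipaddress.IPv6Interface(diff['ipv6addr'][0][0])
low64 = int(extra.ip) & ((1 << 64) - 1)  # interface identifier part
print(low64 == eui64_interface_id('00:25:b5:00:0b:4f'))  # -> True
```

The only differing key is ipv6addr, and its interface identifier matches the MAC of enp6s0. That is consistent with David's explanation that the address came from SLAAC rather than from anything Vdsm configured.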