[ovirt-users] Failure during self-hosted deployment: exception configuring management bridge

Antoni Segura Puimedon asegurap at redhat.com
Tue May 13 05:34:55 EDT 2014



----- Original Message -----
> From: "David Sommerseth" <davids at redhat.com>
> To: "Bob Doolittle" <bob at doolittle.us.com>, "Dan Kenigsberg" <danken at redhat.com>, asegurap at redhat.com
> Cc: "users" <users at ovirt.org>
> Sent: Tuesday, May 13, 2014 10:59:47 AM
> Subject: Re: Failure during self-hosted deployment: exception configuring management bridge
> 
> On 13/05/14 00:35, Bob Doolittle wrote:
> > Also - is there a bugID for this new issue?
> > 
> > The one I quoted is supposed to only affect non-existent device names.
> > Why is this affecting valid device names as well, and only in the VDSM
> > context?
> 
> Antoni may correct me here, but I believe it's caused by vdsm using
> libnl-1.x and py-ethtool using libnl3.  We've discovered an issue with
> this combination, where libnl-1.x is able to invalidate the netlink
> socket libnl3 gives py-ethtool; rendering py-ethtool useless.
> 
> This issue is somewhat tracked in this bz:
> <https://bugzilla.redhat.com/show_bug.cgi?id=1078312>
> 
> This is actually quite a delicate issue: I believe there are some
> fixes in vdsm, py-ethtool has some patches to improve the error
> handling (which should help vdsm too), and we're waiting for an official
> libnl3 update to tackle the socket handling better.
> 
> I have hopes that once the libnl3 fixes get out, much of this will be
> solved.

In vdsm's ovirt-3.4 branch we detect python-ethtool's version and use the
same libnl version that it uses, as seen in:
http://gerrit.ovirt.org/gitweb?p=vdsm.git;a=blob_plain;f=lib/vdsm/netlink.py;hb=7e306159d5f10f67197d499daa282d3d4c1bef73

    if _ethtool_uses_libnl3():
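
For readers who do not want to open the link, here is a rough sketch of what
such a version check could look like. It is not the vdsm code referenced
above. The 0.9 cut-over, the version string format, the helper names and the
library sonames are all assumptions made for illustration.

```python
import re

# Assumption: python-ethtool moved to libnl3 in release 0.9.
LIBNL3_SINCE = (0, 9)


def parse_version(version_string):
    """Extract the first dotted number from a string like 'python-ethtool v0.9.2'."""
    match = re.search(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        raise ValueError("no version number in %r" % version_string)
    return tuple(int(part) for part in match.group(0).split("."))


def ethtool_uses_libnl3(version_string, since=LIBNL3_SINCE):
    """Hypothetical helper: True if this python-ethtool is assumed to use libnl3."""
    return parse_version(version_string) >= since


def pick_libnl_soname(version_string):
    """Choose the libnl library to load so it matches python-ethtool's (assumed sonames)."""
    if ethtool_uses_libnl3(version_string):
        return "libnl-3.so.200"
    return "libnl.so.1"
```

The point of matching versions is that both users of netlink in the process
then go through the same libnl, so one cannot invalidate the other's socket.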

To me this looks like a possible python-ethtool 0.9.2 bug affecting devices
that do not get IPv6 autoconf addresses. I'll investigate.
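
The "SystemError: error return without exception set" in the traceback quoted
below means the C extension returned a failure without setting a Python
exception. As a stop-gap only, a caller could guard the call roughly like
this. It is a sketch, not the vdsm fix: the helper name is hypothetical, and
swallowing the error hides the underlying bug rather than solving it.

```python
def ipv6_addresses_or_empty(dev_info):
    """Return dev_info.get_ipv6_addresses() as a list, or [] on SystemError.

    dev_info is any object exposing get_ipv6_addresses(), such as the device
    objects that ethtool.get_interfaces_info() returns.
    """
    try:
        return list(dev_info.get_ipv6_addresses())
    except SystemError:
        # The C code reported an error without saying what went wrong.
        return []


class BrokenDevice(object):
    """Stand-in that reproduces the failure without needing python-ethtool."""

    def get_ipv6_addresses(self):
        raise SystemError("error return without exception set")


class WorkingDevice(object):
    """Stand-in for a device whose lookup succeeds."""

    def get_ipv6_addresses(self):
        return ["fe80::1"]
```

With this guard getCapabilities would report the NIC with no IPv6 addresses
instead of failing as a whole, which is why it should only be a temporary
measure.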

> 
> 
> David S.
> 
> 
> > On 05/12/2014 06:21 PM, Dan Kenigsberg wrote:
> >> On Mon, May 12, 2014 at 05:53:10PM -0400, Bob Doolittle wrote:
> >>> On 05/12/2014 02:49 PM, Bob Doolittle wrote:
> >>>> Hi,
> >>>>
> >>>> I'm trying to set up a fresh system on F19, using oVirt 3.4.
> >>>>
> >>>> When running hosted-engine --deploy, it fails during "Configuring the
> >>>> management bridge". The ovirt-hosted-engine-setup log shows:
> >>>>
> >>>> 2014-05-12 13:59:35 INFO otopi.plugins.ovirt_hosted_engine_setup.network.bridge bridge._misc:196 Configuring the management bridge
> >>>> 2014-05-12 13:59:35 DEBUG otopi.context context._executeMethod:152 method exception
> >>>> Traceback (most recent call last):
> >>>>   File "/usr/lib/python2.7/site-packages/otopi/context.py", line 142, in _executeMethod
> >>>>     method['method']()
> >>>>   File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/ovirt-hosted-engine-setup/network/bridge.py", line 201, in _misc
> >>>>     ].s.getVdsCapabilities()['info']['nics'][nics]
> >>>> KeyError: 'info'
> >>>> 2014-05-12 13:59:35 ERROR otopi.context context._executeMethod:161 Failed to execute stage 'Misc configuration': 'info'
> >>>>
> >>>>
> >>>> The vdsm.log shows:
> >>>>
> >>>> Thread-14::DEBUG::2014-05-12 13:59:35,840::BindingXMLRPC::1067::vds::(wrapper) client [127.0.0.1]::call getCapabilities with () {}
> >>>> Thread-14::DEBUG::2014-05-12 13:59:35,875::utils::642::root::(execCmd) '/sbin/ip route show to 0.0.0.0/0 table all' (cwd None)
> >>>> Thread-14::DEBUG::2014-05-12 13:59:35,879::utils::662::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0
> >>>> Thread-14::ERROR::2014-05-12 13:59:35,882::BindingXMLRPC::1086::vds::(wrapper) unexpected error
> >>>> Traceback (most recent call last):
> >>>>   File "/usr/share/vdsm/BindingXMLRPC.py", line 1070, in wrapper
> >>>>     res = f(*args, **kwargs)
> >>>>   File "/usr/share/vdsm/BindingXMLRPC.py", line 393, in getCapabilities
> >>>>     ret = api.getCapabilities()
> >>>>   File "/usr/share/vdsm/API.py", line 1185, in getCapabilities
> >>>>     c = caps.get()
> >>>>   File "/usr/share/vdsm/caps.py", line 369, in get
> >>>>     caps.update(netinfo.get())
> >>>>   File "/usr/lib64/python2.7/site-packages/vdsm/netinfo.py", line 566, in get
> >>>>     d['nics'][dev.name] = _nicinfo(dev.name, paddr)
> >>>>   File "/usr/lib64/python2.7/site-packages/vdsm/netinfo.py", line 516, in _nicinfo
> >>>>     info = _devinfo(nic)
> >>>>   File "/usr/lib64/python2.7/site-packages/vdsm/netinfo.py", line 536, in _devinfo
> >>>>     ipv4addr, ipv4netmask, ipv6addrs = getIpInfo(dev)
> >>>>   File "/usr/lib64/python2.7/site-packages/vdsm/netinfo.py", line 317, in getIpInfo
> >>>>     ipv6addrs = devInfo.get_ipv6_addresses()
> >>>> SystemError: error return without exception set
> >>>>
> >>>>
> >>>> I have two NICs - a wireless NIC which is disabled, and an ethernet NIC
> >>>> "p3p1" which is statically configured via network-scripts.
> >>>>
> >>>> I've also attached the output of "ip addr".
> >>>>
> >>>> I also notice some disturbing-looking messages in the vdsm log during
> >>>> setupMultipath, including "Panic: Error initializing IRS" and then
> >>>> subsequent lvm-related errors during StorageRefresh. Those did not abort
> >>>> the deployment, however. What do those failures indicate?
> >>> This looks a lot like a new manifestation of:
> >>> https://bugzilla.redhat.com/show_bug.cgi?id=1057772
> >> Which version of Vdsm are you using? ovirt-3.4.1's vdsm-4.14.7 should
> >> have fixed that problem.
> >>
> >>> I even instrumented the code in
> >>> /usr/lib64/python2.7/site-packages/vdsm/netinfo.py
> >>>
> >>> The device name ("p3p1") being passed in is correct (I even tried setting
> >>> the string directly), but the returned object is empty.
> >>>
> >>> If I start python by hand and run ethtool.get_interfaces_info("p3p1") it
> >>> returns the correct data.
> >>>
> >>> So it seems as though the code is somehow environmentally sensitive. I'm not
> >>> sure what it is about my environment that would cause issues here, however,
> >>> since presumably this is working for others...
> >> I'm afraid this has recently been tickled by a release of python-ethtool
> >> to Fedora 19.
> > 
> 
> 

