From: "David Sommerseth" <davids(a)redhat.com>
To: "Bob Doolittle" <bob(a)doolittle.us.com>, "Dan Kenigsberg"
<danken(a)redhat.com>, asegurap(a)redhat.com
Cc: "users" <users(a)ovirt.org>
Sent: Tuesday, May 13, 2014 10:59:47 AM
Subject: Re: Failure during self-hosted deployment: exception configuring management bridge
On 13/05/14 00:35, Bob Doolittle wrote:
> Also - is there a bugID for this new issue?
>
> The one I quoted is supposed to only affect non-existent device names.
> Why is this affecting valid device names as well, and only in the VDSM
> context?
Antonio may correct me here, but I believe it's caused by vdsm using
libnl-1.x and py-ethtool using libnl3. We've discovered an issue with
this combination, where libnl-1.x is able to invalidate the netlink
socket libnl3 gives py-ethtool, rendering py-ethtool useless.
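One rough way to check whether a given process really has both libnl generations mapped is to scan its /proc/<pid>/maps. The sketch below is only illustrative: the helper names are made up for this mail, and the shared-object names (libnl.so.1 for libnl-1.x, libnl-3.so.* for libnl3) are an assumption about how the distribution packages the libraries.

```python
import re

# ASSUMPTION: sonames as typically packaged; adjust for your distribution.
# Only the core library is matched, not libnl-route or other add-ons.
_LIBNL1 = re.compile(r"/libnl\.so\.1(\.|$)")
_LIBNL3 = re.compile(r"/libnl-3\.so\.")


def libnl_generations(maps_text):
    """Return the set of libnl generations ('1', '3') found in maps text."""
    found = set()
    for line in maps_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The mapped file path, when present, is the last field.
        path = fields[-1]
        if _LIBNL1.search(path):
            found.add("1")
        elif _LIBNL3.search(path):
            found.add("3")
    return found


def current_process_generations():
    """Hypothetical convenience wrapper for the running interpreter."""
    with open("/proc/self/maps") as maps:
        return libnl_generations(maps.read())
```

If the result for the vdsm process contains both "1" and "3", the two libraries share the process and the combination described above is at least possible.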
This issue is somewhat tracked in this bz:
<https://bugzilla.redhat.com/show_bug.cgi?id=1078312>
This is actually quite a delicate issue, as I believe there are some
fixes in vdsm, py-ethtool has some patches to improve the error
handling (which should help vdsm too) and we're waiting for an official
libnl3 update to tackle the socket handling better.
I have hopes that once the libnl3 fixes get out, much of this will be
solved.
In vdsm's ovirt-3.4 branch we have detection of ethtool's version and use the
same libnl version, as seen in:
if _ethtool_uses_libnl3():
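For readers without the source at hand, here is a minimal sketch of what version-based selection of this kind could look like. It is not the vdsm implementation: the function names, the cut-over version and the sonames are all assumptions made for illustration, and the real threshold should be checked against the python-ethtool changelog.

```python
def parse_version(text):
    """Turn a dotted version such as '0.9.2' or 'v0.9' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


# ASSUMPTION: first python-ethtool release linked against libnl3.
HYPOTHETICAL_FIRST_LIBNL3_RELEASE = (0, 9)


def ethtool_uses_libnl3(ethtool_version,
                        first_libnl3=HYPOTHETICAL_FIRST_LIBNL3_RELEASE):
    """Guess the libnl generation from the python-ethtool version string."""
    return parse_version(ethtool_version) >= first_libnl3


def libnl_soname(ethtool_version):
    """Pick the library to load so both users share one libnl generation."""
    # ASSUMPTION: sonames as typically packaged on Fedora-like systems.
    if ethtool_uses_libnl3(ethtool_version):
        return "libnl-3.so.200"
    return "libnl.so.1"
```

The point of the design is simply that the process ends up with one libnl generation loaded rather than two.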
This looks to me like there might be a python-ethtool 0.9.2 bug for devices
that do not get ipv6 autoconf addresses. I'll investigate.
David S.
> On 05/12/2014 06:21 PM, Dan Kenigsberg wrote:
>> On Mon, May 12, 2014 at 05:53:10PM -0400, Bob Doolittle wrote:
>>> On 05/12/2014 02:49 PM, Bob Doolittle wrote:
>>>> Hi,
>>>>
>>>> I'm trying to set up a fresh system on F19, using oVirt 3.4.
>>>>
>>>> When running hosted-engine --deploy, it fails during "Configuring the
>>>> management bridge". The ovirt-hosted-engine-setup log shows:
>>>>
>>>> 2014-05-12 13:59:35 INFO otopi.plugins.ovirt_hosted_engine_setup.network.bridge bridge._misc:196 Configuring the management bridge
>>>> 2014-05-12 13:59:35 DEBUG otopi.context context._executeMethod:152 method exception
>>>> Traceback (most recent call last):
>>>>   File "/usr/lib/python2.7/site-packages/otopi/context.py", line 142, in _executeMethod
>>>>     method['method']()
>>>>   File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/ovirt-hosted-engine-setup/network/bridge.py", line 201, in _misc
>>>>     ].s.getVdsCapabilities()['info']['nics'][nics]
>>>> KeyError: 'info'
>>>> 2014-05-12 13:59:35 ERROR otopi.context context._executeMethod:161 Failed to execute stage 'Misc configuration': 'info'
>>>>
>>>>
>>>> The vdsm.log shows:
>>>>
>>>> Thread-14::DEBUG::2014-05-12 13:59:35,840::BindingXMLRPC::1067::vds::(wrapper) client [127.0.0.1]::call getCapabilities with () {}
>>>> Thread-14::DEBUG::2014-05-12 13:59:35,875::utils::642::root::(execCmd) '/sbin/ip route show to 0.0.0.0/0 table all' (cwd None)
>>>> Thread-14::DEBUG::2014-05-12 13:59:35,879::utils::662::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0
>>>> Thread-14::ERROR::2014-05-12 13:59:35,882::BindingXMLRPC::1086::vds::(wrapper) unexpected error
>>>> Traceback (most recent call last):
>>>>   File "/usr/share/vdsm/BindingXMLRPC.py", line 1070, in wrapper
>>>>     res = f(*args, **kwargs)
>>>>   File "/usr/share/vdsm/BindingXMLRPC.py", line 393, in getCapabilities
>>>>     ret = api.getCapabilities()
>>>>   File "/usr/share/vdsm/API.py", line 1185, in getCapabilities
>>>>     c = caps.get()
>>>>   File "/usr/share/vdsm/caps.py", line 369, in get
>>>>     caps.update(netinfo.get())
>>>>   File "/usr/lib64/python2.7/site-packages/vdsm/netinfo.py", line 566, in get
>>>>     d['nics'][dev.name] = _nicinfo(dev.name, paddr)
>>>>   File "/usr/lib64/python2.7/site-packages/vdsm/netinfo.py", line 516, in _nicinfo
>>>>     info = _devinfo(nic)
>>>>   File "/usr/lib64/python2.7/site-packages/vdsm/netinfo.py", line 536, in _devinfo
>>>>     ipv4addr, ipv4netmask, ipv6addrs = getIpInfo(dev)
>>>>   File "/usr/lib64/python2.7/site-packages/vdsm/netinfo.py", line 317, in getIpInfo
>>>>     ipv6addrs = devInfo.get_ipv6_addresses()
>>>> SystemError: error return without exception set
>>>>
>>>>
>>>> I have two NICs - a wireless NIC which is disabled, and an ethernet NIC
>>>> "p3p1" which is statically configured via network-scripts.
>>>>
>>>> I've also attached the output of "ip addr".
>>>>
>>>> I also notice some disturbing looking messages in the vdsm log during
>>>> setupMultipath, including "Panic: Error initializing IRS" and then
>>>> subsequent lvm-related errors during StorageRefresh. Those did not
>>>> abort the deployment, however. What do those failures indicate?
>>> This looks a lot like a new manifestation of:
>>>
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1057772
>> Which version of Vdsm are you using? ovirt-3.4.1's vdsm-4.14.7 should
>> have fixed that problem.
>>
>>> I even instrumented the code in
>>> /usr/lib64/python2.7/site-packages/vdsm/netinfo.py
>>>
>>> The device name ("p3p1") being passed in is correct (I even tried
>>> setting the string directly), but the returned object is empty.
>>>
>>> If I start python by hand and run ethtool.get_interfaces_info("p3p1")
>>> it returns the correct data.
>>>
>>> So it seems as though the code is somehow environmentally sensitive.
>>> I'm not sure what it is about my environment that would cause issues
>>> here, however, since presumably this is working for others...
>> I'm afraid this has recently been tickled by a release of python-ethtool
>> to Fedora 19.
>