
----- Original Message -----
From: "David Sommerseth" <davids@redhat.com>
To: "Bob Doolittle" <bob@doolittle.us.com>, "Dan Kenigsberg" <danken@redhat.com>, asegurap@redhat.com
Cc: "users" <users@ovirt.org>
Sent: Tuesday, May 13, 2014 10:59:47 AM
Subject: Re: Failure during self-hosted deployment: exception configuring management bridge
On 13/05/14 00:35, Bob Doolittle wrote:
Also - is there a bugID for this new issue?
The one I quoted is supposed to only affect non-existent device names. Why is this affecting valid device names as well, and only in the VDSM context?
Antonio may correct me here, but I believe it's caused by vdsm using libnl-1.x while py-ethtool uses libnl3. We've discovered an issue with this combination: libnl-1.x can invalidate the netlink socket that libnl3 gives py-ethtool, rendering py-ethtool useless.
This issue is somewhat tracked in this bz: <https://bugzilla.redhat.com/show_bug.cgi?id=1078312>
This is actually quite a delicate issue. I believe there are some fixes in vdsm, py-ethtool has some patches that improve its error handling (which should help vdsm too), and we're waiting for an official libnl3 update that handles the socket better.
I hope that once the libnl3 fixes get out, much of this will be solved.
In vdsm's ovirt-3.4 branch we detect which python-ethtool version is installed and use the same libnl version it does, as seen in:
http://gerrit.ovirt.org/gitweb?p=vdsm.git;a=blob_plain;f=lib/vdsm/netlink.py...

    if _ethtool_uses_libnl3():

This looks to me like there might be a python-ethtool 0.9.2 bug for devices that do not get IPv6 autoconf addresses. I'll investigate.
David S.
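The ovirt-3.4 branch reportedly checks which libnl generation python-ethtool uses (the _ethtool_uses_libnl3() call quoted above); the real implementation sits behind the truncated gerrit link and is not reproduced here. As a rough, hypothetical illustration of the same idea on Linux, one can import the ethtool extension and look at which libnl shared objects the process has mapped. Seeing libnl-1 and libnl3 loaded together is the combination described as problematic. The file-name patterns (libnl.so.1 for 1.x, libnl-3.so for libnl3) are assumptions based on typical Fedora packaging, and libnl_flavors is an illustrative name, not a vdsm function.

```python
"""Hypothetical sketch: which libnl generations are mapped in this process?

Not vdsm's _ethtool_uses_libnl3(). Assumes libnl 1.x is packaged as
libnl.so.1 and libnl3 as libnl-3.so.*, as on typical Fedora systems.
"""
import re

_LIBNL3 = re.compile(r"/libnl-3\.so")
_LIBNL1 = re.compile(r"/libnl\.so\.1")


def libnl_flavors(maps_text):
    """Return the set of libnl generations ('1', '3') named in maps text."""
    found = set()
    for line in maps_text.splitlines():
        if _LIBNL3.search(line):
            found.add("3")
        if _LIBNL1.search(line):
            found.add("1")
    return found


def current_process_flavors():
    """Scan this process's own memory map; empty set if unavailable."""
    try:
        with open("/proc/self/maps") as f:
            return libnl_flavors(f.read())
    except (IOError, OSError):  # not Linux, or /proc not mounted
        return set()


if __name__ == "__main__":
    try:
        import ethtool  # python-ethtool; importing it maps its libnl
    except ImportError:
        ethtool = None
    flavors = current_process_flavors()
    print("ethtool imported: %s" % (ethtool is not None))
    print("libnl generations mapped: %s" % (sorted(flavors) or "none"))
    if flavors == set(["1", "3"]):
        print("warning: both libnl-1 and libnl3 are loaded in this process")
```

Running the same check from inside a process that has already loaded vdsm's netlink code would show whether both generations end up side by side.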
On 05/12/2014 06:21 PM, Dan Kenigsberg wrote:
On Mon, May 12, 2014 at 05:53:10PM -0400, Bob Doolittle wrote:
On 05/12/2014 02:49 PM, Bob Doolittle wrote:
Hi,
I'm trying to set up a fresh system on F19, using oVirt 3.4.
When running hosted-engine --deploy, it fails during "Configuring the management bridge". The ovirt-hosted-engine-setup log shows:
2014-05-12 13:59:35 INFO otopi.plugins.ovirt_hosted_engine_setup.network.bridge bridge._misc:196 Configuring the management bridge
2014-05-12 13:59:35 DEBUG otopi.context context._executeMethod:152 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/otopi/context.py", line 142, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/ovirt-hosted-engine-setup/network/bridge.py", line 201, in _misc
    ].s.getVdsCapabilities()['info']['nics'][nics]
KeyError: 'info'
2014-05-12 13:59:35 ERROR otopi.context context._executeMethod:161 Failed to execute stage 'Misc configuration': 'info'
The vdsm.log shows:
Thread-14::DEBUG::2014-05-12 13:59:35,840::BindingXMLRPC::1067::vds::(wrapper) client [127.0.0.1]::call getCapabilities with () {}
Thread-14::DEBUG::2014-05-12 13:59:35,875::utils::642::root::(execCmd) '/sbin/ip route show to 0.0.0.0/0 table all' (cwd None)
Thread-14::DEBUG::2014-05-12 13:59:35,879::utils::662::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0
Thread-14::ERROR::2014-05-12 13:59:35,882::BindingXMLRPC::1086::vds::(wrapper) unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/BindingXMLRPC.py", line 1070, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/BindingXMLRPC.py", line 393, in getCapabilities
    ret = api.getCapabilities()
  File "/usr/share/vdsm/API.py", line 1185, in getCapabilities
    c = caps.get()
  File "/usr/share/vdsm/caps.py", line 369, in get
    caps.update(netinfo.get())
  File "/usr/lib64/python2.7/site-packages/vdsm/netinfo.py", line 566, in get
    d['nics'][dev.name] = _nicinfo(dev.name, paddr)
  File "/usr/lib64/python2.7/site-packages/vdsm/netinfo.py", line 516, in _nicinfo
    info = _devinfo(nic)
  File "/usr/lib64/python2.7/site-packages/vdsm/netinfo.py", line 536, in _devinfo
    ipv4addr, ipv4netmask, ipv6addrs = getIpInfo(dev)
  File "/usr/lib64/python2.7/site-packages/vdsm/netinfo.py", line 317, in getIpInfo
    ipv6addrs = devInfo.get_ipv6_addresses()
SystemError: error return without exception set
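The two logs appear to be one failure seen from both sides (same 13:59:35 timestamp): vdsm's getCapabilities died inside getIpInfo with SystemError, so its reply carried no 'info' key, and hosted-engine-setup's bridge.py then surfaced only KeyError: 'info'. A caller can check the reply status before indexing it. The sketch below assumes vdsm's usual reply shape, {'status': {'code': ..., 'message': ...}, 'info': ...}, with a non-zero code on failure; caps_info and VdsmCallError are hypothetical names, not part of vdsm or otopi.

```python
"""Hypothetical sketch: report a failed vdsm call instead of KeyError: 'info'.

Assumes vdsm XML-RPC replies look like
{'status': {'code': 0, 'message': 'Done'}, 'info': {...}}
and that an error reply has a non-zero code and no 'info' key.
"""


class VdsmCallError(Exception):
    """Raised when a vdsm verb reports failure (hypothetical helper)."""


def caps_info(response):
    """Return response['info'], or raise VdsmCallError saying why not."""
    status = response.get("status", {})
    code = status.get("code", -1)  # -1: reply had no status at all
    if code != 0 or "info" not in response:
        raise VdsmCallError(
            "getVdsCapabilities failed (code %s): %s; see vdsm.log"
            % (code, status.get("message", "no status message"))
        )
    return response["info"]
```

A caller would then write something like caps_info(s.getVdsCapabilities())['nics'], where s is whatever vdsm connection object it already holds; the error message points the user at vdsm.log, where the real SystemError lives.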
I have two NICs: a wireless NIC, which is disabled, and an Ethernet NIC, "p3p1", which is statically configured via network-scripts.
I've also attached the output of "ip addr".
I also notice some disturbing-looking messages in the vdsm log during setupMultipath, including "Panic: Error initializing IRS", followed by lvm-related errors during StorageRefresh. Those did not abort the deployment, however. What do those failures indicate?

This looks a lot like a new manifestation of:
https://bugzilla.redhat.com/show_bug.cgi?id=1057772
Which version of Vdsm are you using? ovirt-3.4.1's vdsm-4.14.7 should have fixed that problem.
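The installed version can be read with rpm -q vdsm. To script the comparison against vdsm-4.14.7, the release Dan says should contain the fix for bug 1057772, a minimal sketch could look like this. It assumes a plain dotted numeric version string (for example the output of rpm -q --qf '%{VERSION}' vdsm) and ignores the release suffix; parse_version and has_fix are illustrative names only.

```python
"""Hypothetical sketch: is the installed vdsm at least 4.14.7?

Assumes a dotted numeric VERSION string; the RPM release field is ignored.
"""

FIXED_IN = (4, 14, 7)  # release reported to fix bug 1057772


def parse_version(text):
    """Turn '4.14.6' into (4, 14, 6); stop at the first non-numeric part."""
    parts = []
    for piece in text.strip().split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def has_fix(version_text, fixed_in=FIXED_IN):
    """Compare as integer tuples so that 4.14.10 sorts after 4.14.7."""
    return parse_version(version_text) >= fixed_in
```

Comparing integer tuples rather than raw strings avoids the usual trap where "4.14.10" would sort before "4.14.7".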
I even instrumented the code in /usr/lib64/python2.7/site-packages/vdsm/netinfo.py.
The device name ("p3p1") being passed in is correct (I even tried setting the string directly), but the returned object is empty.
If I start Python by hand and run ethtool.get_interfaces_info("p3p1"), it returns the correct data.
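Since the call works in an interactive interpreter but fails inside vdsm, it can help to run the exact call sequence from the vdsm traceback (get_interfaces_info(), then get_ipv6_addresses() on the first entry) as a small script and compare environments. This is a hypothetical diagnostic, not part of vdsm or python-ethtool. probe() takes the module as a parameter so it can be exercised without python-ethtool installed, and it only counts the IPv6 addresses rather than assuming attribute names on the returned objects.

```python
"""Hypothetical diagnostic: replay the python-ethtool calls from the vdsm
traceback and report failures (including SystemError) as text.
"""


def probe(ethtool_mod, devname):
    """Return (ok, detail) instead of letting an exception escape."""
    try:
        infos = ethtool_mod.get_interfaces_info(devname)
    except Exception as exc:
        return False, "get_interfaces_info failed: %r" % (exc,)
    if not infos:
        return False, "get_interfaces_info returned no entries"
    try:
        # This is the call that raised SystemError inside vdsm's getIpInfo.
        v6 = infos[0].get_ipv6_addresses()
    except Exception as exc:
        return False, "get_ipv6_addresses failed: %r" % (exc,)
    return True, "%d IPv6 address(es)" % len(v6)


if __name__ == "__main__":
    import sys

    try:
        import ethtool  # python-ethtool
    except ImportError:
        print("python-ethtool is not installed")
    else:
        name = sys.argv[1] if len(sys.argv) > 1 else "p3p1"
        ok, detail = probe(ethtool, name)
        print("%s: %s (%s)" % (name, "OK" if ok else "FAIL", detail))
```

Passing the interface name on the command line (python probe.py p3p1) keeps the script usable for the wireless NIC as well, which may behave differently if it has no IPv6 autoconf address.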
So the code seems to be somehow environment-sensitive. I'm not sure what about my environment would cause issues here, though, since presumably this is working for others...

I'm afraid this has recently been tickled by a release of python-ethtool to Fedora 19.