[ovirt-users] VDSM Network Bug or Feature?
Dan Kenigsberg
danken at redhat.com
Thu Apr 16 10:12:08 UTC 2015
On Sun, Apr 05, 2015 at 01:19:43PM +0200, ml at ohnewald.net wrote:
>
> On 25.03.2015 at 15:34, Dan Kenigsberg wrote:
> >On Wed, Mar 25, 2015 at 10:31:14AM +0100, ml at ohnewald.net wrote:
> >>Hello List,
> >>
> >>I think I found a nasty bug (or feature) in oVirt.
> >>
> >>One of my network cards was set up with DHCP. At that time there was not
> >>yet a DHCP server set up which could respond to DHCP requests.
> >>
> >>Therefore my network interface was not able to obtain an IP address. This
> >>"failure" meant that my ovirtmgmt bridge would not get started.
> >>
> >>__Maybe__ because ovirtmgmt comes alphanumerically after dbvlan116? Because
> >>all my bonding interfaces, bond0 and bond1, started just fine.
> >>
> >>I was able to solve it by moving my /sbin/dhclient to /sbin/dhclient.backup
> >>and creating a dummy bash script that just exits 0 as /sbin/dhclient.
> >>
> >>Then the network startup process seemed to progress to my ovirtmgmt
> >>interface. From then on I was able to connect and manage my host again and
> >>to change my dbvlan116 interface from dhcp to none.
> >>
> >>
> >>Here is the process list it seems to loop in:
> >>
> >>
> >>root 2554 0.0 0.0 115612 1988 ? S< 10:06 0:00 /bin/bash /etc/sysconfig/network-scripts/ifup-eth ifcfg-dbvlan116
> >>root 2594 0.0 0.0 104208 15620 ? S< 10:06 0:00 /sbin/dhclient -H ovirt-node06-stgt -1 -q -lf /var/lib/dhclient/dhclient--dbvlan116.lease -pf /var/run/
> >>root 32047 0.0 0.0 115348 1676 ? S<s 10:06 0:00 /bin/sh /usr/libexec/vdsm/vdsmd_init_common.sh --pre-start
> >>root 32142 1.5 0.0 348460 24952 ? S< 10:06 0:00 /usr/bin/python /usr/share/vdsm/vdsm-restore-net-config
> >>
> >>
> >>Just killing the dhclient does not seem to work. It keeps retrying.
> >>
> >>
> >>I reported a bug before, but maybe it's better to discuss it here first and
> >>explain the bug properly so that the bug tracker guys know what I mean and
> >>what the problem is? :)
> >Good. But could you share the bug number?
>
> I have not created a bug yet.
> >
> >>Maybe it's best to start the ovirtmgmt interface first? Otherwise a wrongly
> >>configured interface will lock you out of the system.
> >>
> >I don't think I understood what the bug is, and when it shows up.
> >Let's start with the basics. Which platform are you using? el6? el7?
> CentOS7 + EL7
> >- do you have NetworkManager or firewalld running?
> No
> >- Which vdsm version are you using?
>
> vdsm-jsonrpc-4.16.10-8.gitc937927.el7.noarch
> vdsm-yajsonrpc-4.16.10-8.gitc937927.el7.noarch
> vdsm-python-zombiereaper-4.16.10-8.gitc937927.el7.noarch
> vdsm-cli-4.16.10-8.gitc937927.el7.noarch
> vdsm-python-4.16.10-8.gitc937927.el7.noarch
> vdsm-4.16.10-8.gitc937927.el7.x86_64
> vdsm-gluster-4.16.10-8.gitc937927.el7.noarch
> vdsm-xmlrpc-4.16.10-8.gitc937927.el7.noarch
>
> >- How did you configure the networks? From Engine? Manually?
> From Engine.
> >- Can you share your /var/lib/vdsm/persistence/netconf
> find /var/lib/vdsm/persistence/netconf/
> /var/lib/vdsm/persistence/netconf/
> /var/lib/vdsm/persistence/netconf/bonds
> /var/lib/vdsm/persistence/netconf/bonds/bond1
> /var/lib/vdsm/persistence/netconf/bonds/bond0
> /var/lib/vdsm/persistence/netconf/nets
> /var/lib/vdsm/persistence/netconf/nets/dbvlan116
> /var/lib/vdsm/persistence/netconf/nets/san5nach7
> /var/lib/vdsm/persistence/netconf/nets/san5nach6
> /var/lib/vdsm/persistence/netconf/nets/vlan111
> /var/lib/vdsm/persistence/netconf/nets/ovirtmgmt
>
>
>
> cat /var/lib/vdsm/persistence/netconf/bonds/bond1
> {"nics": ["enp5s0f0", "enp5s0f1"], "options": "mode=0 miimon=100"}
>
>
> cat /var/lib/vdsm/persistence/netconf/bonds/bond0
> {"nics": ["enp3s0f0", "enp3s0f1"], "options": "mode=0 miimon=100"}
>
>
>
> cat /var/lib/vdsm/persistence/netconf/nets/dbvlan116 => this was set to
> DHCP
> {"nic": "enp7s0f1", "vlan": "116", "STP": "no", "bridged": "true", "mtu":
> "1500"}
>
> cat /var/lib/vdsm/persistence/netconf/nets/san5nach7
> {"bondingOptions": "mode=0 miimon=100", "ipaddr": "10.10.3.5", "bonding":
> "bond1", "mtu": "9000", "netmask": "255.255.255.0", "STP": "no", "bridged":
> "true"}
>
>
> cat /var/lib/vdsm/persistence/netconf/nets/san5nach6
> {"bondingOptions": "mode=0 miimon=100", "ipaddr": "10.10.1.5", "bonding":
> "bond0", "mtu": "9000", "netmask": "255.255.255.0", "bridged": "false"}
>
> cat /var/lib/vdsm/persistence/netconf/nets/ovirtmgmt
> {"nic": "enp7s0f0", "ipaddr": "192.168.43.124", "mtu": "1500", "netmask":
> "255.255.255.0", "STP": "no", "bridged": "true"}
>
> >
> >Are you saying that `service vdsm start` hangs forever?
>
> What I am saying is:
>
> IF an interface A is set up with DHCP (and does not get an IP address, for
> whatever reason) then it will not move on to interface B.
>
>
> In my case:
> ===========
> IF dbvlan116 does not get a DHCP response, it will NOT move on and bring
> up my ovirtmgmt interface.
>
> However:
> ===========
> My bond0+1 interfaces were there.
>
>
> I guess this is because it starts with the B* (as in bond) interfaces, then
> moves on to my D* (as in dbvlan) interfaces and then the rest of the
> alphanumeric chain...
>
> I hope I was able to explain it well enough.
Could you share your supervdsm.log during the restoration attempt?
Only recently did Ido change restore to be synchronous, i.e. to wait
until dhcp success/failure (https://gerrit.ovirt.org/39381), so I do not
understand what causes your perpetual blockage.
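
To make "wait until dhcp success/failure" concrete, here is a minimal
sketch of a bounded wait. It is not the vdsm code and not what the gerrit
patch does; the function names and the timeout are made up for
illustration, and each command merely stands in for whatever starts the
DHCP client on one interface.

```python
import subprocess
import sys


def bring_up_with_deadline(cmd, timeout_s):
    """Run cmd and collapse success, failure and timeout into a boolean.

    Hypothetical helper: cmd stands in for the per-interface DHCP step.
    """
    try:
        result = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child when the timeout expires.
        return False
    return result.returncode == 0


def restore_all(commands, timeout_s):
    """Try every interface in turn; one failure never blocks the next."""
    return {name: bring_up_with_deadline(cmd, timeout_s)
            for name, cmd in commands.items()}


if __name__ == "__main__":
    # Stand-ins: one "DHCP client" that never answers, one that succeeds.
    never_answers = [sys.executable, "-c", "import time; time.sleep(10)"]
    succeeds = [sys.executable, "-c", "pass"]
    print(restore_all({"dbvlan116": never_answers, "ovirtmgmt": succeeds}, 2))
```

With a deadline like this, a silent DHCP server costs a bounded delay on
dbvlan116 and ovirtmgmt is still attempted afterwards.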
>
> I think the management interface should always start first, otherwise you are
> not able to correct configuration problems like this.
You are right; and actually it's the first bullet on
http://www.ovirt.org/Vdsm_TODO#Networking
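
A minimal sketch of that ordering idea, assuming the restore step simply
walks a list of network names. The function name is hypothetical and this
is not how vdsm currently picks its order; the alphabetical fallback only
mirrors the reporter's guess.

```python
def restore_order(networks, management="ovirtmgmt"):
    """Return network names with the management network first.

    The remaining networks keep alphabetical order (an assumption taken
    from the reporter's guess, not from vdsm itself).
    """
    # False sorts before True, so the management network leads the list.
    return sorted(networks, key=lambda name: (name != management, name))


if __name__ == "__main__":
    nets = ["dbvlan116", "san5nach7", "san5nach6", "vlan111", "ovirtmgmt"]
    print(restore_order(nets))
```

With the networks from this host, ovirtmgmt would be restored before
dbvlan116, so a misconfigured DHCP network could no longer keep the host
unreachable.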