feature suggestion: initial generation of management network

Moti Asayag masayag at redhat.com
Sun May 12 08:37:01 UTC 2013



----- Original Message -----
> From: "Alon Bar-Lev" <alonbl at redhat.com>
> To: "Barak Azulay" <bazulay at redhat.com>
> Cc: "arch" <arch at ovirt.org>, "Simon Grinberg" <sgrinber at redhat.com>
> Sent: Sunday, May 12, 2013 11:25:45 AM
> Subject: Re: feature suggestion: initial generation of management network
> 
> 
> 
> ----- Original Message -----
> > From: "Barak Azulay" <bazulay at redhat.com>
> > To: "Livnat Peer" <lpeer at redhat.com>
> > Cc: "Alon Bar-Lev" <abarlev at redhat.com>, "arch" <arch at ovirt.org>, "Simon
> > Grinberg" <sgrinber at redhat.com>
> > Sent: Sunday, May 12, 2013 11:15:20 AM
> > Subject: Re: feature suggestion: initial generation of management network
> > 
> > 
> > 
> > ----- Original Message -----
> > > From: "Livnat Peer" <lpeer at redhat.com>
> > > To: "Moti Asayag" <masayag at redhat.com>
> > > Cc: "arch" <arch at ovirt.org>, "Alon Bar-Lev" <abarlev at redhat.com>, "Barak
> > > Azulay" <bazulay at redhat.com>, "Simon
> > > Grinberg" <sgrinber at redhat.com>
> > > Sent: Sunday, May 12, 2013 9:59:07 AM
> > > Subject: Re: feature suggestion: initial generation of management network
> > > 
> > > Thread Summary -
> > > 
> > > 1. We all agree the automatic reboot after host installation is not
> > > needed anymore and can be removed.
> > > 
> > > 2. There is broad agreement that we need to add a new VDSM verb for
> > > reboot.
> > 
> > I disagree with the above
> > 
> > Beyond the fact that it will not work when VDSM is not responsive
> > (which is exactly when this action will be needed the most)
> 
> If vdsm is unresponsive because of a fault in vdsm, we can add a fail-safe
> mechanism for critical commands within vdsm.
> And we can always fall back to standard fencing in such cases.
> 
> Can you please describe the scenario in which host-deploy succeeds and vdsm
> is unresponsive?
> 
> Current sequence:
> 1. host-deploy + reboot - all via a single ssh session.
> 
> New sequence:
> 1. host-deploy - via ssh.
> 2. network setup - via vdsm.

I'd like to add that if step 2 fails, VDSM should roll back to the last known
good network configuration, so it shouldn't remain unresponsive in case the
setup network command caused a loss of communication.
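
For illustration only, here is a minimal sketch of that revert pattern in
Python. The paths and helper names are invented for the example; VDSM's real
persistence logic (vdsm-store-net-conf) works differently:

  # Hypothetical sketch: snapshot the last known good config, apply the
  # change, and restore the snapshot unless the engine confirms in time.
  import os
  import shutil
  import subprocess
  import threading

  NET_CONF_DIR = '/etc/sysconfig/network-scripts'  # assumed ifcfg location
  BACKUP_DIR = '/var/tmp/net-conf-backup'          # invented scratch path

  def _restore_last_good():
      # Copy the snapshotted ifcfg files back and restart networking.
      for name in os.listdir(BACKUP_DIR):
          shutil.copy(os.path.join(BACKUP_DIR, name), NET_CONF_DIR)
      subprocess.call(['service', 'network', 'restart'])

  def apply_with_rollback(apply_config, confirmed, timeout=120):
      # apply_config() rewrites the config files; 'confirmed' is a
      # threading.Event set by a later commit call from the engine.
      if os.path.isdir(BACKUP_DIR):
          shutil.rmtree(BACKUP_DIR)
      shutil.copytree(NET_CONF_DIR, BACKUP_DIR)    # last known good
      apply_config()
      confirmed.wait(timeout)
      if not confirmed.is_set():                   # engine never called back
          _restore_last_good()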

> 3. optional reboot - via vdsm.
> 
> In the new sequence, vdsm must be responsive to accomplish (2), and if (2)
> succeeds, vdsm must again be responsive.
> 
> Thanks!
> 
> > 
> > 
> > > 
> > > 3. There was a suggestion to add a checkbox when adding a host to reboot
> > > the host after installation; the default would be not to reboot (leaving
> > > the decision to reboot to the administrator).
> > > 
> > > 
> > > If there is no objection we'll go with the above.
> > > 
> > > Thanks, Livnat
> > > 
> > > 
> > > On 05/07/2013 02:22 PM, Moti Asayag wrote:
> > > > I stumbled upon a few issues with the current design while
> > > > implementing it:
> > > > 
> > > > There seems to be a requirement to reboot the host after the
> > > > installation is completed, in order to ensure the host is recoverable.
> > > > 
> > > > Therefore, the building blocks of the 3.3 installation process are
> > > > (a rough sketch of the whole flow follows the list):
> > > > 1. Host deploy - installs the host, except for configuring its
> > > > management network.
> > > > 2. SetupNetwork (and CommitNetworkChanges) - creates the management
> > > > network on the host and persists the network configuration.
> > > > 3. Reboot the host - this is a missing piece. (The engine has a
> > > > FenceVds command, but it requires power management to be configured
> > > > prior to the installation, and might be irrelevant for hosts without
> > > > PM.)
> > > > 
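To make the intended flow concrete, here is a rough engine-side sketch in
Python. setupNetworks and setSafeNetworkConfig are existing VDSM verbs, but
the option keys shown are only approximate, the 'reboot' verb is the
hypothetical new one discussed below, and invoking ovirt-host-deploy over a
bare ssh command is a simplification of what the engine really does:

  import subprocess
  import xmlrpclib

  def install_host(addr, reboot_requested=False):
      # 1. host-deploy over ssh, without creating the management network
      subprocess.check_call(['ssh', 'root@%s' % addr, 'ovirt-host-deploy'])

      vdsm = xmlrpclib.ServerProxy('https://%s:54321' % addr)
      # 2. create ovirtmgmt on the detected device, then persist it
      vdsm.setupNetworks(
          {'ovirtmgmt': {'nic': 'eth0', 'bridged': 'true',
                         'bootproto': 'dhcp'}},    # illustrative values
          {},                                      # no bonding changes
          {'connectivityCheck': 'true'})
      vdsm.setSafeNetworkConfig()                  # CommitNetworkChanges
      # 3. optional reboot; only reachable if (2) left vdsm responsive
      if reboot_requested:
          vdsm.reboot()                            # hypothetical new verb
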
> > > > So, there are a couple of issues here:
> > > > 1. How to reboot the host?
> > > > 1.1. By exposing a new RebootNode verb in VDSM and invoking it from
> > > > the engine (a sketch of such a verb appears further below)
> > > > 1.2. By opening an ssh session to the host in order to execute the
> > > > reboot
> > > > 
> > > > 2. When to perform the reboot?
> > > > 2.1. After host deploy, by utilizing host deploy to perform the
> > > > reboot.
> > > > This requires the monitor to configure the network when the host is
> > > > detected by the engine, detached from the installation flow. However,
> > > > it is a step toward the non-persistent network feature yet to be
> > > > defined.
> > > > 2.2. After setupNetwork is done and the network was configured and
> > > > persisted on the host.
> > > > There is no special advantage from a recoverability aspect, as
> > > > setupNetwork is routinely used to persist the network configuration
> > > > (via the complementary CommitNetworkChanges command).
> > > > In case the network configuration fails, VDSM will revert to the last
> > > > known good configuration - so connectivity with the engine should be
> > > > restored. Design-wise, it fits to configure the management network as
> > > > part of the installation sequence.
> > > > If the network configuration fails in this context, the host status
> > > > will be set to "InstallFailed" rather than "NonOperational", as might
> > > > occur as a result of a failed setupNetwork command.
> > > > 
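Regarding option 1.1 above, a sketch of what such a verb might look like on
the VDSM side (entirely hypothetical - no such verb exists today). The point
is to schedule the reboot a few seconds ahead, so the XML-RPC response can
reach the engine before connectivity drops:

  import subprocess
  import threading

  def rebootNode(delay=5.0):
      # The response format mimics the usual {'status': ...} shape of
      # VDSM verbs; the delayed timer lets the reply go out first.
      def _reboot():
          subprocess.call(['shutdown', '-r', 'now'])
      threading.Timer(delay, _reboot).start()
      return {'status': {'code': 0, 'message': 'reboot scheduled'}}
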
> > > > 
> > > > Your inputs are welcome.
> > > > 
> > > > Thanks,
> > > > Moti
> > > > ----- Original Message -----
> > > >> From: "Dan Kenigsberg" <danken at redhat.com>
> > > >> To: "Simon Grinberg" <simon at redhat.com>, "Moti Asayag"
> > > >> <masayag at redhat.com>
> > > >> Cc: "arch" <arch at ovirt.org>
> > > >> Sent: Tuesday, January 1, 2013 2:47:57 PM
> > > >> Subject: Re: feature suggestion: initial generation of management
> > > >> network
> > > >>
> > > >> On Thu, Dec 27, 2012 at 07:36:40AM -0500, Simon Grinberg wrote:
> > > >>>
> > > >>>
> > > >>> ----- Original Message -----
> > > >>>> From: "Dan Kenigsberg" <danken at redhat.com>
> > > >>>> To: "Simon Grinberg" <simon at redhat.com>
> > > >>>> Cc: "arch" <arch at ovirt.org>
> > > >>>> Sent: Thursday, December 27, 2012 2:14:06 PM
> > > >>>> Subject: Re: feature suggestion: initial generation of management
> > > >>>> network
> > > >>>>
> > > >>>> On Tue, Dec 25, 2012 at 09:29:26AM -0500, Simon Grinberg wrote:
> > > >>>>>
> > > >>>>>
> > > >>>>> ----- Original Message -----
> > > >>>>>> From: "Dan Kenigsberg" <danken at redhat.com>
> > > >>>>>> To: "arch" <arch at ovirt.org>
> > > >>>>>> Sent: Tuesday, December 25, 2012 2:27:22 PM
> > > >>>>>> Subject: feature suggestion: initial generation of management
> > > >>>>>> network
> > > >>>>>>
> > > >>>>>> Current condition:
> > > >>>>>> ==================
> > > >>>>>> The management network, named ovirtmgmt, is created during host
> > > >>>>>> bootstrap. It consists of a bridge device, connected to the
> > > >>>>>> network
> > > >>>>>> device that was used to communicate with Engine (nic, bonding or
> > > >>>>>> vlan).
> > > >>>>>> It inherits its IP settings from the latter device.
> > > >>>>>>
> > > >>>>>> Why Is the Management Network Needed?
> > > >>>>>> =====================================
> > > >>>>>> Understandably, some may ask why we need a management network -
> > > >>>>>> why a host with IPv4 configured on it is not enough.
> > > >>>>>> The answer is twofold:
> > > >>>>>> 1. In oVirt, a network is an abstraction of the resources
> > > >>>>>>    required for connectivity of a host for a specific usage.
> > > >>>>>>    This is true for the management network just as it is for a
> > > >>>>>>    VM network or a display network. The network entity is the
> > > >>>>>>    key for adding/changing nics and IP addresses.
> > > >>>>>> 2. On many occasions (such as small setups) the management
> > > >>>>>>    network is used as a VM/display network as well.
> > > >>>>>>
> > > >>>>>> Problems in current connectivity:
> > > >>>>>> =================================
> > > >>>>>> According to alonbl of ovirt-host-deploy fame, and consistent
> > > >>>>>> with my own experience, creating the management network is the
> > > >>>>>> most fragile, error-prone step of bootstrap.
> > > >>>>>
> > > >>>>> +1,
> > > >>>>> I've raised this repeatedly in the past: bootstrap should not
> > > >>>>> create the management network but pick up the existing
> > > >>>>> configuration, and let the engine override it later with its own
> > > >>>>> configuration if it differs. I'm glad that we finally get to that.
> > > >>>>>
> > > >>>>>>
> > > >>>>>> Currently it always creates a bridged network (even if the DC
> > > >>>>>> requires a non-bridged ovirtmgmt), it knows nothing about the
> > > >>>>>> MTU defined for ovirtmgmt, it uses ping to guess on top of which
> > > >>>>>> device to build (and thus requires Vdsm-to-Engine reverse
> > > >>>>>> connectivity), and it is the sole remaining user of the
> > > >>>>>> addNetwork/vdsm-store-net-conf scripts.
> > > >>>>>>
> > > >>>>>> Suggested feature:
> > > >>>>>> ==================
> > > >>>>>> Bootstrap would avoid creating a management network. Instead,
> > > >>>>>> after bootstrapping a host, Engine would send a getVdsCaps probe
> > > >>>>>> to the installed host, receiving a complete picture of the
> > > >>>>>> network configuration on the host. Part of this picture is the
> > > >>>>>> device that holds the host's management IP address.
> > > >>>>>>
> > > >>>>>> Engine would send a setupNetworks command to generate ovirtmgmt
> > > >>>>>> with details derived from this picture, and according to the DC
> > > >>>>>> definition of ovirtmgmt.  For example, if Vdsm reports:
> > > >>>>>>
> > > >>>>>> - vlan bond4.3000 has the host's IP, configured to use dhcp.
> > > >>>>>> - bond4 comprises eth2 and eth3
> > > >>>>>> - ovirtmgmt is defined as a VM network with MTU 9000
> > > >>>>>>
> > > >>>>>> then Engine sends the likes of:
> > > >>>>>>   setupNetworks(ovirtmgmt: {bridged=True, vlan=3000, iface=bond4,
> > > >>>>>>                 bonding=bond4: {eth2,eth3}, MTU=9000})
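
For concreteness, a sketch of that derivation in Python, assuming a
getVdsCaps-style reply. The key names ('vlans', 'addr', 'vlanid', 'slaves')
only approximate what Vdsm reports, and only the vlan-over-bond case from
the example is handled:

  def mgmt_net_params(caps, mgmt_ip, dc_def):
      # Return (networks, bondings) arguments for setupNetworks, where
      # dc_def carries the DC definition of ovirtmgmt (bridged, mtu).
      for vlan in caps.get('vlans', {}).values():
          if vlan.get('addr') != mgmt_ip:
              continue                     # not the device holding mgmt IP
          net = {'bridged': dc_def['bridged'],
                 'vlan': str(vlan['vlanid']),
                 'bonding': vlan['iface'],
                 'mtu': dc_def['mtu'],
                 'bootproto': 'dhcp'}
          slaves = caps['bondings'][vlan['iface']]['slaves']
          return {'ovirtmgmt': net}, {vlan['iface']: {'nics': slaves}}
      raise LookupError('no vlan device holds %s' % mgmt_ip)

  # With the report above this yields roughly:
  #   ({'ovirtmgmt': {'bridged': True, 'vlan': '3000', 'bonding': 'bond4',
  #                   'mtu': 9000, 'bootproto': 'dhcp'}},
  #    {'bond4': {'nics': ['eth2', 'eth3']}})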
> > > >>>>>
> > > >>>>> Just one comment here,
> > > >>>>> In order to save time and confusion - if ovirtmgmt is defined
> > > >>>>> with default values, meaning the user did not bother to touch it,
> > > >>>>> let it pick up the VLAN configuration from the first host added
> > > >>>>> in the Data Center.
> > > >>>>>
> > > >>>>> Otherwise, you may override the host VLAN and lose connectivity.
> > > >>>>>
> > > >>>>> This will also solve the situation many users encounter today:
> > > >>>>> 1. The engine is on a host that actually has a VLAN defined
> > > >>>>> 2. The ovirtmgmt network was not updated in the DC
> > > >>>>> 3. A host with a VLAN already defined is added - everything works
> > > >>>>> fine
> > > >>>>> 4. Any number of hosts are now added; again, everything seems to
> > > >>>>> work fine.
> > > >>>>>
> > > >>>>> But now try to use setupNetworks, and you'll find out that you
> > > >>>>> can't do much on the interface that contains the ovirtmgmt, since
> > > >>>>> the definition does not match. You can't sync (since this will
> > > >>>>> remove the VLAN and cause connectivity loss), and you can't add
> > > >>>>> more networks on top since it already has a non-VLAN network on
> > > >>>>> top according to the DC definition, etc.
> > > >>>>>
> > > >>>>> On the other hand you can't update the ovirtmgmt definition on the
> > > >>>>> DC since there are clusters in the DC that use the network.
> > > >>>>>
> > > >>>>> The only workaround not involving a DB hack to change the VLAN on
> > > >>>>> the network is to:
> > > >>>>> 1. Create new DC
> > > >>>>> 2. Do not use the wizard that pops up to create your cluster.
> > > >>>>> 3. Modify the ovirtmgmt network to have VLANs
> > > >>>>> 4. Now create a cluster and add your hosts.
> > > >>>>>
> > > >>>>> If you insist on using the default DC and cluster, then before
> > > >>>>> adding the first host, create an additional DC and move the
> > > >>>>> Default cluster over there. You may then change the network on
> > > >>>>> the Default DC and then move the Default cluster back.
> > > >>>>>
> > > >>>>> Both are ugly, and should be solved by the proposal above.
> > > >>>>>
> > > >>>>> We do something similar for the Default cluster CPU level, where
> > > >>>>> we set the initial level based on the first host added to the
> > > >>>>> cluster.
> > > >>>>
> > > >>>> I'm not sure what Engine has for the Default cluster CPU level.
> > > >>>> But I have reservations about the hysteresis in your proposal -
> > > >>>> after a host is added, the DC cannot forget ovirtmgmt's vlan.
> > > >>>>
> > > >>>> How about letting the admin edit ovirtmgmt's vlan at the DC level,
> > > >>>> thus rendering all hosts out-of-sync? Then the admin could
> > > >>>> manually, or through a script, or in the future through a
> > > >>>> distributed operation, sync all the hosts to the definition.
> > > >>>
> > > >>> Usually if you do that you will lose connectivity to the hosts.
> > > >>
> > > >> Yes, changing the management vlan id (or ip address) is never fun, and
> > > >> requires out-of-band intervention.
> > > >>
> > > >>> I'm not insisting on the automatic adjustment of the ovirtmgmt
> > > >>> network to match the hosts' (that is just a nice touch); we can
> > > >>> take the allow-edit approach.
> > > >>>
> > > >>> But allowing the VLAN on the ovirtmgmt network to be changed will
> > > >>> indeed solve the issue I'm trying to solve, while creating another
> > > >>> issue of users expecting that we'll be able to re-tag the host from
> > > >>> the engine side, which is challenging to do.
> > > >>>
> > > >>> On the other hand, if we allow changing the VLAN only as long as
> > > >>> the change matches the hosts' configuration, it will solve the
> > > >>> issue while not misleading the user into thinking that we can
> > > >>> really solve the chicken-and-egg issue of re-tagging the entire
> > > >>> system.
> > > >>>
> > > >>> Now with the above ability you do get a flow to do the re-tag:
> > > >>> 1. Place all the hosts in maintenance
> > > >>> 2. Re-tag the ovirtmgmt on all the hosts
> > > >>> 3. Re-tag the host the engine runs on
> > > >>> 4. Activate the hosts - this should work well now since
> > > >>> connectivity exists
> > > >>> 5. Change the tag on ovirtmgmt on the engine to match the hosts'
> > > >>>
> > > >>> A simple and clear process.
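
As an aside, for a large number of hosts, steps 1 and 4 can be scripted
against the engine's REST API. A rough sketch - the endpoint and payload
shapes are from memory, so verify them against your engine version:

  import requests   # third-party HTTP client, assumed available

  ENGINE = 'https://engine.example.com/api'   # hypothetical address
  AUTH = ('admin@internal', 'secret')

  def host_action(host_id, action):
      # POST an empty <action/> to /hosts/<id>/deactivate or /activate.
      url = '%s/hosts/%s/%s' % (ENGINE, host_id, action)
      r = requests.post(url, auth=AUTH, verify=False, data='<action/>',
                        headers={'Content-Type': 'application/xml'})
      r.raise_for_status()

  # step 1: host_action(host_id, 'deactivate') for each host
  # step 4: after re-tagging, host_action(host_id, 'activate')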
> > > >>>
> > > >>> When the workaround of creating another DC was not possible, since
> > > >>> the system had long been in use and the need was to re-tag the
> > > >>> network, the above is what I've recommended, except that steps 4-5
> > > >>> were done as:
> > > >>> 4. Stop the engine
> > > >>> 5. Change the tag in the DB
> > > >>> 6. Start the engine
> > > >>> 7. Activate the hosts
> > > >>
> > > >> Sounds reasonable to me - but as far as I am aware this is not tightly
> > > >> related to the $Subject, which is the post-boot ovirtmgmt definition.
> > > >>
> > > >> I've added a few details to
> > > >> http://www.ovirt.org/Features/Normalized_ovirtmgmt_Initialization#Engine
> > > >> and I would appreciate a review from someone with intimate Engine
> > > >> know-how.
> > > >>
> > > >> Dan.
> > > >>
> > > > _______________________________________________
> > > > Arch mailing list
> > > > Arch at ovirt.org
> > > > http://lists.ovirt.org/mailman/listinfo/arch
> > > > 
> > > > 
> > > 
> > > 