feature suggestion: initial generation of management network

Livnat Peer lpeer at redhat.com
Sun May 12 08:46:46 UTC 2013


On 05/12/2013 11:25 AM, Alon Bar-Lev wrote:
> 
> 
> ----- Original Message -----
>> From: "Barak Azulay" <bazulay at redhat.com>
>> To: "Livnat Peer" <lpeer at redhat.com>
>> Cc: "Alon Bar-Lev" <abarlev at redhat.com>, "arch" <arch at ovirt.org>, "Simon Grinberg" <sgrinber at redhat.com>
>> Sent: Sunday, May 12, 2013 11:15:20 AM
>> Subject: Re: feature suggestion: initial generation of management network
>>
>>
>>
>> ----- Original Message -----
>>> From: "Livnat Peer" <lpeer at redhat.com>
>>> To: "Moti Asayag" <masayag at redhat.com>
>>> Cc: "arch" <arch at ovirt.org>, "Alon Bar-Lev" <abarlev at redhat.com>, "Barak
>>> Azulay" <bazulay at redhat.com>, "Simon
>>> Grinberg" <sgrinber at redhat.com>
>>> Sent: Sunday, May 12, 2013 9:59:07 AM
>>> Subject: Re: feature suggestion: initial generation of management network
>>>
>>> Thread Summary -
>>>
>>> 1. We all agree the automatic reboot after host installation is not
>>> needed anymore and can be removed.
>>>
>>> 2. There is broad agreement that we need to add a new VDSM verb for
>>> reboot.
>>
>> I disagree with the above
>>
>> In addition to the fact that it will not work when VDSM is not responsive
>> (which is when this action will be needed the most)
> 
> If vdsm is unresponsive because of a fault in vdsm, we can add a fail-safe mechanism for critical commands within vdsm.
> And we can always fall back to the standard fencing in such cases.
>
> Can you please describe a scenario in which host-deploy succeeds and vdsm is unresponsive?
> 
> Current sequence:
> 1. host-deploy + reboot - all via a single ssh session.
> 
> New sequence:
> 1. host-deploy - via ssh.
> 2. network setup - via vdsm.
> 3. optional reboot - via vdsm.
> 
> In the new sequence, vdsm must be responsive to accomplish (2), and if (2) succeeds, vdsm must, again, be responsive.
> 


+1, fully agree with the above.
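
To make the ordering concrete, here is a minimal sketch of the new sequence
as the engine would drive it. The helper names and bodies below are
illustrative placeholders (they only print), not real engine or VDSM APIs;
the point is the ordering and the implied dependency on a responsive vdsm:

  # Illustrative sketch only: placeholder helpers, not real engine/VDSM APIs.
  def deploy_over_ssh(host):
      print("(1) host-deploy over ssh on", host)        # single ssh session

  def vdsm_setup_networks(host, mgmt_net):
      print("(2) setupNetworks via vdsm on", host, mgmt_net)

  def vdsm_reboot(host):
      print("(3) optional reboot via vdsm on", host)

  def install_host(host, mgmt_net, reboot_requested=False):
      deploy_over_ssh(host)                  # no reboot here anymore
      vdsm_setup_networks(host, mgmt_net)    # requires a responsive vdsm
      if reboot_requested:                   # if (2) succeeded, vdsm was
          vdsm_reboot(host)                  # responsive a moment ago

  install_host("host1.example.com", {"ovirtmgmt": {"bridged": True}})

If vdsm cannot be reached for (2) or (3), the standard fencing fallback
mentioned above still applies.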

> Thanks!
> 
>>
>>
>>>
>>> 3. There was a suggestion to add a checkbox, when adding a host, to reboot
>>> the host after installation; the default would be not to reboot (leaving
>>> the decision to reboot to the administrator).
>>>
>>>
>>> If there is no objection, we'll go with the above.
>>>
>>> Thanks, Livnat
>>>
>>>
>>> On 05/07/2013 02:22 PM, Moti Asayag wrote:
>>>> I stumbled upon a few issues with the current design while implementing it:
>>>>
>>>> There seems to be a requirement to reboot the host after the installation
>>>> is completed, in order to ensure the host is recoverable.
>>>>
>>>> Therefore, the building blocks of the installation process for 3.3 are:
>>>> 1. Host deploy, which installs the host except for configuring its
>>>> management network.
>>>> 2. SetupNetwork (and CommitNetworkChanges) - for creating the management
>>>> network on the host and persisting the network configuration.
>>>> 3. Reboot the host - this is a missing piece. (The engine has a FenceVds
>>>> command, but it requires power management to be configured prior to the
>>>> installation and might be irrelevant for hosts without PM.)
>>>>
>>>> So, there are a couple of issues here:
>>>> 1. How to reboot the host?
>>>> 1.1. By exposing a new RebootNode verb in VDSM and invoking it from the
>>>> engine
>>>> 1.2. By opening an ssh dialog to the host in order to execute the reboot
>>>>
>>>> 2. When to perform the reboot?
>>>> 2.1. After host deploy, by utilizing the host deploy to perform the
>>>> reboot.
>>>> It requires the network to be configured by the monitor when the host is
>>>> detected by the engine, detached from the installation flow. However, it
>>>> is a step toward the non-persistent network feature, which is yet to be
>>>> defined.
>>>> 2.2. After setupNetwork is done and the network has been configured and
>>>> persisted on the host.
>>>> There is no special advantage from the recoverability aspect, as
>>>> setupNetwork is routinely used to persist the network configuration (via
>>>> the complementary CommitNetworkChanges command).
>>>> In case the network configuration fails, VDSM will revert to the last
>>>> well-known configuration, so connectivity with the engine should be
>>>> restored. Design-wise, it fits to configure the management network as
>>>> part of the installation sequence.
>>>> If the network configuration fails in this context, the host status will
>>>> be set to "InstallFailed" rather than "NonOperational", as might
>>>> otherwise occur as a result of a failed setupNetwork command.
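>>>>
>>>> To illustrate the reasoning in 2.2, here is a rough sketch of the
>>>> setup-then-commit pattern. The class and method names are placeholders
>>>> (a stand-in client, not real engine/VDSM API signatures):
>>>>
>>>>   # Placeholder sketch of option 2.2; names are illustrative only.
>>>>   class FakeVdsm:
>>>>       """Stand-in for a vdsm client, just to make the sketch runnable."""
>>>>       def setup_networks(self, nets):        # apply, keep old config around
>>>>           print("setupNetworks:", nets)
>>>>       def connectivity_ok(self):             # is the engine still reachable?
>>>>           return True
>>>>       def commit_network_changes(self):      # persist (the complementary
>>>>           print("persisting configuration")  # CommitNetworkChanges step)
>>>>
>>>>   def create_management_network(vdsm, mgmt_net):
>>>>       vdsm.setup_networks(mgmt_net)
>>>>       if vdsm.connectivity_ok():
>>>>           vdsm.commit_network_changes()
>>>>           return True
>>>>       # On failure vdsm reverts to the last well-known configuration and,
>>>>       # in the installation context, the host ends up "InstallFailed".
>>>>       return False
>>>>
>>>>   create_management_network(FakeVdsm(), {"ovirtmgmt": {"bridged": True}})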
>>>>
>>>>
>>>> Your inputs are welcome.
>>>>
>>>> Thanks,
>>>> Moti
>>>> ----- Original Message -----
>>>>> From: "Dan Kenigsberg" <danken at redhat.com>
>>>>> To: "Simon Grinberg" <simon at redhat.com>, "Moti Asayag"
>>>>> <masayag at redhat.com>
>>>>> Cc: "arch" <arch at ovirt.org>
>>>>> Sent: Tuesday, January 1, 2013 2:47:57 PM
>>>>> Subject: Re: feature suggestion: initial generation of management
>>>>> network
>>>>>
>>>>> On Thu, Dec 27, 2012 at 07:36:40AM -0500, Simon Grinberg wrote:
>>>>>>
>>>>>>
>>>>>> ----- Original Message -----
>>>>>>> From: "Dan Kenigsberg" <danken at redhat.com>
>>>>>>> To: "Simon Grinberg" <simon at redhat.com>
>>>>>>> Cc: "arch" <arch at ovirt.org>
>>>>>>> Sent: Thursday, December 27, 2012 2:14:06 PM
>>>>>>> Subject: Re: feature suggestion: initial generation of management
>>>>>>> network
>>>>>>>
>>>>>>> On Tue, Dec 25, 2012 at 09:29:26AM -0500, Simon Grinberg wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> ----- Original Message -----
>>>>>>>>> From: "Dan Kenigsberg" <danken at redhat.com>
>>>>>>>>> To: "arch" <arch at ovirt.org>
>>>>>>>>> Sent: Tuesday, December 25, 2012 2:27:22 PM
>>>>>>>>> Subject: feature suggestion: initial generation of management
>>>>>>>>> network
>>>>>>>>>
>>>>>>>>> Current condition:
>>>>>>>>> ==================
>>>>>>>>> The management network, named ovirtmgmt, is created during host
>>>>>>>>> bootstrap. It consists of a bridge device, connected to the network
>>>>>>>>> device that was used to communicate with Engine (nic, bonding or
>>>>>>>>> vlan). It inherits its IP settings from the latter device.
>>>>>>>>>
>>>>>>>>> Why Is the Management Network Needed?
>>>>>>>>> =====================================
>>>>>>>>> Understandably, some may ask why we need to have a management
>>>>>>>>> network - why having a host with IPv4 configured on it is not
>>>>>>>>> enough. The answer is twofold:
>>>>>>>>> 1. In oVirt, a network is an abstraction of the resources required
>>>>>>>>>    for connectivity of a host for a specific usage. This is true for
>>>>>>>>>    the management network just as it is for a VM network or a display
>>>>>>>>>    network. The network entity is the key for adding/changing nics
>>>>>>>>>    and IP addresses.
>>>>>>>>> 2. On many occasions (such as in small setups) the management network
>>>>>>>>>    is used as a VM/display network as well.
>>>>>>>>>
>>>>>>>>> Problems in current connectivity:
>>>>>>>>> ================================
>>>>>>>>> According to alonbl of ovirt-host-deploy fame, and with no conflict
>>>>>>>>> to my own experience, creating the management network is the most
>>>>>>>>> fragile, error-prone step of bootstrap.
>>>>>>>>
>>>>>>>> +1,
>>>>>>>> I've raised that repeatedly in the past: bootstrap should not create
>>>>>>>> the management network but pick up the existing configuration, and
>>>>>>>> let the engine override it later with its own configuration if it
>>>>>>>> differs. I'm glad that we are finally getting to that.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Currently it always creates a bridged network (even if the DC
>>>>>>>>> requires a non-bridged ovirtmgmt), it knows nothing about the defined
>>>>>>>>> MTU for ovirtmgmt, it uses ping to guess on top of which device to
>>>>>>>>> build (and thus requires Vdsm-to-Engine reverse connectivity), and is
>>>>>>>>> the sole remaining user of the addNetwork/vdsm-store-net-conf scripts.
>>>>>>>>>
>>>>>>>>> Suggested feature:
>>>>>>>>> ==================
>>>>>>>>> Bootstrap would avoid creating a management network. Instead, after
>>>>>>>>> bootstrapping a host, Engine would send a getVdsCaps probe to the
>>>>>>>>> installed host, receiving a complete picture of the network
>>>>>>>>> configuration on the host. Part of this picture is the device that
>>>>>>>>> holds the host's management IP address.
>>>>>>>>>
>>>>>>>>> Engine would send a setupNetworks command to generate ovirtmgmt with
>>>>>>>>> details devised from this picture, and according to the DC definition
>>>>>>>>> of ovirtmgmt.  For example, if Vdsm reports:
>>>>>>>>>
>>>>>>>>> - vlan bond4.3000 has the host's IP, configured to use dhcp.
>>>>>>>>> - bond4 comprises eth2 and eth3
>>>>>>>>> - ovirtmgmt is defined as a VM network with MTU 9000
>>>>>>>>>
>>>>>>>>> then Engine sends the likes of:
>>>>>>>>>   setupNetworks(ovirtmgmt: {bridged=True, vlan=3000, iface=bond4,
>>>>>>>>>                 bonding=bond4: {eth2,eth3}, MTU=9000})
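>>>>>>>>>
>>>>>>>>> As a rough illustration of how Engine might devise that call, here is
>>>>>>>>> a sketch; the "caps" dict is a simplification, not the exact
>>>>>>>>> getVdsCaps schema, and the derivation function is a placeholder:
>>>>>>>>>
>>>>>>>>>   # Illustrative only: simplified caps layout and placeholder logic.
>>>>>>>>>   caps = {
>>>>>>>>>       "mgmt_device": "bond4.3000",       # device holding the host IP
>>>>>>>>>       "vlans": {"bond4.3000": {"iface": "bond4", "vlanid": 3000,
>>>>>>>>>                                "bootproto": "dhcp"}},
>>>>>>>>>       "bondings": {"bond4": {"slaves": ["eth2", "eth3"]}},
>>>>>>>>>   }
>>>>>>>>>   dc_ovirtmgmt = {"bridged": True, "mtu": 9000}   # DC definition
>>>>>>>>>
>>>>>>>>>   def devise_ovirtmgmt(caps, dc_def):
>>>>>>>>>       vlan = caps["vlans"][caps["mgmt_device"]]
>>>>>>>>>       bond = vlan["iface"]
>>>>>>>>>       return {"ovirtmgmt": {"bridged": dc_def["bridged"],
>>>>>>>>>                             "vlan": vlan["vlanid"],
>>>>>>>>>                             "bonding": bond,
>>>>>>>>>                             "bootproto": vlan["bootproto"],
>>>>>>>>>                             "mtu": dc_def["mtu"]},
>>>>>>>>>               "bondings": {bond: {"nics": caps["bondings"][bond]["slaves"]}}}
>>>>>>>>>
>>>>>>>>>   print(devise_ovirtmgmt(caps, dc_ovirtmgmt))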
>>>>>>>>
>>>>>>>> Just one comment here,
>>>>>>>> In order to save time and confusion - if ovirtmgmt is defined
>>>>>>>> with default values, meaning the user did not bother to touch it,
>>>>>>>> let it pick up the VLAN configuration from the first host added in
>>>>>>>> the Data Center.
>>>>>>>>
>>>>>>>> Otherwise, you may override the host's VLAN and lose connectivity.
>>>>>>>>
>>>>>>>> This will also solve the situation many users encounter today:
>>>>>>>> 1. The engine is on a host that actually has a VLAN defined
>>>>>>>> 2. The ovirtmgmt network was not updated in the DC
>>>>>>>> 3. A host with a VLAN already defined is added - everything works
>>>>>>>> fine
>>>>>>>> 4. Any number of hosts are now added, and again everything seems to
>>>>>>>> work fine.
>>>>>>>>
>>>>>>>> But now try to use setupNetworks, and you'll find out that you
>>>>>>>> can't do much on the interface that carries ovirtmgmt, since
>>>>>>>> the definition does not match. You can't sync (since this would
>>>>>>>> remove the VLAN and cause connectivity loss), and you can't add more
>>>>>>>> networks on top since it already has a non-VLAN network on top
>>>>>>>> according to the DC definition, etc.
>>>>>>>>
>>>>>>>> On the other hand, you can't update the ovirtmgmt definition on the
>>>>>>>> DC since there are clusters in the DC that use the network.
>>>>>>>>
>>>>>>>> The only workaround not involving a DB hack to change the VLAN on the
>>>>>>>> network is to:
>>>>>>>> 1. Create a new DC
>>>>>>>> 2. Do not use the wizard that pops up to create your cluster.
>>>>>>>> 3. Modify the ovirtmgmt network to have a VLAN
>>>>>>>> 4. Now create a cluster and add your hosts.
>>>>>>>>
>>>>>>>> If you insist on using the default DC and cluster, then before
>>>>>>>> adding the first host, create an additional DC and move the
>>>>>>>> Default cluster over there. You may then change the network on the
>>>>>>>> Default cluster and then move the Default cluster back.
>>>>>>>>
>>>>>>>> Both are ugly, and should be solved by the proposal above.
>>>>>>>>
>>>>>>>> We do something similar for the Default cluster CPU level, where we
>>>>>>>> set the initial level based on the first host added to the cluster.
>>>>>>>
>>>>>>> I'm not sure what Engine has for the Default cluster CPU level. But I
>>>>>>> have reservations about the hysteresis in your proposal - after a host
>>>>>>> is added, the DC cannot forget ovirtmgmt's vlan.
>>>>>>>
>>>>>>> How about letting the admin edit ovirtmgmt's vlan at the DC level,
>>>>>>> thus rendering all hosts out-of-sync? Then the admin could manually,
>>>>>>> or through a script, or in the future through a distributed operation,
>>>>>>> sync all the hosts to the definition.
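>>>>>>>
>>>>>>> For illustration, such a sync script could look roughly like the sketch
>>>>>>> below; the Engine class is a stand-in with placeholder methods, not the
>>>>>>> real engine API or SDK:
>>>>>>>
>>>>>>>   # Placeholder sketch only: a stand-in client, not the real API.
>>>>>>>   class Engine:
>>>>>>>       def dc_network(self, dc, name):          # DC-level definition
>>>>>>>           return {"vlan": 3000, "bridged": True}
>>>>>>>       def hosts_in(self, dc):
>>>>>>>           return ["host1", "host2"]
>>>>>>>       def in_sync(self, host, name):           # host vs. DC definition
>>>>>>>           return False
>>>>>>>       def setup_networks(self, host, nets):    # per-host setupNetworks
>>>>>>>           print("syncing", host, nets)
>>>>>>>
>>>>>>>   def sync_ovirtmgmt(engine, dc):
>>>>>>>       dc_def = engine.dc_network(dc, "ovirtmgmt")
>>>>>>>       for host in engine.hosts_in(dc):
>>>>>>>           if not engine.in_sync(host, "ovirtmgmt"):
>>>>>>>               # Re-applying the management vlan this way will usually
>>>>>>>               # cut connectivity to the host, as noted in the reply.
>>>>>>>               engine.setup_networks(host, {"ovirtmgmt": dc_def})
>>>>>>>
>>>>>>>   sync_ovirtmgmt(Engine(), "Default")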
>>>>>>
>>>>>> Usually if you do that you will lose connectivity to the hosts.
>>>>>
>>>>> Yes, changing the management vlan id (or ip address) is never fun, and
>>>>> requires out-of-band intervention.
>>>>>
>>>>>> I'm not insisting on the automatic adjustment of the ovirtmgmt network
>>>>>> to match the hosts' (that is just a nice touch); we can take the
>>>>>> allow-edit approach.
>>>>>>
>>>>>> But allowing changing the VLAN on the ovirtmgmt network, while it will
>>>>>> indeed solve the issue I'm trying to solve, creates another issue: the
>>>>>> user will expect that we'll be able to re-tag the host from the engine
>>>>>> side, which is challenging to do.
>>>>>>
>>>>>> On the other hand, if we allow changing the VLAN as long as the change
>>>>>> matches the hosts' configuration, it will solve the issue while not
>>>>>> misleading the user into thinking that we can really solve the
>>>>>> chicken-and-egg issue of re-tagging the entire system.
>>>>>>
>>>>>> Now, with the above ability, you do get a flow to do the re-tag:
>>>>>> 1. Place all the hosts in maintenance
>>>>>> 2. Re-tag ovirtmgmt on all the hosts
>>>>>> 3. Re-tag the host on which the engine is running
>>>>>> 4. Activate the hosts - this should work well now since connectivity
>>>>>> exists
>>>>>> 5. Change the tag on ovirtmgmt on the engine to match the hosts'
>>>>>>
>>>>>> A simple and clear process.
>>>>>>
>>>>>> When the workaround of creating another DC was not possible, since the
>>>>>> system was already long in use and the need was to re-tag the network,
>>>>>> the above is what I've recommended, except that steps 4-5 were done
>>>>>> as:
>>>>>> 4. Stop the engine
>>>>>> 5. Change the tag in the DB
>>>>>> 6. Start the engine
>>>>>> 7. Activate the hosts
>>>>>
>>>>> Sounds reasonable to me - but as far as I am aware this is not tightly
>>>>> related to the $Subject, which is the post-boot ovirtmgmt definition.
>>>>>
>>>>> I've added a few details to
>>>>> http://www.ovirt.org/Features/Normalized_ovirtmgmt_Initialization#Engine
>>>>> and I would appreciate a review from someone with intimate Engine
>>>>> know-how.
>>>>>
>>>>> Dan.
>>>>>
>>>>
>>>>
>>>
>>>
>>



