feature suggestion: migration network

Simon Grinberg simon at redhat.com
Sun Jan 13 12:47:59 UTC 2013



----- Original Message -----
> From: "Livnat Peer" <lpeer at redhat.com>
> To: "Simon Grinberg" <simon at redhat.com>
> Cc: "Dan Kenigsberg" <danken at redhat.com>, arch at ovirt.org, "Orit Wasserman" <owasserm at redhat.com>, "Yuval M"
> <yuvalme at gmail.com>, "Laine Stump" <lstump at redhat.com>, "Limor Gavish" <lgavish at gmail.com>
> Sent: Sunday, January 13, 2013 1:53:23 PM
> Subject: Re: feature suggestion: migration network
> 
> On 01/10/2013 02:54 PM, Simon Grinberg wrote:
> > 
> > 
> > ----- Original Message -----
> >> From: "Dan Kenigsberg" <danken at redhat.com>
> >> To: "Doron Fediuck" <dfediuck at redhat.com>
> >> Cc: "Simon Grinberg" <simon at redhat.com>, "Orit Wasserman"
> >> <owasserm at redhat.com>, "Laine Stump" <lstump at redhat.com>,
> >> "Yuval M" <yuvalme at gmail.com>, "Limor Gavish" <lgavish at gmail.com>,
> >> arch at ovirt.org, "Mark Wu"
> >> <wudxw at linux.vnet.ibm.com>
> >> Sent: Thursday, January 10, 2013 1:46:08 PM
> >> Subject: Re: feature suggestion: migration network
> >>
> >> On Thu, Jan 10, 2013 at 04:43:45AM -0500, Doron Fediuck wrote:
> >>>
> >>>
> >>> ----- Original Message -----
> >>>> From: "Simon Grinberg" <simon at redhat.com>
> >>>> To: "Mark Wu" <wudxw at linux.vnet.ibm.com>, "Doron Fediuck"
> >>>> <dfediuck at redhat.com>
> >>>> Cc: "Orit Wasserman" <owasserm at redhat.com>, "Laine Stump"
> >>>> <lstump at redhat.com>, "Yuval M" <yuvalme at gmail.com>, "Limor
> >>>> Gavish" <lgavish at gmail.com>, arch at ovirt.org, "Dan Kenigsberg"
> >>>> <danken at redhat.com>
> >>>> Sent: Thursday, January 10, 2013 10:38:56 AM
> >>>> Subject: Re: feature suggestion: migration network
> >>>>
> >>>>
> >>>>
> >>>> ----- Original Message -----
> >>>>> From: "Mark Wu" <wudxw at linux.vnet.ibm.com>
> >>>>> To: "Dan Kenigsberg" <danken at redhat.com>
> >>>>> Cc: "Simon Grinberg" <simon at redhat.com>, "Orit Wasserman"
> >>>>> <owasserm at redhat.com>, "Laine Stump" <lstump at redhat.com>,
> >>>>> "Yuval M" <yuvalme at gmail.com>, "Limor Gavish"
> >>>>> <lgavish at gmail.com>,
> >>>>> arch at ovirt.org
> >>>>> Sent: Thursday, January 10, 2013 5:13:23 AM
> >>>>> Subject: Re: feature suggestion: migration network
> >>>>>
> >>>>> On 01/09/2013 03:34 AM, Dan Kenigsberg wrote:
> >>>>>> On Tue, Jan 08, 2013 at 01:23:02PM -0500, Simon Grinberg
> >>>>>> wrote:
> >>>>>>>
> >>>>>>> ----- Original Message -----
> >>>>>>>> From: "Yaniv Kaul" <ykaul at redhat.com>
> >>>>>>>> To: "Dan Kenigsberg" <danken at redhat.com>
> >>>>>>>> Cc: "Limor Gavish" <lgavish at gmail.com>, "Yuval M"
> >>>>>>>> <yuvalme at gmail.com>, arch at ovirt.org, "Simon Grinberg"
> >>>>>>>> <sgrinber at redhat.com>
> >>>>>>>> Sent: Tuesday, January 8, 2013 4:46:10 PM
> >>>>>>>> Subject: Re: feature suggestion: migration network
> >>>>>>>>
> >>>>>>>> On 08/01/13 15:04, Dan Kenigsberg wrote:
> >>>>>>>>> There's been talk about this for ages, so it's time to have a
> >>>>>>>>> proper discussion and a feature page about it: let us have a
> >>>>>>>>> "migration" network role, and use such networks to carry
> >>>>>>>>> migration data
> >>>>>>>>>
> >>>>>>>>> When Engine requests to migrate a VM from one node to another,
> >>>>>>>>> the VM state (Bios, IO devices, RAM) is transferred over a
> >>>>>>>>> TCP/IP connection that is opened from the source qemu process
> >>>>>>>>> to the destination qemu. Currently, destination qemu listens
> >>>>>>>>> for the incoming connection on the management IP address of the
> >>>>>>>>> destination host. This has serious downsides: a "migration
> >>>>>>>>> storm" may choke the destination's management interface;
> >>>>>>>>> migration is plaintext and ovirtmgmt includes Engine, which may
> >>>>>>>>> sit outside the node cluster.
> >>>>>>>>>
> >>>>>>>>> With this feature, a cluster administrator may grant the
> >>>>>>>>> "migration" role to one of the cluster networks. Engine would
> >>>>>>>>> use that network's IP address on the destination host when it
> >>>>>>>>> requests a migration of a VM. With proper network setup,
> >>>>>>>>> migration data would be separated to that network.
> >>>>>>>>>
> >>>>>>>>> === Benefit to oVirt ===
> >>>>>>>>> * Users would be able to define and dedicate a separate network
> >>>>>>>>>   for migration. Users that need quick migration would use nics
> >>>>>>>>>   with high bandwidth. Users who want to cap the bandwidth
> >>>>>>>>>   consumed by migration could define a migration network over
> >>>>>>>>>   nics with bandwidth limitation.
> >>>>>>>>> * Migration data can be limited to a separate network that has
> >>>>>>>>>   no layer-2 access from Engine
> >>>>>>>>>
> >>>>>>>>> === Vdsm ===
> >>>>>>>>> The "migrate" verb should be extended with an additional
> >>>>>>>>> parameter, specifying the address that the remote qemu process
> >>>>>>>>> should listen on. A new argument is to be added to the
> >>>>>>>>> currently-defined migration arguments:
> >>>>>>>>> * vmId: UUID
> >>>>>>>>> * dst: management address of destination host
> >>>>>>>>> * dstparams: hibernation volumes definition
> >>>>>>>>> * mode: migration/hibernation
> >>>>>>>>> * method: rotten legacy
> >>>>>>>>> * ''New'': migration uri, according to
> >>>>>>>>>   http://libvirt.org/html/libvirt-libvirt.html#virDomainMigrateToURI2
> >>>>>>>>>   such as tcp://<ip of migration network on remote node>
> >>>>>>>>>
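
Just to make the new parameter concrete, here is a rough sketch of how
such a URI could be handed down to libvirt's virDomainMigrateToURI2 on
the source host. This is not actual VDSM code; the parameter names,
addresses and flags below are illustrative only:

    # Illustrative sketch only -- not VDSM code; names and values are made up.
    import libvirt

    migrate_params = {
        'vmId':      'b9f0b58c-1234-4f6e-9abc-5d2f01234567',  # UUID
        'dst':       '10.0.0.2:54321',       # management address of destination host
        'dstparams': {},                     # hibernation volumes definition
        'mode':      'remote',               # migration/hibernation
        'method':    'online',               # legacy
        'miguri':    'tcp://192.168.100.2',  # NEW: migration-network IP on the remote node
    }

    def migrate(dom, params):
        """Open the qemu-to-qemu connection on the migration network.

        dconnuri still points at the destination libvirtd over the
        management network; the new URI tells the destination qemu which
        address to listen on for the actual migration stream.
        """
        dconnuri = 'qemu+tls://%s/system' % params['dst'].split(':')[0]
        flags = libvirt.VIR_MIGRATE_LIVE | libvirt.VIR_MIGRATE_PEER2PEER
        dom.migrateToURI2(dconnuri, params['miguri'], None, flags, None, 0)
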
> >>>>>>>>> === Engine ===
> >>>>>>>>> As usual, complexity lies here, and several changes are
> >>>>>>>>> required:
> >>>>>>>>>
> >>>>>>>>> 1. Network definition.
> >>>>>>>>> 1.1 A new network role - not unlike "display network" - should
> >>>>>>>>>     be added. Only one migration network should be defined on a
> >>>>>>>>>     cluster.
> >>>>>>> We are considering multiple display networks already, then why
> >>>>>>> not the same for migration?
> >>>>>> What is the motivation of having multiple migration networks?
> >>>>>> Extending the bandwidth (and thus, any network can be taken when
> >>>>>> needed) or data separation (and thus, a migration network should
> >>>>>> be assigned to each VM in the cluster)? Or another motivation
> >>>>>> with consequences?
> >>>>> My suggestion is to make the migration network role determined
> >>>>> dynamically on each migration.  If we only define one migration
> >>>>> network per cluster, a migration storm could hit that network and
> >>>>> badly impact VM applications.  So I think the engine could choose
> >>>>> the network with the lower traffic load for migration, or leave
> >>>>> the choice to the user.
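
(Just to illustrate what choosing the network with the lower traffic
load could mean in practice - a toy sketch, not existing Engine logic;
the Network type and the load metric below are made up:)

    from collections import namedtuple

    # Illustrative only: a cluster network and the roles assigned to it.
    Network = namedtuple('Network', ['name', 'roles'])

    def choose_migration_network(cluster_networks, load_by_network):
        """Pick the least-loaded network carrying the 'migration' role,
        falling back to the management network if none is defined."""
        candidates = [n for n in cluster_networks if 'migration' in n.roles]
        if not candidates:
            return next(n for n in cluster_networks if 'management' in n.roles)
        # load_by_network: current utilization per network name, 0.0 - 1.0
        return min(candidates, key=lambda n: load_by_network.get(n.name, 0.0))
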
> >>>>
> >>>> Dynamic migration selection is indeed desirable, but only from
> >>>> migration networks - migration traffic is insecure, so it's
> >>>> undesirable to have it mixed with VM traffic unless permitted by the
> >>>> admin by marking this network as a migration network.
> >>>>
> >>>> To clarify what I meant in the previous response to Livnat, when I
> >>>> said "...if the customer due to the unsymmetrical nature of most
> >>>> bonding modes prefers to use multiple networks for migration and
> >>>> will ask us to optimize migration across these..."
> >>>>
> >>>> But the dynamic selection should be based on SLA, of which the above
> >>>> is just part:
> >>>> 1. Need to consider tenant traffic segregation rules = security
> >>>> 2. SLA contracts
> >>
> >> We could devise a complex logic of assigning each VM a pool of
> >> applicable migration networks, where one of them is chosen by Engine
> >> upon migration startup.
> >>
> >> I am, however, not at all sure that extending the migration bandwidth
> >> by means of multiple migration networks is worth the design hassle and
> >> the GUI noise. A simpler solution would be to build a single migration
> >> network on top of a fat bond, tweaked by a fine-tuned SLA.
> > 
> > Except for mode 4, most bonding modes are optimized for either
> > outbound or inbound traffic - not both. It's far from optimal.
> > And you are forgetting the other reasons I've raised, like isolation
> > of tenant traffic, and not just for SLA reasons.
> > 
> 
> Why do we need isolation of tenants' migration traffic if not for SLA
> reasons?


Security (migration is not encrypted), segregation of resources (a poor man's / simple-stupid SLA, or until you have a real SLA), and, as said before, better utilization of resources (bonds are asymmetric). SLA in our discussion is maintained via traffic shaping, which has its own performance impact; the first three do not.

Another reason would be use with external network providers like Cisco or Mellanox, who already have traffic control. There you may easily have dedicated networks per tenant, including a migration network (as part of a tenant's dedicated resources and segregation of resources).


> 
> > Even for pure active-active redundancy you may want to have more
> > than one, or asymmetrical hosts
> 
> That's again going back to SLA policies and not specific to the
> migration network.
> 
> > Example:
> > We have a host with 3 nics - you dedicate them to management,
> > migration, and storage, respectively. But if the migration network
> > fails, you want the management network to become your migration
> > network (automatically).
> > 
> 
> OR you may not want that.
> That's a policy for handling network roles, not related specifically
> to the migration network.

Right, but there is a chicken-and-egg thing here.
Unless you have multiple migration networks, you won't be able to implement the above.
If you implement the above without pre-defining multiple networks that are allowed to act as migration networks, the implementation may be more complex.

> 
> 
> > Another:
> > A large host with many nics and a smaller host with fewer - as long
> > as there is a route between the migration and management networks,
> > you could think of a scenario where on the larger host you have
> > separate networks for each role, while on the smaller one you have a
> > single network assuming both roles.
> > 
> 
> I'm not sure this is the main use case, or that we want to complicate
> the general flow because of exotic use cases.

What I'm trying to say here is:
Please do not look at each use case separately. I agree that evaluating them one by one may lead you to say: this one is not worth it, and that one on its own is not worth it, and so on. But looking at everything put together, it accumulates.

> 
> Maybe what you are looking for is an override of network roles at the
> host level. Not sure how useful this is, though.

Maybe.
I've already suggested allowing an override on a per-migration basis.

> > Other examples can be found.
> > 
> 
> If you have some main use cases I would love to hear them; maybe they
> can make the requirement clearer.

I gave some above.
I think for the immediate term the most compelling is the external network provider use case, where you want to allow the external network management to route/shape the traffic per tenant, something that will be hard to do if everything is aggregated on the host.
 
But come to think of it, I like more and more the idea of having the migration network as part of the VM configuration. It's simple to do now, logic can be added on top later if required, and VDSM already supports it.

So:
1. Have a default migration network per cluster (the default is the management network, as before)
2. This is the default migration network for all VMs created in that cluster
3. Allow overriding this in the VM properties (tenant use case; also supports the external network manager use case)
4. Allow overriding it per migration as well.

Simple, powerful, and flexible, while the logic is not complicated since the engine has nothing to decide - everything is orchestrated by the admin, and the initial out-of-the-box setup is very simple (one migration network for all, which by default is the management network).

Later you may apply policies on top of this. 
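
To show how little logic the engine would actually need for this, here is a tiny sketch of the resolution order (purely illustrative - none of these names exist in the code today):

    from collections import namedtuple

    # Illustrative only -- neither these types nor these fields exist in Engine.
    Cluster = namedtuple('Cluster', ['migration_network', 'management_network'])
    VM = namedtuple('VM', ['migration_network'])

    def resolve_migration_network(cluster, vm, per_migration_override=None):
        """Return the network this particular migration should run over."""
        if per_migration_override is not None:      # 4. override on a single migrate
            return per_migration_override
        if vm.migration_network is not None:        # 3. per-VM property (tenant case)
            return vm.migration_network
        if cluster.migration_network is not None:   # 1+2. cluster default
            return cluster.migration_network
        return cluster.management_network           # out-of-the-box behaviour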

Thoughts? 

> 
> > It's really not just one reason to support more than one migration
> > network or display network or storage network or any other 'facility'
> > network. Any facility network may warrant more than one per cluster.
> > 
> 
> I'm not sure display can be in the same bucket as migration,
> management, and storage.

I think it can in the tenant use case, but I would be happy with a solution like the above (have a default network per cluster and allow overriding per VM).


> 
> > 
> >>
> >>>>
> >>>> If you keep #2, migration-storm mitigation is granted. But you are
> >>>> right that another feature required for #2 above is to control the
> >>>> migration bandwidth (BW) per migration. We had a discussion in the
> >>>> past about VDSM doing a dynamic calculation based on f(Line Speed,
> >>>> Max Migration BW, Max allowed per VM, Free BW, number of migrating
> >>>> machines) when starting a migration. (I actually wanted to do so
> >>>> years ago, but never got to it - one of those things you always
> >>>> postpone until you find the time). We did not think that the engine
> >>>> should provide some of these, but come to think of it, you are right
> >>>> and it makes sense. For SLA, Max per VM + Min guaranteed should be
> >>>> provided by the engine to maintain SLA. And it's up to the engine to
> >>>> ensure that Min guaranteed x the number of concurrent migrations
> >>>> does not exceed Max Migration BW.
> >>>>
> >>>> Dan, this is way too much for an initial implementation, but don't
> >>>> you think we should at least add placeholders in the migration API?
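
(For the record, a back-of-the-envelope sketch of the kind of per-VM
calculation I had in mind - the function and its policy are invented
here, nothing like it exists in VDSM:)

    def migration_bandwidth_mbps(line_speed, max_migration_bw, max_per_vm,
                                 free_bw, migrating_vms):
        """Illustrative only: split the allowed migration bandwidth between
        the concurrent migrations, capped by the per-VM maximum and by what
        the line can spare right now. All figures in Mbps."""
        if migrating_vms <= 0:
            return 0
        budget = min(line_speed, max_migration_bw, free_bw)
        return min(max_per_vm, budget / float(migrating_vms))

    # e.g. migration_bandwidth_mbps(10000, 4000, 1000, 6000, 3) -> 1000
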
> >>
> >> In my opinion this should wait for another feature. For each VM, I'd
> >> like to see a means to define the SLA of each of its vNICs. When we
> >> have that, we should similarly define how much bandwidth it has for
> >> migration.
> >>
> >>>> Maybe Doron can assist with the required verbs.
> >>>>
> >>>> (P.S. I don't want to alarm you, but we may need SLA parameters for
> >>>> setupNetworks as well :) unless we want these as a separate API,
> >>>> though it means more calls during setup)
> >>
> >> Exactly - when we have a migration network concept, and when we have
> >> a general network SLA definition, we could easily apply the latter to
> >> the former.
> >>
> >>>>
> >>>
> >>> As with other resources, the bare minimum is usually a MIN capacity
> >>> and a MAX to avoid choking other tenants / VMs. In this context we
> >>> may need to consider other QoS elements (delays, etc.), but indeed
> >>> it can be an additional limitation on top of the basic one.
> >>>
> >>
> > _______________________________________________
> > Arch mailing list
> > Arch at ovirt.org
> > http://lists.ovirt.org/mailman/listinfo/arch
> > 
> 
> 


