[ovirt-devel] migration enhancements feature

Michal Skrivanek mskrivan at redhat.com
Tue Sep 15 06:55:50 UTC 2015


On Sep 15, 2015, at 08:49 , Tomas Jelinek <tjelinek at redhat.com> wrote:

> 
> 
> ----- Original Message -----
>> From: "Yaniv Kaul" <ykaul at redhat.com>
>> To: "Tomas Jelinek" <tjelinek at redhat.com>
>> Cc: devel at ovirt.org, "Martin Polednik" <mpolednik at redhat.com>, "Michal Skrivanek" <mskrivan at redhat.com>
>> Sent: Monday, September 14, 2015 9:29:12 PM
>> Subject: Re: [ovirt-devel] migration enhancements feature
>> 
>> On Mon, Sep 14, 2015 at 3:35 PM, Tomas Jelinek <tjelinek at redhat.com> wrote:
>>> Hi all,
>>> 
>>> there is an effort to enhance the speed and convergence of
>>> migrations (especially for large VMs).
>>> 
>>> The feature page targeted for 4.0 is [1].
>>> 
>>> TL;DR:
>>> - remove the current logic from VDSM and move it to the engine in the form of policies
>>> - employ post-copy migration
>>> - employ traffic shaping
>>> - protect destination VDSM against migration storms
>>> 
>>> Any comments more than welcome!
>>> Tomas
>>> 
>>> [1]: http://www.ovirt.org/Features/Migration_Enhancements
>> 
>> I think we need to look at (any/the) feature from the user
>> perspective, first and foremost. How would the user use the feature?
>> What 'knobs' he may tweak to get better migration results? Which can

We already have too many knobs (around 8) that most people do not know about, let alone how to use them properly.
Simplification is one of the main goals.

>> we do for him? Which ones will be used at the expense of others?
>> Do we truly believe a user will know what to tweak to get a better
>> result? Exposing every parameter, in that sense, is
>> counter-productive.
> 
> No, we do not want to expose all parameters to the user and let him tweak each of them with no guidance.
> What we wanted to do from the user's perspective was to provide 3 policies:
> - "Safe but may not converge" - basically the same as today but with better downtime handling
> - "Should converge but guest may notice a pause" - if not converging, sets the downtime to a very high value (e.g. 90 seconds)
> - "Guaranteed to converge" - if not converging, switches to post-copy mode, which guarantees convergence but brings the risk of losing the VM

I think we have yet to figure out the right settings/names based on testing.
E.g. the impact of compression and the current autoconverge is not completely clear (it helps with some workloads and doesn't help with others).
The whole post-copy migration is brand new (and we don't yet know if it will be completely ready in 4.0).
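
To make the proposal above concrete, here is a minimal sketch of how the three policies could be modelled as data, ordered from least to most aggressive. It is only an illustration: the names MigrationPolicy, POLICIES and next_more_aggressive are hypothetical and are not taken from the engine or VDSM code; only the policy labels and the 90 second example come from the thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MigrationPolicy:
    """Hypothetical model of one prepared policy from the proposal."""
    name: str
    # Downtime to fall back to when the migration is not converging.
    # None means "keep the regular downtime handling".
    max_downtime_s: Optional[int]
    # Whether to switch to post-copy when the migration is not converging.
    post_copy_fallback: bool


# Ordered from least to most aggressive, as described in the thread.
POLICIES = [
    MigrationPolicy("Safe but may not converge", None, False),
    MigrationPolicy("Should converge but guest may notice a pause", 90, False),
    MigrationPolicy("Guaranteed to converge", None, True),
]


def next_more_aggressive(current: MigrationPolicy) -> Optional[MigrationPolicy]:
    """Return the next policy to try, or None if already at the last one."""
    idx = POLICIES.index(current)
    return POLICIES[idx + 1] if idx + 1 < len(POLICIES) else None
```

A user whose VMs do not converge with the default would then simply step to the next entry, without touching any individual parameter.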

> 
> These will be the prepared policies from the user's perspective (the details of how they will be configured are explained on the wiki).
> The user may be allowed to create his own policies, but I am not sure if it makes sense...
> 
> The parameters of the migration will be pre-filled with defaults - from the user's perspective it will simply be how aggressive he wants the migration to be.
> 
>> 
>> Specific example: should a user enable or not compression? What will
>> he gain? I assume, less bandwidth needed for migration. Would it help
>> for his migration (I assume it'll take longer, take more CPU, etc.) or
>> not? When migrating one big heavily-used VM? When migrating twenty
>> idle single-core VMs? Any point enabling it for 10Gb dedicated
>> migration network? And 1Gb shared network which is heavily used by
>> others? etc.
> 
> These options will be hidden under policies. From the user's perspective it would be:
> 1: the "Safe but may not converge" policy is selected by default, but for whatever reason my VMs are not converging
> 2: hmmm, I want them to migrate, let's try something more aggressive (pick "Should converge but guest may notice a pause")
> 3: still nothing, something even more aggressive? (pick "Guaranteed to converge")
> 
> BTW there are also other enhancements planned which should help:
> - protect destination host from storms
> - use more bandwidth when a faster network is available (it will be a cluster-level setting, but by default the engine will pre-fill it based on how fast the available networks are)
> - use traffic shaping
> - use a smarter downtime algorithm (don't push unrealistically low downtimes etc.)
> 
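
As an illustration of the "protect destination host from storms" item, one possible approach is a cap on the number of concurrent incoming migrations. The thread does not say how the protection will work, so this is only a sketch under that assumption; the class IncomingMigrationGuard and its methods are hypothetical and do not correspond to any existing VDSM API.

```python
import threading


class IncomingMigrationGuard:
    """Hypothetical limiter for concurrent incoming migrations on one host."""

    def __init__(self, max_incoming: int):
        self._max = max_incoming      # assumed per-host limit
        self._active = 0              # incoming migrations currently running
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        """Admit a new incoming migration, or refuse if the host is full."""
        with self._lock:
            if self._active >= self._max:
                return False
            self._active += 1
            return True

    def release(self) -> None:
        """Mark one incoming migration as finished (or failed)."""
        with self._lock:
            if self._active > 0:
                self._active -= 1
```

A refused migration could then be retried later or sent to another host instead of piling onto a destination that is already busy.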
>> 
>> Y.
>> 
>>> _______________________________________________
>>> Devel mailing list
>>> Devel at ovirt.org
>>> http://lists.ovirt.org/mailman/listinfo/devel
>> 



