migration enhancements feature

Hi all,
there is an effort for enhancing the speed and convergence of the migrations (especially for large VMs).
The feature page targeted for 4.0 is [1].
TL;DR:
- remove current logic from VDSM and move to engine in form of policies
- employ post-copy migration
- employ traffic shaping
- protect destination VDSM against migration storms
Any comments more than welcome!
Tomas
[1]: http://www.ovirt.org/Features/Migration_Enhancements
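The last TL;DR item can be pictured as a cap on how many incoming migrations a destination host accepts at once. This is a minimal sketch that assumes a simple per-host counter. `IncomingMigrationGate` and the limit of 2 are hypothetical illustrations, not VDSM code or defaults. The sketch is built on Python's standard `threading.BoundedSemaphore`.

```python
import threading


class IncomingMigrationGate:
    """Hypothetical guard capping concurrent incoming migrations on one host.

    Not VDSM code: only a sketch of the 'protect destination against storms' idea.
    """

    def __init__(self, max_incoming: int = 2) -> None:
        # BoundedSemaphore raises ValueError if released more often than acquired.
        self._slots = threading.BoundedSemaphore(max_incoming)

    def try_accept(self, timeout: float = 0.0) -> bool:
        # Wait at most `timeout` seconds for a free slot. Return False instead
        # of queueing forever so the source can retry or pick another host.
        return self._slots.acquire(timeout=timeout)

    def finish(self) -> None:
        # Free a slot once an incoming migration has completed or failed.
        self._slots.release()


if __name__ == "__main__":
    gate = IncomingMigrationGate(max_incoming=2)
    print([gate.try_accept() for _ in range(3)])  # [True, True, False]
    gate.finish()
    print(gate.try_accept())                      # True
```

Refusing instead of queueing is only one possible choice. This sketch does not settle whether the destination should queue or refuse.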

On Mon, Sep 14, 2015 at 3:35 PM, Tomas Jelinek <tjelinek@redhat.com> wrote:
Hi all,
there is an effort for enhancing the speed and convergence of the migrations (especially for large VMs).
The feature page targeted for 4.0 is [1].
TL;DR:
- remove current logic from VDSM and move to engine in form of policies
- employ post-copy migration
- employ traffic shaping
- protect destination VDSM against migration storms
Any comments more than welcome!
Tomas
I think we need to look at (any/the) feature from the user perspective, first and foremost. How would the user use the feature? What 'knobs' may he tweak to get better migration results? Which can we do for him? Which ones will be used at the expense of others? Do we truly believe a user will know what to tweak to get a better result? Exposing every parameter, in that sense, is counter-productive.
Specific example: should a user enable compression or not? What will he gain? I assume less bandwidth is needed for migration. Would it help his migration (I assume it'll take longer, take more CPU, etc.) or not? When migrating one big heavily-used VM? When migrating twenty idle single-core VMs? Is there any point enabling it for a 10Gb dedicated migration network? And for a 1Gb shared network which is heavily used by others? etc.
Y.
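A deliberately simplified pre-copy model can make the bandwidth and compression questions above concrete. It is an assumption for illustration, not how QEMU, libvirt or VDSM actually decide. In the model, each round resends the memory dirtied during the previous round. With a constant dirty rate and bandwidth, the remaining data is multiplied by dirty_rate / bandwidth every round, and the guest can be paused once what is left fits into the allowed downtime. Compression is modelled only as a ratio that raises effective bandwidth, and its CPU cost is ignored. The function name and the 30-round cap are hypothetical.

```python
from typing import Optional


def precopy_rounds(mem_gb: float, dirty_gb_s: float, bw_gb_s: float,
                   max_downtime_s: float, compression_ratio: float = 1.0,
                   max_rounds: int = 30) -> Optional[int]:
    """Rounds until the rest fits in max_downtime_s, or None if it never does.

    Simplified model: constant dirty rate and bandwidth; compression only
    multiplies the effective bandwidth (its CPU cost is ignored).
    """
    effective_bw = bw_gb_s * compression_ratio  # guest GB moved per second
    remaining = mem_gb                          # first round copies all memory
    for rnd in range(1, max_rounds + 1):
        send_time = remaining / effective_bw
        if send_time <= max_downtime_s:
            return rnd                          # pause the guest, copy the rest
        remaining = dirty_gb_s * send_time      # memory dirtied meanwhile
    return None                                 # dirtying outpaces the link


if __name__ == "__main__":
    print(precopy_rounds(64, 0.5, 1.25, 0.5))        # 10 Gb/s link: 7
    print(precopy_rounds(64, 0.5, 0.125, 0.5))       # 1 Gb/s link: None
    print(precopy_rounds(64, 0.5, 0.125, 0.5, 5.0))  # 1 Gb/s, 5:1 compression: 25
```

The numbers are made up: a 64 GB guest dirtying 0.5 GB/s, with 0.5 s of allowed downtime. On a 10 Gb/s link (about 1.25 GB/s), the migration converges after 7 rounds. On a 1 Gb/s link (0.125 GB/s), it never converges. In this model only, a 5:1 compression ratio on the 1 Gb/s link makes it converge after 25 rounds. Real compression ratios and CPU costs depend on the workload, so the sketch only shows why the answer differs between the scenarios listed above.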
_______________________________________________ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel

----- Original Message -----
From: "Yaniv Kaul" <ykaul@redhat.com>
To: "Tomas Jelinek" <tjelinek@redhat.com>
Cc: devel@ovirt.org, "Martin Polednik" <mpolednik@redhat.com>, "Michal Skrivanek" <mskrivan@redhat.com>
Sent: Monday, September 14, 2015 9:29:12 PM
Subject: Re: [ovirt-devel] migration enhancements feature
On Mon, Sep 14, 2015 at 3:35 PM, Tomas Jelinek <tjelinek@redhat.com> wrote:
Hi all,
there is an effort for enhancing the speed and convergence of the migrations (especially for large VMs).
The feature page targeted for 4.0 is [1].
TL;DR:
- remove current logic from VDSM and move to engine in form of policies
- employ post-copy migration
- employ traffic shaping
- protect destination VDSM against migration storms
Any comments more than welcome!
Tomas
I think we need to look at (any/the) feature from the user perspective, first and foremost. How would the user use the feature? What 'knobs' may he tweak to get better migration results? Which can we do for him? Which ones will be used at the expense of others? Do we truly believe a user will know what to tweak to get a better result? Exposing every parameter, in that sense, is counter-productive.
No, we do not want to expose all parameters to the user and let him tweak each of them with no guidance. What we wanted to do from the user perspective was to provide 3 policies:
- "Safe but may not converge" - basically the same as today but with better downtime handling
- "Should converge but guest may notice a pause" - if not converging, sets the downtime to a very high value (e.g. 90 seconds)
- "Guaranteed to converge" - if not converging, switches to post-copy mode, which guarantees convergence but brings the risk of losing the VM
These will be the prepared policies from the user perspective (the details about how they will be configured are explained on the wiki). The user may be allowed to create his own policies, but I'm not sure it makes sense...
The parameters of the migration will be pre-filled with defaults - from the user perspective it will simply be how aggressive he wants the migration to be.
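The three prepared policies, ordered by how aggressive they are, could be represented as plain data. This is a minimal sketch under stated assumptions. `Fallback`, `MigrationPolicy`, `POLICIES` and `more_aggressive` are hypothetical names, not the engine's schema. Only the 90-second downtime comes from the message above, and the actual configuration is described on the wiki.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Fallback(Enum):
    """What a policy does when pre-copy is not converging (per the list above)."""
    NONE = "keep pre-copy only"
    HIGH_DOWNTIME = "raise downtime to a very high value"
    POST_COPY = "switch to post-copy"


@dataclass(frozen=True)
class MigrationPolicy:
    name: str
    fallback: Fallback
    # Only the 90 s example comes from the thread; None means "not specified here".
    max_downtime_s: Optional[float] = None


# Hypothetical ordering from least to most aggressive.
POLICIES = (
    MigrationPolicy("Safe but may not converge", Fallback.NONE),
    MigrationPolicy("Should converge but guest may notice a pause",
                    Fallback.HIGH_DOWNTIME, max_downtime_s=90.0),
    MigrationPolicy("Guaranteed to converge", Fallback.POST_COPY),
)


def more_aggressive(current: MigrationPolicy) -> Optional[MigrationPolicy]:
    """Return the next, more aggressive policy, or None if already at the top."""
    idx = POLICIES.index(current)
    return POLICIES[idx + 1] if idx + 1 < len(POLICIES) else None
```

Keeping the policies as an ordered tuple means "how aggressive he wants the migration to be" reduces to a position in the list.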
Specific example: should a user enable compression or not? What will he gain? I assume less bandwidth is needed for migration. Would it help his migration (I assume it'll take longer, take more CPU, etc.) or not? When migrating one big heavily-used VM? When migrating twenty idle single-core VMs? Is there any point enabling it for a 10Gb dedicated migration network? And for a 1Gb shared network which is heavily used by others? etc.
These options will be hidden under policies. From the user perspective it would be:
1: the "Safe but may not converge" policy is selected by default, but for whatever reason my VMs are not converging
2: hmmm, I want them to migrate, let's try something more aggressive (pick "Should converge but guest may notice a pause")
3: still nothing, something even more aggressive? (pick "Guaranteed to converge")
BTW there are also other enhancements planned which should help:
- protect the destination host from storms
- use bigger bandwidth when a faster network is available (it will be a cluster level setting, but by default the engine will pre-fill it knowing how fast the networks are)
- use traffic shaping
- use a smarter downtime algorithm (don't push unrealistically low downtimes etc.)
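The thread does not specify the smarter downtime algorithm. One possible reading is shown here only as a hypothetical sketch: start the downtime ramp at a floor the network can plausibly meet instead of an unrealistically low value, then raise it in even steps up to the policy's ceiling. `realistic_floor_ms` and `downtime_ramp` are assumptions. So are the floor heuristic (the time to send one second's worth of dirtied memory) and the 50 ms minimum.

```python
from typing import List


def realistic_floor_ms(dirty_mb_s: float, bw_mb_s: float, min_ms: int = 50) -> int:
    """Hypothetical floor: milliseconds needed to send one second of dirtied memory."""
    return max(min_ms, int(dirty_mb_s * 1000 / bw_mb_s))


def downtime_ramp(floor_ms: int, ceiling_ms: int, steps: int) -> List[int]:
    """Return `steps` max-downtime values rising linearly from floor_ms to ceiling_ms."""
    if steps < 2 or floor_ms >= ceiling_ms:
        # Nothing to ramp: just use the ceiling.
        return [ceiling_ms]
    span = ceiling_ms - floor_ms
    return [floor_ms + span * i // (steps - 1) for i in range(steps)]


if __name__ == "__main__":
    floor = realistic_floor_ms(dirty_mb_s=200, bw_mb_s=1000)
    print(floor)                          # 200
    print(downtime_ramp(floor, 2000, 5))  # [200, 650, 1100, 1550, 2000]
```

Take, for example, a guest dirtying 200 MB/s over a 1000 MB/s link. It gets a 200 ms floor instead of starting at a few milliseconds that could never be met.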

On Sep 15, 2015, at 08:49, Tomas Jelinek <tjelinek@redhat.com> wrote:
----- Original Message -----
From: "Yaniv Kaul" <ykaul@redhat.com>
To: "Tomas Jelinek" <tjelinek@redhat.com>
Cc: devel@ovirt.org, "Martin Polednik" <mpolednik@redhat.com>, "Michal Skrivanek" <mskrivan@redhat.com>
Sent: Monday, September 14, 2015 9:29:12 PM
Subject: Re: [ovirt-devel] migration enhancements feature
On Mon, Sep 14, 2015 at 3:35 PM, Tomas Jelinek <tjelinek@redhat.com> wrote:
Hi all,
there is an effort for enhancing the speed and convergence of the migrations (especially for large VMs).
The feature page targeted for 4.0 is [1].
TL;DR:
- remove current logic from VDSM and move to engine in form of policies
- employ post-copy migration
- employ traffic shaping
- protect destination VDSM against migration storms
Any comments more than welcome!
Tomas
I think we need to look at (any/the) feature from the user perspective, first and foremost. How would the user use the feature? What 'knobs' may he tweak to get better migration results? Which can
We already have too many knobs (around 8) that most people do not know about, let alone know how to use properly. Simplification is one of the main goals.
we do for him? Which ones will be used at the expense of others? Do we truly believe a user will know what to tweak to get a better result? Exposing every parameter, in that sense, is counter-productive.
No, we do not want to expose all parameters to the user and let him tweak each of them with no guidance. What we wanted to do from the user perspective was to provide 3 policies:
- "Safe but may not converge" - basically the same as today but with better downtime handling
- "Should converge but guest may notice a pause" - if not converging, sets the downtime to a very high value (e.g. 90 seconds)
- "Guaranteed to converge" - if not converging, switches to post-copy mode, which guarantees convergence but brings the risk of losing the VM
I think we have yet to figure out the right settings/names based on testing. E.g. the impact of compression and of the current auto-converge is not completely clear (it helps with some workloads but not with others), and post-copy migration as a whole is brand new (we don't yet know if it will be completely ready in 4.0).
These will be the prepared policies from the user perspective (the details about how they will be configured are explained on the wiki). The user may be allowed to create his own policies, but I'm not sure it makes sense...
The parameters of the migration will be pre-filled with defaults - from the user perspective it will simply be how aggressive he wants the migration to be.
Specific example: should a user enable compression or not? What will he gain? I assume less bandwidth is needed for migration. Would it help his migration (I assume it'll take longer, take more CPU, etc.) or not? When migrating one big heavily-used VM? When migrating twenty idle single-core VMs? Is there any point enabling it for a 10Gb dedicated migration network? And for a 1Gb shared network which is heavily used by others? etc.
These options will be hidden under policies. From the user perspective it would be:
1: the "Safe but may not converge" policy is selected by default, but for whatever reason my VMs are not converging
2: hmmm, I want them to migrate, let's try something more aggressive (pick "Should converge but guest may notice a pause")
3: still nothing, something even more aggressive? (pick "Guaranteed to converge")
BTW there are also other enhancements planned which should help:
- protect the destination host from storms
- use bigger bandwidth when a faster network is available (it will be a cluster level setting, but by default the engine will pre-fill it knowing how fast the networks are)
- use traffic shaping
- use a smarter downtime algorithm (don't push unrealistically low downtimes etc.)
participants (3)
- Michal Skrivanek
- Tomas Jelinek
- Yaniv Kaul