
I'm having some issues with a couple of VMs in my cluster that cannot be live migrated between hosts. It appears to be the issue that is referenced by https://access.redhat.com/solutions/870113 and https://access.redhat.com/solutions/1159723.

I currently have a dedicated 1Gbps network for migrations and have modified some of my vdsm settings so that my vdsm.conf looks like this:

    [vars]
    ssl = true
    migration_timeout = 600
    migration_max_bandwidth = 100
    max_outgoing_migrations = 3
    migration_downtime = 1000

    [addresses]
    management_port = 54321

The issue was happening before the configuration changes too.

The relevant lines from my vdsm.log from the host that the VM is being migrated from: http://ix.io/xeT

Has anyone been able to resolve this or come up with a workaround for this problem? I understand that this is because the VM's memory is changing faster than the migration can transfer it, so this could be difficult to get working. Some downtime is acceptable for these VMs, so if there are further config changes that allow for more downtime, I would consider that a solution.

Ollie
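For reference, a minimal sketch of pulling the migration-related lines out of the log on the source host, assuming the default VDSM log location of /var/log/vdsm/vdsm.log:

    # collect migration-related messages from the VDSM log (case-insensitive)
    grep -i 'migrat' /var/log/vdsm/vdsm.log

    # or watch them live while a migration attempt is running
    tail -f /var/log/vdsm/vdsm.log | grep -i 'migrat'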

On 21 Apr 2016, at 13:31, Ollie Armstrong <ollie@fubra.com> wrote:
I'm having some issues with a couple of VMs in my cluster that cannot be live migrated between hosts. It appears to be the issue that is referenced by https://access.redhat.com/solutions/870113 and https://access.redhat.com/solutions/1159723.
I currently have a dedicated 1Gbps network for migrations and have modified some of my vdsm settings so that my vdsm.conf looks like this:
[vars]
ssl = true
migration_timeout = 600
migration_max_bandwidth = 100
max_outgoing_migrations = 3
you should lower this, otherwise it won’t fit (1 migration at 100MBps would eat your whole link already). So use 1
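(Back-of-the-envelope, assuming migration_max_bandwidth is expressed in MiB/s: a 1Gbps link carries roughly 119 MiB/s, so a single migration capped at 100 MiB/s already uses almost the whole dedicated link, and three concurrent migrations at 100 MiB/s each would need around 2.5 Gbps.)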
migration_downtime = 1000
It’s not really 1000. There’s a UI for that in Edit VM, so you shouldn’t set it here at all, and if you do, the algorithm behind it is not-so-great in <=3.6.5. Use migration_downtime_steps=3 or 5 or something lower to mitigate that wrong algorithm behavior.

If you need to overcome a longer peak period of activity, also change migration_progress_timeout to something more than 150s.

Thanks,
michal
[addresses]
management_port = 54321
The issue was happening before the configuration changes too.
The relevant lines from my vdsm.log from the host that the VM is being migrated from: http://ix.io/xeT
Has anyone been able to resolve this or come up with a workaround for this problem? I understand that this is because the VM's memory is changing faster than the migration can transfer it, so this could be difficult to get working. Some downtime is acceptable for these VMs, so if there are further config changes that allow for more downtime, I would consider that a solution.
Ollie
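For reference, a minimal sketch of where these settings live and how to apply a change, assuming systemd-managed oVirt hosts and the default configuration path /etc/vdsm/vdsm.conf (VDSM generally needs a restart before vdsm.conf changes take effect):

    # adjust the [vars] section on the migration source host
    vi /etc/vdsm/vdsm.conf

    # restart VDSM so the new values are picked up
    systemctl restart vdsmd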

On 21 April 2016 at 12:43, Michal Skrivanek <michal.skrivanek@redhat.com> wrote:
max_outgoing_migrations = 3
you should lower this, otherwise it won’t fit (1 migration at 100MBps would eat your whole link already). So use 1
migration_downtime = 1000
it’s not really 1000. There’s a UI for that in Edit VM, so you shouldn’t set it here at all and if you do, the algorithm behind it is not-so-great in <=3.6.5. Use migration_downtime_steps=3 or 5 or something lower to mitigate that wrong algorithm behavior
if you need to overcome a longer peak period of activity, also change migration_progress_timeout to something more than 150s
Thanks so much for the tips Michal, I really appreciate it. I'll have to try these tweaks in my next maintenance window, so I'll post my findings in a few days.

Cheers,
Ollie

On 21 Apr 2016, at 14:01, Ollie Armstrong <ollie@fubra.com> wrote:
On 21 April 2016 at 12:43, Michal Skrivanek <michal.skrivanek@redhat.com> wrote:
max_outgoing_migrations = 3
you should lower this, otherwise it won’t fit (1 migration at 100MBps would eat your whole link already). So use 1
migration_downtime = 1000
it’s not really 1000. There’s a UI for that in Edit VM, so you shouldn’t set it here at all and if you do, the algorithm behind it is not-so-great in <=3.6.5. Use migration_downtime_steps=3 or 5 or something lower to mitigate that wrong algorithm behavior
if you need to overcome a longer peak period of activity, also change migration_progress_timeout to something more than 150s
Thanks so much for the tips Michal, I really appreciate it. I'll have to try these tweaks in my next maintenance window, so I'll post my findings in a few days.
Hi,

Great, I’d like to hear back. I don’t know your workload or constraints, but if you don’t know what to try then I would do:

    migration_max_bandwidth = 100
    max_outgoing_migrations = 1
    migration_downtime = 5000 (if you don’t mind at most a 5s delay the VM may experience during the handover)
    migration_downtime_steps = 3
    migration_progress_timeout = 600

And please gather the log the same way as before so I can see how it is progressing over time; it’s helpful to understand why it doesn’t converge. 4.0 will provide better logging capabilities, for now it’s not so great… but at least something.

Thanks,
michal
Cheers, Ollie
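Putting the suggested values together, a sketch of how the [vars] section could look, assuming the default configuration path /etc/vdsm/vdsm.conf and keeping the unrelated settings from the original configuration (migration_downtime is in milliseconds, so 5000 corresponds to the at-most-5s handover delay mentioned above):

    [vars]
    # unchanged from the original configuration
    ssl = true
    migration_timeout = 600
    migration_max_bandwidth = 100

    # one outgoing migration at a time over the dedicated 1Gbps link
    max_outgoing_migrations = 1

    # at most ~5s pause during the handover (value is in milliseconds)
    migration_downtime = 5000
    migration_downtime_steps = 3

    # raised from the 150s mentioned above
    migration_progress_timeout = 600

    [addresses]
    management_port = 54321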