
Hello list,

Our storage domains are iSCSI on a dedicated network, and when migrating VMs, the duration seems to vary with the size of the vDisks. The smallest VMs migrate in about 20 seconds, while the biggest ones may take more than 5 or 10 minutes. The average duration is 90 seconds.

Questions:
1- Though I may have understood that the migration task is handled by the SPM, I don't know what it actually does (which bytes go where)?
2- Do our times sound OK, or do they look improvable?
3- What bottleneck should I investigate? I'm thinking about the hosts' dedicated hardware NIC setup, or the SAN; the MTU has already been set to 9000...

Any ideas welcome.

--
Nicolas Ecarnot

I’m under the impression it depends more on the hosts’ memory assignment than on disk size: libvirt has to synchronize the VM’s memory over your network. Your times sound like mine over 1G Ethernet with a 9000 MTU; most of my machines have 1-4 GB of RAM. I’ve another setup with a 10G backend that can migrate larger machines much faster.

Things that do a lot of memory access (databases, say) or use more of their allocated memory tend to take longer to migrate, as it’s more work for libvirt to get them synchronized. A 10G+ backend is the best way to speed this up, and there are libvirt variables you can tweak to allocate more bandwidth to a migration (and the number of simultaneous migrations you allow). I think the defaults are 3 simultaneous migrations at a max of 30% of your available bandwidth. I don’t think this takes bonds into account, so if you have bonded connections, you may be able to allocate a higher percentage or allow more simultaneous migrations. Keep in mind that if you’re sharing bandwidth/media with iSCSI, some bandwidth will be needed there as well; how much depends on your storage load. A dedicated migration NIC could definitely help, especially if you’re trying to tune libvirt for this.

-Darrell
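A sketch of where those knobs live on the host side, assuming an oVirt 3.x-era vdsm -- key names and defaults vary between vdsm versions, so check your own /etc/vdsm/vdsm.conf before borrowing these values:

    # /etc/vdsm/vdsm.conf -- illustrative values only
    [vars]
    # Per-migration bandwidth cap in MB/s (default ~30, as noted below).
    migration_max_bandwidth = 150
    # How many outgoing live migrations one host runs at a time
    # (assumed key name; verify it exists in your vdsm version).
    max_outgoing_migrations = 3

Restart vdsmd afterwards so the new values are picked up.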

If we speak about migration of VMs - relocating the qemu process - then speed depends mostly on memory change pressure: the more changes per second, the more copy restarts the process needs. The best solution to speed it up is to enlarge migration_max_bandwidth in /etc/vdsm/vdsm.conf from the default 30 MB/s to something higher. We use 150 MB/s in a 10 Gbit network. With the default we have seen migrations that never come to an end.

When talking about disks, it depends on how many disks you have attached to a single VM. The more disks, and the more similar their sizes are, the faster you can migrate/operate on them.

For example, take a SAP system with 3 disks: 20 GB system, 20 GB executables and a 300 GB database. When issuing disk operations (like snapshots), they will start in parallel for each disk. The operations will finish earlier for the smaller disks, so in the end you will have only one operation left that may take hours. E.g. a snapshot deletion will start at ~220 MB/s when running on three disks and end at ~60 MB/s when only one disk's snapshot deletion is still active.

Best regards,

Markus
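A rough sanity check on those numbers, with made-up figures for illustration: pre-copy migration only converges when the transfer rate exceeds the rate at which the guest dirties its memory. Under the 30 MB/s default cap, a VM dirtying 40 MB/s of pages can never finish -- every copy pass ends with more dirty memory than the previous one started with -- whereas at 150 MB/s the same VM has ~110 MB/s of headroom to drain the dirty set, so e.g. 2 GB of dirty memory clears in roughly 2048/110, about 19 seconds. That mechanism is why raising the cap turns "never finishes" into a bounded migration time.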

On 02/13/2015 08:20 PM, Markus Stockhausen wrote:
+1

Moreover, upstream qemu has some more ways to speed this up:

- post-copy migration (a.k.a. "user page faults") - basically switch to the destination immediately and copy memory pages from the source on demand
- migration over RDMA
- migration throttling - http://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00040.html
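For the throttling item, a sketch of how qemu's auto-converge capability can be toggled on a running domain through libvirt's QMP passthrough -- availability depends on your qemu/libvirt versions, and "myvm" is a placeholder domain name:

    # Hedged example: enable auto-converge (throttles the guest's vCPUs
    # when pre-copy is not converging) before starting a migration.
    virsh qemu-monitor-command myvm \
      '{"execute": "migrate-set-capabilities",
        "arguments": {"capabilities":
          [{"capability": "auto-converge", "state": true}]}}'

Note that oVirt/vdsm normally drives qemu through libvirt itself, so this illustrates the qemu interface rather than a recommended production tweak.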
participants (4)
- Darrell Budic
- Markus Stockhausen
- Nicolas Ecarnot
- Roy Golan