
Hi all,

as part of the effort to enhance migration convergence [1], we are proposing a semaphore for incoming migrations [2] (similar to the existing one for outgoing migrations). Its purpose is to protect the destination host from migration storms, where too many migrations arrive at it from different sources.

There are basically 3 ways to do it (with pros/cons):

1: When the destination host refuses the migration, the source host tries again later. (Since no migration takes forever, the migration will eventually succeed in starting.)

(+) pros:
  (+) if the engine wants to migrate to a specific host (and only to that host, because the user picked it), then it only sends the command and the migration will happen (now or later)
  (+) will not interfere with engine re-runs, since the migration will fail only when there is a real issue
  (+) will be consistent with the current outgoing semaphore (since the outgoing semaphore also waits until it has capacity and then starts the migration)
  (+) VDSM is more autonomous, because after the engine sends the command, VDSM will carry it out even if the engine disappears at that moment
(-) cons:
  (-) retrying in VDSM is not common
  (-) if the user does not pick a specific destination and just wants to migrate the VM off the source, waiting for the destination to have capacity can be wasteful, since failing the migration and picking a different host could lead to better results

2: When the destination host refuses the migration, the source host returns "migration failed" to the engine, and the engine has to handle it somehow.

(+) pros:
  (+) simpler VDSM (try to migrate; if the destination does not have capacity, fail)
  (+) lets the engine pick a different destination host
(-) cons:
  (-) not consistent with the outgoing migration semaphore (if several VMs are waiting on the outgoing migration semaphore, the migration does not fail but waits)
  (-) the engine would have to handle different kinds of "migration failed" reasons
  (-) VDSM is not autonomous: if the engine disappears, the migration will not be started
  (-) I'm not sure about the consequences for the scheduler, but I think it would have to be reworked to accommodate the different kinds of re-run. Any ideas from someone more familiar with this? Roy, Martin?

3: (hybrid) If the user picks a specific host, VDSM uses the first option; if the user does not pick a specific host, VDSM uses the second option.

(+) pros:
  (+) works well in both cases: when the intention is to migrate the VM TO A SPECIFIC host, and when the intention is just to migrate the VM out to ANY host
(-) cons:
  (-) more complicated VDSM
  (-) will still interfere with engine scheduling
  (-) not consistent with VDSM's current outgoing semaphore

The currently proposed patch [2] implements the first option.

Please note that we would also like to make the scheduler aware of the max incoming migrations limit, thus preventing the storms in the first place, but that is a separate topic (no patches around yet). The question here is how VDSM should protect itself when a storm does happen.

Any ideas?

Thank you,
Tomas

[1]: www.ovirt.org/Features/Migration_Enhancements
[2]: https://gerrit.ovirt.org/#/c/45954/
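
P.S. To make the difference between options 1 and 2 concrete, here is a minimal Python sketch. It is not the code from patch [2], and every name in it (IncomingMigrationLimiter, DestinationBusy, request_slot) is hypothetical. It also assumes a bounded number of retries purely to keep the example finite, whereas option 1 as described above would keep trying until the migration starts.

```python
import threading
import time


class DestinationBusy(Exception):
    """Raised when the destination has no free incoming-migration slot."""


class IncomingMigrationLimiter:
    """Destination side: caps concurrent incoming migrations (hypothetical)."""

    def __init__(self, max_incoming):
        self._slots = threading.BoundedSemaphore(max_incoming)

    def try_acquire(self):
        # Non-blocking: refuse right away instead of queueing on the destination.
        return self._slots.acquire(blocking=False)

    def release(self):
        # Called when an incoming migration finishes or is aborted.
        self._slots.release()


def request_slot(limiter, retry, attempts=5, delay=0.01):
    """Source side (hypothetical helper).

    retry=True  -> option 1: keep asking the destination until a slot frees up.
    retry=False -> option 2: give up at once so the engine can pick another host.
    Returns the number of attempts used; raises DestinationBusy on failure.
    """
    for attempt in range(1, attempts + 1):
        if limiter.try_acquire():
            return attempt
        if not retry:
            break
        time.sleep(delay)  # back off before asking the destination again
    raise DestinationBusy("destination has no incoming migration capacity")
```

In these terms, the hybrid option 3 would amount to calling request_slot(limiter, retry=user_picked_host), where user_picked_host is again only an illustrative flag.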