[Engine-devel] Asynchronous tasks for live merge

Hi all,
As part of our plan to support live merging of VM disk snapshots it seems we will need a new form of asynchronous task in ovirt-engine. I am aware of AsyncTaskManager but it seems to be limited to managing SPM tasks. For live merge, we are going to need something called VmTasks since the async command can be run only on the host that currently runs the VM.
The way I see this working from an engine perspective is:
1. RemoveSnapshotCommand in bll is invoked as usual but since the VM is found to be up, we activate an alternative live merge flow.
2. We submit a LiveMerge VDS Command for each impacted disk. This is an asynchronous command which we need to monitor for completion.
3. A VmJob is inserted into the DB so we'll remember to handle it.
4. The VDS Broker monitors the operation via an extension to the already collected VmStatistics data. Vdsm will report active Block Jobs only. Once the job stops (in error or success) it will cease to be reported by vdsm and engine will know to proceed.
5. When the job has completed, VDS Broker raises an event up to bll. Maybe this could be done via VmJobDAO on the stored VmJob?
6. Bll receives the event and issues a series of VDS commands to complete the operation:
   a) Verify the new image chain matches our expectations (the snap is no longer present in the chain).
   b) Delete the snapshot volume
   c) Remove the VmJob from the DB
Could you guys review this proposed flow for sanity? The main conceptual gaps I am left with concern #5 and #6. What is the appropriate way for VDSBroker to communicate with BLL? Is there an event mechanism I can explore or should I use the database? I am leaning toward the database because it is persistent and will ensure #6 gets completed even if engine is restarted somewhere in the middle. For #6, is there an existing polling / event loop in bll that I can plug into?
Thanks in advance for taking the time to think about this flow and for providing your insights!
-- Adam Litke
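To make steps 3 to 5 of the flow above concrete, here is a minimal sketch of how completion could be detected by comparing the stored jobs with what vdsm reports. It is only an illustration in Python, not engine code: the VmJob fields and the find_finished_jobs helper are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmJob:
    """Hypothetical engine-side record of a pending live merge job."""
    job_id: str
    vm_id: str
    disk_id: str


def find_finished_jobs(stored_jobs, reported_job_ids):
    """Return the stored jobs that vdsm no longer reports.

    Vdsm reports active block jobs only, so a stored job that is missing
    from the report has stopped, either successfully or with an error.
    """
    reported = set(reported_job_ids)
    return [job for job in stored_jobs if job.job_id not in reported]


stored = [
    VmJob("job-1", "vm-a", "disk-1"),
    VmJob("job-2", "vm-a", "disk-2"),
]
# Pretend the latest VM statistics list only job-2 as still active.
finished = find_finished_jobs(stored, ["job-2"])
print([job.job_id for job in finished])  # prints ['job-1']
```

Absence from the report does not say whether the job succeeded, which is why step 6a verifies the image chain before anything is deleted.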

On Fri, Feb 28, 2014 at 09:30:16AM -0500, Adam Litke wrote:
Hi all,
As part of our plan to support live merging of VM disk snapshots it seems we will need a new form of asynchronous task in ovirt-engine. I am aware of AsyncTaskManager but it seems to be limited to managing SPM tasks. For live merge, we are going to need something called VmTasks since the async command can be run only on the host that currently runs the VM.
The way I see this working from an engine perspective is:
1. RemoveSnapshotCommand in bll is invoked as usual but since the VM is found to be up, we activate an alternative live merge flow.
2. We submit a LiveMerge VDS Command for each impacted disk. This is an asynchronous command which we need to monitor for completion.
3. A VmJob is inserted into the DB so we'll remember to handle it.
4. The VDS Broker monitors the operation via an extension to the already collected VmStatistics data. Vdsm will report active Block Jobs only. Once the job stops (in error or success) it will cease to be reported by vdsm and engine will know to proceed.
You describe a reasonable way for Vdsm to report whether an async operation has finished. However, may we instead use the opportunity to introduce generic "hsm" tasks?
I suggest to have something loosely modeled on posix fork/wait.
- Engine asks Vdsm to start an API verb asynchronously and supplies a uuid. This is unlike fork(2), where the system chooses the pid, but that's required so that Engine could tell if the command has reached Vdsm in case of a network error.
- Engine may monitor the task (a-la wait(WNOHANG))
- When the task is finished, Engine may collect its result (a-la wait). Until that happens, Vdsm must report the task forever; restart or upgrade are no excuses. On reboot, though, all tasks are forgotten, so Engine may stop monitoring tasks on a fenced host.
This may be overkill for your use case, but it would come in useful for other cases. In particular, setupNetwork returns before it is completely done, since dhcp address acquisition may take too much time. Engine may poll getVdsCaps to see when it's done (or timeout), but it would be nicer to have a generic mechanism that can serve us all.
Note that I'm suggesting a completely new task framework, at least on Vdsm side, as the current one (with its broken persistence, arcane states and never-reliable rollback) is beyond redemption, imho.
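A rough sketch of the fork/wait-like model described above, in Python. Everything here (TaskRegistry, start, poll, collect) is a hypothetical name invented for illustration, not an existing Vdsm API. The sketch keeps tasks in memory only, so it does not cover the requirement that tasks survive a Vdsm restart or upgrade.

```python
import threading
import uuid


class TaskRegistry:
    """Hypothetical host-side registry of async tasks keyed by caller-supplied ids."""

    def __init__(self):
        self._lock = threading.Lock()
        self._tasks = {}  # task_id -> {"done": Event, "result": ..., "error": ...}

    def start(self, task_id, func, *args):
        """Start func asynchronously; a repeated id is ignored so retries are safe."""
        with self._lock:
            if task_id in self._tasks:
                return False
            entry = {"done": threading.Event(), "result": None, "error": None}
            self._tasks[task_id] = entry

        def run():
            try:
                entry["result"] = func(*args)
            except Exception as exc:
                entry["error"] = exc
            finally:
                entry["done"].set()

        threading.Thread(target=run, daemon=True).start()
        return True

    def poll(self, task_id):
        """Non-blocking status check, in the spirit of WNOHANG."""
        with self._lock:
            entry = self._tasks.get(task_id)
        if entry is None:
            return "unknown"
        return "finished" if entry["done"].is_set() else "running"

    def collect(self, task_id, timeout=None):
        """Wait for the task, forget it, then return its result or re-raise its error."""
        with self._lock:
            entry = self._tasks[task_id]
        if not entry["done"].wait(timeout):
            raise TimeoutError(task_id)
        with self._lock:
            self._tasks.pop(task_id, None)
        if entry["error"] is not None:
            raise entry["error"]
        return entry["result"]


registry = TaskRegistry()
task_id = str(uuid.uuid4())  # the caller, not the host, chooses the id
registry.start(task_id, sum, [1, 2, 3])
registry.start(task_id, sum, [1, 2, 3])  # a retry with the same id is a no-op
result = registry.collect(task_id, timeout=5)
print(result)  # prints 6
print(registry.poll(task_id))  # prints unknown
```

Because the caller picks the id, it can ask about the task after a network error without knowing whether the original request arrived, and a finished task stays reported until it is collected.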

On 03/03/2014 04:28 PM, Dan Kenigsberg wrote:
Engine-devel mailing list Engine-devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/engine-devel
The way I read Adam's proposal, there is no "task" entity on the vdsm side to monitor; rather, the engine monitors the state of the object the operation is performed on (similar to CreateVM, where the engine monitors the state of the VM rather than the CreateVM request).

On 03/03/14 16:36 +0200, Itamar Heim wrote:
Yeah, we use the term "job" in order to avoid assumptions and implications (i.e. rollback/cancel, persistence) that come with the word "task". "Job" essentially means "libvirt Block Job", but I am trying to allow for extension in the future.
Vdsm would collect block job information for devices it expects to have active block jobs and report them all under a single structure in the VM statistics. There would be no persistence of information so when a libvirt block job goes poof, vdsm will stop reporting it.
-- Adam Litke
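As a rough illustration of that reporting scheme, here is a Python sketch. The collect_block_jobs helper, the query_job callable and the "vmJobs" key are all assumptions invented for this example; in practice the query would presumably wrap whatever the hypervisor exposes for block job information.

```python
def collect_block_jobs(expected_devices, query_job):
    """Build the single block-job structure to include in the VM statistics.

    expected_devices: device names believed to have an active block job.
    query_job: hypothetical callable returning a dict of job info for a
               device, or an empty dict / None when no job is active.
    """
    jobs = {}
    for device in expected_devices:
        info = query_job(device)
        if info:  # a vanished job is simply not reported; nothing is persisted
            jobs[device] = info
    return jobs


# Stand-in for the hypervisor: only vda still has a job in flight.
active = {"vda": {"type": "commit", "cur": 512, "end": 2048}}
stats = {"vmJobs": collect_block_jobs(["vda", "vdb"], lambda dev: active.get(dev))}
print(stats)  # prints {'vmJobs': {'vda': {'type': 'commit', 'cur': 512, 'end': 2048}}}
```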

On Mon, Mar 03, 2014 at 09:56:56AM -0500, Adam Litke wrote:
I know, but since we need something quite similar for setupNetwork, I'm suggesting we have something generic enough to fulfill both use cases, instead of having one part of Engine poll for pending VmJobs and another poll on whether a network finally got its address from the dhcp server. As Itamar said, very similar logic exists for migration and for starting up a new VM. I'm not suggesting rollback, cancellation or fancy persistence, and not even a progress indication, although I'm pretty sure it would come in handy in the future.

On 03/03/14 14:28 +0000, Dan Kenigsberg wrote:
On Fri, Feb 28, 2014 at 09:30:16AM -0500, Adam Litke wrote:
Hi all,
As part of our plan to support live merging of VM disk snapshots it seems we will need a new form of asynchronous task in ovirt-engine. I am aware of AsyncTaskManager but it seems to be limited to managing SPM tasks. For live merge, we are going to need something called VmTasks since the async command can be run only on the host that currently runs the VM.
The way I see this working from an engine perspective is:
1. RemoveSnapshotCommand in bll is invoked as usual but since the VM is found to be up, we activate an alternative live merge flow.
2. We submit a LiveMerge VDS Command for each impacted disk. This is an asynchronous command which we need to monitor for completion.
3. A VmJob is inserted into the DB so we'll remember to handle it.
4. The VDS Broker monitors the operation via an extension to the already collected VmStatistics data. Vdsm will report active Block Jobs only. Once the job stops (in error or success) it will cease to be reported by vdsm and engine will know to proceed.
You describe a reasonable way for Vdsm to report whether an async operation has finished. However, may we instead use the opportunity to introduce generic "hsm" tasks?
Sure, I am happy to have that conversation :) If I understand correctly, HSM tasks, while ideal, might be too complex to get right and would block the Live Merge feature for longer than we would like. Has anyone looked into what it would take to implement an HSM Tasks framework like this in vdsm? Are there any WIP implementations? If the scope of this is not too big, it can be completed relatively quickly, and the resulting implementation would cover all known use cases, then this could be worth it. It's important to support Live Merge soon.
Regarding deprecation of the current tasks API: Could your suggested HSM Tasks framework be extended to cover SPM/SDM tasks as well? I would hope that it could. In that case, we could look forward to a unified async task architecture in vdsm.
I suggest to have something loosely modeled on posix fork/wait.
- Engine asks Vdsm to start an API verb asynchronously and supplies a uuid. This is unlike fork(2), where the system chooses the pid, but that's required so that Engine could tell if the command has reached Vdsm in case of a network error.
- Engine may monitor the task (a-la wait(WNOHANG))
Allon has communicated a desire to limit engine-side polling. Perhaps the active tasks could be added to the host stats?
- When the task is finished, Engine may collect its result (a-la wait). Until that happens, Vdsm must report the task forever; restart or upgrade are no excuses. On reboot, though, all tasks are forgotten, so Engine may stop monitoring tasks on a fenced host.
This could be a good compromise. I hate the idea of requiring engine to play janitor and clean up stale vdsm data, but there is not a much better way to do it. Allowing reboot to auto-clear tasks will at least provide some backstop to how long tasks could pile up if forgotten.
This may be overkill for your use case, but it would come in useful for other cases. In particular, setupNetwork returns before it is completely done, since dhcp address acquisition may take too much time. Engine may poll getVdsCaps to see when it's done (or timeout), but it would be nicer to have a generic mechanism that can serve us all.
If we were to consider this, I would want to vet the architecture against all known use cases for tasks to make sure we don't need to create a new framework in 3 months.
Note that I'm suggesting a completely new task framework, at least on Vdsm side, as the current one (with its broken persistence, arcane states and never-reliable rollback) is beyond redemption, imho.
Are we okay with abandoning vdsm-side rollback entirely as we move forward? Won't that be a regression for at least some error flows (especially in the realm of SPM tasks)?
-- Adam Litke

On Mon, Mar 03, 2014 at 09:51:15AM -0500, Adam Litke wrote:
On 03/03/14 14:28 +0000, Dan Kenigsberg wrote:
On Fri, Feb 28, 2014 at 09:30:16AM -0500, Adam Litke wrote:
Hi all,
As part of our plan to support live merging of VM disk snapshots it seems we will need a new form of asynchronous task in ovirt-engine. I am aware of AsyncTaskManager but it seems to be limited to managing SPM tasks. For live merge, we are going to need something called VmTasks since the async command can be run only on the host that currently runs the VM.
The way I see this working from an engine perspective is:
1. RemoveSnapshotCommand in bll is invoked as usual but since the VM is found to be up, we activate an alternative live merge flow.
2. We submit a LiveMerge VDS Command for each impacted disk. This is an asynchronous command which we need to monitor for completion.
3. A VmJob is inserted into the DB so we'll remember to handle it.
4. The VDS Broker monitors the operation via an extension to the already collected VmStatistics data. Vdsm will report active Block Jobs only. Once the job stops (in error or success) it will cease to be reported by vdsm and engine will know to proceed.
You describe a reasonable way for Vdsm to report whether an async operation has finished. However, may we instead use the opportunity to introduce generic "hsm" tasks?
Sure, I am happy to have that conversation :) If I understand correctly, HSM tasks, while ideal, might be too complex to get right and would block the Live Merge feature for longer than we would like. Has anyone looked into what it would take to implement an HSM Tasks framework like this in vdsm? Are there any WIP implementations? If the scope of this is not too big, it can be completed relatively quickly, and the resulting implementation would cover all known use cases, then this could be worth it. It's important to support Live Merge soon.
Regarding deprecation of the current tasks API: Could your suggested HSM Tasks framework be extended to cover SPM/SDM tasks as well? I would hope that it could. In that case, we could look forward to a unified async task architecture in vdsm.
The current task framework in Vdsm is outrageously complex, yet unreliable. It was meant to do all kinds of things, like having the new spm take over a task that was orphaned by the former spm. This has never worked properly. I'm looking for a much simpler infrastructure, where rollback is done by virtue of having an "except" clause, and spm-only verbs simply fail when the host loses spm status for some reason.
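To illustrate what rollback "by virtue of an except clause" might look like, here is a small Python sketch. The create_volume verb, the NotSpmError exception, the is_spm callable and the dict standing in for storage are all hypothetical and exist only for this example.

```python
class NotSpmError(Exception):
    """Hypothetical error raised when an spm-only verb runs on a non-spm host."""


def create_volume(storage, name, is_spm, fail_after_allocate=False):
    """Hypothetical spm-only verb whose rollback lives in an except clause."""
    if not is_spm():
        raise NotSpmError("host is not the spm")
    allocated = False
    try:
        storage[name] = "allocated"
        allocated = True
        if fail_after_allocate:  # simulate a failure in a later step
            raise OSError("metadata write failed")
        storage[name] = "ready"
    except Exception:
        if allocated:
            storage.pop(name, None)  # best-effort cleanup, then report the failure
        raise


storage = {}
create_volume(storage, "vol-1", lambda: True)
try:
    create_volume(storage, "vol-2", lambda: True, fail_after_allocate=True)
except OSError:
    pass
print(storage)  # prints {'vol-1': 'ready'}
```

The cleanup is best effort: if the process dies before the except clause runs, the leftover state is still there and has to be dealt with from outside.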
I suggest to have something loosely modeled on posix fork/wait.
- Engine asks Vdsm to start an API verb asynchronously and supplies a uuid. This is unlike fork(2), where the system chooses the pid, but that's required so that Engine could tell if the command has reached Vdsm in case of a network error.
- Engine may monitor the task (a-la wait(WNOHANG))
Allon has communicated a desire to limit engine-side polling. Perhaps the active tasks could be added to the host stats?
Engine is reluctant to add more polling, I understand that. I'd prefer a standalone new getAllTaskStats2() verb, but if lumping it into getVdsStats is going to convince everybody to have it, I'd put my aesthetic taste in the fridge.
- When the task is finished, Engine may collect its result (a-la wait). Until that happens, Vdsm must report the task forever; restart or upgrade are no excuses. On reboot, though, all tasks are forgotten, so Engine may stop monitoring tasks on a fenced host.
This could be a good compromise. I hate the idea of requiring engine to play janitor and clean up stale vdsm data, but there is not a much better way to do it. Allowing reboot to auto-clear tasks will at least provide some backstop to how long tasks could pile up if forgotten.
This may be overkill for your use case, but it would come in useful for other cases. In particular, setupNetwork returns before it is completely done, since dhcp address acquisition may take too much time. Engine may poll getVdsCaps to see when it's done (or timeout), but it would be nicer to have a generic mechanism that can serve us all.
If we were to consider this, I would want to vet the architecture against all known use cases for tasks to make sure we don't need to create a new framework in 3 months.
I'm afraid that our time scale is a bit longer (for better or for worse), but for sure, we'd need to list all possible users of such an infrastructure.
Note that I'm suggesting a completely new task framework, at least on Vdsm side, as the current one (with its broken persistence, arcane states and never-reliable rollback) is beyond redemption, imho.
Are we okay with abandoning vdsm-side rollback entirely as we move forward? Won't that be a regression for at least some error flows (especially in the realm of SPM tasks)?
We would have to maintain the current spm task framework. But depending on a Vdsm-side rollback to succeed was an old mistake. Rollback may fail just as roll-forward does; thus Engine must handle the case of a lost task. So why bother? Vdsm should do its best to finish a task, and clean up after itself. If it dies while at it, only Engine can ask another Vdsm to pick up the pieces.
participants (3)
- Adam Litke
- Dan Kenigsberg
- Itamar Heim