[Engine-devel] VDSM tasks, the future
Adam Litke
agl at us.ibm.com
Tue Dec 4 20:50:28 UTC 2012
On Tue, Dec 04, 2012 at 10:35:01AM -0500, Saggi Mizrahi wrote:
> Because I started hinting about how VDSM tasks are going to look going forward,
> I thought it would be better to just write everything in an email so we can talk
> about it in context. This is not set in stone and I'm still debating things
> myself but it's very close to being done.
Don't debate them yourself, debate them here! Even better, propose your idea in
schema form to show how a command might work exactly.
> - Everything is asynchronous. The nature of message based communication is
> that you can't have synchronous operations. This is not really debatable
> because it's just how TCP\AMQP\<messaging> works.
Can you show how a traditionally synchronous command might work? Let's take
Host.getVmList as an example.
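To make the question concrete, here is a rough Python sketch of one shape this
could take: a blocking wrapper that sends a JSON-RPC-style request carrying a
caller-chosen ID and waits for the reply with the same ID. FakeTransport,
call_sync, the "req-1" ID and the returned VM names are all made up for
illustration; none of this is existing VDSM API.

```python
import queue
import threading


class FakeTransport:
    """In-memory stand-in for the message bus (hypothetical, not VDSM code)."""

    def __init__(self):
        self.replies = queue.Queue()

    def send(self, request):
        # Answer from another thread to mimic an asynchronous reply,
        # echoing back the ID the caller picked.
        def respond():
            self.replies.put({"id": request["id"], "result": ["vm-1", "vm-2"]})

        threading.Thread(target=respond).start()


def call_sync(transport, method, params, request_id, timeout=5.0):
    """Send one request and block until the reply with the same ID arrives."""
    transport.send(
        {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}
    )
    while True:
        # Raises queue.Empty if nothing arrives within the timeout.
        reply = transport.replies.get(timeout=timeout)
        if reply["id"] == request_id:
            return reply["result"]
        # A reply for some other request; a real client would dispatch it.


vms = call_sync(FakeTransport(), "Host.getVmList", {}, "req-1")
```

The command stays asynchronous on the wire; only the client-side wrapper
blocks.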
> - Task IDs will be decided by the caller. This is how json-rpc works and also
> makes sense because now the engine can track the task without needing to have a
> stage where we give it the task ID back. IDs are reusable as long as no one
> else is using them at the time so they can be used for synchronizing
> operations between clients (making sure a command is only executed once on a
> specific host without locking).
>
> - Tasks are transient. If VDSM restarts it forgets all the task information.
> There are 2 ways to have persistent tasks: 1. The task creates an object that
> you can continue work on in VDSM. The new storage does that by the fact that
> copyImage() returns once the target volume has been created but before the data
> has been fully copied. From that moment on the state of the copy can be
> queried from any host using getImageStatus() and the specific copy operation
> can be queried with getTaskStatus() on the host performing it. After VDSM
> crashes, depending on policy, either VDSM will create a new task to continue
> the copy or someone else will send a command to continue the operation and
> that will be a new task. 2. VDSM tasks just start other operations that are
> trackable, but not through the task interface. For example, Gluster:
> gluster.startVolumeRebalance() will return once it has been registered with
> Gluster. gluster.getOperationStatuses() will return the state of the operation
> from any host. Each call is a task in itself.
I worry about this approach because every command has a different semantic for
checking progress. For migration, we have to check VM status on the src and
dest hosts. For image copy we need to use a special status call on the dest
image. It would be nice if there was a unified method for checking on an
operation. Maybe that can be completion events.
    Client:                        vdsm:
    -------                        -----
    Image.copy(...)        -->
                           <--     Operation Started
    Wait for event         ...
                           <--     Event: Operation <id> done <code>

For an early error:

    Client:                        vdsm:
    -------                        -----
    Image.copy(...)        -->
                           <--     Error: <code>
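
On the client side, both flows could be handled by one small helper. This is
only a sketch under assumptions: the reply and event dictionaries, their keys
("status", "type", "id", "code"), and the names run_operation and EarlyError
are hypothetical and not part of any existing VDSM interface.

```python
import queue


class EarlyError(Exception):
    """Raised when the operation is rejected before it starts (hypothetical)."""


def run_operation(send, events, op_id, timeout=5.0):
    """Start an operation, then wait for its completion event.

    send(op_id) returns the immediate reply; events is a queue.Queue of
    event dictionaries delivered by the transport.
    """
    reply = send(op_id)
    if reply["status"] == "error":
        # Early error: no operation was started, so no event will follow.
        raise EarlyError(reply["code"])
    while True:
        # Raises queue.Empty if no event arrives within the timeout.
        event = events.get(timeout=timeout)
        if event["type"] == "done" and event["id"] == op_id:
            return event["code"]
        # Events for other operations are ignored in this sketch.
```

The caller sees the same two outcomes for every command: an immediate error,
or a completion code delivered later.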
> - No task tags. They are silly and the caller can mangle whatever in the task
> ID if he really wants to tag tasks.
Yes. Agreed.
> - No explicit recovery stage. VDSM will be crash-only, there should be
> efforts to make everything crash-safe. If that is problematic, in case of
> networking, VDSM will recover on start without having a task for it.
How does this work in practice for something like creating a new image from a
template?
> - No clean Task: Tasks can be started by any number of hosts; this means that
> there is no way to own all tasks. There could be cases where VDSM starts
> tasks on its own and thus they have no owner at all. The caller needs to
> continually track the state of VDSM. We will have broadcast events to
> mitigate polling.
If a disconnected client might have missed a completion event, it will need to
check state. This means each async operation that changes state must document a
procedure for checking progress of a potentially ongoing operation. For
Image.copy, that process would be to look up the new image and check its state.
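
As a sketch of what such a documented procedure might look like for a client
that has just reconnected: poll the object's state until it is terminal. The
state names ("copying", "ready", "failed"), wait_for_image, and the
get_image_state callback are assumptions made for illustration only.

```python
import time


def wait_for_image(get_image_state, image_id, poll_interval=0.0, max_polls=10):
    """Poll an image's state after a possibly missed completion event.

    get_image_state(image_id) is a caller-supplied lookup (hypothetical).
    Returns the terminal state, or "unknown" if none is reached in time.
    """
    for _ in range(max_polls):
        state = get_image_state(image_id)
        if state in ("ready", "failed"):
            return state
        time.sleep(poll_interval)
    return "unknown"
```

Polling is only the fallback here; while the client stays connected, the
completion event remains the primary signal.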
> - No revert. Impossible to implement safely.
How do the engine folks feel about this? I am ok with it :)
> - No SPM\HSM tasks. SPM\SDM is no longer necessary for all domain types (only
> for type). What used to be SPM tasks, or tasks that persist and can be
> restarted on other hosts, are covered in the previous bullet points.
>
A nice simplification.
--
Adam Litke <agl at us.ibm.com>
IBM Linux Technology Center