On Tue, Dec 04, 2012 at 10:35:01AM -0500, Saggi Mizrahi wrote:
Because I started hinting about how VDSM tasks are going to look going
forward, I thought it would be better to write everything in an email so we
can talk about it in context. This is not set in stone and I'm still debating
things myself, but it's very close to being done.
Don't debate them yourself, debate them here! Even better, propose your idea in
schema form to show how a command might work exactly.
- Everything is asynchronous. The nature of message based communication is
that you can't have synchronous operations. This is not really debatable
because it's just how TCP\AMQP\<messaging> works.
Can you show how a traditionally synchronous command might work? Let's take
Host.getVmList as an example.
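
One way a traditionally synchronous call could sit on top of an asynchronous transport is sketched below. This is only an illustration, not the proposed VDSM API: the class name, the `send` callable, the `fake_vdsm` stand-in and the returned VM names are all hypothetical. The caller picks the request ID, sends the message, and blocks until a response carrying the same ID arrives.

```python
import json
import threading
import uuid


class SyncOverAsyncClient:
    """Hypothetical helper: blocks until the reply with a matching id arrives."""

    def __init__(self, send):
        self._send = send          # assumed transport: a callable taking a str
        self._pending = {}         # request id -> (Event, response holder)
        self._lock = threading.Lock()

    def call(self, method, params=None, timeout=5.0):
        req_id = str(uuid.uuid4())             # the caller decides the id
        done, holder = threading.Event(), {}
        with self._lock:
            self._pending[req_id] = (done, holder)   # register before sending
        self._send(json.dumps({"jsonrpc": "2.0", "id": req_id,
                               "method": method, "params": params or {}}))
        if not done.wait(timeout):
            with self._lock:
                self._pending.pop(req_id, None)
            raise TimeoutError(method)
        response = holder["response"]
        if "error" in response:
            raise RuntimeError(response["error"])
        return response["result"]

    def on_message(self, raw):
        """Called by the transport for every incoming message."""
        msg = json.loads(raw)
        with self._lock:
            entry = self._pending.pop(msg.get("id"), None)
        if entry is not None:                  # ignore replies nobody waits for
            done, holder = entry
            holder["response"] = msg
            done.set()


def fake_vdsm(raw):
    """Stand-in for the remote side: answers a little later, on another thread."""
    req = json.loads(raw)
    reply = {"jsonrpc": "2.0", "id": req["id"], "result": ["vm-1", "vm-2"]}
    threading.Timer(0.01, client.on_message, args=(json.dumps(reply),)).start()


client = SyncOverAsyncClient(fake_vdsm)
vm_list = client.call("Host.getVmList")       # looks synchronous to the caller
```

The wire stays fully asynchronous; only the client library chooses to block, so other callers can keep using callbacks or events on the same connection.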
- Task IDs will be decided by the caller. This is how json-rpc works and also
makes sense because now the engine can track the task without needing to have
a stage where we give it the task ID back. IDs are reusable as long as no one
else is using them at the time, so they can be used for synchronizing
operations between clients (making sure a command is only executed once on a
specific host without locking).
- Tasks are transient. If VDSM restarts it forgets all the task information.
There are 2 ways to have persistent tasks:
1. The task creates an object that you can continue work on in VDSM. The new
storage does that by the fact that copyImage() returns once the target volume
has been created but before the data has been fully copied. From that moment
on the state of the copy can be queried from any host using getImageStatus()
and the specific copy operation can be queried with getTaskStatus() on the
host performing it. After VDSM crashes, depending on policy, either VDSM will
create a new task to continue the copy or someone else will send a command to
continue the operation and that will be a new task.
2. VDSM tasks just start other operations that are trackable, but not through
the task interface. For example Gluster: gluster.startVolumeRebalance() will
return once it has been registered with Gluster.
gluster.getOperationStatuses() will return the state of the operation from
any host. Each call is a task in itself.
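
The caller-chosen task IDs and the transient task state described in the two points above could be sketched roughly as follows. This is a minimal illustration under assumptions: `TransientTaskRegistry` and `TaskIdInUse` are hypothetical names, not part of VDSM. The table lives only in memory, so a restart forgets everything, and an ID is rejected only while another task is still using it.

```python
import threading


class TaskIdInUse(Exception):
    """Raised when a caller-chosen task id is already attached to a live task."""


class TransientTaskRegistry:
    """Hypothetical in-memory task table; nothing is persisted across restarts."""

    def __init__(self):
        self._active = {}                  # task id -> description of the work
        self._lock = threading.Lock()      # host-internal lock, not a cluster lock

    def start(self, task_id, description):
        with self._lock:
            if task_id in self._active:
                raise TaskIdInUse(task_id)
            self._active[task_id] = description

    def finish(self, task_id):
        with self._lock:
            self._active.pop(task_id, None)   # the id becomes reusable

    def status(self, task_id):
        with self._lock:
            return self._active.get(task_id)  # None: unknown or already done


registry = TransientTaskRegistry()
registry.start("copy-img-42", "copy image 42")       # first client wins

try:
    registry.start("copy-img-42", "copy image 42")   # second client, same id
    second_client_started = True
except TaskIdInUse:
    second_client_started = False

registry.finish("copy-img-42")
registry.start("copy-img-42", "copy image 42 again") # reuse is fine once free
```

Two clients that agree on an ID for the same logical operation get "run once per host" behaviour without coordinating with each other; the only lock is the one inside the host process.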
I worry about this approach because every command has a different semantic for
checking progress. For migration, we have to check VM status on the src and
dest hosts. For image copy we need to use a special status call on the dest
image. It would be nice if there was a unified method for checking on an
operation. Maybe that can be completion events.
    Client:                    vdsm:
    -------                    -----
    Image.copy(...)    -->
                       <--     Operation Started
    Wait for event     ...
                       <--     Event: Operation <id> done <code>

For an early error:

    Client:                    vdsm:
    -------                    -----
    Image.copy(...)    -->
                       <--     Error: <code>
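
The two exchanges above can be sketched from the client's point of view like this. It is only an illustration under assumptions: `FakeVdsm`, `image_copy`, `copy_and_wait`, the message fields and the "EINVAL" code are hypothetical, and an in-process queue stands in for the event channel.

```python
import queue
import threading


class FakeVdsm:
    """Hypothetical stand-in for the host side; not the real VDSM API."""

    def __init__(self, events):
        self._events = events

    def image_copy(self, op_id, src, dst):
        if not src:
            return {"error": "EINVAL"}       # early error: no event follows

        def work():
            # ... the actual copy would happen here ...
            self._events.put({"event": "done", "id": op_id, "code": 0})

        threading.Thread(target=work).start()
        return {"status": "started", "id": op_id}


def copy_and_wait(vdsm, events, op_id, src, dst, timeout=5.0):
    """Start a copy, then block until the completion event for op_id arrives."""
    reply = vdsm.image_copy(op_id, src, dst)
    if "error" in reply:
        raise RuntimeError(reply["error"])
    while True:
        event = events.get(timeout=timeout)  # raises queue.Empty on timeout
        if event["id"] == op_id:             # this sketch drops other events
            return event["code"]


events = queue.Queue()
vdsm = FakeVdsm(events)
code = copy_and_wait(vdsm, events, "op-1", "src-img", "dst-img")
```

A real client would dispatch events for other operations instead of discarding them, and would fall back to a status check if the event never arrives.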
- No task tags. They are silly, and the caller can encode whatever they like
in the task ID if they really want to tag tasks.
Yes. Agreed.
- No explicit recovery stage. VDSM will be crash-only; there should be
efforts to make everything crash-safe. Where that is problematic, as in the
case of networking, VDSM will recover on start without having a task for it.
How does this work in practice for something like creating a new image from a
template?
- No clean Task: Tasks can be started by any number of hosts; this means that
there is no way to own all tasks. There could be cases where VDSM starts
tasks on its own and thus they have no owner at all. The caller needs to
continually track the state of VDSM. We will have broadcast events to
mitigate polling.
If a disconnected client might have missed a completion event, it will need to
check state. This means each async operation that changes state must document a
procedure for checking progress of a potentially ongoing operation. For
Image.copy, that process would be to look up the new image and check its state.
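
A client-side check after a reconnect might look roughly like the sketch below. Everything in it is an assumption made for illustration: the function name, the `get_image_state` lookup and the state names "ready" and "copying" are hypothetical and not taken from the VDSM API.

```python
def reconcile_after_reconnect(pending_copies, get_image_state):
    """Work out what happened to copies whose completion event may be lost.

    pending_copies maps an operation id to its destination image id.
    get_image_state is a hypothetical lookup returning a state string,
    or None when the image does not exist.
    """
    outcome = {}
    for op_id, image_id in pending_copies.items():
        state = get_image_state(image_id)
        if state == "ready":
            outcome[op_id] = "done"
        elif state == "copying":
            outcome[op_id] = "in progress"
        else:
            outcome[op_id] = "unknown"   # missing image: needs a policy decision
    return outcome


# A dict stands in for the host's view of its images.
images = {"img-1": "ready", "img-2": "copying"}
pending = {"op-a": "img-1", "op-b": "img-2", "op-c": "img-3"}
result = reconcile_after_reconnect(pending, images.get)
```

Each async operation would need its own version of this check, which is exactly why the procedure has to be documented per command.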
- No revert. Impossible to implement safely.
How do the engine folks feel about this? I am ok with it :)
- No SPM\HSM tasks. SPM\SDM is no longer necessary for all domain types (only
for type). What used to be SPM tasks, or tasks that persist and can be
restarted on other hosts, is covered in the previous bullet points.
A nice simplification.
--
Adam Litke <agl(a)us.ibm.com>
IBM Linux Technology Center