[Engine-devel] VDSM tasks, the future

Tue Dec 4 15:35:01 UTC 2012

Since I have started hinting at how VDSM tasks are going to look going forward, I thought it would be better to write everything down in one email so we can talk about it in context.
This is not set in stone, and I'm still debating some things myself, but it's very close to being done.

- Everything is asynchronous.
  The nature of message based communication is that you can't have synchronous operations.
  This is not really debatable because it's just how TCP\AMQP\<messaging> works.

- Task IDs will be decided by the caller.
  This is how json-rpc works, and it also makes sense because now the engine can track the task without needing a stage where we give it the task ID back.
  IDs are reusable as long as no one else is using them at the time so they can be used for synchronizing operations between clients (making sure a command is
  only executed once on a specific host without locking).
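
A minimal sketch of the caller-chosen ID idea, assuming the JSON-RPC 2.0 "id" field doubles as the task ID. TaskRegistry, build_request and the request parameters are hypothetical names made up for illustration, not VDSM code:

```python
import json
import uuid


def build_request(method, params, task_id=None):
    """Build a JSON-RPC 2.0 request whose id doubles as the task ID."""
    if task_id is None:
        # The caller picks the ID; a random UUID is one easy choice.
        task_id = str(uuid.uuid4())
    return {"jsonrpc": "2.0", "method": method, "params": params, "id": task_id}


class TaskRegistry:
    """Hypothetical host-side table of task IDs currently in use."""

    def __init__(self):
        self._active = set()

    def start(self, task_id):
        # Refuse an ID that is still in use, so a second caller learns
        # that the command is already running on this host.
        if task_id in self._active:
            return False
        self._active.add(task_id)
        return True

    def finish(self, task_id):
        # Once the task is gone, the ID is free to be reused.
        self._active.discard(task_id)


registry = TaskRegistry()
request = build_request("copyImage", {"image": "example"}, task_id="copy-example")
first = registry.start(request["id"])    # accepted
second = registry.start(request["id"])   # rejected, ID still in use
registry.finish(request["id"])
third = registry.start(request["id"])    # accepted again after reuse
print(json.dumps(request, sort_keys=True))
print(first, second, third)
```

Two clients that agree on the same ID for the same command get the "only executed once" behaviour without taking a lock: whoever arrives second is simply refused.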

- Tasks are transient
  If VDSM restarts it forgets all the task information.
  There are 2 ways to have persistent tasks:
  1. The task creates an object that you can continue work on in VDSM.
     The new storage does that by the fact that copyImage() returns once the target volume has been created but before the data has been fully copied.
     From that moment on the state of the copy can be queried from any host using getImageStatus() and the specific copy operation can be queried with getTaskStatus() on the host performing it.
     After VDSM crashes, depending on policy, either VDSM will create a new task to continue the copy or someone else will send a command to continue the operation and that will be a new task.
  2. VDSM tasks just start other operations that are trackable through something other than the task interface. For example Gluster.
     gluster.startVolumeRebalance() will return once it has been registered with Gluster.
     gluster.getOperationStatuses() will return the state of the operation from any host.
     Each call is a task in itself.
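
To make the first option concrete, here is a toy simulation, not real VDSM code. FakeStorage, FakeVdsm, work() and the block counts are all hypothetical; only the names copy_image, get_image_status and get_task_status mirror the verbs mentioned above. It shows the task being forgotten on restart while the state of the copy stays queryable, and a new task picking the copy up:

```python
class FakeStorage:
    """Stand-in for shared storage; its contents survive a VDSM restart."""

    def __init__(self):
        self.images = {}  # image name -> {"copied": int, "total": int}


class FakeVdsm:
    """Stand-in for one VDSM host; its task table is transient."""

    def __init__(self, storage):
        self.storage = storage
        self.tasks = {}  # task ID -> image name

    def copy_image(self, task_id, name, total):
        # Returns as soon as the target exists, before any data is copied.
        # An existing target is kept, so a later task continues the copy.
        self.storage.images.setdefault(name, {"copied": 0, "total": total})
        self.tasks[task_id] = name

    def work(self, task_id, blocks):
        # Simulates the background copy making progress.
        image = self.storage.images[self.tasks[task_id]]
        image["copied"] = min(image["total"], image["copied"] + blocks)

    def get_task_status(self, task_id):
        return "running" if task_id in self.tasks else "unknown"

    def get_image_status(self, name):
        image = self.storage.images[name]
        return "done" if image["copied"] == image["total"] else "copying"

    def restart(self):
        # A restart forgets every task; the storage is untouched.
        self.tasks.clear()


storage = FakeStorage()
host = FakeVdsm(storage)
host.copy_image("t1", "vol", total=10)
host.work("t1", 4)
host.restart()
status_after_crash = host.get_task_status("t1")
image_after_crash = host.get_image_status("vol")
host.copy_image("t2", "vol", total=10)  # a new task continues the copy
host.work("t2", 6)
final = host.get_image_status("vol")
print(status_after_crash, image_after_crash, final)
```

The point of the sketch is that nothing about the interrupted copy lives in the task itself; whoever continues the operation only needs the object on storage.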

- No task tags.
  They are silly and the caller can mangle whatever in the task ID if he really wants to tag tasks.
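
For instance, a caller that wants tags could fold them into the ID itself. This is only an illustration; the "tag.uuid" layout and both helper names are made up here, and VDSM would treat the result as an opaque string:

```python
import uuid


def make_task_id(tag):
    """Prefix a random UUID with a caller-chosen tag."""
    return "%s.%s" % (tag, uuid.uuid4())


def tag_of(task_id):
    """Recover the tag; the UUID part never contains a dot."""
    return task_id.rpartition(".")[0]


task_id = make_task_id("migration.batch-7")
print(task_id, tag_of(task_id))
```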

- No explicit recovery stage.
  VDSM will be crash-only, there should be efforts to make everything crash-safe.
  Where that is problematic, as in the case of networking, VDSM will recover on start without having a task for it.

- No clean Task:
  Tasks can be started by any number of hosts, which means that there is no way to own all tasks.
  There could be cases where VDSM starts tasks on its own and thus they have no owner at all.
  The caller needs to continually track the state of VDSM. We will have broadcast events to mitigate polling.

- No revert
  Impossible to implement safely.

- No SPM\HSM tasks
  SPM\SDM is no longer necessary for all domain types (only for type).
  What used to be SPM tasks, or tasks that persist and can be restarted on other hosts, is covered in the previous bullet points.