
On 12/05/2012 12:20 AM, Saggi Mizrahi wrote:
As the only subsystem to use asynchronous tasks until now is the storage subsystem, I suggest going over how I would tackle task creation, task stop, task removal and task recovery. Other subsystems can create similar mechanisms depending on their needs.
There is no way of avoiding it; different types of tasks need different ways of tracking and recovering from them. Network should always auto-recover because it can't get a "please fix" command if the network is down. Storage, on the other hand, should never start operations on its own because it might take up valuable resources from the host. Tasks that need to be tracked on a single host, two hosts, or the entire cluster need to have their own APIs. VM configuration never persists across reboots, networking sometimes persists, and storage always persists. This means that recovery procedures (from the manager's point of view) need to be vastly different. Add policy, resource allocation, and error flows, and you see that VDSM doesn't have nearly as much information to deal with the tasks.
----- Original Message -----
From: "Adam Litke" <agl@us.ibm.com> To: "Saggi Mizrahi" <smizrahi@redhat.com> Cc: "VDSM Project Development" <vdsm-devel@lists.fedorahosted.org>, "engine-devel" <engine-devel@ovirt.org>, "Ayal Baron" <abaron@redhat.com>, "Barak Azulay" <bazulay@redhat.com>, "Shireesh Anjal" <sanjal@redhat.com> Sent: Tuesday, December 4, 2012 3:50:28 PM Subject: Re: VDSM tasks, the future
On Tue, Dec 04, 2012 at 10:35:01AM -0500, Saggi Mizrahi wrote:
Because I started hinting about how VDSM tasks are going to look going forward, I thought it's better to just write everything in an email so we can talk about it in context. This is not set in stone and I'm still debating things myself, but it's very close to being done. Don't debate them yourself, debate them here! Even better, propose your idea in schema form to show how a command might work exactly. I don't like throwing ideas in the air. It can be much easier to understand the flow of a task in vdsm and outside vdsm through a small schema, mainly for each task's states. To define the flow of a task you can separate between types of tasks (network, storage, vms, or else); we should have task states that clarify whether the task can be recovered or not, canceled or not, and so on.
Canceling/Aborting/Reverting states should be clarified further, and not every state can lead to all other states. I tried to figure out how task flow works today in vdsm, and this is what I've got: http://wiki.ovirt.org/Vdsm_tasks
- Everything is asynchronous. The nature of message-based communication is that you can't have synchronous operations. This is not really debatable because it's just how TCP/AMQP/<messaging> works. Can you show how a traditionally synchronous command might work? Let's take Host.getVmList as an example. The same as it works today; it's all a matter of how you wrap the transport layer. You will send a json-rpc request and wait for a response with the same id.
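A minimal sketch of that point, assuming a toy transport with send()/recv(); JsonRpcClient and LoopbackTransport are illustrative names, not the real VDSM bindings. The synchronous behavior is nothing more than blocking until the response whose id matches the request arrives:

```python
import itertools
import json

_ids = itertools.count(1)

class JsonRpcClient:
    def __init__(self, transport):
        self._transport = transport  # anything with send()/recv()

    def call(self, method, params):
        req_id = next(_ids)
        self._transport.send(json.dumps(
            {"jsonrpc": "2.0", "id": req_id,
             "method": method, "params": params}))
        # Block until the response carrying the same id arrives;
        # a real client would dispatch unrelated messages (events,
        # other responses) instead of discarding them.
        while True:
            msg = json.loads(self._transport.recv())
            if msg.get("id") == req_id:
                return msg.get("result")

class LoopbackTransport:
    """Toy transport that answers Host.getVmList immediately."""
    def send(self, data):
        req = json.loads(data)
        self._reply = json.dumps(
            {"jsonrpc": "2.0", "id": req["id"], "result": []})

    def recv(self):
        return self._reply

client = JsonRpcClient(LoopbackTransport())
vms = client.call("Host.getVmList", {})
```

The point is that "synchronous" lives entirely in the wrapper: the wire protocol stays asynchronous request/response.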
As for the bindings, there are a lot of ways we can tackle that. Always wait for the response and simulate synchronous behavior, or make every method return an object to track the task:

task = host.getVmList()
if not task.wait(1):
    task.cancel()
else:
    res = task.result()

It looks like a traditional timeout. Why not split blocking actions from non-blocking actions? A non-blocking action would supply a callback function to return to if the task fails or succeeds, for example:
createAsyncTask(host.getVmList, params, timeout=30, callback=callbackGetVmList)

Instead of using the dispatcher? Do you want to keep the dispatcher concept?
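A hedged sketch of the callback style proposed above. createAsyncTask, its signature, and the callback convention here are assumptions for illustration, not an existing VDSM API:

```python
import threading

def createAsyncTask(func, params, timeout, callback):
    """Run func(*params) off the caller's thread and report via callback."""
    def run():
        try:
            result = func(*params)
        except Exception as e:
            callback(error=e, result=None)
        else:
            callback(error=None, result=result)
    t = threading.Thread(target=run)
    t.daemon = True
    t.start()
    # A fully non-blocking caller would skip the join and keep going;
    # we join here only so the example is deterministic.
    t.join(timeout)
    return t

results = []

def callbackGetVmList(error, result):
    # Single callback receives either the error or the result.
    results.append((error, result))

def fake_getVmList():
    return ["vm1", "vm2"]

createAsyncTask(fake_getVmList, (), timeout=30, callback=callbackGetVmList)
```

Whether this beats the task-object style is mostly about who owns the control flow: callbacks push it into the library, task objects leave it with the caller.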
Have it both ways (it's auto-generated anyway) and have:

list = host.getVmList()
task = host.getVmList_async()
Have high-level and low-level interfaces:

host = Host()
host.connect("tcp://host:3233")
req = host.sendRequest("123213", "getVmList", [])
if not req.wait(1):
    ....
shost = SynchHost(host)
shost.getVmList()  # Actually wraps a request object
ahost = AsyncHost(host)
task = ahost.getVmList()  # Actually wraps a request object
- Task IDs will be decided by the caller. This is how json-rpc works and also makes sense because now the engine can track the task without needing a stage where we give it the task ID back. IDs are reusable as long as no one else is using them at the time, so they can be used for synchronizing operations between clients (making sure a command is only executed once on a specific host, without locking).
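A sketch of the "execute once without locking" property of caller-decided ids, assuming a server-side registry that refuses a duplicate id while the first task with it is still active; TaskRegistry is a hypothetical name:

```python
import uuid

class TaskRegistry:
    """Server side: active task ids and what they are running."""
    def __init__(self):
        self._active = {}

    def start(self, task_id, operation):
        if task_id in self._active:
            # A second client (or a retry) picked the same id:
            # only one copy of the command runs.
            return False
        self._active[task_id] = operation
        return True

    def finish(self, task_id):
        # Once the task is done, the id becomes reusable.
        self._active.pop(task_id, None)

registry = TaskRegistry()
task_id = str(uuid.uuid4())        # the engine generates and remembers it
first = registry.start(task_id, "Image.copy")
second = registry.start(task_id, "Image.copy")   # duplicate, refused
registry.finish(task_id)
third = registry.start(task_id, "Image.copy")    # id reusable now
```

This also gives safe retry semantics: resending a command with the same id after a network hiccup can never run it twice.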
- Tasks are transient. If VDSM restarts, it forgets all the task information. There are 2 ways to have persistent tasks:

1. The task creates an object that you can continue working on in VDSM. The new storage does that by the fact that copyImage() returns once the target volume has been created but before the data has been fully copied. From that moment on, the state of the copy can be queried from any host using getImageStatus(), and the specific copy operation can be queried with getTaskStatus() on the host performing it. After VDSM crashes, depending on policy, either VDSM will create a new task to continue the copy or someone else will send a command to continue the operation, and that will be a new task.

2. VDSM tasks just start other operations, trackable not through the task interface. For example Gluster: gluster.startVolumeRebalance() will return once it has been registered with Gluster, and gluster.getOperationStatuses() will return the state of the operation from any host. Each call is a task in itself.

I worry about this approach because every command has a different semantic for checking progress. For migration, we have to check VM status on the src and dest hosts. For image copy we need to use a special status call on the dest image. It would be nice if there was a unified method for checking on an operation. Maybe that can be completion events.
Client:                     vdsm:
-------                     -----
Image.copy(...)   -->
                  <--  Operation Started
Wait for event ...
                  <--  Event: Operation <id> done <code>
For an early error:
Client:                     vdsm:
-------                     -----
Image.copy(...)   -->
                  <--  Error: <code>
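The completion-event flow above can be sketched as a client that fires the command and then blocks on its event stream until the event for that operation id arrives; the event dict layout here is an assumption:

```python
import queue

def wait_for_completion(events, op_id, timeout=5):
    """Block on an event stream until the completion event for op_id."""
    while True:
        ev = events.get(timeout=timeout)
        if ev.get("operation") == op_id:
            return ev["code"]
        # Unrelated event: a real client would dispatch it to whoever
        # is waiting on it rather than drop it, as done here.

events = queue.Queue()
# Simulate vdsm emitting events, the completion we care about last.
events.put({"operation": "other-op", "code": 0})
events.put({"operation": "copy-42", "code": 0})

code = wait_for_completion(events, "copy-42")
```

The early-error case collapses into the same shape: the error response simply arrives before any event is ever waited on.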
The thing is that a lot of things need a different way of tracking their progress. Storage has completely different semantics from network or VM operations. This is the reason why we can use the implementation of task as something generic for all the processes we have. Of course things need different ways of tracking their progress... That's why we need to use task states with meaning, and split storageTaskStates and networkTaskStates that inherit from TaskStates and add their own parts, as in the new bootstrap implementation. Also we can add hooks for each state as alonbl did in his otopi code (not sure if we need that).
Like for instance: general states can be starting, started, finishing, finished, and each specific implementation adds middle states, like waitForResource, processing, recovering and so on. For each one you can add levels (pre-state, post-state) that add more flexibility. That way the Task object will be a general way to implement a specific process: you will have a NetworkTask and a StorageTask, and the infrastructure will be the interface and implementation of the generic parts. So here is how vdsm can work that way:

client:                vdsm:
-------                -----
image.copy()  --->     copyImage::starting  (same starting code - keeping the id, and moving forward to the next state)
                       copyImage::started   (writing to the recovery file that the task is started)
                       copyImage::part1     (whatever you want to do)
                       copyImage::part2     (whatever you want to do)
                       copyImage::part3     (whatever you want to do) -- for each process the programmers will add their states as they want, in a sequence flow
result  <------        copyImage::finishing (send back to the client a success and clean the recovery file)
                       copyImage::finished  (mark the task id as succeeded)

If somewhere in the middle an error occurred, it is easier to start over and remember where we were. The problem with that is that we need to modify the current implementation for each process, and I'm not sure we want to get there... but if we do, it won't be so hard. We can split the logic of each process to define the logic of each state, then arrange the state flow for each process and clarify what can be recovered or not, what signals corruption or errors, and how the returned result can point to the current process status (state).
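The generic-Task idea above can be sketched as a base class that owns the general states and subclasses that insert their own middle states; the class and state names are illustrative, not existing VDSM code:

```python
class Task:
    GENERAL_STATES = ["starting", "started", "finishing", "finished"]
    MIDDLE_STATES = []  # subclasses insert their own steps here

    def __init__(self):
        # Full flow: generic start states, subsystem-specific middle
        # states, generic finish states.
        self._flow = (self.GENERAL_STATES[:2] +
                      self.MIDDLE_STATES +
                      self.GENERAL_STATES[2:])
        self._pos = 0
        self.history = [self._flow[0]]

    @property
    def state(self):
        return self._flow[self._pos]

    def advance(self):
        """Move to the next state in the flow (persist here for recovery)."""
        if self._pos < len(self._flow) - 1:
            self._pos += 1
            self.history.append(self.state)
        return self.state

class StorageTask(Task):
    # e.g. what copyImage might plug in between started and finishing
    MIDDLE_STATES = ["waitForResource", "processing"]

task = StorageTask()
while task.state != "finished":
    task.advance()
```

Persisting `history` (or just the current state) at each advance is what would let a restarted vdsm know where in the flow a crashed task was.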
- No task tags. They are silly and the caller can mangle whatever in the task ID if he really wants to tag tasks. Yes. Agreed.
- No explicit recovery stage. VDSM will be crash-only; there should be efforts to make everything crash-safe. If that is problematic, as in the case of networking, VDSM will recover on start without having a task for it. How does this work in practice for something like creating a new image from a template?
- No clean Task: Tasks can be started by any number of hosts; this means that there is no way to own all tasks. There could be cases where VDSM starts tasks on its own and thus they have no owner at all. The caller needs to continually track the state of VDSM. We will have broadcast events to mitigate polling. If a disconnected client might have missed a completion event, it will need to check state. This means each async operation that changes state must document a procedure for checking progress of a potentially ongoing operation. For Image.copy, that procedure would be to look up the new image and check its state.
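The "check state after a possibly missed event" procedure for Image.copy could look like this sketch; the image states and the lookup call are assumptions:

```python
def check_copy_progress(lookup_image, image_id):
    """Return 'done', 'ongoing' or 'missing' for a possibly-missed copy."""
    image = lookup_image(image_id)
    if image is None:
        return "missing"    # the copy never started (or was cleaned up)
    if image["state"] == "OK":
        return "done"       # completion event was missed, but we caught up
    return "ongoing"        # still copying; keep watching events or poll

# A reconnecting client runs the documented check instead of trusting
# that it saw every event while disconnected.
images = {"img-1": {"state": "COPYING"}}
status = check_copy_progress(images.get, "img-1")
```

Documenting one such check per state-changing verb is what makes the broadcast-event model safe across disconnects.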
- No revert. Impossible to implement safely. How do the engine folks feel about this? I am ok with it :) I don't care; unless they find a way to change the way the logic works, they can't have it. The whole concept of recovery (as it is defined now) doesn't work in an HA cluster.

- No SPM\HSM tasks. SPM\SDM is no longer necessary for all domain types (only for some types). What used to be SPM tasks, or tasks that persist and can be restarted on other hosts, is covered in the previous bullet points.
A nice simplification.
-- Adam Litke <agl@us.ibm.com> IBM Linux Technology Center
_______________________________________________ vdsm-devel mailing list vdsm-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
-- Yaniv Bronhaim. RedHat, Israel 09-7692289 054-7744187