----- Original Message -----
From: "Ayal Baron" <abaron(a)redhat.com>
To: "Saggi Mizrahi" <smizrahi(a)redhat.com>
Cc: "Dan Kenigsberg" <danken(a)redhat.com>, "Federico Simoncelli"
<fsimonce(a)redhat.com>, engine-devel(a)ovirt.org,
vdsm-devel(a)lists.fedorahosted.org, "Adam Litke" <agl(a)us.ibm.com>
Sent: Monday, December 17, 2012 5:24:48 PM
Subject: Re: Managing async tasks
----- Original Message -----
> This is an addendum to my previous email.
>
> ----- Original Message -----
> > From: "Saggi Mizrahi" <smizrahi(a)redhat.com>
> > To: "Adam Litke" <agl(a)us.ibm.com>
> > Cc: "Dan Kenigsberg" <danken(a)redhat.com>, "Ayal Baron"
> > <abaron(a)redhat.com>, "Federico Simoncelli"
> > <fsimonce(a)redhat.com>, engine-devel(a)ovirt.org,
> > vdsm-devel(a)lists.fedorahosted.org
> > Sent: Monday, December 17, 2012 2:52:06 PM
> > Subject: Re: Managing async tasks
> >
> >
> >
> > ----- Original Message -----
> > > From: "Adam Litke" <agl(a)us.ibm.com>
> > > To: "Saggi Mizrahi" <smizrahi(a)redhat.com>
> > > Cc: "Dan Kenigsberg" <danken(a)redhat.com>, "Ayal Baron"
> > > <abaron(a)redhat.com>, "Federico Simoncelli"
> > > <fsimonce(a)redhat.com>, engine-devel(a)ovirt.org,
> > > vdsm-devel(a)lists.fedorahosted.org
> > > Sent: Monday, December 17, 2012 2:16:25 PM
> > > Subject: Re: Managing async tasks
> > >
> > > On Mon, Dec 17, 2012 at 12:15:08PM -0500, Saggi Mizrahi wrote:
> > > >
> > > >
> > > > ----- Original Message -----
> > > > > From: "Adam Litke" <agl(a)us.ibm.com>
> > > > > To: vdsm-devel(a)lists.fedorahosted.org
> > > > > Cc: "Dan Kenigsberg" <danken(a)redhat.com>, "Ayal Baron"
> > > > > <abaron(a)redhat.com>, "Saggi Mizrahi" <smizrahi(a)redhat.com>,
> > > > > "Federico Simoncelli" <fsimonce(a)redhat.com>, engine-devel(a)ovirt.org
> > > > > Sent: Monday, December 17, 2012 12:00:49 PM
> > > > > Subject: Managing async tasks
> > > > >
> > > > > On today's vdsm call we had a lively discussion around how
> > > > > asynchronous operations should be handled in the future. In an
> > > > > effort to include more people in the discussion and to better
> > > > > capture the resulting conversation I would like to continue that
> > > > > discussion here on the mailing list.
> > > > >
> > > > > A lot of ideas were thrown around about how 'tasks' should be
> > > > > handled in the future. There are a lot of ways that it can be
> > > > > done. To determine how we should implement it, it's probably best
> > > > > if we start with a set of requirements. If we can first agree on
> > > > > these, it should be easy to find a solution that meets them. I'll
> > > > > take a stab at identifying a first set of POSSIBLE requirements:
> > > > >
> > > > > - Standardized method for determining the result of an operation
> > > > >
> > > > > This is a big one for me because it directly affects the
> > > > > consumability of the API. If each verb has different semantics
> > > > > for discovering whether it has completed successfully, then the
> > > > > API will be nearly impossible to use easily.
> > > > Since there is no way to ascertain whether some tasks completed
> > > > successfully or failed, especially in the murky waters of storage,
> > > > I say this requirement should be removed, at least in the context
> > > > of a task.
> > >
> > > I don't agree. Please feel free to convince me with some examples.
> > > If we cannot provide feedback to a user as to whether their request
> > > has been satisfied or not, then we have some bigger problems to
> > > solve.
> > Suppose VDSM sends a write command to a storage server and the
> > connection hangs up before the ACK has returned. The operation has
> > been committed, but VDSM has no way of knowing that; as far as VDSM
> > is concerned, it got an ETIMEDOUT or EIO.
> > This is the same problem that the engine has with VDSM.
> > If VDSM creates an image/VM/network/repo but the connection hangs up
> > before the response can be sent back, then as far as the engine is
> > concerned, the operation timed out.
> > This is an inherent issue with clustering.
> > This is why I want to move away from tasks being *the* trackable
> > objects.
> > Tasks should be short. As short as possible.
> > Run VM should just persist the VM information on the VDSM host and
> > return. The rest of the tracking should be done using the VM ID.
> > Create image should return once VDSM has persisted the information
> > about the request on the repository and created the metadata files.
> > Tracking should be done on the repo or the imageId.
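A minimal sketch of the "short task" idea follows. It uses a hypothetical in-memory metadata dict in place of the repository.

create_image() records the request with a not-yet-usable state, starts the slow work in a background thread, and returns the image ID at once. The caller then tracks progress through the image, not through a task. The function names and the "broken"/"ok" states are illustrative assumptions, not VDSM's actual API.

```python
import threading
import time
import uuid

# Hypothetical stand-in for the repository metadata; a real host would
# persist this on storage before returning to the caller.
_images = {}
_lock = threading.Lock()


def _do_heavy_work(image_id):
    """Simulates the slow part (allocating and filling the volume)."""
    time.sleep(0.1)
    with _lock:
        _images[image_id]["state"] = "ok"


def create_image(size):
    """Record the request, kick off the work, and return the image ID at once."""
    image_id = str(uuid.uuid4())
    with _lock:
        _images[image_id] = {"size": size, "state": "broken"}
    threading.Thread(target=_do_heavy_work, args=(image_id,), daemon=True).start()
    return image_id


def get_image_state(image_id):
    """Progress is read from the image itself, not from a task object."""
    with _lock:
        return _images[image_id]["state"]


def wait_for_image(image_id, timeout=5.0, interval=0.01):
    """Poll the image until it leaves its initial not-usable state."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = get_image_state(image_id)
        if state != "broken":
            return state
        time.sleep(interval)
    raise TimeoutError(image_id)
```

If the create reply is lost, nothing is lost with it. The caller already knows what to poll, provided it also knows the image ID.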
>
> The thing is that I know how long a VM object (or an Image object)
> should live, so tracking it is straightforward. How long a task should
> live is very problematic and quite context-specific: it depends on
> what the task is.
> I think it's quite confusing from an API standpoint to have every
> task have a different scope, ID requirement and life-cycle.
>
> VDSM has two types of APIs:
>
> CRUD objects - VM, Image, Repository, Bridge, Storage Connections...
> General transient methods - getBiosInfo(), getDeviceList()
>
> The latter are quite simple to manage. They don't need any special
> handling. If you lost a getBiosInfo() call you just send another
> one, no harm done.
> The same is even true of things that "change" the host, like
> getDeviceList().
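For the transient methods, "just send another one" amounts to a plain retry loop. In the sketch below, call_with_retry() is a hypothetical helper, and a lost reply is modelled as Python's built-in TimeoutError. It is only appropriate for calls where repeating the request does no harm.

```python
import time


def call_with_retry(func, *args, attempts=3, delay=0.5, **kwargs):
    """Re-send a side-effect-free call whose reply was lost.

    Only appropriate for transient verbs in the getBiosInfo() style:
    asking twice does no harm, so a timeout is handled by asking again.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func(*args, **kwargs)
        except TimeoutError:
            if attempt == attempts:
                raise  # out of retries: surface the last timeout
            time.sleep(delay)
```

Nothing about the call is tracked between attempts. That is exactly why these verbs need no task machinery.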
>
> What we are really arguing about is fitting the CRUD objects to some
> generic task-oriented scheme.
> I'm saying it's a waste of time, as you can quite easily have flows
> to recover from each operation.
>
> Create - Check if the object exists
> Read - Read again
> Update - Either update again, or read and update if the update didn't
> commit the first time
> Delete - Check that the object no longer exists
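The four recovery flows can be sketched against a toy store. This assumes the caller chooses the object ID up front, as with a pre-generated UUID, so it can ask about the object after a lost reply. Store and the recover_* functions are hypothetical names, and the sketch ignores concurrent writers.

```python
class Store:
    """Hypothetical object store keyed by caller-chosen IDs."""

    def __init__(self):
        self.objects = {}

    def create(self, obj_id, data):
        self.objects[obj_id] = dict(data)

    def read(self, obj_id):
        return dict(self.objects[obj_id])

    def update(self, obj_id, changes):
        self.objects[obj_id].update(changes)

    def delete(self, obj_id):
        self.objects.pop(obj_id, None)

    def exists(self, obj_id):
        return obj_id in self.objects


def recover_create(store, obj_id, data):
    # Create: check whether the object exists; re-create only if it does not.
    if not store.exists(obj_id):
        store.create(obj_id, data)


def recover_read(store, obj_id):
    # Read: no side effects, so simply read again.
    return store.read(obj_id)


def recover_update(store, obj_id, changes):
    # Update: read the current state and re-apply only what did not commit.
    current = store.read(obj_id)
    pending = {k: v for k, v in changes.items() if current.get(k) != v}
    if pending:
        store.update(obj_id, pending)


def recover_delete(store, obj_id):
    # Delete: check that the object is gone; delete again if it is not.
    if store.exists(obj_id):
        store.delete(obj_id)
```

Each recovery step is safe to run whether or not the original request committed. That property is what makes a separate task record unnecessary for these objects.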
>
> Each of the objects we CRUD has different life-cycle and ownership
> semantics.
>
> Danken raised the point that creation has a problem: if it fails,
> there is no way to find out why it failed.
> This is why Create methods should be minimal. They shouldn't create
> the object, just the entry in the respective persistent storage.
> Even now, storage connections are persisted to disk, then the
> operation returns and the user polls to see the state of the
> connection.
> The same should be done for everything. Do the minimum required to
> create the object entry and mark it as "not usable".
> For storage connections it's "connecting"
> For VMs it's "preparing for launch"
> For new images it's "broken" and in some regards "degraded"
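A minimal sketch of "create only the entry" follows. It persists one JSON file per object in a directory, which is a hypothetical layout chosen for illustration, using the initial states listed above.

Because the entry is on disk before the call returns, a restarted service can read it back and report where each object stands. The function names and the later "up" state are assumptions.

```python
import json
import os
import tempfile

# Initial "not usable" states from the thread; the file layout and the
# function names are hypothetical.
INITIAL_STATES = {
    "storage_connection": "connecting",
    "vm": "preparing for launch",
    "image": "broken",
}


def _entry_path(db_dir, kind, obj_id):
    return os.path.join(db_dir, f"{kind}-{obj_id}.json")


def _write_atomically(path, entry):
    # Write to a temp file, fsync, then rename over the target, so a crash
    # never leaves a half-written entry in place of the real one.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(entry, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def create_entry(db_dir, kind, obj_id, params):
    """Do the minimum: persist the entry, marked as not usable, and return."""
    entry = {"kind": kind, "id": obj_id, "params": params,
             "state": INITIAL_STATES[kind]}
    _write_atomically(_entry_path(db_dir, kind, obj_id), entry)
    return obj_id


def set_state(db_dir, kind, obj_id, state):
    """Called by the background work once the object becomes usable (or not)."""
    path = _entry_path(db_dir, kind, obj_id)
    with open(path) as f:
        entry = json.load(f)
    entry["state"] = state
    _write_atomically(path, entry)


def load_entries(db_dir):
    """What a restarted service reads back to report or resume state."""
    entries = {}
    for name in sorted(os.listdir(db_dir)):
        if name.endswith(".json"):
            with open(os.path.join(db_dir, name)) as f:
                entry = json.load(f)
            entries[(entry["kind"], entry["id"])] = entry
    return entries


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as db:
        create_entry(db, "vm", "vm1", {"memory_mb": 1024})
        print(load_entries(db)[("vm", "vm1")]["state"])  # preparing for launch
        set_state(db, "vm", "vm1", "up")
        print(load_entries(db)[("vm", "vm1")]["state"])  # up
```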
>
> I hope this makes things clearer
Saggi,
When running an async operation (not task, operation), I want an
indication of when it finishes. This can be either an event sent to
me or via polling or by divine intervention, but this is basic
information that is required.
Polling for a specific end state is wrong because there can be
multiple end states (success, failure 1, failure 2, maybe even
multiple options for success, etc).
From the call I got the feeling that you do support having this, just
not having it persisted across restarts of the service?
If so, then let's discuss the semantics of what can be reported as
long as vdsm doesn't crash.
If not, then in addition to not agreeing with you on this, I have
additional problems. For example, when an operation ends with
failure, it is insufficient to know that it failed. I want to know
*why* it failed. Without changing something there is no reason to
believe that trying again would succeed. Without an indication of
the reason for failure I'd just be shooting in the dark.
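The two requirements above can be sketched together: wait for *any* end state rather than a specific one, and carry the reason along with a failure. OperationStatus, its fields, the state names and wait_for_operation() are hypothetical, not an existing VDSM interface.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical terminal states; an API could define more than one of each kind.
TERMINAL_STATES = {"succeeded", "failed"}


@dataclass
class OperationStatus:
    state: str                    # e.g. "running", "succeeded", "failed"
    code: Optional[int] = None    # machine-readable failure code
    reason: Optional[str] = None  # human-readable explanation of the failure


def wait_for_operation(get_status: Callable[[], OperationStatus],
                       timeout: float = 30.0,
                       interval: float = 0.5) -> OperationStatus:
    """Poll until the operation reaches any terminal state.

    The caller does not guess which end state to wait for; it waits for
    "finished" and then inspects the result, including why it failed.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status.state in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("operation did not finish in time")
        time.sleep(interval)
```

A caller that gets back state "failed" also has a code and reason to decide whether retrying makes sense. The same status record could just as well be delivered as an event instead of being polled.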
To keep the discussion focused I will stop here to let you comment.
I agree. I sent a different email with my suggestion on how to solve
all these problems.
>
>
> > >
> > > > >
> > > > >
> > > > > Sorry. That's my list :) Hopefully others will be willing to add
> > > > > other requirements for consideration.
> > > > >
> > > > > From my understanding, task recovery (stop, abort, rollback,
> > > > > etc.) will not be generally supported and should not be a
> > > > > requirement.
> > > > >
> > >
> > > --
> > > Adam Litke <agl(a)us.ibm.com>
> > > IBM Linux Technology Center
> > >
> > >
> >
>