Re: [Engine-devel] Task cancelation feature

4 Dec 2012


      
----- Original Message -----
> From: "Dan Kenigsberg" <danken@redhat.com>
> To: "Saggi Mizrahi" <smizrahi@redhat.com>
> Cc: "Michael Kublin" <mkublin@redhat.com>, "engine-devel" <engine-devel@ovirt.org>
> Sent: Tuesday, December 4, 2012 4:38:13 AM
> Subject: Re: [Engine-devel] Task cancelation feature
> 
> On Mon, Dec 03, 2012 at 12:15:07PM -0500, Saggi Mizrahi wrote:
> > VDSM tasks are changing to something completely different.
> > It's still under discussion but the general direction is that:
> > - TaskIDs will be decided by the caller.
> > - VDSM can start tasks on its own
> > - There will be no distinction between async tasks and sync tasks.
> > Everything is always async.
> > - There will be no cleanTask(); when tasks are done they return
> > their result to the caller and disappear immediately.
> 
> I'm not sure I understand the motivation for the latter change. I kinda
> like the unix process semantics, where the return code of a process is
> kept with its id after the process ends, until the process parent calls
> wait(2). Otherwise, how can the caller tell why its task has failed?
> 
> For example, I'd like to see vmCreate using async tasks like that.
> vmCreate returns immediately, and a vdsm task is tracking the vm
> creation. If something bad happens, the information about the failure
> can be polled by the Engine that created the vm (or a new Engine
> instance, after an Engine crash).
You can have "clean task" only when there is a clear owner for a task.
This is not the case because it has been decided that VDSM can be used by more than one client.
I know that people kind of like how convenient it is, but it's problematic for the same reasons that the unix method is problematic:
- If you die you can no longer track the task.
- If the kid dies and restarts itself you can no longer track the task.
- If the task is started by another process you can no longer track it.
- If you happen to die and not restart, all your tasks stay there until someone clears them, and it's impossible to know for sure when that is safe.

We already have issues because this semantic doesn't hold up for a lot of the error cases. It doesn't hold up at all when there is more than one task instigator.
Because with TCP and AMQP you are guaranteed to get the result message as long as you stay connected to the bus, you will always know the result unless you or VDSM crashed.
When you restart, you have to write code to sync up in case something happened while you were down anyway.
When VDSM restarts or errors out, you need to write code to check how the task ended anyway, because even "failed" tasks can finish successfully.
(For example: we send the commit write and the storage fails just after the request was sent, before we got the write ack. The task is committed, but it failed on timeout.)
Also, in the new storage, VDSM restarts usually don't fail big operations, and they can continue even on another host.
> 
> Similarly, we may need to make setupNetwork asynchronous, since we
> depend on dhclient, which may take a lot of time to finish.
As I said, everything *is* asynchronous. Synchronous calls are impossible because calls multiplexed over AMQP/TCP cannot be synchronous.
Because the engine is responsible for generating the ID, there is no need for the silly two-step approach we have now.
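
To make the single-step flow concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: FakeHost, start_task and drain_results are hypothetical names, not VDSM or Engine APIs, and the in-memory queue only stands in for the message bus. The point it tries to show is that the caller knows the task ID before the request leaves, so the result message can be matched without a separate "give me a task ID" call.

```python
import uuid
from collections import deque


class FakeHost:
    """Hypothetical stand-in for VDSM at the other end of a message bus."""

    def __init__(self):
        self.outbox = deque()  # result messages headed back to the caller

    def submit(self, task_id, verb, args):
        # The caller's ID is accepted as-is and echoed on the result message.
        # (Assumption: a real host would do the work asynchronously.)
        self.outbox.append({"task_id": task_id, "verb": verb, "status": "done"})


def start_task(host, pending, verb, **args):
    """Pick the ID locally, record it, then send the request in one step."""
    task_id = str(uuid.uuid4())
    pending[task_id] = verb  # known to the caller before the request is sent
    host.submit(task_id, verb, args)
    return task_id


def drain_results(host, pending):
    """Match result messages to pending requests by the caller's own ID."""
    finished = {}
    while host.outbox:
        msg = host.outbox.popleft()
        if pending.pop(msg["task_id"], None) is not None:
            finished[msg["task_id"]] = msg["status"]
    return finished
```

Since the ID is recorded before the send, a caller that crashes right after sending still has (if it persists the pending table) something to look for when it comes back.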
> 
> Have these future use cases been debated?
Yes. If you convince everyone to move away from message-based communication and from multiplexing requests on the same TCP connection (or to go back to HTTP only), you can get synchronous tasks back.
If you can find a way to make "clean task" make sense and work, then you can have that back too.
I think that once you consider all the possible error flows, you'll realize that sync-up on failure is the most robust and flexible way to go.
Furthermore, I doubt you can find a system where you never need to sync up on the state of VDSM, so we might as well make that the only flow.
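
As a rough illustration of "sync-up as the only flow", the sketch below resolves pending operations by looking at the state of the objects they touch instead of asking for a stored task result. It is only a sketch under assumptions: sync_up, the state names ("ready", "broken") and the rule that a missing object counts as a failure are all hypothetical and not taken from VDSM.

```python
def sync_up(pending, query_state):
    """Resolve pending operations from the current state of their objects.

    pending maps an object ID to the state the caller is waiting for.
    query_state is any callable that returns the object's current state,
    or None if the object does not exist (hypothetical interface).
    """
    resolved, still_pending = {}, {}
    for obj_id, wanted in pending.items():
        actual = query_state(obj_id)
        if actual == wanted:
            # Covers the "failed on timeout but actually committed" case.
            resolved[obj_id] = "succeeded"
        elif actual is None or actual == "broken":
            # Assumption: a missing or broken object means the operation failed.
            resolved[obj_id] = "failed"
        else:
            still_pending[obj_id] = wanted  # still in progress, check again later
    return resolved, still_pending
```

The same function would be called after an Engine restart, after a VDSM restart, and after a timeout, which is what makes it the one flow to get right.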
> 
> Dan.
> > 
> > Also, some stuff you consider tasks will no longer be tasks any
> > more.
> > For instance, copying an image will finish successfully once VDSM
> > registers the operation with the storage subsystem and creates
> > the image handle.
> > After that the status of the copy is bound to the status of the new
> > image and is tracked that way.
> > This means that the thing you track when you do copyImage() is
> > actually the creation of the image handle and the metadata to make
> > it usable.
> > After that is done any host can query the state of the new image by
> > using the image ID and not the task Id which was deprecated.
> > This will be true for all storage operations.
> > 
> > 
> > ----- Original Message -----
> > > From: "Michael Kublin" <mkublin@redhat.com>
> > > To: "engine-devel" <engine-devel@ovirt.org>
> > > Sent: Monday, December 3, 2012 4:19:48 AM
> > > Subject: [Engine-devel] Task cancelation feature
> > > 
> > > Hi, I created a wiki page with the design of the task cancellation
> > > feature.
> > > The URL is: http://www.ovirt.org/Features/TaskManagerCancelTask
> > > I cannot call this a design: I do not have any requirements except
> > > the name of the feature, so my wiki page doesn't contain anything
> > > except open questions.
> > > Also, I think that it is impossible to build a good feature on top
> > > of a very problematic infrastructure. I think we should first fix
> > > all our infrastructure problems; after that, adding any task
> > > cancellation feature will be a matter of a couple of hours of work.
> > > 
> > > Regards Michael
> > > _______________________________________________
> > > Engine-devel mailing list
> > > Engine-devel@ovirt.org
> > > http://lists.ovirt.org/mailman/listinfo/engine-devel
> > > 
>