[Engine-devel] Task cancelation feature

Hi,

I created a wiki page with a design for the task cancellation feature. The URL is: http://www.ovirt.org/Features/TaskManagerCancelTask

I cannot really call it a design: I have no requirements apart from the name of the feature, so the wiki page contains nothing but open questions.

Also, I think it is impossible to build a good feature on top of very problematic infrastructure. We should first fix all our infrastructure problems; after that, adding a task cancellation feature will be a matter of a couple of hours of work.

Regards,
Michael

VDSM tasks are changing to something completely different. It's still under discussion, but the general direction is:
- Task IDs will be decided by the caller.
- VDSM can start tasks on its own.
- There will be no distinction between async and sync tasks. Everything is always async.
- There will be no cleanTask(). When tasks are done they return the result to the caller and disappear immediately.

Also, some things you consider tasks will no longer be tasks. For instance, copying an image will finish successfully once VDSM registers the operation with the storage subsystem and creates the image handle. After that, the status of the copy is bound to the status of the new image and is tracked that way. This means that what you track when you call copyImage() is actually the creation of the image handle and of the metadata that makes it usable. Once that is done, any host can query the state of the new image using the image ID rather than the task ID, which is deprecated. This will be true for all storage operations.

----- Original Message -----
From: "Michael Kublin" <mkublin@redhat.com> To: "engine-devel" <engine-devel@ovirt.org> Sent: Monday, December 3, 2012 4:19:48 AM Subject: [Engine-devel] Task cancelation feature
_______________________________________________
Engine-devel mailing list
Engine-devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/engine-devel
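To make the proposed direction a bit more concrete, here is a minimal Python sketch of the flow described in the reply above: the caller picks the task ID, the task result is returned once and not kept, and the progress of the copy is then read from the image rather than from the task. Everything here is hypothetical: `FakeHost`, `copy_image`, `finish_copy`, `image_state` and the state strings are illustration-only names, not the real VDSM API, and the design was still under discussion at the time.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class FakeHost:
    """Hypothetical in-memory stand-in for a host; NOT the real VDSM API."""
    images: dict = field(default_factory=dict)   # image_id -> state string
    running: set = field(default_factory=set)    # task IDs currently in flight

    def copy_image(self, task_id, src_image_id, dst_image_id):
        # The task only covers registering the copy and creating the handle.
        if src_image_id not in self.images:
            raise KeyError(src_image_id)
        self.running.add(task_id)
        self.images[dst_image_id] = "copying"
        # No cleanTask(): the task disappears as soon as it has reported.
        self.running.discard(task_id)
        return {"task_id": task_id, "status": "done"}

    def finish_copy(self, dst_image_id):
        # Simulates the storage subsystem completing the copy later on.
        self.images[dst_image_id] = "ready"

    def image_state(self, image_id):
        # Any host could answer this; the key is the image ID, not a task ID.
        return self.images.get(image_id, "missing")


host = FakeHost(images={"src-img": "ready"})
task_id = str(uuid.uuid4())                     # chosen by the caller
result = host.copy_image(task_id, "src-img", "dst-img")
state_during = host.image_state("dst-img")      # copy still in progress
host.finish_copy("dst-img")
state_after = host.image_state("dst-img")       # copy completed
```

In this sketch the caller never asks the host about `task_id` again after `copy_image` returns; all later tracking goes through `image_state`.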

I just noticed that I only hinted at the implications for task cancellation and didn't give a concrete example.

For storage at least, there will be a difference between canceling for now (hibernating) and canceling completely. When you cancel a VDSM copy task, for example, VDSM will stop the current copy but persist some information so that it can continue the operation at a later time. This means you are actually "hibernating" the copy operation. To cancel the copy completely you need to stop any operation on the target image and remove it.

This means that the relevant VDSM task IDs can change in the middle of the operation, and that one task in the engine can be N sequential tasks in VDSM. It also means that some operations can be "hibernated" and some can't.

----- Original Message -----
From: "Saggi Mizrahi" <smizrahi@redhat.com> To: "Michael Kublin" <mkublin@redhat.com> Cc: "engine-devel" <engine-devel@ovirt.org> Sent: Monday, December 3, 2012 12:15:07 PM Subject: Re: [Engine-devel] Task cancelation feature
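The difference between hibernating and fully canceling a copy can be sketched roughly as follows. This is an illustration under assumptions, not the planned implementation: `CopyOperation`, its fields and the `budget` parameter are hypothetical, and the "persisted information" is reduced to a byte offset.

```python
from dataclasses import dataclass


@dataclass
class CopyOperation:
    """Hypothetical engine-side view of one copy; progress persists with the target image."""
    target_image: str
    total_bytes: int
    copied_bytes: int = 0
    state: str = "idle"   # idle | running | hibernated | cancelled | done

    def run(self, task_id, budget):
        # Every (re)start is a new VDSM-side task with its own ID.
        self.state = "running"
        self.copied_bytes = min(self.total_bytes, self.copied_bytes + budget)
        if self.copied_bytes == self.total_bytes:
            self.state = "done"
        return task_id

    def hibernate(self):
        # Stop copying but keep the offset so that a later task can resume.
        if self.state == "running":
            self.state = "hibernated"

    def cancel(self, images):
        # Full cancel: stop any work on the target image and remove it.
        self.state = "cancelled"
        images.discard(self.target_image)
        self.copied_bytes = 0


images = {"dst-img"}
op = CopyOperation("dst-img", total_bytes=100)
vdsm_task_ids = [op.run("vdsm-task-1", budget=40)]
op.hibernate()
resumed_from = op.copied_bytes                     # offset kept across hibernation
vdsm_task_ids.append(op.run("vdsm-task-2", budget=30))
op.cancel(images)
```

One engine-level operation ends up mapped to two sequential VDSM task IDs here, which is the "N sequential tasks" situation mentioned above.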

On Mon, Dec 03, 2012 at 12:15:07PM -0500, Saggi Mizrahi wrote:
> VDSM tasks are changing to something completely different.
> It's still under discussion but the general direction is that:
> - TaskIDs will be decided by the caller.
> - VDSM can start tasks on its own
> - There will be no distinction between async tasks and sync tasks. Everything is always async.
> - There will be no cleanTask() when tasks are done they return result to the caller and disappear immediately.
I'm not sure I understand the motivation for the last change. I kind of like the Unix process semantics, where the return code of a process is kept with its ID after the process ends, until the parent process calls wait(2). Otherwise, how can the caller tell why its task has failed?

For example, I'd like to see vmCreate use async tasks like that: vmCreate returns immediately, and a VDSM task tracks the VM creation. If something bad happens, the information about the failure can be polled by the Engine that created the VM (or by a new Engine instance, after an Engine crash).

Similarly, we may need to make setupNetwork asynchronous, since we depend on dhclient, which may take a long time to finish.

Have these future use cases been debated?

Dan.
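For readers less familiar with the Unix semantics referred to above, here is a small Python illustration: the child has long exited by the time the parent asks, yet its exit status is still available because the system keeps it until the parent waits. The exit code 3 and the half-second delay are arbitrary choices for the example.

```python
import subprocess
import sys
import time

# Start a child process that exits almost immediately with status 3.
child = subprocess.Popen([sys.executable, "-c", "import sys; sys.exit(3)"])

# Give the child time to finish; nobody has collected its status yet.
time.sleep(0.5)

# The parent collects the retained status (via waitpid on POSIX systems).
exit_code = child.wait()
```

The analogy in the thread is that a finished task would keep its result in the same way until its owner explicitly collects it.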

----- Original Message -----
From: "Dan Kenigsberg" <danken@redhat.com> To: "Saggi Mizrahi" <smizrahi@redhat.com> Cc: "engine-devel" <engine-devel@ovirt.org> Sent: Tuesday, December 4, 2012 3:08:13 PM Subject: Re: [Engine-devel] Task cancelation feature
Have these future use cases been debated?
There are also a few requirements from the gluster perspective that may need enhancements in vdsm as well as in the engine:
- Tasks are created and managed by glusterfs, not by vdsm.
- Gluster tasks are not bound to a particular host. They are cluster-wide, and their status can be checked from any host in the cluster.
- The concept of SPM does not apply to gluster clusters/hosts.
- Apart from starting, aborting and checking status, some gluster tasks support additional actions such as pause, resume and commit.

Based on some phone conversations with the maintainers of engine and vdsm, we were planning to enhance the existing task management as follows:
- Enhance the getAllTasks verb in vdsm to accept one or more tags for filtering tasks (http://gerrit.ovirt.org/7579).
  - Currently all tasks created in vdsm through requests from the engine have the 'spm' tag, as they are SPM tasks.
- Introduce a new field in the engine, say task_target, which indicates what kind of task it is. Possible values:
  - SPM (SPM tasks)
  - CLUSTER (cluster-wide tasks, e.g. gluster tasks)
  - HOST (tasks specific to a particular host, e.g. formatting a disk on a host)
- Enhance the async task manager in the engine to fetch details of all types of tasks by sending the appropriate tags to the new getAllTasks verb. (At present it fetches only SPM tasks.)
- Once all the tasks are fetched, the rest of the processing (updating status) remains the same as before.

I would like to know whether we should stop working on the above approach in case the new design is coming soon. If so, we should make sure that the new design is capable of handling gluster tasks as well. In case it is too far in the future:
1) We will start working on the above approach. Any comments/suggestions/concerns are welcome.
2) Instead of focusing only on 'cancellation', we should try to come up with a more generic approach that makes it easy to support more actions such as pause, resume, commit, etc.

(I haven't gone through the feature page yet, so please pardon me if this is already taken care of.)
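The tag-based proposal above might look roughly like this in Python. It is only a sketch under assumptions: the `TaskTarget` enum mirrors the suggested task_target values, but the tag strings 'cluster' and 'host', the task dictionaries, and the signatures of `get_all_tasks` and `fetch_for_targets` are hypothetical (only the 'spm' tag is mentioned in the proposal).

```python
from enum import Enum


class TaskTarget(Enum):
    """Proposed task_target values; the tag strings other than 'spm' are assumed."""
    SPM = "spm"
    CLUSTER = "cluster"
    HOST = "host"


# Hypothetical tasks, as a tag-aware getAllTasks might report them.
ALL_TASKS = [
    {"id": "t1", "tags": {"spm"}, "status": "running"},
    {"id": "t2", "tags": {"cluster", "gluster"}, "status": "paused"},
    {"id": "t3", "tags": {"host"}, "status": "finished"},
]


def get_all_tasks(tags=None):
    """Return tasks carrying at least one requested tag, or all tasks if no tags are given."""
    if not tags:
        return list(ALL_TASKS)
    wanted = set(tags)
    return [task for task in ALL_TASKS if task["tags"] & wanted]


def fetch_for_targets(targets):
    """Engine side: translate task_target values into tags for the filtered call."""
    return get_all_tasks(tags=[target.value for target in targets])


cluster_and_host = fetch_for_targets([TaskTarget.CLUSTER, TaskTarget.HOST])
spm_only = fetch_for_targets([TaskTarget.SPM])
```

With this shape, the existing status-update processing could stay unchanged and only the fetch step would need to know about targets.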

----- Original Message -----
> From: "Dan Kenigsberg" <danken@redhat.com>
> To: "Saggi Mizrahi" <smizrahi@redhat.com>
> Cc: "Michael Kublin" <mkublin@redhat.com>, "engine-devel" <engine-devel@ovirt.org>
> Sent: Tuesday, December 4, 2012 4:38:13 AM
> Subject: Re: [Engine-devel] Task cancelation feature
>
> On Mon, Dec 03, 2012 at 12:15:07PM -0500, Saggi Mizrahi wrote:
> > VDSM tasks are changing to something completely different.
> > It's still under discussion but the general direction is that:
> > - TaskIDs will be decided by the caller.
> > - VDSM can start tasks on its own
> > - There will be no distinction between async tasks and sync tasks.
> >   Everything is always async.
> > - There will be no cleanTask() when tasks are done they return
> >   result to the caller and disappear immediately.
>
> I'm not sure I understand the motivation for the latter change. I kinda
> like the unix process semantics, where the return code of a process is
> kept with its id after the process ends, until the process parent calls
> wait(2). Otherwise, how can the caller tell why its task has failed?
>
> For example, I'd like to see vmCreate using async tasks like that.
> vmCreate returns immediately, and a vdsm task is tracking the vm
> creation. If something bad happens, the information about the failure
> can be polled by the Engine that created the vm (or a new Engine
> instance, after an Engine crash).

You can have cleanTask only when there is a clear owner for a task. This is not the case, because it has been decided that VDSM can be used by more than one client. I know that people like how convenient it is, but it's problematic for the same reasons that the Unix method is problematic:
- If you die, you can no longer track the task.
- If the kid dies and restarts itself, you can no longer track the task.
- If the task is started by another process, you can no longer track it.
- If you happen to die and not restart, all your tasks stay there until someone clears them, and it's impossible to know for sure when that is safe.

We already have issues because this semantic doesn't hold up for a lot of the error cases. It doesn't hold up at all when there is more than one task instigator.

Because in TCP and AMQP you are guaranteed to get the result message as long as you are connected to the bus, you will always know the result unless you or VDSM crashed. When you restart, you have to write code to sync up in case something happened while you were down anyway. When VDSM restarts or errors out, you need to write code to check how the task ended anyway, because even "failed" tasks can finish successfully. Also, in the new storage, VDSM restarts usually don't fail big operations, and they can continue even on another host. (An example of a "failed" task that succeeded: we send the commit write and the storage fails just after the request was sent, before we get the write ack. The task is committed but it failed on timeout.)

> Similarly, we may need to make setupNetwork asynchronous, since we
> depend on dhclient, which may take a lot of time to finish.

As I said, everything *is* asynchronous. It's impossible to have synchronous calls because you can't have synchronous calls on AMQP/TCP. Because the engine is responsible for generating the ID, there is no need for the silly two-step approach we have now.

> Have these future use cases been debated?

Yes. If you convince everyone to move away from message-based communication and multiplexing requests on the same TCP connection (or go back to only HTTP), you can get synchronous tasks back. If you can find a way to make "clean task" make sense and work, then you can have that back too. I think that after thinking about it you'll realize that, considering all the possible error flows, sync-up on failure is the most robust and flexible way to go.
Furthermore, I doubt you can find a system where you never need to sync up on the state of VDSM, so we might as well make that the only flow.
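The "sync-up on failure" flow argued for above could be sketched like this: after a reconnect or restart, the caller does not rely on a stored task result but compares what it believed was in flight with what the host and the storage report now. This is a hypothetical illustration; `sync_up`, its arguments and the outcome strings are assumptions, not part of any agreed design.

```python
def sync_up(pending, host_tasks, image_states):
    """Classify every operation the engine believed was in flight.

    pending      -- dict of task ID -> target image ID (the engine's own record)
    host_tasks   -- set of task IDs the host reports as still running
    image_states -- dict of image ID -> state as reported by storage
    """
    outcome = {}
    for task_id, image_id in pending.items():
        if task_id in host_tasks:
            outcome[task_id] = "still running"
        elif image_states.get(image_id) == "ready":
            # The result message may have been lost, but the work is done.
            outcome[task_id] = "succeeded"
        else:
            outcome[task_id] = "needs retry or cleanup"
    return outcome


pending = {"a": "img-1", "b": "img-2", "c": "img-3"}
outcome = sync_up(
    pending,
    host_tasks={"a"},
    image_states={"img-1": "copying", "img-2": "ready"},
)
```

Task "b" shows the case mentioned in the thread where the caller never saw a success message, yet the operation completed and is recognized as such from the state of its result.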
participants (4)
- Dan Kenigsberg
- Michael Kublin
- Saggi Mizrahi
- Shireesh Anjal