short recap of last vdsm call (15.4.2014)

ybronhei

22 Apr 2014

6:54 a.m.

hey,

somehow we missed the summary of this call, and a few "big" issues were raised there, so i would like to share it with all and hear more comments:

- task id in http header - allows engine to initiate calls with an id instead of following the vdsm response - federico already started this work, and this is mandatory for the live merge feature afaiu.
- fromani: suggested a thread pool for libvirt-related operations. if this can be elaborated more, please do. i don't recall the details or other alternatives.
- danken: we should press on with the patches that keep VMs in Down state: removing them on vdsm restart keeps lvs in activated state, which may spell trouble.
- splitting vdsm infra/generic code into a sub-vdsm-module instead of creating separate packages. this can be done by splitting the git repository to allow "easy" maintenance of the code (from the infra perspective), which also includes C parts and generic tools that can be used in other projects as well.

if i missed anything, please add.

thanks,

-- Yaniv Bronhaim.


Dan Kenigsberg

30 Apr

8:22 a.m.

On Tue, Apr 22, 2014 at 02:54:29PM +0300, ybronhei wrote:

...

hey,

somehow we missed the summary of this call, and few "big" issues were raised there. so i would like to share it with all and hear more comments

- task id in http header - allows engine to initiate calls with id instead of following vdsm response - federico already started this work, and this is mandatory for live merge feature afaiu.

Adam, Federico, may I revisit this question from another angle?

Why does Vdsm need to know live-merge's task id? As far as I understand, (vmid, disk id) are enough to identify a live merge process.

If we do not have a task id, we do not need to worry on how to pass it, and where to persist it.

ybronhei

8:37 a.m.

On 04/30/2014 04:22 PM, Dan Kenigsberg wrote:

...

On Tue, Apr 22, 2014 at 02:54:29PM +0300, ybronhei wrote:

...
hey,

somehow we missed the summary of this call, and few "big" issues were raised there. so i would like to share it with all and hear more comments

- task id in http header - allows engine to initiate calls with id instead of following vdsm response - federico already started this work, and this is mandatory for live merge feature afaiu.

Adam, Federico, may I revisit this question from another angle?

Why does Vdsm need to know live-merge's task id? As far as I understand, (vmid, disk id) are enough to identify a live merge process.

If we do not have a task id, we do not need to worry on how to pass it, and where to persist it.

engine must somehow poll the process (/task) status to know when to end the action (with success or failure) and release locks

-- Yaniv Bronhaim.

Dan Kenigsberg

9:33 a.m.

On Wed, Apr 30, 2014 at 04:37:22PM +0300, ybronhei wrote:

...

On 04/30/2014 04:22 PM, Dan Kenigsberg wrote:

...
On Tue, Apr 22, 2014 at 02:54:29PM +0300, ybronhei wrote:

...
hey,

somehow we missed the summary of this call, and few "big" issues were raised there. so i would like to share it with all and hear more comments

- task id in http header - allows engine to initiate calls with id instead of following vdsm response - federico already started this work, and this is mandatory for live merge feature afaiu.

Adam, Federico, may I revisit this question from another angle?

Why does Vdsm need to know live-merge's task id? As far as I understand, (vmid, disk id) are enough to identify a live merge process.

If we do not have a task id, we do not need to worry on how to pass it, and where to persist it.

engine must somehow poll the process (/task) status to know when to end the action (with success or failure) and release locks

Sure - but I understood that Vdsm is to report (in getAllVmStats or wherever) the list of all pending block jobs.

We could revive the plan for a new task framework in Vdsm - if we can convince Saggi that it would not be abused to do intractable synchronization attempts as the current framework is. But this requires some design and thinking - last time we did not reach an agreement regarding the persistence semantics of tasks on the Vdsm side.

Dan.

Adam Litke

12:26 p.m.

On 30/04/14 14:22 +0100, Dan Kenigsberg wrote:

...

On Tue, Apr 22, 2014 at 02:54:29PM +0300, ybronhei wrote:

...
hey,

somehow we missed the summary of this call, and few "big" issues were raised there. so i would like to share it with all and hear more comments

- task id in http header - allows engine to initiate calls with id instead of following vdsm response - federico already started this work, and this is mandatory for live merge feature afaiu.

Adam, Federico, may I revisit this question from another angle?

Why does Vdsm need to know live-merge's task id? As far as I understand, (vmid, disk id) are enough to identify a live merge process.

A vmId + diskId can uniquely identify a block job at a single moment in time since qemu guarantees that only a single block job can run at any given point in time. But this gives us no way to differentiate two sequential jobs that run on the same disk. Therefore, without having an engine-supplied jobID, we can never be sure if one job finished and another started since the last time we polled stats.

Additionally, engine-supplied UUIDs are part of a developing framework for next-generation async tasks. Engine prefers to use a single identifier to represent any kind of task (rather than some problem domain specific combination of UUIDs). Adhering to this rule will help us to converge on a single implementation of ng async tasks moving forward.

...

If we do not have a task id, we do not need to worry on how to pass it, and where to persist it.

There are at least 3 reasons to persist a block job ID:

* To associate a specific block job operation with a specific engine-initiated flow.
* So that you can clean up after a job that completed when vdsm could not receive the completion event.
* Since we must ask libvirt about block job events on a per VM, per disk basis, tracking the devices on which we expect block jobs enables us to eliminate wasteful calls to libvirt.

Hope this makes the rationale a bit clearer...

-- Adam Litke
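
The ambiguity Adam describes for two sequential jobs on one disk can be illustrated with a small sketch. Everything in it (the `BlockJobTable` class, its method names, the "done"/"failed" strings) is hypothetical and only models the argument; it is not vdsm or libvirt code.

```python
import uuid


class BlockJobTable:
    """Hypothetical model of block jobs on a host (not a vdsm API)."""

    def __init__(self):
        self._running = {}   # (vm_id, disk_id) -> job_id of the running job
        self._results = {}   # job_id -> "done" or "failed"

    def start(self, vm_id, disk_id, job_id):
        key = (vm_id, disk_id)
        if key in self._running:
            # Models the "one block job at a time" guarantee from the thread.
            raise RuntimeError("a block job is already running on this disk")
        self._running[key] = job_id

    def finish(self, vm_id, disk_id, success=True):
        job_id = self._running.pop((vm_id, disk_id))
        self._results[job_id] = "done" if success else "failed"

    def busy(self, vm_id, disk_id):
        # All a (vmId, diskId) poll can say: is something running right now?
        return (vm_id, disk_id) in self._running

    def running_job_ids(self):
        return set(self._running.values())

    def result(self, job_id):
        return self._results.get(job_id)


table = BlockJobTable()
job_a, job_b = str(uuid.uuid4()), str(uuid.uuid4())

table.start("vm1", "disk1", job_a)
poll1 = (table.busy("vm1", "disk1"), table.running_job_ids())

# Between two polls, job A finishes and job B starts on the same disk.
table.finish("vm1", "disk1")
table.start("vm1", "disk1", job_b)
poll2 = (table.busy("vm1", "disk1"), table.running_job_ids())

# Polling by disk looks identical both times; polling by job ID does not.
print(poll1[0] == poll2[0], poll1[1] == poll2[1])
```

Whether this case can arise in practice depends on Engine never starting a second merge before it has a result for the first, which is the point raised in the reply that follows.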

Dan Kenigsberg

1 May

11:53 a.m.

On Wed, Apr 30, 2014 at 01:26:18PM -0400, Adam Litke wrote:

...

On 30/04/14 14:22 +0100, Dan Kenigsberg wrote:

...
On Tue, Apr 22, 2014 at 02:54:29PM +0300, ybronhei wrote:

...
hey,

somehow we missed the summary of this call, and few "big" issues were raised there. so i would like to share it with all and hear more comments

- task id in http header - allows engine to initiate calls with id instead of following vdsm response - federico already started this work, and this is mandatory for live merge feature afaiu.

Adam, Federico, may I revisit this question from another angle?

Why does Vdsm need to know live-merge's task id? As far as I understand, (vmid, disk id) are enough to identify a live merge process.

A vmId + diskId can uniquely identify a block job at a single moment in time since qemu guarantees that only a single block job can run at any given point in time. But this gives us no way to differentiate two sequential jobs that run on the same disk. Therefore, without having an engine-supplied jobID, we can never be sure if one job finished and another started since the last time we polled stats.

Why would Engine ever want to initiate a new live merge of a (vmId,diskId) before it has a conclusive result (success or failure) of the previous attempt? As far as I understand, this should never happen, and it's actually good for the API to force avoidance of such a case.

...

Additionally, engine-supplied UUIDs is part of a developing framework for next-generation async tasks. Engine prefers to use a single identifier to represent any kind of task (rather than some problem domain specific combination of UUIDs). Adhering to this rule will help us to converge on a single implementation of ng async tasks moving forward.

I do not think that having a (virtual) table of task_id -> vmId,diskId in Vdsm is much simpler than having it on the Engine machine.

I still find the notion of a new framework for async tasks quite useful. But as I requested before, I think we should design it first, so it fits all conceivable users. In particular, we should not tie it to the existence of a running VM. We'd better settle on persistence semantics that work for everybody (such as network tasks).

Last time, the idea was struck down by Saggi and others from infra, who are afraid to repeat mistakes from the current task framework.

...

...
If we do not have a task id, we do not need to worry on how to pass it, and where to persist it.

There are at least 3 reasons to persist a block job ID: * To associate a specific block job operation with a specific engine-initiated flow. * So that you can clean up after a job that completed when vdsm could not receive the completion event.

But if Vdsm dies before it managed to clean up, Engine would have to perform the cleanup via another host. So having this short-loop cleanup is redundant.

...

* Since we must ask libvirt about block job events on a per VM, per disk basis, tracking the devices on which we expect block jobs enables us to eliminate wasteful calls to libvirt.

This can be done by an in-memory cache.

...

Hope this makes the rationale a bit clearer...

Yes, but I am not yet convinced...

Adam Litke

12:28 p.m.

On 01/05/14 17:53 +0100, Dan Kenigsberg wrote:

...

On Wed, Apr 30, 2014 at 01:26:18PM -0400, Adam Litke wrote:

...
On 30/04/14 14:22 +0100, Dan Kenigsberg wrote:

...
On Tue, Apr 22, 2014 at 02:54:29PM +0300, ybronhei wrote:

...
hey,

somehow we missed the summary of this call, and few "big" issues were raised there. so i would like to share it with all and hear more comments

- task id in http header - allows engine to initiate calls with id instead of following vdsm response - federico already started this work, and this is mandatory for live merge feature afaiu.

Adam, Federico, may I revisit this question from another angle?

Why does Vdsm need to know live-merge's task id? As far as I understand, (vmid, disk id) are enough to identify a live merge process.

A vmId + diskId can uniquely identify a block job at a single moment in time since qemu guarantees that only a single block job can run at any given point in time. But this gives us no way to differentiate two sequential jobs that run on the same disk. Therefore, without having an engine-supplied jobID, we can never be sure if one job finished and another started since the last time we polled stats.

Why would Engine ever want to initiate a new live merge of a (vmId,diskId) before it has a conclusive result (success or failure) of the previous attempt? As far as I understand, this should never happen, and it's actually good for the API to force avoidance of such a case.

...
Additionally, engine-supplied UUIDs is part of a developing framework for next-generation async tasks. Engine prefers to use a single identifier to represent any kind of task (rather than some problem domain specific combination of UUIDs). Adhering to this rule will help us to converge on a single implementation of ng async tasks moving forward.

I do not think that having a (virtual) table of task_id -> vmId,diskId in Vdsm is much simpler than having it on the Engine machine.

It needs to go somewhere. As the designers of the API we felt it would be better for vdsm to hide the semantics of when a vmId,diskId tuple can be considered a unique identifier. If we ever do generalize the concept of a transient task to other users (setupNetworks, etc) it would be a far more consumable API if engine didn't need to handle a bunch of special cases about what constitutes a "job ID" and the specifics of its lifetime. UUIDs are simple and already well-supported. Why make it more difficult than it has to be?

...

I still find the notion of a new framework for async tasks quite useful. But as I requested before, I think we should design it first, so it fits all conceivable users. In particular, we should not tie it to the existence of a running VM. We'd better settle on persistence semantics that work for everybody (such as network tasks).

Last time, the idea was struck down by Saggi and others from infra, who are afraid to repeat mistakes from the current task framework.

Several famous quotes apply here. The only thing we have to fear is fear itself :) Sometimes perfect is the enemy of good. Tasks redesign was always going to be driven by the need to implement one feature at first. It just so happens that we volunteered to take a stab at it for live merge. It's clear that we won't be able to completely replace the old tasks and get this feature out in one pass. We believe the general principles of our tasks are generally extensible to cover new use cases in the future:

* Jobs are given an engine-supplied UUID when started
* There is a well-known way to check if a job is running or not
* There is a well-known way to test if a finished job succeeded or failed.

I believe we did spend quite a bit of time in March coming up with a design for NG tasks.

Unfortunately it was infra who made our jobs vm-specific by requiring the job status to be passed by getVMStats rather than an object-agnostic getJobsStatus stand-alone API that could conglomerate all job types into a single response.
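
The three principles above, together with the object-agnostic status call, could look roughly like the following. This is only a sketch of the idea as described in this message: `JobTracker`, its methods and the state strings are invented for illustration, and `get_jobs_status` merely stands in for the proposed getJobsStatus; none of it is an existing vdsm interface.

```python
import uuid


class JobTracker:
    """Hypothetical object-agnostic job table (names are assumptions)."""

    def __init__(self):
        self._jobs = {}  # job_id -> {"type": ..., "state": ...}

    def start(self, job_id, job_type):
        # Principle 1: the caller (engine) supplies the UUID.
        uuid.UUID(job_id)  # raises ValueError if job_id is not a UUID
        if job_id in self._jobs:
            raise ValueError("job id already in use")
        self._jobs[job_id] = {"type": job_type, "state": "running"}

    def finish(self, job_id, success):
        self._jobs[job_id]["state"] = "done" if success else "failed"

    def is_running(self, job_id):
        # Principle 2: one well-known way to ask whether a job is running.
        job = self._jobs.get(job_id)
        return job is not None and job["state"] == "running"

    def succeeded(self, job_id):
        # Principle 3: one well-known way to ask how a finished job ended.
        return self._jobs[job_id]["state"] == "done"

    def get_jobs_status(self):
        # All job types in a single response, independent of any VM.
        return {job_id: dict(job) for job_id, job in self._jobs.items()}


tracker = JobTracker()
merge_id, net_id = str(uuid.uuid4()), str(uuid.uuid4())
tracker.start(merge_id, "merge")
tracker.start(net_id, "setupNetworks")
tracker.finish(net_id, success=True)
print(tracker.get_jobs_status())
```

The UUID check in `start` reflects the preference stated in this message; a later reply in the thread argues for treating the ID as an opaque string instead.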

...

...
...
If we do not have a task id, we do not need to worry on how to pass it, and where to persist it.

There are at least 3 reasons to persist a block job ID: * To associate a specific block job operation with a specific engine-initiated flow. * So that you can clean up after a job that completed when vdsm could not receive the completion event.

But if Vdsm dies before it managed to clean up, Engine would have to perform the cleanup via another host. So having this short-loop cleanup is redundant.

Fair enough. We'll be doing the volume chain scan for every native VM disk at VM startup. The only exception is if we are recovering and the running VM's recovery file does not show any outstanding block jobs. In that case we have definitive information that a rescan will not be required.

...

...
* Since we must ask libvirt about block job events on a per VM, per disk basis, tracking the devices on which we expect block jobs enables us to eliminate wasteful calls to libvirt.

This can be done by an in-memory cache.

Sure, but then you miss out on the other benefits I've enumerated.

...

...
Hope this makes the rationale a bit clearer...

Yes, but I am not yet convinced...

-- Adam Litke

Saggi Mizrahi

4 May

9:24 a.m.

The thread became a bit too long for me to follow who said what when.

So I'll just say how things are going to work:

VDSM is going to have tasks that are not persisted to disk. The only reason I'm even giving a way to list running tasks is so that legacy operations can use it.

New verbs should not rely on the task ID. You should put the status on an object so the lifetime of the job is bound to that object and have the actual commands return quickly. The Task-ID is used internally to match requests with responses. In an ideal world without BC I wouldn't even have verbs to query for running tasks.

You should poll the object or, in the future, use events to report progress.

As an example, instead of having startVM() return when the VM is up, you should have it return when VDSM has the VM object (in some map) and you should poll the status of the VM through the VM ID.

Instead of having copyImage() return when the copy is complete it should return when all the metadata was created and persisted on the target image so that you can track the target image instead of the task.

Also, you should try and make your commands idempotent so as to simplify the flows further.

Even though the job idiom appears to be simpler it is harder to manage in a clustered environment as tracking job IDs is much harder to coordinate than state changes.

As for Task IDs being UUIDs: as I said before, VDSM will not enforce IDs given from the engine to be UUIDs; they will be treated as opaque strings. There is no reason to validate that they are UUIDs in VDSM. It's adding a limitation on the API for no reason.

TaskID in the HTTP header will only work for storage verbs. All other subsystems will have to move to the json-rpc to get that. Seeing as json-rpc in VDSM is targeted for 3.5 it shouldn't be that much of an issue.
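
The startVM() example in this message (return once the object exists, then poll the object by its ID) can be sketched as below. The `VmMap` class, its methods and the status strings are assumptions made for illustration; `mark_up` simply stands in for whatever background work actually brings the VM up.

```python
class VmMap:
    """Hypothetical map of VM objects polled by VM ID (not vdsm code)."""

    def __init__(self):
        self._status = {}  # vm_id -> status string

    def start_vm(self, vm_id):
        # Returns as soon as the VM object exists. Calling it again for the
        # same vm_id changes nothing, so a retried request is harmless.
        self._status.setdefault(vm_id, "WaitForLaunch")
        return vm_id

    def mark_up(self, vm_id):
        # Stand-in for the background work that finishes booting the VM.
        self._status[vm_id] = "Up"

    def get_status(self, vm_id):
        # None means the host has no such object, for example because the
        # start request never arrived.
        return self._status.get(vm_id)


vms = VmMap()
before = vms.get_status("vm1")      # no object yet
vms.start_vm("vm1")
vms.start_vm("vm1")                 # duplicate request, no effect
launching = vms.get_status("vm1")
vms.mark_up("vm1")
vms.start_vm("vm1")                 # late duplicate must not reset the state
after = vms.get_status("vm1")
print(before, launching, after)
```

Because the caller only ever asks about the VM ID, there is no separate job identifier to persist or reconcile in this model.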


Adam Litke

5 May

9:38 a.m.

New subject: vdsm tasks API design discussion

On 04/05/14 10:24 -0400, Saggi Mizrahi wrote:

...

The thread became a bit too long for me to follow who said what when.

So I'll just say how things are going to work:

This is not a very good way to frame a discussion. The above sentence suggests that you are completely closed off to new ideas or working together as a community. -1.

...

VDSM is going to have tasks that are not persisted to disk. The only reason I'm even giving a way to list running tasks is so that legacy operations can use it.

Have you checked with your stakeholders to make sure this limitation is okay for the future intended use cases? I can say for sure that you haven't asked those of us who are working on live merge.

...

New verbs should not rely on the task ID. You should put the status on an object so the lifetime of the job is bound to that object and have the actual commands return quickly. The Task-ID is used internally to match requests with responses. In an ideal world without BC I wouldn't even have verbs to query for running tasks.

You should poll the object or, in the future, use events to report progress.

While this approach sounds pretty nice at first glance I doubt that these task semantics will remain simple for long. You are asking every API that implements an asynchronous operation to define its own "contract" for what it means to have a job still running. This means engine will need to account for corner cases such as vdsm crash/restart in different ways for each verb implemented. For example, the setupNetworks verb would (probably?) be aborted by a vdsm crash/restart but a live merge job would not (since it is tied to the qemu process). It would be far simpler to have vdsm return a simple list of job ids (potentially with abstracted cursor information). Vdsm knows the details about whether operations are still running and can provide that pretty easily.

...

As an example, instead of having startVM() return when the VM is up. You should have it return when VDSM has the VM object (in some map) and you should poll the status of the VM through the VM ID.

Right, I think we do that today. It's pretty easy to manage an async object creation because you can think of the object like a long running job and vdsm provides a list jobs verb (list).

...

Instead of having copyImage() return when the copy is complete it should return when all the metadata was created and persisted on the target image so that you can track the target image instead of the task.

Yep, another simple case because you are clearly creating an object and you have a list-object verb. Let's try a more difficult case: setupNetworks. How would that one work (and please provide some details -- it's important)? Does engine need to duplicate the network model and query every single aspect of all interfaces (down to the mtu setting) in order to ensure that everything was set as it should be?

...

Also, you should try and make your commands idempotent so to simplify the flows further.

Good idea, but not always practical.

...

Even though the job idiom appears to be simpler it is harder to manage in a clustered environment as tracking job IDs is much harder to coordinate than state changes.

I'm not ready to paint with such a broad brush. State machines can be really complex. You may end up exposing too much vdsm internal information in order to give engine enough context to understand state changes.

...

As for Task IDs being UUIDs. As I said before. VDSM will not enforce IDs given from the engine to be UUIDs they will be treated as opaque strings. There is no reason to validate that they are UUIDs in VDSM. It's adding a limitation on the API for no reason.

Again, you're writing as if the debate is closed. The purpose of a task ID is to correlate an initial request with some future correspondence (either an out of order return value or in the state of another object). By allowing it to be free-form you are just begging for it to be abused in the future. This reminds me of 'specParams', 'customProperties' and the 'options' free-form dictionary parameters that some vdsm APIs have. There should be no reason why taskID should be used to trojan-horse some contextual data into vdsm.

...

TaskID in the HTTP header will only work for storage verbs. All other subsystems will have to move to the json-rpc to get that. Seeing as json-rpc in VDSM is targeted for 3.5 it shouldn't be that much of an issue.

Live merge is a really important feature for 3.5. I'm hesitant to depend on jsonrpc unless we are absolutely certain that it will be ready to go in time for us to integrate with it. Also, we need to understand the upgrade/BC characteristics of jsonrpc because they will suddenly impact live merge as well. I'd specifically like to hear what Allon thinks about adding this dependency.

I am not sure you're designing a tasks API that is going to be generally useful. It seems I will need to work around the missing features in your design by adding extra complexity to my feature. Isn't this what we're trying to avoid?


Saggi Mizrahi

11:29 a.m.

New subject: vdsm tasks API design discussion

I'll try and sum up my responses here:

About things being closed for discussion: This stuff has been put up for discussion multiple times over more than a year. There comes a time when a decision has to be made. Unless I get compelling enough arguments (i.e. an actual use case) I will have to keep things as they are.

As for task ID format not being enforced: This was also up for discussion ages ago. We already had issues with useless validations causing us problems over time. As I said, I don't care if the engine enforces it to be a UUID. As long as the only argument is "I don't want people that interface VDSM to do stupid things" my response is still going to be "I am not here to make sure other projects are doing their job correctly". So unless I get a valid real world reason to reopen that discussion I'm going to leave things as they are. I will be adding length limitation to the ID to prevent abuse.

Now for the meaty bit: The main reason against having job control is that it doesn't work on a clustered message based system. You need to keep 3 things in mind:

1. Messages can get lost (either requests or responses).
2. Messages can arrive out of order.
3. The same message can arrive multiple times.

You can try and minimize 3 in the actual messaging layer but 1 and 2 have to be handled by the actual algorithms.

Let's take a general request. Using getAllRunningJobs() I have no way of knowing if:

1. The message is still in transit.
2. The request is already over (thus was cleared).
3. VDSM crashed.
4. Response was sent but never arrived.
5. Any combination of the cases above.

That is what is important about having idempotency or having things be bound to entities. Binding things to an entity gives you the information you need to resolve those issues. Without that you will find it very hard to know what exactly is going on.

For setupNetworks() I would just make it idempotent. You just send the same command again with the same end state. It would be a no-op if the current network topology is what it's supposed to be. You could also have a special field containing the "version" of the configuration (I would make it a hash or a UUID and not a running number) that you would persist locally on the host after you finished configuring, since the local host is the scope of setupNetworks(). It would allow you to not care about any of the error states: keep sending the same configuration if you think something bad happened, until the state is what you expect it to be or an error response actually manages to find its way back to you. By using the same task ID you are guaranteed to only have the operation running once at a time.

I don't mind helping anyone with making their algorithms work but there is no escaping from the limitations listed above. If we want to make oVirt truly scalable and robust we have to start thinking about algorithms that work despite errors and not just have error flows.

Notice I don't even mention different systems of persistence and some tasks that you should be able to get state information about from more than one host. Some "Jobs" can survive a VDSM restart since they are not in VDSM, like stuff in gluster or QEMU.

To make it clear, the task API shouldn't really be that useful. Task IDs are just there to match requests to responses internally because, as I explained, jobs are hard to manage generally in such a system.

This in no way means that if we see a use case emerging that requires some sort of infra we would not do it. I just think it would probably be tied to some common algorithm or idiom rather than something truly generic used by every API call.

Hope I made things clearer, sorry if I came out a bit rude. I'm off, I have my country's birthday to celebrate.

----- Original Message -----

...

From: "Adam Litke" <alitke@redhat.com> To: "Saggi Mizrahi" <smizrahi@redhat.com> Cc: "Federico Simoncelli" <fsimonce@redhat.com>, "Dan Kenigsberg" <danken@redhat.com>, "ybronhei" <ybronhei@redhat.com>, "Barak Azulay" <bazulay@redhat.com>, devel@ovirt.org, "Allon Mureinik" <amureini@redhat.com> Sent: Monday, May 5, 2014 5:38:10 PM Subject: vdsm tasks API design discussion

On 04/05/14 10:24 -0400, Saggi Mizrahi wrote:

...
The thread became a bit too long for me to follow who said what when.

So I'll just say how things are going to work:

This is not a very good way to frame a discussion. The above sentence suggests that you are completely closed off to new ideas or working together as a community. -1.

...
VDSM is going to have non disk persistent tasks. The only reason I'm even giving a way to list running tasks is so that legacy operations can use it.

Have you checked with your stakeholders to make sure this limitation is okay for the future intended use cases? I can say for sure that you haven't asked those of us who are working on live merge.

...
New verbs should not rely on the task ID. You should put the status on an object so the lifetime of the job is bound to that object and have the actual commands return quickly. The Task-ID is used internally to match requests with responses. In an ideal world without BC I wouldn't even have verbs to query for running tasks.

You should poll the object or, in the future, use events to report progress.

While this approach sounds pretty nice at first glance I doubt that these task semantics will remain simple for long. You are asking every API that implements an asynchronous operation to define its own "contract" for what it means to have a job still running. This means engine will need to account for corner cases such as vdsm crash/restart in different ways for each verb implemented. For example, the setupNetworks verb would (probably?) be aborted by a vdsm crash/restart but a live merge job would not (since it is tied to the qemu process).

It would be far simpler to have vdsm return a simple list of job ids (potentially with abstracted cursor information). Vdsm knows the details about whether operations are still running and can provide that pretty easily.

...
As an example, instead of having startVM() return when the VM is up. You should have it return when VDSM has the VM object (in some map) and you should poll the status of the VM through the VM ID.

Right, I think we do that today. It's pretty easy to manage an async object creation because you can think of the object like a long running job and vdsm provides a list jobs verb (list).

...
Instead of having copyImage() return when the copy is complete it should return when all the metadata was created and persisted on the target image so that you can track the target image instead of the task.

Yep, another simple case because you are clearly creating an object and you have a list-object verb. Let's try a more difficult case: setupNetworks. How would that one work (and please provide some details -- it's important)? Does engine need to duplicate the network model and query every single aspect of all interfaces (down to the mtu setting) in order to ensure that everything was set as it should be? setupNetworks() gets passed an end state anyway.

...
Also, you should try and make your commands idempotent so as to simplify the flows further.

Good idea, but not always practical.

...
Even though the job idiom appears to be simpler it is harder to manage in a clustered environment as tracking job IDs is much harder to coordinate than state changes.

I'm not ready to paint with such a broad brush. State machines can be really complex. You may end up exposing too much vdsm internal information in order to give engine enough context to understand state changes.

...
As for Task IDs being UUIDs. As I said before. VDSM will not enforce IDs given from the engine to be UUIDs they will be treated as opaque strings. There is no reason to validate that they are UUIDs in VDSM. It's adding a limitation on the API for no reason.

Again, you're writing as if the debate is closed. The purpose of a task ID is to correlate an initial request with some future correspondence (either an out of order return value or in the state of another object). By allowing it to be free-form you are just begging for it to be abused in the future. This reminds me of 'specParams', 'customProperties' and the 'options' free-form dictionary parameters that some vdsm APIs have.

There should be no reason why taskID should be used to trojan-horse some contextual data into vdsm.

...
TaskID in the HTTP header will only work for storage verbs. All other subsystems will have to move to the json-rpc to get that. Seeing as json-rpc in VDSM is targeted for 3.5 it shouldn't be that much of an issue.

Live merge is a really important feature for 3.5. I'm hesitant to depend on jsonrpc unless we are absolutely certain that it will be ready to go in time for us to integrate with it. Also, we need to understand the upgrade/BC characteristics of jsonrpc because they will suddenly impact live merge as well. I'd specifically like to hear what Allon thinks about adding this dependency.

I am not sure you're designing a tasks API that is going to be generally useful. It seems I will need to work around the missing features in your design by adding extra complexity to my feature. Isn't this what we're trying to avoid?

...
----- Original Message -----

...
From: "Adam Litke" <alitke@redhat.com> To: "Dan Kenigsberg" <danken@redhat.com> Cc: smizrahi@redhat.com, "ybronhei" <ybronhei@redhat.com>, devel@ovirt.org Sent: Thursday, May 1, 2014 8:28:14 PM Subject: Re: [ovirt-devel] short recap of last vdsm call (15.4.2014)

On 01/05/14 17:53 +0100, Dan Kenigsberg wrote:

...
On Wed, Apr 30, 2014 at 01:26:18PM -0400, Adam Litke wrote:

...
On 30/04/14 14:22 +0100, Dan Kenigsberg wrote:

...
On Tue, Apr 22, 2014 at 02:54:29PM +0300, ybronhei wrote:

>hey,
>
>somehow we missed the summary of this call, and few "big" issues
>were raised there. so i would like to share it with all and hear
>more comments
>
>- task id in http header - allows engine to initiate calls with id
>instead of following vdsm response - federico already started this
>work, and this is mandatory for live merge feature afaiu.

Adam, Federico, may I revisit this question from another angle?

Why does Vdsm need to know live-merge's task id? As far as I understand, (vmid, disk id) are enough to identify a live merge process.

A vmId + diskId can uniquely identify a block job at a single moment in time since qemu guarantees that only a single block job can run on a disk at any given point in time. But this gives us no way to differentiate two sequential jobs that run on the same disk. Therefore, without having an engine-supplied jobID, we can never be sure if one job finished and another started since the last time we polled stats.
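A small sketch of the ambiguity described above. The dictionary layout and the `same_job` helper are assumptions made for illustration only; they do not reflect the actual stats format.

```python
def same_job(previous, current, with_job_id):
    """Return True if two polled snapshots appear to describe the same job."""
    if with_job_id:
        # An engine-supplied ID distinguishes sequential jobs on one disk.
        return previous["jobID"] == current["jobID"]
    # Without a job ID only the (vmId, diskId) pair can be compared.
    return ((previous["vmId"], previous["diskId"]) ==
            (current["vmId"], current["diskId"]))


# Two sequential jobs on the same disk, seen in consecutive polls.
first_poll = {"vmId": "vm1", "diskId": "disk1", "jobID": "job-a"}
second_poll = {"vmId": "vm1", "diskId": "disk1", "jobID": "job-b"}

looks_same_without_id = same_job(first_poll, second_poll, with_job_id=False)
looks_same_with_id = same_job(first_poll, second_poll, with_job_id=True)
```

Without the ID the two polls are indistinguishable, so a finished job followed by a new one looks like a single job that is still running.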

Why would Engine ever want to initiate a new live merge of a (vmId,diskId) before it has a conclusive success/failure result of the previous attempt? As far as I understand, this should never happen, and it's actually good for the API to force avoidance of such a case.

...
Additionally, engine-supplied UUIDs are part of a developing framework for next-generation async tasks. Engine prefers to use a single identifier to represent any kind of task (rather than some problem domain specific combination of UUIDs). Adhering to this rule will help us to converge on a single implementation of ng async tasks moving forward.

I do not think that having a (virtual) table of task_id -> vmId,diskId in Vdsm is much simpler than having it on the Engine machine.

It needs to go somewhere. As the designers of the API we felt it would be better for vdsm to hide the semantics of when a vmId,diskId tuple can be considered a unique identifier. If we ever do generalize the concept of a transient task to other users (setupNetworks, etc) it would be a far more consumable API if engine didn't need to handle a bunch of special cases about what constitutes a "job ID" and the specifics of its lifetime. UUIDs are simple and already well-supported. Why make it more difficult than it has to be?


Adam Litke

1:29 p.m.

New subject: vdsm tasks API design discussion

On 05/05/14 12:29 -0400, Saggi Mizrahi wrote:

...

I'll try and sum up my responses here:

I would prefer if you responded inline since it helps you respond specifically to individual points and also helps others follow along with the conversation.

...

About things being closed for discussion: This stuff has been put up for discussion multiple times over more than a year. There comes a time when a decision has to be made.

Yep, and then you write code, submit patches, and get them reviewed. If people don't like the way you did it, you have to go back to the design phase and try again. I don't care if you had some discussions with other people a year ago. I am paying attention now and have some concerns with your approach.

...

Unless I get compelling enough arguments (ie. an actual use case) I will have to keep things as they are.

As for task ID format not being enforced: This was also up for discussion ages ago.

This is not a valid reason to ignore further discussion.

...

We already had issues with useless validations causing us problems over time.

As I said, I don't care if the engine enforces it to be a UUID. As long as the only argument is "I don't want people that interface VDSM to do stupid things" my response is still going to be "I am not here to make sure other projects are doing their job correctly".

I'm having trouble hearing you shouting from so high up in your infra silo. Last I checked we are working on the same project. I still haven't heard a good reason why a UUID isn't enough here. I'm not asking vdsm to validate a string with the 'uuid' python module. I am just asking that it be treated like other uuids in the API (ie. it's documented as a UUID in the schema or elsewhere and it is followed by convention).

...

So unless I get a valid real world reason to reopen that discussion I'm going to leave things as they are.

It seems to me that the discussion is open now whether you like it or not :) Just because you disagree with me does not make my argument invalid. Maybe others have an opinion on this too.

...

I will be adding length limitation to the ID to prevent abuse.

I thought you were against useless validations.

...

Now for the meaty bit:

The main reason against having job control is that it doesn't work on a clustered message based system.

To be clear I am not asking for job control. I am asking for a very simple form of job monitoring. Vdsm can tell me whether my named operation is still running.

...

You need to keep 3 things in mind: 1. Messages can get lost (either requests or responses)

Indeed. If my request for the list of jobs gets lost or times out I'll simply ask again. I assume jobs have not changed from running to stopped during a time of lost connectivity.

...

2. Messages can arrive out of order.

Yep, this is handled by the protocol in the form of message ids. libvirt and qemu are doing this with qmp today. It's not related to the reasons I want a list of running jobs that are indexed by engine-supplied UUIDs.

...

3. The same message can arrive multiple times.

Again, fixed by message IDs and unrelated to the meat of our disagreement.
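As a rough illustration of how message IDs deal with reordering and duplicates at the protocol level, here is a sketch assuming a hypothetical `ResponseMatcher`; it is not the json-rpc or QMP implementation.

```python
class ResponseMatcher:
    """Hypothetical matcher that pairs responses with requests by message ID."""

    def __init__(self):
        self._pending = {}   # msg_id -> request still awaiting a response

    def send(self, msg_id, request):
        self._pending[msg_id] = request

    def receive(self, msg_id, response):
        # Responses may arrive in any order; the ID finds the right request.
        request = self._pending.pop(msg_id, None)
        if request is None:
            # Duplicate or unknown response: drop it.
            return None
        return (request, response)


matcher = ResponseMatcher()
matcher.send(1, "getStats")
matcher.send(2, "list")
out_of_order = matcher.receive(2, "ok-2")   # response 2 arrives first
duplicate = matcher.receive(2, "ok-2")      # same response arrives again
in_order = matcher.receive(1, "ok-1")
```

This only covers matching and de-duplication; lost messages still have to be handled by retrying, which is the part of the argument about job listing.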

...

You can try and minimize 3 in the actual messaging layer but 1 and 2 have to be handled by the actual algorithms.

Let's take a general request. Using getAllRunningJobs() I have no way of knowing if:

1. The message is still in transit.
2. The request is already over (thus was cleared).
3. VDSM crashed.
4. Response was sent but never arrived.
5. Any combination of the cases above.

You either got a response or didn't. If you got a response, handle it. If not, then assume that no jobs have transitioned states (running to stopped) until you get confirmation. This is how live merge will be implemented. Once you know a job does not exist, then you can handle resolution of it.

...

That is what is important about having idempotency or having things be bound to entities.

Binding things to an entity gives you the information you need to resolve those issues.

Without that you will find it very hard to know what exactly is going on.

I think you're arguing about resolving success or failure which I believe we agree on. In the live merge case, we simply wait for proof from vdsm that the job is no longer running (job UUID absent from vm stats). At that point we query the volume chain of the given vm disk to determine the actual result.
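The monitoring flow described here could look roughly like the sketch below. The function, the `"vmJobs"` key, and the comparison against an expected volume chain are assumptions for illustration, not the actual live merge code.

```python
def resolve_live_merge(job_id, poll_vm_stats, get_volume_chain,
                       expected_chain, max_polls):
    """Wait for proof that a job is gone, then judge it by the volume chain."""
    for _ in range(max_polls):
        stats = poll_vm_stats()
        if stats is None:
            # Lost or timed-out response: assume the job is still running.
            continue
        if job_id in stats.get("vmJobs", {}):
            # Job UUID still reported: nothing to resolve yet.
            continue
        # Job UUID absent from vm stats: inspect the resulting volume chain.
        if get_volume_chain() == expected_chain:
            return "succeeded"
        return "failed"
    return "unknown"


# Simulated poll results: a lost response, a running job, then job absent.
responses = iter([None, {"vmJobs": {"j1": {}}}, {"vmJobs": {}}])
outcome = resolve_live_merge("j1", lambda: next(responses),
                             lambda: ["base"], ["base"], max_polls=5)
```

Note that a missing response never counts as evidence that the job stopped; only a received response without the job UUID does.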

...

For setupNetworks() I would just make it idempotent. You just send the same command again with the same end state. It would be a no-op if the current network topology is what it's supposed to be.

no-op is a bit generous. It will still need to check everything to make sure it is actually consistent. This probably involves invoking quite a few external commands to query interfaces, routes, etc. It would be nice if engine could wait until a jobID disappeared before checking a single time if the state is correct. Otherwise it will be very busy at each polling interval checking everything over and over.

...

You could also have a special field containing the "version" of the configuration (I would make it a hash or a UUID and not a running number) that you would persist locally on the host after you finished configuring since the local host is the scope of setupNetworks().

Hmm, interesting. It would save time and effort on scanning network properties. But you are introducing the persistence of task end-state. I thought this was something we are trying to avoid.
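For reference, the "version" idea under discussion might be sketched like this. `FakeHost`, `config_version`, and the configuration layout are hypothetical; the in-memory attribute stands in for whatever would be persisted locally on the host.

```python
import hashlib
import json


def config_version(config):
    """Derive a stable hash of the requested end state (an assumption)."""
    canonical = json.dumps(config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


class FakeHost:
    """Hypothetical host that remembers the last applied config version."""

    def __init__(self):
        self.applied_version = None   # stands in for locally persisted state
        self.apply_count = 0          # counts real (non no-op) applications

    def setup_networks(self, config):
        version = config_version(config)
        if version == self.applied_version:
            # Same end state already applied: cheap no-op, no rescanning.
            return version
        self.apply_count += 1         # placeholder for real configuration
        self.applied_version = version
        return version


host = FakeHost()
wanted = {"ovirtmgmt": {"nic": "eth0", "mtu": 1500}}
first = host.setup_networks(wanted)
retry = host.setup_networks(dict(wanted))   # resend after a suspected failure
```

Resending the same end state returns the same version without reapplying anything, at the cost of keeping that one piece of end-state on the host.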

...

It would allow you to not care about any of the error states: keep sending the same configuration if you think something bad happened, until the state is what you expect it to be or an error response actually manages to find its way back to you. By using the same task ID you are guaranteed to only have the operation running once at a time.

I don't mind helping anyone with making their algorithms work but there is no escaping from the limitations listed above. If we want to make oVirt truly scalable and robust we have to start thinking about algorithms that work despite errors and not just have error flows.

Agreed. This is what the ngTasks framework is supposed to achieve for us. I think you are conflating the issue of listing active operations and high level flow design. If the async operations that make up a complex flow are themselves idempotent, then we have achieved the above. It can be done with or without a vdsm api to list running jobs.

...

Notice I don't even mention different systems of persistence and some tasks that you should be able to get state information about from more than one host. Some "Jobs" can survive a VDSM restart since they are not in VDSM, like stuff in gluster or QEMU.

Yep, live merge is one such job. While we don't persist the job, we do remember that it was running so we can synchronize our state with the underlying hypervisor when we restart.

...

To make it clear, the task API shouldn't really be that useful. Task IDs are just there to match requests to responses internally because, as I explained, jobs are hard to manage generally in such a system.

...

This in no way means that if we see a use case emerging that requires some sort of infra we would not do it. I just think it would probably be tied to some common algorithm or idiom rather than something truly generic used by every API call.

Maybe we are talking about two different things that cannot be combined. All I want is a generic way to list ongoing host-level operations that will be useful for live merge and others. If all you want is a protocol synchronization mechanism in the style of QMP then that is different. Perhaps we need both. I'll be happy to keep the jobID as a formal API parameter and other new APIs that spawn long-running operations could do the same. Then whatever token you want to pass on the wire does not matter to me at all.

...

Hope I made things clearer, sorry if I came out a bit rude. I'm off, I have my country's birthday to celebrate.

Thanks for participating in the discussion. In the end we will end up with superior code than if we had not had this discussion. Happy Yom HaAtzmaut! -- Adam Litke
