[ovirt-devel] vdsm tasks API design discussion

Mon May 5 14:38:10 UTC 2014

On 04/05/14 10:24 -0400, Saggi Mizrahi wrote:
>The thread became a bit too long for me to follow who said what when.
>
>So I'll just say how things are going to work:

This is not a very good way to frame a discussion.  The above sentence
suggests that you are completely closed off to new ideas or working
together as a community.  -1.

>VDSM is going to have a a non disk persistent tasks.
>The only reason I'm even giving a way to list running
>tasks is so that legacy operations can use it.

Have you checked with your stakeholders to make sure this limitation
is okay for the future intended use cases?  I can say for sure that
you haven't asked those of us who are working on live merge.

>New verbs should not rely on the task ID. You should put
>the status on an object so the lifetime of the job is
>bound to that object and have the actual commands
>return quickly. The Task-ID is used internally to match
>requests with responses. In an ideal world without BC
>I wouldn't even have verbs to query for running tasks.
>
>You should poll the object or, in the future, use events
>to report progress.

While this approach sounds pretty nice at first glance I doubt that
these task semantics will remain simple for long.  You are asking
every API that implements an asynchronous operation to define its own
"contract" for what it means to have a job still running.  This means
engine will need to account for corner cases such as vdsm
crash/restart in different ways for each verb implemented.  For
example, the setupNetworks verb would (probably?) be aborted by a vdsm
crash/restart but a live merge job would not (since it is tied to the
qemu process).

It would be far simpler to have vdsm return a simple list of job ids
(potentially with abstracted cursor information).  Vdsm knows the
details about whether operations are still running and can provide
that pretty easily.

>As an example, instead of having
>startVM() return when the VM is up.
>You should have it return when VDSM has the VM
>object (in some map) and you should poll the
>status of the VM through the VM ID.

Right, I think we do that today.  It's pretty easy to manage an async
object creation because you can think of the object like a long
running job and vdsm provides a list jobs verb (list).

>Instead of having copyImage() return when the
>copy is complete it should return when the
>all the metadata was created and persisted
>on the target image so that you can track
>the target image instead of the task.

Yep, another simple case because you are clearly creating an object
and you have a list-object verb.  Let's try a more difficult case:
setupNetworks.  How would that one work (and please provide some
details -- it's important)?  Does engine need to duplicate the network
model and query every single aspect of all interfaces (down to the mtu
setting) in order to ensure that everything was set as it should be?

>Also, you should try and make your commands idempotent so
>to simplify the flows further.

Good idea, but not always practical.

>Even though the job idiom appears to be simpler it is harder
>to manage in a clustered environment as tracking job IDs is
>much harder to coordinate than state changes.

I'm not ready to paint with such a broad brush.  State machines can be
really complex.  You may end up exposing too much vdsm internal
information in order to give engine enough context to understand state
changes.

>As for Task IDs being UUIDs. As I said before. VDSM will not
>enforce IDs given from the engine to be UUIDs they will be
>treated as opaque strings. There is no reason to validate that
>they are UUIDs in VDSM. It's adding a limitation on the API
>for no reason.

Again, you're writing as if the debate is closed.  The purpose of a
task ID is to correlate an initial request with some future
correspondence (either an out of order return value or in the state of
another object).  By allowing it to be free-form you are just begging
for it to be abused in the future.  This reminds me of 'specParams',
'customProperties' and the 'options' free-form dictionary parameters
that some vdsm APIs have.

There should be no reason why taskID should be used to trojan-horse
some contextual data into vdsm.

>TaskID in the HTTP header will only work for storage verbs.
>All other subsystems will have to move to the json-rpc to
>get that. Seeing as json-rpc in VDSM is targeted for 3.5
>it shouldn't be that much of an issue.

Live merge is a really important feature for 3.5.  I'm hesitant to depend on
jsonrpc unless we are absolutely certain that it will be ready to go
in time for us to integrate with it.  Also, we need to understand the
upgrade/BC characteristics of jsonrpc because they will suddenly
impact live merge as well.  I'd specifically like to hear what Allon
thinks about adding this dependency.

I am not sure you're designing a tasks API that is going to be
generally useful.  It seems I will need to work around the missing
features in your design be adding extra complexity to my feature.
Isn't this what we're trying to avoid?

>
>----- Original Message -----
>> From: "Adam Litke" <alitke at redhat.com>
>> To: "Dan Kenigsberg" <danken at redhat.com>
>> Cc: smizrahi at redhat.com, "ybronhei" <ybronhei at redhat.com>, devel at ovirt.org
>> Sent: Thursday, May 1, 2014 8:28:14 PM
>> Subject: Re: [ovirt-devel] short recap of last vdsm call (15.4.2014)
>>
>> On 01/05/14 17:53 +0100, Dan Kenigsberg wrote:
>> >On Wed, Apr 30, 2014 at 01:26:18PM -0400, Adam Litke wrote:
>> >> On 30/04/14 14:22 +0100, Dan Kenigsberg wrote:
>> >> >On Tue, Apr 22, 2014 at 02:54:29PM +0300, ybronhei wrote:
>> >> >>hey,
>> >> >>
>> >> >>somehow we missed the summary of this call, and few "big" issues
>> >> >>were raised there. so i would like to share it with all and hear
>> >> >>more comments
>> >> >>
>> >> >>- task id in http header - allows engine to initiate calls with id
>> >> >>instead of following vdsm response - federico already started this
>> >> >>work, and this is mandatory for live merge feature afaiu.
>> >> >
>> >> >Adam, Federico, may I revisit this question from another angle?
>> >> >
>> >> >Why does Vdsm needs to know live-merge's task id?
>> >> >As far as I understand, (vmid, disk id) are enough to identify a live
>> >> >merge process.
>> >>
>> >> A vmId + diskId can uniquely identify a block job at a single moment
>> >> in time since qemu guarantees that only a single block job can run at
>> >> any given point in time.  But this gives us no way to differentiate
>> >> two sequential jobs that run on the same disk.  Therefore, without
>> >> having an engine-supplied jobID, we can never be sure if a one job
>> >> finished and another started since the last time we polled stats.
>> >
>> >Why would Engine ever want to initiate a new live merge of a
>> >(vmId,diskId) before it has a conclusive result of the previous
>> >success/failure of the previous attempt? As far as I understand, this
>> >should never happen, and it's actually good for the API to force
>> >avoidence of such a case.
>> >
>> >> Additionally, engine-supplied UUIDs is part of a developing framework
>> >> for next-generation async tasks.  Engine prefers to use a single
>> >> identifier to represent any kind of task (rather than some problem
>> >> domain specific combination of UUIDs).  Adhering to this rule will
>> >> help us to converge on a single implementation of ng async tasks
>> >> moving forward.
>> >
>> >I do not think that having a (virtual) table of
>> >    task_id -> vmId,diskId
>> >in Vdsm is much simpler than having it on the Engine machine.
>>
>> It needs to go somewhere.  As the designers of the API we felt it
>> would be better for vdsm to hide the semantics of when a vmId,diskId
>> tuple can be considered a unique identifier.  If we ever do generalize
>> the concept of a transient task to other users (setupNetworks, etc) it
>> would be a far more consumable API if engine didn't need to handle a
>> bunch of special cases about what constitutes a "job ID" and the
>> specifics of its lifetime.  UUIDs are simple and already
>> well-supported.  Why make it more difficult than it has to be?
>>
>> >
>> >I still find the nothion of a new framework for async tasks quite
>> >useful. But as I requested before, I think we should design it first,
>> >so it fits all conceivable users. In particular, if we should not tie it
>> >to the existence of a running VM. We'd better settle on persistence
>> >semantics that works for everybody (such as network tasks).
>> >
>> >Last time, the idea was struck down by Saggi and others from infra, who
>> >are afraid to repeat mistakes from the current task framework.
>>
>> Several famous quotes apply here.  The only thing we have to fear is
>> fear itself :)  Sometimes perfect is the enemy of good.  Tasks
>> redesign was always going to be driven by the need to implement one
>> feature at first.  It just so happens that we volunteered to take a
>> stab at it for live merge.  It's clear that we won't be able to
>> completely replace the old tasks and get this feature out in one pass.
>> We believe the general principles of our tasks are generally
>> extensible to cover new use cases in the future:
>>
>>  * Jobs are given an engine-supplied UUID when started
>>  * There is a well-known way to check if a job is running or not
>>  * There is a well-known way to test if a finished job succeeded or
>>    failed.
>>
>> I believe we did spend quite a bit of time in March coming up with a
>> design for NG tasks.
>>
>> Unfortunately it was infra who made our jobs vm-specific by requiring
>> the job status to be passed by getVMStats rather than an
>> object-agnostic getJobsStatus stand-alone API that could conglomerate
>> all job types into a single response.
>>
>> >>
>> >> >
>> >> >If we do not have a task id, we do not need to worry on how to pass it,
>> >> >and where to persist it.
>> >>
>> >> There are at least 3 reasons to persist a block job ID:
>> >> * To associate a specific block job operation with a specific
>> >>  engine-initiated flow.
>> >> * So that you can clean up after a job that completed when vdsm could
>> >>  not receive the completion event.
>> >
>> >But if Vdsm dies before it managed to clean up, Engine would have to
>> >perform the cleanup via another host. So having this short-loop cleanup
>> >is redundant.
>> >
>>
>> Fair enough.  We'll be doing the volume chain scan for every native VM
>> disk at VM startup.  The only exception is if we are recovering and
>> the running VM's recovery file does not show any outstanding block
>> jobs.  In that case we have definitive information that a rescan will
>> not be required.
>>
>> >> * Since we must ask libvirt about block job events on a per VM, per
>> >>  disk basis, tracking the devices on which we expect block jobs
>> >>  enables us to eliminate wasteful calls to libvirt.
>> >
>> >This can be done by an in-memory cache.
>>
>> Sure, but then you miss out on the other benefits I've enumerated.
>>
>> >
>> >>
>> >> Hope this makes the rationale a bit clearer...
>> >
>> >Yes, but I am not yet convinced...
>> >
>>
>> --
>> Adam Litke
>>

-- 
Adam Litke