[Engine-devel] RFD: API: Identifying vdsm objects in the next-gen API

Adam Litke agl at us.ibm.com
Mon Dec 3 22:55:29 UTC 2012


On Mon, Dec 03, 2012 at 03:57:42PM -0500, Saggi Mizrahi wrote:
> 
> 
> ----- Original Message -----
> > From: "Adam Litke" <agl at us.ibm.com> To: "Saggi Mizrahi"
> > <smizrahi at redhat.com> Cc: engine-devel at linode01.ovirt.org, "Dan Kenigsberg"
> > <danken at redhat.com>, "Federico Simoncelli" <fsimonce at redhat.com>, "Ayal
> > Baron" <abaron at redhat.com>, vdsm-devel at lists.fedorahosted.org Sent: Monday,
> > December 3, 2012 3:30:21 PM Subject: Re: RFD: API: Identifying vdsm objects
> > in the next-gen API
> > 
> > On Thu, Nov 29, 2012 at 05:59:09PM -0500, Saggi Mizrahi wrote:
> > > 
> > > 
> > > ----- Original Message -----
> > > > From: "Adam Litke" <agl at us.ibm.com> To: "Saggi Mizrahi"
> > > > <smizrahi at redhat.com> Cc: engine-devel at linode01.ovirt.org, "Dan
> > > > Kenigsberg" <danken at redhat.com>, "Federico Simoncelli"
> > > > <fsimonce at redhat.com>, "Ayal Baron" <abaron at redhat.com>,
> > > > vdsm-devel at lists.fedorahosted.org Sent: Thursday, November 29, 2012
> > > > 5:22:43 PM Subject: Re: RFD: API: Identifying vdsm objects in the
> > > > next-gen API
> > > > 
> > > > On Thu, Nov 29, 2012 at 04:52:14PM -0500, Saggi Mizrahi wrote:
> > > > > They are not future proof as the paradigm is completely different.
> > > > > Storage domain IDs are not static any more (and are not guaranteed to
> > > > > be unique or the same across the cluster.  Image IDs represent the ID
> > > > > of the projected data and not the actual unique path.  Just as an
> > > > > example, to run a VM you give a list of domains that might contain the
> > > > > needed images in the chain and the image ID of the tip.  The paradigm
> > > > > is changed to and most calls get non synchronous number of images and
> > > > > domains.  Further more, the APIs themselves are completely different.
> > > > > So future proofing is not really an issue.
> > > > 
> > > > I don't understand this at all.  Perhaps we could all use some education
> > > > on the architecture of the planned architectural changes.  If I can pass
> > > > an arbitrary list of domainIDs that _might_ contain the data, why
> > > > wouldn't I just pass all of them every time?  In that case, why are they
> > > > even required since vdsm would have to search anyway?
> > > It's for optimization mostly, the engine usually has a good idea of where
> > > stuff are, having it give hints to VDSM can speed up the search process.
> > > also, then engines knows how transient some storage pieces are. If you
> > > have a domain that is only there for backup or "owned" by another manager
> > > sharing the host, you don't want you VMs using the disks that are on that
> > > storage effectively preventing it from being removed (though we do have
> > > plans to have qemu switch base snapshots at runtime for just that).
> > 
> > This is not a clean design.  If the search is slow, then maybe we need to
> > improve caching internally.  Making a client cache a bunch of internal IDs
> > to pass around sounds like a complete layering violation to me.
> You can't cache this, if the same template exists on an 2 different NFS
> domains only the engine has enough information to know which you should use.
> We only have the engine give us thing information when starting a VM or
> merging\copying an image that resides on multiple domains.  It is also
> completely optional. I didn't like it either.

Is it even valid for the same template (with identical uuids) to exist in two
places?  I thought uuids aren't supposed to collide.  I can envision some
scenario where a cached storagedomain/storagepool relationship becomes invalid
because another user detached the storagedomain.  In that case, the API just
returns the normal error about "sd XXX is not attached to sp XXX".  So I don't
see any problem here.

> > 
> > > > 
> > > > > As to making the current API a bit simpler. As I said, making them
> > > > > opaque is problematic as currently the engine is responsible for
> > > > > creating the IDs.
> > > > 
> > > > As I mentioned in my last post, engine still can specify the ID's when
> > > > the object is first created.  From that point forward the ID never
> > > > changes so it can be baked into the identifier.
> > > Where will this identifier be persisted?
> > > > 
> > > > > Further more, some calls require you to play with these (making a
> > > > > template instead of a snapshot).  Also, the full chain and topology
> > > > > needs to be completely visible to the engine.
> > > > 
> > > > Please provide a specific example of how you play with the IDs.  I can
> > > > guess where you are going, but I don't want to divert the thread.
> > > The relationship between volumes and images is deceptive at the moment.
> > > IMG is the chain and volume is a member, IMGUUID is only used to for
> > > verification and to detect when we hit a template going up the chain.
> > > When you do operation on images assumptions are being guaranteed about the
> > > resulting IDs.  When you copy an image, you assume to know all the new IDs
> > > as they remain the same.  With your method I can't tell what the new
> > > "opaque" result is going to be.  Preview mode (another abomination being
> > > deprecated) relies on the disconnect between imgUUID and volUUID.  Live
> > > migration currently moves a lot of the responsibility to the engine.
> > 
> > No client should need to know about all of these internal details.  I
> > understand that's the way it is today, and that's one of the main reasons
> > that the API is a complete pain to use.
> You are correct but this is how this API was designed you can't get away from
> that.

We are trying to fix the problems on the second try.  This will inevitably cause
a little discomfort in the short term but will be worth it in the long run.

> > 
> > > > 
> > > > > These things, as you said, are problematic. But this is the way things
> > > > > are today.
> > > > 
> > > > We are changing them.
> > > Any intermediary step is needlessly problematic for existing clients.
> > > Work is already in progress for fixing the API properly, making some calls
> > > a bit nicer isn't an excuse to start making more compatibility code in the
> > > engine.
> > 
> > The engine won't need compatibility code.  This only would impact the
> > jsonrpc bindings which aren't used by engine yet.  When engine switches
> > over, then yes it would need to adapt.
> This means that you put it as a condition to adapting the json-rpc base API
> which will postpone engine adoption significantly.

No one has said the transition to json-rpc would be effortless.  We're not
switching protocols merely out of disdain for xmlrpc.  We are planning some
structural changes to the API to improve usability for everyone.

> > 
> > > > 
> > > > > As for task IDs.  Currently task IDs are only used for storage and
> > > > > they get persisted to disk. This is WRONG and is not the case with the
> > > > > new storage API.  Because we moved to an asynchronous message based
> > > > > protocol (json-rpc over TCP\AMQP) there is no need to generate a task
> > > > > ID. it is built in to json-rpc.  json-rpc specifies that the IDs have
> > > > > to be unique for a client as long as the request is still active.
> > > > > This is good enough as internally we can have a verb for a client to
> > > > > query it's own running tasks and a verb to query other host tasks by
> > > > > mangling in the client before the ID.  Because the protocol is
> > > > 
> > > > So this would rely on the client keeping the connection open and as soon
> > > > as it disconnects it would lose the ability to query tasks from before
> > > > the connection went down?  I don't know if it's a good idea to conflate
> > > > message ID's with task ID's.  While the protocol can operate
> > > > asynchronously, some calls have synchronous semantics and others have
> > > > asynchronous semantics.  I would expect sync calls to return their data
> > > > immediately and async calls to return immediately with either: an error
> > > > code, or an 'operation started' message and associated ID for querying
> > > > the status of the operation.
> > > Upon reflection I agree that having the request ID unique per client is
> > > problematic and we need to make sure they are unique per host at every
> > > point in time.
> > > > 
> > > > > asynchronous all calls are asynchronous by nature well.  Tasks will no
> > > > > longer be persisted or expected to be persisted. It's the callers
> > > > > responsibility to query the state and see if the operation succeeded
> > > > > or failed if the caller or VDSM died in the middle of the call.  The
> > > > > current "cleanTask()" system can't be used when more then one client
> > > > > is using VDSM and will not be used for anything other then legacy
> > > > > storage.
> > > > 
> > > > I agree about not persisting tasks in the future.  Although I think
> > > > finished tasks should remain in memory for some time so they can be
> > > > queried by a client who must reconnect.
> > > I am completely against keeping the task for a nominal amount of time, it
> > > just makes another flow.  You need to have code that makes up in case you
> > > missed that window any way then just have one recovery code path, when
> > > VDSM looses you task or you lose VDSM recover immediately.  Also, because
> > > task IDs can be reused once they expire assuming that the task you
> > > encountered is the same task you originally sent is problematic.
> > > 
> > > If you expect intermittent connections use the AMQP backend (which will
> > > support broker-less p2p communication as well)
> > 
> > HOw will you tell the difference between a completed task and an invalid (no
> > such task) task?  Do all completed tasks just issue a task completed event?
> When task complete they return a value (and might even send a task completed
> even).  If you lost that task you will just have to handle that by inspecting
> the current state.  As I said, you will have to do that anyway if you put a
> "sleep" on finished task and you missed the window or you are tracking a task
> that is owned by another VDSM user.  If we have to handle crash situations
> might as well be crash only. http://en.wikipedia.org/wiki/Crash-only_software
> > 
> > > > 
> > > > > AFAIK Apart from storage all objects IDs are constructed with a single
> > > > > ID, name or alias. VMs, storageConnections, network interfaces. So
> > > > > it's not a real issue.  I agree that in the future we should keep the
> > > > > idiom of pass configuration once, name it, and keep using the name to
> > > > > reference the object.
> > > > 
> > > > Yes, storage is the major problem here.
> > > And, as I said, changing the API is problematic for migration of current
> > > users.
> > > > 
> > 
> > -- Adam Litke <agl at us.ibm.com> IBM Linux Technology Center
> > 
> > 
> 

-- 
Adam Litke <agl at us.ibm.com>
IBM Linux Technology Center



