[Engine-devel] RFD: API: Identifying vdsm objects in the next-gen API

Saggi Mizrahi smizrahi at redhat.com
Mon Dec 3 20:57:42 UTC 2012



----- Original Message -----
> From: "Adam Litke" <agl at us.ibm.com>
> To: "Saggi Mizrahi" <smizrahi at redhat.com>
> Cc: engine-devel at linode01.ovirt.org, "Dan Kenigsberg" <danken at redhat.com>, "Federico Simoncelli"
> <fsimonce at redhat.com>, "Ayal Baron" <abaron at redhat.com>, vdsm-devel at lists.fedorahosted.org
> Sent: Monday, December 3, 2012 3:30:21 PM
> Subject: Re: RFD: API: Identifying vdsm objects in the next-gen API
> 
> On Thu, Nov 29, 2012 at 05:59:09PM -0500, Saggi Mizrahi wrote:
> > 
> > 
> > ----- Original Message -----
> > > From: "Adam Litke" <agl at us.ibm.com> To: "Saggi Mizrahi"
> > > <smizrahi at redhat.com> Cc: engine-devel at linode01.ovirt.org, "Dan
> > > Kenigsberg"
> > > <danken at redhat.com>, "Federico Simoncelli" <fsimonce at redhat.com>,
> > > "Ayal
> > > Baron" <abaron at redhat.com>, vdsm-devel at lists.fedorahosted.org
> > > Sent:
> > > Thursday, November 29, 2012 5:22:43 PM Subject: Re: RFD: API:
> > > Identifying
> > > vdsm objects in the next-gen API
> > > 
> > > On Thu, Nov 29, 2012 at 04:52:14PM -0500, Saggi Mizrahi wrote:
> > > > They are not future-proof, as the paradigm is completely different.
> > > > Storage domain IDs are no longer static (and are not guaranteed to be
> > > > unique or the same across the cluster).  Image IDs represent the ID of
> > > > the projected data and not the actual unique path.  Just as an example,
> > > > to run a VM you give a list of domains that might contain the needed
> > > > images in the chain and the image ID of the tip.  The paradigm has
> > > > changed, and most calls now take a variable number of images and
> > > > domains.  Furthermore, the APIs themselves are completely different,
> > > > so future-proofing is not really an issue.
> > > 
> > > I don't understand this at all.  Perhaps we could all use some education
> > > on the planned architectural changes.  If I can pass an arbitrary list of
> > > domainIDs that _might_ contain the data, why wouldn't I just pass all of
> > > them every time?  In that case, why are they even required, since vdsm
> > > would have to search anyway?
> > It's mostly for optimization; the engine usually has a good idea of where
> > things are, and having it give hints to VDSM can speed up the search
> > process.  Also, the engine knows how transient some storage pieces are.
> > If you have a domain that is only there for backup, or is "owned" by
> > another manager sharing the host, you don't want your VMs using the disks
> > on that storage and effectively preventing it from being removed (though
> > we do have plans to have qemu switch base snapshots at runtime for just
> > that).
> 
> This is not a clean design.  If the search is slow, then maybe we need to
> improve caching internally.  Making a client cache a bunch of internal IDs
> to pass around sounds like a complete layering violation to me.
You can't cache this; if the same template exists on 2 different NFS domains, only the engine has enough information to know which one you should use.
We only have the engine give us this information when starting a VM or merging/copying an image that resides on multiple domains.
It is also completely optional. I didn't like it either.
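
To make the "hints" idea concrete, here is a minimal sketch of what such a call could look like; the verb name, parameters and ID values are purely illustrative, not the actual next-gen API:

    # Hypothetical json-rpc request (shown as a Python dict) for starting a VM.
    # The engine passes a list of storage domains that *might* contain the
    # image chain; the list is only an optimization hint, VDSM may still search.
    request = {
        "jsonrpc": "2.0",
        "id": "req-42",
        "method": "VM.create",  # illustrative verb name
        "params": {
            "vmID": "f1e2d3c4-0000-0000-0000-000000000001",
            "drives": [{
                # ID of the tip of the image chain
                "imageID": "9a8b7c6d-0000-0000-0000-000000000002",
                # optional hint: domains likely to hold members of the chain
                "domainHints": ["nfs-domain-a", "iscsi-domain-b"],
            }],
        },
    }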
> 
> > > 
> > > > As for making the current API a bit simpler: as I said, making them
> > > > opaque is problematic, as the engine is currently responsible for
> > > > creating the IDs.
> > > 
> > > As I mentioned in my last post, the engine can still specify the IDs
> > > when the object is first created.  From that point forward the ID never
> > > changes, so it can be baked into the identifier.
> > Where will this identifier be persisted?
> > > 
> > > > Furthermore, some calls require you to play with these (making a
> > > > template instead of a snapshot).  Also, the full chain and topology
> > > > need to be completely visible to the engine.
> > > 
> > > Please provide a specific example of how you play with the IDs.
> > >  I can guess
> > > where you are going, but I don't want to divert the thread.
> > The relationship between volumes and images is deceptive at the moment.
> > An image (IMG) is the chain and a volume is a member of it; IMGUUID is
> > only used for verification and to detect when we hit a template going up
> > the chain.  When you do operations on images, guarantees are made about
> > the resulting IDs.  When you copy an image, you can assume you know all
> > the new IDs, as they remain the same.  With your method I can't tell what
> > the new "opaque" result is going to be.  Preview mode (another abomination
> > being deprecated) relies on the disconnect between imgUUID and volUUID.
> > Live migration currently moves a lot of the responsibility to the engine.
> 
> No client should need to know about all of these internal details.  I
> understand that's the way it is today, and that's one of the main reasons
> that the API is a complete pain to use.
You are correct, but this is how this API was designed; you can't get away from that.
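
For readers less familiar with the current layout, here is a minimal sketch of the image/volume relationship described above (class and field names are illustrative, not actual vdsm code):

    # Today an "image" is effectively a chain of volumes.  Each volume carries
    # the image UUID, which is used for verification and to detect when walking
    # up the chain crosses into a template.
    class Volume:
        def __init__(self, volUUID, imgUUID, parent=None):
            self.volUUID = volUUID  # unique per volume
            self.imgUUID = imgUUID  # identifies the chain this volume belongs to
            self.parent = parent    # base volume, a template volume, or None

    # Copying an image is assumed to preserve all the IDs; only the containing
    # storage domain changes, so the engine can predict the resulting IDs.
    def copy_image(chain, dst_domain):
        return dst_domain, [Volume(v.volUUID, v.imgUUID, v.parent) for v in chain]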
> 
> > > 
> > > > These things, as you said, are problematic. But this is the way
> > > > things are
> > > > today.
> > > 
> > > We are changing them.
> > Any intermediate step is needlessly problematic for existing clients.
> > Work is already in progress on fixing the API properly; making some calls
> > a bit nicer isn't an excuse to start adding more compatibility code to the
> > engine.
> 
> The engine won't need compatibility code.  This would only impact the
> jsonrpc bindings, which aren't used by the engine yet.  When the engine
> switches over, then yes, it would need to adapt.
This means you are making it a precondition for adopting the json-rpc based API, which will postpone engine adoption significantly.
> 
> > > 
> > > > As for task IDs: currently task IDs are only used for storage, and
> > > > they get persisted to disk.  This is WRONG and is not the case with
> > > > the new storage API.  Because we moved to an asynchronous message
> > > > based protocol (json-rpc over TCP/AMQP) there is no need to generate a
> > > > task ID; it is built into json-rpc.  json-rpc specifies that the IDs
> > > > have to be unique per client as long as the request is still active.
> > > > This is good enough, as internally we can have a verb for a client to
> > > > query its own running tasks and a verb to query other hosts' tasks by
> > > > mangling the client identifier in before the ID.  Because the protocol
> > > > is
> > > 
> > > So this would rely on the client keeping the connection open, and as
> > > soon as it disconnects it would lose the ability to query tasks from
> > > before the connection went down?  I don't know if it's a good idea to
> > > conflate message IDs with task IDs.  While the protocol can operate
> > > asynchronously, some calls have synchronous semantics and others have
> > > asynchronous semantics.  I would expect sync calls to return their data
> > > immediately, and async calls to return immediately with either an error
> > > code or an 'operation started' message and an associated ID for querying
> > > the status of the operation.
> > Upon reflection I agree that having the request ID unique per
> > client is
> > problematic and we need to make sure they are unique per host at
> > every point
> > in time.
> > > 
> > > > asynchronous, all calls are asynchronous by nature as well.  Tasks
> > > > will no longer be persisted or expected to be persisted.  It's the
> > > > caller's responsibility to query the state and see whether the
> > > > operation succeeded or failed if the caller or VDSM died in the middle
> > > > of the call.  The current "cleanTask()" system can't be used when more
> > > > than one client is using VDSM, and will not be used for anything other
> > > > than legacy storage.
> > > 
> > > I agree about not persisting tasks in the future, although I think
> > > finished tasks should remain in memory for some time so they can be
> > > queried by a client who must reconnect.
> > I am completely against keeping the task around for a nominal amount of
> > time; it just adds another flow.  You need code that recovers in case you
> > missed that window anyway, so just have one recovery code path: when VDSM
> > loses your task or you lose VDSM, recover immediately.  Also, because task
> > IDs can be reused once they expire, assuming that the task you encountered
> > is the same task you originally sent is problematic.
> > 
> > If you expect intermittent connections, use the AMQP backend (which will
> > support broker-less p2p communication as well).
> 
> How will you tell the difference between a completed task and an invalid
> (no such task) task?  Do all completed tasks just issue a task-completed
> event?
When tasks complete they return a value (and might even send a task-completed event).
If you lost that task, you will just have to handle it by inspecting the current state.
As I said, you will have to do that anyway if you keep finished tasks around for a while and miss the window, or if you are tracking a task owned by another VDSM user.
If we have to handle crash situations anyway, we might as well be crash-only. http://en.wikipedia.org/wiki/Crash-only_software
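
As a minimal sketch of the crash-only handling described here (all verb and exception names are illustrative, and 'client' stands for any json-rpc client object exposing call(method, **params)):

    class ConnectionLost(Exception):
        pass

    def copy_disk(client, src, dst):
        try:
            # The json-rpc request id is the only "task id", and only while the
            # call is in flight; the result simply comes back as the reply.
            return client.call("Image.copy", src=src, dst=dst)
        except ConnectionLost:
            # Crash-only recovery: there is no persisted task to resurrect.
            # Inspect the current state and either accept the result or redo.
            if client.call("Image.exists", image=dst):
                return {"status": "done"}
            return copy_disk(client, src, dst)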
> 
> > > 
> > > > AFAIK, apart from storage, all object IDs are constructed from a
> > > > single ID, name, or alias: VMs, storageConnections, network
> > > > interfaces.  So it's not a real issue.  I agree that in the future we
> > > > should keep the idiom of passing the configuration once, naming it,
> > > > and using that name to reference the object.
> > > 
> > > Yes, storage is the major problem here.
> > And, as I said, changing the API is problematic for migration of
> > current
> > users.
> > > 
> 
> --
> Adam Litke <agl at us.ibm.com>
> IBM Linux Technology Center
> 
> 
