On Mon, Dec 03, 2012 at 03:57:42PM -0500, Saggi Mizrahi wrote:
----- Original Message -----
> From: "Adam Litke" <agl(a)us.ibm.com> To: "Saggi Mizrahi"
> <smizrahi(a)redhat.com> Cc: engine-devel(a)linode01.ovirt.org, "Dan
Kenigsberg"
> <danken(a)redhat.com>, "Federico Simoncelli"
<fsimonce(a)redhat.com>, "Ayal
> Baron" <abaron(a)redhat.com>, vdsm-devel(a)lists.fedorahosted.org Sent:
Monday,
> December 3, 2012 3:30:21 PM Subject: Re: RFD: API: Identifying vdsm objects
> in the next-gen API
>
> On Thu, Nov 29, 2012 at 05:59:09PM -0500, Saggi Mizrahi wrote:
> >
> >
> > ----- Original Message -----
> > > From: "Adam Litke" <agl(a)us.ibm.com> To: "Saggi
Mizrahi"
> > > <smizrahi(a)redhat.com> Cc: engine-devel(a)linode01.ovirt.org,
"Dan
> > > Kenigsberg" <danken(a)redhat.com>, "Federico
Simoncelli"
> > > <fsimonce(a)redhat.com>, "Ayal Baron"
<abaron(a)redhat.com>,
> > > vdsm-devel(a)lists.fedorahosted.org Sent: Thursday, November 29, 2012
> > > 5:22:43 PM Subject: Re: RFD: API: Identifying vdsm objects in the
> > > next-gen API
> > >
> > > On Thu, Nov 29, 2012 at 04:52:14PM -0500, Saggi Mizrahi wrote:
> > > > They are not future proof, as the paradigm is completely different.
> > > > Storage domain IDs are not static any more (and are not guaranteed to
> > > > be unique or the same across the cluster). Image IDs represent the ID
> > > > of the projected data and not the actual unique path. Just as an
> > > > example, to run a VM you give a list of domains that might contain the
> > > > needed images in the chain and the image ID of the tip. The paradigm
> > > > has changed, and most calls now take varying numbers of images and
> > > > domains that do not necessarily correspond to one another.
> > > > Furthermore, the APIs themselves are completely different. So future
> > > > proofing is not really an issue.
> > >
> > > I don't understand this at all. Perhaps we could all use some education
> > > on the planned architectural changes. If I can pass an arbitrary list
> > > of domainIDs that _might_ contain the data, why wouldn't I just pass
> > > all of them every time? In that case, why are they even required, since
> > > vdsm would have to search anyway?
> > It's for optimization mostly; the engine usually has a good idea of where
> > things are, and having it give hints to VDSM can speed up the search
> > process. Also, the engine knows how transient some storage pieces are.
> > If you have a domain that is only there for backup or is "owned" by
> > another manager sharing the host, you don't want your VMs using the disks
> > that are on that storage, effectively preventing it from being removed
> > (though we do have plans to have qemu switch base snapshots at runtime
> > for just that).
>
> This is not a clean design. If the search is slow, then maybe we need to
> improve caching internally. Making a client cache a bunch of internal IDs
> to pass around sounds like a complete layering violation to me.
You can't cache this; if the same template exists on 2 different NFS domains,
only the engine has enough information to know which one you should use. We
only have the engine give us this information when starting a VM or
merging\copying an image that resides on multiple domains. It is also
completely optional. I didn't like it either.
Is it even valid for the same template (with identical uuids) to exist in two
places? I thought uuids aren't supposed to collide. I can envision some
scenario where a cached storagedomain/storagepool relationship becomes invalid
because another user detached the storagedomain. In that case, the API just
returns the normal error about "sd XXX is not attached to sp XXX". So I don't
see any problem here.
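
For concreteness, the hint-passing described above might look roughly like
the sketch below; the verb name, parameters, and error class are invented
for illustration and are not the actual VDSM API.

    # Hypothetical client-side helper -- names are made up for illustration.
    class StorageDomainNotAttached(Exception):
        """Stand-in for the 'sd XXX is not attached to sp XXX' error."""

    def start_vm(client, vm_id, image_id, candidate_domains):
        """Start a VM, hinting which domains may hold the image chain.

        The hint list is optional: it only narrows VDSM's search, and the
        engine may drop it if its cached view turns out to be stale.
        """
        params = {'vmID': vm_id,
                  'disk': {'imageID': image_id,
                           'domainHints': candidate_domains}}
        try:
            return client.call('VM.start', params)
        except StorageDomainNotAttached:
            # Cached domain/pool relationship went stale; retry without hints.
            params['disk'].pop('domainHints')
            return client.call('VM.start', params)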
>
> > >
> > > > As for making the current API a bit simpler: as I said, making them
> > > > opaque is problematic, as currently the engine is responsible for
> > > > creating the IDs.
> > >
> > > As I mentioned in my last post, the engine can still specify the IDs
> > > when the object is first created. From that point forward the ID never
> > > changes, so it can be baked into the identifier.
> > Where will this identifier be persisted?
> > >
> > > > Furthermore, some calls require you to play with these (making a
> > > > template instead of a snapshot). Also, the full chain and topology
> > > > need to be completely visible to the engine.
> > >
> > > Please provide a specific example of how you play with the IDs. I can
> > > guess where you are going, but I don't want to divert the thread.
> > The relationship between volumes and images is deceptive at the moment.
> > IMG is the chain and a volume is a member; IMGUUID is only used for
> > verification and to detect when we hit a template going up the chain.
> > When you do operations on images, guarantees are made about the resulting
> > IDs. When you copy an image, you already know all the new IDs as they
> > remain the same. With your method I can't tell what the new "opaque"
> > result is going to be. Preview mode (another abomination being
> > deprecated) relies on the disconnect between imgUUID and volUUID. Live
> > migration currently moves a lot of the responsibility to the engine.
>
> No client should need to know about all of these internal details. I
> understand that's the way it is today, and that's one of the main reasons
> that the API is a complete pain to use.
You are correct, but this is how this API was designed; you can't get away
from that.
We are trying to fix the problems on the second try. This will inevitably cause
a little discomfort in the short term but will be worth it in the long run.
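
To illustrate the "baked in" idea, one way an opaque identifier could carry
the engine-chosen IDs is sketched below; the encoding and field names are
assumptions, not a proposal for the real wire format.

    import base64
    import json
    import uuid

    def make_image_handle(sd_uuid, img_uuid, vol_uuid):
        # Pack the IDs the engine chose at creation time into one opaque token.
        blob = json.dumps({'sd': sd_uuid, 'img': img_uuid, 'vol': vol_uuid})
        return base64.urlsafe_b64encode(blob.encode()).decode()

    def parse_image_handle(handle):
        # Server side: recover the IDs without the client ever unpacking them.
        return json.loads(base64.urlsafe_b64decode(handle.encode()).decode())

    # The engine still picks the UUIDs when the object is first created...
    handle = make_image_handle(str(uuid.uuid4()), str(uuid.uuid4()),
                               str(uuid.uuid4()))
    # ...but afterwards it only passes the opaque handle around.
    assert parse_image_handle(handle)['img']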
>
> > >
> > > > These things, as you said, are problematic. But this is the way
> > > > things are today.
> > >
> > > We are changing them.
> > Any intermediary step is needlessly problematic for existing clients.
> > Work is already in progress for fixing the API properly; making some
> > calls a bit nicer isn't an excuse to start adding more compatibility
> > code to the engine.
>
> The engine won't need compatibility code. This only would impact the
> jsonrpc bindings which aren't used by engine yet. When engine switches
> over, then yes it would need to adapt.
This means that you make it a condition for adopting the json-rpc based API,
which will postpone engine adoption significantly.
No one has said the transition to json-rpc would be effortless. We're not
switching protocols merely out of disdain for xmlrpc. We are planning some
structural changes to the API to improve usability for everyone.
>
> > >
> > > > As for task IDs: currently task IDs are only used for storage and
> > > > they get persisted to disk. This is WRONG and is not the case with
> > > > the new storage API. Because we moved to an asynchronous message
> > > > based protocol (json-rpc over TCP\AMQP) there is no need to generate
> > > > a task ID; it is built into json-rpc. json-rpc specifies that the IDs
> > > > have to be unique for a client as long as the request is still
> > > > active. This is good enough, as internally we can have a verb for a
> > > > client to query its own running tasks and a verb to query other host
> > > > tasks by mangling in the client before the ID. Because the protocol is
> > >
> > > So this would rely on the client keeping the connection open, and as
> > > soon as it disconnects it would lose the ability to query tasks from
> > > before the connection went down? I don't know if it's a good idea to
> > > conflate message IDs with task IDs. While the protocol can operate
> > > asynchronously, some calls have synchronous semantics and others have
> > > asynchronous semantics. I would expect sync calls to return their data
> > > immediately, and async calls to return immediately with either an error
> > > code or an 'operation started' message and an associated ID for
> > > querying the status of the operation.
> > Upon reflection I agree that having the request ID unique per client is
> > problematic and we need to make sure they are unique per host at every
> > point in time.
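
To make the request-ID-as-task-ID idea concrete, a rough sketch follows;
only the id/method/params/result/error structure comes from the json-rpc 2.0
spec, while the verb name, host, and port are placeholders.

    import json
    import socket

    request = {
        'jsonrpc': '2.0',
        'id': 'engine-1:42',        # unique per client while the call is live
        'method': 'Volume.copy',    # placeholder verb name
        'params': {'handle': 'opaque-image-handle'},
    }

    sock = socket.create_connection(('vdsm-host.example.com', 54321))
    sock.sendall(json.dumps(request).encode())

    # Naive single read for brevity; a real client would frame messages.
    reply = json.loads(sock.recv(65536).decode())
    assert reply['id'] == request['id']   # the same id is the "task" handle
    if 'error' in reply:
        print('operation failed:', reply['error'])
    else:
        print('operation finished:', reply['result'])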
> > >
> > > > asynchronous, all calls are asynchronous by nature as well. Tasks
> > > > will no longer be persisted or expected to be persisted. It's the
> > > > caller's responsibility to query the state and see if the operation
> > > > succeeded or failed if the caller or VDSM died in the middle of the
> > > > call. The current "cleanTask()" system can't be used when more than
> > > > one client is using VDSM and will not be used for anything other
> > > > than legacy storage.
> > >
> > > I agree about not persisting tasks in the future. Although I think
> > > finished tasks should remain in memory for some time so they can be
> > > queried by a client who must reconnect.
> > I am completely against keeping the task around for a nominal amount of
> > time; it just adds another flow. You need code that copes with missing
> > that window anyway, so it's better to have just one recovery code path:
> > when VDSM loses your task or you lose VDSM, recover immediately. Also,
> > because task IDs can be reused once they expire, assuming that the task
> > you encountered is the same task you originally sent is problematic.
> >
> > If you expect intermittent connections, use the AMQP backend (which will
> > support broker-less p2p communication as well).
>
> How will you tell the difference between a completed task and an invalid
> (no such task) task? Do all completed tasks just issue a task completed
> event?
When tasks complete they return a value (and might even send a task-completed
event). If you lost that task you will just have to handle it by inspecting
the current state. As I said, you will have to do that anyway if you put a
"sleep" on finished tasks and you missed the window, or if you are tracking a
task that is owned by another VDSM user. If we have to handle crash
situations, we might as well be crash-only.
http://en.wikipedia.org/wiki/Crash-only_software
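
Concretely, crash-only recovery would go something along these lines; the
verbs and fields are invented for illustration, the point being that the
client reconciles against current state rather than a stored task record.

    def ensure_volume_copied(client, handle, dst_domain):
        """If the task was lost (client or VDSM crashed), don't try to find
        it again -- just inspect what exists now and finish the job."""
        info = client.call('Volume.getInfo',          # placeholder verb
                           {'handle': handle, 'domain': dst_domain})
        if info and info.get('status') == 'OK':
            return info                               # copy already finished
        # Nothing (or only a partial result) there: restart the operation.
        return client.call('Volume.copy',
                           {'handle': handle, 'domain': dst_domain})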
>
> > >
> > > > AFAIK, apart from storage, all object IDs are constructed from a
> > > > single ID, name, or alias: VMs, storageConnections, network
> > > > interfaces. So it's not a real issue. I agree that in the future we
> > > > should keep the idiom of passing configuration once, naming it, and
> > > > then using the name to reference the object.
> > >
> > > Yes, storage is the major problem here.
> > And, as I said, changing the API is problematic for migration of current
> > users.
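
For reference, the configure-once-then-refer-by-name idiom mentioned above
might look roughly like this, given some json-rpc client wrapper; the verbs
and fields are hypothetical.

    # Define the connection once, under a caller-chosen name...
    client.call('StorageConnection.define', {
        'name': 'nfs-backup',
        'spec': {'type': 'nfs', 'export': 'filer.example.com:/export/backup'},
    })
    # ...and from then on every verb takes only the name.
    client.call('StorageConnection.connect', {'name': 'nfs-backup'})
    client.call('StorageConnection.disconnect', {'name': 'nfs-backup'})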
> > >
>
> -- Adam Litke <agl(a)us.ibm.com> IBM Linux Technology Center
>
>
--
Adam Litke <agl(a)us.ibm.com>
IBM Linux Technology Center