Re: [Engine-devel] RFD: API: Identifying vdsm objects in the next-gen API

3 Dec 2012


      ----- Original Message -----
...
From: "Adam Litke" <agl@us.ibm.com>
To: "Saggi Mizrahi" <smizrahi@redhat.com>
Cc: engine-devel@linode01.ovirt.org, "Dan Kenigsberg" <danken@redhat.com>, "Federico Simoncelli"
<fsimonce@redhat.com>, "Ayal Baron" <abaron@redhat.com>, vdsm-devel@lists.fedorahosted.org
Sent: Monday, December 3, 2012 3:30:21 PM
Subject: Re: RFD: API: Identifying vdsm objects in the next-gen API
On Thu, Nov 29, 2012 at 05:59:09PM -0500, Saggi Mizrahi wrote:
...
...
From: "Adam Litke" <agl@us.ibm.com> To: "Saggi Mizrahi"
<smizrahi@redhat.com> Cc: engine-devel@linode01.ovirt.org, "Dan
Kenigsberg"
<danken@redhat.com>, "Federico Simoncelli" <fsimonce@redhat.com>,
"Ayal
Baron" <abaron@redhat.com>, vdsm-devel@lists.fedorahosted.org
Sent:
Thursday, November 29, 2012 5:22:43 PM Subject: Re: RFD: API:
Identifying
vdsm objects in the next-gen API
On Thu, Nov 29, 2012 at 04:52:14PM -0500, Saggi Mizrahi wrote:
...
They are not future proof as the paradigm is completely
different.
Storage domain IDs are not static any more (and are not
guaranteed to be
unique or the same across the cluster.  Image IDs represent the
ID of the
projected data and not the actual unique path.  Just as an
example, to run
a VM you give a list of domains that might contain the needed
images in
the chain and the image ID of the tip.  The paradigm is changed
to and
most calls get non synchronous number of images and domains.
 Further
more, the APIs themselves are completely different. So future
proofing is
not really an issue.
I don't understand this at all.  Perhaps we could all use some
education on
the architecture of the planned architectural changes.  If I can
pass an
arbitrary list of domainIDs that _might_ contain the data, why
wouldn't I
just pass all of them every time?  In that case, why are they
even required
since vdsm would have to search anyway?
It's for optimization mostly, the engine usually has a good idea of
where
stuff are, having it give hints to VDSM can speed up the search
----- Original Message -----
process.
also, then engines knows how transient some storage pieces are. If
you have a
domain that is only there for backup or "owned" by another manager
sharing the
host, you don't want you VMs using the disks that are on that
storage
effectively preventing it from being removed (though we do have
plans to have
qemu switch base snapshots at runtime for just that).
This is not a clean design.  If the search is slow, then maybe we
need to
improve caching internally.  Making a client cache a bunch of
internal IDs to
pass around sounds like a complete layering violation to me.
You can't cache this, if the same template exists on an 2 different NFS domains only the engine has enough information to know which you should use.
We only have the engine give us thing information when starting a VM or merging\copying an image that resides on multiple domains.
It is also completely optional. I didn't like it either.
...
...
...
As to making the current API a bit simpler. As I said, making
them opaque
is problematic as currently the engine is responsible for
creating the
IDs.
As I mentioned in my last post, engine still can specify the ID's
when the
object is first created.  From that point forward the ID never
changes so it
can be baked into the identifier.
Where will this identifier be persisted?
...
...
Further more, some calls require you to play with these (making
a template
instead of a snapshot).  Also, the full chain and topology
needs to be
completely visible to the engine.
Please provide a specific example of how you play with the IDs.
 I can guess
where you are going, but I don't want to divert the thread.
The relationship between volumes and images is deceptive at the
moment.  IMG
is the chain and volume is a member, IMGUUID is only used to for
verification
and to detect when we hit a template going up the chain.  When you
do
operation on images assumptions are being guaranteed about the
resulting IDs.
When you copy an image, you assume to know all the new IDs as they
remain the
same.  With your method I can't tell what the new "opaque" result
is going to
be.  Preview mode (another abomination being deprecated) relies on
the
disconnect between imgUUID and volUUID.  Live migration currently
moves a lot
of the responsibility to the engine.
No client should need to know about all of these internal details.  I
understand
that's the way it is today, and that's one of the main reasons that
the API is a
complete pain to use.
You are correct but this is how this API was designed you can't get away from that.
...
...
...
These things, as you said, are problematic. But this is the way
things are
today.
We are changing them.
Any intermediary step is needlessly problematic for existing
clients.  Work is
already in progress for fixing the API properly, making some calls
a bit nicer
isn't an excuse to start making more compatibility code in the
engine.
The engine won't need compatibility code.  This only would impact the
jsonrpc
bindings which aren't used by engine yet.  When engine switches over,
then yes
it would need to adapt.
This means that you put it as a condition to adapting the json-rpc base API which will postpone engine adoption significantly.
...
...
...
As for task IDs.  Currently task IDs are only used for storage
and they
get persisted to disk. This is WRONG and is not the case with
the new
storage API.  Because we moved to an asynchronous message based
protocol
(json-rpc over TCP\AMQP) there is no need to generate a task
ID. it is
built in to json-rpc.  json-rpc specifies that the IDs have to
be unique
for a client as long as the request is still active.  This is
good enough
as internally we can have a verb for a client to query it's own
running
tasks and a verb to query other host tasks by mangling in the
client
before the ID.  Because the protocol is
So this would rely on the client keeping the connection open and
as soon as
it disconnects it would lose the ability to query tasks from
before the
connection went down?  I don't know if it's a good idea to
conflate message
ID's with task ID's.  While the protocol can operate
asynchronously, some
calls have synchronous semantics and others have asynchronous
semantics.  I
would expect sync calls to return their data immediately and
async calls to
return immediately with either: an error code, or an 'operation
started'
message and associated ID for querying the status of the
operation.
Upon reflection I agree that having the request ID unique per
client is
problematic and we need to make sure they are unique per host at
every point
in time.
...
...
asynchronous all calls are asynchronous by nature well.  Tasks
will no
longer be persisted or expected to be persisted. It's the
callers
responsibility to query the state and see if the operation
succeeded or
failed if the caller or VDSM died in the middle of the call.
The current
"cleanTask()" system can't be used when more then one client is
using VDSM
and will not be used for anything other then legacy storage.
I agree about not persisting tasks in the future.  Although I
think finished
tasks should remain in memory for some time so they can be
queried by a
client who must reconnect.
I am completely against keeping the task for a nominal amount of
time, it just
makes another flow.  You need to have code that makes up in case
you missed
that window any way then just have one recovery code path, when
VDSM looses
you task or you lose VDSM recover immediately.  Also, because task
IDs can be
reused once they expire assuming that the task you encountered is
the same
task you originally sent is problematic.
If you expect intermittent connections use the AMQP backend (which
will
support broker-less p2p communication as well)
HOw will you tell the difference between a completed task and an
invalid (no
such task) task?  Do all completed tasks just issue a task completed
event?
When task complete they return a value (and might even send a task completed even).
If you lost that task you will just have to handle that by inspecting the current state.
As I said, you will have to do that anyway if you put a "sleep" on finished task and you missed the window or you are tracking a task that is owned by another VDSM user.
If we have to handle crash situations might as well be crash only. http://en.wikipedia.org/wiki/Crash-only_software
...
...
...
AFAIK Apart from storage all objects IDs are constructed with a
single ID,
name or alias. VMs, storageConnections, network interfaces. So
it's not a
real issue.  I agree that in the future we should keep the
idiom of pass
configuration once, name it, and keep using the name to
reference the
object.
Yes, storage is the major problem here.
And, as I said, changing the API is problematic for migration of
current
users.
...
--
Adam Litke <agl@us.ibm.com>
IBM Linux Technology Center