On Mon, Dec 03, 2012 at 03:57:42PM -0500, Saggi Mizrahi wrote:
----- Original Message -----
> From: "Adam Litke" <agl(a)us.ibm.com> To: "Saggi Mizrahi"
> <smizrahi(a)redhat.com> Cc: engine-devel(a)linode01.ovirt.org, "Dan
Kenigsberg"
> <danken(a)redhat.com>, "Federico Simoncelli"
<fsimonce(a)redhat.com>, "Ayal
> Baron" <abaron(a)redhat.com>, vdsm-devel(a)lists.fedorahosted.org Sent:
Monday,
> December 3, 2012 3:30:21 PM Subject: Re: RFD: API: Identifying vdsm objects
> in the next-gen API
>
> On Thu, Nov 29, 2012 at 05:59:09PM -0500, Saggi Mizrahi wrote:
> >
> >
> > ----- Original Message -----
> > > From: "Adam Litke" <agl(a)us.ibm.com> To: "Saggi
Mizrahi"
> > > <smizrahi(a)redhat.com> Cc: engine-devel(a)linode01.ovirt.org,
"Dan
> > > Kenigsberg" <danken(a)redhat.com>, "Federico
Simoncelli"
> > > <fsimonce(a)redhat.com>, "Ayal Baron"
<abaron(a)redhat.com>,
> > > vdsm-devel(a)lists.fedorahosted.org Sent: Thursday, November 29, 2012
> > > 5:22:43 PM Subject: Re: RFD: API: Identifying vdsm objects in the
> > > next-gen API
> > >
> > > On Thu, Nov 29, 2012 at 04:52:14PM -0500, Saggi Mizrahi wrote:
> > > > They are not future proof, as the paradigm is completely different.
> > > > Storage domain IDs are not static any more (and are not guaranteed to
> > > > be unique or the same across the cluster). Image IDs represent the ID
> > > > of the projected data and not the actual unique path. Just as an
> > > > example, to run a VM you give a list of domains that might contain the
> > > > needed images in the chain and the image ID of the tip. The paradigm
> > > > has changed, and most calls now take varying numbers of images and
> > > > domains that do not necessarily correspond to one another.
> > > > Furthermore, the APIs themselves are completely different. So future
> > > > proofing is not really an issue.
> > >
> > > I don't understand this at all. Perhaps we could all use some education
> > > on the planned architectural changes. If I can pass an arbitrary list
> > > of domainIDs that _might_ contain the data, why wouldn't I just pass
> > > all of them every time? In that case, why are they even required, since
> > > vdsm would have to search anyway?
> > It's for optimization mostly; the engine usually has a good idea of where
> > things are, and having it give hints to VDSM can speed up the search
> > process. Also, the engine knows how transient some storage pieces are.
> > If you have a domain that is only there for backup or is "owned" by
> > another manager sharing the host, you don't want your VMs using the disks
> > that are on that storage, effectively preventing it from being removed
> > (though we do have plans to have qemu switch base snapshots at runtime
> > for just that).
>
> This is not a clean design. If the search is slow, then maybe we need to
> improve caching internally. Making a client cache a bunch of internal IDs
> to pass around sounds like a complete layering violation to me.
You can't cache this; if the same template exists on 2 different NFS domains,
only the engine has enough information to know which one you should use. We
only have the engine give us this information when starting a VM or
merging\copying an image that resides on multiple domains. It is also
completely optional. I didn't like it either.
Is it even valid for the same template (with identical uuids) to exist in two
places? I thought uuids aren't supposed to collide. I can envision some
scenario where a cached storagedomain/storagepool relationship becomes invalid
because another user detached the storagedomain. In that case, the API just
returns the normal error about "sd XXX is not attached to sp XXX". So I don't
see any problem here.
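
For concreteness, the hint-passing described above might look roughly like
the sketch below; the verb name, parameters, and error class are invented
for illustration and are not the actual VDSM API.

    # Hypothetical client-side helper -- names are made up for illustration.
    class StorageDomainNotAttached(Exception):
        """Stand-in for the 'sd XXX is not attached to sp XXX' error."""

    def start_vm(client, vm_id, image_id, candidate_domains):
        """Start a VM, hinting which domains may hold the image chain.

        The hint list is optional: it only narrows VDSM's search, and the
        engine may drop it if its cached view turns out to be stale.
        """
        params = {'vmID': vm_id,
                  'disk': {'imageID': image_id,
                           'domainHints': candidate_domains}}
        try:
            return client.call('VM.start', params)
        except StorageDomainNotAttached:
            # Cached domain/pool relationship went stale; retry without hints.
            params['disk'].pop('domainHints')
            return client.call('VM.start', params)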
>
> > >
> > > > As for making the current API a bit simpler: as I said, making them
> > > > opaque is problematic, as currently the engine is responsible for
> > > > creating the IDs.
> > >
> > > As I mentioned in my last post, the engine can still specify the IDs
> > > when the object is first created. From that point forward the ID never
> > > changes, so it can be baked into the identifier.
> > Where will this identifier be persisted?
> > >
> > > > Furthermore, some calls require you to play with these (making a
> > > > template instead of a snapshot). Also, the full chain and topology
> > > > need to be completely visible to the engine.
> > >
> > > Please provide a specific example of how you play with the IDs. I can
> > > guess where you are going, but I don't want to divert the thread.
> > The relationship between volumes and images is deceptive at the moment.
> > IMG is the chain and a volume is a member; IMGUUID is only used for
> > verification and to detect when we hit a template going up the chain.
> > When you do operations on images, guarantees are made about the resulting
> > IDs. When you copy an image, you already know all the new IDs as they
> > remain the same. With your method I can't tell what the new "opaque"
> > result is going to be. Preview mode (another abomination being
> > deprecated) relies on the disconnect between imgUUID and volUUID. Live
> > migration currently moves a lot of the responsibility to the engine.
>
> No client should need to know about all of these internal details. I
> understand that's the way it is today, and that's one of the main reasons
> that the API is a complete pain to use.
You are correct, but this is how this API was designed; you can't get away
from that.
We are trying to fix the problems on the second try. This will inevitably cause
a little discomfort in the short term but will be worth it in the long run.
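
To illustrate the "baked in" idea, one way an opaque identifier could carry
the engine-chosen IDs is sketched below; the encoding and field names are
assumptions, not a proposal for the real wire format.

    import base64
    import json
    import uuid

    def make_image_handle(sd_uuid, img_uuid, vol_uuid):
        # Pack the IDs the engine chose at creation time into one opaque token.
        blob = json.dumps({'sd': sd_uuid, 'img': img_uuid, 'vol': vol_uuid})
        return base64.urlsafe_b64encode(blob.encode()).decode()

    def parse_image_handle(handle):
        # Server side: recover the IDs without the client ever unpacking them.
        return json.loads(base64.urlsafe_b64decode(handle.encode()).decode())

    # The engine still picks the UUIDs when the object is first created...
    handle = make_image_handle(str(uuid.uuid4()), str(uuid.uuid4()),
                               str(uuid.uuid4()))
    # ...but afterwards it only passes the opaque handle around.
    assert parse_image_handle(handle)['img']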
>
> > >
> > > > These things, as you said, are problematic. But this is the way
> > > > things are today.
> > >
> > > We are changing them.
> > Any intermediary step is needlessly problematic for existing clients.
> > Work is already in progress for fixing the API properly; making some
> > calls a bit nicer isn't an excuse to start adding more compatibility
> > code to the engine.
>
> The engine won't need compatibility code. This only would impact the
> jsonrpc bindings which aren't used by engine yet. When engine switches
> over, then yes it would need to adapt.
This means that you make it a condition for adopting the json-rpc based API,
which will postpone engine adoption significantly.
No one has said the transition to json-rpc would be effortless. We're not
switching protocols merely out of disdain for xmlrpc. We are planning some
structural changes to the API to improve usability for everyone.
>
> > >
> > > > As for task IDs: currently task IDs are only used for storage and
> > > > they get persisted to disk. This is WRONG and is not the case with
> > > > the new storage API. Because we moved to an asynchronous message
> > > > based protocol (json-rpc over TCP\AMQP) there is no need to generate
> > > > a task ID; it is built into json-rpc. json-rpc specifies that the IDs
> > > > have to be unique for a client as long as the request is still
> > > > active. This is good enough, as internally we can have a verb for a
> > > > client to query its own running tasks and a verb to query other host
> > > > tasks by mangling in the client before the ID. Because the protocol is
> > >
> > > So this would rely on the client keeping the connection open, and as
> > > soon as it disconnects it would lose the ability to query tasks from
> > > before the connection went down? I don't know if it's a good idea to
> > > conflate message IDs with task IDs. While the protocol can operate
> > > asynchronously, some calls have synchronous semantics and others have
> > > asynchronous semantics. I would expect sync calls to return their data
> > > immediately, and async calls to return immediately with either an error
> > > code or an 'operation started' message and an associated ID for
> > > querying the status of the operation.
> > Upon reflection I agree that having the request ID unique per client is
> > problematic and we need to make sure they are unique per host at every
> > point in time.
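
To make the request-ID-as-task-ID idea concrete, a rough sketch follows;
only the id/method/params/result/error structure comes from the json-rpc 2.0
spec, while the verb name, host, and port are placeholders.

    import json
    import socket

    request = {
        'jsonrpc': '2.0',
        'id': 'engine-1:42',        # unique per client while the call is live
        'method': 'Volume.copy',    # placeholder verb name
        'params': {'handle': 'opaque-image-handle'},
    }

    sock = socket.create_connection(('vdsm-host.example.com', 54321))
    sock.sendall(json.dumps(request).encode())

    # Naive single read for brevity; a real client would frame messages.
    reply = json.loads(sock.recv(65536).decode())
    assert reply['id'] == request['id']   # the same id is the "task" handle
    if 'error' in reply:
        print('operation failed:', reply['error'])
    else:
        print('operation finished:', reply['result'])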
> > >
> > > > asynchronous, all calls are asynchronous by nature as well. Tasks
> > > > will no longer be persisted or expected to be persisted. It's the
> > > > caller's responsibility to query the state and see if the operation
> > > > succeeded or failed if the caller or VDSM died in the middle of the
> > > > call. The current "cleanTask()" system can't be used when more than
> > > > one client is using VDSM and will not be used for anything other
> > > > than legacy storage.
> > >
> > > I agree about not persisting tasks in the future. Although I think
> > > finished tasks should remain in memory for some time so they can be
> > > queried by a client who must reconnect.
> > I am completely against keeping the task around for a nominal amount of
> > time; it just adds another flow. You need code that copes with missing
> > that window anyway, so it's better to have just one recovery code path:
> > when VDSM loses your task or you lose VDSM, recover immediately. Also,
> > because task IDs can be reused once they expire, assuming that the task
> > you encountered is the same task you originally sent is problematic.
> >
> > If you expect intermittent connections, use the AMQP backend (which will
> > support broker-less p2p communication as well).
>
> How will you tell the difference between a completed task and an invalid
> (no such task) task? Do all completed tasks just issue a task completed
> event?
When tasks complete they return a value (and might even send a task-completed
event). If you lost that task you will just have to handle it by inspecting
the current state. As I said, you will have to do that anyway if you put a
"sleep" on finished tasks and you missed the window, or if you are tracking a
task that is owned by another VDSM user. If we have to handle crash
situations, we might as well be crash-only.
http://en.wikipedia.org/wiki/Crash-only_software
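
Concretely, crash-only recovery would go something along these lines; the
verbs and fields are invented for illustration, the point being that the
client reconciles against current state rather than a stored task record.

    def ensure_volume_copied(client, handle, dst_domain):
        """If the task was lost (client or VDSM crashed), don't try to find
        it again -- just inspect what exists now and finish the job."""
        info = client.call('Volume.getInfo',          # placeholder verb
                           {'handle': handle, 'domain': dst_domain})
        if info and info.get('status') == 'OK':
            return info                               # copy already finished
        # Nothing (or only a partial result) there: restart the operation.
        return client.call('Volume.copy',
                           {'handle': handle, 'domain': dst_domain})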
>
> > >
> > > > AFAIK, apart from storage, all object IDs are constructed from a
> > > > single ID, name, or alias: VMs, storageConnections, network
> > > > interfaces. So it's not a real issue. I agree that in the future we
> > > > should keep the idiom of passing configuration once, naming it, and
> > > > then using the name to reference the object.
> > >
> > > Yes, storage is the major problem here.
> > And, as I said, changing the API is problematic for migration of current
> > users.
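
For reference, the configure-once-then-refer-by-name idiom mentioned above
might look roughly like this, given some json-rpc client wrapper; the verbs
and fields are hypothetical.

    # Define the connection once, under a caller-chosen name...
    client.call('StorageConnection.define', {
        'name': 'nfs-backup',
        'spec': {'type': 'nfs', 'export': 'filer.example.com:/export/backup'},
    })
    # ...and from then on every verb takes only the name.
    client.call('StorageConnection.connect', {'name': 'nfs-backup'})
    client.call('StorageConnection.disconnect', {'name': 'nfs-backup'})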
> > >
>
> -- Adam Litke <agl(a)us.ibm.com> IBM Linux Technology Center
>
>
--
Adam Litke <agl(a)us.ibm.com>
IBM Linux Technology Center