----- Original Message -----
From: "Adam Litke" <agl(a)us.ibm.com>
To: "Saggi Mizrahi" <smizrahi(a)redhat.com>
Cc: "Livnat Peer" <lpeer(a)redhat.com>, engine-devel(a)ovirt.org,
vdsm-devel(a)lists.fedorahosted.org
Sent: Thursday, January 26, 2012 1:58:40 PM
Subject: Re: [vdsm] [Engine-devel] [RFC] New Connection Management API
On Thu, Jan 26, 2012 at 10:00:57AM -0500, Saggi Mizrahi wrote:
> <snip>
> Again trying to sum up and address all comments
>
> Clear all:
> ==========
> My opinion is still not to implement it.
> Even though it might generate a bit more traffic, premature
> optimization is bad, and there are other ways we can reduce
> VDSM command overhead without doing this.
>
> In any case this argument is moot because my intention (as Litke
> pointed out) is to have a lean API.
> An API call is something you have to support across versions; this
> call, implemented in the engine, is something that no one has to
> support and can change/evolve easily.
>
> As a rule, if an API call C can be implemented by doing A + B, then
> C is redundant.
>
> List of connections as args:
> ============================
> Sorry, I forgot to respond about that. I'm not as strongly opposed
> to this idea as to the other things you suggested. It'll just make
> implementing the persistence logic in VDSM significantly more
> complicated, as I will have to commit the information for multiple
> connections to disk in an all-or-nothing mode. I could create a
> small sqlite db to do that, or do some directory tricks and exploit
> FS rename atomicity, but I'd rather not.
I would be strongly opposed to introducing a sqlite database into vdsm
just to enable "convenience mode" for this API. Does the operation
really need to be atomic? Why not just perform each connection
sequentially and return a list of statuses? Is the only motivation for
allowing a list of parameters to reduce the number of API calls between
engine and vdsm? If so, the same argument Saggi makes above applies
here.
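(For illustration only, a rough sketch of the sequential variant
suggested above; the function and field names here are assumptions,
not an agreed API:)

    def manage_connections(connections, manage_one):
        # manage_one is a hypothetical callable that manages a single
        # connection and raises on failure; earlier successes are
        # never rolled back.
        results = []
        for conn in connections:
            try:
                manage_one(conn)
                results.append({"id": conn["id"], "status": 0})
            except Exception as exc:
                results.append({"id": conn["id"], "status": 1,
                                "lastError": str(exc)})
        return results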
I try to have VDSM expose APIs that are simple to predict: a command
can either succeed or fail.
The problem is not actually validating the connections. The problem is
that once I have concluded that they are all OK, I need to persist to
disk the information that will allow me to reconnect if VDSM happens
to crash. If I naively save them one by one I could get into a state
where only some of the connections were persisted before the operation
failed. So I have to somehow put all this in a transaction.
I don't have to use sqlite. I could also put all the persistence
information in a new dir for every call, named <UUID>.tmp. Once I have
written everything down I rename the directory to just <UUID> and
fsync it. This is guaranteed by POSIX to be atomic. For unmanage, I
move all the persistence information from the directories it sits in
to a new dir named <UUID>, rename it to <UUID>.tmp, fsync it and then
remove it.
This all just looks like more trouble than it's worth to me.
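To make that concrete, here is a rough sketch of the rename trick for
the manage case (the directory layout and helper names are made up
for illustration, not part of the proposal):

    import json
    import os
    import uuid

    def persist_connections(base_dir, connections):
        # Write a batch of connection records so that either all of
        # them appear under base_dir/<UUID> or none of them do.
        batch_id = str(uuid.uuid4())
        tmp_dir = os.path.join(base_dir, batch_id + ".tmp")
        final_dir = os.path.join(base_dir, batch_id)
        os.makedirs(tmp_dir)
        for i, conn in enumerate(connections):
            path = os.path.join(tmp_dir, "conn-%d.json" % i)
            with open(path, "w") as f:
                json.dump(conn, f)
                f.flush()
                os.fsync(f.fileno())
        os.rename(tmp_dir, final_dir)   # atomic on POSIX filesystems
        # fsync the parent directory so the rename itself is durable
        dir_fd = os.open(base_dir, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
        return final_dir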
> The demands are not without basis. I would like to keep the code
> simple under the hood at the price of a few more calls. You would
> like to make fewer calls and keep the code simpler on your side.
> There isn't a real way to settle this.
> If anyone on the list has pros and cons for either way I'd be happy
> to hear them.
> If no compelling arguments arise I will let Ayal call this one.
>
> Transient connections:
> ======================
> The problem you are describing, as I understand it, is that VDSM did
> not respond, not that the API client did not respond.
> Again, this can happen for a number of reasons, and for most of them
> VDSM might not even be aware that there is actually a problem
> (network issues).
>
> This relates to the EOL policy. I agree we have to find a good way
> to define an automatic EOL for resources. I have made my
> suggestion. Out of the scope of the API.
>
> In the meantime cleaning stale connections is trivial, and I made it
> clear in a previous email how to go about it in a simple,
> non-intrusive way: clean up on host connect, and on every poll if
> you find connections that you don't like. This should keep things
> squeaky clean.
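(Again purely for illustration, the engine-side cleanup could be as
simple as the following; getConnections/unmanage are assumed names for
the new verbs, and expected_ids is whatever set of connection ids the
engine currently wants on that host:)

    def clean_stale_connections(vdsm, expected_ids):
        # Drop any managed connection the engine no longer wants;
        # run this on host connect and on every poll.
        for conn in vdsm.getConnections():
            if conn["id"] not in expected_ids:
                vdsm.unmanage(conn["id"])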
>
> ----- Original Message -----
> > From: "Livnat Peer" <lpeer(a)redhat.com>
> > To: "Saggi Mizrahi" <smizrahi(a)redhat.com>
> > Cc: vdsm-devel(a)lists.fedorahosted.org, engine-devel(a)ovirt.org
> > Sent: Thursday, January 26, 2012 5:22:42 AM
> > Subject: Re: [Engine-devel] [RFC] New Connection Management API
> >
> > On 25/01/12 23:35, Saggi Mizrahi wrote:
> > > <SNIP>
> > > This mail was getting way too long.
> > >
> > > About the clear all verb.
> > > No.
> > > Just loop, find the connections YOU OWN and clean them. Even
> > > though you don't want to support multiple clients to the VDSM
> > > API, that doesn't mean the engine shouldn't behave like a proper
> > > citizen.
> > > It's the same reason why VDSM tries not to mess with system
> > > resources it didn't initiate.
> >
> >
> > There is a big difference: VDSM living in hybrid mode with other
> > workload on the host is a valid use case, having more than one
> > concurrent manager for VDSM is not.
> > Generating a disconnect request for each connection does not seem
> > like the right API to me. Again, think of the simple flow of
> > moving a host from one data center to another: the engine needs to
> > disconnect all storage domains (and each domain can have a couple
> > of connections associated with it).
> >
> > I am giving examples from the engine use cases as it is the main
> > user of VDSM ATM, but I am sure they will be relevant to any other
> > user of VDSM.
> >
> > >
> > > ------------------------
> > >
> > > As I see it the only point of conflict is the so called
> > > non-peristed connections.
> > > I will call them transient connections from now on.
> > >
> > > There are 2 use cases being discussed:
> > > 1. Wait until a connection is made; if it fails, don't retry and
> > > automatically unmanage.
> > > 2. The caller of the API forgets or fails to unmanage a
> > > connection.
> > >
> >
> > Actually I was not discussing #2 at all.
> >
> > > Your suggestion as I understand it:
> > > Transient connections are:
> > > - Connections that VDSM will only try to connect to once and
> > > will not reconnect to in case of disconnect.
> >
> > yes
> >
> > >
> > > My problem with this definition is that it does not specify the
> > > "end of life" of the connection.
> > > Meaning it solves only use case 1.
> >
> > Since this is the only use case I had in mind, it is what I was
> > looking for.
> >
> > > If all is well, and it usually is, VDSM will not invoke a
> > > disconnect.
> > > So the caller would have to call unmanage at the end of the flow
> > > if the connection succeeded.
> >
> > agree.
> >
> > > Now, if you are already calling unmanage when the connection
> > > succeeded, you can just call it anyway.
> >
> > Not exactly. An example I gave earlier in the thread was that VDSM
> > hangs or has some other error and the engine cannot initiate
> > unmanage; instead, let's assume the host is fenced (self-fence or
> > external fence does not matter). In this scenario the engine will
> > not issue unmanage.
> >
> > >
> > > instead of doing: (with your suggestion)
> > > ----------------
> > > manage
> > > wait until succeeds or lastError has value
> > > try:
> > >     do stuff
> > > finally:
> > >     unmanage
> > >
> > > do: (with the canonical flow)
> > > ---
> > > manage
> > > try:
> > >     wait until succeeds or lastError has value
> > >     do stuff
> > > finally:
> > >     unmanage
> > >
> > > This is simpler to do than having another connection type.
> >
> > You are assuming the engine can communicate with VDSM and there
> > are scenarios where it is not feasible.
> >
> > >
> > > Now that we got that out of the way, let's talk about the 2nd
> > > use case.
> >
> > Since I did not ask VDSM to clean up after the (engine) user and
> > you don't want to do it, I am not sure we need to discuss this.
> >
> > If you insist we can start the discussion on who should implement
> > the cleanup mechanism, but I'm afraid I have no strong arguments
> > for VDSM to do it, so I'd rather not go there ;)
> >
> >
> > You dropped from the discussion my request for supporting a list
> > of connections for the manage and unmanage verbs.
> >
> > > The API client died in the middle of the operation and unmanage
> > > was never called.
> > >
> > > Your suggested definition means that unless there was a problem
> > > with the connection VDSM will still have this connection active.
> > > The engine will have to clean it up anyway.
> > >
> > > The problem is, VDSM has no way of knowing whether a client
> > > died, forgot, or is thinking really hard and will continue on in
> > > about 2 minutes.
> >
> > >
> > > Connections that live until they die are a lifecycle that is
> > > hard to define and work with. Solving this problem is
> > > theoretically simple.
> > >
> > > Have clients hold some sort of session token and force the
> > > client to update it at a specified interval. You could bind
> > > resources (like domains, VMs, connections) to that session token
> > > so that when it expires VDSM auto-cleans the resources.
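(A toy sketch of that lease idea, purely to illustrate its shape; none
of these names exist anywhere today:)

    import time

    class SessionRegistry(object):
        def __init__(self, ttl=120):
            self._ttl = ttl
            self._sessions = {}  # token -> (deadline, set of resources)

        def renew(self, token):
            # The client must call this at least once per ttl seconds.
            _, resources = self._sessions.get(token, (0, set()))
            self._sessions[token] = (time.time() + self._ttl, resources)

        def bind(self, token, resource_id):
            self._sessions[token][1].add(resource_id)

        def reap(self, release):
            # 'release' is a hypothetical callback that frees one
            # resource (connection, domain, VM, ...).
            now = time.time()
            for token, (deadline, resources) in list(
                    self._sessions.items()):
                if deadline < now:
                    for res in resources:
                        release(res)
                    del self._sessions[token]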
> > >
> > > This kind of mechanism is out of the scope of this API change.
> > > Furthermore, I think that this mechanism should sit in the
> > > engine since the session might actually contain resources from
> > > multiple hosts and resources that are not managed by VDSM.
> > >
> > > In GUI flows specifically the user might do actions that don't
> > > even touch the engine, and forcing it to refresh the engine
> > > token is simpler than having it refresh the VDSM token.
> > >
> > > I understand that the engine currently has no way of tracking a
> > > user session. This, as I said, is also true in the case of VDSM.
> > > We can start arguing about which project should implement the
> > > session semantics, but as I see it that's not relevant to the
> > > connection management API.
> >
> >
> _______________________________________________
> vdsm-devel mailing list
> vdsm-devel(a)lists.fedorahosted.org
> https://fedorahosted.org/mailman/listinfo/vdsm-devel
--
Adam Litke <agl(a)us.ibm.com>
IBM Linux Technology Center