----- Original Message -----
From: "Adam Litke" <agl(a)us.ibm.com>
To: "Saggi Mizrahi" <smizrahi(a)redhat.com>
Cc: "Livnat Peer" <lpeer(a)redhat.com>, engine-devel(a)ovirt.org,
vdsm-devel(a)lists.fedorahosted.org
Sent: Thursday, January 26, 2012 1:58:40 PM
Subject: Re: [vdsm] [Engine-devel] [RFC] New Connection Management API
On Thu, Jan 26, 2012 at 10:00:57AM -0500, Saggi Mizrahi wrote:
> <snip>
> Again trying to sum up and address all comments
>
> Clear all:
> ==========
> My opinion is still not to implement it.
> Even though it might generate a bit more traffic, premature
> optimization is bad, and there are other ways we can reduce
> VDSM command overhead without doing this.
>
> In any case this argument is moot because my intention (as Litke
> pointed out) is to have a lean API.
> An API call is something you have to support across versions; this
> call, implemented in the engine, is something that no one has to
> support and can change/evolve easily.
>
> As a rule, if an API call C can be implemented by doing A + B, then
> C is redundant.
>
> List of connections as args:
> ============================
> Sorry, I forgot to respond about that. I'm not as strongly opposed
> to this idea as to the other things you suggested. It'll just make
> implementing the persistence logic in VDSM significantly more
> complicated, as I will have to commit the information for multiple
> connections to disk in an all-or-nothing mode. I could create a
> small sqlite db to do that, or do some directory tricks and exploit
> FS rename atomicity, but I'd rather not.
I would be strongly opposed to introducing a sqlite database into vdsm
just to enable "convenience mode" for this API. Does the operation
really need to be atomic? Why not just perform each connection
sequentially and return a list of statuses? Is the only motivation for
allowing a list of parameters to reduce the number of API calls between
engine and vdsm? If so, the same argument Saggi makes above applies
here.
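(For illustration only, a rough sketch of the sequential variant
suggested above; the function and field names here are assumptions,
not an agreed API:)

    def manage_connections(connections, manage_one):
        # manage_one is a hypothetical callable that manages a single
        # connection and raises on failure; earlier successes are
        # never rolled back.
        results = []
        for conn in connections:
            try:
                manage_one(conn)
                results.append({"id": conn["id"], "status": 0})
            except Exception as exc:
                results.append({"id": conn["id"], "status": 1,
                                "lastError": str(exc)})
        return results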
I try to have VDSM expose APIs that are simple to predict: a command
can either succeed or fail.
The problem is not actually validating the connections. The problem is
that once I have concluded that they are all OK, I need to persist to
disk the information that will allow me to reconnect if VDSM happens
to crash. If I naively save them one by one I could get into a state
where only some of the connections were persisted before the operation
failed. So I have to somehow put all this in a transaction.
I don't have to use sqlite. I could also put all the persistence
information in a new dir for every call, named <UUID>.tmp. Once I have
written everything down I rename the directory to just <UUID> and
fsync it. This is guaranteed by POSIX to be atomic. For unmanage, I
move all the persistence information from the directories it sits in
to a new dir named <UUID>, rename it to <UUID>.tmp, fsync it and then
remove it.
This all just looks like more trouble than it's worth to me.
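To make that concrete, here is a rough sketch of the rename trick for
the manage case (the directory layout and helper names are made up
for illustration, not part of the proposal):

    import json
    import os
    import uuid

    def persist_connections(base_dir, connections):
        # Write a batch of connection records so that either all of
        # them appear under base_dir/<UUID> or none of them do.
        batch_id = str(uuid.uuid4())
        tmp_dir = os.path.join(base_dir, batch_id + ".tmp")
        final_dir = os.path.join(base_dir, batch_id)
        os.makedirs(tmp_dir)
        for i, conn in enumerate(connections):
            path = os.path.join(tmp_dir, "conn-%d.json" % i)
            with open(path, "w") as f:
                json.dump(conn, f)
                f.flush()
                os.fsync(f.fileno())
        os.rename(tmp_dir, final_dir)   # atomic on POSIX filesystems
        # fsync the parent directory so the rename itself is durable
        dir_fd = os.open(base_dir, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
        return final_dir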
> The demands are not without basis. I would like to keep the code
> simple under the hood at the price of a few more calls. You would
> like to make fewer calls and keep the code simpler on your side.
> There isn't a real way to settle this.
> If anyone on the list has pros and cons for either way I'd be happy
> to hear them.
> If no compelling arguments arise I will let Ayal call this one.
>
> Transient connections:
> ======================
> The problem you are describing, as I understand it, is that VDSM did
> not respond, not that the API client did not respond.
> Again, this can happen for a number of reasons, and for most of them
> VDSM might not even be aware that there is actually a problem
> (network issues).
>
> This relates to the EOL policy. I agree we have to find a good way
> to define an automatic EOL for resources. I have made my
> suggestion. Out of the scope of the API.
>
> In the meantime cleaning stale connections is trivial, and I made it
> clear in a previous email how to go about it in a simple,
> non-intrusive way: clean up on host connect, and on every poll if
> you find connections that you don't like. This should keep things
> squeaky clean.
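(Again purely for illustration, the engine-side cleanup could be as
simple as the following; getConnections/unmanage are assumed names for
the new verbs, and expected_ids is whatever set of connection ids the
engine currently wants on that host:)

    def clean_stale_connections(vdsm, expected_ids):
        # Drop any managed connection the engine no longer wants;
        # run this on host connect and on every poll.
        for conn in vdsm.getConnections():
            if conn["id"] not in expected_ids:
                vdsm.unmanage(conn["id"])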
>
> ----- Original Message -----
> > From: "Livnat Peer" <lpeer(a)redhat.com>
> > To: "Saggi Mizrahi" <smizrahi(a)redhat.com>
> > Cc: vdsm-devel(a)lists.fedorahosted.org, engine-devel(a)ovirt.org
> > Sent: Thursday, January 26, 2012 5:22:42 AM
> > Subject: Re: [Engine-devel] [RFC] New Connection Management API
> >
> > On 25/01/12 23:35, Saggi Mizrahi wrote:
> > > <SNIP>
> > > This mail was getting way too long.
> > >
> > > About the clear all verb.
> > > No.
> > > Just loop, find the connections YOU OWN and clean them. Even
> > > though you don't want to support multiple clients to the VDSM
> > > API, that doesn't mean the engine shouldn't behave like a proper
> > > citizen.
> > > It's the same reason why VDSM tries not to mess with system
> > > resources it didn't initiate.
> >
> >
> > There is a big difference: VDSM living in hybrid mode with other
> > workload on the host is a valid use case, having more than one
> > concurrent manager for VDSM is not.
> > Generating a disconnect request for each connection does not seem
> > like the right API to me. Again, think of the simple flow of
> > moving a host from one data center to another: the engine needs to
> > disconnect all storage domains (and each domain can have a couple
> > of connections associated with it).
> >
> > I am giving examples from the engine use cases as it is the main
> > user of VDSM ATM, but I am sure they will be relevant to any other
> > user of VDSM.
> >
> > >
> > > ------------------------
> > >
> > > As I see it the only point of conflict is the so called
> > > non-peristed connections.
> > > I will call them transient connections from now on.
> > >
> > > There are 2 use cases being discussed:
> > > 1. Wait until a connection is made; if it fails, don't retry and
> > > automatically unmanage.
> > > 2. The caller of the API forgets or fails to unmanage a
> > > connection.
> > >
> >
> > Actually I was not discussing #2 at all.
> >
> > > Your suggestion as I understand it:
> > > Transient connections are:
> > > - Connections that VDSM will only try to connect to once and
> > > will not reconnect to in case of disconnect.
> >
> > yes
> >
> > >
> > > My problem with this definition is that it does not specify the
> > > "end of life" of the connection.
> > > Meaning it solves only use case 1.
> >
> > Since this is the only use case I had in mind, it is what I was
> > looking for.
> >
> > > If all is well, and it usually is, VDSM will not invoke a
> > > disconnect.
> > > So the caller would have to call unmanage at the end of the flow
> > > if the connection succeeded.
> >
> > agree.
> >
> > > Now, if you are already calling unmanage when the connection
> > > succeeded, you can just call it anyway.
> >
> > Not exactly. An example I gave earlier in the thread was that VDSM
> > hangs or has some other error and the engine cannot initiate
> > unmanage; instead, let's assume the host is fenced (self-fence or
> > external fence does not matter). In this scenario the engine will
> > not issue unmanage.
> >
> > >
> > > instead of doing: (with your suggestion)
> > > ----------------
> > > manage
> > > wait until succeeds or lastError has value
> > > try:
> > >     do stuff
> > > finally:
> > >     unmanage
> > >
> > > do: (with the canonical flow)
> > > ---
> > > manage
> > > try:
> > >     wait until succeeds or lastError has value
> > >     do stuff
> > > finally:
> > >     unmanage
> > >
> > > This is simpler to do than having another connection type.
> >
> > You are assuming the engine can communicate with VDSM and there
> > are scenarios where it is not feasible.
> >
> > >
> > > Now that we got that out of the way, let's talk about the 2nd
> > > use case.
> >
> > Since I did not ask VDSM to clean up after the (engine) user and
> > you don't want to do it, I am not sure we need to discuss this.
> >
> > If you insist we can start the discussion on who should implement
> > the cleanup mechanism, but I'm afraid I have no strong arguments
> > for VDSM to do it, so I'd rather not go there ;)
> >
> >
> > You dropped from the discussion my request for supporting a list
> > of connections for the manage and unmanage verbs.
> >
> > > The API client died in the middle of the operation and unmanage
> > > was never called.
> > >
> > > Your suggested definition means that unless there was a problem
> > > with the connection VDSM will still have this connection active.
> > > The engine will have to clean it up anyway.
> > >
> > > The problem is, VDSM has no way of knowing whether a client
> > > died, forgot, or is thinking really hard and will continue on in
> > > about 2 minutes.
> >
> > >
> > > Connections that live until they die are a lifecycle that is
> > > hard to define and work with. Solving this problem is
> > > theoretically simple.
> > >
> > > Have clients hold some sort of session token and force the
> > > client to update it at a specified interval. You could bind
> > > resources (like domains, VMs, connections) to that session token
> > > so that when it expires VDSM auto-cleans the resources.
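(A toy sketch of that lease idea, purely to illustrate its shape; none
of these names exist anywhere today:)

    import time

    class SessionRegistry(object):
        def __init__(self, ttl=120):
            self._ttl = ttl
            self._sessions = {}  # token -> (deadline, set of resources)

        def renew(self, token):
            # The client must call this at least once per ttl seconds.
            _, resources = self._sessions.get(token, (0, set()))
            self._sessions[token] = (time.time() + self._ttl, resources)

        def bind(self, token, resource_id):
            self._sessions[token][1].add(resource_id)

        def reap(self, release):
            # 'release' is a hypothetical callback that frees one
            # resource (connection, domain, VM, ...).
            now = time.time()
            for token, (deadline, resources) in list(
                    self._sessions.items()):
                if deadline < now:
                    for res in resources:
                        release(res)
                    del self._sessions[token]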
> > >
> > > This kind of mechanism is out of the scope of this API change.
> > > Furthermore, I think that this mechanism should sit in the
> > > engine since the session might actually contain resources from
> > > multiple hosts and resources that are not managed by VDSM.
> > >
> > > In GUI flows specifically the user might do actions that don't
> > > even touch the engine, and forcing it to refresh the engine
> > > token is simpler than having it refresh the VDSM token.
> > >
> > > I understand that the engine currently has no way of tracking a
> > > user session. This, as I said, is also true in the case of VDSM.
> > > We can start arguing about which project should implement the
> > > session semantics, but as I see it that's not relevant to the
> > > connection management API.
> >
> >
> _______________________________________________
> vdsm-devel mailing list
> vdsm-devel(a)lists.fedorahosted.org
> https://fedorahosted.org/mailman/listinfo/vdsm-devel
--
Adam Litke <agl(a)us.ibm.com>
IBM Linux Technology Center