Re: [Engine-devel] [RFC] New Connection Management API

26 Jan 2012

      ----- Original Message -----
...
From: "Livnat Peer" <lpeer@redhat.com>
To: "Saggi Mizrahi" <smizrahi@redhat.com>
Cc: vdsm-devel@lists.fedorahosted.org, engine-devel@ovirt.org
Sent: Thursday, January 26, 2012 3:03:39 PM
Subject: Re: [Engine-devel] [RFC] New Connection Management API
On 26/01/12 17:00, Saggi Mizrahi wrote:
...
<snip>
Again trying to sum up and address all comments
Clear all:
==========
My opinions is still to not implement it.
Even though it might generate a bit more traffic premature
optimization is bad and there are other reasons we can improve
VDSM command overhead without doing this.
In any case this argument is redundant because my intention is (as
Litke pointed out) is to have a lean API.
and API call is something you have to support across versions, this
call implemented in the engine is something that no one has to
support and can change\evolve easily.
As a rule, if an API call C and be implemented by doing A + B then
C is redundant.
I disagree with the above statement, exposing a bulk of operations in
a
single API call is very common and not considered redundant.
I agree that that APIs with those kind of calls exist but it doesn't mean they are not redundant.
re·dun·dant: adj. (of words or data) Able to be omitted without loss of meaning or function

This call can be omitted without loss of function.
API calls are a commitment for generations. Wrapping this in the clients doesn't.
To quot myself:
"API call is something you have to support across versions, this
call implemented in the engine is something that no one has to
support and can change\evolve easily." ~ Saggi Mizrahi, a few lines above

this API set will one day be considered stupid, obsolete and annoying.
That's just how life is. We'll find better ways of solving these problems.
When that moment comes I want to have as little functionality as possible I have to keep maintaining.
I doubt there is any way you can convince me otherwise.

Put yourself in my position and think if you would have made this sacrifice just to save someone a loop.

To sum up, I will not add any API calls I don't absolutely have to.

As to the amount of calls, this is not relevant to the clear all verb. This is addressed by the point right below this sentence.
...
...
List of connections as args:
============================
Sorry I forgot to respond about that. I'm not as strongly opposed
to the idea as the other things you suggested. It'll just make
implementing the persistence logic in VDSM significantly more
complicated as I will have to commit multiple connection
information to disk in an all or nothing mode. I can create a
small sqlitedb to do that or do some directory tricks and exploit
FS rename atomicity but I'd rather not.
The demands are not without base. I would like to keep the code
simple under the hood in the price of a few more calls. You would
like to make less calls and keep the code simpler on your side.
There isn't a real way to settle this.
It is not about keeping the code simple (writing a loop is simple as
well), it is about redundant round trips.
As I said, I agree there is merit there.

I think that roundtrips is a general issue not specific to this call.
My opinion is that communication with VDSM should just use HTTP pipelining (http://en.wikipedia.org/wiki/HTTP_pipelining)
This will solve the problem globally instead of tacking it on to the API interface.

I generally prefer simplicity of the API and the implementation, and correctness over performance.

I laid out out what the change entails, multiple ways of solving this, and my personal perspective.
Unless someone on the list objects to either solution, Ayal will have final say on this matter.
He is more of a pragmatist than I (and doing what he says usually correlates with me getting my paycheck).
...
...
If anyone on the list as pros and cons for either way I'd be happy
to hear them.
If no compelling arguments arise I will let Ayal call this one.
Transient connections:
======================
The problem you are describing as I understand it is that VDSM did
not respond and not that the API client did not respond.
Again, this can happen for a number of reason, most of which VDSM
might not be aware that there is actually a problem (network
issues).
This relates to the EOL policy. I agree we have to find a good way
to define an automatic EOL for resources. I have made my
suggestion. Out of the scope of the API.
In the meantime cleaning stale connections is trivial and I have
made it clear a previous email about how to go about it in a
simple non intrusive way. Clean hosts on host connect, and on
every poll if you find connections that you don't like. This
should keep things squeaky clean.
I have no additional input on this.
The only real legitimate reservation you still have with the API is transient connections. As I said, if you can find a way to define an End Of Life condition I will implement it. Promise, David star my heart and hope to die.

Just some pointers:
* It cannot be time as no flow is really restricted by time.
* It cannot be tied to specific calls as an argument (like createStorageDomain, connectStorageDomain, etc...) as this is arbitrary and solves problems in very specific flows.
* It cannot be tied to VDSM restart or host restart. This just makes no sense as there is never a connection between these and a flow.
* It cannot be tied to number of connection attempts. Flows should be able to decide weather they want to keep on trying or bail out. The EOL mechanism should not come in the expense of preventing legitimate EOL. VDSM\Client\ can't clean up because of some unrecoverable error.

I proposed my solution to the EOL issue. The only problem I see you having with it, is that it's not actually VDSM that has to implement it. You can either accept it or give a better one.

I know that garbage collection and other fun tricks made everyone spoiled rotten about managing resources.
But to create an automatic resource freeing mechanism (GC) you got to have a clear EOL condition.

Examples:
- Cleaning FDs is easy as processes have a very clear EOL.
- With memory it is a bit more complicated but this is why you got ref-counting, hard-refs and weak-refs, loop detection, scoped pointers, and a million other strategies.

All I need to create a collection mechanism is a clear condition where I am guaranteed that the resource is not in use.
Until I get that you will just have to manage the resources yourself.
This is not an API issue, this is a general problem when making software.

I don't go ask the lvm guys to automatically clear VGs if I fail while creating a block domain.
I understand and accept the fact that they are just too low on the stack to solve this for me.
...
I truly appreciate your effort for modeling clean and simple API, but
at
the end of the day if the users of the API don't think it is clean
and
simple you missed your goal.
Simple is an amorphic concept that everyone have a personal definition for.
I don't care if this doesn't fit in to what you consider simple.
You are free to suggest improvements.
Hell, just write your dream API if it's better then this well use it.
...
...
----- Original Message -----
...
From: "Livnat Peer" <lpeer@redhat.com>
To: "Saggi Mizrahi" <smizrahi@redhat.com>
Cc: vdsm-devel@lists.fedorahosted.org, engine-devel@ovirt.org
Sent: Thursday, January 26, 2012 5:22:42 AM
Subject: Re: [Engine-devel] [RFC] New Connection Management API
On 25/01/12 23:35, Saggi Mizrahi wrote:
...
<SNIP>
This is mail was getting way too long.
About the clear all verb.
No.
Just loop, find the connections YOU OWN and clean them. Even
though
you don't want to support multiple clients to VDSM API doesn't
mean the engine shouldn't behave like a proper citizen.
It's the same reason why VDSM tries and not mess system resources
it didn't initiate.
There is a big difference, VDSM living in hybrid mode with other
workload on the host is a valid use case, having more than one
concurrent manager for VDSM is not.
Generating a disconnect request for each connection does not seem
like
the right API to me, again think on the simple flow of moving host
from
one data center to another, the engine needs to disconnect tall
storage
domains (each domain can have couple of connections associated
with
it).
I am giving example from the engine use cases as it is the main
user
of
VDSM ATM but I am sure it will be relevant to any other user of
VDSM.
...
------------------------
As I see it the only point of conflict is the so called
non-peristed connections.
I will call them transient connections from now on.
There are 2 user cases being discussed
1. Wait until a connection is made, if it fails don't retry and
automatically unmanage.
2. If the called of the API forgets or fails to unmanage a
connection.
Actually I was not discussing #2 at all.
...
Your suggestion as I understand it:
Transient connections are:
     - Connection that VDSM will only try to connect to once and
     will not reconnect to in case of disconnect.
yes
...
My problem with this definition that it does not specify the "end
of life" of the connection.
Meaning it solves only use case 1.
since this is the only use case i had in mind, it is what i was
looking for.
...
If all is well, and it usually is, VDSM will not invoke a
disconnect.
So the caller would have to call unmanage if the connection
succeeded at the end of the flow.
agree.
...
Now, if you are already calling unmanage if connection succeeded
you can just call it anyway.
not exactly, an example I gave earlier on the thread was that VSDM
hangs
or have other error and the engine can not initiate unmanaged,
instead
let's assume the host is fenced (self-fence or external fence does
not
matter), in this scenario the engine will not issue unmanage.
...
instead of doing: (with your suggestion)
----------------
manage
wait until succeeds or lastError has value
try:
  do stuff
finally:
  unmanage
do: (with the canonical flow)
---
manage
try:
  wait until succeeds or lastError has value
  do stuff
finally:
  unmanage
This is simpler to do than having another connection type.
You are assuming the engine can communicate with VDSM and there
are
scenarios where it is not feasible.
...
Now that we got that out of the way lets talk about the 2nd use
case.
Since I did not ask VDSM to clean after the (engine) user and you
don't
want to do it I am not sure we need to discuss this.
If you insist we can start the discussion on who should implement
the
cleanup mechanism but I'm afraid I have no strong arguments for
VDSM
to
do it, so I rather not go there ;)
You dropped from the discussion my request for supporting list of
connections for manage and unmanage verbs.
...
API client died in the middle of the operation and unmanage was
never called.
Your suggested definition means that unless there was a problem
with the connection VDSM will still have this connection active.
The engine will have to clean it anyway.
The problem is, VDSM has no way of knowing that a client died,
forgot or is thinking really hard and will continue on in about 2
minutes.
...
Connections that live until they die is a hard to define and work
with lifecycle. Solving this problem is theoretically simple.
Have clients hold some sort of session token and force the client
to update it at a specified interval. You could bind resources
(like domains, VMs, connections) to that session token so when it
expires VDSM auto cleans the resources.
This kind of mechanism is out of the scope of this API change.
Further more I think that this mechanism should sit in the engine
since the session might actually contain resources from multiple
hosts and resources that are not managed by VDSM.
In GUI flows specifically the user might do actions that don't
even
touch the engine and forcing it to refresh the engine token is
simpler then having it refresh the VDSM token.
I understand that engine currently has no way of tracking a user
session. This, as I said, is also true in the case of VDSM. We
can
start and argue about which project should implement the session
semantics. But as I see it it's not relevant to the connection
management API.