On 26/01/12 21:21, Saggi Mizrahi wrote:
----- Original Message -----
> From: "Adam Litke" <agl(a)us.ibm.com>
> To: "Saggi Mizrahi" <smizrahi(a)redhat.com>
> Cc: "Livnat Peer" <lpeer(a)redhat.com>, engine-devel(a)ovirt.org,
>  vdsm-devel(a)lists.fedorahosted.org
> Sent: Thursday, January 26, 2012 1:58:40 PM
> Subject: Re: [vdsm] [Engine-devel] [RFC] New Connection Management API
>
> On Thu, Jan 26, 2012 at 10:00:57AM -0500, Saggi Mizrahi wrote:
>> <snip>
>> Again trying to sum up and address all comments
>>
>> Clear all:
>> ==========
>> My opinion is still to not implement it.
>> Even though it might generate a bit more traffic, premature
>> optimization is bad, and there are other ways we can reduce
>> VDSM command overhead without doing this.
>>
>> In any case this argument is redundant because my intention (as
>> Litke pointed out) is to have a lean API.
>> An API call is something you have to support across versions; this
>> call implemented in the engine is something that no one has to
>> support and can change/evolve easily.
>>
>> As a rule, if an API call C can be implemented by doing A + B then
>> C is redundant.
>>
>> List of connections as args:
>> ============================
>> Sorry I forgot to respond about that. I'm not as strongly opposed
>> to the idea as the other things you suggested. It'll just make
>> implementing the persistence logic in VDSM significantly more
>> complicated as I will have to commit multiple connection
>> information to disk in an all or nothing mode. I can create a
>> small sqlitedb to do that or do some directory tricks and exploit
>> FS rename atomicity but I'd rather not.
>
> I would be strongly opposed to introducing a sqlite database into vdsm just to
> enable "convenience mode" for this API. Does the operation really need to be
> atomic? Why not just perform each connection sequentially and return a list of
> statuses? Is the only motivation for allowing a list of parameters to reduce
> the number of API calls between engine and vdsm? If so, the same argument
> Saggi makes above applies here.
I try to have VDSM expose APIs that are simple to predict. A command can either succeed
or fail.
The problem is not actually validating the connections. The problem is that once I have
concluded that they are all OK, I need to persist to disk the information that will allow
me to reconnect if VDSM happens to crash. If I naively save them one by one, I could get into
a state where only some of the connections were persisted before the operation failed. So I
have to somehow put all this in a transaction.
I don't have to use sqlite. I could also put all the persistence information in a new
dir for every call named <UUID>.tmp. Once I have written everything down, I rename the
directory to just <UUID> and fsync it. The rename is guaranteed by POSIX to be atomic. For
unmanage, I move all the persistence information from the directories it sits in to a new
dir named <UUID>, rename it to <UUID>.tmp, fsync it and then remove it.
This all just looks like more trouble than it's worth to me.
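
For illustration, the rename-based scheme described above could look roughly like the Python below. This is only a sketch under assumptions: the function names (`persist_batch`, `committed_batches`) and the one-JSON-file-per-connection layout are hypothetical, not anything VDSM actually does. It relies on a POSIX filesystem, where `rename` within one filesystem is atomic.

```python
import json
import os
import uuid


def _fsync_dir(path):
    """Flush a directory's entries to disk (POSIX only)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def persist_batch(base_dir, connections):
    """Write every connection into <UUID>.tmp, then rename it to <UUID>.

    `connections` maps a connection id to a JSON-serializable dict
    (hypothetical layout). Returns the batch UUID.
    """
    batch_id = str(uuid.uuid4())
    tmp_dir = os.path.join(base_dir, batch_id + ".tmp")
    final_dir = os.path.join(base_dir, batch_id)
    os.mkdir(tmp_dir)
    for conn_id, info in connections.items():
        with open(os.path.join(tmp_dir, conn_id + ".json"), "w") as f:
            json.dump(info, f)
            f.flush()
            os.fsync(f.fileno())
    _fsync_dir(tmp_dir)
    os.rename(tmp_dir, final_dir)  # the commit point: all or nothing
    _fsync_dir(base_dir)
    return batch_id


def committed_batches(base_dir):
    """After a crash, only directories without the .tmp suffix count."""
    return sorted(d for d in os.listdir(base_dir) if not d.endswith(".tmp"))
```

A leftover `<UUID>.tmp` directory found at startup would mean the call never committed and can simply be deleted.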
I agree with Adam. I don't think the operation should be atomic; having
only some of the connections persisted is a perfectly valid outcome if
the API returns a list of statuses.
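
A minimal sketch of the non-atomic alternative, assuming a hypothetical `connect(conn_id, params)` callable that raises on failure. The status dictionary shape is also an assumption, not a proposed wire format.

```python
def manage_all(connections, connect):
    """Try each connection in order and never abort the batch on one failure.

    `connections` is a list of (conn_id, params) pairs; `connect` is a
    hypothetical callable that raises an exception when it fails.
    Returns one status dict per connection, in the same order.
    """
    statuses = []
    for conn_id, params in connections:
        try:
            connect(conn_id, params)
            statuses.append({"id": conn_id, "ok": True, "error": None})
        except Exception as e:
            statuses.append({"id": conn_id, "ok": False, "error": str(e)})
    return statuses
```

The caller then decides per connection whether to retry or give up, and no transaction is needed on the persistence side.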
>
>> The demands are not without base. I would like to keep the code
>> simple under the hood at the price of a few more calls. You would
>> like to make fewer calls and keep the code simpler on your side.
>> There isn't a real way to settle this.
>> If anyone on the list has pros and cons for either way I'd be happy
>> to hear them.
>> If no compelling arguments arise I will let Ayal call this one.
>>
>> Transient connections:
>> ======================
>> The problem you are describing as I understand it is that VDSM did
>> not respond and not that the API client did not respond.
>> Again, this can happen for a number of reasons, and in most of them VDSM
>> might not be aware that there is actually a problem (network
>> issues).
>>
>> This relates to the EOL policy. I agree we have to find a good way
>> to define an automatic EOL for resources. I have made my
>> suggestion. Out of the scope of the API.
>>
>> In the meantime cleaning stale connections is trivial and I have
>> made it clear in a previous email how to go about it in a
>> simple, non-intrusive way. Clean hosts on host connect, and on
>> every poll if you find connections that you don't like. This
>> should keep things squeaky clean.
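
A rough sketch of that client-side cleanup, run on host connect and on every poll. Everything here is assumed for illustration: the `host` object with `list_connections()` and `unmanage()`, and the idea that a client recognizes its own connections by an id prefix.

```python
def stale_connections(reported, wanted, owner_prefix):
    """Return ids this client owns on the host but no longer wants."""
    return sorted(
        cid for cid in reported
        if cid.startswith(owner_prefix) and cid not in wanted
    )


def clean_host(host, wanted, owner_prefix):
    """Unmanage stale connections; leave other clients' connections alone.

    `host` is a hypothetical proxy exposing list_connections() and
    unmanage(conn_id). Returns the ids that were removed.
    """
    removed = []
    for cid in stale_connections(host.list_connections(), wanted, owner_prefix):
        host.unmanage(cid)
        removed.append(cid)
    return removed
```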
>>
>> ----- Original Message -----
>>> From: "Livnat Peer" <lpeer(a)redhat.com>
>>> To: "Saggi Mizrahi" <smizrahi(a)redhat.com>
>>> Cc: vdsm-devel(a)lists.fedorahosted.org, engine-devel(a)ovirt.org
>>> Sent: Thursday, January 26, 2012 5:22:42 AM
>>> Subject: Re: [Engine-devel] [RFC] New Connection Management API
>>>
>>> On 25/01/12 23:35, Saggi Mizrahi wrote:
>>>> <SNIP>
>>>> This mail was getting way too long.
>>>>
>>>> About the clear all verb.
>>>> No.
>>>> Just loop, find the connections YOU OWN and clean them. Just because
>>>> you don't want to support multiple clients of the VDSM API doesn't
>>>> mean the engine shouldn't behave like a proper citizen.
>>>> It's the same reason why VDSM tries not to mess with system
>>>> resources it didn't initiate.
>>>
>>>
>>> There is a big difference: VDSM living in hybrid mode with other
>>> workloads on the host is a valid use case; having more than one
>>> concurrent manager for VDSM is not.
>>> Generating a disconnect request for each connection does not seem like
>>> the right API to me. Again, think of the simple flow of moving a host from
>>> one data center to another: the engine needs to disconnect all storage
>>> domains (each domain can have a couple of connections associated with
>>> it).
>>>
>>> I am giving examples from the engine use cases as it is the main user of
>>> VDSM ATM, but I am sure it will be relevant to any other user of
>>> VDSM.
>>>
>>>>
>>>> ------------------------
>>>>
>>>> As I see it the only point of conflict is the so called
>>>> non-persisted connections.
>>>> I will call them transient connections from now on.
>>>>
>>>> There are 2 use cases being discussed:
>>>> 1. Wait until a connection is made; if it fails, don't retry and
>>>> automatically unmanage.
>>>> 2. The caller of the API forgets or fails to unmanage a
>>>> connection.
>>>>
>>>
>>> Actually I was not discussing #2 at all.
>>>
>>>> Your suggestion as I understand it:
>>>> Transient connections are:
>>>> - Connections that VDSM will only try to connect to once and
>>>> will not reconnect to in case of disconnect.
>>>
>>> yes
>>>
>>>>
>>>> My problem with this definition is that it does not specify the "end
>>>> of life" of the connection.
>>>> Meaning it solves only use case 1.
>>>
>>> since this is the only use case i had in mind, it is what i was
>>> looking for.
>>>
>>>> If all is well, and it usually is, VDSM will not invoke a
>>>> disconnect.
>>>> So the caller would have to call unmanage if the connection
>>>> succeeded at the end of the flow.
>>>
>>> agree.
>>>
>>>> Now, if you are already calling unmanage if the connection succeeded,
>>>> you can just call it anyway.
>>>
>>> not exactly, an example I gave earlier on the thread was that VDSM hangs
>>> or has another error and the engine cannot initiate unmanage; instead,
>>> let's assume the host is fenced (self-fence or external fence does not
>>> matter). In this scenario the engine will not issue unmanage.
>>>
>>>>
>>>> instead of doing: (with your suggestion)
>>>> ----------------
>>>> manage
>>>> wait until succeeds or lastError has value
>>>> try:
>>>>     do stuff
>>>> finally:
>>>>     unmanage
>>>>
>>>> do: (with the canonical flow)
>>>> ---
>>>> manage
>>>> try:
>>>>     wait until succeeds or lastError has value
>>>>     do stuff
>>>> finally:
>>>>     unmanage
>>>>
>>>> This is simpler to do than having another connection type.
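
The canonical flow above could be written in Python roughly as follows. This is only a sketch: the `client` object and its `manage`, `status` and `unmanage` methods are hypothetical stand-ins, and `status` is assumed to return a dict with `connected` and `lastError` keys. The timeout is an addition so the wait cannot loop forever.

```python
import time


def with_connection(client, conn_id, params, do_stuff,
                    poll_interval=0.1, timeout=30.0):
    """manage, wait, do stuff, and always unmanage at the end."""
    client.manage(conn_id, params)
    try:
        deadline = time.monotonic() + timeout
        while True:
            status = client.status(conn_id)  # hypothetical status call
            if status["connected"]:
                break
            if status["lastError"] is not None:
                raise RuntimeError(status["lastError"])
            if time.monotonic() >= deadline:
                raise TimeoutError("connection %s did not come up" % conn_id)
            time.sleep(poll_interval)
        return do_stuff()
    finally:
        # Runs whether the wait failed, do_stuff raised, or all went well.
        client.unmanage(conn_id)
```

Note that this only covers the case where the client can still reach VDSM; it does nothing if the client itself dies before the `finally` block runs.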
>>>
>>> You are assuming the engine can communicate with VDSM, and there are
>>> scenarios where that is not feasible.
>>>
>>>>
>>>> Now that we got that out of the way let's talk about the 2nd use
>>>> case.
>>>
>>> Since I did not ask VDSM to clean up after the (engine) user and you don't
>>> want to do it, I am not sure we need to discuss this.
>>>
>>> If you insist, we can start the discussion on who should implement the
>>> cleanup mechanism, but I'm afraid I have no strong arguments for VDSM to
>>> do it, so I'd rather not go there ;)
>>>
>>>
>>> You dropped from the discussion my request for supporting a list of
>>> connections for the manage and unmanage verbs.
>>>
>>>> API client died in the middle of the operation and unmanage was
>>>> never called.
>>>>
>>>> Your suggested definition means that unless there was a problem
>>>> with the connection VDSM will still have this connection
>>>> active.
>>>> The engine will have to clean it anyway.
>>>>
>>>> The problem is, VDSM has no way of knowing whether a client died,
>>>> forgot, or is thinking really hard and will continue in about 2
>>>> minutes.
>>>
>>>>
>>>> Connections that live until they die have a lifecycle that is hard to
>>>> define and work with. Solving this problem is theoretically simple.
>>>>
>>>> Have clients hold some sort of session token and force the client
>>>> to update it at a specified interval. You could bind resources
>>>> (like domains, VMs, connections) to that session token so when it
>>>> expires VDSM auto cleans the resources.
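
To make the session-token idea concrete, here is a small in-memory sketch. The class and method names (`SessionRegistry`, `renew`, `bind`, `expire`) are invented for illustration, and nothing here says which project would host such a mechanism.

```python
import time


class SessionRegistry:
    """Bind resources to session tokens that must be renewed within `ttl`."""

    def __init__(self, ttl, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._sessions = {}  # token -> (expires_at, set of resource ids)

    def renew(self, token):
        """Create the session if needed and push its expiry forward."""
        _, resources = self._sessions.get(token, (None, set()))
        self._sessions[token] = (self._clock() + self._ttl, resources)

    def bind(self, token, resource):
        """Attach a resource to a session; binding also renews the token."""
        self.renew(token)
        self._sessions[token][1].add(resource)

    def expire(self, cleanup):
        """Call cleanup(resource) for every resource of every expired session."""
        now = self._clock()
        expired = [t for t, (exp, _) in self._sessions.items() if exp <= now]
        for token in expired:
            _, resources = self._sessions.pop(token)
            for resource in sorted(resources):
                cleanup(resource)
```

A periodic task would call `expire()` with a callback that unmanages the connection (or releases whatever the resource is).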
>>>>
>>>> This kind of mechanism is out of the scope of this API change.
>>>> Furthermore, I think that this mechanism should sit in the engine
>>>> since the session might actually contain resources from multiple
>>>> hosts and resources that are not managed by VDSM.
>>>>
>>>> In GUI flows specifically the user might do actions that don't even
>>>> touch the engine, and forcing it to refresh the engine token is
>>>> simpler than having it refresh the VDSM token.
>>>>
>>>> I understand that the engine currently has no way of tracking a user
>>>> session. This, as I said, is also true in the case of VDSM. We can
>>>> start to argue about which project should implement the session
>>>> semantics, but as I see it, it's not relevant to the connection
>>>> management API.
>>>
>>>
>
> --
> Adam Litke <agl(a)us.ibm.com>
> IBM Linux Technology Center
>
>