[Engine-devel] [RFC] New Connection Management API

Livnat Peer lpeer at redhat.com
Wed Jan 25 20:41:57 UTC 2012


>> Hi Saggi,
>>
>> I see the added value in the above functionality and I think it is
>> needed functionality in VDSM.
>>
>> Your suggestion includes 2 concepts:
>> - Persist connection - auto-reconnect on failures
>> - Reference counting (with CID granularity)
> It's not reference counting and the API user should not assume it is a reference count.
> Each CID can only be registered once.

By reference counting with CID granularity I meant that as long as you
have at least one CID registered on a connection, the connection will be
managed by the host.


> Subsequent requests to register will fail if the CID is already registered.
> There shouldn't be any assumptions about the relationship between a manage call and how many physical connections are actually created.
> Optimizations like internal multiplexing are an implementation detail and might change.
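
To make sure we are talking about the same semantics, this is how I read
it (the verb names are taken from this thread; the signature, parameters
and error behaviour below are only my assumptions):

-------------------------------------------------
def registerConnection(host, cid, conParams):
    # manageStorageConnection / unmanageStorageConnection are the verbs
    # discussed in this thread; their exact signatures are assumed here.
    # The call is synchronous: when it returns the connection is managed,
    # not necessarily already connected.
    host.manageStorageConnection(cid, conParams)

    # Registering the same CID again is an error - a CID is not a
    # reference count, it can only be registered once:
    # host.manageStorageConnection(cid, conParams)   # would fail

    # The connection stays managed (and auto-reconnected) until the CID
    # is explicitly unmanaged:
    # host.unmanageStorageConnection(cid)
-------------------------------------------------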


>>
>> Here are some comments:
>>
>> * Assuming you meant that the new API will be a replacement for the
>> current API (based on previous chats we had on this topic)
> It is
>> I think you
>> are missing needed functionality to support non-persisted
>> connections.
> the problem is with the term non-persisted. Everything is transient depending on the scale of time you consider "temporary". I leave the decision on what is temporary to the API user and give him the freedom to implement any connection lifecycle mechanism he chooses. I assume that all components, including VDSM, can crash in the middle of a flow and might want to either recover and continue or roll back. I leave it to the user to decide how to handle this as he is the one managing the flow and not VDSM.

I would call connections that don't need to reconnect upon failure
non-persistent connections; it is not a function of time.

There are operations that, upon failure, can be retried on another host,
in which case there is no reason to reconnect to the storage target as
it is no longer of interest to the user.

>> Creating a storage domain is an example where it can be useful.
>> The flow includes connecting the host to the storage server, creating
>> the storage domain, and disconnecting from the storage server.

> I think you are confusing the create storage domain flow in general with how the create storage domain flow works now in the RHEV GUI.
> A flow can have multiple strategies waiting for connection availability:
> * If it's a non-interactive process I might not care if the actual connect takes 3 hours.
> * On the other hand if it's an interactive connect I might only want to wait for 1 minute even if an actual connect request takes a lot more because of some problem.
> * If I am testing the connection arguments I might want to wait until I see the connection succeed or lastError get a value no matter how long it takes.
> * I might want to try as long as the error is not credential related (or another non-transient issue).
> * I might want to try until I see the connection active for X amount of time (To test for intermittent disconnects).
> 
> All of these can be accommodated by the suggested API.
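
If I understand correctly, all of these strategies would end up as thin
wrappers around polling, something like the sketch below
(getStorageConnectionsStatuses is the verb from this thread; the status
fields and the helper arguments are my assumptions):

-------------------------------------------------
import time

def waitForConnection(host, cid, timeout=None, giveUp=lambda err: False,
                      interval=2):
    # timeout=None -> non-interactive flow, wait as long as it takes
    # timeout=60   -> interactive flow, give up after a minute
    # giveUp(err)  -> e.g. return True for credential errors
    deadline = None if timeout is None else time.time() + timeout
    while deadline is None or time.time() < deadline:
        statuses = host.getStorageConnectionsStatuses()
        if cid in statuses:
            if statuses[cid].connected:
                return True
            err = statuses[cid].lastError
            if err is not None and giveUp(err):
                return False
        time.sleep(interval)
    return False
-------------------------------------------------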

That is great, but I am not sure how it is related to non-persistent connections.

> 
>> Let's assume VDSM hangs while creating the storage domain any
>> unmanageStorageServer will fail, the engine rolls back and tries to
>> create the storage domain on another host, there is no reason for the
>> host to reconnect to this storage server.
> That is true but there is no way for VDSM to know if the connection is deprecated or not. For all I know rhevm might be having issues and will continue on with the flow in a few minutes.

If there is an error, VDSM doesn't need to reconnect a non-persistent
connection, and it should be up to the VDSM user to ask for either a
persistent or a non-persistent connection.


>> In the above flow I would use a non-persistent connection if I had one.
> Again, what does non-persist mean?
>>
>> * In the suggested solution the connect will not initiate an
>> immediate
>> connect to the storage server instead it will register the connection
>> as
>> handled connection and will actually generate the connect as part of
>> the
>> managed connection mechanism.
> The mechanism guarantees maximum availability so it will immediately connect. The command might return before an actual connection succeeded as the Manage part is done.
>> I argue that this modeling is implementation driven which is wrong
>> from
>> the user perspective.
> VDSM is pretty low on the stack and has to accommodate many API users. I think it's wrong to model an API without considering how things actually behave, and to glue stuff on just to appease a GUI flow. GUI flows change all the time, APIs don't, so having a flexible API that supports multiple use patterns and does not enforce arbitrary limitations is better than one that is tightly coupled to a single user flow.

I am not sure why you think I am looking at the GUI flow, as I was
actually referring to the engine as the user of VDSM. The engine
has to support different clients; the UI is only one of them.

>> As a user I expect connect to actually initiate a connect action and
>> that the return value should indicate whether the connect succeeded;
>> the way you modeled it, the API will return true if you succeeded in
>> 'registering' the connection.
>> You modeled the API to be asynchronous with no handler (task id) to
>> monitor the results of the action, which requires polling
> The API is not asynchronous; it is perfectly synchronous. When manageStorageConnection() returns the connection is managed.
> You will have maximum connection uptime. You will have to poll and check for the liveness of the connection before using it as some problems may occur preventing VDSM from supplying the connection at the moment.
>> in the
>> create
>> storage domain flow which I really don't like.
>> In addition, you introduced a verb for monitoring the status of the
>> connections alone. I would like to be able to monitor it as part of
>> the general host status and not have to poll a new verb in addition
>> to the current one.
>>
>>
>> As part of solving the connection management flows in OE I am
>> missing:
>>
>> - A way to clear all managed connections.
>> Use case: we move a host from one data center to another and we want
>> the host to clear all the managed connections.
>> We can ask for the list of managed connections and clear them, but
>> having clearAll is much easier.
> Nope, you should get all active connections, cherry-pick the ones you own using some ID scheme (RHEVM_FLOWID_CON?) and only clear your own connections. There might be other clients using VDSM that you would forcibly disconnect.

I hope that VDSM is going to serve many types of clients, but the
hybrid-client mode is the less interesting use case IMO.
How often will you have more than one virtualization manager managing the
same host? I think it is not a common use case, and if it is not the
common use case I expect the API to be more friendly to the
single-manager use case.

Moving the host from one data center to another is a clear use case
where a clearAll API would be useful, and I am sure other clients will
find this API useful as well.
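
Just to illustrate what every manager has to implement on its side
instead of a single clearAll verb (a sketch only, built on the verbs
above; the CID prefix is an assumption):

-------------------------------------------------
def clearOwnConnections(host, prefix="OENGINE"):
    # List everything the host manages, cherry-pick our own connections
    # by CID prefix and unmanage them one by one.
    for cid in host.getStorageConnectionsStatuses():
        if not cid.startswith(prefix):
            continue  # owned by some other client, leave it alone
        host.unmanageStorageConnection(cid)
-------------------------------------------------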

>>
>> - Handling a list of Ids in each API verb
> Only getDeviceList will have a list of IDs handed to it. It makes no sense in other verbs.

I disagree. If I need to connect a host to a storage domain, I have to
execute a number of API calls that is linear in the number of storage
servers used for the storage domain; again, not a friendly API.
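
In other words, every domain connect becomes a loop like the one below
(a sketch; the CID scheme and verb signature are assumptions):

-------------------------------------------------
def manageDomainConnections(host, domainId, connectionParamsList):
    # One manage call per storage server backing the domain - the number
    # of round trips grows linearly with the number of connections.
    for i, conParams in enumerate(connectionParamsList, start=1):
        cid = "OENGINE_DOMAIN_%s_CON%d" % (domainId, i)
        host.manageStorageConnection(cid, conParams)
-------------------------------------------------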



>>
>> - A verb which handles create storage domain and encapsulates the
>> connect, create, and disconnect.
> This is a hackish ad-hoc solution. Why not have one for the entire pool? Why not have one for a VM?

I think we are going to remove pools in 4.0, so probably not; and for a
VM, well, that's an interesting idea :)



>>
>>
>> Thanks, Livnat
>>
> 
> 
> I will try and sum the points up here:
> manageConnection is not connectStorageServer. They are different. The latter means connect to the storage server, the former means manage it. They are both synchronous.
> 
> non-persistence makes no sense. Auto unmanage does. If anyone suggests a valid mechanism to auto-clean CIDs that is correct and accommodates interactive and non-interactive flows, I will be willing to accept it. Timeouts are never correct as no flow is really time-capped, and they will create more issues than they will solve.
> 

I am not sure what the problem is with the non-persistent mechanism I
suggested earlier.

> Polling the CID to track connection availability is the correct way to go, as what you really want is not to connect to the storage but rather to have it available. Polling is just waiting until the connection is available or a condition has been triggered. This gives the flow manager freedom over what the condition is (see above).
> 
> Cleaning up connections, like closing FDs, freeing memory, and other resource management, is a pain. I understand, and having a transaction-like mechanism to lock resources to a flow would be great, but this is outside the scope of this change.
> 
> VDSM being a tiny cog in the cluster can never have enough information to know when a flow started or finished. This is why I leave it to the management to manage these resources. I just prevent collisions (with the CIDs) and handle resource availability.
> 
> How to implement stuff:
> I suggest this CID scheme:
> 
> For connections that persist across engine restarts.
> 
> OENGINE_<resource type>_<resource id>_CON<connection id>
> EX:
> OENGINE_DOMAIN_2131-321dsa-dsadsa-232_CON1
> 
> For connections that are managed for flows and might not persist across engine restarts.
> 
> OENGINE_<engine instance id>_FLOW_<flow id>_CON<connection id>
> EX:
> OENGINE_4324-23423dfd-fsdfsd-21312_FLOW_1023_CON1
> 
> Note:
> The instance id is a uuid generated on each instance run, simply to differentiate between running instances.
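
Just to check that I read the scheme correctly, I assume the helpers the
pseudo code below relies on would look roughly like this:

-------------------------------------------------
FLOW_MARKER = "_FLOW_"

def isPersistentConnection(cid):
    # Persistent CIDs (OENGINE_<resource type>_<resource id>_CON<n>)
    # have no instance id / flow id component.
    return FLOW_MARKER not in cid

def parseCID(cid):
    # OENGINE_<engine instance id>_FLOW_<flow id>_CON<connection id>
    head, _, tail = cid.partition(FLOW_MARKER)
    instanceId = head[len("OENGINE_"):]
    flowId, _, conId = tail.partition("_CON")
    return instanceId, flowId, conId
-------------------------------------------------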
> 
> 
> 
> How to poll for connections: (in pythonic pseudo code)
> ---------------------------------------------------------
> import time
> 
> # global_instance_id is the uuid generated for this engine run (see the note above).
> def pollConnections(host, cidList, stopCondition, interval):
>     # host          - VDSM host proxy
>     # cidList       - list of CID strings we are waiting for
>     # stopCondition - callable returning True when we should give up
>     # interval      - seconds to sleep between polling cycles
>     clist = list(cidList)
>     while not stopCondition() and len(clist) > 0:
>         statuses = host.getStorageConnectionsStatuses()
>         for id in statuses:
>             if not id.startswith("OENGINE"):
>                 # This is not an engine connection, ignore
>                 continue
> 
>             # Check the scheme and see if it has an instance ID after the prefix or not
>             if isPersistentConnection(id):
>                 continue
> 
>             instanceId, flowId, conId = parseCID(id)
> 
>             # Clean connections from past instances
>             if instanceId != global_instance_id:
>                 try:
>                     host.unmanageStorageConnection(id)
>                 except Exception:
>                     # Some other thread may be clearing this ID as well;
>                     # in any case VDSM is taking care of thread safety.
>                     pass
> 
>             if id in clist and statuses[id].connected:
>                 clist.remove(id)
> 
>         time.sleep(interval)
> -------------------------------------------------

I would not use sleep; I would use scheduler-based monitoring and
release the thread between cycles.
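
Something along these lines, just to illustrate the idea (timer based
instead of sleeping in a loop; the status fields are the same
assumptions as above):

-------------------------------------------------
import threading

def monitorConnections(host, cidList, onAllConnected, interval=5):
    # Each cycle schedules the next one with a timer and returns, so no
    # thread is held between polling cycles.
    pending = set(cidList)

    def cycle():
        statuses = host.getStorageConnectionsStatuses()
        for cid in list(pending):
            if cid in statuses and statuses[cid].connected:
                pending.discard(cid)
        if pending:
            threading.Timer(interval, cycle).start()
        else:
            onAllConnected()

    cycle()
-------------------------------------------------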

> It's easy to see how you can modify this template to support multiple modes of tracking
> * Pass a flow id instead of a CID list to track a flow
> * Exit when at least X connections succeeded
> * call getDeviceList after every successful connect and check if the LUN you are looking for is available; if it is, continue and let the other connections complete at their own pace for multipathing.
> * connect to multiple hosts and return once one host has connected successfully
> * you can also add an install id or a cluster id if you want to have multiple engines managing the same VDSM and not have them step on each other's toes.
> 
> and much, much more.
> 
> Implementing this will give you everything you want with maximum correctness and flexibility. This will also make the transition to event driven communication with VDSM simpler.
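
For example, the "exit when at least X connections succeeded" variant
would just change the exit condition, roughly (same assumed verbs and
fields as above):

-------------------------------------------------
import time

def waitForAtLeast(host, cidList, minConnected, interval=2):
    # Variant of the polling template: return as soon as at least
    # minConnected of the given CIDs are up.
    while True:
        statuses = host.getStorageConnectionsStatuses()
        connected = [cid for cid in cidList
                     if cid in statuses and statuses[cid].connected]
        if len(connected) >= minConnected:
            return connected
        time.sleep(interval)
-------------------------------------------------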





