----- Original Message -----
From: "Livnat Peer" <lpeer(a)redhat.com>
To: "Saggi Mizrahi" <smizrahi(a)redhat.com>
Cc: vdsm-devel(a)lists.fedorahosted.org, engine-devel(a)ovirt.org
Sent: Tuesday, January 24, 2012 12:43:39 PM
Subject: Re: [Engine-devel] [RFC] New Connection Management API
On 23/01/12 23:54, Saggi Mizrahi wrote:
> I have begun working on changing how API clients can control storage
> connections when interacting with VDSM.
>
> Currently there are 2 API calls:
> connectStorageServer() - Will connect to the storage target if the
> host is not already connected to it.
> disconnectStorageServer() - Will disconnect from the storage target
> if the host is connected to it.
>
> This API is very simple but is inappropriate when multiple clients
> and flows try to access the same storage.
>
> This is currently solved by trying to synchronize things inside
> rhevm. This is hard and convoluted. It also brings out issues with
> other clients using the VDSM API.
>
> Another problem is error recovery. Currently ovirt-engine(OE) has
> no way of monitoring the connections on all the hosts, and if a
> connection disappears it's OE's responsibility to reconnect.
>
> I suggest a different concept where VDSM 'manages' the connections.
> VDSM receives a manage request with the connection information and
> from that point forward VDSM will try to keep this connection
> alive. If the connection fails VDSM will automatically try and
> recover.
>
> Every manage request will also have a connection ID (CID). This CID
> will be used when the same client asks to unmanage the connection.
> When multiple manage requests are received for the same connection,
> each has to have its own unique CID. By internally mapping CIDs to
> actual connections VDSM can properly disconnect when no CID is
> addressing the connection. This allows each client, and even each
> flow, to have its own CID, effectively eliminating
> connect/disconnect races.
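A rough illustration of the bookkeeping this implies on the VDSM side; this
is not the actual VDSM implementation, which may multiplex connections or
otherwise differ:
---------------------------------------------------------
# Illustrative only: one possible CID -> connection mapping.
class ConnectionRegistry(object):
    def __init__(self):
        self._cidToConn = {}   # CID -> connection key (e.g. normalized URI)
        self._connToCids = {}  # connection key -> set of CIDs referencing it

    def manage(self, cid, connKey):
        if cid in self._cidToConn:
            raise KeyError("CID already in use: %s" % cid)
        self._cidToConn[cid] = connKey
        cids = self._connToCids.setdefault(connKey, set())
        cids.add(cid)
        # First CID for this target: start trying to connect (and keep retrying).

    def unmanage(self, cid):
        connKey = self._cidToConn.pop(cid)  # raises KeyError for unknown CIDs
        cids = self._connToCids[connKey]
        cids.discard(cid)
        if not cids:
            del self._connToCids[connKey]
            # No CID references the target any more: actually disconnect.
---------------------------------------------------------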
>
> The change from (dis)connect to (un)manage also changes the
> semantics of the calls significantly.
> Whereas connectStorageServer would have returned when the storage
> is either connected or failed to connect, manageStorageServer will
> return once VDSM has registered the CID. This means that the
> connection might not be active immediately while VDSM tries to
> connect. The connection might remain down for a long time if the
> storage target is down or is having issues.
>
> This allows for VDSM to receive the manage request even if the
> storage is having issues and recover as soon as it's operational
> without user intervention.
>
> In order for the client to query the current state of the
> connections I propose getStorageConnectionList(). This will return
> a mapping of CID to connection status. The status contains the
> connection info (excluding credentials), whether the connection is
> active, whether the connection is managed (unmanaged connections
> are returned with transient IDs), and, if the connection is down,
> the last error information.
>
> The same actual connection can be returned multiple times, once for each
> CID.
>
> For cases where an operation requires a connection to be active a
> user can poll the status of the CID. The user can then choose to
> poll for a certain amount of time or until an error appears in the
> error field of the status. This will give you either a timeout or
> a "try once" semantic depending on the flows needs.
>
> All connections that have been managed persist across VDSM restarts and
> will be managed until a corresponding unmanage command has been
> issued.
>
> There is no concept of temporary connections as "temporary" is flow
> dependent and VDSM can't accommodate all interpretations of
> "temporary". An ad-hoc mechanism can be built using the CID field.
> For instance a client can manage a connection with
> "ENGINE_FLOW101_CON1". If the flow got interrupted the client can
> clean up all CIDs carrying that flow ID.
>
> I think this API gives safety, robustness, and implementation
> freedom.
>
>
> Nitty Gritty:
>
> manageStorageServer
> ===================
> Synopsis:
> manageStorageServer(uri, connectionID):
>
> Parameters:
> uri - a uri pointing to a storage target (eg: nfs://server:export,
> iscsi://host/iqn;portal=1)
> connectionID - string with any char except "/".
>
> Description:
> Tells VDSM to start managing the connection. From this moment on
> VDSM will try and have the connection available when needed. VDSM
> will monitor the connection and will automatically reconnect on
> failure.
> Returns:
> Success code if VDSM was able to manage the connection.
> It usually just verifies that the arguments are sane and that the
> CID is not already in use.
> This doesn't mean the host is connected.
> ----
> unmanageStorageServer
> =====================
> Synopsis:
> unmanageStorageServer(connectionID):
>
> Parameters:
> connectionID - string with any char except "/".
>
> Description:
> Tells VDSM to stop managing the connection. VDSM will try and
> disconnect from the storage target if this is the last CID
> referencing the storage connection.
>
> Returns:
> Success code if VDSM was able to unmanage the connection.
> It will return an error if the CID is not registered with VDSM.
> Disconnect failures are not reported. Active unmanaged connections
> can be tracked with getStorageServerList().
> ----
> getStorageServerList
> ====================
> Synopsis:
> getStorageServerList()
>
> Description:
> Will return list of all managed and unmanaged connections.
> Unmanaged connections have temporary IDs and are not guaranteed to
> be consistent across calls.
>
> Results:
> A mapping between CIDs and the status.
> example return value (Actual key names may differ)
>
> {'conA': {'connected': True, 'managed': True, 'lastError': 0,
>           'connectionInfo': {
>               'remotePath': 'server:/export',
>               'retrans': 3,
>               'version': 4
>           }},
>  'iscsi_session_34': {'connected': False, 'managed': False,
>           'lastError': 339, 'connectionInfo': {
>               'hostname': 'dandylopn',
>               'portal': 1}}
> }
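A rough sketch of how a client flow could drive these verbs end to end; the
'host' binding and the helper name are hypothetical, and as noted above the
exact status key names may differ:
---------------------------------------------------------
import time

def withConnection(host, cid, uri, work, timeout=300, interval=2):
    # Hypothetical client-side helper: manage a connection, poll until it is
    # active (or reports an error / the timeout expires), run some work, and
    # finally unmanage it again.
    host.manageStorageServer(uri, cid)
    deadline = time.time() + timeout
    try:
        while True:
            status = host.getStorageServerList()[cid]
            if status['connected']:
                return work()
            if status['lastError'] != 0:
                raise RuntimeError("connect failed: %r" % (status['lastError'],))
            if time.time() > deadline:
                raise RuntimeError("timed out waiting for %s" % cid)
            time.sleep(interval)
    finally:
        # The CID stays registered until it is explicitly unmanaged.
        host.unmanageStorageServer(cid)
---------------------------------------------------------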
> _______________________________________________
> Engine-devel mailing list
> Engine-devel(a)ovirt.org
>
http://lists.ovirt.org/mailman/listinfo/engine-devel
Hi Saggi,
I see the added value in the above functionality and I think it is
needed functionality in VDSM.
Your suggestion includes 2 concepts:
- Persistent connections - auto-reconnect on failure
- Reference counting (with CID granularity)
It's not reference counting and the
API user should not assume it is a reference count.
Each CID can only be registered once.
Subsequent requests to register will fail if the CID is already registered.
There shouldn't be any assumptions about the relation between a manage call
and how many physical connections are actually created.
Optimizations like internal multiplexing are an implementation detail and might change.
Here are some comments:
* Assuming you meant that the new API will be a replacement for the
current API (based on previous chats we had on this topic)
It is
I think you are missing needed functionality to support non-persisted
connections.
The problem is with the term "non-persisted". Everything is transient
depending on the scale of time you consider "temporary". I leave the decision on
what is temporary to the API user and give him the freedom to implement any connection
lifecycle mechanism he chooses. I assume that all components, including VDSM, can crash in
the middle of a flow and might want to either recover and continue or roll back. I leave
it to the user to decide how to handle this as he is the one managing the flow and not
VDSM.
creating a storage domain is an example where it can be useful.
The flow includes connecting the host to the storage server, creating the
storage domain, and disconnecting from the storage server.
I think you are confusing the create storage domain flow in general with
how the create storage domain flow currently works in the RHEV GUI.
A flow can have multiple strategies for waiting for connection availability:
* If it's a non-interactive process I might not care if the actual connect
takes 3 hours.
* On the other hand, if it's an interactive connect I might only want to
wait for 1 minute, even if an actual connect request takes a lot longer
because of some problem.
* If I am testing the connection arguments I might want to wait until I see
the connection succeed or lastError get a value, no matter how long it takes.
* I might want to keep trying as long as the error is not credential-related
(or another non-transient issue).
* I might want to keep trying until I see the connection active for X amount
of time (to test for intermittent disconnects).
All of these can be accommodated by the suggested API; a sketch of a few
such stop conditions follows below.
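For illustration, assuming the pollConnections template at the end of this
mail, each strategy above is just a different stopCondition. The factory
names and the error classification here are made up, not part of the
proposed API:
---------------------------------------------------------
import time

def timeoutCondition(seconds):
    # Interactive flows: give up after a fixed wall-clock budget.
    deadline = time.time() + seconds
    return lambda: time.time() > deadline

def neverGiveUp():
    # Non-interactive flows: wait as long as it takes.
    return lambda: False

def nonTransientErrorCondition(host, cids, fatalErrors):
    # Stop only if one of the tracked CIDs reports an error code the caller
    # considers non-transient (e.g. bad credentials); keep waiting otherwise.
    def condition():
        statuses = host.getStorageConnectionsStatuses()
        return any(statuses[c]['lastError'] in fatalErrors
                   for c in cids if c in statuses)
    return condition
---------------------------------------------------------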
Let's assume VDSM hangs while creating the storage domain. Any
unmanageStorageServer will fail, the engine rolls back and tries to
create the storage domain on another host, and there is no reason for the
host to reconnect to this storage server.
That is true, but there is no way for VDSM to know whether the connection
has been abandoned or not. For all I know RHEVM might be having issues and
will continue on with the flow in a few minutes.
In the above flow I would use a non-persistent connection if I had one.
Again, what does non-persistent mean?
* In the suggested solution the connect will not initiate an immediate
connect to the storage server; instead it will register the connection as a
handled connection and will actually perform the connect as part of the
managed connection mechanism.
The mechanism guarantees maximum availability so it will immediately try to
connect. The command might return before an actual connection has succeeded,
as only the manage part is done.
I argue that this modeling is implementation driven, which is wrong from
the user perspective.
VDSM is pretty low on the stack and has to accommodate many API users. I
think it's wrong to model an API when you don't consider how things
actually behave, and to try and glue stuff on to appease a GUI flow. GUI
flows change all the time, APIs don't, so having a flexible API that
supports multiple use patterns and does not enforce arbitrary limitations
is better than one that is tightly coupled to one user flow.
As a user I expect connect to actually initiate a connect action and the
return value to indicate whether the connect succeeded; the way you modeled
it, the API will return true if you succeeded in 'registering' the
connection.
You modeled the API to be asynchronous with no handler (task id) to
monitor the results of the action, which requires polling
The API is not asynchronous; it is perfectly synchronous. When
manageStorageConnection() returns, the connection is managed.
You will have maximum connection uptime. You will have to poll and check
for the liveness of the connection before using it, as problems may occur
that prevent VDSM from supplying the connection at that moment.
in the create storage domain flow, which I really don't like.
In addition you introduced a verb for monitoring the status of the
connections alone; I would like to be able to monitor it as part of the
general host status and not have to poll a new verb in addition to the
current one.
As part of solving the connection management flows in OE I am missing:
- A way to clear all managed connections.
Use case: We move a host from one data center to another and we want the
host to clear all the managed connections.
We can ask for the list of managed connections and clear them, but having
clearAll is much easier.
Nope, you should get all active connections, cherry-pick the ones you own
using some ID scheme (RHEVM_FLOWID_CON?), and only clear your own
connections. There might be other clients using VDSM that you would
forcibly disconnect.
- Handling a list of IDs in each API verb.
Only getDeviceList will have a list of IDs handed to it. It makes no sense
in other verbs.
- A verb which handles create storage domain and encapsulates the connect,
create and disconnect.
This is a hackish ad-hoc solution. Why not have one
for the entire pool? Why not have one for a VM?
Thanks, Livnat
I will try and sum the points up here:
manageConnection is not connectStorageServer. They are different. The latter means connect
to the storage server, the former means manage it. They are both synchronous.
Non-persistence makes no sense. Auto-unmanage does. If anyone suggests a
valid mechanism to auto-clean CIDs that is correct and accommodates
interactive and non-interactive flows, I will be willing to accept it.
Timeouts are never correct, as no flow is really time capped, and they will
create more issues than they solve.
Polling the CID to track connection availability is the correct way to go,
as what you really want to do is not connect to the storage but rather have
it available. Polling is just waiting until the connection is available or
a condition has been triggered. This gives the flow manager freedom over
what the condition is (see above).
Cleaning up connections, like closing FDs, freeing memory and other
resource management, is a pain. I understand, and having a transaction-like
mechanism to lock resources to a flow would be great, but this is outside
the scope of this change.
VDSM, being a tiny cog in the cluster, can never have enough information to know when a flow
started or finished. This is why I leave it to the management to manage these resources. I
just prevent collisions (with the CIDs) and handle resource availability.
How to implement stuff:
I suggest this CID scheme:
For connections that persist across engine restarts.
OENGINE_<resource type>_<resource id>_CON<connection id>
EX:
OENGINE_DOMAIN_2131-321dsa-dsadsa-232_CON1
For connections that are managed for flows and might not persist across engine restarts.
OENGINE_<engine instance id>_FLOW_<flow id>_CON<connection id>
EX:
OENGINE_4324-23423dfd-fsdfsd-21312_FLOW_1023_CON1
Note:
the instance id is a UUID generated on each engine run to simply
differentiate between running instances.
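The template below assumes two small helpers for this scheme, parseCID and
isPersistentConnection. A minimal sketch of what they could look like
(illustrative, and assuming none of the fields contain underscores):
---------------------------------------------------------
OENGINE_PREFIX = "OENGINE"

def isPersistentConnection(cid):
    # Flow CIDs look like OENGINE_<instance id>_FLOW_<flow id>_CON<n>;
    # everything else under the OENGINE prefix is treated as persistent.
    parts = cid.split("_")
    return not (len(parts) == 5 and parts[0] == OENGINE_PREFIX and parts[2] == "FLOW")

def parseCID(cid):
    # Split a flow CID into (instance id, flow id, connection id).
    parts = cid.split("_")
    if len(parts) != 5 or parts[0] != OENGINE_PREFIX or parts[2] != "FLOW":
        raise ValueError("not an engine flow CID: %s" % cid)
    instanceId, flowId = parts[1], parts[3]
    conId = parts[4][len("CON"):]
    return instanceId, flowId, conId
---------------------------------------------------------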
How to poll for connections: (in pythonic pseudo code)
---------------------------------------------------------
import time

def pollConnections(host, cidList, stopCondition, interval):
    # host: a handle exposing the VDSM verbs; cidList: the CIDs to wait for;
    # stopCondition: a callable returning True when the caller wants to stop.
    clist = list(cidList)
    while not stopCondition() and len(clist) > 0:
        statuses = host.getStorageConnectionsStatuses()
        for cid in statuses:
            if not cid.startswith("OENGINE"):
                # This is not an engine connection, ignore it
                continue
            # Check the scheme and see if it has an instance ID after the prefix or not
            if isPersistentConnection(cid):
                continue
            instanceId, flowId, conId = parseCID(cid)
            # Clean up connections left over from past engine instances
            if instanceId != global_instance_id:
                # Ignore errors here as some other thread may be clearing this
                # ID as well; in any case VDSM takes care of thread safety.
                host.unmanageStorageConnection(cid)
            if cid in clist and statuses[cid]['connected']:
                clist.remove(cid)
        time.sleep(interval)
-------------------------------------------------
It's easy to see how you can modify this template to support multiple modes
of tracking:
* Pass a flow id instead of a CID list to track a flow.
* Exit when at least X connections succeeded (a sketch of this variant
follows below).
* Call getDeviceList after every successful connect and check if the LUN
you are looking for is available; if it is, continue and let the other
connections complete at their own pace for multipathing.
* Connect to multiple hosts and return once one host has connected
successfully.
* You can also add an install id or a cluster id if you want to have
multiple engines managing the same VDSM and not have them step on each
other's toes.
And much, much more.
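For instance, a rough sketch of the "at least X connections" variant, using
the same hypothetical bindings as the template above:
---------------------------------------------------------
import time

def pollUntilEnough(host, cidList, minConnected, interval=2):
    # Variant of the template above: return as soon as at least minConnected
    # of the given CIDs report connected=True (e.g. enough paths for multipath).
    while True:
        statuses = host.getStorageConnectionsStatuses()
        up = [c for c in cidList
              if c in statuses and statuses[c]['connected']]
        if len(up) >= minConnected:
            return up
        time.sleep(interval)
---------------------------------------------------------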
Implementing this will give you everything you want with maximum correctness and
flexibility. This will also make the transition to event driven communication with VDSM
simpler.