[Engine-devel] vdsm checkConnectivity pitfalls

Roy Golan rgolan at redhat.com
Mon Dec 26 09:16:16 UTC 2011


On Sun 25 Dec 2011 02:41:08 PM IST, Dan Kenigsberg wrote:
> On Thu, Dec 22, 2011 at 01:05:22PM +0200, Roy Golan wrote:
>>
>> Hi all,
>>
>> The VDSM network provisioning API exposes a validity check to confirm
>> that the newly applied changes work. The actual check compares the
>> modification time of /var/run/vdsm/client.log to the time when the
>> check began, and repeats that comparison after sleeping 1 second,
>> for about X seconds (where X is connectivityTimeOut).
>>
>> def clientSeen(timeout):
>>      start = time.time()
>>      while timeout >= 0:
>>          if os.stat(constants.P_VDSM_CLIENT_LOG).st_mtime > start:
>>              return True
>>          time.sleep(1)
>>          timeout -= 1
>>      return False
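
For reference, here is a standalone sketch of the same polling technique that runs outside of vdsm. The name `client_seen`, the `path` and `poll_interval` parameters, the monotonic deadline, and the handling of a missing file are all assumptions made for illustration, not vdsm code. It makes it easy to see why issue 2 below holds: anything that updates the file's mtime satisfies the check.

```python
import os
import time


def client_seen(path, timeout, poll_interval=1.0):
    """Return True if `path` is modified after this call starts.

    Hypothetical standalone variant of the quoted check: it polls the
    file's mtime until `timeout` seconds have elapsed.
    """
    start = time.time()                    # wall clock, comparable to st_mtime
    deadline = time.monotonic() + timeout  # monotonic clock for the wait itself
    while True:
        try:
            if os.stat(path).st_mtime > start:
                return True
        except OSError:
            # Assumption: a missing or unreadable file counts as
            # "not seen yet" instead of aborting the whole check.
            pass
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(poll_interval, remaining))
```

Note that a plain `touch` of the file by any local process during the wait makes this return True, regardless of who the caller was.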
>>
>> Main issues I spot are:
>> 1. In case the host is in maintenance, the caller of the API must
>> generate traffic concurrently with the running API call, and then
>> must join and sync threads to know when everything is done. See
>> http://gerrit.ovirt.org/#change,584
>> 2. Calling vdsClient locally also modifies client.log - we can't
>> rely on no one doing that during the call.
>
> TODO: when verifying connectivity, vdsm should ack only if the caller of
> setupNetworks is logged; communication from other IPs should be ignored
> for that purpose.
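
A minimal sketch of that rule, assuming vdsm kept a record of (timestamp, source IP) pairs for incoming requests. No such record is described in this thread; `requests_seen`, `caller_ip` and `caller_seen` are hypothetical names used only to illustrate the filtering.

```python
def caller_seen(requests_seen, caller_ip, start):
    """Return True if `caller_ip` made a request after `start`.

    `requests_seen` is an iterable of (timestamp, source_ip) tuples
    (a hypothetical structure). Requests from any other address are
    ignored, so a local vdsClient call would not count as connectivity.
    """
    return any(ts > start and ip == caller_ip for ts, ip in requests_seen)
```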
>
> Until then, customers should remember that vdsClient is unsupported, and
> can cause much worse things than fooling Vdsm into believing it has
> ovirt-engine connectivity.
>
>> 3. Failure writing to the client log will fail network provisioning!
>
> If vdsm cannot touch a zero-length file, it has more serious problems.
>
> 4. We verify connectivity only for the management network. It would be
> much nicer if we had an independent verification for other networks.
> The Avahi protocol could provide this, plus auto-discovery.

It's actually a different check per network. For the management network
we need to know that the engine can reach VDSM. On the storage network,
though, VDSM is the client of the storage server. The display network
can't determine who its clients are - can Avahi solve that?
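
For the storage case, where VDSM is the client, a check could be as simple as attempting a TCP connection to the storage server. This is only a sketch under that assumption; `can_connect` and its `host`, `port` and `timeout` parameters are hypothetical names, not an existing VDSM API.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds in time."""
    try:
        # create_connection resolves the host and applies the timeout
        # to the connect attempt; the socket is closed on exit.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and resolution failures.
        return False
```

This only proves that the port is reachable, not that the storage service behind it is healthy.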




>
>>
>> All of the above makes it not very reliable as a check, and harder
>> for a client to call without introducing races.
>>
>> Possible alternative solutions:
>> 1. We can try to reach the API caller's socket in return, maybe use
>> HTTP code 100?
>> 2. Pass a URL in the API call, which VDSM will then call. It could
>> be a health-check servlet or something similar.
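
A rough sketch of option 2, assuming the URL is plain HTTP(S) and that any 2xx answer counts as success. `callback_ok` is a hypothetical helper, and bypassing configured proxies is an assumption made so that the check exercises the direct network path.

```python
import http.client
import urllib.error
import urllib.request


def callback_ok(url, timeout=5.0):
    """Return True if a GET on `url` answers with a 2xx status in time."""
    # Assumption: ignore proxy settings so the request goes out directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException,
            OSError, ValueError):
        # HTTP error statuses, unreachable hosts, timeouts, broken
        # responses and malformed URLs all count as "no connectivity".
        return False
```

With this approach the caller no longer has to generate traffic in a separate thread; it only needs to serve the URL while the API call runs.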
>>
>> Any thoughts?
>>
>>
>>




