[Engine-devel] [vdsm] Proposal VDSM <=> Engine Data Statistics Retrieval Optimization

Sun Mar 17 15:30:38 UTC 2013

On Sun, Mar 17, 2013 at 10:28:15AM -0400, Ayal Baron wrote:
> 
> 
> ----- Original Message -----
> > On 17/03/13 15:13, Ayal Baron wrote:
> > >
> > > ----- Original Message -----
> > >> On 03/13/2013 11:55 PM, Ayal Baron wrote:
> > >> ...
> > >>>>>> The only reason we have this problem is because there is this
> > >>>>>> thing against making multiple calls.
> > >>>>>>
> > >>>>>> Just split it up.
> > >>>>>> getVmRuntimeStats() - transient things like mem and cpu%
> > >>>>>> getVmInformation() - (semi)static things like disk/networking
> > >>>>>> layout
> > >>>>>> etc.
> > >>>>>> Each updated at different intervals.
> > >>>>> +1 on splitting the data up into 2 separate API calls.
> > >>>>> You could potentially add a checksum (md5, or any other way) of
> > >>>>> the
> > >>>>> "static" data to getVmRuntimeStats and not bother even with
> > >>>>> polling
> > >>>>> the VmInformation if this hasn't changed.  Then you could poll
> > >>>>> as
> > >>>>> often as you'd like the stats and immediately see if you also
> > >>>>> need
> > >>>>> to retrieve VmInfo or not (you rarely would).
> > >>>> +1 To Ayal's suggestion
> > >>>> except that instead of the engine hashing the data VDSM sends
> > >>>> the
> > >>>> key which is opaque to the engine.
> > >>>> This can be a local timestamp or a generation number.
> > >>> Of course vdsm does the hash, otherwise you'd need to pass all
> > >>> the
> > >>> data to engine which would defeat the purpose.
> > >> I thought you meant engine will be sending the hash of previous
> > >> requests
> > >> per VM to vdsm, then vdsm will reply back with vm's removed, vm's
> > >> added,
> > >> and the details for vm's that changed (i.e., engine would be doing
> > >> something like if-modified-since-checksum per vm).
> > >> benefit is reducing a round trip.
> > >> but first would need to split to calls of stats (always changing)
> > >> and
> > >> slowly/never changing data.
> > > If vdsm accepts the hash then in your method engine would have to
> > > periodically call getVmInfo(hash).
> > > What I was suggesting is that getVmStats would return vmInfo hash
> > > so that we could avoid calling getVmInfo altogether.
> > > The stats *always* change so there is no need for checking if that
> > > info has changed.
> > > What we could do is avoid the split into 2 verbs by calling
> > > getVmStats(hash) and then have getVmStats return everything if the
> > > hash has changed or only the stats if it hasn't.  This would be
> > > the least number of roundtrips and avoid the split.  If you don't
> > > pass a hash it would return everything so this way it's also fully
> > > backward compatible.
> > 
> > For the 'static' data, why is there a need for a hash?
> > If VDSM sends in each update a timestamp, can't RHEVM just use
> > if-modified-since with the last timestamp it got from VDSM?
> > Is it cheaper for VDSM to calculate the hash, than update the
> > timestamp
> > per change in any of the fields? It doesn't really need to update the
> > timestamp per change, only for the first change since last update
> > sent
> > actually (so 'dirty' flag in a way, to signify data that RHEVM hasn't
> > seen yet).
> > Y.
> 
> As Saggi mentioned: "VDSM sends the key which is opaque to the engine. This can be a local timestamp or a generation number."
> 
> The content doesn't matter, what matters is that it has changed.
> A timestamp assumes that vdsm will track changes and send only the
> delta. Although possible, this would be overkill (for every value in
> the dict you'd have to hold a timestamp of its last change and send
> only those which have changed since the timestamp passed by the user).

If we're in the spirit of quoting Saggi, this suggestion is not
compatible with "...mak[ing] the return value differ according to input
... is a big no no when talking about type safe APIs.".

Dan.
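
The getVmStats(hash) variant discussed in this thread can be sketched
roughly as follows. This is only an illustration, not VDSM code: the
class VmStatsSource, the method names, and the reply fields "stats",
"info" and "infoKey" are all hypothetical. It assumes the opaque key is
a generation number that is bumped only on the first change since the
key was last sent (the 'dirty' flag idea), and that a single engine
polls a single VDSM process that has not restarted in between.

```python
from typing import Any, Dict, Optional


class VmStatsSource:
    """Hypothetical VDSM-side holder for one VM's data (illustrative only)."""

    def __init__(self, info: Dict[str, Any]) -> None:
        self._info = dict(info)  # (semi)static data, e.g. disk/network layout
        self._generation = 0     # basis of the opaque key handed to the engine
        self._dirty = False      # True if info changed since the key was last sent

    def update_info(self, key: str, value: Any) -> None:
        """Record a change to the (semi)static data."""
        if self._info.get(key) == value:
            return
        self._info[key] = value
        # Bump the generation only for the first change since the last reply.
        if not self._dirty:
            self._generation += 1
            self._dirty = True

    def get_vm_stats(self, stats: Dict[str, Any],
                     known_key: Optional[str] = None) -> Dict[str, Any]:
        """Always return the stats; add the info only if the caller's key is stale.

        Calling without a key returns everything, which mirrors the
        backward-compatible behaviour described in the thread.
        """
        reply: Dict[str, Any] = {
            "stats": dict(stats),
            "infoKey": str(self._generation),  # opaque to the caller
        }
        self._dirty = False
        if known_key != reply["infoKey"]:
            reply["info"] = dict(self._info)
        return reply


# Engine-side usage: remember the last key and pass it on the next poll.
source = VmStatsSource({"disks": ["vda"]})
first = source.get_vm_stats({"cpuUser": 5})                     # no key: full reply
second = source.get_vm_stats({"cpuUser": 7}, first["infoKey"])  # stats only
source.update_info("disks", ["vda", "vdb"])
third = source.get_vm_stats({"cpuUser": 9}, second["infoKey"])  # info included again
```

Note that the shape of the reply depends on the argument passed in,
which is exactly the property objected to above.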