[Engine-devel] [vdsm] Proposal VDSM <=> Engine Data Statistics Retrieval Optimization

Ayal Baron abaron at redhat.com
Sun Mar 17 14:29:18 UTC 2013



----- Original Message -----
> 
> 
> ----- Original Message -----
> > From: "Ayal Baron" <abaron at redhat.com>
> > To: "Itamar Heim" <iheim at redhat.com>
> > Cc: engine-devel at ovirt.org, vdsm-devel at lists.fedorahosted.org
> > Sent: Sunday, March 17, 2013 3:13:09 PM
> > Subject: Re: [Engine-devel] [vdsm] Proposal VDSM <=> Engine Data
> > Statistics Retrieval	Optimization
> > 
> > 
> > 
> > ----- Original Message -----
> > > On 03/13/2013 11:55 PM, Ayal Baron wrote:
> > > ...
> > > >>>> The only reason we have this problem is because there is
> > > >>>> this
> > > >>>> thing against making multiple calls.
> > > >>>>
> > > >>>> Just split it up.
> > > >>>> getVmRuntimeStats() - transient things like mem and cpu%
> > > >>>> getVmInformation() - (semi)static things like
> > > >>>> disk\networking
> > > >>>> layout
> > > >>>> etc.
> > > >>>> Each updated at different intervals.
> > > >>>
> > > >>> +1 on splitting the data up into 2 separate API calls.
> > > >>> You could potentially add a checksum (md5, or any other way)
> > > >>> of
> > > >>> the
> > > >>> "static" data to getVmRuntimeStats and not bother even with
> > > >>> polling
> > > >>> the VmInformation if this hasn't changed.  Then you could
> > > >>> poll
> > > >>> as
> > > >>> often as you'd like the stats and immediately see if you also
> > > >>> need
> > > >>> to retrieve VmInfo or not (you rarely would).
> > > >> +1 To Ayal's suggestion
> > > >> except that instead of the engine hashing the data VDSM sends
> > > >> the
> > > >> key which is opaque to the engine.
> > > > >> This can be a local timestamp or a generation number.
> > > >
> > > > Of course vdsm does the hash, otherwise you'd need to pass all
> > > > the
> > > > data to engine which would beat the purpose.
> > > 
> > > I thought you meant engine will be sending the hash of previous
> > > requests
> > > per VM to vdsm, then vdsm will reply back with vm's removed, vm's
> > > added,
> > > and the details for vm's that changed (i.e., engine would be
> > > doing
> > > something like if-modified-since-checksum per vm).
> > > benefit is reducing a round trip.
> > > but first would need to split to calls of stats (always changing)
> > > and
> > > slowly/never changing data.
> > 
> > If vdsm accepts the hash then in your method engine would have to
> > periodically call getVmInfo(hash).
> > What I was suggesting is that getVmStats would return vmInfo hash
> > so
> > that we could avoid calling getVmInfo altogether.
> > The stats *always* change so there is no need for checking if that
> > info has changed.
> > What we could do is avoid the split into 2 verbs by calling
> > getVmStats(hash) and then have getVmStats return everything if the
> > hash has changed or only the stats if it hasn't.  This would be the
> > least number of roundtrips and avoid the split.  If you don't pass
> > a
> > hash it would return everything so this way it's also fully
> > backward
> > compatible.
> 
> Actually, I assume we can pass hash 0 (to have vdsm return
> "everything"). I assume that the chances for md5 on "real data" (i.e
> -
> real data that is known to engine) to be 0 are very slim.

We'd need to support hash=None anyway to keep backward compatibility, and that way we make no assumptions about the hash algorithm, so why bother with hash=0?
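
The hash=None behaviour discussed above can be sketched roughly as follows. This is only an illustration, not actual vdsm code: get_vm_stats, info_token, the known_hash parameter and the dict layout are all hypothetical names, and sha256 is an arbitrary choice, since engine would only ever compare the token for equality.

```python
import hashlib
import json


def info_token(vm_info):
    """Opaque token for the rarely changing data (any stable digest would do)."""
    payload = json.dumps(vm_info, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def get_vm_stats(vm, known_hash=None):
    """Hypothetical verb: always return stats, add the info only when needed."""
    current = info_token(vm["info"])
    reply = {"stats": dict(vm["stats"]), "hash": current}
    # known_hash=None (an old engine, or a first call) never equals a digest
    # string, so that case falls through to "return everything".
    if known_hash != current:
        reply["info"] = dict(vm["info"])
    return reply


# Made-up sample data, loosely modelled on the fields quoted in this thread.
vm = {
    "info": {"guestOs": "Win 8", "displayType": "qxl"},
    "stats": {"cpuSys": 2.32, "cpuUser": 1.34, "memUsage": 30},
}

first = get_vm_stats(vm)                   # no hash passed: full reply
second = get_vm_stats(vm, first["hash"])   # hash matches: stats only
vm["info"]["displayType"] = "vnc"          # simulate a rare change
third = get_vm_stats(vm, second["hash"])   # stale hash: full reply again
```

Because the caller treats the token as opaque, the digest could later be swapped for a timestamp or generation number without touching the calling side.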

> 
> > 
> > > 
> > > >
> > > >>
> > > >> But, we might want to consider that when we add events polling
> > > >> becomes (much) less frequent so maybe it'll be an overkill.
> > > >
> > > > You'd still need to compare versions of the data in vdsm and
> > > > send
> > > > only if it changed.  If you don't persist what was received
> > > > last
> > > > > then potentially you could have a Monday morning effect where
> > > > > upon
> > > > > system startup you'd be sending everything.  So I still
> > > > think
> > > > you'd want to have the hash.
> > > >
> > > >
> > > >>
> > > >>>
> > > >>>>
> > > >>>> ----- Original Message -----
> > > >>>>> From: "Vinzenz Feenstra" <vfeenstr at redhat.com>
> > > >>>>> To: vdsm-devel at lists.fedorahosted.org,
> > > >>>>> engine-devel at ovirt.org
> > > >>>>> Sent: Thursday, March 7, 2013 6:25:54 AM
> > > >>>>> Subject: [Engine-devel] Proposal VDSM <=> Engine Data
> > > >>>>> Statistics
> > > >>>>> Retrieval	Optimization
> > > >>>>>
> > > >>>>>
> > > >>>>> Please find the prettier version on the wiki:
> > > >>>>> http://www.ovirt.org/Proposal_VDSM_-_Engine_Data_Statistics_Retrieval
> > > >>>>>
> > > >>>>> Proposal VDSM - Engine Data Statistics Retrieval
> > > >>>>> VDSM <=> Engine data retrieval optimization
> > > >>>>> Motivation:
> > > >>>>>
> > > >>>>>
> > > > >>>>> Currently the RHEVM engine is polling a lot of data
> > > >>>>> from
> > > >>>>> VDSM
> > > >>>>> every 15 seconds. This should be optimized and the amount
> > > >>>>> of
> > > >>>>> data
> > > >>>>> requested should be more specific.
> > > >>>>>
> > > > >>>>> For each VM the data currently contains much more
> > > > >>>>> information than is actually needed, which considerably
> > > > >>>>> inflates the size of the XML content. We could optimize
> > > > >>>>> this by splitting the getVmStats reply into sections,
> > > > >>>>> based on what the engine requests. For this reason Omer
> > > > >>>>> Frenkel and I have split up the data into parts based on
> > > > >>>>> their usage.
> > > >>>>>
> > > >>>>> This data can and usually does change during the lifetime
> > > >>>>> of
> > > >>>>> the
> > > >>>>> VM.
> > > >>>>> Rarely Changed:
> > > >>>>>
> > > >>>>>
> > > > >>>>> This data does not change very frequently, and it should be
> > > > >>>>> enough to update it only once in a while. Most commonly this
> > > > >>>>> data changes after changes made in the UI or after a
> > > > >>>>> migration of the VM to another Host.
> > > > >>>>>
> > > > >>>>>   Status = Running
> > > > >>>>>   acpiEnable = true
> > > > >>>>>   vmType = kvm
> > > > >>>>>   guestName = W864GUESTAGENTT
> > > > >>>>>   displayType = qxl
> > > > >>>>>   guestOs = Win 8
> > > > >>>>>   kvmEnable = true # this should be constant and never changed
> > > > >>>>>   pauseCode = NOERR
> > > > >>>>>   monitorResponse = 0
> > > > >>>>>   session = Locked # unused
> > > > >>>>>   netIfaces = [{'name': 'Realtek RTL8139C+ Fast Ethernet NIC',
> > > > >>>>>     'inet6': ['fe80::490c:92bb:bbcc:9f87'],
> > > > >>>>>     'inet': ['10.34.60.148'], 'hw': '00:1a:4a:22:3c:db'}]
> > > > >>>>>   appsList = ['RHEV-Tools 3.2.4', 'RHEV-Agent64 3.2.3',
> > > > >>>>>     'RHEV-Serial64 3.2.3', 'RHEV-Network64 3.2.2',
> > > > >>>>>     'RHEV-Network64 3.2.3', 'RHEV-Block64 3.2.3',
> > > > >>>>>     'RHEV-Balloon64 3.2.3', 'RHEV-Balloon64 3.2.2',
> > > > >>>>>     'RHEV-Agent64 3.2.2', 'RHEV-USB 3.2.3',
> > > > >>>>>     'RHEV-Block64 3.2.2', 'RHEV-Serial64 3.2.2']
> > > > >>>>>   pid = 11314
> > > > >>>>>   guestIPs = 10.34.60.148 # duplicated info
> > > > >>>>>   displayIp = 0
> > > > >>>>>   displayPort = 5902
> > > > >>>>>   displaySecurePort = 5903
> > > > >>>>>   username = user at W864GUESTAGENTT
> > > > >>>>>   clientIp =
> > > > >>>>>   lastLogin = 1361976900.67
> > > > >>>>>
> > > > >>>>> Often Changed:
> > > >>>>>
> > > >>>>>
> > > > >>>>> This data changes quite often; however, it is not necessary
> > > > >>>>> to update it every 15 seconds. As this is cumulative data
> > > > >>>>> and reflects the current status, it does not need to be
> > > > >>>>> snapshotted every 15 seconds to retrieve statistics. The
> > > > >>>>> data can be retrieved in much more generous time slices
> > > > >>>>> (e.g. every 5 minutes).
> > > > >>>>>
> > > > >>>>>   network = {'vnet1': {'macAddr': '00:1a:4a:22:3c:db',
> > > > >>>>>     'rxDropped': '0', 'txDropped': '0', 'rxErrors': '0',
> > > > >>>>>     'txRate': '0.0', 'rxRate': '0.0', 'txErrors': '0',
> > > > >>>>>     'state': 'unknown', 'speed': '100', 'name': 'vnet1'}}
> > > > >>>>>   disksUsage = [{'path': 'c:\\', 'total': '64055406592',
> > > > >>>>>     'fs': 'NTFS', 'used': '19223846912'},
> > > > >>>>>     {'path': 'd:\\', 'total': '3490912256', 'fs': 'UDF',
> > > > >>>>>     'used': '3490912256'}]
> > > > >>>>>   timeOffset = 14422
> > > > >>>>>   elapsedTime = 68591
> > > > >>>>>   hash = 2335461227228498964
> > > > >>>>>   statsAge = 0.09 # unused
> > > > >>>>>
> > > > >>>>> Often Changed but unused:
> > > >>>>>
> > > >>>>>
> > > > >>>>> This data does not seem to be used in the engine at all. It
> > > > >>>>> is not even used in the data warehouse.
> > > > >>>>>
> > > > >>>>>   memoryStats = {'swap_out': '0', 'majflt': '0',
> > > > >>>>>     'mem_free': '1466884', 'swap_in': '0', 'pageflt': '0',
> > > > >>>>>     'mem_total': '2096736', 'mem_unused': '1466884'}
> > > > >>>>>   balloonInfo = {'balloon_max': 2097152,
> > > > >>>>>     'balloon_cur': 2097152}
> > > > >>>>>   disks = {'vda': {'readLatency': '0',
> > > > >>>>>     'apparentsize': '64424509440',
> > > > >>>>>     'writeLatency': '1754496',
> > > > >>>>>     'imageID': '28abb923-7b89-4638-84f8-1700f0b76482',
> > > > >>>>>     'flushLatency': '156549', 'readRate': '0.00',
> > > > >>>>>     'truesize': '18855059456', 'writeRate': '952.05'},
> > > > >>>>>     'hdc': {'readLatency': '0', 'apparentsize': '0',
> > > > >>>>>     'writeLatency': '0', 'flushLatency': '0',
> > > > >>>>>     'readRate': '0.00', 'truesize': '0',
> > > > >>>>>     'writeRate': '0.00'}}
> > > > >>>>>
> > > > >>>>> Very frequent updates needed by webadmin portal:
> > > >>>>>
> > > >>>>>
> > > > >>>>> This data is mostly needed for the webadmin portal and
> > > > >>>>> might be required to be updated quite often. An exception
> > > > >>>>> here is the statsAge field, which seems to be unused by the
> > > > >>>>> Engine. This data could be requested every 15 seconds to
> > > > >>>>> keep things as they are now.
> > > > >>>>>
> > > > >>>>>   cpuSys = 2.32
> > > > >>>>>   cpuUser = 1.34
> > > > >>>>>   memUsage = 30
> > > > >>>>>
> > > > >>>>> Proposed Solution for VDSM & Engine:
> > > >>>>>
> > > >>>>>
> > > >>>>> We will introduce new optional parameters to getVmStats,
> > > >>>>> getAllVmStats and list to allow a finer grained
> > > >>>>> specification
> > > >>>>> of
> > > >>>>> data which should be included.
> > > >>>>>
> > > > >>>>> Parameter: statsType = <string> (getVmStats, getAllVmStats
> > > > >>>>> only)
> > > > >>>>>
> > > > >>>>> Allowed values:
> > > > >>>>>
> > > > >>>>>      * full (default to keep backwards compatibility)
> > > > >>>>>      * app-list (Just send the application list)
> > > > >>>>>      * rare (include everything from rarely changed to very
> > > > >>>>>        frequent)
> > > > >>>>>      * often (include everything from often changed to very
> > > > >>>>>        frequent)
> > > > >>>>>      * frequent (only send the very frequently changed
> > > > >>>>>        items)
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > > >>>>> Parameter: clientId = <string>
> > > > >>>>> The client id is specified by the client; it should be
> > > > >>>>> unique, but used consistently across requests.
> > > > >>>>>
> > > > >>>>> Parameter: diff = <boolean>
> > > > >>>>> In combination with the clientId, VDSM will send only the
> > > > >>>>> differences from the previous request made by the named
> > > > >>>>> clientId (if diff=true).
> > > >>>>>
> > > >>>>>
> > > >>>>> Additional Change:
> > > >>>>>
> > > >>>>>
> > > > >>>>> Besides the introduction of the new parameters for list,
> > > > >>>>> getVmStats and getAllVmStats, it might make sense to include
> > > > >>>>> a hash of the appList in the rarely changed section of the
> > > > >>>>> response. This would make it possible to identify changes,
> > > > >>>>> so that the complete appList only has to be sent when the
> > > > >>>>> hash known to the client is outdated.
> > > >>>>>
> > > > >>>>> Note: The appList (Application List) reported by the guest
> > > > >>>>> agent could be fully implemented on request only, as long as
> > > > >>>>> the guest agent installed supports this. As there seems to
> > > > >>>>> be a request to have the complete list of installed
> > > > >>>>> applications on all guests, this data could be quite
> > > > >>>>> extensive. On the other hand, this data is only rarely
> > > > >>>>> visible, and therefore it should not be requested all the
> > > > >>>>> time but only on demand.
> > > > >>>>>
> > > > >>>>> Improvement of the Guest Agent:
> > > >>>>>
> > > >>>>>
> > > > >>>>> As part of the proposed solution it is necessary to improve
> > > > >>>>> the guest agent as well. For the full application list a
> > > > >>>>> caching system should be implemented which is fully reactive
> > > > >>>>> and does not, for example, poll the application list all the
> > > > >>>>> time. The guest can create a prepared data file containing
> > > > >>>>> all data in the JSON format (as used for the communication
> > > > >>>>> with VDSM via VIO) and then just has to read that file from
> > > > >>>>> disk and send it directly to VDSM. However, it is quite
> > > > >>>>> possible that this list is too big and has to be chunked
> > > > >>>>> into pieces (multiple messages, which would then have to be
> > > > >>>>> supported by VDSM as well). The solution for this is to make
> > > > >>>>> VDSM request this data, so that the necessary data is
> > > > >>>>> retrieved on request only.
> > > > >>>>>
> > > > >>>>> --
> > > >>>>> Regards,
> > > >>>>>
> > > >>>>> Vinzenz Feenstra | Senior Software Engineer
> > > >>>>> RedHat Engineering Virtualization R & D
> > > >>>>> Phone: +420 532 294 625
> > > >>>>> IRC: vfeenstr or evilissimo
> > > >>>>>
> > > >>>>> Better technology. Faster innovation. Powered by community
> > > >>>>> collaboration.
> > > >>>>> See how it works at redhat.com
> > > >>>>> _______________________________________________
> > > >>>>> Engine-devel mailing list
> > > >>>>> Engine-devel at ovirt.org
> > > >>>>> http://lists.ovirt.org/mailman/listinfo/engine-devel
> > > >>>>>
> > > >>>> _______________________________________________
> > > >>>> vdsm-devel mailing list
> > > >>>> vdsm-devel at lists.fedorahosted.org
> > > >>>> https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
> > > >>>>
> > > >>>
> > > >>
> > > >
> > > 
> > > 
> > 
> 


