[Engine-devel] [vdsm] Proposal VDSM <=> Engine Data Statistics Retrieval Optimization

Yair Zaslavsky yzaslavs at redhat.com
Sun Mar 17 14:11:14 UTC 2013



----- Original Message -----
> From: "Ayal Baron" <abaron at redhat.com>
> To: "Itamar Heim" <iheim at redhat.com>
> Cc: engine-devel at ovirt.org, vdsm-devel at lists.fedorahosted.org
> Sent: Sunday, March 17, 2013 3:13:09 PM
> Subject: Re: [Engine-devel] [vdsm] Proposal VDSM <=> Engine Data Statistics Retrieval Optimization
> 
> 
> 
> ----- Original Message -----
> > On 03/13/2013 11:55 PM, Ayal Baron wrote:
> > ...
> > >>>> The only reason we have this problem is because there is this
> > >>>> thing against making multiple calls.
> > >>>>
> > >>>> Just split it up.
> > >>>> getVmRuntimeStats() - transient things like mem and cpu%
> > >>>> getVmInformation() - (semi)static things like disk\networking
> > >>>> layout
> > >>>> etc.
> > >>>> Each updated at different intervals.
> > >>>
> > >>> +1 on splitting the data up into 2 separate API calls.
> > >>> You could potentially add a checksum (md5, or any other way) of
> > >>> the
> > >>> "static" data to getVmRuntimeStats and not bother even with
> > >>> polling
> > >>> the VmInformation if this hasn't changed.  Then you could poll
> > >>> as
> > >>> often as you'd like the stats and immediately see if you also
> > >>> need
> > >>> to retrieve VmInfo or not (you rarely would).
> > >> +1 To Ayal's suggestion
> > >> except that instead of the engine hashing the data VDSM sends
> > >> the
> > >> key which is opaque to the engine.
> > >> This can be a local timestamp or a generation number.
> > >
> > > Of course vdsm does the hash, otherwise you'd need to pass all
> > > the
> > > data to engine which would beat the purpose.
> > 
> > I thought you meant engine will be sending the hash of previous
> > requests
> > per VM to vdsm, then vdsm will reply back with vm's removed, vm's
> > added,
> > and the details for vm's that changed (i.e., engine would be doing
> > something like if-modified-since-checksum per vm).
> > benefit is reducing a round trip.
> > but first would need to split to calls of stats (always changing)
> > and
> > slowly/never changing data.
> 
> If vdsm accepts the hash then in your method engine would have to
> periodically call getVmInfo(hash).
> What I was suggesting is that getVmStats would return vmInfo hash so
> that we could avoid calling getVmInfo altogether.
> The stats *always* change so there is no need for checking if that
> info has changed.
> What we could do is avoid the split into 2 verbs by calling
> getVmStats(hash) and then have getVmStats return everything if the
> hash has changed or only the stats if it hasn't.  This would be the
> least number of roundtrips and avoid the split.  If you don't pass a
> hash it would return everything so this way it's also fully backward
> compatible.

Actually, I assume we can pass a hash of 0 to have vdsm return "everything". The chance that the md5 of "real data"
(i.e. data that is already known to engine) is 0 is very slim.
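
A minimal sketch of the getVmStats(hash) idea discussed above, in Python. All names here (get_vm_stats, info_hash, EngineCache, NO_HASH and the dict layouts) are hypothetical and are not the actual vdsm API; the sketch only assumes that vdsm hashes the rarely changing data and that the engine echoes back the last hash it saw. In this sketch the "send everything" sentinel is the one-character string "0", which cannot equal a 32-character md5 hex digest.

```python
import hashlib
import json

# Hypothetical sentinel meaning "I know nothing, send everything".
NO_HASH = "0"


def info_hash(vm_info):
    """Return an md5 hex digest of the rarely changing VM data."""
    # sort_keys gives a stable serialization, so equal dicts hash equally.
    payload = json.dumps(vm_info, sort_keys=True).encode("utf-8")
    return hashlib.md5(payload).hexdigest()


def get_vm_stats(vm_info, vm_stats, known_hash=NO_HASH):
    """Sketch of a vdsm-side getVmStats(hash) handler (not the real verb).

    Always returns the stats and the current info hash; the rarely
    changing info is included only when the caller's hash is stale.
    """
    current = info_hash(vm_info)
    reply = {"hash": current, "stats": vm_stats}
    if known_hash != current:
        reply["info"] = vm_info
    return reply


class EngineCache:
    """Sketch of the engine side: remembers the last info and hash of one VM."""

    def __init__(self):
        self.known_hash = NO_HASH
        self.info = None

    def poll(self, vm_info, vm_stats):
        # vm_info and vm_stats stand in for the state held by vdsm.
        reply = get_vm_stats(vm_info, vm_stats, self.known_hash)
        if "info" in reply:
            self.info = reply["info"]
        self.known_hash = reply["hash"]
        return reply
```

The first poll carries the sentinel and therefore gets everything back; later polls get only the stats until the rarely changing data (and with it the hash) changes. A caller that omits the hash also gets everything, which matches the backward-compatibility point made above.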

> 
> > 
> > >
> > >>
> > >> But, we might want to consider that when we add events polling
> > >> becomes (much) less frequent so maybe it'll be an overkill.
> > >
> > > You'd still need to compare versions of the data in vdsm and send
> > > only if it changed.  If you don't persist what was received last
> > > then potentially you could have a Monday-morning effect where,
> > > upon system startup, you'd be sending everything.  So I still think
> > > you'd want to have the hash.
> > >
> > >
> > >>
> > >>>
> > >>>>
> > >>>> ----- Original Message -----
> > >>>>> From: "Vinzenz Feenstra" <vfeenstr at redhat.com>
> > >>>>> To: vdsm-devel at lists.fedorahosted.org, engine-devel at ovirt.org
> > >>>>> Sent: Thursday, March 7, 2013 6:25:54 AM
> > >>>>> Subject: [Engine-devel] Proposal VDSM <=> Engine Data
> > >>>>> Statistics Retrieval Optimization
> > >>>>>
> > >>>>>
> > >>>>> Please find the prettier version on the wiki:
> > >>>>> http://www.ovirt.org/Proposal_VDSM_-_Engine_Data_Statistics_Retrieval
> > >>>>>
> > >>>>> Proposal VDSM - Engine Data Statistics Retrieval
> > >>>>> VDSM <=> Engine data retrieval optimization
> > >>>>> Motivation:
> > >>>>>
> > >>>>>
> > >>>>> Currently the RHEVM engine polls a lot of data from VDSM
> > >>>>> every 15 seconds. This should be optimized, and the data
> > >>>>> requested should be more specific.
> > >>>>>
> > >>>>> For each VM the data currently contains much more information
> > >>>>> than is actually needed, which inflates the size of the XML
> > >>>>> content considerably. We could optimize this by splitting the
> > >>>>> getVmStats reply into sections, selected by the engine's
> > >>>>> request. For this reason Omer Frenkel and I have split up the
> > >>>>> data into parts based on their usage.
> > >>>>>
> > >>>>> This data can and usually does change during the lifetime of
> > >>>>> the
> > >>>>> VM.
> > >>>>>
> > >>>>> Rarely Changed:
> > >>>>>
> > >>>>> This data changes infrequently, and it should be enough to
> > >>>>> update it only once in a while. Most commonly this data
> > >>>>> changes after changes made in the UI or after a migration of
> > >>>>> the VM to another host.
> > >>>>>
> > >>>>>   Status = Running
> > >>>>>   acpiEnable = true
> > >>>>>   vmType = kvm
> > >>>>>   guestName = W864GUESTAGENTT
> > >>>>>   displayType = qxl
> > >>>>>   guestOs = Win 8
> > >>>>>   kvmEnable = true # this should be constant and never changed
> > >>>>>   pauseCode = NOERR
> > >>>>>   monitorResponse = 0
> > >>>>>   session = Locked # unused
> > >>>>>   netIfaces = [{'name': 'Realtek RTL8139C+ Fast Ethernet NIC',
> > >>>>>     'inet6': ['fe80::490c:92bb:bbcc:9f87'],
> > >>>>>     'inet': ['10.34.60.148'], 'hw': '00:1a:4a:22:3c:db'}]
> > >>>>>   appsList = ['RHEV-Tools 3.2.4', 'RHEV-Agent64 3.2.3',
> > >>>>>     'RHEV-Serial64 3.2.3', 'RHEV-Network64 3.2.2',
> > >>>>>     'RHEV-Network64 3.2.3', 'RHEV-Block64 3.2.3',
> > >>>>>     'RHEV-Balloon64 3.2.3', 'RHEV-Balloon64 3.2.2',
> > >>>>>     'RHEV-Agent64 3.2.2', 'RHEV-USB 3.2.3',
> > >>>>>     'RHEV-Block64 3.2.2', 'RHEV-Serial64 3.2.2']
> > >>>>>   pid = 11314
> > >>>>>   guestIPs = 10.34.60.148 # duplicated info
> > >>>>>   displayIp = 0
> > >>>>>   displayPort = 5902
> > >>>>>   displaySecurePort = 5903
> > >>>>>   username = user at W864GUESTAGENTT
> > >>>>>   clientIp =
> > >>>>>   lastLogin = 1361976900.67
> > >>>>>
> > >>>>> Often Changed:
> > >>>>>
> > >>>>> This data changes quite often; however, it is not necessary to
> > >>>>> update it every 15 seconds. Because it is cumulative and
> > >>>>> reflects the current status, it does not need to be
> > >>>>> snapshotted every 15 seconds to retrieve statistics. It can be
> > >>>>> retrieved in much more generous time slices (e.g. every 5
> > >>>>> minutes).
> > >>>>>
> > >>>>>   network = {'vnet1': {'macAddr': '00:1a:4a:22:3c:db',
> > >>>>>     'rxDropped': '0', 'txDropped': '0', 'rxErrors': '0',
> > >>>>>     'txRate': '0.0', 'rxRate': '0.0', 'txErrors': '0',
> > >>>>>     'state': 'unknown', 'speed': '100', 'name': 'vnet1'}}
> > >>>>>   disksUsage = [{'path': 'c:\\', 'total': '64055406592',
> > >>>>>     'fs': 'NTFS', 'used': '19223846912'},
> > >>>>>     {'path': 'd:\\', 'total': '3490912256', 'fs': 'UDF',
> > >>>>>     'used': '3490912256'}]
> > >>>>>   timeOffset = 14422
> > >>>>>   elapsedTime = 68591
> > >>>>>   hash = 2335461227228498964
> > >>>>>   statsAge = 0.09 # unused
> > >>>>>
> > >>>>> Often Changed but unused
> > >>>>>
> > >>>>> This data does not seem to be used in the engine at all. It is
> > >>>>> not even used in the data warehouse.
> > >>>>>
> > >>>>>   memoryStats = {'swap_out': '0', 'majflt': '0',
> > >>>>>     'mem_free': '1466884', 'swap_in': '0', 'pageflt': '0',
> > >>>>>     'mem_total': '2096736', 'mem_unused': '1466884'}
> > >>>>>   balloonInfo = {'balloon_max': 2097152,
> > >>>>>     'balloon_cur': 2097152}
> > >>>>>   disks = {'vda': {'readLatency': '0',
> > >>>>>     'apparentsize': '64424509440', 'writeLatency': '1754496',
> > >>>>>     'imageID': '28abb923-7b89-4638-84f8-1700f0b76482',
> > >>>>>     'flushLatency': '156549', 'readRate': '0.00',
> > >>>>>     'truesize': '18855059456', 'writeRate': '952.05'},
> > >>>>>     'hdc': {'readLatency': '0', 'apparentsize': '0',
> > >>>>>     'writeLatency': '0', 'flushLatency': '0',
> > >>>>>     'readRate': '0.00', 'truesize': '0', 'writeRate': '0.00'}}
> > >>>>>
> > >>>>> Very frequent updates needed by webadmin portal:
> > >>>>>
> > >>>>> This data is mostly needed for the webadmin portal and might
> > >>>>> be required to be updated quite often. An exception here is
> > >>>>> the statsAge field, which seems to be unused by the Engine.
> > >>>>> This data could be requested every 15 seconds to keep things
> > >>>>> as they are now.
> > >>>>>
> > >>>>>   cpuSys = 2.32
> > >>>>>   cpuUser = 1.34
> > >>>>>   memUsage = 30
> > >>>>>
> > >>>>> Proposed Solution for VDSM & Engine:
> > >>>>>
> > >>>>>
> > >>>>> We will introduce new optional parameters to getVmStats,
> > >>>>> getAllVmStats and list to allow a finer grained specification
> > >>>>> of
> > >>>>> data which should be included.
> > >>>>>
> > >>>>> Parameter: statsType = <string> (getVmStats, getAllVmStats
> > >>>>> only)
> > >>>>> Allowed values:
> > >>>>>
> > >>>>>      * full (default to keep backwards compatibility)
> > >>>>>      * app-list (Just send the application list)
> > >>>>>      * rare (include everything from rarely changed to very
> > >>>>>      frequent)
> > >>>>>      * often (include everything from often changed to very
> > >>>>>      frequent)
> > >>>>>      * frequent (only send the very frequently changed items)
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> Parameter: clientId = <string> The client id is chosen by the
> > >>>>> client; it should be unique and used consistently across
> > >>>>> requests.
> > >>>>>
> > >>>>> Parameter: diff = <boolean> In combination with the clientId,
> > >>>>> VDSM will send only the differences from the previous request
> > >>>>> made by the named clientId (if diff=true).
> > >>>>>
> > >>>>>
> > >>>>> Additional Change:
> > >>>>>
> > >>>>>
> > >>>>> Besides the introduction of the new parameters for list,
> > >>>>> getVmStats and getAllVmStats, it might make sense to include a
> > >>>>> hash of the appList in the rarely changed section of the
> > >>>>> response. This would make changes identifiable, so the
> > >>>>> complete appList would have to be sent only when the hash
> > >>>>> known to the client is outdated.
> > >>>>>
> > >>>>> Note: The appList (Application List) reported by the guest
> > >>>>> agent could be implemented as on-request only, as long as the
> > >>>>> installed guest agent supports this. As there seems to be a
> > >>>>> request to have the complete list of installed applications on
> > >>>>> all guests, this data could be quite extensive. On the other
> > >>>>> hand, it is only rarely displayed and therefore should be
> > >>>>> requested only on demand rather than all the time.
> > >>>>>
> > >>>>> Improvement of the Guest Agent:
> > >>>>>
> > >>>>>
> > >>>>> As part of the proposed solution it is necessary to improve
> > >>>>> the guest agent as well. For the full application list a
> > >>>>> caching system should be implemented that is fully reactive
> > >>>>> and does not, for example, poll the application list all the
> > >>>>> time. The guest can create a prepared data file containing all
> > >>>>> data in the JSON format (as used for the communication with
> > >>>>> VDSM via VIO), so that it just has to read that file from disk
> > >>>>> and send it directly to VDSM. However, it is quite possible
> > >>>>> that this list is too big and has to be chunked into pieces
> > >>>>> (multiple messages, which VDSM would then have to support as
> > >>>>> well). The solution for this is to have VDSM request the data,
> > >>>>> so that it retrieves the necessary data on request only.
> > >>>>>
> > >>>>> --
> > >>>>> Regards,
> > >>>>>
> > >>>>> Vinzenz Feenstra | Senior Software Engineer
> > >>>>> RedHat Engineering Virtualization R & D
> > >>>>> Phone: +420 532 294 625
> > >>>>> IRC: vfeenstr or evilissimo
> > >>>>>
> > >>>>> Better technology. Faster innovation. Powered by community
> > >>>>> collaboration.
> > >>>>> See how it works at redhat.com
> > >>>>> _______________________________________________
> > >>>>> Engine-devel mailing list
> > >>>>> Engine-devel at ovirt.org
> > >>>>> http://lists.ovirt.org/mailman/listinfo/engine-devel
> > >>>>>
> > >>>> _______________________________________________
> > >>>> vdsm-devel mailing list
> > >>>> vdsm-devel at lists.fedorahosted.org
> > >>>> https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
> > >>>>
> > >>>
> > >>
> > >
> > 
> > 
> 


