[Engine-devel] [vdsm] Proposal VDSM <=> Engine Data Statistics Retrieval Optimization
Ayal Baron
abaron at redhat.com
Mon Apr 22 20:39:48 UTC 2013
----- Original Message -----
> I have left this for a while without continuing because I had to focus
> on other things. However, this is still in progress :-)
Are you writing patches? (If so, what solution are you pursuing?)
> On 03/13/2013 10:55 PM, Ayal Baron wrote:
> >
> > ----- Original Message -----
> >>
> >> ----- Original Message -----
> >>> From: "Ayal Baron" <abaron at redhat.com>
> >>> To: "Saggi Mizrahi" <smizrahi at redhat.com>
> >>> Cc: engine-devel at ovirt.org, vdsm-devel at lists.fedorahosted.org,
> >>> "Vinzenz Feenstra" <vfeenstr at redhat.com>
> >>> Sent: Wednesday, March 13, 2013 5:39:24 PM
> >>> Subject: Re: [vdsm] [Engine-devel] Proposal VDSM <=> Engine Data
> >>> Statistics Retrieval Optimization
> >>>
> >>>
> >>>
> >>> ----- Original Message -----
> >>>> I am completely against this.
> >>>> It makes the return value differ according to input, which
> >>>> is a big no-no when talking about type-safe APIs.
> >>>>
> >>>> The only reason we have this problem is because there is this
> >>>> thing against making multiple calls.
> Which is totally counterproductive, because multiple calls, if properly
> split up, will actually lead to less data being sent for frequently needed
> data. The other calls can then be triggered only when necessary.
> >>>>
> >>>> Just split it up.
> >>>> getVmRuntimeStats() - transient things like mem and cpu%
> >>>> getVmInformation() - (semi)static things like disk\networking
> >>>> layout
> >>>> etc.
> >>>> Each updated at different intervals.
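The split suggested above means the engine polls each call on its own timer. A minimal sketch of such a scheduler, in Python; the class `PollSchedule` is hypothetical, the call names come from the suggestion above, and the intervals (15 s and 5 min) are the figures used elsewhere in this thread, not values any released engine uses:

```python
from dataclasses import dataclass, field


@dataclass
class PollSchedule:
    """Hypothetical per-call polling intervals, in seconds."""
    intervals: dict
    last_run: dict = field(default_factory=dict)

    def due(self, now: float) -> list:
        """Return the calls whose interval has elapsed at time `now`, and mark them as run."""
        ready = []
        for name, interval in self.intervals.items():
            last = self.last_run.get(name)
            if last is None or now - last >= interval:
                ready.append(name)
                self.last_run[name] = now
        return ready


schedule = PollSchedule({"getVmRuntimeStats": 15, "getVmInformation": 300})
for t in (0, 15, 30, 300):
    print(t, schedule.due(t))
```

Each call keeps its own last-run time, so changing one interval never affects the other call's cadence.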
> >>> +1 on splitting the data up into 2 separate API calls.
> >>> You could potentially add a checksum (md5, or any other way) of the
> >>> "static" data to getVmRuntimeStats and not bother even with polling
> >>> the VmInformation if this hasn't changed. Then you could poll as
> >>> often as you'd like the stats and immediately see if you also need
> >>> to retrieve VmInfo or not (you rarely would).
> >> +1 To Ayal's suggestion
> >> except that instead of the engine hashing the data, VDSM sends the
> >> key, which is opaque to the engine.
> >> This can be a local timestamp or a generation number.
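A sketch of the opaque generation-number variant, assuming a hypothetical VDSM-side `VmInfoStore`. The counter is combined with a per-process id so that a VDSM restart cannot hand out a token a client has already seen:

```python
import threading
import uuid


class VmInfoStore:
    """Hypothetical VDSM-side holder of (semi)static data for one VM."""

    def __init__(self, info: dict):
        self._lock = threading.Lock()
        self._info = dict(info)
        self._generation = 0
        self._instance = uuid.uuid4().hex  # new value on every VDSM start

    def update(self, **changes) -> None:
        """Apply changes; bump the generation only if a value really changed."""
        with self._lock:
            if any(self._info.get(k) != v for k, v in changes.items()):
                self._info.update(changes)
                self._generation += 1

    def token(self) -> str:
        """Opaque key for the runtime stats; clients only compare it for equality."""
        with self._lock:
            return f"{self._instance}:{self._generation}"

    def snapshot(self):
        """Return (info, token) atomically, for a getVmInformation-style call."""
        with self._lock:
            return dict(self._info), f"{self._instance}:{self._generation}"
```

This only works if every change goes through `update()`; data that is re-aggregated from scratch on each call has no such hook, which is the objection raised later in the thread.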
> > Of course vdsm does the hashing; otherwise you'd need to pass all the data
> > to the engine, which would defeat the purpose.
> We need the hash if we can't have dynamic content. Generation numbers
> aren't really helpful, as every call currently aggregates the statistics
> data anew.
> >> But we might want to consider that once we add events, polling
> >> becomes (much) less frequent, so maybe it'll be overkill.
> > You'd still need to compare versions of the data in vdsm and send only if
> > it changed. If you don't persist what was received last, then potentially
> > you could have a Monday-morning effect where, upon system startup, you'd
> > be sending everything. So I still think you'd want to have the hash.
> We already hash the XML and include that hash in the getStats response.
> Hashes should be enough to detect changes.
>
> Now to the non-dynamic responses and the 'type-safe' API: if we went for
> non-dynamic responses, we would certainly need 5 new API calls to achieve
> some gain in the amount of data sent.
>
> getAllVmRuntimeStats() "Returns a map of vmId/data pairs for all VMs"
> # Constantly changing data needed by the oVirt Engine, or data that changes
> # so often that it does not make sense to place it anywhere else
> {
> VmId: {
> cpuSys, # cpuSys and cpuUser could potentially be summarized
> cpuUser,
> memUsage,
> elapsedTime,
> status,
> statsAge,
>
> hashes = {
> conf, # Hashed information of the XML (called "hash" in getStats())
> info, # Hashed information of semi-static items
> statusHash, # Hashed information of items which are likely to change, but not that often
> guestDetails, # Hashed value of the guest details (applicationList, network information)
> }
> }
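A sketch of how VDSM could build one getAllVmRuntimeStats entry from the full per-VM statistics. The field groups are copied from the calls proposed in this mail; the helper names, the SHA-256 digest and the pass-through of the existing getStats `hash` as `conf` are assumptions:

```python
import hashlib
import json

RUNTIME_FIELDS = ("cpuSys", "cpuUser", "memUsage", "elapsedTime", "status", "statsAge")
HASHED_GROUPS = {
    "info": ("acpiEnable", "vmType", "guestName", "guestOS", "kvmEnable",
             "pauseCode", "displayIp", "displayPort", "displaySecurePort", "pid"),
    "statusHash": ("timeOffset", "monitorResponse", "clientIp", "lastLogin",
                   "username", "session", "guestIPs"),
    "guestDetails": ("appsList", "netIfaces"),
}


def _digest(values: dict) -> str:
    # Canonical JSON so that equal data always produces the same digest.
    canonical = json.dumps(values, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def runtime_entry(full_stats: dict) -> dict:
    """Reduce one VM's full statistics to runtime fields plus group hashes."""
    entry = {k: full_stats[k] for k in RUNTIME_FIELDS if k in full_stats}
    hashes = {group: _digest({k: full_stats.get(k) for k in fields})
              for group, fields in HASHED_GROUPS.items()}
    if "hash" in full_stats:
        hashes["conf"] = full_stats["hash"]  # existing XML hash from getStats()
    entry["hashes"] = hashes
    return entry


def get_all_vm_runtime_stats(all_stats: dict) -> dict:
    """Map vmId -> runtime entry for every VM in all_stats."""
    return {vm_id: runtime_entry(stats) for vm_id, stats in all_stats.items()}
```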
>
> getVmStatuses([vmId1, vmId2, ...]) "Returns a vmId/data pair for
> each VM requested"
> # This data does not change that often and can be retrieved on demand
> # once the hash changes
> return {
> vmId: {
> timeOffset,
> monitorResponse,
> clientIp,
> lastLogin,
> username,
> session,
> guestIPs,
> }
> }
>
> getAllVmDeviceStatistics() "Returns a vmId/data pair for all VMs"
> # This data has to be requested all the time, but at longer
> # intervals (e.g. every 5 minutes)
> # and is usually needed for all the VMs anyway
> return {
> vmId: {
> network,
> disksUsage, # Might be improved by summarizing?
> disks,
> balloonInfo,
> memoryStats
> }
> }
>
> getVmInfo([vmId1, vmId2, ...]) "Returns a vmId/data pair for each VM
> requested"
> # Basically this should be almost constant, except when there have
> # been changes like migrations, pausing, errors, etc.
> return {
> vmId: {
> acpiEnable,
> vmType,
> guestName,
> guestOS,
> kvmEnable,
> pauseCode,
> displayIp,
> displayPort,
> displaySecurePort,
> pid,
> }
> }
>
> getVmGuestDetails([vmId1, vmId2, ...])
> # Data which changes seldom; these changes are reflected in the hash,
> # which tells when this data needs to be requested.
> # This data is only necessary when it has actually changed
> # or needs to be refreshed for whatever reason.
> return {
> vmId: {
> appsList,
> netIfaces,
> }
> }
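On the engine side, the hashes in getAllVmRuntimeStats decide which of the on-demand calls above to make and for which VMs. A sketch, assuming the hash keys used in the getAllVmRuntimeStats layout; `plan_followup_calls` is hypothetical, and getAllVmDeviceStatistics is left out because it is polled on a timer:

```python
# Which on-demand call refreshes the data behind each hash.
HASH_TO_CALL = {
    "info": "getVmInfo",
    "statusHash": "getVmStatuses",
    "guestDetails": "getVmGuestDetails",
}


def plan_followup_calls(runtime: dict, known: dict) -> dict:
    """Group VM ids by the follow-up call each one needs.

    runtime: vmId -> entry with a "hashes" dict (a getAllVmRuntimeStats reply)
    known:   vmId -> the hashes the engine last fetched data for
    Returns {callName: [vmId, ...]} containing only the calls that are needed.
    `known` is not modified, so the caller can record the new hashes only
    after the follow-up call has succeeded.
    """
    plan = {}
    for vm_id, entry in runtime.items():
        new = entry.get("hashes", {})
        old = known.get(vm_id, {})
        for key, call in HASH_TO_CALL.items():
            if key in new and new[key] != old.get(key):
                plan.setdefault(call, []).append(vm_id)
    return plan
```

Because the on-demand calls take a list of VM ids, all VMs with the same stale hash can be refreshed in one request.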
>
> >
> >
> >>>> ----- Original Message -----
> >>>>> From: "Vinzenz Feenstra" <vfeenstr at redhat.com>
> >>>>> To: vdsm-devel at lists.fedorahosted.org, engine-devel at ovirt.org
> >>>>> Sent: Thursday, March 7, 2013 6:25:54 AM
> >>>>> Subject: [Engine-devel] Proposal VDSM <=> Engine Data
> >>>>> Statistics
> >>>>> Retrieval Optimization
> >>>>>
> >>>>>
> >>>>> Please find the prettier version on the wiki:
> >>>>> http://www.ovirt.org/Proposal_VDSM_-_Engine_Data_Statistics_Retrieval
> >>>>>
> >>>>> Proposal VDSM - Engine Data Statistics Retrieval
> >>>>> VDSM <=> Engine data retrieval optimization
> >>>>> Motivation:
> >>>>>
> >>>>>
> >>>>> Currently the RHEVM engine is polling a lot of data from VDSM
> >>>>> every 15 seconds. This should be optimized, and the data
> >>>>> requested should be more specific.
> >>>>>
> >>>>> For each VM, the data currently contains much more information
> >>>>> than is actually needed, which inflates the size of the XML
> >>>>> content considerably. We could optimize this by splitting the
> >>>>> getVmStats reply into sections, based on what the engine
> >>>>> requests. For this reason, Omer Frenkel and I have split the
> >>>>> data into parts based on their usage.
> >>>>>
> >>>>> This data can, and usually does, change during the lifetime of
> >>>>> the VM.
> >>>>> Rarely Changed:
> >>>>>
> >>>>>
> >>>>> This data changes infrequently, and it should be enough to
> >>>>> update it only once in a while. Most commonly this data changes
> >>>>> after changes made in the UI or after a migration of the VM to
> >>>>> another host.
> >>>>>
> >>>>> Status = Running
> >>>>> acpiEnable = true
> >>>>> vmType = kvm
> >>>>> guestName = W864GUESTAGENTT
> >>>>> displayType = qxl
> >>>>> guestOs = Win 8
> >>>>> kvmEnable = true # this should be constant and never changed
> >>>>> pauseCode = NOERR
> >>>>> monitorResponse = 0
> >>>>> session = Locked # unused
> >>>>> netIfaces = [{'name': 'Realtek RTL8139C+ Fast Ethernet NIC',
> >>>>>   'inet6': ['fe80::490c:92bb:bbcc:9f87'], 'inet': ['10.34.60.148'],
> >>>>>   'hw': '00:1a:4a:22:3c:db'}]
> >>>>> appsList = ['RHEV-Tools 3.2.4', 'RHEV-Agent64 3.2.3',
> >>>>>   'RHEV-Serial64 3.2.3', 'RHEV-Network64 3.2.2', 'RHEV-Network64 3.2.3',
> >>>>>   'RHEV-Block64 3.2.3', 'RHEV-Balloon64 3.2.3', 'RHEV-Balloon64 3.2.2',
> >>>>>   'RHEV-Agent64 3.2.2', 'RHEV-USB 3.2.3', 'RHEV-Block64 3.2.2',
> >>>>>   'RHEV-Serial64 3.2.2']
> >>>>> pid = 11314
> >>>>> guestIPs = 10.34.60.148 # duplicated info
> >>>>> displayIp = 0
> >>>>> displayPort = 5902
> >>>>> displaySecurePort = 5903
> >>>>> username = user at W864GUESTAGENTT
> >>>>> clientIp =
> >>>>> lastLogin = 1361976900.67
> >>>>>
> >>>>> Often Changed:
> >>>>>
> >>>>>
> >>>>> This data changes quite often, but it is not necessary to update
> >>>>> it every 15 seconds. As this is cumulative data that reflects the
> >>>>> current status, it does not need to be snapshotted every 15
> >>>>> seconds to retrieve statistics. The data can be retrieved in much
> >>>>> more generous time slices (e.g. every 5 minutes).
> >>>>>
> >>>>> network = {'vnet1': {'macAddr': '00:1a:4a:22:3c:db', 'rxDropped': '0',
> >>>>>   'txDropped': '0', 'rxErrors': '0', 'txRate': '0.0', 'rxRate': '0.0',
> >>>>>   'txErrors': '0', 'state': 'unknown', 'speed': '100', 'name': 'vnet1'}}
> >>>>> disksUsage = [{'path': 'c:\\', 'total': '64055406592', 'fs': 'NTFS',
> >>>>>   'used': '19223846912'}, {'path': 'd:\\', 'total': '3490912256',
> >>>>>   'fs': 'UDF', 'used': '3490912256'}]
> >>>>> timeOffset = 14422
> >>>>> elapsedTime = 68591
> >>>>> hash = 2335461227228498964
> >>>>> statsAge = 0.09 # unused
> >>>>>
> >>>>> Often Changed but unused:
> >>>>>
> >>>>>
> >>>>> This data does not seem to be used in the engine at all. It is not
> >>>>> even used in the data warehouse.
> >>>>>
> >>>>> memoryStats = {'swap_out': '0', 'majflt': '0', 'mem_free': '1466884',
> >>>>>   'swap_in': '0', 'pageflt': '0', 'mem_total': '2096736',
> >>>>>   'mem_unused': '1466884'}
> >>>>> balloonInfo = {'balloon_max': 2097152, 'balloon_cur': 2097152}
> >>>>> disks = {'vda': {'readLatency': '0', 'apparentsize': '64424509440',
> >>>>>   'writeLatency': '1754496', 'imageID': '28abb923-7b89-4638-84f8-1700f0b76482',
> >>>>>   'flushLatency': '156549', 'readRate': '0.00', 'truesize': '18855059456',
> >>>>>   'writeRate': '952.05'},
> >>>>>   'hdc': {'readLatency': '0', 'apparentsize': '0', 'writeLatency': '0',
> >>>>>   'flushLatency': '0', 'readRate': '0.00', 'truesize': '0',
> >>>>>   'writeRate': '0.00'}}
> >>>>>
> >>>>> Very frequent updates needed by webadmin portal:
> >>>>>
> >>>>>
> >>>>> This data is mostly needed for the webadmin portal and might need
> >>>>> to be updated quite often. An exception here is the statsAge field,
> >>>>> which seems to be unused by the engine. This data could be
> >>>>> requested every 15 seconds to keep things as they are now.
> >>>>>
> >>>>> cpuSys = 2.32
> >>>>> cpuUser = 1.34
> >>>>> memUsage = 30
> >>>>>
> >>>>> Proposed Solution for VDSM & Engine:
> >>>>>
> >>>>>
> >>>>> We will introduce new optional parameters to getVmStats,
> >>>>> getAllVmStats and list to allow a finer-grained specification of
> >>>>> the data which should be included.
> >>>>>
> >>>>> Parameter: statsType = <string> (getVmStats, getAllVmStats only)
> >>>>> Allowed values:
> >>>>>
> >>>>> * full (default, to keep backwards compatibility)
> >>>>> * app-list (just send the application list)
> >>>>> * rare (include everything from rarely changed to very frequent)
> >>>>> * often (include everything from often changed to very frequent)
> >>>>> * frequent (only send the very frequently changed items)
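A sketch of how VDSM might apply statsType to one VM's stats, using the field categories from the sample above. The tier contents are assumptions where the proposal is ambiguous: "rare" is taken to cover the rarely changed, often changed and very frequent tiers; the "often changed but unused" fields appear only in "full"; and `vmId` is always kept:

```python
ALWAYS = frozenset({"vmId"})
FREQUENT = frozenset({"cpuSys", "cpuUser", "memUsage"})
OFTEN = frozenset({"network", "disksUsage", "timeOffset", "elapsedTime",
                   "hash", "statsAge"})
RARE = frozenset({"status", "acpiEnable", "vmType", "guestName", "displayType",
                  "guestOs", "kvmEnable", "pauseCode", "monitorResponse",
                  "session", "netIfaces", "appsList", "pid", "guestIPs",
                  "displayIp", "displayPort", "displaySecurePort", "username",
                  "clientIp", "lastLogin"})
TIERS = {
    "frequent": FREQUENT,
    "often": OFTEN | FREQUENT,
    "rare": RARE | OFTEN | FREQUENT,
    "app-list": frozenset({"appsList"}),
}


def filter_stats(stats: dict, stats_type: str = "full") -> dict:
    """Return the subset of one VM's stats selected by statsType."""
    if stats_type == "full":
        return dict(stats)  # default: unchanged, for backwards compatibility
    try:
        allowed = TIERS[stats_type] | ALWAYS
    except KeyError:
        raise ValueError(f"unknown statsType: {stats_type!r}") from None
    return {k: v for k, v in stats.items() if k in allowed}
```

Note that the reply type still varies with the input, which is exactly the type-safety concern raised at the top of this thread.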
> >>>>>
> >>>>>
> >>>>>
> >>>>> Parameter: clientId = <string> The client ID is specified by the
> >>>>> client and should be unique, and used consistently across requests.
> >>>>>
> >>>>> Parameter: diff = <boolean> In combination with the clientId, VDSM
> >>>>> will send only the differences from the previous request made by the
> >>>>> named clientId (if diff=true).
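A sketch of the clientId/diff behaviour, assuming a hypothetical VDSM-side `DiffTracker`. Every reply uses the same envelope (`full`, `vms`, `removedVms`, all hypothetical names), so the reply shape does not depend on the parameters. A client with no remembered state (first request, or after a VDSM restart) gets the full data:

```python
import copy


class DiffTracker:
    """Hypothetical memory of the last reply sent to each clientId."""

    def __init__(self):
        self._last = {}  # clientId -> {vmId: stats}

    def respond(self, client_id: str, current: dict, diff: bool = False) -> dict:
        previous = self._last.get(client_id)
        self._last[client_id] = copy.deepcopy(current)
        if not diff or previous is None:
            return {"full": True, "vms": copy.deepcopy(current), "removedVms": []}
        changed = {}
        for vm_id, stats in current.items():
            old = previous.get(vm_id)
            if old is None:
                changed[vm_id] = copy.deepcopy(stats)  # new VM: send everything
                continue
            delta = {k: v for k, v in stats.items() if old.get(k) != v}
            if delta:
                changed[vm_id] = copy.deepcopy(delta)
        removed = sorted(set(previous) - set(current))
        return {"full": False, "vms": changed, "removedVms": removed}
```

Two limitations of this sketch: keys that disappear from a VM's stats are not reported, and VDSM keeps one snapshot per clientId in memory.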
> >>>>>
> >>>>>
> >>>>> Additional Change:
> >>>>>
> >>>>>
> >>>>> Besides the introduction of the new parameters for list, getVmStats
> >>>>> and getAllVmStats, it might make sense to include a hash of the
> >>>>> appList in the rarely changed section of the response. This would
> >>>>> allow the client to identify changes and avoid having to send the
> >>>>> complete appList every so often: it would be sent only if the hash
> >>>>> known to the client is outdated.
> >>>>>
> >>>>> Note: The appList (Application List) reported by the guest agent
> >>>>> could be implemented fully on request only, as long as the installed
> >>>>> guest agent supports this. As there seems to be a request to have the
> >>>>> complete list of installed applications on all guests, this data
> >>>>> could be quite extensive. On the other hand, this data is only rarely
> >>>>> visible, and therefore it should not be requested all the time, but
> >>>>> only on demand.
> >>>>>
> >>>>> Improvement of the Guest Agent:
> >>>>>
> >>>>>
> >>>>> As part of the proposed solution, it is necessary to improve the
> >>>>> guest agent as well. For the full application list, a caching
> >>>>> system should be implemented which is fully reactive and does not,
> >>>>> for example, poll the application list all the time. The guest
> >>>>> agent can create a prepared data file containing all the data in
> >>>>> JSON format (as used for the communication with VDSM via VIO), and
> >>>>> then just has to read that file from disk and send it directly to
> >>>>> VDSM. However, it is quite possible that this list is too big and
> >>>>> has to be chunked into pieces (multiple messages, which would then
> >>>>> have to be supported by VDSM as well). The solution for this is to
> >>>>> make VDSM request this data, so that it is retrieved on request only.
> >>>>> --
> >>>>> Regards,
> >>>>>
> >>>>> Vinzenz Feenstra | Senior Software Engineer
> >>>>> RedHat Engineering Virtualization R & D
> >>>>> Phone: +420 532 294 625
> >>>>> IRC: vfeenstr or evilissimo
> >>>>>
> >>>>> Better technology. Faster innovation. Powered by community
> >>>>> collaboration.
> >>>>> See how it works at redhat.com
> >>>>> _______________________________________________
> >>>>> Engine-devel mailing list
> >>>>> Engine-devel at ovirt.org
> >>>>> http://lists.ovirt.org/mailman/listinfo/engine-devel
> >>>>>
> >>>> _______________________________________________
> >>>> vdsm-devel mailing list
> >>>> vdsm-devel at lists.fedorahosted.org
> >>>> https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
> >>>>
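The guest-agent section of the original proposal notes that a large application list may have to be split into multiple messages that VDSM then reassembles. A sketch of one way to do that, assuming a hypothetical message envelope with `name`, `seq`, `total` and `data` fields (this is not the real guest agent wire format) and a hypothetical per-message limit:

```python
import json


def chunk_message(name: str, payload, max_data: int = 4096) -> list:
    """Split a JSON-serializable payload into numbered messages.

    max_data bounds the length of each "data" slice; the JSON envelope
    around it adds a little on top.
    """
    text = json.dumps(payload)  # ASCII-only by default (ensure_ascii=True)
    pieces = [text[i:i + max_data] for i in range(0, len(text), max_data)]
    total = len(pieces)
    return [json.dumps({"name": name, "seq": n, "total": total, "data": piece})
            for n, piece in enumerate(pieces)]


def reassemble(messages: list):
    """Rebuild the payload; messages may arrive in any order."""
    parts = sorted((json.loads(m) for m in messages), key=lambda p: p["seq"])
    if not parts or [p["seq"] for p in parts] != list(range(parts[0]["total"])):
        raise ValueError("incomplete or duplicated chunk sequence")
    return json.loads("".join(p["data"] for p in parts))
```

Slicing the serialized text rather than the list keeps every chunk under the limit even when a single entry is long.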
>
>
> --
> Regards,
>
> Vinzenz Feenstra | Senior Software Engineer
> RedHat Engineering Virtualization R & D
> Phone: +420 532 294 625
> IRC: vfeenstr or evilissimo
>
>
>