[Engine-devel] [vdsm] Proposal VDSM <=> Engine Data Statistics Retrieval Optimization
Mark Wu
wudxw at linux.vnet.ibm.com
Fri Mar 8 02:30:50 UTC 2013
On 03/08/2013 06:11 AM, Dan Kenigsberg wrote:
> On Thu, Mar 07, 2013 at 12:25:54PM +0100, Vinzenz Feenstra wrote:
>> Please find the prettier version on the wiki:
>> http://www.ovirt.org/Proposal_VDSM_-_Engine_Data_Statistics_Retrieval
>>
>>
>> Proposal VDSM - Engine Data Statistics Retrieval
>>
>>
>> VDSM <=> Engine data retrieval optimization
>>
>>
>> Motivation:
>>
>> Currently the RHEVM engine is polling a lot of data from VDSM
>> every 15 seconds. This should be optimized and the amount of data
>> requested should be more specific.
> It feels like a good idea, but do you have numbers? How much traffic
> would be saved? Remember the added computation incurred on each host -
> there's always a price to pay.
>
>> For each VM the data currently contains much more information than
>> is actually needed, which inflates the size of the XML content
>> considerably. We could optimize this by splitting the getVmStats
>> reply into sections, based on what the engine requests. For this
>> reason Omer Frenkel and I have split up the data into parts based on
>> their usage.
>>
>> This data can and usually does change during the lifetime of the VM.
>>
>>
>> Rarely Changed:
>>
>> This data does not change very frequently, and it should be enough to
>> update it only once in a while. Most commonly this data changes
>> after changes made in the UI or after a migration of the VM to
>> another Host.
>>
>> *Status* = Running
> Status does not change much, but when it does, it is important to report
> that quickly.
For this kind of data, it is suitable to use an event report, which
should be available in the jsonrpc API.
>
>> *acpiEnable* = true
>> *vmType* = kvm
>> *guestName* = W864GUESTAGENTT
>> *displayType* = qxl
>> *guestOs* = Win 8
>> *kvmEnable* = true #/*this should be constant and never changed*/
>> *pauseCode* = NOERR
>> *monitorResponse* = 0
>> *session* = Locked # unused
>> *netIfaces* = [{'name': 'Realtek RTL8139C+ Fast Ethernet NIC', 'inet6': ['fe80::490c:92bb:bbcc:9f87'], 'inet': ['10.34.60.148'], 'hw': '00:1a:4a:22:3c:db'}]
>> *appsList* = ['RHEV-Tools 3.2.4', 'RHEV-Agent64 3.2.3', 'RHEV-Serial64 3.2.3', 'RHEV-Network64 3.2.2', 'RHEV-Network64 3.2.3', 'RHEV-Block64 3.2.3', 'RHEV-Balloon64 3.2.3', 'RHEV-Balloon64 3.2.2', 'RHEV-Agent64 3.2.2', 'RHEV-USB 3.2.3', 'RHEV-Block64 3.2.2', 'RHEV-Serial64 3.2.2']
>> *pid* = 11314
>> *guestIPs* = 10.34.60.148 # duplicated info
>> *displayIp* = 0
>> *displayPort* = 5902
>> *displaySecurePort* = 5903
>> *username* = user at W864GUESTAGENTT
>> *clientIp* =
>> *lastLogin* = 1361976900.67
>>
>>
>> Often Changed:
>>
>> This data changes quite often; however, it is not necessary to
>> update it every 15 seconds. As this is cumulative data that
>> reflects the current status, it does not need to be snapshotted
>> every 15 seconds to retrieve statistics. The data can be retrieved
>> in much more generous time slices (e.g. every 5 minutes).
>>
>> *network* = {'vnet1': {'macAddr': '00:1a:4a:22:3c:db', 'rxDropped': '0', 'txDropped': '0', 'rxErrors': '0', 'txRate': '0.0', 'rxRate': '0.0', 'txErrors': '0', 'state': 'unknown', 'speed': '100', 'name': 'vnet1'}}
>> *disksUsage* = [{'path': 'c:\\', 'total': '64055406592', 'fs': 'NTFS', 'used': '19223846912'}, {'path': 'd:\\', 'total': '3490912256', 'fs': 'UDF', 'used': '3490912256'}]
>> *timeOffset* = 14422
>> *elapsedTime* = 68591
>> *hash* = 2335461227228498964
>> *statsAge* = 0.09 # unused
>>
>>
>> Often Changed but unused
>>
>> This data does not seem to be used in the engine at all. It is *not*
>> even used in the data warehouse.
>>
>> *memoryStats* = {'swap_out': '0', 'majflt': '0', 'mem_free': '1466884', 'swap_in': '0', 'pageflt': '0', 'mem_total': '2096736', 'mem_unused': '1466884'}
>> *balloonInfo* = {'balloon_max': 2097152, 'balloon_cur': 2097152}
>> *disks* = {'vda': {'readLatency': '0', 'apparentsize': '64424509440', 'writeLatency': '1754496', 'imageID': '28abb923-7b89-4638-84f8-1700f0b76482', 'flushLatency': '156549', 'readRate': '0.00', 'truesize': '18855059456', 'writeRate': '952.05'}, 'hdc': {'readLatency': '0', 'apparentsize': '0', 'writeLatency': '0', 'flushLatency': '0', 'readRate': '0.00', 'truesize': '0', 'writeRate': '0.00'}}
> I am pretty sure that {read,write,flush}Latency is collected and
> reported by Engine. `git grep writeLatency` reinforces my vague memory.
>>
>> Very frequent updates needed by webadmin portal:
>>
>> This data is mostly needed for the webadmin portal and might be
>> required to be updated quite often. An exception here is the
>> statsAge field, which seems to be unused by the Engine. This data
>> could be requested every 15 seconds to keep things as they are now.
>>
>> *cpuSys* = 2.32
>> *cpuUser* = 1.34
>> *memUsage* = 30
>>
>>
>> Proposed Solution for VDSM & Engine:
>>
>> We will introduce new optional parameters to getVmStats,
>> getAllVmStats and list to allow a finer grained specification of
>> data which should be included.
>>
>> *Parameter:* *statsType*=/*<string>*/ (getVmStats, getAllVmStats
>> only) *Allowed values:*
>>
>> * full (default to keep backwards compatibility)
>> * app-list (Just send the application list)
>> * rare (include everything from rarely changed to very frequent)
>> * often (include everything from often changed to very frequent)
>> * frequent (only send the very frequently changed items)
> I think that a nice way to think of this is that Engine asks for a set
> of keys it is interested in. Asking for getVmStats(keys=[displayType,
> netIfaces]) would return only the requested values of the VM.
+1. It could split the information according to different functions,
not just change frequency.
> "full",
> "rare", "often" and "frequent" are simply pre-defined sets of key names.
>
> A side effect of this pov is that we can avoid the vague name
> "statsType".
>
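To make the key-set view above concrete, here is a minimal sketch. The function name select_stats, the KEY_SETS table and the exact membership of each set are my assumptions for illustration only; the grouping just follows the sections of the proposal, and the "rare" set is abbreviated.

```python
# Hypothetical pre-defined key sets; each wider set includes the narrower
# ones, as described for "rare", "often" and "frequent" in the proposal.
FREQUENT_KEYS = frozenset({"cpuSys", "cpuUser", "memUsage"})
OFTEN_KEYS = FREQUENT_KEYS | {"network", "disksUsage", "timeOffset",
                              "elapsedTime", "hash"}
# Abbreviated: only a few of the rarely changed keys are listed here.
RARE_KEYS = OFTEN_KEYS | {"displayType", "netIfaces", "guestIPs",
                          "pauseCode", "appsList"}

KEY_SETS = {"frequent": FREQUENT_KEYS, "often": OFTEN_KEYS, "rare": RARE_KEYS}


def select_stats(stats, keys=None):
    """Return only the requested part of a VM stats dict.

    keys may be None or "full" (everything), the name of a pre-defined
    set, or an explicit iterable of key names.
    """
    if keys is None or keys == "full":
        return dict(stats)
    wanted = KEY_SETS[keys] if isinstance(keys, str) else set(keys)
    return {k: v for k, v in stats.items() if k in wanted}
```

With this shape, select_stats(stats, ["displayType", "netIfaces"]) and select_stats(stats, "frequent") go through the same code path, so the named sets are only a convenience on top of explicit keys.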
>>
>> *Parameter:* *clientId*=*<string>* The client id is specified by the
>> client and should be unique, but used consistently across requests.
>>
>> *Parameter:* *diff*=*<boolean>* In combination with the clientId
>> VDSM will send only differences to the previous request from the
>> named clientId. (if diff=true)
> The semantics of "diff" is not completely defined: how about complex
> structures like that of "network"? It is most likely to be reported
> every time.
>
> Since this requires a caching mechanism on vdsm side, Engine must expect
> that the cache may be evicted at any moment, and that a full list may be
> received.
Every data collector should be responsible for invalidating/updating the cache.
That could reduce the time needed to calculate the diff.
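A rough sketch of how a per-clientId diff could behave, including the eviction case Dan mentions. Everything here (the class name StatsDiffCache, the "full"/"removed" reply fields) is hypothetical and not an existing VDSM API. The comparison is done on top-level keys only, so a complex value such as "network" is resent as a whole whenever anything inside it changes.

```python
import copy


class StatsDiffCache:
    """Remembers the last stats sent to each client (hypothetical helper)."""

    def __init__(self):
        self._last = {}  # clientId -> copy of the last stats dict sent

    def evict(self, client_id):
        # Dropping an entry is always safe: the next reply is a full one.
        self._last.pop(client_id, None)

    def reply(self, client_id, stats, diff=False):
        previous = self._last.get(client_id)
        # Deep copy so later in-place updates do not alter the cached state.
        self._last[client_id] = copy.deepcopy(stats)
        if not diff or previous is None:
            # No cached state (first request or evicted): send everything.
            return {"full": True, "stats": dict(stats)}
        changed = {k: v for k, v in stats.items()
                   if k not in previous or previous[k] != v}
        removed = sorted(k for k in previous if k not in stats)
        return {"full": False, "stats": changed, "removed": removed}
```

The explicit "full" flag is one way to let Engine tell a complete reply apart from a diff, so that an evicted cache does not need any special handling on the Engine side.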
>>
>> Additional Change:
>>
>> Besides the introduction of the new parameters for list, getVmStats
>> and getAllVmStats, it might make sense to include a hash of the
>> appList in the rarely changed section of the response. This would
>> make it possible to identify changes and to send the complete
>> appList only if the hash known to the client is outdated, instead
>> of every so often.
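The appList hash idea could look roughly like the sketch below. The field name appsListHash, the known_hash argument and the choice of SHA-256 over a sorted JSON encoding are assumptions made for the example, not something the proposal specifies.

```python
import hashlib
import json


def app_list_hash(apps):
    """Order-independent digest of the application list."""
    canonical = json.dumps(sorted(apps), separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def app_list_reply(apps, known_hash=None):
    """Send the full list only when the client's hash is outdated."""
    current = app_list_hash(apps)
    if known_hash == current:
        return {"appsListHash": current}
    return {"appsListHash": current, "appsList": list(apps)}
```

Sorting before hashing means a guest agent that reports the same applications in a different order does not trigger a resend.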
>>
>> *Note:* The appList (Application List) reported by the guest agent
>> could be fully implemented on request only, as long as the guest
>> agent installed supports this. As there seems to be a request to
>> have the complete list of installed applications on all guests, this
>> data could become quite extensive. On the other hand, this data is
>> only rarely visible and therefore should be requested only on
>> demand, not all the time.
>>
>>
>> Improvement of the Guest Agent:
>>
>> As part of the proposed solution it is necessary to improve the
>> guest agent as well.
> Improving the agent may be a good idea, but I do not see the necessity
> in it. It's also important to improve the horrible multithreaded
> vdsm/libvirt statistics acquisition, but just as unrelated to the core
> of this feature.
>
>> For the full application list, a caching system should be
>> implemented which is fully reactive and does not, for example,
>> poll the application list all the time. The guest
>> can create a prepared data file containing all data in the JSON
>> format (as used for the communication with VDSM via VIO) and then just
>> has to read that file from disk and send it directly to VDSM.
>> However, it is quite possible that this list is too big and it might
>> have to be chunked into pieces (multiple messages, which would then
>> have to be supported by VDSM as well). The solution for this is to
>> make VDSM request this data, so that it retrieves the necessary data
>> on request only.
> _______________________________________________
> vdsm-devel mailing list
> vdsm-devel at lists.fedorahosted.org
> https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel