[ovirt-users] Cumulative VM network usage

Amador Pahim apahim at redhat.com
Thu Nov 20 12:27:30 UTC 2014


On 11/20/2014 09:20 AM, Amador Pahim wrote:
> Hi Lior,
>
> Thank you for this. Indeed, I have seen multiple requests for this. I
> also have a bugzilla for it:
> https://bugzilla.redhat.com/show_bug.cgi?id=1108144. Some comments below.
>
> On 11/11/2014 07:07 AM, Lior Vernia wrote:
>> Hello,
>>
>> The need to monitor cumulative VM network usage has come up several
>> times in the past; while this should be handled as part of
>> (https://bugzilla.redhat.com/show_bug.cgi?id=1063343), in the meantime
>> I've written a small Python script that monitors those statistics,
>> attached here.
>>
>> The script polls the engine via RESTful API periodically and dumps the
>> up-to-date total usage into a file. The output is a multi-level
>> map/dictionary in JSON format, where:
>> * The top level keys are VM names.
>> * Under each VM, the next level keys are vNIC names.
>> * Under each vNIC, there are keys for total 'rx' (received) and 'tx'
>> (transmitted), where the values are in bytes.
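A minimal sketch of the output structure described above, assuming the per-interval byte counts have already been computed. The function name `add_usage` and the sample VM/vNIC names are hypothetical, not taken from the attached script:

```python
import json


def add_usage(totals, vm_name, nic_name, rx_bytes, tx_bytes):
    """Add one interval's rx/tx byte counts to the nested totals map.

    totals has the shape {vm_name: {nic_name: {"rx": int, "tx": int}}}.
    """
    nic = totals.setdefault(vm_name, {}).setdefault(nic_name, {"rx": 0, "tx": 0})
    nic["rx"] += rx_bytes
    nic["tx"] += tx_bytes
    return totals


totals = {}
add_usage(totals, "vm1", "nic1", 1000, 500)
add_usage(totals, "vm1", "nic1", 250, 0)
add_usage(totals, "vm2", "nic1", 0, 42)
print(json.dumps(totals, indent=2, sort_keys=True))
```

`setdefault` creates the VM and vNIC levels on first sight, so new VMs or hot-plugged vNICs simply appear as new keys.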
>>
>> The script is built to run forever. It may be stopped at any time, but
>> while it's not running, VM network usage data will "be lost". When it's
>> re-run, it'll go back to accumulating data on top of its previous data.
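Resuming on top of previous data could look like the sketch below, assuming the totals file is plain JSON as described. `load_totals`, `save_totals` and the write-to-temp-then-rename approach are illustrative choices, not necessarily what the attached script does:

```python
import json
import os
import tempfile


def load_totals(path):
    """Return previously saved totals, or an empty map on the first run."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


def save_totals(path, totals):
    """Write totals via a temp file so a crash mid-write cannot corrupt them."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(totals, f)
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        os.remove(tmp_path)
        raise
```

This only protects what was already accumulated; traffic that happened while the script was stopped is still missing.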
>
> This could be mitigated if, along with rx and tx data, vdsm reported
> a timestamp reflecting the time when the data was collected. Then,
> even with gaps, we should be able to calculate the cumulative information.

Actually, vdsm is not reporting rx/tx bytes; it reports "tx/rx rate". So
we're only able to see the average consumption for the time between
polling periods.
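To illustrate why a timestamp would still help even though vdsm reports rates: given hypothetical (timestamp, rate) pairs, with rates already converted to bytes per second, usage could be estimated by multiplying each rate by the time elapsed since the previous sample. vdsm did not report such a timestamp at the time of this thread, so the input format is an assumption:

```python
def integrate_rates(samples):
    """Estimate cumulative bytes from (timestamp_s, rate_bytes_per_s) samples.

    Each rate is treated as the average over the interval since the previous
    sample. Across a gap longer than vdsm's own sampling period this is only
    an approximation: the last reported rate is assumed to have held for the
    whole gap.
    """
    total = 0.0
    for (t_prev, _), (t_cur, rate) in zip(samples, samples[1:]):
        elapsed = t_cur - t_prev
        if elapsed <= 0:
            continue  # duplicate or out-of-order sample: don't count it twice
        total += rate * elapsed
    return total
```

For example, `integrate_rates([(0, 0), (15, 100), (45, 200)])` counts 100 B/s for 15 s plus 200 B/s for the 30 s gap, i.e. 7500 bytes.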

>
>> A few disclaimers:
>> * I haven't tested this with any edge cases (engine service dies, etc.).
>> * Tested this with tens of VMs, not sure it'll work fine with hundreds.
>> * The PERIOD_TIME (polling interval) should be set so that it matches
>> both the engine's and vdsm's polling interval (see comments inside the
>> script), otherwise data will be either lost or counted multiple times.
>> From 3.4 onwards, the default configuration should be fine with 15 seconds.
>
> Here we have another issue. In 3.4, 15 seconds is fine: backend and
> vdsm are both aligned on 15 seconds. But up to 3.3, vdsm polls the
> data every 5 seconds and the backend collects data every 15 seconds,
> so 2 out of 3 vdsm polls are dropped. Since you're handling total bytes,
> this might not be a big issue.

Forget the last sentence. It is a big issue, since the data is not
cumulative but the average over the period between vdsm checks.
bz#1066570 is the solution for precise calculations here.
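A small simulation of the pre-3.4 situation described above (vdsm sampling every 5 seconds, engine refreshing every 15 seconds). It assumes the two clocks are aligned, that the engine only ever exposes the latest vdsm sample, and that the monitoring script multiplies that rate by the 15-second interval; the function names are hypothetical. Depending on when traffic happens, a burst is either lost entirely or counted three times:

```python
def true_total(window_rates, vdsm_interval=5):
    """Actual bytes transferred; window_rates[i] is the average rate
    (bytes/s) over vdsm window i."""
    return sum(rate * vdsm_interval for rate in window_rates)


def sampled_total(window_rates, vdsm_interval=5, engine_interval=15):
    """Bytes counted when the engine only sees the last vdsm window before
    each refresh and that rate is extrapolated over the whole engine interval."""
    if engine_interval % vdsm_interval:
        raise ValueError("engine_interval must be a multiple of vdsm_interval")
    step = engine_interval // vdsm_interval  # vdsm windows per engine refresh
    # Only every step-th window (the last one before a refresh) is ever seen.
    return sum(window_rates[i] * engine_interval
               for i in range(step - 1, len(window_rates), step))


burst_missed = [100, 0, 0, 100, 0, 0]  # bursts fall in unseen windows
burst_seen = [0, 0, 100, 0, 0, 100]    # bursts fall in the sampled windows
print(true_total(burst_missed), sampled_total(burst_missed))  # 1000 0
print(true_total(burst_seen), sampled_total(burst_seen))      # 1000 3000
```

When both intervals are 15 seconds (the 3.4 defaults mentioned above), `step` is 1 and every window is counted exactly once.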

>
>> * The precision of traffic measurement on a NIC is 0.1% of the
>> interface's speed over each PERIOD_TIME interval. For example, on a
>> 1Gbps vNIC, when PERIOD_TIME = 15s, data will only be measured in 15Mb
>> (~2MB) quanta. Specifically, this means that in this example any
>> traffic smaller than ~2MB over a 15-second period would be negligible
>> and wouldn't be recorded.
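The arithmetic behind the 15Mb figure, as a sketch: 1Gbps × 15s × 0.1% = 15,000,000 bits = 1,875,000 bytes. The helper names are hypothetical, and `recorded_bytes` assumes each interval is truncated down to whole quanta, which matches "wouldn't be recorded" above but is not confirmed from the engine code:

```python
def quantum_bytes(link_speed_bps, period_s):
    """Smallest traffic step recordable per interval, assuming a precision
    of 0.1% (1/1000) of the link speed, as described above."""
    quantum_bits = link_speed_bps * period_s // 1000
    return quantum_bits // 8


def recorded_bytes(actual_bytes, quantum):
    """Traffic as recorded if each interval were truncated down to a whole
    number of quanta (truncation is an assumption)."""
    return (actual_bytes // quantum) * quantum


q = quantum_bytes(1_000_000_000, 15)  # 1Gbps vNIC, PERIOD_TIME = 15s
print(q)                              # 1875000 bytes, i.e. 15Mb or ~1.9MB
print(recorded_bytes(1_800_000, q))   # 0: below one quantum, not recorded
print(recorded_bytes(4_000_000, q))   # 3750000
```

Integer arithmetic is used so the result is exact rather than subject to floating-point rounding of 0.001.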
>
> Looking at the code, if "overhead" is bigger than "PERIOD_TIME",
> cumulative data for a given period will never be accurate. In any case,
> the script will raise an exception when that happens (a negative value
> passed to time.sleep()). The timestamp reported by vdsm mentioned above
> could remove the need for the "overhead" calculation.
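A sketch of a polling loop that avoids the negative `time.sleep()` argument, assuming a `poll_once()` callable that performs one round of REST queries and accumulation (hypothetical; the attached script's structure may differ). It does not make an overrunning interval accurate; it only keeps the loop alive and reports the overrun:

```python
import sys
import time

PERIOD_TIME = 15  # seconds; should match the engine/vdsm polling interval


def sleep_time(period, overhead):
    """Time left to sleep after a poll that took `overhead` seconds,
    clamped at zero because time.sleep() rejects negative values."""
    return max(0.0, period - overhead)


def run(poll_once, period=PERIOD_TIME, iterations=None):
    """Call poll_once() every `period` seconds (forever if iterations is None)."""
    done = 0
    while iterations is None or done < iterations:
        started = time.monotonic()  # unaffected by wall-clock adjustments
        poll_once()
        overhead = time.monotonic() - started
        if overhead > period:
            # The poll overran the interval, so this period's data is inaccurate.
            print("warning: poll took %.1fs, longer than PERIOD_TIME=%ss"
                  % (overhead, period), file=sys.stderr)
        time.sleep(sleep_time(period, overhead))
        done += 1
```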
>
>> Knock yourselves out :)
>>
>>
>> _______________________________________________
>> Users mailing list
>> Users at ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/users
