Re: [ovirt-devel] [monitoring][collectd] the collectd virt plugin is now on par with Vdsm needs

Yaniv Dary
Technical Product Manager
Red Hat Israel Ltd.
34 Jerusalem Road
Building A, 4th floor
Ra'anana, Israel 4350109

Tel : +972 (9) 7692306
      8272306
Email: ydary@redhat.com
IRC : ydary

On Feb 21, 2017 13:06, "Francesco Romani" <fromani@redhat.com> wrote:

> Hello everyone,
>
> in the last weeks I've been submitting PRs to collectd upstream, to bring the virt plugin up to date with Vdsm and oVirt needs.
>
> Previously, the collectd virt plugin reported only a subset of the metrics oVirt uses.
>
> In current collectd master, the collectd virt plugin provides all the data Vdsm (thus Engine) needs. This means that it is now possible for Vdsm or Engine to query collectd, not Vdsm/libvirt, and have the same data.
>
> There are only two caveats:
>
> 1. it is yet to be seen which version of collectd will ship all those enhancements
>
> 2. collectd *intentionally* reports metrics as rates, not as absolute values as Vdsm does. This may be an issue in the presence of restarts/data loss in the link between collectd and the metrics store.

How does this work? If we want to show memory usage over time, for example, we need to have the usage, not the rate. How would this be reported?

> Please keep reading for more details:
>
> How to get the code?
> --------------------------------
>
> This is somewhat tricky until we get an official release. If one is familiar with the RPM build process, it is easy to build custom packages from a snapshot of collectd master (https://github.com/collectd/collectd) and a recent 5.7.1 RPM (like https://koji.fedoraproject.org/koji/buildinfo?buildID=835669)
>
> How to configure it?
> ------------------------------
>
> Most things work out of the box. A Vdsm patch currently in progress ships the recommended configuration:
> https://gerrit.ovirt.org/#/c/71176/6/static/etc/collectd.d/virt.conf
>
> The meaning of the configuration options is documented in man 5 collectd.conf
>
> What does it look like?
> --------------------------
>
> Let me post one "screenshot" :)
>
>   $ collectdctl listval | grep a0
>   a0/virt/disk_octets-hdc
>   a0/virt/disk_octets-vda
>   a0/virt/disk_ops-hdc
>   a0/virt/disk_ops-vda
>   a0/virt/disk_time-hdc
>   a0/virt/disk_time-vda
>   a0/virt/if_dropped-vnet0
>   a0/virt/if_errors-vnet0
>   a0/virt/if_octets-vnet0
>   a0/virt/if_packets-vnet0
>   a0/virt/memory-actual_balloon
>   a0/virt/memory-rss
>   a0/virt/memory-total
>   a0/virt/ps_cputime
>   a0/virt/total_requests-flush-hdc
>   a0/virt/total_requests-flush-vda
>   a0/virt/total_time_in_ms-flush-hdc
>   a0/virt/total_time_in_ms-flush-vda
>   a0/virt/virt_cpu_total
>   a0/virt/virt_vcpu-0
>   a0/virt/virt_vcpu-1
>
> How to consume the data?
> -----------------------------------------
>
> Among the ways to query collectd, the two most popular (and the best fit for the oVirt use case) are perhaps the network protocol (https://collectd.org/wiki/index.php/Binary_protocol) and the plain text protocol (https://collectd.org/wiki/index.php/Plain_text_protocol).
>
> The first could be used by Engine to get the data directly, or to consolidate the metrics in one database (e.g. to run any kind of query, for historical series...).
>
> The latter will be used by Vdsm to keep reporting the metrics (again https://gerrit.ovirt.org/#/c/71176/6)
>
> Please note that the performance of the plain text protocol is known to be lower than that of the binary protocol.
>
> What about the unresponsive hosts?
> -------------------------------------------------------
>
> We know from experience that hosts may become unresponsive, and this can disrupt monitoring. However, we do want to keep monitoring the responsive hosts, so that one rogue host does not make us lose all the monitoring data.
>
> To cope with this need, the virt plugin gained support for a "partition tag". With this, we can group VMs together using one arbitrary tag. This is completely transparent to collectd, and also completely optional.
>
> oVirt can use this tag to group VMs per storage domain, or however it sees fit, trying to minimize the disruption should one host become unresponsive.
>
> Read the full docs here:
> https://github.com/collectd/collectd/commit/999efc28d8e2e96bc15f535254d412a79755ca4f
>
> What about the collectd-ovirt plugin?
> --------------------------------------------------------
>
> Some time ago I implemented an out-of-tree collectd plugin leveraging the libvirt bulk stats: https://github.com/fromanirh/collectd-ovirt
>
> This plugin is meant to be a modern, drop-in replacement for the existing virt plugin. The development of that out-of-tree plugin is now halted, because we have everything we need in the upstream collectd plugin.
>
> Future work
> ------------------
>
> We believe we have reached feature parity, so we are looking at bugfixes and performance tuning in the near term. I'll be happy to provide more patches/PRs about that.
>
> Thanks and bests,
>
> --
> Francesco Romani
> Red Hat Engineering Virtualization R & D
> IRC: fromani
>
> _______________________________________________
> Devel mailing list
> Devel@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/devel
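
A minimal Python sketch of consuming the plain text protocol mentioned in the message above. It assumes collectd's unixsock plugin is loaded and listening on /var/run/collectd-unixsock (an assumed path; use whatever SocketFile is configured locally). The function names are invented for illustration and are not taken from Vdsm or from the gerrit patch. The parsing relies on the reply shape of the plain text protocol: a status line that starts with a count (negative on error), followed by that many payload lines.

```python
import socket


def parse_response(lines):
    """Split a plain text protocol reply into (status, message, payload)."""
    status_str, _, message = lines[0].partition(" ")
    status = int(status_str)
    if status < 0:
        # A negative status signals an error; the rest of the line says why.
        raise RuntimeError(f"collectd error {status}: {message}")
    return status, message, lines[1:1 + status]


def parse_listval(lines):
    """Return {identifier: last_update_epoch} from a LISTVAL reply."""
    _, _, payload = parse_response(lines)
    values = {}
    for line in payload:
        timestamp, _, identifier = line.partition(" ")
        values[identifier] = float(timestamp)
    return values


def parse_getval(lines):
    """Return {data_source_name: value} from a GETVAL reply."""
    _, _, payload = parse_response(lines)
    result = {}
    for line in payload:
        name, _, value = line.partition("=")
        result[name] = float(value)
    return result


def query(command, path="/var/run/collectd-unixsock"):
    """Send one command to the unixsock plugin and return the reply lines.

    The default path is an assumption, not something stated in this thread.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        with sock.makefile("rw", encoding="utf-8", newline="\n") as stream:
            stream.write(command + "\n")
            stream.flush()
            header = stream.readline().rstrip("\n")
            # On errors the count is negative and no payload follows.
            count = max(int(header.split(" ", 1)[0]), 0)
            payload = [stream.readline().rstrip("\n") for _ in range(count)]
            return [header] + payload
```

With a running collectd, `parse_listval(query("LISTVAL"))` would list identifiers like the ones in the listing above, and `parse_getval(query('GETVAL "a0/virt/memory-rss"'))` would fetch one of them. The parsing is kept separate from the socket handling so it can be exercised without a live daemon.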

On 02/21/2017 11:55 PM, Yaniv Dary wrote:
>> 2. collectd *intentionally* reports metrics as rates, not as absolute values as Vdsm does. This may be an issue in the presence of restarts/data loss in the link between collectd and the metrics store.
>
> How does this work? If we want to show memory usage over time, for example, we need to have the usage, not the rate. How would this be reported?

I was imprecise, my fault.

Let me retry: collectd intentionally reports quite a lot of the metrics we care about as rates, not as absolute values. Memory is actually fine.

a0/virt/disk_octets-hdc -> rate
a0/virt/disk_octets-vda
a0/virt/disk_ops-hdc -> rate
a0/virt/disk_ops-vda
a0/virt/disk_time-hdc -> rate
a0/virt/disk_time-vda
a0/virt/if_dropped-vnet0 -> rate
a0/virt/if_errors-vnet0 -> rate
a0/virt/if_octets-vnet0 -> rate
a0/virt/if_packets-vnet0 -> rate
a0/virt/memory-actual_balloon -> absolute
a0/virt/memory-rss -> absolute
a0/virt/memory-total -> absolute
a0/virt/ps_cputime -> rate
a0/virt/total_requests-flush-hdc -> rate
a0/virt/total_requests-flush-vda
a0/virt/total_time_in_ms-flush-hdc -> rate
a0/virt/total_time_in_ms-flush-vda
a0/virt/virt_cpu_total -> rate
a0/virt/virt_vcpu-0 -> rate
a0/virt/virt_vcpu-1

collectd "just" reports the changes since the last sampling. I'm not sure what the best way to handle that is; I sent a mail to the collectd list some time ago, with no answer so far.

--
Francesco Romani
Red Hat Engineering Virtualization R & D
IRC: fromani

On Wed, Feb 22, 2017 at 5:57 PM, Francesco Romani <fromani@redhat.com> wrote:
> collectd "just" reports the changes since the last sampling. I'm not sure which is the best way to handle that; I've sent a mail to collectd list some time ago, no answer so far.

Can you CC me on that thread? I don't know how ES would work with rates at all. I want to be able to show CPU usage over time, and I need to know if it's 80% or 10%.
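
On the CPU question: the rate of a cumulative CPU-time counter is already a utilisation figure once it is divided by the wall-clock time it covers. The Python sketch below shows the arithmetic. It assumes (this is not confirmed anywhere in the thread) that virt_cpu_total counts consumed CPU time in nanoseconds, so the rate collectd reports is in nanoseconds of CPU per second of wall-clock time. The function names are hypothetical.

```python
NANOSECONDS_PER_SECOND = 1_000_000_000


def rate_from_counters(earlier_value, earlier_time, later_value, later_time):
    """Average rate between two samples of a cumulative counter."""
    interval = later_time - earlier_time
    if interval <= 0:
        raise ValueError("samples must be in increasing time order")
    return (later_value - earlier_value) / interval


def cpu_usage_percent(cpu_time_rate_ns_per_s, vcpu_count=1):
    """Turn a CPU-time rate (ns of CPU per second) into a percentage.

    Assumes the counter is in nanoseconds; 100% means every vCPU was busy
    for the whole sampling interval.
    """
    if vcpu_count < 1:
        raise ValueError("vcpu_count must be at least 1")
    capacity = NANOSECONDS_PER_SECOND * vcpu_count
    return 100.0 * cpu_time_rate_ns_per_s / capacity


# A guest that consumed 8 s of CPU time over a 10 s interval on one vCPU:
rate = rate_from_counters(1_000_000_000, 10, 9_000_000_000, 20)
print(cpu_usage_percent(rate))
```

Under that assumption, a dashboard can plot the percentage directly from the reported rate, without ever seeing the raw counter.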

On 02/26/2017 03:13 PM, Yaniv Dary wrote:
> Can you CC me on that thread? I don't know how ES would work with rates at all. I want to be able to show CPU usage over time, and I need to know if it's 80% or 10%.

Thanks to the awkward Gmail interface I can't reply to myself and CC other people, but I can share the link:

https://mailman.verplant.org/pipermail/collectd/2017-January/006965.html

--
Francesco Romani
Red Hat Engineering Virtualization R & D
IRC: fromani

This is about cumulative values. I'm also asking about stats like the CPU usage of the VM/host, which are not reported as absolute values. Can you bump the thread?

On Mon, Feb 27, 2017 at 10:11 AM, Francesco Romani <fromani@redhat.com> wrote:
Thanks to the awkward gmail interface I can't reply to myself and CC other people, but I can share the link:
https://mailman.verplant.org/pipermail/collectd/2017-January/006965.html

On 02/27/2017 01:32 PM, Yaniv Dary wrote:
> This is about cumulative values. I'm also asking about stats like the CPU usage of the VM/host, which are not reported as absolute values. Can you bump the thread?

Done, let's see.

Speaking about options: during the reviews of my pull requests we also discussed the (semi?)recommended way to report more values without adding new collectd types, which is something collectd upstream really tries to avoid.

So we could report the current values *and* the absolutes, making everyone happy; but I'm afraid this will require a new plugin, like the one I had in the works (https://github.com/fromanirh/collectd-ovirt).

TL;DR: in the worst case, we have one safe fallback option.

--
Francesco Romani
Red Hat Engineering Virtualization R & D
IRC: fromani

We need good answers from them as to why they do not support this use case. Maybe a GitHub issue on the use case would get more attention. They should allow us to choose how to present and collect the data. Can you open one?

On Tue, Feb 28, 2017 at 1:07 PM, Francesco Romani <fromani@redhat.com> wrote:
Done, let's see.
Speaking about options: during the reviews of my pull requests we also discussed the (semi?)recommended way to report more values without adding new collectd types, which is something the collectd upstream really tries to avoid.
So we could report the current values *and* the absolutes, making everyone happy; but I'm afraid this will require a new plugin, like the one I had in the works (https://github.com/fromanirh/collectd-ovirt)
TL;DR: in the worst case, we have one safe fallback option.

On 02/28/2017 12:24 PM, Yaniv Dary wrote:
> We need good answers from them as to why they do not support this use case. Maybe a GitHub issue on the use case would get more attention. They should allow us to choose how to present and collect the data. Can you open one?

I can, and I will if I get no answer in a few more days. Meanwhile, among other things, I'm doing my homework to understand why they do it like that.

This is the best source of information I found so far (please check the whole thread, it's pretty short):

https://mailman.verplant.org/pipermail/collectd/2013-September/005924.html

Quoting part of the email:

"""
We only came up with one use case where having the raw counter values is beneficial: If you want to calculate the average rate over arbitrary time spans, it's easier to look up the raw counter values for those points in time and go from there. However, you can also sum up the individual rates to reach the same result. Finally, when handling counter resets / overflows within this interval, integrating over / summing rates is trivial by comparison.

Do you have any other use-case for raw counter values?

Pro:

* Handling of values becomes easier.
* The rate is calculated only once, in contrast to potentially several times, which might be more efficient (currently each rate conversion involves a lookup call).
* Together with (1), this removes the need for having the "types.db", which could be removed then. We were in wild agreement that this would be a worthwhile goal.

Contra:

* Original raw value is lost. It can be reconstructed except for a (more or less) constant offset, though.
"""

Looks like this change was intentional and implemented after some discussion.

Bests,

--
Francesco Romani
Red Hat Engineering Virtualization R & D
IRC: fromani

On Tue, Feb 28, 2017 at 4:06 PM, Francesco Romani <fromani@redhat.com> wrote:
> I can, and I will if I get no answer in few more days. Meantime, among other things, I'm doing my homework to understand why they do like that.
>
> This is the best source of information I found so far (please check the whole thread, it's pretty short):
>
> https://mailman.verplant.org/pipermail/collectd/2013-September/005924.html
>
> Quoting part of the email
>
> """
> We only came up with one use case where having the raw counter values is beneficial: If you want to calculate the average rate over arbitrary time spans, it's easier to look up the raw counter values for those points in time and go from there. However, you can also sum up the individual rates to reach the same result. Finally, when handling counter resets / overflows within this interval, integrating over / summing rates is trivial by comparison.
>
> Do you have any other use-case for raw counter values?
>
> Pro:
>
> * Handling of values becomes easier.
> * The rate is calculated only once, in contrast to potentially several times, which might be more efficient (currently each rate conversion involves a lookup call).
> * Together with (1), this removes the need for having the "types.db", which could be removed then. We were in wild agreement that this would be a worthwhile goal.

Not for adding units: https://github.com/collectd/collectd/issues/2047

> Contra:
>
> * Original raw value is lost. It can be reconstructed except for a (more or less) constant offset, though.

How is this done?

> """
>
> Looks like this change was intentional and implemented after some discussion.

I understand this, but most monitoring systems will not know what to do with this value.
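
One way to read the quoted claim that the raw value "can be reconstructed except for a (more or less) constant offset" is as a running sum: each reported rate, multiplied by the length of the interval it covers, gives the counter increment for that interval. The Python sketch below shows this under the assumption that each sample is a (timestamp, rate) pair whose rate is the average since the previous timestamp. The function names and the sample layout are invented for illustration; this is not collectd code.

```python
def reconstruct_counter(start_time, samples, offset=0.0):
    """Rebuild a cumulative series from (timestamp, rate) samples.

    Each rate is taken as the average since the previous timestamp
    (start_time for the first sample). The true counter value at
    start_time is unknown, hence the caller-supplied offset.
    """
    total = offset
    previous = start_time
    series = []
    for timestamp, rate in samples:
        total += rate * (timestamp - previous)
        previous = timestamp
        series.append((timestamp, total))
    return series


def average_rate(start_time, samples):
    """Average rate over the whole span, computed from the rates alone."""
    if not samples:
        raise ValueError("at least one sample is required")
    end_time, total = reconstruct_counter(start_time, samples)[-1]
    return total / (end_time - start_time)


# Three 10-second intervals at 5, 15 and 10 units per second.
samples = [(10, 5.0), (20, 15.0), (30, 10.0)]
print(reconstruct_counter(0, samples))
print(average_rate(0, samples))
```

The offset never affects `average_rate`, which matches the quoted argument that summing rates gives the same result as differencing raw counters. A gap in the samples (for example after a restart) would lose the increments for that period, which is the data-loss caveat raised at the start of the thread.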

Any updates on the usage of collectd with rates?

No updates yet. I'll move forward with filing a GitHub issue, hoping to gather more feedback.

Bests,

On 03/12/2017 03:38 PM, Yaniv Dary wrote:
Any updates on the usage of collectd with rates?
Yaniv Dary Technical Product Manager Red Hat Israel Ltd. 34 Jerusalem Road Building A, 4th floor Ra'anana, Israel 4350109 Tel : +972 (9) 7692306 8272306 Email: ydary@redhat.com <mailto:ydary@redhat.com> IRC : ydary
On Tue, Feb 28, 2017 at 4:17 PM, Yaniv Dary <ydary@redhat.com <mailto:ydary@redhat.com>> wrote:
Yaniv Dary Technical Product Manager Red Hat Israel Ltd. 34 Jerusalem Road Building A, 4th floor Ra'anana, Israel 4350109 Tel : +972 (9) 7692306 <tel:+972%209-769-2306> 8272306 Email: ydary@redhat.com <mailto:ydary@redhat.com> IRC : ydary
On Tue, Feb 28, 2017 at 4:06 PM, Francesco Romani <fromani@redhat.com <mailto:fromani@redhat.com>> wrote:
On 02/28/2017 12:24 PM, Yaniv Dary wrote: > We need good answers from them to why they do not support this use case. > Maybe a github issue on the use case would get more attention. They > should allow us to choose how to present and collect the data. > Can you open one? >
I can, and I will if I get no answer in few more days. Meantime, among other things, I'm doing my homework to understand why they do like that.
This is the best source of information I found so far (please check the whole thread, it's pretty short):
https://mailman.verplant.org/pipermail/collectd/2013-September/005924.html <https://mailman.verplant.org/pipermail/collectd/2013-September/005924.html>
Quoting part of the email
"""
We only came up with one use case where having the raw counter values is beneficial: If you want to calculate the average rate over arbitrary time spans, it's easier to look up the raw counter values for those points in time and go from there. However, you can also sum up the individual rates to reach the same result. Finally, when handling counter resets / overflows within this interval, integrating over / summing rates is trivial by comparison.
Do you have any other use-case for raw counter values?
Pro:
* Handling of values becomes easier.
* The rate is calculated only once, in contrast to potentially several times, which might be more efficient (currently each rate conversion involves a lookup call).
* Together with (1), this removes the need for having the "types.db", which could be removed then. We were in wild agreement that this would be a worthwhile goal.
Not for adding units: https://github.com/collectd/collectd/issues/2047
Contra:
* Original raw value is lost. It can be reconstructed except for a (more or less) constant offset, though.
How is this done?
"""
Looks like this change was intentional and implemented after some discussion.
I understand this, but most monitoring systems will not know what to do with this value.
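For reference, the remark in the quoted email that the raw value "can be reconstructed except for a (more or less) constant offset" can be illustrated with a minimal sketch. It assumes that each reported rate is the average rate over the interval between two consecutive samples, so multiplying each rate by its interval length and summing recovers how much the counter grew; the starting value stays unknown unless it is supplied. The same sum, divided by the total time span, gives the average rate over an arbitrary span. The function names and the sample layout are hypothetical and are not part of collectd.

```python
from itertools import accumulate  # the `initial` argument needs Python 3.8+


def _increments(timestamps, rates):
    """Per-interval counter growth: rate multiplied by interval length."""
    if len(timestamps) != len(rates) + 1:
        raise ValueError("need exactly one more timestamp than rates")
    return [r * (t1 - t0) for r, t0, t1 in zip(rates, timestamps, timestamps[1:])]


def reconstruct_counter(timestamps, rates, offset=0.0):
    """Integrate rates back into a running counter.

    timestamps: sample times in seconds (N + 1 values).
    rates: rates[i] is assumed to be the average rate between
           timestamps[i] and timestamps[i + 1] (N values).
    offset: the counter value at timestamps[0]; it cannot be recovered
            from the rates alone, so it defaults to zero.
    """
    return list(accumulate(_increments(timestamps, rates), initial=offset))


def average_rate(timestamps, rates):
    """Average rate over the whole span, as a time-weighted sum of rates."""
    span = timestamps[-1] - timestamps[0]
    if span <= 0:
        raise ValueError("timestamps must cover a positive time span")
    return sum(_increments(timestamps, rates)) / span
```

As an example, a counter that read 1000, 1100, 1300, 1300 and 1600 at t = 0, 10, 20, 30 and 40 seconds would be reported as the rates 10, 20, 0 and 30. Calling `reconstruct_counter([0, 10, 20, 30, 40], [10.0, 20.0, 0.0, 30.0])` gives the original series shifted down by 1000, which is the constant offset; passing `offset=1000` restores it. A gap in the rate samples (for example after a restart or a lost link) would add an unknown amount to that offset, which matches the caveat raised at the start of this thread.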
Bests,
-- Francesco Romani Red Hat Engineering Virtualization R & D IRC: fromani
-- Francesco Romani Red Hat Engineering Virtualization R & D IRC: fromani

+Yaniv Dary <ydary@redhat.com> can you help with a gap analysis to understand which values we expect as raw values but are not supplied that way by collectd?

On Mon, Mar 13, 2017 at 9:45 AM Francesco Romani <fromani@redhat.com> wrote:
No updates yet. I'll move forward filing a github issue, hoping to gather more feedback.
participants (3): Francesco Romani, Roy Golan, Yaniv Dary