[ovirt-devel] [monitoring][collectd] the collectd virt plugin is now on par with Vdsm needs

Yaniv Kaul ykaul at redhat.com
Tue Feb 21 13:44:39 UTC 2017


On Tue, Feb 21, 2017 at 1:06 PM Francesco Romani <fromani at redhat.com> wrote:

> Hello everyone,
>
>
> In the last few weeks I've been submitting PRs to collectd upstream to
> bring the virt plugin up to date with Vdsm and oVirt needs.
>
> Previously, the collectd virt plugin reported only a subset of the
> metrics oVirt uses.
>
> In current collectd master, the collectd virt plugin provides all the
> data Vdsm (and thus Engine) needs. This means that it is now possible
> for Vdsm or Engine to query collectd instead of Vdsm/libvirt and get
> the same data.
>

Do we wish to ship the unixsock collectd plugin? I'm not sure we do these
days (4.1).
We can do that later, of course, when we ship this.
Y.


>
> There are only two caveats:
>
> 1. It remains to be seen which version of collectd will ship all these
> enhancements.
>
> 2. collectd *intentionally* reports metrics as rates, not as absolute
> values as Vdsm does. This may be an issue in the presence of restarts or
> data loss on the link between collectd and the metrics store.
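
To illustrate the second caveat: a consumer that holds absolute counters
can derive a rate from any two samples, while a consumer that only
receives rates cannot rebuild the absolute value once samples are lost.
The following is a minimal sketch of that derivation, not code from
collectd or Vdsm; the function name and the choice to return None on a
counter reset are assumptions made for illustration.

```python
from typing import Optional


def counter_rate(prev_value: int, prev_time: float,
                 cur_value: int, cur_time: float) -> Optional[float]:
    """Return the per-second rate between two absolute counter samples.

    Returns None when no meaningful rate can be computed: the clock did
    not advance, or the counter went backwards (for example because the
    VM or the collector restarted and the counter started again from 0).
    """
    elapsed = cur_time - prev_time
    if elapsed <= 0:
        return None
    delta = cur_value - prev_value
    if delta < 0:
        # Counter reset: the previous sample is no longer comparable.
        return None
    return delta / elapsed
```

For example, two samples of 1000 and 4000 bytes taken ten seconds apart
give a rate of 300 bytes per second.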
>
>
> Please keep reading for more details:
>
>
> How to get the code?
>
> --------------------------------
>
> This is somewhat tricky until we get an official release. If you are
> familiar with the RPM build process, it is easy to build custom
> packages from a snapshot of collectd master
> (https://github.com/collectd/collectd) and a recent 5.7.1 RPM (like
> https://koji.fedoraproject.org/koji/buildinfo?buildID=835669).
>
>
> How to configure it?
>
> ------------------------------
>
> Most things work out of the box. A Vdsm patch currently in progress
> ships the recommended configuration:
> https://gerrit.ovirt.org/#/c/71176/6/static/etc/collectd.d/virt.conf
>
> The meaning of the configuration options is documented in man 5
> collectd.conf.
>
>
> What does it look like?
>
> --------------------------
>
>
> Let me post one "screenshot" :)
>
>
>
>   $ collectdctl listval | grep a0
>   a0/virt/disk_octets-hdc
>   a0/virt/disk_octets-vda
>   a0/virt/disk_ops-hdc
>   a0/virt/disk_ops-vda
>   a0/virt/disk_time-hdc
>   a0/virt/disk_time-vda
>   a0/virt/if_dropped-vnet0
>   a0/virt/if_errors-vnet0
>   a0/virt/if_octets-vnet0
>   a0/virt/if_packets-vnet0
>   a0/virt/memory-actual_balloon
>   a0/virt/memory-rss
>   a0/virt/memory-total
>   a0/virt/ps_cputime
>   a0/virt/total_requests-flush-hdc
>   a0/virt/total_requests-flush-vda
>   a0/virt/total_time_in_ms-flush-hdc
>   a0/virt/total_time_in_ms-flush-vda
>   a0/virt/virt_cpu_total
>   a0/virt/virt_vcpu-0
>   a0/virt/virt_vcpu-1
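
The identifiers in the listing above follow collectd's usual naming
schema, host/plugin[-plugin_instance]/type[-type_instance]. A minimal
sketch of splitting them into their parts is shown here; the class and
function names are made up for this example and are not part of collectd
or Vdsm.

```python
from typing import NamedTuple, Optional


class Identifier(NamedTuple):
    host: str
    plugin: str
    plugin_instance: Optional[str]
    type: str
    type_instance: Optional[str]


def parse_identifier(ident: str) -> Identifier:
    """Split 'host/plugin[-instance]/type[-instance]' into its parts.

    Raises ValueError if the identifier has fewer than three
    slash-separated components.
    """
    # Split from the right so that only the last two slashes are used.
    host, plugin_part, type_part = ident.rsplit("/", 2)
    # Only the first hyphen separates a name from its instance, so
    # 'total_requests-flush-hdc' keeps 'flush-hdc' as the instance.
    plugin, _, plugin_instance = plugin_part.partition("-")
    type_, _, type_instance = type_part.partition("-")
    return Identifier(host, plugin, plugin_instance or None,
                      type_, type_instance or None)
```

With this, "a0/virt/disk_octets-vda" yields host "a0", plugin "virt",
type "disk_octets" and type instance "vda".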
>
>
> How to consume the data?
> -----------------------------------------
>
> Among the ways to query collectd, the two most popular (and best suited
> to the oVirt use case) are perhaps the network protocol
> (https://collectd.org/wiki/index.php/Binary_protocol)
> and the plain text protocol
> (https://collectd.org/wiki/index.php/Plain_text_protocol). The first
> could be used by Engine to get the data directly, or to consolidate the
> metrics in one database (e.g. to run any kind of query, for historical
> series...).
> The latter will be used by Vdsm to keep reporting the metrics (again,
> https://gerrit.ovirt.org/#/c/71176/6).
>
> Please note that the performance of the plain text protocol is known to
> be lower than that of the binary protocol.
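
As a rough illustration of the plain text protocol, here is a minimal
sketch of a client talking to the unixsock plugin. It is not the code
from the Vdsm patch. It assumes the unixsock plugin is enabled, and the
socket path used as default below is an assumption that must match the
SocketFile setting of the local collectd. The function names are made up
for this example. In the protocol, each reply starts with a status line
"<status> <message>": a non-negative status is the number of lines that
follow, a negative one signals an error.

```python
import socket
from typing import List


def parse_response(lines: List[str]) -> List[str]:
    """Return the payload lines of a plain text protocol reply.

    lines[0] is the status line; a negative status raises RuntimeError
    carrying the message sent by collectd.
    """
    status_text, _, message = lines[0].partition(" ")
    status = int(status_text)
    if status < 0:
        raise RuntimeError(message)
    return lines[1:1 + status]


def query(command: str,
          path: str = "/var/run/collectd-unixsock") -> List[str]:
    """Send one command (e.g. 'LISTVAL') and return its payload lines.

    The default path is an assumption; adjust it to the local setup.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        with sock.makefile("rw", encoding="utf-8", newline="\n") as stream:
            stream.write(command + "\n")
            stream.flush()
            status_line = stream.readline().rstrip("\n")
            # Read exactly as many lines as the status line announces.
            count = max(int(status_line.split(" ", 1)[0]), 0)
            payload = [stream.readline().rstrip("\n") for _ in range(count)]
    return parse_response([status_line] + payload)
```

On a host with a running collectd, query("LISTVAL") would return lines of
the form "<timestamp> <identifier>", and query('GETVAL "a0/virt/memory-rss"')
would return lines of the form "<name>=<value>".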
>
> What about the unresponsive hosts?
> -------------------------------------------------------
>
> We know from experience that hosts may become unresponsive, and this can
> disrupt monitoring. However, we do want to keep monitoring the
> responsive hosts, so that one rogue host does not make us lose all the
> monitoring data.
> To cope with this need, the virt plugin gained support for a "partition
> tag". With this, we can group VMs together using an arbitrary tag. This
> is completely transparent to collectd, and also completely optional.
> oVirt can use this tag to group VMs per storage domain, or however it
> sees fit, trying to minimize the disruption should one host become
> unresponsive.
>
> Read the full docs here:
>
> https://github.com/collectd/collectd/commit/999efc28d8e2e96bc15f535254d412a79755ca4f
>
>
> What about the collectd-ovirt plugin?
> --------------------------------------------------------
>
> Some time ago I implemented one out-of-tree collectd plugin leveraging
> the libvirt bulk stats: https://github.com/fromanirh/collectd-ovirt
> This plugin is meant to be a modern, drop-in replacement for the
> existing virt plugin.
> The development of that out-of-tree plugin is now halted, because we
> have everything we need in the upstream collectd plugin.
>
> Future work
> ------------------
>
> We believe we have reached feature parity, so we expect to focus on
> bugfixes and performance tuning in the near term. I'll be happy to
> provide more patches/PRs for that.
>
>
>
> Thanks and bests,
>
> --
> Francesco Romani
> Red Hat Engineering Virtualization R & D
> IRC: fromani
>