[ovirt-devel] [monitoring][collectd] the collectd virt plugin is now on par with Vdsm needs

Yaniv Bronheim ybronhei at redhat.com
Tue Feb 21 16:40:10 UTC 2017


On Tue, Feb 21, 2017 at 4:21 PM, Michal Skrivanek <
michal.skrivanek at redhat.com> wrote:

>
> > On 21 Feb 2017, at 14:44, Yaniv Kaul <ykaul at redhat.com> wrote:
> >
> >
> >
> > On Tue, Feb 21, 2017 at 1:06 PM Francesco Romani <fromani at redhat.com> wrote:
> > Hello everyone,
> >
> >
> > in the last weeks I've been submitting PRs to collectd upstream, to
> > bring the virt plugin up to date with Vdsm and oVirt needs.
> >
> > Previously, the collectd virt plugin reported only a subset of metrics
> > oVirt uses.
> >
> > In current collectd master, the collectd virt plugin provides all the
> > data Vdsm (thus Engine) needs. This means that it is now possible for
> > Vdsm or Engine to query collectd, not Vdsm/libvirt, and have the same
> > data.
> >
> > Do we wish to ship the unixsock collectd plugin? I'm not sure we do
> > these days (4.1).
> > We can do that later, of course, when we ship this.
>
> we haven’t decided on the actual solution yet, unixsocket is one
> possibility.
> it is tracked in https://trello.com/c/alAOm1tQ
>
> we can also have engine pulling it from collectd remotely, then we can
> eliminate periodic get vm stats
> or another (crazy) option to use fluentd to push data straight to engine’s
> postgres:)
>

why crazy? sounds like where we want to be, without going through vdsm at
all

>
> Thanks,
> michal
>
> > Y.
> >
> >
> >
> > There are only two caveats:
> >
> > 1. it is yet to be seen which version of collectd will ship all those
> > enhancements
> >
> > 2. collectd *intentionally* reports metrics as rates, not as absolute
> > values as Vdsm does. This may be an issue in the presence of
> > restarts/data loss in the link between collectd and the metrics store.
> >
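To make caveat 2 concrete, here is a minimal Python sketch with made-up numbers (the sample values and the helper names are illustrative assumptions, not taken from collectd or Vdsm). It shows that a total rebuilt from rates silently undercounts when one rate sample is lost, while absolute counters still give the right total from the first and last sample alone.

```python
def rates_from_counter(samples):
    """Turn (timestamp, absolute_counter) samples into (timestamp, rate) points."""
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        rates.append((t1, (v1 - v0) / (t1 - t0)))
    return rates


def total_from_rates(rates, interval):
    """Rebuild a total by integrating rates over a fixed sampling interval."""
    return sum(rate * interval for _, rate in rates)


# Hypothetical absolute counter (e.g. bytes written), sampled every 10 seconds.
samples = [(0, 0), (10, 1000), (20, 3000), (30, 3500)]
rates = rates_from_counter(samples)

# With every rate sample delivered, the rebuilt total matches the counter.
full_total = total_from_rates(rates, 10)

# If the sample at t=20 never reaches the metrics store, its bytes are gone.
lossy_total = total_from_rates([r for r in rates if r[0] != 20], 10)

# With absolute values, losing the middle samples does not matter.
absolute_total = samples[-1][1] - samples[0][1]
```

Here `full_total` and `absolute_total` agree, while `lossy_total` misses everything that happened in the dropped interval.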
> >
> > Please keep reading for more details:
> >
> >
> > How to get the code?
> >
> > --------------------------------
> >
> > This is somewhat tricky until we get an official release. If you are
> > familiar with the RPM build process, it is easy to build custom
> > packages from a snapshot of collectd master
> > (https://github.com/collectd/collectd) and a recent 5.7.1 RPM (like
> > https://koji.fedoraproject.org/koji/buildinfo?buildID=835669).
> >
> >
> > How to configure it?
> >
> > ------------------------------
> >
> > Most things work out of the box. A Vdsm patch currently in progress
> > ships the recommended configuration:
> > https://gerrit.ovirt.org/#/c/71176/6/static/etc/collectd.d/virt.conf
> >
> > The meaning of the configuration options is documented in man 5
> > collectd.conf.
> >
> >
> > What does it look like?
> >
> > --------------------------
> >
> >
> > Let me post one "screenshot" :)
> >
> >
> >
> >   $ collectdctl listval | grep a0
> >   a0/virt/disk_octets-hdc
> >   a0/virt/disk_octets-vda
> >   a0/virt/disk_ops-hdc
> >   a0/virt/disk_ops-vda
> >   a0/virt/disk_time-hdc
> >   a0/virt/disk_time-vda
> >   a0/virt/if_dropped-vnet0
> >   a0/virt/if_errors-vnet0
> >   a0/virt/if_octets-vnet0
> >   a0/virt/if_packets-vnet0
> >   a0/virt/memory-actual_balloon
> >   a0/virt/memory-rss
> >   a0/virt/memory-total
> >   a0/virt/ps_cputime
> >   a0/virt/total_requests-flush-hdc
> >   a0/virt/total_requests-flush-vda
> >   a0/virt/total_time_in_ms-flush-hdc
> >   a0/virt/total_time_in_ms-flush-vda
> >   a0/virt/virt_cpu_total
> >   a0/virt/virt_vcpu-0
> >   a0/virt/virt_vcpu-1
> >
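For anyone scripting against that listing: collectd identifiers follow the host/plugin[-plugin_instance]/type[-type_instance] layout, so they can be split as in the rough Python sketch below. The `Identifier` tuple and `parse_identifier` are names invented for this example, and the sketch assumes that plugin and type names contain no dash, which holds for the output above.

```python
from collections import namedtuple

# Hypothetical container for the parts of a collectd value identifier.
Identifier = namedtuple(
    "Identifier", "host plugin plugin_instance type type_instance")


def parse_identifier(text):
    """Split 'host/plugin[-instance]/type[-instance]' into its components."""
    host, plugin_part, type_part = text.split("/", 2)
    # Only the first dash separates the name from the instance, so a
    # type instance such as 'flush-hdc' is kept whole.
    plugin, _, plugin_instance = plugin_part.partition("-")
    type_name, _, type_instance = type_part.partition("-")
    return Identifier(host, plugin, plugin_instance, type_name, type_instance)


disk = parse_identifier("a0/virt/disk_octets-vda")
flush = parse_identifier("a0/virt/total_requests-flush-hdc")
cpu = parse_identifier("a0/virt/virt_cpu_total")
```

In the virt plugin output above the host field carries the VM name (`a0`), and the type instance carries the device (`vda`, `vnet0`) or the vCPU number.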
> >
> > How to consume the data?
> > -----------------------------------------
> >
> > Among the ways to query collectd, the two most popular (and most
> > fitting for the oVirt use case) are perhaps the network protocol
> > (https://collectd.org/wiki/index.php/Binary_protocol)
> > and the plain text protocol
> > (https://collectd.org/wiki/index.php/Plain_text_protocol). The first
> > could be used by Engine to get the data directly, or to consolidate the
> > metrics in one database (e.g. to run any kind of query, for historical
> > series...).
> > The latter will be used by Vdsm to keep reporting the metrics (again
> > https://gerrit.ovirt.org/#/c/71176/6).
> >
> > Please note that the performance of the plain text protocol is known to
> > be lower than that of the binary protocol.
> >
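As an illustration of the plain text protocol, here is a minimal Python sketch of a GETVAL client. It is not the code from the Vdsm patch. It assumes the unixsock plugin is loaded and listening on /var/run/collectd-unixsock (adjust to your SocketFile setting); `read_reply`, `parse_values` and `getval` are names made up for this example. A reply starts with a "<status> <message>" line: a non-negative status is the number of lines that follow, a negative one reports an error.

```python
import io
import socket


def read_reply(stream):
    """Read one plain text protocol reply from a text stream."""
    status_line = stream.readline().rstrip("\n")
    status, _, message = status_line.partition(" ")
    count = int(status)
    if count < 0:
        # Negative status: the rest of the line is the error message.
        raise RuntimeError(message)
    return [stream.readline().rstrip("\n") for _ in range(count)]


def parse_values(lines):
    """Convert 'name=value' payload lines of a GETVAL reply into floats."""
    values = {}
    for line in lines:
        name, _, value = line.partition("=")
        values[name] = float(value)
    return values


def getval(identifier, path="/var/run/collectd-unixsock"):
    """Query one identifier; the default socket path is an assumption."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        with sock.makefile("rw", encoding="utf-8", newline="\n") as stream:
            stream.write('GETVAL "%s"\n' % identifier)
            stream.flush()
            return parse_values(read_reply(stream))


# Offline check of the parsing, using a canned reply instead of a socket.
canned = io.StringIO("2 Values found\nrx=1.5e+03\ntx=2.5e+02\n")
example = parse_values(read_reply(canned))
```

On a host running collectd, something like `getval("a0/virt/if_octets-vnet0")` would then return the current values for that interface, keyed by data source name.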
> > What about the unresponsive hosts?
> > -------------------------------------------------------
> >
> > We know from experience that hosts may become unresponsive, and this can
> > disrupt monitoring. However, we do want to keep monitoring the
> > responsive hosts, so that one rogue host does not make us lose all the
> > monitoring data.
> > To cope with this need, the virt plugin gained support for a "partition
> > tag". With this, we can group VMs together using one arbitrary tag. This
> > is completely transparent to collectd, and also completely optional.
> > oVirt can use this tag to group VMs per-storage-domain, or however it
> > sees fit, trying to minimize the disruption should one host become
> > unresponsive.
> >
> > Read the full docs here:
> > https://github.com/collectd/collectd/commit/999efc28d8e2e96bc15f535254d412a79755ca4f
> >
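The grouping idea can be sketched in a few lines of Python. This only illustrates choosing a tag per VM on the oVirt side, not how the virt plugin consumes the tag (see the commit above for that). The VM records, the `storage_domain` key and `group_by_tag` are all hypothetical.

```python
from collections import defaultdict


def group_by_tag(vms, tag_of):
    """Group VM records by the tag that tag_of() computes for each of them."""
    groups = defaultdict(list)
    for vm in vms:
        groups[tag_of(vm)].append(vm["name"])
    return dict(groups)


# Hypothetical inventory: which storage domain backs each VM.
vms = [
    {"name": "a0", "storage_domain": "sd-nfs"},
    {"name": "a1", "storage_domain": "sd-iscsi"},
    {"name": "a2", "storage_domain": "sd-nfs"},
]

# One tag per storage domain, so trouble on one domain stays in one group.
groups = group_by_tag(vms, lambda vm: vm["storage_domain"])
```

Any other grouping key works the same way, since the tag is arbitrary.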
> >
> > What about the collectd-ovirt plugin?
> > --------------------------------------------------------
> >
> > Some time ago I implemented one out-of-tree collectd plugin leveraging
> > the libvirt bulk stats: https://github.com/fromanirh/collectd-ovirt
> > This plugin is meant to be a modern, drop-in replacement for the
> > existing virt plugin.
> > The development of that out-of-tree plugin is now halted, because we
> > have everything we need in the upstream collectd plugin.
> >
> > Future work
> > ------------------
> >
> > We believe we have reached feature parity, so we are looking at
> > bugfixes/performance tuning in the near term. I'll be happy to
> > provide more patches/PRs about that.
> >
> >
> >
> > Thanks and bests,
> >
> > --
> > Francesco Romani
> > Red Hat Engineering Virtualization R & D
> > IRC: fromani
> >
> > _______________________________________________
> > Devel mailing list
> > Devel at ovirt.org
> > http://lists.ovirt.org/mailman/listinfo/devel



-- 
*Yaniv Bronhaim.*