[ovirt-devel] [monitoring][collectd] the collectd virt plugin is now on par with Vdsm needs
Roy Golan
rgolan at redhat.com
Tue Feb 21 17:29:19 UTC 2017
On Tue, Feb 21, 2017 at 6:40 PM Yaniv Bronheim <ybronhei at redhat.com> wrote:
> On Tue, Feb 21, 2017 at 4:21 PM, Michal Skrivanek <
> michal.skrivanek at redhat.com> wrote:
>
>
> > On 21 Feb 2017, at 14:44, Yaniv Kaul <ykaul at redhat.com> wrote:
> >
> >
> >
> > On Tue, Feb 21, 2017 at 1:06 PM Francesco Romani <fromani at redhat.com> wrote:
> > Hello everyone,
> >
> >
> > in the last weeks I've been submitting PRs to collectd upstream, to
> > bring the virt plugin up to date with Vdsm and oVirt needs.
> >
> > Previously, the collectd virt plugin reported only a subset of metrics
> > oVirt uses.
> >
> > In current collectd master, the collectd virt plugin provides all the
> > data Vdsm (thus Engine) needs. This means that it is now
> >
> > possible for Vdsm or Engine to query collectd, not Vdsm/libvirt, and
> > have the same data.
> >
> > Do we wish to ship the unixsock collectd plugin? I'm not sure we do
> > these days (4.1).
> > We can do that later, of course, when we ship this.
>
> we haven’t decided on the actual solution yet, unixsocket is one
> possibility.
> it is tracked in https://trello.com/c/alAOm1tQ
>
> we can also have engine pulling it from collectd remotely, then we can
> eliminate periodic get vm stats
> or another (crazy) option to use fluentd to push data straight to engine’s
> postgres:)
>
>
I think we should start by changing the engine internals: introduce a
client and a service to fetch stats inside the engine. Its implementation
should be flexible, but its API should be stable. At first it will fetch
all stats from the db; later that can change. The engine internals that need
stats should immediately start to adapt: instead of db calls or direct
entity methods, they fetch from a service (probably REST). After that the
door is open to whatever is needed.
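
To make the idea concrete, here is a minimal sketch in Python (the engine
itself is Java, so this only illustrates the shape). All names here
(VmStatsSource, DbStatsSource, VmStatsService) are hypothetical and not
existing engine classes, and a plain dict stands in for the db. Callers
depend only on the service, so the source behind it can be swapped later.

```python
from abc import ABC, abstractmethod
from typing import Dict


class VmStatsSource(ABC):
    """Backend that knows how to obtain VM stats (hypothetical name)."""

    @abstractmethod
    def fetch(self, vm_id: str) -> Dict[str, float]:
        """Return the latest known stats for one VM."""


class DbStatsSource(VmStatsSource):
    """First stage: serve stats that are already stored in the db.

    Assumption: a plain dict stands in for the engine database.
    """

    def __init__(self, rows: Dict[str, Dict[str, float]]):
        self._rows = rows

    def fetch(self, vm_id: str) -> Dict[str, float]:
        # Return a copy so callers cannot modify the stored row.
        return dict(self._rows.get(vm_id, {}))


class VmStatsService:
    """Stable entry point used by the rest of the engine."""

    def __init__(self, source: VmStatsSource):
        self._source = source

    def get_vm_stats(self, vm_id: str) -> Dict[str, float]:
        return self._source.fetch(vm_id)


# Usage: the caller never sees where the numbers come from.
service = VmStatsService(DbStatsSource({"a0": {"memory-rss": 1024.0}}))
print(service.get_vm_stats("a0"))
```

A collectd-backed source could later implement the same fetch() method
without touching the callers.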
>
> why crazy? sounds like where we want to be, without going through vdsm at
> all
>
>
> Thanks,
> michal
>
> > Y.
> >
> >
> >
> > There are only two caveats:
> >
> > 1. it is yet to be seen which version of collectd will ship all those
> > enhancements
> >
> > 2. collectd *intentionally* reports metrics as rates, not as absolute
> > values as Vdsm does. This may be an issue in the presence of restarts or
> > data loss in the link between collectd and the metrics store.
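
A small arithmetic sketch of why caveat 2 matters. This is not collectd's
actual code; the sample values, the fixed 10 second interval and the helper
names are made up for illustration. With absolute counters, a lost sample
costs nothing, because the next sample still carries the running total.
With rates, a lost sample means that interval's increase is gone.

```python
from typing import List, Tuple

# (timestamp in seconds, absolute counter value); made-up numbers
Sample = Tuple[float, int]


def to_rates(samples: List[Sample]) -> List[Tuple[float, float]]:
    """Turn absolute counter samples into (timestamp, units per second)."""
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        rates.append((t1, (v1 - v0) / (t1 - t0)))
    return rates


def total_from_rates(rates: List[Tuple[float, float]], interval: float) -> float:
    """Rebuild the counter increase, assuming one rate per fixed interval."""
    return sum(rate * interval for _, rate in rates)


samples = [(0.0, 1000), (10.0, 1500), (20.0, 2500), (30.0, 2700)]
rates = to_rates(samples)

# All rate samples arrived: the increase matches the counters (2700 - 1000).
full = total_from_rates(rates, 10.0)

# The middle rate sample was lost on the way to the metrics store.
lossy = total_from_rates(rates[:1] + rates[2:], 10.0)

print(full, lossy)
```

Here the store that received every rate sample sees the same increase as the
absolute counters, while the lossy one undercounts by exactly the missing
interval.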
> >
> >
> > Please keep reading for more details:
> >
> >
> > How to get the code?
> >
> > --------------------------------
> >
> > This is somewhat tricky until we get an official release. If one is
> > familiar with the RPM build process, it is easy to build custom packages
> > from a snapshot of collectd master
> > (https://github.com/collectd/collectd) and a recent 5.7.1 RPM (like
> > https://koji.fedoraproject.org/koji/buildinfo?buildID=835669)
> >
> >
> > How to configure it?
> >
> > ------------------------------
> >
> > Most things work out of the box. One Vdsm patch currently in progress
> > ships the recommended configuration:
> > https://gerrit.ovirt.org/#/c/71176/6/static/etc/collectd.d/virt.conf
> >
> > The meaning of the configuration options is documented in man 5
> > collectd.conf
> >
> >
> > What does it look like?
> >
> > --------------------------
> >
> >
> > Let me post one "screenshot" :)
> >
> >
> >
> > $ collectdctl listval | grep a0
> > a0/virt/disk_octets-hdc
> > a0/virt/disk_octets-vda
> > a0/virt/disk_ops-hdc
> > a0/virt/disk_ops-vda
> > a0/virt/disk_time-hdc
> > a0/virt/disk_time-vda
> > a0/virt/if_dropped-vnet0
> > a0/virt/if_errors-vnet0
> > a0/virt/if_octets-vnet0
> > a0/virt/if_packets-vnet0
> > a0/virt/memory-actual_balloon
> > a0/virt/memory-rss
> > a0/virt/memory-total
> > a0/virt/ps_cputime
> > a0/virt/total_requests-flush-hdc
> > a0/virt/total_requests-flush-vda
> > a0/virt/total_time_in_ms-flush-hdc
> > a0/virt/total_time_in_ms-flush-vda
> > a0/virt/virt_cpu_total
> > a0/virt/virt_vcpu-0
> > a0/virt/virt_vcpu-1
> >
> >
> > How to consume the data?
> > -----------------------------------------
> >
> > Among the ways to query collectd, the two most popular (and most fitting
> > for oVirt use case) ways are perhaps the network protocol
> > (https://collectd.org/wiki/index.php/Binary_protocol)
> > and the plain text protocol
> > (https://collectd.org/wiki/index.php/Plain_text_protocol). The first
> > could be used by Engine to get the data directly, or to consolidate the
> > metrics in one database (e.g to run any kind of query, for historical
> > series...).
> > The latter will be used by Vdsm to keep reporting the metrics (again
> > https://gerrit.ovirt.org/#/c/71176/6)
> >
> > Please note that the performance of the plain text protocol is known to
> > be lower than that of the binary protocol.
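
For reference, a minimal sketch of a plain text protocol client in Python,
based on my reading of the wiki page linked above: a reply starts with a
status line whose leading number is the count of lines that follow (negative
on error), LISTVAL lines are "<timestamp> <identifier>" and GETVAL lines are
"<name>=<value>". The socket path is whatever the unixsock plugin is
configured with, so it is a parameter here. The canned replies at the end
are hand-written for illustration, not real output.

```python
import socket
from typing import Dict, List, Tuple


def parse_status(line: str) -> Tuple[int, str]:
    """Split a status line such as '2 Values found' into (2, 'Values found')."""
    code, _, message = line.partition(" ")
    return int(code), message


def parse_listval(lines: List[str]) -> List[Tuple[float, str]]:
    """Parse a LISTVAL reply into (timestamp, identifier) pairs."""
    count, message = parse_status(lines[0])
    if count < 0:
        raise RuntimeError(message)
    result = []
    for line in lines[1:1 + count]:
        timestamp, _, identifier = line.partition(" ")
        result.append((float(timestamp), identifier))
    return result


def parse_getval(lines: List[str]) -> Dict[str, float]:
    """Parse a GETVAL reply into a {data source name: value} mapping."""
    count, message = parse_status(lines[0])
    if count < 0:
        raise RuntimeError(message)
    values = {}
    for line in lines[1:1 + count]:
        name, _, value = line.partition("=")
        values[name] = float(value)
    return values


def send_command(socket_path: str, command: str) -> List[str]:
    """Send one command over the UNIX socket and return the reply lines."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall((command + "\n").encode("ascii"))
        with sock.makefile("r", encoding="ascii") as reader:
            status = reader.readline().rstrip("\n")
            count, _ = parse_status(status)
            lines = [status]
            # A negative count is an error: no further lines follow.
            for _ in range(max(count, 0)):
                lines.append(reader.readline().rstrip("\n"))
            return lines


# Hand-written replies, so the parsers can be tried without a running collectd.
listval_reply = [
    "2 Values found",
    "1487696400.000 a0/virt/memory-rss",
    "1487696400.000 a0/virt/virt_vcpu-0",
]
getval_reply = ["1 Value found", "value=1.5e+03"]

print(parse_listval(listval_reply))
print(parse_getval(getval_reply))
```

With a running collectd and the unixsock plugin loaded, the same parsers
would be fed from send_command(path, "LISTVAL") or
send_command(path, 'GETVAL "a0/virt/memory-rss"').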
> >
> > What about the unresponsive hosts?
> > -------------------------------------------------------
> >
> > We know from experience that hosts may become unresponsive, and this can
> > disrupt monitoring. However, we do want to keep monitoring the
> > responsive hosts, so that one rogue host does not make us lose all the
> > monitoring data.
> > To cope with this need, the virt plugin gained support for a "partition
> > tag". With this, we can group VMs together using one arbitrary tag. This
> > is completely transparent to collectd, and also completely optional.
> > oVirt can use this tag to group VMs per-storage-domain, or however it
> > sees fit, trying to minimize the disruption should one host become
> > unresponsive.
> >
> > Read the full docs here:
> >
> > https://github.com/collectd/collectd/commit/999efc28d8e2e96bc15f535254d412a79755ca4f
> >
> >
> > What about the collectd-ovirt plugin?
> > --------------------------------------------------------
> >
> > Some time ago I implemented one out-of-tree collectd plugin leveraging
> > the libvirt bulk stats: https://github.com/fromanirh/collectd-ovirt
> > This plugin is meant to be a modern, drop-in replacement for the
> > existing virt plugin.
> > The development of that out of tree plugin is now halted, because we
> > have everything we need in the upstream collectd plugin.
> >
> > Future work
> > ------------------
> >
> > We believe we have reached feature parity, so we are looking at
> > bugfixes and performance tuning in the near term. I'll be happy to
> > provide more patches/PRs for that.
> >
> >
> >
> > Thanks and bests,
> >
> > --
> > Francesco Romani
> > Red Hat Engineering Virtualization R & D
> > IRC: fromani
> >
> > _______________________________________________
> > Devel mailing list
> > Devel at ovirt.org
> > http://lists.ovirt.org/mailman/listinfo/devel
>
>
>
>
>
> --
> *Yaniv Bronhaim.*