On Tue, Oct 11, 2016 at 2:05 PM, Francesco Romani <fromani(a)redhat.com> wrote:
Hi all,
In the last 2.5 days I was exploring if and how we can integrate collectd and Vdsm.
Some comments regarding storage high watermarks only. I will comment later
on other aspects.
The final picture could look like:
1. collectd does all the monitoring and reporting that Vdsm currently does
2. Engine consumes data from collectd
3. Vdsm consumes *notifications* from collectd - for a few but important tasks, like drive
high watermark monitoring
Drive high watermark monitoring is our core business; we cannot outsource
it to collectd.
Vdsm will always monitor high watermarks directly from libvirt.
Benefits (aka: why bother?):
1. less code in Vdsm / long-awaited modularization of Vdsm
2. better integration with the system, reuse of well-known components
3. more flexibility in monitoring/reporting: collectd is an existing special-purpose
solution
4. faster, more scalable operation because all the monitoring can be done in C
If the problem in monitoring is Python, we can have a small and simple
helper doing the monitoring (for storage), like ioprocess.
Setting a threshold and getting a notification when the threshold is reached
sounds like the best design for monitoring drive high watermarks.
But I would like to depend on a component that does *only* this task, and
serves only Vdsm.
3. a libvirt plugin
https://collectd.org/wiki/index.php/Plugin:virt
So, the picture is like
1. we start requiring collectd as dependency of Vdsm
2. we either configure it appropriately (collectd supports config drop-ins:
/etc/collectd.d) or we document our requirements (or both)
3. collectd monitors the hosts and libvirt
4. Engine polls collectd
5. Vdsm listens for notifications
Sounds good
Should libvirt deliver us the event we need (see
https://bugzilla.redhat.com/show_bug.cgi?id=1181659),
we can just stop using collectd notifications, everything else works as previously.
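
To make step 5 a bit more concrete, here is a minimal sketch of how a Vdsm-side listener could split a notification into fields. It assumes the header-plus-message layout that collectd's exec plugin passes to notification handlers (one "Name: value" line per field, a blank line, then the free-form message). The function name and all sample values are hypothetical, not taken from Vdsm or collectd.

```python
def parse_notification(text):
    """Split an exec-style notification into (headers, message).

    Assumption: header lines look like "Name: value" and the first
    blank line separates them from the human-readable message.
    """
    headers = {}
    lines = text.splitlines()
    body_start = len(lines)
    for index, line in enumerate(lines):
        if not line.strip():
            # First blank line: everything after it is the message.
            body_start = index + 1
            break
        key, _, value = line.partition(":")
        headers[key.strip()] = value.strip()
    message = "\n".join(lines[body_start:])
    return headers, message


# Invented sample; real notifications may carry more or different fields.
SAMPLE = (
    "Severity: WARNING\n"
    "Time: 1476187500.000\n"
    "Host: host1.example.com\n"
    "Plugin: virt\n"
    "Type: disk_octets\n"
    "\n"
    "Hypothetical message text for humans"
)

headers, message = parse_notification(SAMPLE)
```

The header fields are easy to handle this way; the message body remains free-form text, which is where parsing gets fragile.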
Challenges:
1. Collectd does NOT consider the plugin API stable
(https://collectd.org/wiki/index.php/Plugin_architecture#The_interface.27s...)
so the plugins should be included in the main tree, much like the modules of the
Linux kernel.
Worth mentioning that the plugin API itself has a good deal of rough edges.
We will need to maintain this plugin ourselves, *and* we need to maintain our thin API
layer, to make sure the plugin loads and works with recent versions of collectd.
2. the virt plugin is out of date and doesn't report some data we need: see
https://github.com/collectd/collectd/issues/1945
3. the notification messages are tailored for human consumption; those messages are not
easy for machines to parse.
4. the threshold support in collectd seems to match values against constants; it doesn't
seem possible to match a value against another one, as we need to do for high watermark
monitoring (capacity vs. allocation).
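
To illustrate challenge 4, here is a small sketch of the difference between matching against a constant and matching one measured value against another. All names and the 0.5 free-space fraction are placeholders chosen for the example; this is not the actual Vdsm logic.

```python
from collections import namedtuple

# Hypothetical container for one sample of a drive, in bytes.
BlockSample = namedtuple("BlockSample", "capacity allocation")

GiB = 1024 ** 3


def constant_threshold_exceeded(sample, limit_bytes):
    """Constant-style check: compare one value against a fixed limit."""
    return sample.allocation >= limit_bytes


def watermark_exceeded(sample, free_fraction=0.5):
    """Relative check: compare allocation against the drive's own capacity.

    Fires when the unallocated space drops below free_fraction of capacity.
    """
    free = sample.capacity - sample.allocation
    return free < sample.capacity * free_fraction


# Two drives with the same allocation but very different capacity.
small = BlockSample(capacity=8 * GiB, allocation=6 * GiB)
large = BlockSample(capacity=64 * GiB, allocation=6 * GiB)

constant_results = [constant_threshold_exceeded(s, 5 * GiB) for s in (small, large)]
relative_results = [watermark_exceeded(s) for s in (small, large)]
```

With a constant limit both drives trigger; with the relative check only the small one does, which is the per-drive behaviour a constant threshold cannot express.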
How I'm addressing, or how I plan to address those challenges (aka action items):
1. I've been experimenting with out-of-tree plugins, and I managed to develop, build,
install and run one out-of-tree plugin:
https://github.com/mojaves/vmon/tree/master/collectd
The development pace of collectd looks sustainable, so this doesn't look like such a
big deal.
Furthermore, we can engage with upstream to merge our plugins, either as-is or to
extend existing ones.
2. Write another collectd plugin based on the Vdsm Python code and/or my past accelerator
executable project (https://github.com/mojaves/vmon)
3. patch the collectd notification code. It is yet another plugin
OR
4. send notifications from the new virt module as per #2, bypassing the threshold system.
This move could prevent the new virt module from being merged into the collectd tree.
Current status of the action items:
1. done BUT PoC quality
2. To be done (more work than #1/possible dupe with github issue)
3. need more investigation, conflicts with #4
4. need more investigation, conflicts with #3
All the code I'm working on can be found at
https://github.com/mojaves/vmon
Comments are appreciated
--
Francesco Romani
RedHat Engineering Virtualization R & D
Phone: 8261328
IRC: fromani