Hi,

I wrote a blog post on how to monitor your oVirt datacenter with Prometheus [1].

This way you get access to real-time metrics and can use the powerful Prometheus query language, the Prometheus Alertmanager and visualization tools like Grafana.

Tasks like finding underutilized hosts, finding misbehaving VMs, or even comparing the resource usage profiles of a whole cluster over several days become only a matter of writing the right Prometheus query. Further, you can use the Prometheus Alertmanager for alerting.

An example, which is also part of the post [1]: visualizing the sum of the CPU usage of all VMs for every host is as simple as

sum(vm_cpu_user) by (host)
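
If you would rather fetch these numbers from a script than look at them in a graph, the same query can be sent to the Prometheus HTTP API. The following is only a rough sketch and not part of the post: it assumes a Prometheus server at http://localhost:9090 that already scrapes the hosts, and the function names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a Prometheus server reachable at this address (not from the post).
PROMETHEUS_URL = "http://localhost:9090"


def build_query_url(base_url, promql):
    """Return the instant-query URL of the Prometheus HTTP API for a PromQL expression."""
    query_string = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + query_string


def parse_vector(payload):
    """Map each 'host' label to its sample value from an instant-vector response."""
    if payload.get("status") != "success":
        raise ValueError("query failed: %s" % payload.get("error"))
    return {
        sample["metric"].get("host", ""): float(sample["value"][1])
        for sample in payload["data"]["result"]
    }


def cpu_usage_by_host(base_url=PROMETHEUS_URL):
    """Run sum(vm_cpu_user) by (host) against a live server (needs network access)."""
    url = build_query_url(base_url, "sum(vm_cpu_user) by (host)")
    with urllib.request.urlopen(url, timeout=10) as response:
        return parse_vector(json.load(response))
```

Calling cpu_usage_by_host() should then give you a dict that maps each host label to its summed VM CPU usage, provided the server is reachable and the metric is being scraped.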

To close the gap between oVirt Engine and Prometheus I had to create two services: vdsm-prometheus (GitHub [2], Copr [3]) and ovirt-prometheus-bridge (GitHub [4], Docker Hub [5]).

An Ansible role [6] to roll out vdsm-prometheus to all oVirt 3.5 or 3.6 hosts exists too.

So make sure to check out the post [1] and let me know if you find this useful.

Best Regards,

Roman

[1] http://rmohr.github.io/virtualization/2016/04/12/monitor-your-ovirt-datacenter-with-prometheus
[2] https://github.com/rmohr/vdsm-prometheus
[3] https://copr.fedorainfracloud.org/coprs/rfenkhuber/vdsm-prometheus/
[4] https://github.com/rmohr/ovirt-prometheus-bridge
[5] https://hub.docker.com/r/rmohr/ovirt-prometheus-bridge/
[6] https://galaxy.ansible.com/rmohr/vdsm-prometheus/