Hi,
I wrote a blog post on how to monitor your oVirt datacenter with
Prometheus [1].
This way you get real-time metrics and can use the powerful
Prometheus query language, the Prometheus Alertmanager, and visualization
tools like Grafana.
Tasks like finding underutilized hosts, finding misbehaving VMs, or even
comparing the resource usage profiles of whole clusters over days become
just a matter of writing the right Prometheus query. On top of that, the
Prometheus Alertmanager takes care of alerting.
An example, which is also part of the post [1]: visualizing the summed
CPU usage of all VMs per host is as simple as
sum(vm_cpu_user) by (host)
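To pull the same numbers out of Prometheus from a script, the query can be sent to the Prometheus HTTP API (/api/v1/query). Below is a minimal standard-library Python sketch. It assumes a Prometheus server at http://localhost:9090 and assumes vm_cpu_user carries a "host" label, as the query above implies; build_query_url, parse_vector and cpu_user_by_host are hypothetical helper names, not part of vdsm-prometheus.

```python
import json
import urllib.parse
import urllib.request

# Assumed Prometheus address (hypothetical); adjust to your setup.
PROMETHEUS_URL = "http://localhost:9090"
# The query from the post: summed VM CPU usage per host.
QUERY = "sum(vm_cpu_user) by (host)"


def build_query_url(base_url: str, query: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": query})


def parse_vector(response: dict) -> dict:
    """Map each 'host' label to its value from an instant-vector response."""
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response.get('error')}")
    result = {}
    for series in response["data"]["result"]:
        # Each sample is [unix_timestamp, "value as a string"].
        _, value = series["value"]
        result[series["metric"].get("host", "")] = float(value)
    return result


def cpu_user_by_host(base_url: str = PROMETHEUS_URL) -> dict:
    """Run the query against a live server and return {host: summed CPU}."""
    with urllib.request.urlopen(build_query_url(base_url, QUERY), timeout=10) as resp:
        return parse_vector(json.load(resp))
```

Calling cpu_user_by_host() against a running server returns a dict keyed by host name. Keeping the parsing separate from the HTTP call makes it easy to check against a canned response without a server.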
To close the gap between oVirt Engine and Prometheus, I had to create two
services: vdsm-prometheus (GitHub [2], Copr [3]) and
ovirt-prometheus-bridge (GitHub [4], Docker Hub [5]).
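The post [1] describes the actual design. As a rough illustration only, one common way to feed a changing host list into Prometheus is file-based service discovery (file_sd_configs), where Prometheus watches a JSON file of scrape targets. The sketch below is an assumption about the general approach, not ovirt-prometheus-bridge's code: the host list is passed in rather than fetched from oVirt Engine, and file_sd_targets, write_file_sd and the port argument are hypothetical (use whatever port vdsm-prometheus actually listens on).

```python
import json


def file_sd_targets(hosts, port, labels=None):
    """Build a Prometheus file_sd target list, one entry per oVirt host.

    Hypothetical helper: 'port' is wherever vdsm-prometheus listens on
    each host; 'labels' are extra labels attached to every target.
    """
    return [
        # A fresh dict per entry so callers can mutate one without the others.
        {"targets": [f"{host}:{port}"], "labels": dict(labels or {})}
        for host in sorted(hosts)
    ]


def write_file_sd(path, hosts, port, labels=None):
    """Write the target list as JSON for a file_sd_configs entry to watch."""
    with open(path, "w") as f:
        json.dump(file_sd_targets(hosts, port, labels), f, indent=2)
```

Pointing a file_sd_configs entry in prometheus.yml at the written file lets Prometheus pick up added or removed hosts without a restart.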
There is also an Ansible role [6] to roll out vdsm-prometheus to all
oVirt 3.5 or 3.6 hosts.
So make sure to check out the post [1] and let me know if you find this
useful.
Best Regards,
Roman
[1] http://rmohr.github.io/virtualization/2016/04/12/monitor-your-ovirt-datac...
[2] https://github.com/rmohr/vdsm-prometheus
[3] https://copr.fedorainfracloud.org/coprs/rfenkhuber/vdsm-prometheus/
[4] https://github.com/rmohr/ovirt-prometheus-bridge
[5] https://hub.docker.com/r/rmohr/ovirt-prometheus-bridge/
[6] https://galaxy.ansible.com/rmohr/vdsm-prometheus/