I use different collection schedules depending on the importance of the data I'm collecting. You can adjust them as needed.

For your IO issues, you can simply poll your machine's IO load statistics from the 1/5/15-minute averages. That will not give you precise intervals, but it will certainly tell you if something is going wrong.
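As a rough illustration of polling those averages, here is a minimal Python sketch, assuming a Linux host where /proc/loadavg is available. The function names and the 1.5 threshold are made up for the example and are not anything Zabbix provides. On Linux the load average also counts tasks blocked on disk IO, which is why it can hint at IO trouble.

```python
import os


def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into the three load averages."""
    fields = text.split()
    # The first three fields are the 1-, 5- and 15-minute load averages.
    one, five, fifteen = (float(f) for f in fields[:3])
    return {"avg1": one, "avg5": five, "avg15": fifteen}


def load_is_rising(load, ratio=1.5):
    """True when the 1-minute average exceeds the 15-minute one by `ratio`.

    The ratio is an arbitrary example threshold, not a recommended value.
    """
    return load["avg1"] > ratio * load["avg15"]


if __name__ == "__main__":
    # os.getloadavg() returns the same three numbers without parsing a file.
    one, five, fifteen = os.getloadavg()
    current = {"avg1": one, "avg5": five, "avg15": fifteen}
    print(current, "rising" if load_is_rising(current) else "steady")
```

This only tells you that load is building up, not which VM or disk is responsible.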

To be honest, Zabbix is flexible enough to get you what you need even if you're not monitoring the metric directly. Anything you can do to raise system visibility is good stuff!

Enjoy!

On 2024-02-13 14:32, Jorge Visentini wrote:
Hi.

I also use Zabbix here. Its problem is that it does not collect metrics in real time; that is not what it is designed for.
There are other alternatives like Elasticsearch + Metricbeat, but from what I've tested, that stack is very heavy and uses a lot of disk space lol.

I have never used Prometheus, but it looks interesting. I'll do some tests.

@Patrick Dubois How often do you collect information with Zabbix? Every 1 minute?
I ask because, for the data to be useful for analysis, the IO load has to last at least one continuous minute for Zabbix to capture it correctly.
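The concern about short bursts applies mostly to instantaneous readings. If the collected item is a cumulative counter (as the kernel's disk statistics are), the difference between two samples still accounts for everything that happened in between, only averaged over the interval. A minimal Python sketch of that arithmetic, with made-up numbers (Zabbix's "Change per second" preprocessing step computes the same kind of delta):

```python
def average_rate(prev_value, prev_time, curr_value, curr_time):
    """Average per-second rate of a cumulative counter between two samples."""
    elapsed = curr_time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be taken at increasing times")
    return (curr_value - prev_value) / elapsed


MIB = 1024 * 1024

# Hypothetical case: a VM writes 600 MiB in a 10-second burst, then idles.
# The counter is sampled once at t=0 s and once at t=60 s.
bytes_at_start = 0
bytes_at_end = 600 * MIB

rate = average_rate(bytes_at_start, 0.0, bytes_at_end, 60.0)
print(rate / MIB)  # prints 10.0 (MiB/s averaged over the minute)
```

The burst is not lost, but its 60 MiB/s peak is flattened to 10 MiB/s, so a shorter interval is still needed to see peaks.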

Cheers!

On Tue, Feb 13, 2024 at 13:44, Patrick Dubois via Users <users@ovirt.org> wrote:
For detailed monitoring I use Zabbix.  This way I get detailed metrics
on my hypervisors, VMs as well as my network storage.

If a machine starts generating large IO I get alerts highlighting the
responsible machine as well as the impacted services.  For example,  you
might get high IO on a VM but also the correlated high latency on
systems sharing the storage.

Sometimes users will report the high latency, masking the real
problem so it's nice to have a holistic view of the entire environment.

Patrick.Dubois

On 2024-02-13 11:19, marek wrote:
> hi,
>
> i have prometheus based ovirt hosts monitoring (node_exporter,
> smartctl_exporter, ipmi_exporter)
>
> https://prometheus-community.github.io/ansible/branch/main/ and alerts
> from https://samber.github.io/awesome-prometheus-alerts/
>
> after i started this monitoring  i found that one VM is overloading
> local storage (so i must check IO limiting documentation as a homework
> :) )
>
> but my question is
>
> how do you monitor IO traffic per VM? (IOPS, read/write traffic,..)
>
> some qemu/libvirt exporter? some custom text file + node_exporter?
>
> thanks for tips
>
> Marek
> _______________________________________________
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-leave@ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/6HVHFX464QJPJTVXUFCF7RAGAUFD33HE/
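On the "custom text file + node_exporter" idea in the question above, here is a minimal Python sketch of one way it could look. It assumes per-disk counters are available as `device key value` lines, in the shape `virsh domblkstat <domain> <device>` prints. The sample text, the VM name `web01` and the `vm_block_` metric prefix are all made up for illustration and are not an existing exporter's naming.

```python
def parse_domblkstat(text):
    """Parse 'device key value' lines into a dict of integer counters."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        key, value = parts[-2], parts[-1]
        try:
            stats[key] = int(value)
        except ValueError:
            continue  # skip anything that is not a plain integer counter
    return stats


def to_prom_lines(vm_name, stats, prefix="vm_block"):
    """Render counters in the Prometheus text exposition format.

    Assumes vm_name contains no quotes or backslashes (no label escaping).
    """
    lines = [f'{prefix}_{key}{{vm="{vm_name}"}} {stats[key]}'
             for key in sorted(stats)]
    return "\n".join(lines) + "\n"


# Hypothetical sample in the assumed 'device key value' shape.
SAMPLE = """\
vda rd_req 1200
vda rd_bytes 52428800
vda wr_req 800
vda wr_bytes 10485760
"""

if __name__ == "__main__":
    print(to_prom_lines("web01", parse_domblkstat(SAMPLE)), end="")
```

The rendered text would go into a `.prom` file in the directory given to node_exporter's `--collector.textfile.directory` flag, ideally written to a temporary file and renamed so a scrape never sees a half-written file.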


--
Regards,
Jorge Visentini
+55 55 98432-9868