On Mon, May 7, 2018 at 6:48 PM Chris Adams <cma(a)cmadams.net> wrote:
> I have a problem with a memory leak in vdsm. I have a dev cluster that
> right now is:
> - two nodes
> - CentOS 7.4 (up to date)
> - oVirt 4.2.2 (installed as 3.5 and upgraded version by version)
> - hosted engine (no other running VM at the moment)
> - iSCSI storage
> I have a script that writes the vdsm RSS to a file every five minutes,
> and on the node holding the hosted engine, vdsm RSS grows around
> 300-1500KB every snapshot.
> I maintain several oVirt clusters for others, and they all seem to have
> this problem. The production clusters are all still on oVirt 4.1, but
> they all have this problem too, so I guess it is something about how I
> set them up? On a couple I just checked, the vdsm RSS is over 1G.
> Any tips on instrumenting vdsm to track this down? I am unfortunately
> only passingly familiar with python (I can make small changes, but not
> knowledgeable enough to figure this out).
To debug these issues, you should enable the health monitor by creating
a drop-in configuration file:
$ cat /etc/vdsm/vdsm.conf.d/health.conf
[devel]
health_monitor_enable = true
Then restart vdsm to start the health monitor.
The health monitor logs at DEBUG level, so you need to enable
DEBUG level for the "health" logger. You can do this with:
$ vdsm-client Host setLogLevel level=DEBUG name=health
Or by adding a new logger configuration to /etc/vdsm/logger.conf:
1. Add the health logger to the keys in [loggers]:
[loggers]
keys=root,vds,storage,virt,ovirt_hosted_engine_ha,ovirt_hosted_engine_ha_config,IOProcess,devel,health
2. Add a [logger_health] section:
[logger_health]
level=DEBUG
handlers=logthread
qualname=health
propagate=0
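If you edit logger.conf by hand, it may be worth checking that both changes landed before looking for health output. The following is only a rough sketch using Python's standard configparser; the function name health_logger_problems is made up for illustration and is not part of vdsm.

```python
import configparser


def health_logger_problems(conf_text):
    """Return a list of problems with the "health" logger setup (empty if none).

    Hypothetical helper for illustration only; not part of vdsm.
    """
    # interpolation=None: logging config files usually contain raw "%(...)s"
    # format strings, which the default interpolation would try to expand.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)

    problems = []

    # Step 1: "health" must be listed in the keys of [loggers].
    raw_keys = parser.get("loggers", "keys", fallback="")
    keys = [key.strip() for key in raw_keys.split(",")]
    if "health" not in keys:
        problems.append("'health' is missing from keys in [loggers]")

    # Step 2: a [logger_health] section must exist with the expected values.
    if not parser.has_section("logger_health"):
        problems.append("[logger_health] section is missing")
    else:
        level = parser.get("logger_health", "level", fallback="")
        if level.strip().upper() != "DEBUG":
            problems.append("level in [logger_health] is not DEBUG")
        qualname = parser.get("logger_health", "qualname", fallback="")
        if qualname.strip() != "health":
            problems.append("qualname in [logger_health] is not 'health'")

    return problems


# Possible usage on a host:
#   with open("/etc/vdsm/logger.conf") as f:
#       print(health_logger_problems(f.read()) or "health logger looks configured")
```

An empty list means both steps above are in place; this only checks the file contents, not whether the running vdsm has picked them up.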
Finally, post the [health] logs from vdsm.log here.
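To pull only those lines out of a large log, something like the sketch below could be used. It assumes the health entries can be recognized by the literal text "[health]" and that the log is at /var/log/vdsm/vdsm.log; both are assumptions to adjust for your hosts, and the function names are made up for illustration.

```python
def health_lines(lines, marker="[health]"):
    """Yield each line that contains the marker, without its trailing newline.

    The "[health]" marker is an assumption about how the entries are tagged.
    """
    for line in lines:
        if marker in line:
            yield line.rstrip("\n")


def collect_health_log(path="/var/log/vdsm/vdsm.log"):
    """Return all health lines from the given log file (path is an assumption)."""
    # errors="replace" keeps the scan going if the log has undecodable bytes.
    with open(path, errors="replace") as log:
        return list(health_lines(log))


# Possible usage on a host:
#   for line in collect_health_log():
#       print(line)
```

Rotated logs would need to be scanned separately; this reads a single uncompressed file.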
Nir