
On Fri, Jan 15, 2016 at 11:22 AM, Francesco Romani <fromani@redhat.com> wrote:
----- Original Message -----
From: "Nir Soffer" <nsoffer@redhat.com> To: "devel" <devel@ovirt.org>, "Francesco Romani" <fromani@redhat.com> Sent: Friday, January 15, 2016 9:52:13 AM Subject: Re: [VDSM] health: Introduce Vdsm health monitoring
I tested with https://gerrit.ovirt.org/#/c/51630
libvirt.virDomain object does not leak any more, but pthreading object are still leaking, and the number of uncollectible objects is still very high (12,915).
Each time we start and stop a vm, we leak about 6000 objects which are part of a reference cycle, and each time we create and leak 27 pthreading locks/conditions which are probably the reason for this leak (because they implement __del__ or reference object implementing __del__.
*Surely* this deserve more investigation, *most likley* there is more to be fixed, and *surely* I'll invest more energy in research, but I think we should also apply some grains of salt to the raw results gc gives us. I've been running a test overnight like
- start 20 vms - sleep 2m - stop 20 vms - sleep 30s
repeated 100 times, monitoring VmRSS (/proc/$VDSM_PID/status) and I didn't observed any _significant_ leak. The VmRSS growth was pretty low, almost flat. So something doesn't click here.
I mean, 6000 objects (at least) per vm each time *should* be noticeable after all!
Once 51630 is in good shape I'll broad the net and go hunt for more virt leaks.
After removing the gc.DEBUG_SAVEALL, we see now only the libvirt.virDomain leak. Turns out that the pthreading objects leak was caused by saving all the unreachable objects when using gc.DEBUG_SAVEALL. Here is a run with https://gerrit.ovirt.org/51708, but without https://gerrit.ovirt.org/51630 # tail -f /var/log/vdsm/vdsm.log | grep health MainThread::INFO::2016-01-15 16:14:14,789::health::58::health::(start) Starting health monitor (interval=60) Thread-15::DEBUG::2016-01-15 16:14:14,793::health::70::health::(_run) Health monitor started Thread-15::DEBUG::2016-01-15 16:15:14,794::health::84::health::(_check) Checking health Thread-15::DEBUG::2016-01-15 16:15:14,815::health::86::health::(_check) Collected 54 objects activating host... Thread-15::DEBUG::2016-01-15 16:16:14,815::health::84::health::(_check) Checking health Thread-15::DEBUG::2016-01-15 16:16:14,837::health::86::health::(_check) Collected 299 objects Thread-15::DEBUG::2016-01-15 16:17:14,838::health::84::health::(_check) Checking health Thread-15::DEBUG::2016-01-15 16:17:14,858::health::86::health::(_check) Collected 128 objects starting vm... Thread-15::DEBUG::2016-01-15 16:18:14,858::health::84::health::(_check) Checking health Thread-15::DEBUG::2016-01-15 16:18:14,883::health::86::health::(_check) Collected 1771 objects Thread-15::DEBUG::2016-01-15 16:19:14,884::health::84::health::(_check) Checking health Thread-15::DEBUG::2016-01-15 16:19:14,909::health::86::health::(_check) Collected 167 objects stopping vm... Thread-15::DEBUG::2016-01-15 16:20:14,909::health::84::health::(_check) Checking health Thread-15::DEBUG::2016-01-15 16:20:14,934::health::86::health::(_check) Collected 179 objects vm is down... Thread-15::DEBUG::2016-01-15 16:21:14,934::health::84::health::(_check) Checking health Thread-15::DEBUG::2016-01-15 16:21:14,960::health::86::health::(_check) Collected 2284 objects Thread-15::WARNING::2016-01-15 16:21:14,960::health::94::health::(_check) Found 1 uncollectable objects: ['<libvirt.virDomain object at 0x1ef6550>'] starting vm... Thread-15::DEBUG::2016-01-15 16:22:14,961::health::84::health::(_check) Checking health Thread-15::DEBUG::2016-01-15 16:22:14,987::health::86::health::(_check) Collected 1721 objects Thread-15::WARNING::2016-01-15 16:22:14,987::health::94::health::(_check) Found 1 uncollectable objects: ['<libvirt.virDomain object at 0x1ef6550>'] stopping vm... Thread-15::DEBUG::2016-01-15 16:23:14,988::health::84::health::(_check) Checking health Thread-15::DEBUG::2016-01-15 16:23:15,014::health::86::health::(_check) Collected 189 objects Thread-15::WARNING::2016-01-15 16:23:15,014::health::94::health::(_check) Found 1 uncollectable objects: ['<libvirt.virDomain object at 0x1ef6550>'] Thread-15::DEBUG::2016-01-15 16:24:15,014::health::84::health::(_check) Checking health Thread-15::DEBUG::2016-01-15 16:24:15,040::health::86::health::(_check) Collected 2240 objects Thread-15::WARNING::2016-01-15 16:24:15,041::health::94::health::(_check) Found 2 uncollectable objects: ['<libvirt.virDomain object at 0x1ef6550>', '<libvirt.virDomain object at 0x7f79ec10cf90>'] vm is down here (probbaly was done before the log) Thread-15::DEBUG::2016-01-15 16:25:15,041::health::84::health::(_check) Checking health Thread-15::DEBUG::2016-01-15 16:25:15,065::health::86::health::(_check) Collected 128 objects Thread-15::WARNING::2016-01-15 16:25:15,065::health::94::health::(_check) Found 2 uncollectable objects: ['<libvirt.virDomain object at 0x1ef6550>', '<libvirt.virDomain object at 0x7f79ec10cf90>'] starting vm... Thread-15::DEBUG::2016-01-15 16:26:15,067::health::84::health::(_check) Checking health Thread-15::DEBUG::2016-01-15 16:26:15,094::health::86::health::(_check) Collected 1749 objects Thread-15::WARNING::2016-01-15 16:26:15,094::health::94::health::(_check) Found 2 uncollectable objects: ['<libvirt.virDomain object at 0x1ef6550>', '<libvirt.virDomain object at 0x7f79ec10cf90>'] stopping vm... Thread-15::DEBUG::2016-01-15 16:27:15,096::health::84::health::(_check) Checking health Thread-15::DEBUG::2016-01-15 16:27:15,119::health::86::health::(_check) Collected 167 objects Thread-15::WARNING::2016-01-15 16:27:15,120::health::94::health::(_check) Found 2 uncollectable objects: ['<libvirt.virDomain object at 0x1ef6550>', '<libvirt.virDomain object at 0x7f79ec10cf90>'] vm is shutting down... Thread-15::DEBUG::2016-01-15 16:28:15,120::health::84::health::(_check) Checking health Thread-15::DEBUG::2016-01-15 16:28:15,144::health::86::health::(_check) Collected 2262 objects Thread-15::WARNING::2016-01-15 16:28:15,144::health::94::health::(_check) Found 3 uncollectable objects: ['<libvirt.virDomain object at 0x1ef6550>', '<libvirt.virDomain object at 0x7f79ec10cf90>', '<libvirt.virDomain object at 0x1f0a2d0>'] vm is down... Thread-15::DEBUG::2016-01-15 16:29:15,145::health::84::health::(_check) Checking health Thread-15::DEBUG::2016-01-15 16:29:15,167::health::86::health::(_check) Collected 128 objects Thread-15::WARNING::2016-01-15 16:29:15,167::health::94::health::(_check) Found 3 uncollectable objects: ['<libvirt.virDomain object at 0x1ef6550>', '<libvirt.virDomain object at 0x7f79ec10cf90>', '<libvirt.virDomain object at 0x1f0a2d0>'] When running with both https://gerrit.ovirt.org/51708 and https://gerrit.ovirt.org/51630, no object leak because of reference cycles and __del__. I'll abandon https://gerrit.ovirt.org/51868 since it is not needed. Nir