On Fri, Jan 15, 2016 at 11:22 AM, Francesco Romani <fromani(a)redhat.com> wrote:
----- Original Message -----
> From: "Nir Soffer" <nsoffer(a)redhat.com>
> To: "devel" <devel(a)ovirt.org>, "Francesco Romani"
<fromani(a)redhat.com>
> Sent: Friday, January 15, 2016 9:52:13 AM
> Subject: Re: [VDSM] health: Introduce Vdsm health monitoring
>
> I tested with
https://gerrit.ovirt.org/#/c/51630
>
> libvirt.virDomain object does not leak any more, but pthreading object
> are still leaking,
> and the number of uncollectible objects is still very high (12,915).
>
> Each time we start and stop a vm, we leak about 6000 objects which are part
> of a reference cycle, and each time we create and leak 27 pthreading
> locks/conditions
> which are probably the reason for this leak (because they implement
> __del__ or reference object
> implementing __del__.
*Surely* this deserve more investigation, *most likley* there is more to be fixed,
and *surely* I'll invest more energy in research, but I think we should also apply
some grains of salt to the raw results gc gives us.
I've been running a test overnight like
- start 20 vms
- sleep 2m
- stop 20 vms
- sleep 30s
repeated 100 times, monitoring VmRSS (/proc/$VDSM_PID/status) and I didn't observed
any _significant_ leak.
The VmRSS growth was pretty low, almost flat. So something doesn't click here.
I mean, 6000 objects (at least) per vm each time *should* be noticeable after all!
Once 51630 is in good shape I'll broad the net and go hunt for more virt leaks.
After removing the gc.DEBUG_SAVEALL, we see now only the libvirt.virDomain leak.
Turns out that the pthreading objects leak was caused by saving all
the unreachable
objects when using gc.DEBUG_SAVEALL.
Here is a run with
https://gerrit.ovirt.org/51708, but without
https://gerrit.ovirt.org/51630
# tail -f /var/log/vdsm/vdsm.log | grep health
MainThread::INFO::2016-01-15 16:14:14,789::health::58::health::(start)
Starting health monitor (interval=60)
Thread-15::DEBUG::2016-01-15 16:14:14,793::health::70::health::(_run)
Health monitor started
Thread-15::DEBUG::2016-01-15
16:15:14,794::health::84::health::(_check) Checking health
Thread-15::DEBUG::2016-01-15
16:15:14,815::health::86::health::(_check) Collected 54 objects
activating host...
Thread-15::DEBUG::2016-01-15
16:16:14,815::health::84::health::(_check) Checking health
Thread-15::DEBUG::2016-01-15
16:16:14,837::health::86::health::(_check) Collected 299 objects
Thread-15::DEBUG::2016-01-15
16:17:14,838::health::84::health::(_check) Checking health
Thread-15::DEBUG::2016-01-15
16:17:14,858::health::86::health::(_check) Collected 128 objects
starting vm...
Thread-15::DEBUG::2016-01-15
16:18:14,858::health::84::health::(_check) Checking health
Thread-15::DEBUG::2016-01-15
16:18:14,883::health::86::health::(_check) Collected 1771 objects
Thread-15::DEBUG::2016-01-15
16:19:14,884::health::84::health::(_check) Checking health
Thread-15::DEBUG::2016-01-15
16:19:14,909::health::86::health::(_check) Collected 167 objects
stopping vm...
Thread-15::DEBUG::2016-01-15
16:20:14,909::health::84::health::(_check) Checking health
Thread-15::DEBUG::2016-01-15
16:20:14,934::health::86::health::(_check) Collected 179 objects
vm is down...
Thread-15::DEBUG::2016-01-15
16:21:14,934::health::84::health::(_check) Checking health
Thread-15::DEBUG::2016-01-15
16:21:14,960::health::86::health::(_check) Collected 2284 objects
Thread-15::WARNING::2016-01-15
16:21:14,960::health::94::health::(_check) Found 1 uncollectable
objects: ['<libvirt.virDomain object at 0x1ef6550>']
starting vm...
Thread-15::DEBUG::2016-01-15
16:22:14,961::health::84::health::(_check) Checking health
Thread-15::DEBUG::2016-01-15
16:22:14,987::health::86::health::(_check) Collected 1721 objects
Thread-15::WARNING::2016-01-15
16:22:14,987::health::94::health::(_check) Found 1 uncollectable
objects: ['<libvirt.virDomain object at 0x1ef6550>']
stopping vm...
Thread-15::DEBUG::2016-01-15
16:23:14,988::health::84::health::(_check) Checking health
Thread-15::DEBUG::2016-01-15
16:23:15,014::health::86::health::(_check) Collected 189 objects
Thread-15::WARNING::2016-01-15
16:23:15,014::health::94::health::(_check) Found 1 uncollectable
objects: ['<libvirt.virDomain object at 0x1ef6550>']
Thread-15::DEBUG::2016-01-15
16:24:15,014::health::84::health::(_check) Checking health
Thread-15::DEBUG::2016-01-15
16:24:15,040::health::86::health::(_check) Collected 2240 objects
Thread-15::WARNING::2016-01-15
16:24:15,041::health::94::health::(_check) Found 2 uncollectable
objects: ['<libvirt.virDomain object at 0x1ef6550>',
'<libvirt.virDomain object at 0x7f79ec10cf90>']
vm is down here (probbaly was done before the log)
Thread-15::DEBUG::2016-01-15
16:25:15,041::health::84::health::(_check) Checking health
Thread-15::DEBUG::2016-01-15
16:25:15,065::health::86::health::(_check) Collected 128 objects
Thread-15::WARNING::2016-01-15
16:25:15,065::health::94::health::(_check) Found 2 uncollectable
objects: ['<libvirt.virDomain object at 0x1ef6550>',
'<libvirt.virDomain object at 0x7f79ec10cf90>']
starting vm...
Thread-15::DEBUG::2016-01-15
16:26:15,067::health::84::health::(_check) Checking health
Thread-15::DEBUG::2016-01-15
16:26:15,094::health::86::health::(_check) Collected 1749 objects
Thread-15::WARNING::2016-01-15
16:26:15,094::health::94::health::(_check) Found 2 uncollectable
objects: ['<libvirt.virDomain object at 0x1ef6550>',
'<libvirt.virDomain object at 0x7f79ec10cf90>']
stopping vm...
Thread-15::DEBUG::2016-01-15
16:27:15,096::health::84::health::(_check) Checking health
Thread-15::DEBUG::2016-01-15
16:27:15,119::health::86::health::(_check) Collected 167 objects
Thread-15::WARNING::2016-01-15
16:27:15,120::health::94::health::(_check) Found 2 uncollectable
objects: ['<libvirt.virDomain object at 0x1ef6550>',
'<libvirt.virDomain object at 0x7f79ec10cf90>']
vm is shutting down...
Thread-15::DEBUG::2016-01-15
16:28:15,120::health::84::health::(_check) Checking health
Thread-15::DEBUG::2016-01-15
16:28:15,144::health::86::health::(_check) Collected 2262 objects
Thread-15::WARNING::2016-01-15
16:28:15,144::health::94::health::(_check) Found 3 uncollectable
objects: ['<libvirt.virDomain object at 0x1ef6550>',
'<libvirt.virDomain object at 0x7f79ec10cf90>', '<libvirt.virDomain
object at 0x1f0a2d0>']
vm is down...
Thread-15::DEBUG::2016-01-15
16:29:15,145::health::84::health::(_check) Checking health
Thread-15::DEBUG::2016-01-15
16:29:15,167::health::86::health::(_check) Collected 128 objects
Thread-15::WARNING::2016-01-15
16:29:15,167::health::94::health::(_check) Found 3 uncollectable
objects: ['<libvirt.virDomain object at 0x1ef6550>',
'<libvirt.virDomain object at 0x7f79ec10cf90>', '<libvirt.virDomain
object at 0x1f0a2d0>']
When running with both
https://gerrit.ovirt.org/51708 and
https://gerrit.ovirt.org/51630,
no object leak because of reference cycles and __del__.
I'll abandon
https://gerrit.ovirt.org/51868 since it is not needed.
Nir