On Mar 24, 2015, at 4:33 AM, Dan Kenigsberg <danken@redhat.com> wrote:
On Mon, Mar 23, 2015 at 04:00:14PM -0400, John Taylor wrote:
> Chris Adams <cma@cmadams.net> writes:
>
>> Once upon a time, Sven Kieske <s.kieske@mittwald.de> said:
>>> On 13/03/15 12:29, Kapetanakis Giannis wrote:
>>>> We also face this problem since 3.5 in two different installations...
>>>> Hope it's fixed soon
>>>
>>> Nothing will get fixed if no one bothers to
>>> open BZs and send relevant log files to help
>>> track down the problems.
>>
>> There's already an open BZ:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1158108
>>
>> I'm not sure if that is exactly the same problem I'm seeing or not; my
>> vdsm process seems to be growing faster (RSS grew 952K in a 5 minute
>> period just now; VSZ didn't change).
>
> For those following this, I've added a comment on the BZ [1], although in
> my case the memory leak is, like Chris Adams's, a lot more than the 300 KiB/h
> in the original bug report by Daniel Helgenberger.
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1158108
That's interesting (and worrying).
Could you check your suggestion by editing sampling.py so that
_get_interfaces_and_samples() returns the empty dict immediately?
Would this make the leak disappear?
Looks like you’ve got something there. Just a quick test for now, watching RSS in top.
I’ll let it run this way for a while and see what it looks like in a few hours.
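For anyone wanting to watch the same thing without babysitting top, a rough
sketch of the sampling (a hypothetical helper, not part of vdsm; VmRSS in
/proc/<pid>/status is what top shows as RES, in kB):

#!/usr/bin/env python
# Hypothetical: print a timestamped RSS sample for <pid> once a minute.
import sys
import time

def read_rss_kib(pid):
    # the VmRSS line looks like "VmRSS:     132800 kB"
    with open('/proc/%d/status' % pid) as f:
        for line in f:
            if line.startswith('VmRSS:'):
                return int(line.split()[1])

if __name__ == '__main__':
    pid = int(sys.argv[1])
    while True:
        print('%s: %s' % (time.strftime('%H:%M'), read_rss_kib(pid)))
        time.sleep(60)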
System 1: 13 VMs w/ 24 interfaces between them
11:47: killed a vdsm @ 9.116G RSS (after maybe a week and a half running)
11:47: 97xxx
11:57: 135544 and climbing
12:00: 136400
restarted with sampling.py modified to just return an empty dict:

def _get_interfaces_and_samples():
    links_and_samples = {}
    return links_and_samples
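(Presumably this just skips the per-interface sampling entirely, so per-NIC
stats will be missing from the host stats while the test runs; that seems
fine for isolating the leak.)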
12:02: quickly grew to 127694
12:13: 133352
12:20: 132476
12:31: 132732
12:40: 132656
12:50: 132800
1:30: 133928
1:40: 133136
1:50: 133116
2:00: 133128
Interestingly, it looks like overall system load dropped significantly (from
~40-45% to ~10% reported). Mostly that's ksmd getting out of the way after
freeing 9G, but it feels like more than that. (This is a 6-core system; I
usually saw ksmd using ~80% of a single CPU, roughly 15% of the total
available.)
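For the record, ksmd's savings can be read straight out of sysfs; a rough
sketch using the standard KSM counters (per the kernel's ksm.txt,
pages_sharing is "how much saved"; assumes 4 KiB pages):

# Read the kernel's KSM counters from /sys/kernel/mm/ksm.
import os

KSM = '/sys/kernel/mm/ksm'

def ksm_stats():
    stats = {}
    for name in ('pages_shared', 'pages_sharing', 'pages_unshared',
                 'full_scans'):
        with open(os.path.join(KSM, name)) as f:
            stats[name] = int(f.read())
    return stats

s = ksm_stats()
print('KSM saving ~%d MiB (%d shared, %d sharing)' % (
    s['pages_sharing'] * 4 // 1024, s['pages_shared'], s['pages_sharing']))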
Second system: 10 VMs w/ 17 interfaces
vdsmd @ 5.027G RSS (slightly less uptime than the previous host); freeing this
RAM caused a ~16% utilization drop as ksmd stopped running as hard.
restarted at 12:10
12:10: 106224
12:20: 111220
12:31: 114616
12:40: 117500
12:50: 120504
1:30: 133040
1:40: 136140
1:50: 139032
2:00: 142292
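Back-of-the-envelope from the samples above: the second system's RSS is
climbing nearly linearly, which works out to roughly 19 MiB/h. A quick check
(numbers copied from the log above, times as minutes past 12:00, RSS in KiB):

# Rough leak-rate estimate for the second system.
samples = [(10, 106224), (20, 111220), (31, 114616), (40, 117500),
           (50, 120504), (90, 133040), (100, 136140), (110, 139032),
           (120, 142292)]
(t0, r0), (t1, r1) = samples[0], samples[-1]
rate = (r1 - r0) * 60.0 / (t1 - t0)
print('~%.0f KiB/h (~%.1f MiB/h)' % (rate, rate / 1024.0))
# -> ~19673 KiB/h (~19.2 MiB/h), far above the ~300 KiB/h in the original BZ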