
On Mar 24, 2015, at 4:33 AM, Dan Kenigsberg <danken@redhat.com> wrote:
On Mon, Mar 23, 2015 at 04:00:14PM -0400, John Taylor wrote:
Chris Adams <cma@cmadams.net> writes:
Once upon a time, Sven Kieske <s.kieske@mittwald.de> said:
On 13/03/15 12:29, Kapetanakis Giannis wrote:
We have also been facing this problem since 3.5, in two different installations... Hope it's fixed soon.
Nothing will get fixed if no one bothers to open BZs and send relevant log files to help track down the problems.
There's already an open BZ:
https://bugzilla.redhat.com/show_bug.cgi?id=1158108
I'm not sure whether that is exactly the same problem I'm seeing; my vdsm process seems to be growing faster (RSS grew 952K in a 5-minute period just now; VSZ didn't change).
For those following this, I've added a comment on the bz [1], although in my case the memory leak is, like Chris Adams's, a lot more than the 300KiB/h in the original bug report by Daniel Helgenberger.
That's interesting (and worrying). Could you check your suggestion by editing sampling.py so that _get_interfaces_and_samples() returns the empty dict immediately? Would this make the leak disappear?
Looks like you’ve got something there. Just a quick test for now, watching RSS in top. I’ll let it run this way for a while and see what it looks like in a few hours.

System 1: 13 VMs with 24 interfaces between them.
11:47 killed vdsm @ 9.116G RSS (after maybe a week and a half running)
11:47: 97xxx
11:57: 135544 and climbing
12:00: 136400

Restarted with sampling.py modified to just return an empty dict:

    def _get_interfaces_and_samples():
        links_and_samples = {}
        return links_and_samples

12:02: quickly grew to 127694
12:13: 133352
12:20: 132476
12:31: 132732
12:40: 132656
12:50: 132800
1:30: 133928
1:40: 133136
1:50: 133116
2:00: 133128

Interestingly, it looks like overall system load dropped significantly (from ~40-45% to 10% reported). Mostly ksmd getting out of the way after freeing 9G, but it feels like more than that. (This is a 6-core system; I usually saw ksmd using ~80% of a single CPU, roughly 15% of the total available.)

Second system: 10 VMs with 17 interfaces. vdsmd @ 5.027G RSS (slightly less uptime than the previous host). Freeing this RAM caused a ~16% utilization drop as ksmd stopped running as hard.

Restarted at 12:10.
12:10: 106224
12:20: 111220
12:31: 114616
12:40: 117500
12:50: 120504
1:30: 133040
1:40: 136140
1:50: 139032
2:00: 142292
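For anyone who wants to collect the same RSS-over-time numbers without eyeballing top, here is a minimal sketch of how the measurements above could be automated on Linux by parsing the VmRSS field of /proc/<pid>/status. The helper names (get_rss_kib, watch_rss) are my own for illustration, not anything from vdsm:

```python
import os
import time


def get_rss_kib(pid):
    """Return the resident set size (VmRSS) of a process in KiB,
    parsed from /proc/<pid>/status. Linux only."""
    with open("/proc/%d/status" % pid) as f:
        for line in f:
            if line.startswith("VmRSS:"):
                # Line looks like: "VmRSS:    133352 kB"
                return int(line.split()[1])
    raise ValueError("VmRSS not found for pid %d" % pid)


def watch_rss(pid, interval=600, samples=12):
    """Print a timestamped RSS sample every `interval` seconds."""
    for _ in range(samples):
        print(time.strftime("%H:%M"), get_rss_kib(pid))
        time.sleep(interval)
```

Pointing watch_rss at the vdsm pid with a 10-minute interval would reproduce the time/RSS tables above.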