
On Mar 26, 2015, at 6:42 AM, Dan Kenigsberg <danken@redhat.com> wrote:
On Wed, Mar 25, 2015 at 01:29:25PM -0500, Darrell Budic wrote:
On Mar 25, 2015, at 5:34 AM, Dan Kenigsberg <danken@redhat.com> wrote:
On Tue, Mar 24, 2015 at 02:01:40PM -0500, Darrell Budic wrote:
On Mar 24, 2015, at 4:33 AM, Dan Kenigsberg <danken@redhat.com> wrote:
On Mon, Mar 23, 2015 at 04:00:14PM -0400, John Taylor wrote:
Chris Adams <cma@cmadams.net> writes:
> Once upon a time, Sven Kieske <s.kieske@mittwald.de> said:
>> On 13/03/15 12:29, Kapetanakis Giannis wrote:
>>> We also face this problem since 3.5 in two different installations...
>>> Hope it's fixed soon
>>
>> Nothing will get fixed if no one bothers to
>> open BZs and send relevants log files to help
>> track down the problems.
>
> There's already an open BZ:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1158108
>
> I'm not sure if that is exactly the same problem I'm seeing or not; my
> vdsm process seems to be growing faster (RSS grew 952K in a 5 minute
> period just now; VSZ didn't change).
For those following this, I've added a comment on the bz [1], although in my case the memory leak is, like Chris Adams's, a lot more than the 300KiB/h in the original bug report by Daniel Helgenberger.
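
The growth figures quoted in this thread come from watching RSS in top. A minimal sketch of how the same number could be sampled and turned into an hourly rate follows, assuming a Linux host where /proc/<pid>/status exposes a VmRSS line in kB. The helper names are hypothetical and are not part of vdsm.

```python
import re


def parse_rss_kib(status_text):
    """Return the VmRSS value (in KiB) found in /proc/<pid>/status text.

    Returns None when no VmRSS line is present (e.g. for kernel threads).
    """
    match = re.search(r'^VmRSS:\s+(\d+)\s+kB', status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kib(pid):
    """Read the current RSS of a running process (Linux only, assumption)."""
    with open('/proc/%d/status' % pid) as f:
        return parse_rss_kib(f.read())


def growth_per_hour(first_kib, last_kib, elapsed_seconds):
    """Extrapolate the growth between two RSS samples to KiB per hour."""
    return (last_kib - first_kib) * 3600.0 / elapsed_seconds
```

Fed with the 952K growth over 5 minutes reported above, growth_per_hour(0, 952, 300) gives 11424 KiB/h, which is far above the 300KiB/h of the original report if top's K is read as KiB.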
That's interesting (and worrying). Could you check your suggestion by editing sampling.py so that _get_interfaces_and_samples() returns the empty dict immediately? Would this make the leak disappear?
Looks like you’ve got something there. Just a quick test for now, watching RSS in top. I’ll let it go this way for a while and see what it looks like in a few hours.
System 1: 13 VMs w/ 24 interfaces between them
11:47 killed a vdsm @ 9.116G RSS (after maybe a week and a half running)
11:47: 97xxx
11:57 135544 and climbing
12:00 136400
restarted with sampling.py modified to just return an empty dict:
def _get_interfaces_and_samples():
    links_and_samples = {}
    return links_and_samples
Thanks for the input. Just to be a little more certain that the culprit is _get_interfaces_and_samples() per se, would you please decorate it with memoized, and add a log line at the end:
@utils.memoized   # add this line
def _get_interfaces_and_samples():
    ...
    logging.debug('LINKS %s', links_and_samples)   ## and this line
    return links_and_samples
I'd like to see what happens when the function is run only once, and returns a non-empty reasonable dictionary of links and samples.
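
To illustrate why memoizing isolates the function: a memoized function's body runs only on the first call, and later calls return the cached result. The sketch below is a generic stand-in written for illustration, not the actual utils.memoized from vdsm, and the sample dictionary is made up.

```python
import functools


def memoized(func):
    """Cache results keyed by positional arguments.

    Illustrative stand-in only; vdsm's utils.memoized may differ.
    """
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)   # body runs only on a cache miss
        return cache[args]
    return wrapper


calls = []   # records every real execution of the function body


@memoized
def get_links_and_samples():
    calls.append(1)
    return {'eth0': 'sample'}   # hypothetical data, not real vdsm output


first = get_links_and_samples()
second = get_links_and_samples()   # served from the cache
```

If the leak stops once the body runs a single time, the leak is tied to executing the body repeatedly rather than to whatever consumes its return value.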
Looks similar, I modified my second server for this test:
Thanks again. Would you be so kind as to search further? Does the following script leak anything on your host, when placed in your /usr/share/vdsm:
#!/usr/bin/python
from time import sleep
from virt.sampling import _get_interfaces_and_samples

while True:
    _get_interfaces_and_samples()
    sleep(0.2)
Something that can be a bit harder would be to:

    # service vdsmd stop
    # su - vdsm -s /bin/bash
    # cd /usr/share/vdsm
    # valgrind --leak-check=full --log-file=/tmp/your.log vdsm
as suggested by Thomas on https://bugzilla.redhat.com/show_bug.cgi?id=1158108#c6
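
A variant of the loop test that records its own memory use, instead of relying on top, might look like the sketch below. It assumes Linux, where ru_maxrss is reported in KiB, and it measures peak rather than current RSS. run_and_report and its parameters are hypothetical names; the function under test is passed in, so _get_interfaces_and_samples could be supplied when run from /usr/share/vdsm.

```python
import resource
from time import sleep


def peak_rss_kib():
    """Peak resident set size of this process (KiB on Linux)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def run_and_report(func, iterations, interval=0.2, report_every=50):
    """Call func repeatedly and collect periodic peak-RSS readings."""
    readings = []
    for i in range(iterations):
        func()
        if i % report_every == 0:
            readings.append(peak_rss_kib())
        sleep(interval)
    return readings
```

A steadily rising series of readings over a long run would point at a leak in the called function; a flat series would not.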
Yes, this script leaks quickly. It started out at an RSS of 21000ish, was already at 26744 a minute in, and about 5 minutes later it’s at 39384 and climbing.

I’ve been abusing a production server for these simple tests, but didn’t want to run valgrind against it right this minute. I did run it against the test.py script above though, and got this (fpaste.org didn’t like it, too long maybe?):

http://tower.onholyground.com/valgrind-test.log

To comment on some other posts in this thread: I also see leaks on my test system, which is running CentOS 6.6, but it only has 3 VMs across 2 servers and 3 configured networks, and it leaks MUCH slower. I suspect people don’t notice this on test systems because they don’t have a lot of VMs/interfaces running and don’t leave them up for weeks at a time. That’s why I was running these tests on my production box, to have more VMs up.