> On Mar 25, 2015, at 5:34 AM, Dan Kenigsberg <danken(a)redhat.com> wrote:
>
> On Tue, Mar 24, 2015 at 02:01:40PM -0500, Darrell Budic wrote:
>>
>>> On Mar 24, 2015, at 4:33 AM, Dan Kenigsberg <danken(a)redhat.com> wrote:
>>>
>>> On Mon, Mar 23, 2015 at 04:00:14PM -0400, John Taylor wrote:
>>>> Chris Adams <cma(a)cmadams.net> writes:
>>>>
>>>>> Once upon a time, Sven Kieske <s.kieske(a)mittwald.de> said:
>>>>>> On 13/03/15 12:29, Kapetanakis Giannis wrote:
>>>>>>> We also face this problem since 3.5 in two different installations...
>>>>>>> Hope it's fixed soon
>>>>>>
>>>>>> Nothing will get fixed if no one bothers to
>>>>>> open BZs and send relevant log files to help
>>>>>> track down the problems.
>>>>>
>>>>> There's already an open BZ:
>>>>>
>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1158108
>>>>>
>>>>> I'm not sure if that is exactly the same problem I'm seeing or not; my
>>>>> vdsm process seems to be growing faster (RSS grew 952K in a 5 minute
>>>>> period just now; VSZ didn't change).
>>>>
>>>> For those following this I've added a comment on the bz [1], although in
>>>> my case the memory leak is, like Chris Adams, a lot more than the 300KiB/h
>>>> in the original bug report by Daniel Helgenberger.
>>>>
>>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1158108
>>>
>>> That's interesting (and worrying).
>>> Could you check your suggestion by editing sampling.py so that
>>> _get_interfaces_and_samples() returns the empty dict immediately?
>>> Would this make the leak disappear?
>>
>> Looks like you've got something there. Just a quick test for now, watching RSS
>> in top. I'll let it go this way for a while and see what it looks like in a few hours.
>>
>> System 1: 13 VMs w/ 24 interfaces between them
>>
>> 11:47 killed a vdsm @ 9.116G RSS (after maybe a week and a half running)
>>
>> 11:47: 97xxx
>> 11:57 135544 and climbing
>> 12:00 136400
>>
>> restarted with sampling.py modified to just return an empty dict:
>>
>> def _get_interfaces_and_samples():
>>     links_and_samples = {}
>>     return links_and_samples
>
> Thanks for the input. Just to be a little more certain that the culprit
> is _get_interfaces_and_samples() per se, would you please decorate it
> with memoized and add a log line at the end:
>
> @utils.memoized   # add this line
> def _get_interfaces_and_samples():
>     ...
>     logging.debug('LINKS %s', links_and_samples)   ## and this line
>     return links_and_samples
>
> I'd like to see what happens when the function is run only once, and
> returns a non-empty reasonable dictionary of links and samples.
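For anyone following along, here is a minimal sketch of what a memoizing decorator does, to show why the decorated function body would run only once. This is an illustration only and is not vdsm's utils.memoized; the decorator body and the get_links_and_samples stand-in are assumptions made up for the example.

```python
import functools


def memoized(func):
    """Cache results per positional-argument tuple (illustrative only)."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        # Run the real function only the first time these args are seen.
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    return wrapper


calls = []


@memoized
def get_links_and_samples():
    # Hypothetical stand-in for _get_interfaces_and_samples().
    calls.append(1)
    return {'vnet0': 'sample'}


first = get_links_and_samples()
second = get_links_and_samples()  # served from the cache, body not re-run
```

If memory still grows with the function body executing a single time, the leak is more likely in the callers than in the sampling code itself.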
Looks similar, I modified my second server for this test:
Thanks again. Would you be so kind as to search further?
Does the following script leak anything on your host when placed in your
/usr/share/vdsm:
    #!/usr/bin/python
    from time import sleep
    from virt.sampling import _get_interfaces_and_samples

    while True:
        _get_interfaces_and_samples()
        sleep(0.2)
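To avoid eyeballing top while the loop runs, the script could record its own RSS as it goes. The following is a rough sketch, assuming a Linux host where /proc/<pid>/status exposes a VmRSS line; the helper names (parse_vm_rss_kib, read_rss_kib, watch) are hypothetical and not part of vdsm.

```python
import os
import time


def parse_vm_rss_kib(status_text):
    """Return the VmRSS value in KiB from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith('VmRSS:'):
            return int(line.split()[1])
    return None


def read_rss_kib(pid=None):
    """Read the resident set size of a process (Linux only)."""
    pid = os.getpid() if pid is None else pid
    with open('/proc/%d/status' % pid) as f:
        return parse_vm_rss_kib(f.read())


def watch(func, iterations=1000, report_every=100, delay=0.0):
    """Call func repeatedly and return a list of (iteration, rss_kib)."""
    readings = []
    for i in range(1, iterations + 1):
        func()
        if i % report_every == 0:
            readings.append((i, read_rss_kib()))
        if delay:
            time.sleep(delay)
    return readings
```

With the import from the script above, something like watch(_get_interfaces_and_samples, delay=0.2) would return a series of readings; a steadily climbing RSS across them would point at the function itself rather than at the rest of vdsm.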
Something that can be a bit harder would be to:

    # service vdsmd stop
    # su - vdsm -s /bin/bash
    # cd /usr/share/vdsm
    # valgrind --leak-check=full --log-file=/tmp/your.log vdsm

as suggested by Thomas on