On Mar 26, 2015, at 6:42 AM, Dan Kenigsberg <danken(a)redhat.com> wrote:
On Wed, Mar 25, 2015 at 01:29:25PM -0500, Darrell Budic wrote:
>
>> On Mar 25, 2015, at 5:34 AM, Dan Kenigsberg <danken(a)redhat.com> wrote:
>>
>> On Tue, Mar 24, 2015 at 02:01:40PM -0500, Darrell Budic wrote:
>>>
>>>> On Mar 24, 2015, at 4:33 AM, Dan Kenigsberg <danken(a)redhat.com> wrote:
>>>>
>>>> On Mon, Mar 23, 2015 at 04:00:14PM -0400, John Taylor wrote:
>>>>> Chris Adams <cma(a)cmadams.net> writes:
>>>>>
>>>>>> Once upon a time, Sven Kieske <s.kieske(a)mittwald.de> said:
>>>>>>> On 13/03/15 12:29, Kapetanakis Giannis wrote:
>>>>>>>> We also face this problem since 3.5 in two different installations...
>>>>>>>> Hope it's fixed soon
>>>>>>>
>>>>>>> Nothing will get fixed if no one bothers to
>>>>>>> open BZs and send relevant log files to help
>>>>>>> track down the problems.
>>>>>>
>>>>>> There's already an open BZ:
>>>>>>
>>>>>>
>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1158108
>>>>>>
>>>>>> I'm not sure if that is exactly the same problem I'm seeing or not; my
>>>>>> vdsm process seems to be growing faster (RSS grew 952K in a 5 minute
>>>>>> period just now; VSZ didn't change).
>>>>>
>>>>> For those following this I've added a comment on the bz [1], although in
>>>>> my case the memory leak is, like Chris Adams's, a lot more than the 300KiB/h
>>>>> in the original bug report by Daniel Helgenberger.
>>>>>
>>>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1158108
>>>>
>>>> That's interesting (and worrying).
>>>> Could you check your suggestion by editing sampling.py so that
>>>> _get_interfaces_and_samples() returns the empty dict immediately?
>>>> Would this make the leak disappear?
>>>
>>> Looks like you’ve got something there. Just a quick test for now, watching RSS in
>>> top. I’ll let it go this way for a while and see what it looks like in a few hours.
>>>
>>> System 1: 13 VMs w/ 24 interfaces between them
>>>
>>> 11:47 killed a vdsm @ 9.116G RSS (after maybe a week and a half running)
>>>
>>> 11:47: 97xxx
>>> 11:57 135544 and climbing
>>> 12:00 136400
>>>
>>> restarted with sampling.py modified to just return an empty dict:
>>>
>>> def _get_interfaces_and_samples():
>>>     links_and_samples = {}
>>>     return links_and_samples
>>
>> Thanks for the input. Just to be a little more certain that the culprit
>> is _get_interfaces_and_samples() per se, would you please decorate it
>> with memoized, and add a log line at the end:
>>
>> @utils.memoized  # add this line
>> def _get_interfaces_and_samples():
>>     ...
>>     logging.debug('LINKS %s', links_and_samples)  ## and this line
>>     return links_and_samples
>>
>> I'd like to see what happens when the function is run only once, and
>> returns a non-empty reasonable dictionary of links and samples.
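(For context on what memoizing buys in this test: the sampling work then runs exactly once
and every later call returns the cached dictionary, so any continued RSS growth cannot come
from repeated netlink sampling. A minimal sketch of such a decorator for an argument-free
function, illustrative only and not necessarily how vdsm's utils.memoized is implemented:

import functools

def memoized(func):
    # Cache the result of an argument-free function after its first call.
    # Illustrative only; vdsm's real utils.memoized may differ in detail.
    cache = []

    @functools.wraps(func)
    def wrapper():
        if not cache:
            cache.append(func())
        return cache[0]
    return wrapper

@memoized
def _get_interfaces_and_samples():
    print 'sampling once'      # the expensive netlink work would happen here
    return {'eth0': 'sample'}

_get_interfaces_and_samples()  # runs the body, prints 'sampling once'
_get_interfaces_and_samples()  # returns the cached dict, does no new work

If the leak persists under this decorator, the allocation must happen somewhere other than
the repeated sampling itself.)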
>
> Looks similar, I modified my second server for this test:
Thanks again. Would you be so kind as to search further?
Does the following script leak anything on your host, when placed in your
/usr/share/vdsm:
#!/usr/bin/python
from time import sleep
from virt.sampling import _get_interfaces_and_samples
while True:
    _get_interfaces_and_samples()
    sleep(0.2)
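(A variant of the loop above that also prints its own VmRSS each pass makes the growth
loggable instead of being eyeballed in top; a sketch, assuming a Linux /proc layout:

#!/usr/bin/python
# Same loop, but report our own resident set size (KiB) from /proc/self/status
# so a leak shows up as steadily rising numbers in the output.
from time import sleep
from virt.sampling import _get_interfaces_and_samples

def rss_kib():
    with open('/proc/self/status') as f:
        for line in f:
            if line.startswith('VmRSS:'):
                return int(line.split()[1])
    return -1

while True:
    _get_interfaces_and_samples()
    print 'VmRSS: %d KiB' % rss_kib()
    sleep(0.2)
)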
Something that can be a bit harder would be to:
# service vdsmd stop
# su - vdsm -s /bin/bash
# cd /usr/share/vdsm
# valgrind --leak-check=full --log-file=/tmp/your.log vdsm
as suggested by Thomas on
https://bugzilla.redhat.com/show_bug.cgi?id=1158108#c6
Yes, this script leaks quickly. Started out at an RSS of 21000ish, already at 26744 a
minute in, and about 5 minutes later it’s at 39384 and climbing.
Been abusing a production server for those simple tests, but didn’t want to run valgrind
against it right this minute. Did run it against the test.py script above though, got this
(
To comment on some other posts in this thread, I also see leaks on my test system, which is
running CentOS 6.6, but it only has 3 VMs across 2 servers and 3 configured networks and
it leaks MUCH slower. I suspect people don’t notice this on test systems because they
don’t have a lot of VMs/interfaces running, and don’t leave them up for weeks at a time.
That’s why I was running these tests on my production box, to have more VMs up.
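(To put numbers on a slow leak over days rather than spot-checking top, a small logger can
record the daemon's VmRSS on an interval; a sketch, with the script name, interval, and PID
handling purely illustrative:

#!/usr/bin/python
# Log a process's resident set size once a minute, so a slow leak
# (a few hundred KiB per hour) becomes visible over days.
# Usage: ./rss_log.py <vdsm-pid>
import sys
import time

def rss_kib(pid):
    # VmRSS in /proc/<pid>/status is the resident set size in KiB.
    with open('/proc/%s/status' % pid) as f:
        for line in f:
            if line.startswith('VmRSS:'):
                return int(line.split()[1])
    return -1

pid = sys.argv[1]
while True:
    print '%s VmRSS: %d KiB' % (time.strftime('%Y-%m-%d %H:%M:%S'), rss_kib(pid))
    time.sleep(60)
)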