[ovirt-users] VDSM memory consumption

Darrell Budic budic at onholyground.com
Tue Mar 24 19:01:40 UTC 2015


> On Mar 24, 2015, at 4:33 AM, Dan Kenigsberg <danken at redhat.com> wrote:
> 
> On Mon, Mar 23, 2015 at 04:00:14PM -0400, John Taylor wrote:
>> Chris Adams <cma at cmadams.net> writes:
>> 
>>> Once upon a time, Sven Kieske <s.kieske at mittwald.de> said:
>>>> On 13/03/15 12:29, Kapetanakis Giannis wrote:
>>>>> We've also faced this problem since 3.5 in two different installations...
>>>>> Hope it's fixed soon
>>>> 
>>>> Nothing will get fixed if no one bothers to
>>>> open BZs and send relevant log files to help
>>>> track down the problems.
>>> 
>>> There's already an open BZ:
>>> 
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1158108
>>> 
>>> I'm not sure if that is exactly the same problem I'm seeing or not; my
>>> vdsm process seems to be growing faster (RSS grew 952K in a 5 minute
>>> period just now; VSZ didn't change).
>> 
>> For those following this, I've added a comment on the BZ [1], although in
>> my case the memory leak is, like Chris Adams's, a lot more than the 300KiB/h
>> in the original bug report by Daniel Helgenberger.
>> 
>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1158108
> 
> That's interesting (and worrying).
> Could you check your suggestion by editing sampling.py so that
> _get_interfaces_and_samples() returns the empty dict immediately?
> Would this make the leak disappear?

Looks like you’ve got something there. Just a quick test for now, watching RSS in top. I’ll let it run this way for a while and see what it looks like in a few hours.
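
If you'd rather log it than watch top, a rough sketch like this can sample vdsm's RSS from /proc over time (the pgrep pattern and the 60 second interval are just illustrative choices on my part, adjust as needed):

import subprocess
import time

def vdsm_pid():
    # assumes the oldest process whose command line matches 'vdsm' is the
    # main vdsm process -- tweak the pattern for your setup
    return int(subprocess.check_output(['pgrep', '-o', '-f', 'vdsm']).split()[0])

def rss_kib(pid):
    # VmRSS in /proc/<pid>/status is reported in kB
    with open('/proc/%d/status' % pid) as f:
        for line in f:
            if line.startswith('VmRSS:'):
                return int(line.split()[1])

while True:
    pid = vdsm_pid()
    print('%s rss=%s kB' % (time.strftime('%H:%M'), rss_kib(pid)))
    time.sleep(60)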

System 1: 13 VMs w/ 24 interfaces between them

11:47 killed a vdsm @ 9.116G RSS (after maybe a week and a half running)

11:47: 97xxx
11:57: 135544 and climbing
12:00: 136400

restarted with sampling.py modified to just return an empty dict:

def _get_interfaces_and_samples():
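    # temporary stub for testing the leak theory: skip interface/netlink
    # sampling entirely and always return an empty dict of samples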
    links_and_samples = {}
    return links_and_samples

12:02 quickly grew to 127694
12:13: 133352
12:20: 132476
12:31: 132732
12:40: 132656
12:50: 132800
1:30: 133928
1:40: 133136
1:50: 133116
2:00: 133128

Interestingly, it looks like overall system load dropped significantly (from ~40-45% to ~10% reported). Mostly that's ksmd getting out of the way after the 9G was freed, but it feels like more than that. (This is a 6-core system; ksmd usually used ~80% of a single CPU, roughly 15% of the total available.)
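
If anyone wants to quantify the ksmd effect rather than infer it from CPU use, the standard counters under /sys/kernel/mm/ksm/ show how much it is actually merging; a quick sketch, using the savings estimate from the kernel KSM docs:

import os

def ksm_counter(name):
    with open('/sys/kernel/mm/ksm/' + name) as f:
        return int(f.read())

page_size = os.sysconf('SC_PAGE_SIZE')        # typically 4096 bytes
pages_shared = ksm_counter('pages_shared')    # distinct merged pages
pages_sharing = ksm_counter('pages_sharing')  # mappings pointing at them; roughly the memory saved

print('KSM saving roughly %.2f GiB across %d shared pages'
      % (pages_sharing * page_size / (1024.0 ** 3), pages_shared))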


Second system: 10 VMs w/ 17 interfaces

vdsmd @ 5.027G RSS (slightly less uptime than the previous host). Freeing this RAM caused a ~16% utilization drop as ksmd stopped running as hard.

restarted at 12:10

12:10: 106224
12:20: 111220
12:31: 114616
12:40: 117500
12:50: 120504
1:30: 133040
1:40: 136140
1:50: 139032
2:00: 142292