[ovirt-users] VDSM memory consumption
Darrell Budic
budic at onholyground.com
Wed Mar 25 18:29:25 UTC 2015
> On Mar 25, 2015, at 5:34 AM, Dan Kenigsberg <danken at redhat.com> wrote:
>
> On Tue, Mar 24, 2015 at 02:01:40PM -0500, Darrell Budic wrote:
>>
>>> On Mar 24, 2015, at 4:33 AM, Dan Kenigsberg <danken at redhat.com> wrote:
>>>
>>> On Mon, Mar 23, 2015 at 04:00:14PM -0400, John Taylor wrote:
>>>> Chris Adams <cma at cmadams.net> writes:
>>>>
>>>>> Once upon a time, Sven Kieske <s.kieske at mittwald.de> said:
>>>>>> On 13/03/15 12:29, Kapetanakis Giannis wrote:
>>>>>>> We also face this problem since 3.5 in two different installations...
>>>>>>> Hope it's fixed soon
>>>>>>
>>>>>> Nothing will get fixed if no one bothers to
>>>>>> open BZs and send relevants log files to help
>>>>>> track down the problems.
>>>>>
>>>>> There's already an open BZ:
>>>>>
>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1158108
>>>>>
>>>>> I'm not sure if that is exactly the same problem I'm seeing or not; my
>>>>> vdsm process seems to be growing faster (RSS grew 952K in a 5 minute
>>>>> period just now; VSZ didn't change).
>>>>
>>>> For those following this I've added a comment on the bz [1], although in
>>>> my case the memory leak is, like Chris Adams, a lot more than the 300KiB/h
>>>> in the original bug report by Daniel Helgenberger .
>>>>
>>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1158108
>>>
>>> That's interesting (and worrying).
>>> Could you check your suggestion by editing sampling.py so that
>>> _get_interfaces_and_samples() returns the empty dict immediately?
>>> Would this make the leak disappear?
>>
>> Looks like you’ve got something there. Just a quick test for now, watching RSS in top. I’ll let it go this way for a while and see what it looks like in a few hours.
>>
>> System 1: 13 VMs w/ 24 interfaces between them
>>
>> 11:47 killed a vdsm @ 9.116G RSS (after maybe a week and a half running)
>>
>> 11:47: 97xxx
>> 11:57 135544 and climbing
>> 12:00 136400
>>
>> restarted with sampling.py modified to just return an empty dict:
>>
>> def _get_interfaces_and_samples():
>>     links_and_samples = {}
>>     return links_and_samples
>
> Thanks for the input. Just to be a little more certain that the culprit
> is _get_interfaces_and_samples() per se, would you please decorate it
> with memoized, and add a log line in the end
>
> @utils.memoized  # add this line
> def _get_interfaces_and_samples():
>     ...
>     logging.debug('LINKS %s', links_and_samples)  ## and this line
>     return links_and_samples
>
> I'd like to see what happens when the function is run only once, and
> returns a non-empty reasonable dictionary of links and samples.
Results look similar. I modified my second server for this test:
12:25, still growing from yesterday: 544512
restarted with mods for logging and memoize:
stabilized @ 12:32: 114284
1:23: 115300
Thread-12::DEBUG::2015-03-25 12:28:08,080::sampling::243::root::(_get_interfaces_and_samples) LINKS {'vnet18': <virt.sampling.InterfaceSample instance at 0x7f38c03e85f0>, 'vnet19': <virt.sampling.InterfaceSample instance at 0x7f38b42cbcf8>, 'bond0': <virt.sampling.InterfaceSample instance at 0x7f38b429afc8>, 'vnet13': <virt.sampling.InterfaceSample instance at 0x7f38b42c8680>, 'vnet16': <virt.sampling.InterfaceSample instance at 0x7f38b42cb368>, 'private': <virt.sampling.InterfaceSample instance at 0x7f38b42b8bd8>, 'bond0.100': <virt.sampling.InterfaceSample instance at 0x7f38b42bdd88>, 'vnet0': <virt.sampling.InterfaceSample instance at 0x7f38b42c1f80>, 'enp3s0': <virt.sampling.InterfaceSample instance at 0x7f38b429cef0>, 'vnet2': <virt.sampling.InterfaceSample instance at 0x7f38b42bbbd8>, 'vnet3': <virt.sampling.InterfaceSample instance at 0x7f38b42c37e8>, 'vnet4': <virt.sampling.InterfaceSample instance at 0x7f38b42c5518>, 'vnet5': <virt.sampling.InterfaceSample instance at 0x7f38b42c6ab8>, 'vnet6': <virt.sampling.InterfaceSample instance at 0x7f38b42c7248>, 'vnet7': <virt.sampling.InterfaceSample instance at 0x7f38c03e7a28>, 'vnet8': <virt.sampling.InterfaceSample instance at 0x7f38b42c7c20>, 'bond0.1100': <virt.sampling.InterfaceSample instance at 0x7f38b42be710>, 'bond0.1103': <virt.sampling.InterfaceSample instance at 0x7f38b429dc68>, 'ovirtmgmt': <virt.sampling.InterfaceSample instance at 0x7f38b42b16c8>, 'lo': <virt.sampling.InterfaceSample instance at 0x7f38b429a8c0>, 'vnet22': <virt.sampling.InterfaceSample instance at 0x7f38c03e7128>, 'vnet21': <virt.sampling.InterfaceSample instance at 0x7f38b42cd368>, 'vnet20': <virt.sampling.InterfaceSample instance at 0x7f38b42cc7a0>, 'internet': <virt.sampling.InterfaceSample instance at 0x7f38b42aa098>, 'bond0.1203': <virt.sampling.InterfaceSample instance at 0x7f38b42aa8c0>, 'bond0.1223': <virt.sampling.InterfaceSample instance at 0x7f38b42bb128>, 'XXXXXXXXXXX': <virt.sampling.InterfaceSample instance at 0x7f38b42bee60>, 'XXXXXXX': <virt.sampling.InterfaceSample instance at 0x7f38b42beef0>, ';vdsmdummy;': <virt.sampling.InterfaceSample instance at 0x7f38b42bdc20>, 'vnet14': <virt.sampling.InterfaceSample instance at 0x7f38b42ca050>, 'mgmt': <virt.sampling.InterfaceSample instance at 0x7f38b42be248>, 'vnet15': <virt.sampling.InterfaceSample instance at 0x7f38b42cab00>, 'enp2s0': <virt.sampling.InterfaceSample instance at 0x7f38b429c200>, 'bond0.1110': <virt.sampling.InterfaceSample instance at 0x7f38b42bed40>, 'vnet1': <virt.sampling.InterfaceSample instance at 0x7f38b42c27e8>, 'bond0.1233': <virt.sampling.InterfaceSample instance at 0x7f38b42bedd0>, 'bond0.1213': <virt.sampling.InterfaceSample instance at 0x7f38b42b2128>}
Didn’t see the significant CPU use difference on this one, so I’m thinking it was all ksmd in yesterday’s tests.
Yesterday’s test is still going, and RSS is still hovering around 135016 or so.
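For the record, the numbers above are just the RES column from top. If anyone wants to log them over time instead of eyeballing top, something along these lines works; this is only a sketch and it assumes the main vdsm process can be found by name with pgrep (adjust the name or match for your setup):

import subprocess
import time


def vdsm_rss_kb():
    # Find the oldest process whose name matches 'vdsm' exactly and read
    # its resident set size (VmRSS, in kB) from /proc.  The process name
    # is an assumption here; change it if vdsm shows up differently.
    pid = subprocess.check_output(['pgrep', '-ox', 'vdsm']).split()[0]
    with open('/proc/%s/status' % pid.decode()) as status:
        for line in status:
            if line.startswith('VmRSS:'):
                return int(line.split()[1])


while True:
    print('%s %d' % (time.strftime('%H:%M'), vdsm_rss_kb()))
    time.sleep(60)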