Finally got a chance to implement this, so testing this on my centos7 hosts, and it looks
good. I’ll keep eye on it for a couple days, but after a couple of hours, there’s no
evidence of any leakage.
On Mar 30, 2015, at 4:14 PM, John Taylor <jtt77777(a)yahoo.com>
Dan Kenigsberg <danken(a)redhat.com> writes:
> On Sat, Mar 28, 2015 at 10:20:25AM -0400, John Taylor wrote:
>> Daniel Helgenberger <daniel.helgenberger(a)m-box.de> writes:
>>> Hello Everyone,
>>> I did create the original BZ on this. In the mean time, lab system I
>>> used is dismantled and the production system is yet to deploy.
>>> As I wrote in BZ1147148 , I experienced two different issues. One,
>>> one big mem leak of about 15MiB/h and a smaller one, ~300KiB. These seem
>>> The larger leak was indeed related to SSL in some way; not necessarily
>>> M2Crypto. However, after disabling SSL this was gone leaving the smaller
>>>  https://bugzilla.redhat.com/show_bug.cgi?id=1147148
>> I think there are, at least for the purpose of this discussion, 3 leaks:
>> 1. the M2Crypto leak
>> 2. a slower leak
>> 3. a large leak that's not M2Crypto related that's part of sampling
>> My efforts have been around finding the source of my larger leak, which
>> I think is #3. I had disabled ssl so I knew that M2Crypto
>> isn't/shouldn't be the problem as in bz1147148, and ssl is beside the
>> point as it happens with a deactived host. It's part of sampling which
>> always runs.
>> What I've found is, after trying to get the smallest reproducer, that
>> it's not the netlink.iter_links that I commented on in  that is the
>> problem. But in the _get_intefaces_and_samples loop is the call to
>> create an InterfaceSample and that has getLinkSpeed() which, for vlans,
>> ends up calling ipwrapper.getLink, and that to
>> netlink.get_link(name) *is* the source of my big leak. This is vdsm
>> 4.16.10, so it is  and it's been changed in master for the removal of
>> support for libnl v1 so it might not be a problem anymore.
>> def get_link(name):
>> """Returns the information dictionary of the name specified
>> with _pool.socket() as sock:
>> with _nl_link_cache(sock) as cache:
>> link = _rtnl_link_get_by_name(cache, name)
>> if not link:
>> raise IOError(errno.ENODEV, '%s is not present in the
>> return _link_info(cache, link)
>> The libnl documentation note at  says that for the rtnl_link_get_by_name
>> The reference counter of the returned link object will be incremented. Use
rtnl_link_put() to release the reference."
>> So I took that hint, and made a change that does the rtnl_link_put() in
>> get_link(name) and it looks like it works for me.
>> diff oldnetlink.py netlink.py
>> < return _link_info(cache, link)
>>> li = _link_info(cache, link)
>>> return li
>>> _rtnl_link_put = _none_proto(('rtnl_link_put', LIBNL_ROUTE))
>> Hope that helps. And if someone else could confirm that would be great.
>>  https://bugzilla.redhat.com/show_bug.cgi?id=1158108
> Thanks, John, for a great detective work.
> I'm afraid that with even on the master branch we keep calling
> rtnl_link_get_link() and rtnl_link_get_by_name() without clearing the
> reference count, so a fix is due there, too.
> Would you consider posting a fully-fledged fix to gerrit? I still need
> to understand what is the use of that refcount, so that we do not
> release it too early.
I'm happy to , although I've probably gotten something wrong with how
it's supposed to be done :) It's for the version I'm using so it's for
Users mailing list