[ovirt-users] VDSM memory consumption

John Taylor jtt77777 at yahoo.com
Sat Mar 28 10:20:25 EDT 2015


Daniel Helgenberger <daniel.helgenberger at m-box.de> writes:

> Hello Everyone,
>
> I did create the original BZ on this. In the mean time, lab system I
> used is dismantled and the production system is yet to deploy.
>
> As I wrote in BZ1147148 [1], I experienced two different issues. One,
> one big mem leak of about 15MiB/h and a smaller one, ~300KiB. These seem
> unrelated.
>
> The larger leak was indeed related to SSL in some way; not necessarily
> M2Crypto. However, after disabling SSL this was gone leaving the smaller
> leak.
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1147148


I think there are, at least for the purpose of this discussion, 3 leaks:
1. the M2Crypto leak
2. a slower leak 
3. a large leak that's not M2Crypto related that's part of sampling

My efforts have been around finding the source of my larger leak, which
I think is #3.  I had disabled SSL, so M2Crypto should not be the
problem as it was in bz1147148. SSL is beside the point anyway, because
the leak also happens with a deactivated host. It's part of sampling,
which always runs.

What I've found, after trying to get the smallest reproducer, is that
the netlink.iter_links call I commented on in [1] is not the problem.
Instead, the _get_interfaces_and_samples loop creates an
InterfaceSample, which involves getLinkSpeed(). For vlans, that ends up
calling ipwrapper.getLink, which in turn calls netlink.get_link(name).

netlink.get_link(name) *is* the source of my big leak. This is vdsm
4.16.10, so the code is the version at [2]. It has been changed in
master with the removal of support for libnl v1, so it might not be a
problem there anymore.
 
def get_link(name):
    """Returns the information dictionary of the name specified link."""
    with _pool.socket() as sock:
        with _nl_link_cache(sock) as cache:
            link = _rtnl_link_get_by_name(cache, name)
            if not link:
                raise IOError(errno.ENODEV, '%s is not present in the system' %
                              name)
            return _link_info(cache, link)


The libnl documentation at [3] has this note for the
rtnl_link_get_by_name function:

"Attention
    The reference counter of the returned link object will be
    incremented. Use rtnl_link_put() to release the reference."

So I took that hint and made a change that calls rtnl_link_put() in
get_link(name), and it looks like it works for me.

diff oldnetlink.py netlink.py
67d66
<             return _link_info(cache, link)
68a68,70
>             li = _link_info(cache, link)
>             _rtnl_link_put(link)
>             return li
333a336,337
> 
> _rtnl_link_put  = _none_proto(('rtnl_link_put', LIBNL_ROUTE))
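
One case the change above does not cover is _link_info raising, since
the put would then be skipped. A try/finally would release the
reference on that path too. Below is a sketch of that pattern using
stand-in objects so it runs without libnl. FakeLink, FakeCache,
link_put, link_info and both get_link_* functions are hypothetical
names that only mimic the reference counting described in [3]; this is
not the vdsm code.

```python
import errno


class FakeLink(object):
    """Stand-in for a libnl link object with a reference counter."""

    def __init__(self, name):
        self.name = name
        self.refs = 1  # the reference held by the cache itself


class FakeCache(object):
    """Stand-in for the link cache; lookups take a new reference."""

    def __init__(self, names):
        self._links = dict((n, FakeLink(n)) for n in names)

    def get_by_name(self, name):
        link = self._links.get(name)
        if link is not None:
            link.refs += 1  # mimics rtnl_link_get_by_name
        return link

    def refs(self, name):
        return self._links[name].refs


def link_put(link):
    """Mimics rtnl_link_put: drop one reference."""
    link.refs -= 1


def link_info(link):
    return {'name': link.name}


def get_link_leaky(cache, name):
    """Same shape as the original get_link: the reference is never dropped."""
    link = cache.get_by_name(name)
    if not link:
        raise IOError(errno.ENODEV, '%s is not present in the system' % name)
    return link_info(link)


def get_link_fixed(cache, name):
    """Drops the reference even if link_info raises."""
    link = cache.get_by_name(name)
    if not link:
        raise IOError(errno.ENODEV, '%s is not present in the system' % name)
    try:
        return link_info(link)
    finally:
        link_put(link)


cache = FakeCache(['eth0.100'])
for _ in range(5):
    get_link_leaky(cache, 'eth0.100')
leaked = cache.refs('eth0.100') - 1  # references beyond the cache's own
```

With the leaky version every call leaves one extra reference behind,
while repeated calls to the fixed version leave the count unchanged.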

Hope that helps. If someone else could confirm, that would be great.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1158108
[2] https://gerrit.ovirt.org/gitweb?p=vdsm.git;a=blob;f=lib/vdsm/netlink.py;h=afae5cecb5ce701d00fb8f019ec92b3331a39036;hb=5608cfdf43db9186dabac4b2a779f9557e798968
[3] http://www.infradead.org/~tgr/libnl/doc/api/group__link.html#ga1d583e4f0b43c89d854e5e681a529fad

-John

