[ovirt-users] memory leak in 3.5.6 - not vdsm

Nir Soffer nsoffer at redhat.com
Fri Jan 22 22:42:58 UTC 2016


On Fri, Jan 22, 2016 at 11:30 PM, Charles Kozler <charles at fixflyer.com> wrote:
> Hi Nir -
>
> do you have a release target date for 3.5.8? Any estimate would help.
>
> If its not VDSM, what is it exactly? Sorry, I understood from the ticket it
> was something inside vdsm, was I mistaken?

The bug I mentioned in my previous mail *is* a vdsm leak. This issue is not.

>
> CentOS 6 is the servers. 6.7 to be exact
>
> I have done all forms of flushing that I can (page cache, inodes, dentry's,
> etc) and as well moved VM's around to other nodes and nothing changes the
> memory. How can I find the leak? Where is the leak? RES shows the following
> of which, the totals dont add up to 20GB
>
>    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  19044 qemu      20   0 8876m 4.0g 5680 S  3.6 12.9   1571:44 qemu-kvm
>  26143 qemu      20   0 5094m 1.1g 5624 S  9.2  3.7   6012:12 qemu-kvm
>   5837 root       0 -20  964m 624m 3664 S  0.0  2.0  85:22.09 glusterfs
>  14328 root       0 -20  635m 169m 3384 S  0.0  0.5  43:15.23 glusterfs
>   5134 vdsm       0 -20 4368m 111m  10m S  5.9  0.3   3710:50 vdsm
>   4095 root      15  -5  727m  43m  10m S  0.0  0.1   0:02.00 supervdsmServer
>
> 4.0G + 1.1G + 624M + 169M + 111M + 43M = ~7GB
>
> This was top sorted by RES from highest to lowest

Can you list *all* processes and sum the RSS of all of them?

Use something like:

    # VmRSS is reported in kB; kernel threads have no VmRSS line
    for status in /proc/[0-9]*/status; do
        grep '^VmRSS' "$status" 2>/dev/null
    done | awk '{sum += $2} END {print sum " kB"}'
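
If the total alone is not enough, a slightly longer variant may help to see
which processes hold the memory. This is only a rough sketch: the function
name rss_report and its optional argument (an alternative proc root, useful
only for testing) are made up for illustration. Note that RSS counts shared
pages once per process, so the total can overstate real usage.

```shell
#!/bin/sh
# Print the ten largest resident sets and the total of all of them,
# reading VmRSS (in kB) from every <root>/<pid>/status file.
# rss_report and its argument are hypothetical names for this sketch.
rss_report() {
    root=${1:-/proc}
    for status in "$root"/[0-9]*/status; do
        # Kernel threads have no VmRSS line; processes that exit while
        # we scan produce an error, which is discarded.
        awk '/^Name:/ {name = $2} /^VmRSS:/ {print $2, name}' \
            "$status" 2>/dev/null
    done | sort -rn | awk '
        NR <= 10 {printf "%10d kB  %s\n", $1, $2}
        {sum += $1}
        END {printf "%10d kB  TOTAL\n", sum}'
}

rss_report
```

The last line is the same sum as the one-liner above; the lines before it
are the largest processes, biggest first.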

> At that point I wouldnt know where else to look except slab / kernel
> structures. Of which slab shows:
>
> [compute[root at node1 ~]$ cat /proc/meminfo | grep -i slab
> Slab:            2549748 kB
>
> So roughly 2-3GB. Adding that to the other use of 7GB we have still about
> 10GB unaccounted for
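
The same accounting can be done from /proc/meminfo instead of adding up top
output by hand. The sketch below is an assumption about what is useful to
look at, not a diagnosis: it takes "used" the way free does on the
"-/+ buffers/cache" line, then subtracts anonymous pages, slab and page
tables. Whatever remains is memory that none of these counters explain. The
name meminfo_breakdown and its optional file argument are made up for
illustration.

```shell
#!/bin/sh
# Rough breakdown of used memory from a meminfo-style file (values in kB).
# meminfo_breakdown and its argument are hypothetical names for this sketch.
meminfo_breakdown() {
    awk '
        /^(MemTotal|MemFree|Buffers|Cached|AnonPages|Slab|PageTables):/ {
            sub(/:/, "", $1)
            kb[$1] = $2
        }
        END {
            # Same figure as "used" on the -/+ buffers/cache line of free
            used = kb["MemTotal"] - kb["MemFree"] - kb["Buffers"] - kb["Cached"]
            # Memory not explained by process pages, slab or page tables
            rest = used - kb["AnonPages"] - kb["Slab"] - kb["PageTables"]
            printf "%-12s %10d kB\n", "used", used
            printf "%-12s %10d kB\n", "anon", kb["AnonPages"]
            printf "%-12s %10d kB\n", "slab", kb["Slab"]
            printf "%-12s %10d kB\n", "pagetables", kb["PageTables"]
            printf "%-12s %10d kB\n", "unaccounted", rest
        }' "${1:-/proc/meminfo}"
}

if [ -r /proc/meminfo ]; then
    meminfo_breakdown
fi
```

A large "unaccounted" value would suggest looking at kernel-side
allocations rather than at any user process.
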
>
> On Fri, Jan 22, 2016 at 4:24 PM, Nir Soffer <nsoffer at redhat.com> wrote:
>>
>> On Fri, Jan 22, 2016 at 11:08 PM, Charles Kozler <charles at fixflyer.com>
>> wrote:
>> > Hi Nir -
>> >
>> > Thanks for getting back to me. Will the patch to 3.6 be backported to
>> > 3.5?
>>
>> We plan to include them in 3.5.8.
>>
>> > As you can tell from the images, it takes days and days for it to
>> > increase
>> > over time. I also wasnt sure if that was the right bug because VDSM
>> > memory
>> > shows normal from top ...
>> >
>> >    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>> >   5134 vdsm       0 -20 4368m 111m  10m S  2.0  0.3   3709:28 vdsm
>>
>> As you wrote, this issue is not related to vdsm.
>>
>> >
>> > Res is only 111M. This is from node1 which is showing currently 20GB of
>> > 32GB
>> > used with only 2 VMs running on it - 1 with 4G and another with ~1 GB of
>> > RAM
>> > configured
>> >
>> > The images are from nagios and the value here is a direct correlation to
>> > what you would see in the free command output. See below from an example
>> > of
>> > node 1 and node 2
>> >
>> > [compute[root at node1 ~]$ free
>> >              total       used       free     shared    buffers     cached
>> > Mem:      32765316   20318156   12447160        252      30884     628948
>> > -/+ buffers/cache:   19658324   13106992
>> > Swap:     19247100          0   19247100
>> > [compute[root at node1 ~]$ free -m
>> >              total       used       free     shared    buffers     cached
>> > Mem:         31997      19843      12153          0         30        614
>> > -/+ buffers/cache:      19199      12798
>> > Swap:        18795          0      18795
>> >
>> > And its correlated image http://i.imgur.com/PZLEgyx.png (~19GB used)
>> >
>> > And as a control, node 2 that I just restarted today
>> >
>> > [compute[root at node2 ~]$ free
>> >              total       used       free     shared    buffers     cached
>> > Mem:      32765316    1815324   30949992        212      35784     717320
>> > -/+ buffers/cache:    1062220   31703096
>> > Swap:     19247100          0   19247100
>>
>> Is this rhel/centos 6?
>>
>> > [compute[root at node2 ~]$ free -m
>> >              total       used       free     shared    buffers     cached
>> > Mem:         31997       1772      30225          0         34        700
>> > -/+ buffers/cache:       1036      30960
>> > Swap:        18795          0      18795
>> >
>> > And its correlated image http://i.imgur.com/8ldPVqY.png  (~2GB used).
>> > Note
>> > how 1772 in the image is exactly what is registered under 'used' in free
>> > command
>>
>> I guess you should start looking at the processes running on these nodes.
>>
>> Maybe try to collect memory usage per process using ps?
>>
>> >
>> > On Fri, Jan 22, 2016 at 3:59 PM, Nir Soffer <nsoffer at redhat.com> wrote:
>> >>
>> >> On Fri, Jan 22, 2016 at 9:25 PM, Charles Kozler <charles at fixflyer.com>
>> >> wrote:
>> >> > Here is a screenshot of my three nodes and their increased memory
>> >> > usage
>> >> > over
>> >> > 30 days. Note that node #2 had 1 single VM that had 4GB of RAM
>> >> > assigned
>> >> > to
>> >> > it. I had since shut it down and saw no memory reclamation occur.
>> >> > Further, I
>> >> > flushed page caches and inodes and ran 'sync'. I tried everything but
>> >> > nothing brought the memory usage down. vdsm was low too (couple
>> >> > hundred
>> >> > MB)
>> >>
>> >> Note that there is an old leak in vdsm, will be fixed in next 3.6
>> >> build:
>> >> https://bugzilla.redhat.com/1269424
>> >>
>> >> > and there was no qemu-kvm process running so I'm at a loss
>> >> >
>> >> > http://imgur.com/a/aFPcK
>> >> >
>> >> > Please advise on what I can do to debug this. Note I have restarted
>> >> > node
>> >> > 2
>> >> > (which is why you see the drop) to see if it raises in memory use
>> >> > over
>> >> > time
>> >> > even with no VM's running
>> >>
>> >> Not sure what is "memory" that you show in the graphs. Theoretically
>> >> this
>> >> may be
>> >> normal memory usage, Linux using free memory for the buffer cache.
>> >>
>> >> Can you instead show the output of "free", during one day, maybe run
>> >> once
>> >> per hour?
>> >>
>> >> You may also like to install sysstat for collecting and monitoring
>> >> resources usage.
>> >>
>> >> >
>> >> > [compute[root at node2 log]$ rpm -qa | grep -i ovirt
>> >> > libgovirt-0.3.2-1.el6.x86_64
>> >> > ovirt-release35-006-1.noarch
>> >> > ovirt-hosted-engine-ha-1.2.8-1.el6.noarch
>> >> > ovirt-hosted-engine-setup-1.2.6.1-1.el6.noarch
>> >> > ovirt-engine-sdk-python-3.5.6.0-1.el6.noarch
>> >> > ovirt-host-deploy-1.3.2-1.el6.noarch
>> >> >
>> >> >
>> >> > --
>> >> >
>> >> > Charles Kozler
>> >> > Vice President, IT Operations
>> >> >
>> >> > FIX Flyer, LLC
>> >> > 225 Broadway | Suite 1600 | New York, NY 10007
>> >> > 1-888-349-3593
>> >> > http://www.fixflyer.com
>> >> >
>> >> > NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT ONLY FOR THE INTENDED
>> >> > RECIPIENT(S)
>> >> > OF THE TRANSMISSION, AND CONTAINS CONFIDENTIAL INFORMATION WHICH IS
>> >> > PROPRIETARY TO FIX FLYER LLC.  ANY UNAUTHORIZED USE, COPYING,
>> >> > DISTRIBUTION,
>> >> > OR DISSEMINATION IS STRICTLY PROHIBITED.  ALL RIGHTS TO THIS
>> >> > INFORMATION
>> >> > IS
>> >> > RESERVED BY FIX FLYER LLC.  IF YOU ARE NOT THE INTENDED RECIPIENT,
>> >> > PLEASE
>> >> > CONTACT THE SENDER BY REPLY E-MAIL AND PLEASE DELETE THIS E-MAIL FROM
>> >> > YOUR
>> >> > SYSTEM AND DESTROY ANY COPIES.
>> >> >
>> >> > _______________________________________________
>> >> > Users mailing list
>> >> > Users at ovirt.org
>> >> > http://lists.ovirt.org/mailman/listinfo/users
>> >> >
>> >
>


