[ovirt-users] memory leak in 3.5.6 - not vdsm

Charles Kozler charles at fixflyer.com
Mon Feb 1 17:33:09 UTC 2016


So what about the vdsm bug I hit, as listed above by Nir? Will I
have that patch to avoid the memory leak or not? Upgrading an entire node to
CentOS 7 is not actually feasible, and it was previously outlined above that I
just needed to upgrade to oVirt 3.6, with no mention of an OS change ...
On Feb 1, 2016 12:30 PM, "Simone Tiraboschi" <stirabos at redhat.com> wrote:

>
>
> On Mon, Feb 1, 2016 at 5:40 PM, Charles Kozler <charles at fixflyer.com>
> wrote:
>
>> Sandro / Nir -
>>
>> I followed your steps plus
>>
>> http://www.ovirt.org/OVirt_3.6_Release_Notes#Fedora_.2F_CentOS_.2F_RHEL
>>
>> Engine upgraded fine but then when I got to upgrading a node I did:
>>
>> $ yum install http://resources.ovirt.org/pub/yum-repo/ovirt-release36.rpm
>> $ yum update -y
>>
>> And then rebooted the node. I noticed libvirt was updated by a .1 release
>> number, but vdsm (where I thought the memory leak issue was?) was not
>> upgraded. In fact, very few of the oVirt packages on the node were
>> noticeably updated.
>>
>>
> We are not building vdsm for el6 in 3.6; you also need to upgrade to el7
> if you want that.
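>
> (A quick way to verify that from the host - a sketch, assuming the 3.6
> repo is already configured:
>
> $ yum --showduplicates list vdsm
>
> On el6 this should show no newer vdsm build coming from the ovirt-3.6
> repository.)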
>
>
>> The updated node received the following packages during the update:
>>
>> http://pastebin.ca/3362714
>>
>> Note specifically that the only packages updated via the ovirt-3.6
>> repository were ioprocess, otopi, ovirt-engine-sdk-python,
>> ovirt-host-deploy, ovirt-release36, and python-ioprocess. I had expected
>> to see packages like vdsm and the like updated - or was this not the case?
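>>
>> (One way to double-check exactly what that transaction pulled in - a
>> sketch using plain yum, nothing extra:
>>
>> $ yum history info    # details of the most recent transaction
>>
>> The "Packages Altered" section lists each package and the repo it came
>> from.)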
>>
>> Upgraded node:
>>
>> [compute[root at node02 yum.repos.d]$ rpm -qa | grep -i vdsm
>> vdsm-4.16.30-0.el6.x86_64
>> vdsm-python-zombiereaper-4.16.30-0.el6.noarch
>> vdsm-cli-4.16.30-0.el6.noarch
>> vdsm-yajsonrpc-4.16.30-0.el6.noarch
>> vdsm-jsonrpc-4.16.30-0.el6.noarch
>> vdsm-xmlrpc-4.16.30-0.el6.noarch
>> vdsm-python-4.16.30-0.el6.noarch
>>
>> Non-upgraded node:
>>
>> [compute[root at node01 ~]$ rpm -qa | grep -i vdsm
>> vdsm-cli-4.16.30-0.el6.noarch
>> vdsm-jsonrpc-4.16.30-0.el6.noarch
>> vdsm-python-zombiereaper-4.16.30-0.el6.noarch
>> vdsm-xmlrpc-4.16.30-0.el6.noarch
>> vdsm-yajsonrpc-4.16.30-0.el6.noarch
>> vdsm-4.16.30-0.el6.x86_64
>> vdsm-python-4.16.30-0.el6.noarch
>>
>> Also, the docs stated that the engine VM would migrate to the freshly
>> upgraded node since it would have a higher score, but it did not.
>>
>> So I can't really confirm whether my issue will be resolved, or whether
>> the node was actually updated properly.
>>
>> Please advise on how to confirm.
>>
>> Thank you!
>>
>> On Sat, Jan 23, 2016 at 12:55 AM, Charles Kozler <charles at fixflyer.com>
>> wrote:
>>
>>> Thanks Sandro. I should clarify that my storage is external on a redundant
>>> SAN. The steps I was concerned about were for the actual upgrade. I tried
>>> to upgrade before and it brought my entire stack crumbling down, so I'm
>>> hesitant. This seems like a huge bug that should at least somehow be
>>> backported if at all possible because, to me, it renders the entire 3.5.6
>>> branch unusable, as no VMs can be deployed since OOM will eventually kill
>>> them. In any case, that's just my opinion, and I'm a new user of oVirt.
>>> The docs I followed originally got me going how I needed, but somehow
>>> didn't work for 3.6 in the same fashion, so naturally I'm hesitant to
>>> upgrade - but I clearly have no option if I want to continue my
>>> infrastructure on oVirt. Thank you again for taking the time out to assist
>>> me; I truly appreciate it. I will try an upgrade next week and pray it all
>>> goes well :-)
>>> On Jan 23, 2016 12:40 AM, "Sandro Bonazzola" <sbonazzo at redhat.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Fri, Jan 22, 2016 at 10:53 PM, Charles Kozler <charles at fixflyer.com>
>>>> wrote:
>>>>
>>>>> Sandro -
>>>>>
>>>>> Do you have documentation available for upgrading a self-hosted setup?
>>>>> I followed this:
>>>>> http://community.redhat.com/blog/2014/10/up-and-running-with-ovirt-3-5/
>>>>>
>>>>> Would it be as easy as installing the RPM and then running yum upgrade?
>>>>>
>>>>>
>>>> Note that the mentioned article describes an unsupported hyperconverged
>>>> setup running NFS over Gluster.
>>>> That said:
>>>> 1) put hosted-engine into global maintenance mode (command sketched
>>>> below)
>>>> 2) upgrade the engine VM
>>>> 3) select the first host to upgrade and put it into maintenance from
>>>> the engine; wait for the engine VM to migrate if needed
>>>> 4) yum upgrade the first host and wait until ovirt-ha-agent completes
>>>> 5) exit global and local maintenance mode
>>>> 6) repeat 3-5 on all the other hosts
>>>> 7) once all hosts are updated, you can increase the cluster
>>>> compatibility level to 3.6. At this point the engine will trigger the
>>>> auto-import of the hosted-engine storage domain.
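>>>>
>>>> A minimal sketch of the maintenance transitions in steps 1 and 5,
>>>> assuming the standard hosted-engine CLI on one of the hosts:
>>>>
>>>> # step 1: enter global maintenance (HA agents stop managing the engine VM)
>>>> $ hosted-engine --set-maintenance --mode=global
>>>> # step 5: leave maintenance once the upgraded host is back up
>>>> $ hosted-engine --set-maintenance --mode=none
>>>> # check agent/engine state at any point
>>>> $ hosted-engine --vm-status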
>>>>
>>>> Simone, Roy, can you confirm the above steps? Maybe you can also update
>>>> http://www.ovirt.org/Hosted_Engine_Howto#Upgrade_Hosted_Engine
>>>>
>>>>
>>>>
>>>>> Thanks
>>>>>
>>>>> On Fri, Jan 22, 2016 at 4:42 PM, Sandro Bonazzola <sbonazzo at redhat.com
>>>>> > wrote:
>>>>>
>>>>>>
>>>>>> On 22 Jan 2016 at 22:31, "Charles Kozler" <charles at fixflyer.com>
>>>>>> wrote:
>>>>>> >
>>>>>> > Hi Nir -
>>>>>> >
>>>>>> > do you have a release target date for 3.5.8? Any estimate would
>>>>>> help.
>>>>>> >
>>>>>>
>>>>>> There won't be any supported release after 3.5.6. Please update to
>>>>>> 3.6.2 next week.
>>>>>>
>>>>>> > If it's not VDSM, what is it exactly? Sorry, I understood from the
>>>>>> > ticket that it was something inside vdsm - was I mistaken?
>>>>>> >
>>>>>> > The servers are CentOS 6 - 6.7 to be exact.
>>>>>> >
>>>>>> > I have done every form of flushing that I can (page cache, inodes,
>>>>>> > dentries, etc.) and have also moved VMs around to other nodes, and
>>>>>> > nothing changes the memory. How can I find the leak? Where is the
>>>>>> > leak? RES shows the following, of which the totals don't add up to
>>>>>> > 20GB:
>>>>>> >
>>>>>> >    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
>>>>>> >  19044 qemu      20   0 8876m 4.0g 5680 S  3.6 12.9   1571:44 qemu-kvm
>>>>>> >  26143 qemu      20   0 5094m 1.1g 5624 S  9.2  3.7   6012:12 qemu-kvm
>>>>>> >   5837 root       0 -20  964m 624m 3664 S  0.0  2.0  85:22.09 glusterfs
>>>>>> >  14328 root       0 -20  635m 169m 3384 S  0.0  0.5  43:15.23 glusterfs
>>>>>> >   5134 vdsm       0 -20 4368m 111m  10m S  5.9  0.3   3710:50 vdsm
>>>>>> >   4095 root      15  -5  727m  43m  10m S  0.0  0.1   0:02.00 supervdsmServer
>>>>>> >
>>>>>> > 4.0G + 1.1G + 624M + 169M + 111M + 43M = ~6GB
>>>>>> >
>>>>>> > This was top sorted by RES from highest to lowest.
>>>>>> >
>>>>>> > At that point I wouldn't know where else to look except slab / kernel
>>>>>> > structures, of which slab shows:
>>>>>> >
>>>>>> > [compute[root at node1 ~]$ cat /proc/meminfo | grep -i slab
>>>>>> > Slab:            2549748 kB
>>>>>> >
>>>>>> > So roughly 2-3GB. Adding that to the other usage of ~6GB, we still
>>>>>> > have about 11GB unaccounted for.
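>>>>>> >
>>>>>> > (A sketch of how to hunt for the rest - top only shows the largest
>>>>>> > processes, so summing RSS across every process, plus a slab
>>>>>> > breakdown, may close the gap:
>>>>>> >
>>>>>> > $ ps -eo rss= | awk '{s+=$1} END {print s/1024 " MiB total RSS"}'
>>>>>> > $ slabtop -o -s c | head -20   # slab caches, sorted by cache size
>>>>>> >
>>>>>> > Note that RSS double-counts shared pages, so the sum is an upper
>>>>>> > bound.)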
>>>>>> >
>>>>>> > On Fri, Jan 22, 2016 at 4:24 PM, Nir Soffer <nsoffer at redhat.com>
>>>>>> wrote:
>>>>>> >>
>>>>>> >> On Fri, Jan 22, 2016 at 11:08 PM, Charles Kozler <
>>>>>> charles at fixflyer.com> wrote:
>>>>>> >> > Hi Nir -
>>>>>> >> >
>>>>>> >> > Thanks for getting back to me. Will the patch to 3.6 be
>>>>>> backported to 3.5?
>>>>>> >>
>>>>>> >> We plan to include them in 3.5.8.
>>>>>> >>
>>>>>> >> > As you can tell from the images, it takes days and days for it to
>>>>>> >> > increase over time. I also wasn't sure if that was the right bug,
>>>>>> >> > because VDSM memory shows normal in top ...
>>>>>> >> >
>>>>>> >> >    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
>>>>>> >> >   5134 vdsm       0 -20 4368m 111m  10m S  2.0  0.3   3709:28 vdsm
>>>>>> >>
>>>>>> >> As you wrote, this issue is not related to vdsm.
>>>>>> >>
>>>>>> >> >
>>>>>> >> > RES is only 111M. This is from node1, which is currently showing
>>>>>> >> > 20GB of 32GB used with only 2 VMs running on it - one with 4GB and
>>>>>> >> > another with ~1GB of RAM configured.
>>>>>> >> >
>>>>>> >> > The images are from Nagios, and the value here correlates directly
>>>>>> >> > to what you would see in the free command output. See below for an
>>>>>> >> > example from node 1 and node 2.
>>>>>> >> >
>>>>>> >> > [compute[root at node1 ~]$ free
>>>>>> >> >              total       used       free     shared    buffers     cached
>>>>>> >> > Mem:      32765316   20318156   12447160        252      30884     628948
>>>>>> >> > -/+ buffers/cache:   19658324   13106992
>>>>>> >> > Swap:     19247100          0   19247100
>>>>>> >> > [compute[root at node1 ~]$ free -m
>>>>>> >> >              total       used       free     shared    buffers     cached
>>>>>> >> > Mem:         31997      19843      12153          0         30        614
>>>>>> >> > -/+ buffers/cache:      19199      12798
>>>>>> >> > Swap:        18795          0      18795
>>>>>> >> >
>>>>>> >> > And the corresponding image: http://i.imgur.com/PZLEgyx.png
>>>>>> >> > (~19GB used)
>>>>>> >> >
>>>>>> >> > And as a control, node 2, which I just restarted today:
>>>>>> >> >
>>>>>> >> > [compute[root at node2 ~]$ free
>>>>>> >> >              total       used       free     shared    buffers     cached
>>>>>> >> > Mem:      32765316    1815324   30949992        212      35784     717320
>>>>>> >> > -/+ buffers/cache:    1062220   31703096
>>>>>> >> > Swap:     19247100          0   19247100
>>>>>> >>
>>>>>> >> Is this rhel/centos 6?
>>>>>> >>
>>>>>> >> > [compute[root at node2 ~]$ free -m
>>>>>> >> >              total       used       free     shared    buffers     cached
>>>>>> >> > Mem:         31997       1772      30225          0         34        700
>>>>>> >> > -/+ buffers/cache:       1036      30960
>>>>>> >> > Swap:        18795          0      18795
>>>>>> >> >
>>>>>> >> > And the corresponding image: http://i.imgur.com/8ldPVqY.png
>>>>>> >> > (~2GB used). Note how 1772 in the image is exactly what is
>>>>>> >> > registered under 'used' in the free command.
>>>>>> >>
>>>>>> >> I guess you should start looking at the processes running on these
>>>>>> nodes.
>>>>>> >>
>>>>>> >> Maybe try to collect memory usage per process using ps?
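>>>>>> >>
>>>>>> >> A minimal sketch of that, using plain procps:
>>>>>> >>
>>>>>> >> $ ps -eo pid,user,rss,comm --sort=-rss | head -20
>>>>>> >>
>>>>>> >> Comparing the summed RSS against "used" in free should show whether
>>>>>> >> the memory is held by processes at all or by the kernel.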
>>>>>> >>
>>>>>> >> >
>>>>>> >> > On Fri, Jan 22, 2016 at 3:59 PM, Nir Soffer <nsoffer at redhat.com>
>>>>>> wrote:
>>>>>> >> >>
>>>>>> >> >> On Fri, Jan 22, 2016 at 9:25 PM, Charles Kozler <
>>>>>> charles at fixflyer.com>
>>>>>> >> >> wrote:
>>>>>> >> >> > Here is a screenshot of my three nodes and their increasing
>>>>>> >> >> > memory usage over 30 days. Note that node #2 had a single VM
>>>>>> >> >> > with 4GB of RAM assigned to it. I have since shut it down and
>>>>>> >> >> > saw no memory reclamation occur. Further, I flushed page caches
>>>>>> >> >> > and inodes and ran 'sync'. I tried everything, but nothing
>>>>>> >> >> > brought the memory usage down. vdsm was low too (a couple
>>>>>> >> >> > hundred MB).
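>>>>>> >> >> >
>>>>>> >> >> > (The flushing above was presumably along these lines - a
>>>>>> >> >> > sketch, assuming the standard Linux drop_caches interface:
>>>>>> >> >> >
>>>>>> >> >> > $ sync
>>>>>> >> >> > $ echo 3 > /proc/sys/vm/drop_caches  # page cache + dentries/inodes
>>>>>> >> >> >
>>>>>> >> >> > If "used" stays high in free after this, the memory is not
>>>>>> >> >> > reclaimable cache.)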
>>>>>> >> >>
>>>>>> >> >> Note that there is an old leak in vdsm; it will be fixed in the
>>>>>> >> >> next 3.6 build:
>>>>>> >> >> https://bugzilla.redhat.com/1269424
>>>>>> >> >>
>>>>>> >> >> > And there was no qemu-kvm process running, so I'm at a loss.
>>>>>> >> >> >
>>>>>> >> >> > http://imgur.com/a/aFPcK
>>>>>> >> >> >
>>>>>> >> >> > Please advise on what I can do to debug this. Note I have
>>>>>> >> >> > restarted node 2 (which is why you see the drop) to see if its
>>>>>> >> >> > memory use rises over time even with no VMs running.
>>>>>> >> >>
>>>>>> >> >> I'm not sure what the "memory" you show in the graphs is.
>>>>>> >> >> Theoretically this may be normal memory usage - Linux using free
>>>>>> >> >> memory for the buffer cache.
>>>>>> >> >>
>>>>>> >> >> Can you instead show the output of "free" during one day, maybe
>>>>>> >> >> run once per hour?
>>>>>> >> >>
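>>>>>> >> >> A sketch of one way to collect that (assuming cron is available;
>>>>>> >> >> the log path is just an example):
>>>>>> >> >>
>>>>>> >> >> # /etc/cron.d/free-hourly: append timestamped "free -m" every hour
>>>>>> >> >> 0 * * * * root (date; free -m) >> /var/log/free-hourly.log
>>>>>> >> >>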
>>>>>> >> >> You may also like to install sysstat for collecting and
>>>>>> >> >> monitoring resource usage.
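>>>>>> >> >>
>>>>>> >> >> With sysstat collecting, memory history can then be pulled with,
>>>>>> >> >> for example:
>>>>>> >> >>
>>>>>> >> >> $ sar -r                       # memory utilization, current day
>>>>>> >> >> $ sar -r -f /var/log/sa/saDD   # a previous day (DD = day of month)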
>>>>>> >> >>
>>>>>> >> >> >
>>>>>> >> >> > [compute[root at node2 log]$ rpm -qa | grep -i ovirt
>>>>>> >> >> > libgovirt-0.3.2-1.el6.x86_64
>>>>>> >> >> > ovirt-release35-006-1.noarch
>>>>>> >> >> > ovirt-hosted-engine-ha-1.2.8-1.el6.noarch
>>>>>> >> >> > ovirt-hosted-engine-setup-1.2.6.1-1.el6.noarch
>>>>>> >> >> > ovirt-engine-sdk-python-3.5.6.0-1.el6.noarch
>>>>>> >> >> > ovirt-host-deploy-1.3.2-1.el6.noarch
>>>>>> >> >> >
>>>>>> >> >> >
>>>>>> >> >> > --
>>>>>> >> >> >
>>>>>> >> >> > Charles Kozler
>>>>>> >> >> > Vice President, IT Operations
>>>>>> >> >> >
>>>>>> >> >> > FIX Flyer, LLC
>>>>>> >> >> > 225 Broadway | Suite 1600 | New York, NY 10007
>>>>>> >> >> > 1-888-349-3593
>>>>>> >> >> > http://www.fixflyer.com
>>>>>> >> >> >
>>>>>> >> >
>>>>>> >> >
>>>>>> >> >
>>>>>> >> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Sandro Bonazzola
>>>> Better technology. Faster innovation. Powered by community
>>>> collaboration.
>>>> See how it works at redhat.com
>>>>
>>>
>>
>>
>>
>
>

