[ovirt-users] memory leak in 3.5.6 - not vdsm

Simone Tiraboschi stirabos at redhat.com
Mon Feb 1 17:29:36 UTC 2016


On Mon, Feb 1, 2016 at 5:40 PM, Charles Kozler <charles at fixflyer.com> wrote:

> Sandro / Nir -
>
> I followed your steps plus
>
> http://www.ovirt.org/OVirt_3.6_Release_Notes#Fedora_.2F_CentOS_.2F_RHEL
>
> The engine upgraded fine, but then when I got to upgrading a node I did:
>
> $ yum install http://resources.ovirt.org/pub/yum-repo/ovirt-release36.rpm
> $ yum update -y
>
> And then I rebooted the node. I noticed libvirt was updated by a .1 release
> number, but vdsm (where I thought the memory leak issue was?) was not
> upgraded. In fact, very few of the oVirt packages on the node were
> noticeably updated.
>
>
We are not building vdsm for el6 in 3.6; you also need to upgrade to el7 if
you want it.
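
If it helps, a quick way to check what a node is running and whether 3.6
vdsm builds are even visible to it is something like this (a rough sketch;
repo ids may differ on your system):

$ cat /etc/redhat-release
$ yum repolist enabled | grep -i ovirt
$ yum list available vdsm --showduplicates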


> The updated node received the following updated packages during the install:
>
> http://pastebin.ca/3362714
>
> Note specifically that the only packages updated via the oVirt 3.6 repository
> were ioprocess, otopi, ovirt-engine-sdk-python, ovirt-host-deploy,
> ovirt-release36, and python-ioprocess. I had expected to see some packages
> like vdsm and the like updated - or was this not the case?
>
> Upgraded node:
>
> [compute[root at node02 yum.repos.d]$ rpm -qa | grep -i vdsm
> vdsm-4.16.30-0.el6.x86_64
> vdsm-python-zombiereaper-4.16.30-0.el6.noarch
> vdsm-cli-4.16.30-0.el6.noarch
> vdsm-yajsonrpc-4.16.30-0.el6.noarch
> vdsm-jsonrpc-4.16.30-0.el6.noarch
> vdsm-xmlrpc-4.16.30-0.el6.noarch
> vdsm-python-4.16.30-0.el6.noarch
>
> Nonupgraded node
>
> [compute[root at node01 ~]$ rpm -qa | grep -i vdsm
> vdsm-cli-4.16.30-0.el6.noarch
> vdsm-jsonrpc-4.16.30-0.el6.noarch
> vdsm-python-zombiereaper-4.16.30-0.el6.noarch
> vdsm-xmlrpc-4.16.30-0.el6.noarch
> vdsm-yajsonrpc-4.16.30-0.el6.noarch
> vdsm-4.16.30-0.el6.x86_64
> vdsm-python-4.16.30-0.el6.noarch
>
> Also, the docs stated that the engine VM would migrate to the freshly
> upgraded node since it would have a higher number, but it did not.
>
> So I can't really confirm whether my issue will be resolved, or whether
> the node was actually updated properly.
>
> Please advise on how to confirm
>
> Thank you!
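
To confirm, you can check the hosted-engine HA status and compare package
versions across the hosts; something like this (just a sketch, not verified
against your exact setup):

$ hosted-engine --vm-status    # which host runs the engine VM, agent state/score
$ rpm -q vdsm ovirt-hosted-engine-ha ovirt-hosted-engine-setup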
>
> On Sat, Jan 23, 2016 at 12:55 AM, Charles Kozler <charles at fixflyer.com>
> wrote:
>
>> Thanks Sandro. I should clarify that my storage is external on a redundant
>> SAN. The steps I was concerned about were for the actual upgrade itself. I
>> tried to upgrade before and it brought my entire stack crumbling down, so
>> I'm hesitant. This seems like a huge bug that should be backported if at
>> all possible because, to me, it renders the entire 3.5.6 branch unusable:
>> no VMs can be deployed since OOM will eventually kill them. In any case,
>> that's just my opinion, and I'm a new oVirt user. The docs I followed
>> originally got me going the way I needed but somehow didn't work for 3.6
>> in the same fashion, so naturally I'm hesitant to upgrade, but I clearly
>> have no option if I want to continue my infrastructure on oVirt. Thank you
>> again for taking the time to assist me, I truly appreciate it. I will try
>> an upgrade next week and pray it all goes well :-)
>> On Jan 23, 2016 12:40 AM, "Sandro Bonazzola" <sbonazzo at redhat.com> wrote:
>>
>>>
>>>
>>> On Fri, Jan 22, 2016 at 10:53 PM, Charles Kozler <charles at fixflyer.com>
>>> wrote:
>>>
>>>> Sandro -
>>>>
>>>> Do you have any documentation available that covers upgrading a
>>>> self-hosted engine? I followed this:
>>>> http://community.redhat.com/blog/2014/10/up-and-running-with-ovirt-3-5/
>>>>
>>>> Would it be as easy as installing the RPM and then running yum upgrade?
>>>>
>>>>
>>> Note that the mentioned article describes an unsupported hyperconverged
>>> setup running NFS over Gluster.
>>> That said,
>>> 1) put hosted-engine into global maintenance mode
>>> 2) upgrade the engine VM
>>> 3) select the first host to upgrade and put it into maintenance from
>>> the engine; wait for the engine VM to migrate off it if needed
>>> 4) yum upgrade the first host and wait until ovirt-ha-agent completes
>>> 5) exit global and local maintenance mode
>>> 6) repeat 3-5 on all the other hosts
>>> 7) once all hosts are updated you can increase the cluster compatibility
>>> level to 3.6. At this point the engine will trigger the auto-import of
>>> the hosted-engine storage domain.
>>> (A rough command-level sketch of some of these steps follows below.)
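>>>
>>> A rough command-level sketch of steps 1, 4 and 5 (assuming the
>>> ovirt-release36 package mentioned earlier is already installed on the
>>> host; not a substitute for the upgrade docs):
>>>
>>> $ hosted-engine --set-maintenance --mode=global   # step 1, run once on any HA host
>>> $ yum upgrade                                     # step 4, on the host already in maintenance
>>> $ hosted-engine --set-maintenance --mode=none     # step 5, exit maintenance when the host is back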
>>>
>>> Simone, Roy, can you confirm the above steps? Maybe you can also update
>>> http://www.ovirt.org/Hosted_Engine_Howto#Upgrade_Hosted_Engine
>>>
>>>
>>>
>>>> Thanks
>>>>
>>>> On Fri, Jan 22, 2016 at 4:42 PM, Sandro Bonazzola <sbonazzo at redhat.com>
>>>> wrote:
>>>>
>>>>>
>>>>> On 22 Jan 2016 at 22:31, "Charles Kozler" <charles at fixflyer.com>
>>>>> wrote:
>>>>> >
>>>>> > Hi Nir -
>>>>> >
>>>>> > do you have a release target date for 3.5.8? Any estimate would help.
>>>>> >
>>>>>
>>>>> There won't be any supported release after 3.5.6. Please update to
>>>>> 3.6.2 next week
>>>>>
>>>>> > If it's not VDSM, what is it exactly? Sorry, I understood from the
>>>>> > ticket that it was something inside vdsm; was I mistaken?
>>>>> >
>>>>> > The servers are CentOS 6 (6.7 to be exact).
>>>>> >
>>>>> > I have done all the forms of flushing that I can (page cache, inodes,
>>>>> > dentries, etc.) and have also moved VMs around to other nodes, and
>>>>> > nothing changes the memory. How can I find the leak? Where is the leak?
>>>>> > RES shows the following, and the totals don't add up to 20GB:
>>>>> >
>>>>> >    PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>>>> >  19044 qemu  20   0 8876m 4.0g 5680 S  3.6 12.9  1571:44  qemu-kvm
>>>>> >  26143 qemu  20   0 5094m 1.1g 5624 S  9.2  3.7  6012:12  qemu-kvm
>>>>> >   5837 root   0 -20  964m 624m 3664 S  0.0  2.0 85:22.09  glusterfs
>>>>> >  14328 root   0 -20  635m 169m 3384 S  0.0  0.5 43:15.23  glusterfs
>>>>> >   5134 vdsm   0 -20 4368m 111m  10m S  5.9  0.3  3710:50  vdsm
>>>>> >   4095 root  15  -5  727m  43m  10m S  0.0  0.1  0:02.00  supervdsmServer
>>>>> >
>>>>> > 4.0G + 1.1G + 624M + 169M + 111M + 43M = ~6GB
>>>>> >
>>>>> > This was top sorted by RES from highest to lowest
>>>>> >
>>>>> > At that point I wouldn't know where else to look except slab / kernel
>>>>> > structures, of which slab shows:
>>>>> >
>>>>> > [compute[root at node1 ~]$ cat /proc/meminfo | grep -i slab
>>>>> > Slab:            2549748 kB
>>>>> >
>>>>> > So roughly 2-3GB. Adding that to the other ~6GB in use, we still have
>>>>> > about 11GB unaccounted for.
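>>>>> >
>>>>> > One rough cross-check I could also run (sketch only; summing RSS
>>>>> > double-counts shared pages, so the total is just an estimate):
>>>>> >
>>>>> > $ ps -eo rss= | awk '{s+=$1} END {printf "%.1f GiB\n", s/1048576}'
>>>>> > $ grep -E 'Slab|SUnreclaim|PageTables|AnonHugePages|HugePages_Total' /proc/meminfo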
>>>>> >
>>>>> > On Fri, Jan 22, 2016 at 4:24 PM, Nir Soffer <nsoffer at redhat.com>
>>>>> wrote:
>>>>> >>
>>>>> >> On Fri, Jan 22, 2016 at 11:08 PM, Charles Kozler <
>>>>> charles at fixflyer.com> wrote:
>>>>> >> > Hi Nir -
>>>>> >> >
>>>>> >> > Thanks for getting back to me. Will the patch to 3.6 be backported
>>>>> >> > to 3.5?
>>>>> >>
>>>>> >> We plan to include them in 3.5.8.
>>>>> >>
>>>>> >> > As you can tell from the images, it takes days and days for it to
>>>>> >> > increase over time. I also wasn't sure if that was the right bug,
>>>>> >> > because VDSM memory looks normal in top ...
>>>>> >> >
>>>>> >> >    PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>>>> >> >   5134 vdsm   0 -20 4368m 111m  10m S  2.0  0.3  3709:28  vdsm
>>>>> >>
>>>>> >> As you wrote, this issue is not related to vdsm.
>>>>> >>
>>>>> >> >
>>>>> >> > RES is only 111M. This is from node1, which is currently showing
>>>>> >> > 20GB of 32GB used with only 2 VMs running on it: one with 4GB and
>>>>> >> > another with ~1GB of RAM configured.
>>>>> >> >
>>>>> >> > The images are from Nagios, and the value here correlates directly
>>>>> >> > to what you would see in the free command output. See below for an
>>>>> >> > example from node 1 and node 2.
>>>>> >> >
>>>>> >> > [compute[root at node1 ~]$ free
>>>>> >> >              total       used       free     shared    buffers     cached
>>>>> >> > Mem:      32765316   20318156   12447160        252      30884     628948
>>>>> >> > -/+ buffers/cache:   19658324   13106992
>>>>> >> > Swap:     19247100          0   19247100
>>>>> >> > [compute[root at node1 ~]$ free -m
>>>>> >> >              total       used       free     shared    buffers     cached
>>>>> >> > Mem:         31997      19843      12153          0         30        614
>>>>> >> > -/+ buffers/cache:      19199      12798
>>>>> >> > Swap:        18795          0      18795
>>>>> >> >
>>>>> >> > And the corresponding image: http://i.imgur.com/PZLEgyx.png (~19GB used)
>>>>> >> >
>>>>> >> > And as a control, node 2, which I just restarted today:
>>>>> >> >
>>>>> >> > [compute[root at node2 ~]$ free
>>>>> >> >              total       used       free     shared    buffers     cached
>>>>> >> > Mem:      32765316    1815324   30949992        212      35784     717320
>>>>> >> > -/+ buffers/cache:    1062220   31703096
>>>>> >> > Swap:     19247100          0   19247100
>>>>> >>
>>>>> >> Is this RHEL/CentOS 6?
>>>>> >>
>>>>> >> > [compute[root at node2 ~]$ free -m
>>>>> >> >              total       used       free     shared    buffers     cached
>>>>> >> > Mem:         31997       1772      30225          0         34        700
>>>>> >> > -/+ buffers/cache:       1036      30960
>>>>> >> > Swap:        18795          0      18795
>>>>> >> >
>>>>> >> > And the corresponding image: http://i.imgur.com/8ldPVqY.png (~2GB
>>>>> >> > used). Note how 1772 in the image is exactly what is registered
>>>>> >> > under 'used' in the free command output.
>>>>> >>
>>>>> >> I guess you should start looking at the processes running on these
>>>>> >> nodes.
>>>>> >>
>>>>> >> Maybe try to collect memory usage per process using ps?
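>>>>> >>
>>>>> >> For example, something like this (just a sketch), sorted by resident
>>>>> >> size:
>>>>> >>
>>>>> >> $ ps aux --sort=-rss | head -20
>>>>> >> $ ps -eo pid,user,rss,vsz,comm --sort=-rss | head -20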
>>>>> >>
>>>>> >> >
>>>>> >> > On Fri, Jan 22, 2016 at 3:59 PM, Nir Soffer <nsoffer at redhat.com>
>>>>> wrote:
>>>>> >> >>
>>>>> >> >> On Fri, Jan 22, 2016 at 9:25 PM, Charles Kozler <
>>>>> charles at fixflyer.com>
>>>>> >> >> wrote:
>>>>> >> >> > Here is a screenshot of my three nodes and their increased
>>>>> >> >> > memory usage over 30 days. Note that node #2 had a single VM
>>>>> >> >> > with 4GB of RAM assigned to it. I have since shut it down and
>>>>> >> >> > saw no memory reclamation occur. Further, I flushed page caches
>>>>> >> >> > and inodes and ran 'sync'. I tried everything, but nothing
>>>>> >> >> > brought the memory usage down. vdsm was low too (a couple
>>>>> >> >> > hundred MB)
>>>>> >> >>
>>>>> >> >> Note that there is an old leak in vdsm; it will be fixed in the
>>>>> >> >> next 3.6 build:
>>>>> >> >> https://bugzilla.redhat.com/1269424
>>>>> >> >>
>>>>> >> >> > and there was no qemu-kvm process running, so I'm at a loss
>>>>> >> >> >
>>>>> >> >> > http://imgur.com/a/aFPcK
>>>>> >> >> >
>>>>> >> >> > Please advise on what I can do to debug this. Note I have
>>>>> >> >> > restarted node 2 (which is why you see the drop) to see if its
>>>>> >> >> > memory use rises over time even with no VMs running.
>>>>> >> >>
>>>>> >> >> I'm not sure what the "memory" you show in the graphs is.
>>>>> >> >> Theoretically this may be normal memory usage: Linux using free
>>>>> >> >> memory for the buffer cache.
>>>>> >> >>
>>>>> >> >> Can you instead show the output of "free" over one day, maybe run
>>>>> >> >> once per hour?
>>>>> >> >>
>>>>> >> >> You may also like to install sysstat for collecting and monitoring
>>>>> >> >> resource usage.
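>>>>> >> >>
>>>>> >> >> For example (a sketch; adjust paths and intervals as you like):
>>>>> >> >>
>>>>> >> >> $ yum install -y sysstat && service sysstat start
>>>>> >> >> $ sar -r     # memory utilization history collected by sysstat
>>>>> >> >> $ echo '0 * * * * root free -m >> /var/log/free-hourly.log' > /etc/cron.d/free-hourly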
>>>>> >> >>
>>>>> >> >> >
>>>>> >> >> > [compute[root at node2 log]$ rpm -qa | grep -i ovirt
>>>>> >> >> > libgovirt-0.3.2-1.el6.x86_64
>>>>> >> >> > ovirt-release35-006-1.noarch
>>>>> >> >> > ovirt-hosted-engine-ha-1.2.8-1.el6.noarch
>>>>> >> >> > ovirt-hosted-engine-setup-1.2.6.1-1.el6.noarch
>>>>> >> >> > ovirt-engine-sdk-python-3.5.6.0-1.el6.noarch
>>>>> >> >> > ovirt-host-deploy-1.3.2-1.el6.noarch
>>>>> >> >> >
>>>>> >> >> >
>>>>> >> >> > --
>>>>> >> >> >
>>>>> >> >> > Charles Kozler
>>>>> >> >> > Vice President, IT Operations
>>>>> >> >> >
>>>>> >> >> > FIX Flyer, LLC
>>>>> >> >> > 225 Broadway | Suite 1600 | New York, NY 10007
>>>>> >> >> > 1-888-349-3593
>>>>> >> >> > http://www.fixflyer.com
>>>>> >> >> >
>>>>> >> >> >
>>>>> >> >> > _______________________________________________
>>>>> >> >> > Users mailing list
>>>>> >> >> > Users at ovirt.org
>>>>> >> >> > http://lists.ovirt.org/mailman/listinfo/users
>>>>> >> >> >
>>>>> >> >
>>>>> >> >
>>>>> >> >
>>>>> >> >
>>>>> >> > --
>>>>> >> >
>>>>> >> > Charles Kozler
>>>>> >> > Vice President, IT Operations
>>>>> >> >
>>>>> >> > FIX Flyer, LLC
>>>>> >> > 225 Broadway | Suite 1600 | New York, NY 10007
>>>>> >> > 1-888-349-3593
>>>>> >> > http://www.fixflyer.com
>>>>> >> >
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> > --
>>>>> >
>>>>> > Charles Kozler
>>>>> > Vice President, IT Operations
>>>>> >
>>>>> > FIX Flyer, LLC
>>>>> > 225 Broadway | Suite 1600 | New York, NY 10007
>>>>> > 1-888-349-3593
>>>>> > http://www.fixflyer.com
>>>>> >
>>>>> >
>>>>> > _______________________________________________
>>>>> > Users mailing list
>>>>> > Users at ovirt.org
>>>>> > http://lists.ovirt.org/mailman/listinfo/users
>>>>> >
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> *Charles Kozler*
>>>> *Vice President, IT Operations*
>>>>
>>>> FIX Flyer, LLC
>>>> 225 Broadway | Suite 1600 | New York, NY 10007
>>>> 1-888-349-3593
>>>> http://www.fixflyer.com <http://fixflyer.com>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Sandro Bonazzola
>>> Better technology. Faster innovation. Powered by community collaboration.
>>> See how it works at redhat.com
>>>
>>
>
>
> --
>
> *Charles Kozler*
> *Vice President, IT Operations*
>
> FIX Flyer, LLC
> 225 Broadway | Suite 1600 | New York, NY 10007
> 1-888-349-3593
> http://www.fixflyer.com <http://fixflyer.com>
>
>