[ovirt-users] memory leak in 3.5.6 - not vdsm
Simone Tiraboschi
stirabos at redhat.com
Mon Feb 1 17:29:36 UTC 2016
On Mon, Feb 1, 2016 at 5:40 PM, Charles Kozler <charles at fixflyer.com> wrote:
> Sandro / Nir -
>
> I followed your steps plus
>
> http://www.ovirt.org/OVirt_3.6_Release_Notes#Fedora_.2F_CentOS_.2F_RHEL
>
> Engine upgraded fine but then when I got to upgrading a node I did:
>
> $ yum install http://resources.ovirt.org/pub/yum-repo/ovirt-release36.rpm
> $ yum update -y
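(For reference, a quick sanity check at this point is to confirm that the new
repository is enabled and actually offering updates; a sketch:

$ yum repolist enabled | grep -i ovirt
$ yum check-update 'ovirt*' 'vdsm*'
)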
>
> And then rebooted the node. I noticed libvirt was updated by a .1 release
> number, but vdsm (where I thought the memory leak issue was?) was not
> upgraded. In fact, very few of the oVirt packages on the node were
> noticeably updated.
>
>
We are not building vdsm for el6 in 3.6; you also need to upgrade to el7 if
you want that.
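A quick way to check this from the node itself (a sketch):

$ cat /etc/redhat-release               # el6 here means no vdsm 3.6 builds will be offered
$ yum --showduplicates list 'vdsm*'     # lists every vdsm version the enabled repos provide
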
> Updated node received the following updated packages during the install:
>
> http://pastebin.ca/3362714
>
> Note specifically that the only packages updated via the ovirt-3.6 repository
> were ioprocess, otopi, ovirt-engine-sdk-python, ovirt-host-deploy,
> ovirt-release36, and python-ioprocess. I had expected to see packages like
> vdsm and the like updated - or was this not the case?
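For reference, yum records the origin repository of each installed package, so
a check like this (a sketch) shows where a given package actually came from:

$ yumdb info vdsm | grep -i from_repo
$ yumdb info ioprocess | grep -i from_repo
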
>
> Upgraded node:
>
> [compute[root@node02 yum.repos.d]$ rpm -qa | grep -i vdsm
> vdsm-4.16.30-0.el6.x86_64
> vdsm-python-zombiereaper-4.16.30-0.el6.noarch
> vdsm-cli-4.16.30-0.el6.noarch
> vdsm-yajsonrpc-4.16.30-0.el6.noarch
> vdsm-jsonrpc-4.16.30-0.el6.noarch
> vdsm-xmlrpc-4.16.30-0.el6.noarch
> vdsm-python-4.16.30-0.el6.noarch
>
> Non-upgraded node:
>
> [compute[root@node01 ~]$ rpm -qa | grep -i vdsm
> vdsm-cli-4.16.30-0.el6.noarch
> vdsm-jsonrpc-4.16.30-0.el6.noarch
> vdsm-python-zombiereaper-4.16.30-0.el6.noarch
> vdsm-xmlrpc-4.16.30-0.el6.noarch
> vdsm-yajsonrpc-4.16.30-0.el6.noarch
> vdsm-4.16.30-0.el6.x86_64
> vdsm-python-4.16.30-0.el6.noarch
>
> Also, the docs stated that the engine VM would migrate to the freshly
> upgraded node since it would have a higher score, but it did not.
>
> So I can't really confirm whether or not my issue will be resolved, or
> whether the node was actually updated properly.
>
> Please advise on how to confirm
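For what it's worth, two quick checks (a sketch, assuming the hosted-engine
tools are installed on every node):

$ rpm -q vdsm ovirt-hosted-engine-ha    # versions actually installed on the node
$ hosted-engine --vm-status             # which host runs the engine VM, and each host's score
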
>
> Thank you!
>
> On Sat, Jan 23, 2016 at 12:55 AM, Charles Kozler <charles at fixflyer.com>
> wrote:
>
>> Thanks Sandro. I should clarify that my storage is external on a redundant
>> SAN. The steps I was concerned about were for the actual upgrade. I tried to
>> upgrade before and it brought my entire stack crumbling down, so I'm
>> hesitant. This seems like a huge bug that should at least somehow be
>> backported if at all possible because, to me, it renders the entire 3.5.6
>> branch unusable: no VMs can be deployed since OOM will eventually kill them.
>> In any case, that's just my opinion, and I'm a new user to oVirt. The docs I
>> followed originally got me going the way I needed but somehow didn't work
>> for 3.6 in the same fashion, so naturally I'm hesitant to upgrade, but I
>> clearly have no option if I want to continue my infrastructure on oVirt.
>> Thank you again for taking the time to assist me, I truly appreciate it. I
>> will try an upgrade next week and pray it all goes well :-)
>> On Jan 23, 2016 12:40 AM, "Sandro Bonazzola" <sbonazzo at redhat.com> wrote:
>>
>>>
>>>
>>> On Fri, Jan 22, 2016 at 10:53 PM, Charles Kozler <charles at fixflyer.com>
>>> wrote:
>>>
>>>> Sandro -
>>>>
>>>> Do you have documentation available that covers upgrading a self-hosted
>>>> engine setup? I followed this:
>>>> http://community.redhat.com/blog/2014/10/up-and-running-with-ovirt-3-5/
>>>>
>>>> Would it be as easy as installing the RPM and then running yum upgrade?
>>>>
>>>>
>>> Note that the mentioned article describes an unsupported hyperconverged
>>> setup running NFS over Gluster.
>>> That said,
>>> 1) put hosted-engine into global maintenance mode
>>> 2) upgrade the engine VM
>>> 3) select the first host to upgrade and put it under maintenance from
>>> the engine; wait for the engine VM to migrate if needed
>>> 4) yum upgrade the first host and wait until ovirt-ha-agent completes
>>> 5) exit global and local maintenance mode
>>> 6) repeat 3-5 on all the other hosts
>>> 7) once all hosts are updated you can increase the cluster compatibility
>>> level to 3.6. At this point the engine will trigger the auto-import of the
>>> hosted-engine storage domain.
>>>
>>> Simone, Roy, can you confirm the above steps? Maybe you can also update
>>> http://www.ovirt.org/Hosted_Engine_Howto#Upgrade_Hosted_Engine
>>>
>>>
>>>
>>>> Thanks
>>>>
>>>> On Fri, Jan 22, 2016 at 4:42 PM, Sandro Bonazzola <sbonazzo at redhat.com>
>>>> wrote:
>>>>
>>>>>
>>>>> On 22 Jan 2016 at 22:31, "Charles Kozler" <charles at fixflyer.com> wrote:
>>>>> >
>>>>> > Hi Nir -
>>>>> >
>>>>> > Do you have a release target date for 3.5.8? Any estimate would help.
>>>>> >
>>>>>
>>>>> There won't be any supported release after 3.5.6. Please update to
>>>>> 3.6.2 next week
>>>>>
>>>>> > If it's not VDSM, what is it exactly? Sorry, I understood from the
>>>>> > ticket that it was something inside vdsm; was I mistaken?
>>>>> >
>>>>> > The servers are CentOS 6 - 6.7 to be exact.
>>>>> >
>>>>> > I have done all forms of flushing that I can (page cache, inodes,
>>>>> > dentries, etc.) and have also moved VMs around to other nodes, and
>>>>> > nothing changes the memory. How can I find the leak? Where is the leak?
>>>>> > RES shows the following, and the totals don't add up to 20GB:
>>>>> >
>>>>> >   PID USER   PR  NI  VIRT  RES   SHR S %CPU %MEM    TIME+  COMMAND
>>>>> > 19044 qemu   20   0 8876m 4.0g  5680 S  3.6 12.9  1571:44  qemu-kvm
>>>>> > 26143 qemu   20   0 5094m 1.1g  5624 S  9.2  3.7  6012:12  qemu-kvm
>>>>> >  5837 root    0 -20  964m 624m  3664 S  0.0  2.0 85:22.09  glusterfs
>>>>> > 14328 root    0 -20  635m 169m  3384 S  0.0  0.5 43:15.23  glusterfs
>>>>> >  5134 vdsm    0 -20 4368m 111m   10m S  5.9  0.3  3710:50  vdsm
>>>>> >  4095 root   15  -5  727m  43m   10m S  0.0  0.1  0:02.00  supervdsmServer
>>>>> >
>>>>> > 4.0G + 1.1G + 624M + 169M + 111M + 43M = ~6GB
>>>>> >
>>>>> > This was top sorted by RES from highest to lowest
>>>>> >
>>>>> > At that point I wouldn't know where else to look except slab / kernel
>>>>> > structures, of which slab shows:
>>>>> >
>>>>> > [compute[root@node1 ~]$ cat /proc/meminfo | grep -i slab
>>>>> > Slab: 2549748 kB
>>>>> >
>>>>> > So roughly 2-3GB. Adding that to the other ~6GB of use, we still have
>>>>> > more than 10GB unaccounted for
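If you want to see which slab caches are actually holding that memory, slabtop
(from procps) can break it down; a sketch:

$ slabtop -o -s c | head -20    # one-shot dump, sorted by cache size
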
>>>>> >
>>>>> > On Fri, Jan 22, 2016 at 4:24 PM, Nir Soffer <nsoffer at redhat.com>
>>>>> wrote:
>>>>> >>
>>>>> >> On Fri, Jan 22, 2016 at 11:08 PM, Charles Kozler <
>>>>> charles at fixflyer.com> wrote:
>>>>> >> > Hi Nir -
>>>>> >> >
>>>>> >> > Thanks for getting back to me. Will the patch to 3.6 be
>>>>> backported to 3.5?
>>>>> >>
>>>>> >> We plan to include them in 3.5.8.
>>>>> >>
>>>>> >> > As you can tell from the images, it takes days and days for it to
>>>>> >> > increase over time. I also wasn't sure if that was the right bug
>>>>> >> > because VDSM memory shows normal in top ...
>>>>> >> >
>>>>> >> >   PID USER   PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
>>>>> >> >  5134 vdsm    0 -20 4368m 111m  10m S  2.0  0.3  3709:28 vdsm
>>>>> >>
>>>>> >> As you wrote, this issue is not related to vdsm.
>>>>> >>
>>>>> >> >
>>>>> >> > RES is only 111M. This is from node1, which is currently showing 20GB
>>>>> >> > of 32GB used with only 2 VMs running on it - one with 4GB and another
>>>>> >> > with ~1GB of RAM configured
>>>>> >> >
>>>>> >> > The images are from nagios, and the value here corresponds directly to
>>>>> >> > what you would see in the free command output. See below for an example
>>>>> >> > from node 1 and node 2
>>>>> >> >
>>>>> >> > [compute[root@node1 ~]$ free
>>>>> >> >              total       used       free     shared    buffers     cached
>>>>> >> > Mem:      32765316   20318156   12447160        252      30884     628948
>>>>> >> > -/+ buffers/cache:   19658324   13106992
>>>>> >> > Swap:     19247100          0   19247100
>>>>> >> > [compute[root@node1 ~]$ free -m
>>>>> >> >              total       used       free     shared    buffers     cached
>>>>> >> > Mem:         31997      19843      12153          0         30        614
>>>>> >> > -/+ buffers/cache:      19199      12798
>>>>> >> > Swap:        18795          0      18795
>>>>> >> >
>>>>> >> > And the corresponding image: http://i.imgur.com/PZLEgyx.png (~19GB used)
>>>>> >> >
>>>>> >> > And as a control, node 2 that I just restarted today
>>>>> >> >
>>>>> >> > [compute[root@node2 ~]$ free
>>>>> >> >              total       used       free     shared    buffers     cached
>>>>> >> > Mem:      32765316    1815324   30949992        212      35784     717320
>>>>> >> > -/+ buffers/cache:    1062220   31703096
>>>>> >> > Swap:     19247100          0   19247100
>>>>> >>
>>>>> >> Is this rhel/centos 6?
>>>>> >>
>>>>> >> > [compute[root@node2 ~]$ free -m
>>>>> >> >              total       used       free     shared    buffers     cached
>>>>> >> > Mem:         31997       1772      30225          0         34        700
>>>>> >> > -/+ buffers/cache:       1036      30960
>>>>> >> > Swap:        18795          0      18795
>>>>> >> >
>>>>> >> > And the corresponding image: http://i.imgur.com/8ldPVqY.png (~2GB
>>>>> >> > used). Note how 1772 in the image is exactly what is registered under
>>>>> >> > 'used' in the free command output.
>>>>> >>
>>>>> >> I guess you should start looking at the processes running on these
>>>>> nodes.
>>>>> >>
>>>>> >> Maybe try to collect memory usage per process using ps?
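Something along these lines (a sketch) should give that picture, sorted by
resident memory, plus a rough RSS total to compare against what free reports:

$ ps -eo pid,user,rss,vsz,comm --sort=-rss | head -20
$ ps -eo rss= | awk '{sum += $1} END {printf "total RSS: %.1f GiB\n", sum/1024/1024}'
  # note: shared pages are counted more than once in this total
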
>>>>> >>
>>>>> >> >
>>>>> >> > On Fri, Jan 22, 2016 at 3:59 PM, Nir Soffer <nsoffer at redhat.com>
>>>>> wrote:
>>>>> >> >>
>>>>> >> >> On Fri, Jan 22, 2016 at 9:25 PM, Charles Kozler <
>>>>> charles at fixflyer.com>
>>>>> >> >> wrote:
>>>>> >> >> > Here is a screenshot of my three nodes and their increased memory
>>>>> >> >> > usage over 30 days. Note that node #2 had a single VM with 4GB of RAM
>>>>> >> >> > assigned to it. I have since shut it down and saw no memory
>>>>> >> >> > reclamation occur. Further, I flushed page caches and inodes and ran
>>>>> >> >> > 'sync'. I tried everything, but nothing brought the memory usage
>>>>> >> >> > down. vdsm was low too (a couple hundred MB)
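(For reference, the flushing described above is typically done along these
lines; a sketch, run as root, and note it can only drop clean caches:

$ sync
$ echo 3 > /proc/sys/vm/drop_caches    # 3 = pagecache plus dentries and inodes
)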
>>>>> >> >>
>>>>> >> >> Note that there is an old leak in vdsm, which will be fixed in the
>>>>> >> >> next 3.6 build:
>>>>> >> >> https://bugzilla.redhat.com/1269424
>>>>> >> >>
>>>>> >> >> > and there was no qemu-kvm process running so I'm at a loss
>>>>> >> >> >
>>>>> >> >> > http://imgur.com/a/aFPcK
>>>>> >> >> >
>>>>> >> >> > Please advise on what I can do to debug this. Note I have restarted
>>>>> >> >> > node 2 (which is why you see the drop) to see if it rises in memory
>>>>> >> >> > use over time even with no VMs running
>>>>> >> >>
>>>>> >> >> Not sure what the "memory" that you show in the graphs is.
>>>>> >> >> Theoretically this may be normal memory usage, with Linux using free
>>>>> >> >> memory for the buffer cache.
>>>>> >> >>
>>>>> >> >> Can you instead show the output of "free" during one day, maybe run
>>>>> >> >> once per hour?
>>>>> >> >>
>>>>> >> >> You may also like to install sysstat for collecting and monitoring
>>>>> >> >> resource usage.
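For example, something like this (a sketch; the log path below is just an
illustration, and sysstat's own cron job starts collecting once the service is
enabled):

$ yum install -y sysstat && chkconfig sysstat on && service sysstat start
$ echo '0 * * * * root free -m >> /var/log/free-hourly.log' >> /etc/crontab
$ sar -r        # later, to review the collected memory history
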
>>>>> >> >>
>>>>> >> >> >
>>>>> >> >> > [compute[root@node2 log]$ rpm -qa | grep -i ovirt
>>>>> >> >> > libgovirt-0.3.2-1.el6.x86_64
>>>>> >> >> > ovirt-release35-006-1.noarch
>>>>> >> >> > ovirt-hosted-engine-ha-1.2.8-1.el6.noarch
>>>>> >> >> > ovirt-hosted-engine-setup-1.2.6.1-1.el6.noarch
>>>>> >> >> > ovirt-engine-sdk-python-3.5.6.0-1.el6.noarch
>>>>> >> >> > ovirt-host-deploy-1.3.2-1.el6.noarch
>>>>> >> >> >
>>>>> >> >> >
>>>>> >> >> > --
>>>>> >> >> >
>>>>> >> >> > Charles Kozler
>>>>> >> >> > Vice President, IT Operations
>>>>> >> >> >
>>>>> >> >> > FIX Flyer, LLC
>>>>> >> >> > 225 Broadway | Suite 1600 | New York, NY 10007
>>>>> >> >> > 1-888-349-3593
>>>>> >> >> > http://www.fixflyer.com
>>>>> >> >> >
>>>>> >> >> >
>>>>> >> >
>>>>> >> >
>>>>> >> >
>>>>> >> >
>>>>> >> > --
>>>>> >> >
>>>>> >> > Charles Kozler
>>>>> >> > Vice President, IT Operations
>>>>> >> >
>>>>> >> > FIX Flyer, LLC
>>>>> >> > 225 Broadway | Suite 1600 | New York, NY 10007
>>>>> >> > 1-888-349-3593
>>>>> >> > http://www.fixflyer.com
>>>>> >> >
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> > --
>>>>> >
>>>>> > Charles Kozler
>>>>> > Vice President, IT Operations
>>>>> >
>>>>> > FIX Flyer, LLC
>>>>> > 225 Broadway | Suite 1600 | New York, NY 10007
>>>>> > 1-888-349-3593
>>>>> > http://www.fixflyer.com
>>>>> >
>>>>> >
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> *Charles Kozler*
>>>> *Vice President, IT Operations*
>>>>
>>>> FIX Flyer, LLC
>>>> 225 Broadway | Suite 1600 | New York, NY 10007
>>>> 1-888-349-3593
>>>> http://www.fixflyer.com
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Sandro Bonazzola
>>> Better technology. Faster innovation. Powered by community collaboration.
>>> See how it works at redhat.com
>>>
>>
>
>
> --
>
> *Charles Kozler*
> *Vice President, IT Operations*
>
> FIX Flyer, LLC
> 225 Broadway | Suite 1600 | New York, NY 10007
> 1-888-349-3593
> http://www.fixflyer.com
>
>