On Mon, Feb 1, 2016 at 6:33 PM, Charles Kozler <charles(a)fixflyer.com> wrote:
So what about the vdsm bug that I hit, as listed above by Nir? Will I
get that patch to avoid the memory leak or not? Upgrading an entire node
to CentOS 7 is not actually feasible, and it was outlined above that I
just needed to upgrade to oVirt 3.6, with no mention of an OS change ...
On Feb 1, 2016 12:30 PM, "Simone Tiraboschi" <stirabos(a)redhat.com> wrote:
>
>
> On Mon, Feb 1, 2016 at 5:40 PM, Charles Kozler <charles(a)fixflyer.com>
> wrote:
>
>> Sandro / Nir -
>>
>> I followed your steps plus
>> http://www.ovirt.org/OVirt_3.6_Release_Notes#Fedora_.2F_CentOS_.2F_RHEL
>>
>> The engine upgraded fine, but when I got to upgrading a node I did:
>>
>> $ yum install http://resources.ovirt.org/pub/yum-repo/ovirt-release36.rpm
>> $ yum update -y
>>
>> And then rebooted the node. I noticed libvirt was updated by a .1
>> release number, but vdsm (where I thought the memory leak issue was?)
>> was not upgraded. In fact, noticeably few of the oVirt packages on the
>> node were updated.
>>
>>
> We are not building vdsm for el6 in 3.6; you also need to upgrade to el7
> if you want that.
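>
> For what it's worth, a quick way to see what the repo can actually give
> you (a sketch; assumes the ovirt-release36 repo is enabled and plain yum):
>
> $ cat /etc/redhat-release
> $ yum list available vdsm --showduplicates
>
> On el6 you shouldn't see any 3.6 vdsm builds listed.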
>
>
>> The updated node received the following packages during the install:
>>
>> http://pastebin.ca/3362714
>>
>> Note specifically that the only packages updated via the ovirt-3.6
>> repository were ioprocess, otopi, ovirt-engine-sdk-python,
>> ovirt-host-deploy, ovirt-release36, and python-ioprocess. I had expected
>> to see packages like vdsm updated as well - or was that not the case?
>>
>> Upgraded node:
>>
>> [compute[root@node02 yum.repos.d]$ rpm -qa | grep -i vdsm
>> vdsm-4.16.30-0.el6.x86_64
>> vdsm-python-zombiereaper-4.16.30-0.el6.noarch
>> vdsm-cli-4.16.30-0.el6.noarch
>> vdsm-yajsonrpc-4.16.30-0.el6.noarch
>> vdsm-jsonrpc-4.16.30-0.el6.noarch
>> vdsm-xmlrpc-4.16.30-0.el6.noarch
>> vdsm-python-4.16.30-0.el6.noarch
>>
>> Non-upgraded node:
>>
>> [compute[root@node01 ~]$ rpm -qa | grep -i vdsm
>> vdsm-cli-4.16.30-0.el6.noarch
>> vdsm-jsonrpc-4.16.30-0.el6.noarch
>> vdsm-python-zombiereaper-4.16.30-0.el6.noarch
>> vdsm-xmlrpc-4.16.30-0.el6.noarch
>> vdsm-yajsonrpc-4.16.30-0.el6.noarch
>> vdsm-4.16.30-0.el6.x86_64
>> vdsm-python-4.16.30-0.el6.noarch
>>
>> Also, the docs stated that the engine VM would migrate to the freshly
>> upgraded node since it would have a higher number, but it did not.
>>
>> So I can't really confirm whether my issue will be resolved, or whether
>> the node was actually updated properly.
>>
>> Please advise on how to confirm.
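>>
>> Would something like this be enough to confirm it (just plain rpm/yum,
>> my guess at a check)?
>>
>> $ rpm -q vdsm
>> $ yum list updates 'vdsm*' 'ovirt*'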
>>
>> Thank you!
>>
>> On Sat, Jan 23, 2016 at 12:55 AM, Charles Kozler <charles(a)fixflyer.com>
>> wrote:
>>
>>> Thanks Sandro. I should clarify that my storage is external on a
>>> redundant SAN. The steps I was concerned about were for the actual
>>> upgrade. I tried to upgrade before and it brought my entire stack
>>> crumbling down, so I'm hesitant. This seems like a huge bug that should
>>> somehow be backported if at all possible because, to me, it renders the
>>> entire 3.5.6 branch unusable, as no VMs can be deployed since OOM will
>>> eventually kill them. In any case, that's just my opinion, and I'm a new
>>> user to oVirt. The docs I followed originally got me going how I needed,
>>> yet somehow didn't work for 3.6 in the same fashion, so naturally I'm
>>> hesitant to upgrade, but I clearly have no option if I want to continue
>>> my infrastructure on oVirt. Thank you again for taking the time to
>>> assist me; I truly appreciate it. I will try an upgrade next week and
>>> pray it all goes well :-)
>>> On Jan 23, 2016 12:40 AM, "Sandro Bonazzola" <sbonazzo(a)redhat.com> wrote:
>>>
>>>>
>>>>
>>>> On Fri, Jan 22, 2016 at 10:53 PM, Charles Kozler <charles(a)fixflyer.com> wrote:
>>>>
>>>>> Sandro -
>>>>>
>>>>> Do you have documentation available that covers upgrading a
>>>>> self-hosted engine? I followed this:
>>>>> http://community.redhat.com/blog/2014/10/up-and-running-with-ovirt-3-5/
>>>>>
>>>>> Would it be as easy as installing the RPM and then running yum
>>>>> upgrade?
>>>>>
>>>>>
>>>> Note that the mentioned article describes an unsupported hyperconverged
>>>> setup running NFS over Gluster.
>>>> That said:
>>>> 1) put the hosted-engine deployment into global maintenance mode
>>>> 2) upgrade the engine VM
>>>> 3) select the first host to upgrade and put it into maintenance from
>>>> the engine; wait for the engine VM to migrate if needed
>>>> 4) yum upgrade the first host and wait until ovirt-ha-agent completes
>>>> 5) exit global and local maintenance mode
>>>> 6) repeat 3-5 on all the other hosts
>>>> 7) once all hosts are updated, you can increase the cluster
>>>> compatibility level to 3.6; at this point the engine will trigger the
>>>> auto-import of the hosted-engine storage domain
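>>>>
>>>> A rough command-level sketch of the above (illustrative only; host
>>>> maintenance in step 3 is done from the webadmin UI):
>>>>
>>>> # step 1, on one host: enter global maintenance
>>>> $ hosted-engine --set-maintenance --mode=global
>>>> # step 2, on the engine VM: add the 3.6 repo and upgrade the engine
>>>> $ yum install http://resources.ovirt.org/pub/yum-repo/ovirt-release36.rpm
>>>> $ yum update "ovirt-engine-setup*" && engine-setup
>>>> # step 4, on each host while it is in maintenance
>>>> $ yum update
>>>> # step 5, when done
>>>> $ hosted-engine --set-maintenance --mode=none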
>>>>
>>>> Simone, Roy, can you confirm the above steps? Maybe you can also
>>>> update http://www.ovirt.org/Hosted_Engine_Howto#Upgrade_Hosted_Engine
>>>>
>>>>
>>>>
>>>>> Thanks
>>>>>
>>>>> On Fri, Jan 22, 2016 at 4:42 PM, Sandro Bonazzola <sbonazzo(a)redhat.com> wrote:
>>>>>
>>>>>>
>>>>>> On 22/Jan/2016 22:31, "Charles Kozler" <charles(a)fixflyer.com> wrote:
>>>>>> >
>>>>>> > Hi Nir -
>>>>>> >
>>>>>> > do you have a release target date for 3.5.8? Any estimate would help.
>>>>>> >
>>>>>>
>>>>>> There won't be any supported release after 3.5.6. Please update to
>>>>>> 3.6.2 next week.
>>>>>>
>>>>>> > If it's not VDSM, what is it exactly? Sorry, I understood from
>>>>>> > the ticket it was something inside vdsm; was I mistaken?
>>>>>> >
>>>>>> > The servers are CentOS 6, 6.7 to be exact.
>>>>>> >
>>>>>> > I have done all the flushing that I can (page cache, inodes,
>>>>>> > dentries, etc.) and have also moved VMs around to other nodes, and
>>>>>> > nothing changes the memory. How can I find the leak? Where is the
>>>>>> > leak? RES shows the following, and the totals don't add up to 20GB:
>>>>>> >
>>>>>> >   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
>>>>>> > 19044 qemu  20   0 8876m 4.0g 5680 S  3.6 12.9  1571:44 qemu-kvm
>>>>>> > 26143 qemu  20   0 5094m 1.1g 5624 S  9.2  3.7  6012:12 qemu-kvm
>>>>>> >  5837 root   0 -20  964m 624m 3664 S  0.0  2.0 85:22.09 glusterfs
>>>>>> > 14328 root   0 -20  635m 169m 3384 S  0.0  0.5 43:15.23 glusterfs
>>>>>> >  5134 vdsm   0 -20 4368m 111m  10m S  5.9  0.3  3710:50 vdsm
>>>>>> >  4095 root  15  -5  727m  43m  10m S  0.0  0.1  0:02.00 supervdsmServer
>>>>>> >
>>>>>> > 4.0G + 1.1G + 624M + 169M + 111M + 43M = ~7GB
>>>>>> >
>>>>>> > This was top sorted by RES from highest to lowest
>>>>>> >
>>>>>> > At that point I wouldn't know where else to look except slab /
>>>>>> > kernel structures, of which slab shows:
>>>>>> >
>>>>>> > [compute[root@node1 ~]$ cat /proc/meminfo | grep -i slab
>>>>>> > Slab: 2549748 kB
>>>>>> >
>>>>>> > So roughly 2-3GB. Adding that to the other 7GB of use, we still
>>>>>> > have about 10GB unaccounted for.
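>>>>>> >
>>>>>> > (To rule out process memory completely, I can sum RSS over every
>>>>>> > process - a rough number, since RSS double-counts shared pages:
>>>>>> >
>>>>>> > $ ps -eo rss= | awk '{s+=$1} END {print s/1024 " MB"}'
>>>>>> >
>>>>>> > and slabtop -o | head can break the slab figure down further.)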
>>>>>> >
>>>>>> > On Fri, Jan 22, 2016 at 4:24 PM, Nir Soffer <nsoffer(a)redhat.com> wrote:
>>>>>> >>
>>>>>> >> On Fri, Jan 22, 2016 at 11:08 PM, Charles Kozler <charles(a)fixflyer.com> wrote:
>>>>>> >> > Hi Nir -
>>>>>> >> >
>>>>>> >> > Thanks for getting back to me. Will the patch to 3.6 be
>>>>>> >> > backported to 3.5?
>>>>>> >>
>>>>>> >> We plan to include them in 3.5.8.
>>>>>> >>
>>>>>> >> > As you can tell from the images, it takes days and days for it
>>>>>> >> > to increase over time. I also wasn't sure if that was the right
>>>>>> >> > bug, because VDSM memory shows as normal in top ...
>>>>>> >> >
>>>>>> >> >  PID USER PR  NI  VIRT  RES SHR S %CPU %MEM   TIME+ COMMAND
>>>>>> >> > 5134 vdsm  0 -20 4368m 111m 10m S  2.0  0.3 3709:28 vdsm
>>>>>> >>
>>>>>> >> As you wrote, this issue is not related to vdsm.
>>>>>> >>
>>>>>> >> >
>>>>>> >> > RES is only 111M. This is from node1, which is currently
>>>>>> >> > showing 20GB of 32GB used with only 2 VMs running on it - 1
>>>>>> >> > with 4GB and another with ~1GB of RAM configured.
>>>>>> >> >
>>>>>> >> > The images are from Nagios, and the value here correlates
>>>>>> >> > directly with what you would see in the free command output.
>>>>>> >> > See below for an example from node 1 and node 2:
>>>>>> >> >
>>>>>> >> > [compute[root@node1 ~]$ free
>>>>>> >> >              total       used       free     shared    buffers     cached
>>>>>> >> > Mem:      32765316   20318156   12447160        252      30884     628948
>>>>>> >> > -/+ buffers/cache:   19658324   13106992
>>>>>> >> > Swap:     19247100          0   19247100
>>>>>> >> > [compute[root@node1 ~]$ free -m
>>>>>> >> >              total       used       free     shared    buffers     cached
>>>>>> >> > Mem:         31997      19843      12153          0         30        614
>>>>>> >> > -/+ buffers/cache:      19199      12798
>>>>>> >> > Swap:        18795          0      18795
>>>>>> >> >
>>>>>> >> > And its corresponding image: http://i.imgur.com/PZLEgyx.png
>>>>>> >> > (~19GB used)
>>>>>> >> >
>>>>>> >> > And as a control, node 2, which I just restarted today:
>>>>>> >> >
>>>>>> >> > [compute[root@node2 ~]$ free
>>>>>> >> >              total       used       free     shared    buffers     cached
>>>>>> >> > Mem:      32765316    1815324   30949992        212      35784     717320
>>>>>> >> > -/+ buffers/cache:    1062220   31703096
>>>>>> >> > Swap:     19247100          0   19247100
>>>>>> >>
>>>>>> >> Is this RHEL/CentOS 6?
>>>>>> >>
>>>>>> >> > [compute[root@node2 ~]$ free -m
>>>>>> >> >              total       used       free     shared    buffers     cached
>>>>>> >> > Mem:         31997       1772      30225          0         34        700
>>>>>> >> > -/+ buffers/cache:       1036      30960
>>>>>> >> > Swap:        18795          0      18795
>>>>>> >> >
>>>>>> >> > And its corresponding image: http://i.imgur.com/8ldPVqY.png
>>>>>> >> > (~2GB used). Note how the 1772 in the image is exactly what is
>>>>>> >> > registered under 'used' in the free output.
>>>>>> >>
>>>>>> >> I guess you should start looking at the processes running on
>>>>>> >> these nodes.
>>>>>> >>
>>>>>> >> Maybe try to collect memory usage per process using ps?
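>>>>>> >>
>>>>>> >> Something like this (plain GNU ps, nothing oVirt-specific), run
>>>>>> >> once an hour, should show whether any single process is growing:
>>>>>> >>
>>>>>> >> $ ps -eo pid,user,rss,comm --sort=-rss | head -20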
>>>>>> >>
>>>>>> >> >
>>>>>> >> > On Fri, Jan 22, 2016 at 3:59 PM, Nir Soffer <nsoffer(a)redhat.com> wrote:
>>>>>> >> >>
>>>>>> >> >> On Fri, Jan 22, 2016 at 9:25 PM, Charles Kozler <charles(a)fixflyer.com> wrote:
>>>>>> >> >> > Here is a screenshot of my three nodes and their increasing
>>>>>> >> >> > memory usage over 30 days. Note that node #2 had a single VM
>>>>>> >> >> > with 4GB of RAM assigned to it. I have since shut it down and
>>>>>> >> >> > saw no memory reclamation occur. Further, I flushed page
>>>>>> >> >> > caches and inodes and ran 'sync'. I tried everything, but
>>>>>> >> >> > nothing brought the memory usage down. vdsm was low too (a
>>>>>> >> >> > couple hundred MB)
>>>>>> >> >>
>>>>>> >> >> Note that there is an old leak in vdsm; it will be fixed in
>>>>>> >> >> the next 3.6 build:
>>>>>> >> >> https://bugzilla.redhat.com/1269424
>>>>>> >> >>
>>>>>> >> >> > and there was no qemu-kvm process running, so I'm at a loss.
>>>>>> >> >> >
>>>>>> >> >> > http://imgur.com/a/aFPcK
>>>>>> >> >> >
>>>>>> >> >> > Please advise on what I can do to debug this. Note I have
>>>>>> >> >> > restarted node 2 (which is why you see the drop) to see if
>>>>>> >> >> > its memory use rises over time even with no VMs running.
>>>>>> >> >>
>>>>>> >> >> I'm not sure what the "memory" that you show in the graphs
>>>>>> >> >> is. Theoretically this may be normal memory usage: Linux
>>>>>> >> >> using free memory for the buffer cache.
>>>>>> >> >>
>>>>>> >> >> Can you instead show the output of "free" during one day,
>>>>>> >> >> maybe run once per hour?
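>>>>>> >> >>
>>>>>> >> >> For example, a simple loop along these lines (any equivalent
>>>>>> >> >> scheduling works just as well):
>>>>>> >> >>
>>>>>> >> >> $ while true; do date; free -m; sleep 3600; done >> free.log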
>>>>>> >> >>
>>>>>> >> >> You may also like to install sysstat for collecting and
>>>>>> >> >> monitoring resource usage.
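>>>>>> >> >>
>>>>>> >> >> (e.g. yum install sysstat, and later "sar -r" to review the
>>>>>> >> >> collected memory history.)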
>>>>>> >> >>
>>>>>> >> >> >
>>>>>> >> >> > [compute[root@node2 log]$ rpm -qa | grep -i ovirt
>>>>>> >> >> > libgovirt-0.3.2-1.el6.x86_64
>>>>>> >> >> > ovirt-release35-006-1.noarch
>>>>>> >> >> > ovirt-hosted-engine-ha-1.2.8-1.el6.noarch
>>>>>> >> >> > ovirt-hosted-engine-setup-1.2.6.1-1.el6.noarch
>>>>>> >> >> > ovirt-engine-sdk-python-3.5.6.0-1.el6.noarch
>>>>>> >> >> > ovirt-host-deploy-1.3.2-1.el6.noarch
>>>>>> >> >> >
>>>>>> >> >> >
>>>>>> >> >
>>>>>> >> >
>>>>>> >> >
>>>>>> >> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Sandro Bonazzola
>>>> Better technology. Faster innovation. Powered by community
>>>> collaboration.
>>>> See how it works at redhat.com
>>>>
>>>
>>
>>
>
>