Hello Dan,
I just opened a BZ against this [1]. Please let me know if I can be of
further assistance. I stopped the cron script for now, so vdsm can 'grow
nicely' (about 91.5 MB per 6 hours).
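For reference, a minimal sketch of how such a growth rate can be logged: it
just samples VmRSS from /proc for a given pid at a fixed interval. The pid is
taken from the command line and the 5-minute interval is an arbitrary choice -
this is only an illustration, not the cron script mentioned above.

#!/usr/bin/env python
# rss_log.py - log the resident set size of one process over time
# (a sketch for measuring the leak rate, not part of vdsm).
import sys
import time

INTERVAL = 300  # seconds between samples; pick whatever resolution you need


def rss_kb(pid):
    """Return the resident set size of `pid` in kB, from /proc/<pid>/status."""
    with open('/proc/%d/status' % pid) as f:
        for line in f:
            if line.startswith('VmRSS:'):
                return int(line.split()[1])  # the value is reported in kB
    raise RuntimeError('VmRSS not found for pid %d' % pid)


def main():
    pid = int(sys.argv[1])  # e.g. the pid of the big vdsm process
    while True:
        print('%s %d kB' % (time.strftime('%Y-%m-%d %H:%M:%S'), rss_kb(pid)))
        sys.stdout.flush()
        time.sleep(INTERVAL)


if __name__ == '__main__':
    main()

Run it as e.g. `python rss_log.py <vdsm-pid> >> /var/tmp/vdsm-rss.log` (the
log path is just an example) and the rate falls out of the first and last
lines of the log.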
Cheers,
[1]
https://bugzilla.redhat.com/show_bug.cgi?id=1147148
On 24.09.2014 19:20, Dan Kenigsberg wrote:
> On 01.09.2014 18:08, Dan Kenigsberg wrote:
>> On Mon, Sep 01, 2014 at 03:30:53PM +0000, Daniel Helgenberger wrote:
>>> Hello,
>>>
>>> in my LAB cluster I run into OOM conditions frequently because of a huge
>>> VDSM process. The memory stats from my nodes right now:
>>>
>>> Node A; running one VM:
>>>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
>>>  3465 vdsm   0 -20 18.6g 9.8g 8244 S 32.1 50.3  27265:21 vdsm
>>>  7439 qemu  20   0 5641m 4.1g 4280 S 22.9 20.9  12737:08 qemu-kvm
>>>  2912 root  15  -5 2710m  35m 5968 S  0.0  0.2   0:04.76 supervdsmServer
>>>
>>> Node B, running 3 VMs including HostedEngine:
>>>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
>>>  9079 vdsm   0 -20  9.9g 5.0g 7496 S 49.7 43.0  11858:06 vdsm
>>>  3347 qemu  20   0 7749m 1.8g 5264 S  4.3 15.8   3:25.71 qemu-kvm
>>> 18463 qemu  20   0 3865m 415m 5516 R  1.6  3.5 359:15.24 qemu-kvm
>>> 11755 qemu  20   0 3873m 276m 5336 S 80.5  2.3  21639:39 qemu-kvm
>>>
>>> Basically, VDSM consumes more than all my VMs combined.
>>>
>>> I thought of VDSM as a 'supervisor' process for qemu-kvm?
>>>
>>> I attached recent vdsm logs as well as a screenshot.
>> Thanks for your report. It sounds like
>>
>> Bug 1130045 - Very high memory consumption
>>
>> which we believe is due to python-ioprocess.
On Wed, Sep 24, 2014 at 03:39:25PM +0000, Daniel Helgenberger wrote:
> Hi Dan,
>
> Just to get this right: you yourself pointed me to the BZ. I was looking
> at the duplicate, since the metadata tags 3.4. Sorry for my lack of
> knowledge; I really have no idea whether there is an ioprocess Python
> binding in 3.4 or not - I just see the vdsmd resident size growing in 3.4.
> The top output below was from 3.4.3; I just upgraded to 3.4.4. But
> clearly vdsmd should not use 10 GB of RAM?
I'm sorry to have misled you. The bug I referred to was indeed due to
ioprocess, and caused a very dramatic memory leak in 3.5. We have yet
another memory leak in 3.5.0, when managing gluster blocks:

Bug 1142647 - supervdsm leaks memory when using glusterfs

3.4.z does not use ioprocess and does not have that gluster bug, so you
are seeing something completely different and much older.
These leaks are not so easy to debug - but they are important. I'd love
it if you opened a BZ about it. Please specify the rate of the leak and
when it happens (when a host has VMs? when a host is polled by Engine?
On NFS? On iSCSI?).
What's `lsof -p pid-of-fat-vdsm`?
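If you are collecting data for the BZ, one more thing that may help alongside
the lsof output: a rough per-type count of the open file descriptors, taken a
few hours apart, shows whether the fd count grows together with the resident
size. A quick sketch of what I mean (just an illustration, not a vdsm tool;
the pid is passed on the command line, and it has to run as root to read
another user's /proc/<pid>/fd):

#!/usr/bin/env python
# fd_summary.py - count a process's open file descriptors by coarse type,
# read from /proc/<pid>/fd (a data-gathering sketch, not part of vdsm).
import os
import sys
from collections import Counter


def bucket(target):
    # 'socket:[12345]' -> 'socket', 'pipe:[678]' -> 'pipe',
    # '/rhev/data-center/...' -> '/rhev', '/var/log/...' -> '/var'
    if target.startswith('/'):
        return '/' + target.split('/')[1]
    return target.split(':')[0]


def main():
    pid = int(sys.argv[1])  # pid of the fat vdsm process
    fd_dir = '/proc/%d/fd' % pid
    counts = Counter()
    for fd in os.listdir(fd_dir):
        try:
            counts[bucket(os.readlink(os.path.join(fd_dir, fd)))] += 1
        except OSError:
            pass  # the fd was closed between listdir() and readlink()
    for kind, n in counts.most_common():
        print('%6d  %s' % (n, kind))


if __name__ == '__main__':
    main()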
I think that Francesco has some debug patches to help nail it down - he
can provide smarter questions.
Dan.
--
Daniel Helgenberger
m box bewegtbild GmbH
P: +49/30/2408781-22
F: +49/30/2408781-10
ACKERSTR. 19
D-10115 BERLIN
www.m-box.de www.monkeymen.tv
Managing directors: Martin Retschitzegger / Michaela Göllner
Commercial register: Amtsgericht Charlottenburg / HRB 112767