
I've seen vdsmd leak memory (RSS increasing) for a while (brought it up on the lists and opened a BZ ticket), and never gotten anywhere with diagnosing or resolving it. I reinstalled my dev setup Friday with up-to-date CentOS 7 (minimal install) and oVirt 4.3, with a hosted engine on iSCSI (multipath if it matters).

In just 3 days, vdsmd on the host with the engine has gone up to an RSS of 481 MB, and it just continues to steadily increase. Watching with a script, I see (this is VmRSS from /proc/$(pidof -x vdsmd)/status):

12:26:32.892 482076 +20
12:26:35.300 482096 +20
12:26:38.927 482112 +16
12:26:40.034 482128 +16
12:26:47.534 482132 +4
12:26:48.887 482144 +12
12:26:49.133 482156 +12
12:26:50.955 482172 +16
12:26:53.062 482176 +4
12:26:53.092 482204 +28
12:26:59.065 482212 +8
12:26:59.075 482228 +16
12:26:59.361 482244 +16
12:27:03.131 482252 +8
12:27:07.370 482256 +4
12:27:10.091 482272 +16
12:27:13.205 482296 +24
12:27:18.770 482308 +12
12:27:20.437 482332 +24
12:27:23.313 482340 +8
12:27:23.324 482364 +24
12:27:26.667 482372 +8
12:27:26.687 482376 +4
12:27:28.873 482388 +12
12:27:28.883 482392 +4
12:27:28.976 482396 +4
12:27:29.190 482408 +12

That's an increase of 352 kB in a minute.

There's got to be some way to diagnose this, but I don't know python well enough.

-- Chris Adams <cma@cmadams.net>
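The watcher script itself isn't included in the thread; a minimal sketch of that kind of watcher (assuming all it does is poll VmRSS from /proc/<pid>/status and print timestamped deltas, as the output above suggests; the helper names are just for illustration) could look like this:

    #!/usr/bin/env python
    # Sketch of an RSS watcher (assumed, not the original script): polls
    # VmRSS of the vdsmd process and prints timestamped deltas in kB.
    import subprocess
    import time
    from datetime import datetime

    def vdsmd_pid():
        # Same lookup as 'pidof -x vdsmd' used in the message above.
        return subprocess.check_output(["pidof", "-x", "vdsmd"]).split()[0].decode()

    def vmrss_kb(pid):
        # VmRSS is reported by the kernel in kB.
        with open("/proc/%s/status" % pid) as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])
        raise RuntimeError("VmRSS not found")

    def main():
        pid = vdsmd_pid()
        last = vmrss_kb(pid)
        while True:
            time.sleep(1)
            cur = vmrss_kb(pid)
            if cur != last:
                now = datetime.now().strftime("%H:%M:%S.%f")[:-3]
                print("%s %d %+d" % (now, cur, cur - last))
                last = cur

    if __name__ == "__main__":
        main()

Run on the host, its output format matches the samples above (time, VmRSS in kB, delta).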

On Mon, Dec 16, 2019 at 7:41 PM Chris Adams <cma@cmadams.net> wrote:
I've seen vdsmd leak memory (RSS increasing) for a while (brought it up on the lists and opened a BZ ticket), and never gotten anywhere with diagnosing or resolving it. I reinstalled my dev setup Friday with up-to-date CentOS 7 (minimal install) and oVirt 4.3, with a hosted engine on iSCSI (multipath if it matters).
Adding +Martin Perina <mperina@redhat.com> and +Milan Zamazal <mzamazal@redhat.com> for awareness
In just 3 days, vdsmd on the host with the engine has gone up to an RSS of 481 MB. It just continues to steadily increase. Watching with a script, I see (this is VmRSS from /proc/$(pidof -x vdsmd)/status):
12:26:32.892 482076 +20
12:26:35.300 482096 +20
12:26:38.927 482112 +16
12:26:40.034 482128 +16
12:26:47.534 482132 +4
12:26:48.887 482144 +12
12:26:49.133 482156 +12
12:26:50.955 482172 +16
12:26:53.062 482176 +4
12:26:53.092 482204 +28
12:26:59.065 482212 +8
12:26:59.075 482228 +16
12:26:59.361 482244 +16
12:27:03.131 482252 +8
12:27:07.370 482256 +4
12:27:10.091 482272 +16
12:27:13.205 482296 +24
12:27:18.770 482308 +12
12:27:20.437 482332 +24
12:27:23.313 482340 +8
12:27:23.324 482364 +24
12:27:26.667 482372 +8
12:27:26.687 482376 +4
12:27:28.873 482388 +12
12:27:28.883 482392 +4
12:27:28.976 482396 +4
12:27:29.190 482408 +12
That's an increase of 352 kB in a minute.
There's got to be some way to diagnose this, but I don't know python well enough.
-- Chris Adams <cma@cmadams.net>
--
Sandro Bonazzola
MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
Red Hat EMEA <https://www.redhat.com/>
sbonazzo@redhat.com

Red Hat respects your work life balance. Therefore there is no need to answer this email out of your office hours.

Once upon a time, Sandro Bonazzola <sbonazzo@redhat.com> said:
On Mon, Dec 16, 2019 at 7:41 PM Chris Adams <cma@cmadams.net> wrote:
I've seen vdsmd leak memory (RSS increasing) for a while (brought it up on the lists and opened a BZ ticket), and never gotten anywhere with diagnosing or resolving it. I reinstalled my dev setup Friday with up-to-date CentOS 7 (minimal install) and oVirt 4.3, with a hosted engine on iSCSI (multipath if it matters).
Adding +Martin Perina <mperina@redhat.com> and +Milan Zamazal <mzamazal@redhat.com> for awareness
Is there any possibility of someone helping me look at this? I'm seeing the issue much worse with 4.3 - a cluster I updated to 4.3.7 two months ago has a host (where the hosted engine was running) where vdsmd got to over 20G RSS. -- Chris Adams <cma@cmadams.net>

On Thu, Jan 30, 2020 at 2:45 PM Chris Adams <cma@cmadams.net> wrote:
Once upon a time, Sandro Bonazzola <sbonazzo@redhat.com> said:
On Mon, Dec 16, 2019 at 7:41 PM Chris Adams <cma@cmadams.net> wrote:
I've seen vdsmd leak memory (RSS increasing) for a while (brought it up on the lists and opened a BZ ticket), and never gotten anywhere with diagnosing or resolving it. I reinstalled my dev setup Friday with up-to-date CentOS 7 (minimal install) and oVirt 4.3, with a hosted engine on iSCSI (multipath if it matters).
Adding +Martin Perina <mperina@redhat.com> and +Milan Zamazal <mzamazal@redhat.com> for awareness
Is there any possibility of someone helping me look at this? I'm seeing the issue much worse with 4.3 - a cluster I updated to 4.3.7 two months ago has a host (where the hosted engine was running) where vdsmd got to over 20G RSS.
Marcin, any suggestions how to investigate it?
-- Chris Adams <cma@cmadams.net>
-- Martin Perina Manager, Software Engineering Red Hat Czech s.r.o.

Hi,

On 1/30/20 2:47 PM, Martin Perina wrote:
On Thu, Jan 30, 2020 at 2:45 PM Chris Adams <cma@cmadams.net> wrote:
Once upon a time, Sandro Bonazzola <sbonazzo@redhat.com> said:
On Mon, Dec 16, 2019 at 7:41 PM Chris Adams <cma@cmadams.net> wrote:
I've seen vdsmd leak memory (RSS increasing) for a while (brought it up on the lists and opened a BZ ticket), and never gotten anywhere with diagnosing or resolving it. I reinstalled my dev setup Friday with up-to-date CentOS 7 (minimal install) and oVirt 4.3, with a hosted engine on iSCSI (multipath if it matters).
Adding +Martin Perina <mperina@redhat.com> and +Milan Zamazal <mzamazal@redhat.com> for awareness
Is there any possibility of someone helping me look at this? I'm seeing the issue much worse with 4.3 - a cluster I updated to 4.3.7 two months ago has a host (where the hosted engine was running) where vdsmd got to over 20G RSS.
Marcin, any suggestions how to investigate it?
Python mem profiling is hard... I already tackled the VDSM memory leak problem once. VDSM was growing, but not at the scale that Chris is describing. I tried out different tools, but got to a point where enforcing periodic garbage collection made VDSM's mem usage constant, so the conclusion was that there were no mem leaks.

Chris, if I understood you correctly, a single machine suffices to reproduce your issue? One that acts as a host with a hosted engine on it + iSCSI storage? If so, maybe I/you could construct a VM with a reproducible environment and share it? Having something like this would make investigating this issue much more reliable.
-- Chris Adams <cma@cmadams.net>
-- Martin Perina Manager, Software Engineering Red Hat Czech s.r.o.
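For reference, the kind of periodic collection Marcin describes can be approximated with a few lines of stdlib Python (a generic sketch, not vdsm's actual code): if forcing gc.collect() at an interval flattens the RSS curve, the growth was uncollected reference cycles rather than a true leak.

    # Generic sketch of periodic forced garbage collection in the process
    # under test (not vdsm's implementation).
    import gc
    import logging
    import threading
    import time

    def periodic_gc(interval=300):
        # Force a full collection every `interval` seconds and log the result.
        while True:
            unreachable = gc.collect()  # number of unreachable objects collected
            logging.info("periodic gc: %d unreachable, %d uncollectable",
                         unreachable, len(gc.garbage))
            time.sleep(interval)

    # Run as a daemon thread so it dies with the process.
    t = threading.Thread(target=periodic_gc, name="periodic-gc")
    t.daemon = True
    t.start()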

Once upon a time, Marcin Sobczyk <msobczyk@redhat.com> said:
Python mem profiling is hard... I already tackled the VDSM memory leak problem once. VDSM was growing, but not at the scale that Chris is describing. I tried out different tools, but got to a point where enforcing periodic garbage collection made VDSM's mem usage constant, so the conclusion was that there were no mem leaks.
Yeah, I gave it a try myself (despite not being very good at Python; been a system admin too long so I'm all about perl :) ), and didn't get anywhere.
Chris, if I understood you correctly, a single machine suffices to reproduce your issue? One that acts as a host with a hosted engine on it + iSCSI storage? If so, maybe I/you could construct a VM with a reproducible environment and share it? Having something like this would make investigating this issue much more reliable.
I don't think I've tried to reproduce it on a single-machine setup. I think I've always gone ahead and added at least a second machine (even if there was no VM other than the engine); stopping with one to see if it happens is a good idea. My dev cluster is actually down at the moment (a dead UPS and nearby road/bridge construction are a bad combination); I'll get it back online (and on a better UPS, hopefully!) today.

By "construct a VM" - do you mean building a setup inside a VM (with nested virtualization), so everything is local? My dev cluster iSCSI SAN is targetcli on Linux (the prod setups are all EqualLogics), so I know how to set that up.

Thank you for your help.

-- Chris Adams <cma@cmadams.net>
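As a generic follow-up to the profiling attempts mentioned above (not something from vdsm itself; the helper names are illustrative): one stdlib-only way to see whether the growth is even in Python objects is to compare live object counts by type over time. If the counts stay flat while RSS keeps climbing, the leak is more likely in a C extension or other native allocations.

    # Sketch: snapshot live Python object counts by type and report growth.
    import gc
    from collections import Counter

    def object_counts():
        # Count all objects the garbage collector tracks, grouped by type name.
        return Counter(type(o).__name__ for o in gc.get_objects())

    def report_growth(before, after, top=10):
        # Print the types whose instance counts grew the most between snapshots.
        growth = Counter(after)
        growth.subtract(before)
        for name, delta in growth.most_common(top):
            if delta > 0:
                print("%-30s %+d (now %d)" % (name, delta, after[name]))

    before = object_counts()
    # ... let the process run (and leak) for a while ...
    after = object_counts()
    report_growth(before, after)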

On 2/3/20 2:46 PM, Chris Adams wrote:
Once upon a time, Marcin Sobczyk <msobczyk@redhat.com> said:
Python mem profiling is hard... I already tackled the VDSM memory leak problem once. VDSM was growing, but not at the scale that Chris is describing. I tried out different tools, but got to a point where enforcing periodic garbage collection made VDSM's mem usage constant, so the conclusion was that there were no mem leaks.
Yeah, I gave it a try myself (despite not being very good at Python; been a system admin too long so I'm all about perl :) ), and didn't get anywhere.
Chris, if I understood you correctly, a single machine suffices to reproduce your issue? One that acts as a host with a hosted engine on it + iSCSI storage? If so, maybe I/you could construct a VM with a reproducible environment and share it? Having something like this would make investigating this issue much more reliable.
I don't think I've tried to reproduce it on a single-machine setup. I think I've always gone ahead and added at least a second machine (even if there was no VM other than the engine); stopping with one to see if it happens is a good idea. My dev cluster is actually down at the moment (a dead UPS and nearby road/bridge construction are a bad combination); I'll get it back online (and on a better UPS, hopefully!) today.
By "construct a VM" - do you mean building a setup inside a VM (with nested virtualization), so everything is local? My dev cluster iSCSI SAN is targetcli on Linux (the prod setups are all EqualLogics), so I know how to set that up.

Yes, basically - the simpler the better. If it takes more than one VM, then that's fine too. The no. 1 priority is reproducibility. If you can set up an env like that with virt-manager and share the XMLs and disks with me, then I will have a solid foundation to tackle this issue.

Thank you for your help.
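For sharing such a reproducer, a small sketch (assuming the libvirt Python bindings are installed; plain virsh dumpxml works just as well) of dumping the domain XMLs from the virt-manager host:

    # Sketch: export every defined libvirt domain's XML to a file for sharing.
    import libvirt

    conn = libvirt.openReadOnly("qemu:///system")
    for dom in conn.listAllDomains():
        path = "%s.xml" % dom.name()
        with open(path, "w") as f:
            f.write(dom.XMLDesc(0))
        print("wrote %s" % path)
    conn.close()

The disk images referenced in those XMLs would still need to be copied or regenerated separately.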
participants (4)
- Chris Adams
- Marcin Sobczyk
- Martin Perina
- Sandro Bonazzola