Once upon a time, Victor Stinner <vstinner(a)redhat.com> said:
> I wrote the tracemalloc module which is easy to use on Python 3.4 and
> newer. If you take tracemalloc snapshots while the memory usage is
> growing, and comparing snapshots don't show anything obvious, you can
> maybe suspect memory fragmentation. You're talking about 4 GB of
> memory usage, I don't think that memory fragmentation can explain it.
> Do you need my help to use tracemalloc?
My Python is rudimentary at best (my programming has all been in other
languages), but here's what I tried for starters: I added a USR2 signal
handler to log the top memory users, but it doesn't show anything
growing the way the RSS actually is.
I made the following change:
--- /usr/lib/python3.6/site-packages/vdsm/vdsmd.py.dist~ 2021-10-25 11:27:46.000000000 -0500
+++ /usr/lib/python3.6/site-packages/vdsm/vdsmd.py 2021-12-02 13:08:46.000000000 -0600
@@ -29,6 +29,7 @@
 import syslog
 import resource
 import tempfile
+import tracemalloc
 from logging import config as lconfig
 from vdsm import constants
@@ -82,6 +83,14 @@
             irs.spmStop(
                 irs.getConnectedStoragePoolsList()['poollist'][0])
+    def sigusr2Handler(signum, frame):
+        snapshot = tracemalloc.take_snapshot()
+        top_stats = snapshot.statistics('lineno')
+        lentry = 'Top memory users:\n'
+        for stat in top_stats[:10]:
+            lentry += ' ' + str(stat) + '\n'
+        log.info(lentry)
+
     def sigalrmHandler(signum, frame):
         # Used in panic.panic() when shuting down logging, must not log.
         raise RuntimeError("Alarm timeout")
@@ -89,6 +98,7 @@
     sigutils.register()
     signal.signal(signal.SIGTERM, sigtermHandler)
     signal.signal(signal.SIGUSR1, sigusr1Handler)
+    signal.signal(signal.SIGUSR2, sigusr2Handler)
     signal.signal(signal.SIGALRM, sigalrmHandler)
     zombiereaper.registerSignalHandler()
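The handler in this change logs the top entries of a single snapshot. Since the suggestion was to compare snapshots taken while memory is growing, a variant that diffs each snapshot against the previous one may show growth more directly. The following is only a sketch: the names `format_growth`, `sigusr2_handler` and `_previous` are hypothetical, and the module-level `log` stands in for whatever logger the daemon already has.

```python
import logging
import signal
import tracemalloc

log = logging.getLogger(__name__)  # stand-in for the daemon's own logger
_previous = None  # snapshot taken when the signal last arrived


def format_growth(old, new, limit=10):
    """Return the `limit` largest per-line differences between snapshots."""
    # compare_to() returns StatisticDiff entries, largest absolute change first.
    diffs = new.compare_to(old, 'lineno')
    return '\n'.join(' ' + str(d) for d in diffs[:limit])


def sigusr2_handler(signum, frame):
    """Log what changed since the previous SIGUSR2 (hypothetical handler)."""
    global _previous
    current = tracemalloc.take_snapshot()
    if _previous is None:
        log.info('Baseline snapshot taken; send the signal again to compare')
    else:
        log.info('Memory change since last signal:\n%s',
                 format_growth(_previous, current))
    _previous = current


if __name__ == '__main__':
    tracemalloc.start(25)  # same frame depth as PYTHONTRACEMALLOC=25
    signal.signal(signal.SIGUSR2, sigusr2_handler)
```

With this, the first signal records a baseline and each later signal logs only the lines whose allocations changed since the previous one, so a steady leak inside Python code should stay near the top of the list.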
I also set a systemd override on vdsmd.service to add
PYTHONTRACEMALLOC=25. That produces log entries like this:
2021-12-03 07:30:37,244-0600 INFO (MainThread) [vds] Top memory users:
/usr/lib64/python3.6/site-packages/libvirt.py:442: size=34.0 MiB, count=630128, average=57 B
<frozen importlib._bootstrap_external>:487: size=16.5 MiB, count=191152, average=90 B
/usr/lib64/python3.6/json/decoder.py:355: size=14.6 MiB, count=142411, average=108 B
/usr/lib/python3.6/site-packages/vdsm/host/stats.py:138: size=3678 KiB, count=22428, average=168 B
<frozen importlib._bootstrap>:219: size=2027 KiB, count=17555, average=118 B
/usr/lib/python3.6/site-packages/vdsm/api/vdsmapi.py:143: size=1724 KiB, count=23388, average=75 B
/usr/lib/python3.6/site-packages/vdsm/virt/vmchannels.py:163: size=1502 KiB, count=24039, average=64 B
/usr/lib64/python3.6/linecache.py:137: size=1383 KiB, count=13404, average=106 B
/usr/lib/python3.6/site-packages/vdsm/utils.py:358: size=1305 KiB, count=8587, average=156 B
/usr/lib64/python3.6/functools.py:67: size=1134 KiB, count=9624, average=121 B
(vdsmd:92)
But at the time I generated that, the RSS was over 340 MB.
Interestingly, when I sent the signal, the RSS jumped to over 430 MB
(but maybe my change did that?).
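One way to narrow down the gap between the traced sizes and the RSS is to log both side by side, along with the memory tracemalloc itself uses to store traces. tracemalloc only sees allocations made through Python's memory allocators, so memory that a C library obtains directly with malloc() would not appear in its statistics, and taking a snapshot with 25 frames per trace has a cost of its own that may account for part of the jump. The sketch below assumes Linux (it reads `/proc/self/status`); `read_rss_kib` and `memory_report` are hypothetical helper names.

```python
import tracemalloc


def read_rss_kib(status_path='/proc/self/status'):
    """Return the current resident set size in KiB, or None if unavailable."""
    try:
        with open(status_path) as f:
            for line in f:
                # The line looks like "VmRSS:     123456 kB".
                if line.startswith('VmRSS:'):
                    return int(line.split()[1])
    except OSError:
        pass  # not Linux, or /proc is not mounted
    return None


def memory_report():
    """Summarise RSS next to what tracemalloc accounts for."""
    traced, peak = tracemalloc.get_traced_memory()
    overhead = tracemalloc.get_tracemalloc_memory()
    return {
        'rss_kib': read_rss_kib(),
        'traced_kib': traced // 1024,
        'traced_peak_kib': peak // 1024,
        'tracemalloc_overhead_kib': overhead // 1024,
    }
```

If `traced_kib` stays roughly flat while `rss_kib` keeps climbing, the growth is probably outside what tracemalloc can see; if `tracemalloc_overhead_kib` is large, some of the RSS is the tracing itself.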
--
Chris Adams <cma(a)cmadams.net>