
Once upon a time, Victor Stinner <vstinner@redhat.com> said:
> I wrote the tracemalloc module which is easy to use on Python 3.4 and
> newer. If you take tracemalloc snapshots while the memory usage is
> growing, and comparing snapshots don't show anything obvious, you can
> maybe suspect memory fragmentation. You're talking about 4 GB of memory
> usage, I don't think that memory fragmentation can explain it. Do you
> need my help to use tracemalloc?
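For reference, the snapshot comparison described above could look roughly like the following. This is only a minimal sketch: the helper name top_growth and the list allocation standing in for real growth are hypothetical, and only the tracemalloc calls themselves (start, take_snapshot, Snapshot.compare_to, get_traced_memory) are standard library API.

```python
import tracemalloc


def top_growth(old, new, limit=10):
    """Return the `limit` largest per-line differences between two snapshots.

    Hypothetical helper: compare_to() returns StatisticDiff objects sorted
    by the absolute size difference, largest first.
    """
    return new.compare_to(old, 'lineno')[:limit]


if __name__ == "__main__":
    # Keep 25 frames per allocation, like PYTHONTRACEMALLOC=25.
    tracemalloc.start(25)

    before = tracemalloc.take_snapshot()
    # Stand-in for whatever the real process allocates between snapshots.
    grown = [bytes(1000) for _ in range(1000)]
    after = tracemalloc.take_snapshot()

    for diff in top_growth(before, after, 5):
        print(diff)

    # Total memory tracemalloc knows about, to compare against the RSS.
    current, peak = tracemalloc.get_traced_memory()
    print("traced: current=%d B, peak=%d B" % (current, peak))
```

If the lines that grow between snapshots account for far less than the growth in RSS, the memory is probably being allocated outside of what tracemalloc can see (for example, by a C library calling malloc directly).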

My Python is rudimentary at best (my programming has all been in other
languages), but here's what I tried for starters: I added a USR2 signal
handler to log the top memory users, but it doesn't seem to show
anything growing the way the RSS actually is. I made the following
change:

--- /usr/lib/python3.6/site-packages/vdsm/vdsmd.py.dist~ 2021-10-25 11:27:46.000000000 -0500
+++ /usr/lib/python3.6/site-packages/vdsm/vdsmd.py 2021-12-02 13:08:46.000000000 -0600
@@ -29,6 +29,7 @@
 import syslog
 import resource
 import tempfile
+import tracemalloc
 from logging import config as lconfig
 
 from vdsm import constants
@@ -82,6 +83,14 @@
             irs.spmStop(
                 irs.getConnectedStoragePoolsList()['poollist'][0])
 
+    def sigusr2Handler(signum, frame):
+        snapshot = tracemalloc.take_snapshot()
+        top_stats = snapshot.statistics('lineno')
+        lentry = 'Top memory users:\n'
+        for stat in top_stats[:10]:
+            lentry += ' ' + str(stat) + '\n'
+        log.info(lentry)
+
     def sigalrmHandler(signum, frame):
         # Used in panic.panic() when shuting down logging, must not log.
         raise RuntimeError("Alarm timeout")
@@ -89,6 +98,7 @@
     sigutils.register()
     signal.signal(signal.SIGTERM, sigtermHandler)
     signal.signal(signal.SIGUSR1, sigusr1Handler)
+    signal.signal(signal.SIGUSR2, sigusr2Handler)
     signal.signal(signal.SIGALRM, sigalrmHandler)
     zombiereaper.registerSignalHandler()
 

I also set a systemd override on vdsmd.service to add
PYTHONTRACEMALLOC=25.

That gets log entries like this:

2021-12-03 07:30:37,244-0600 INFO (MainThread) [vds] Top memory users:
 /usr/lib64/python3.6/site-packages/libvirt.py:442: size=34.0 MiB, count=630128, average=57 B
 <frozen importlib._bootstrap_external>:487: size=16.5 MiB, count=191152, average=90 B
 /usr/lib64/python3.6/json/decoder.py:355: size=14.6 MiB, count=142411, average=108 B
 /usr/lib/python3.6/site-packages/vdsm/host/stats.py:138: size=3678 KiB, count=22428, average=168 B
 <frozen importlib._bootstrap>:219: size=2027 KiB, count=17555, average=118 B
 /usr/lib/python3.6/site-packages/vdsm/api/vdsmapi.py:143: size=1724 KiB, count=23388, average=75 B
 /usr/lib/python3.6/site-packages/vdsm/virt/vmchannels.py:163: size=1502 KiB, count=24039, average=64 B
 /usr/lib64/python3.6/linecache.py:137: size=1383 KiB, count=13404, average=106 B
 /usr/lib/python3.6/site-packages/vdsm/utils.py:358: size=1305 KiB, count=8587, average=156 B
 /usr/lib64/python3.6/functools.py:67: size=1134 KiB, count=9624, average=121 B
 (vdsmd:92)

But at the time I generated that, the RSS was over 340MB.
Interestingly, when I sent the signal, the RSS jumped to over 430MB (but
maybe my change did that?).

-- 
Chris Adams <cma@cmadams.net>