On Mon, Feb 8, 2021 at 1:22 PM Yedidyah Bar David <didi(a)redhat.com> wrote:
On Mon, Feb 8, 2021 at 9:05 AM Yedidyah Bar David <didi(a)redhat.com> wrote:
>
> Hi all,
>
> I ran a loop of [1] (from [2]). The loop succeeded for ~ 380
> iterations, then failed with 'Too many open files'. First failure was:
>
> 2021-02-08 02:21:15,702+0100 ERROR (jsonrpc/4) [storage.HSM] Could not
> connect to storageServer (hsm:2446)
> Traceback (most recent call last):
> File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line
> 2443, in connectStorageServer
> conObj.connect()
> File "/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py",
> line 449, in connect
> return self._mountCon.connect()
> File "/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py",
> line 171, in connect
> self._mount.mount(self.options, self._vfsType, cgroup=self.CGROUP)
> File "/usr/lib/python3.6/site-packages/vdsm/storage/mount.py", line
> 210, in mount
> cgroup=cgroup)
> File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py",
> line 56, in __call__
> return callMethod()
> File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py",
> line 54, in <lambda>
> **kwargs)
> File "<string>", line 2, in mount
> File "/usr/lib64/python3.6/multiprocessing/managers.py", line 772,
> in _callmethod
> raise convert_to_error(kind, result)
> OSError: [Errno 24] Too many open files
Maybe we have an fd leak in supervdsmd?
We know that there is a small memory leak in multiprocessing, but we
do not know of any fd leak.
> But obviously, once it did, it continued failing for this reason on
> many later operations.
Smells like an fd leak.
> Is this considered a bug?
Generally yes, but the question is if this happens during
real world scenarios.
> Do we actively try to prevent such cases?
No, we don't have any code monitoring the number of open fds
at runtime, or any system tests checking this.
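A test for this could look roughly like the sketch below. It is not
existing vdsm code: count_open_fds() and fd_growth() are hypothetical
helper names, and it assumes Linux, where /proc/<pid>/fd lists the open
fds of a process.

```python
import os


def count_open_fds(pid="self"):
    """Return the number of open file descriptors of a process (Linux)."""
    # Listing the directory briefly uses one fd itself. The offset is the
    # same on every call, so it cancels out when comparing two counts.
    return len(os.listdir("/proc/{}/fd".format(pid)))


def fd_growth(operation, iterations=100):
    """Run operation repeatedly and return how many fds were gained."""
    # Warm up first, so fds that are opened lazily once and then kept
    # are not reported as a leak.
    operation()
    before = count_open_fds()
    for _ in range(iterations):
        operation()
    return count_open_fds() - before
```

A test would call fd_growth() with the operation under test (for
example one connect/disconnect cycle) and fail if the result is not 0.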
We do have health monitor in vdsm:
https://github.com/oVirt/vdsm/blob/master/lib/vdsm/health.py
It could be useful to also log the number of open fds there (e.g. ls -lh
/proc/pid/fd).
We don't have any monitor in supervdsm; it could be useful to add
one. supervdsm is relatively simple, but the problem is it runs
possibly complex code from vdsm, so "safe" changes in vdsm can
cause trouble when the code is run by supervdsm.
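As a rough idea of what such a check could do, here is a minimal sketch.
It is not taken from health.py; check_fds(), warn_ratio and the logger
name are made up for illustration, and how it would be wired into a
periodic monitor in vdsm or supervdsm is left open.

```python
import logging
import os
import resource

log = logging.getLogger("fd-monitor")  # hypothetical logger name


def check_fds(warn_ratio=0.8):
    """Log this process's open fd count; warn when near the soft limit."""
    # The listing itself briefly uses one fd, so the count is one higher
    # than the number of fds held by the rest of the process.
    count = len(os.listdir("/proc/self/fd"))
    soft_limit, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
    unlimited = soft_limit == resource.RLIM_INFINITY
    if not unlimited and count >= soft_limit * warn_ratio:
        log.warning("Open fds: %d, soft limit: %d", count, soft_limit)
    else:
        log.debug("Open fds: %d, soft limit: %d", count, soft_limit)
    return count
```

Calling this from a periodic monitor would have shown the count
climbing towards the limit long before the first "Too many open files".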
> So should I open one and attach logs? Or can it be considered a
> "corner case"?
Yes, please open a bug, and include the info you have.
Please include output of "ls -lh /proc/pid/fd" for both vdsm
and supervdsm when you reproduce the issue, or during the
long test if you cannot reproduce.
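If the raw listing is too long to read, something like the following
sketch can summarize it. summarize_fds() is a hypothetical helper, not
part of vdsm; it assumes Linux, and reading the fds of supervdsm needs
root.

```python
import collections
import os


def summarize_fds(pid):
    """Count the open fds of a process, grouped by what they point to."""
    fd_dir = "/proc/{}/fd".format(pid)
    targets = collections.Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The fd was closed between listing and reading the link.
            continue
        # Collapse "socket:[12345]" or "pipe:[678]" to "socket" / "pipe",
        # so that leaked fds of the same kind add up to one large number.
        kind = target.split(":[")[0] if ":[" in target else target
        targets[kind] += 1
    return targets
```

Running summarize_fds(pid).most_common(10) at a few points during the
loop should show which kind of fd keeps growing.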
> Using vdsm-4.40.50.3-37.git7883b3b43.el8.x86_64 from
> ost-images-el8-he-installed-1-202102021144.x86_64 .
>
> I can also give access to the machine(s) if needed, for now.
Sorry, now cleaned this env. Can try to reproduce if there is interest.
It will help if you can reproduce.
Nir