I haven't checked it yet, but maybe this customer bug is related to this
discussion:
https://bugzilla.redhat.com/show_bug.cgi?id=1666553
From vdsm.log:
2019-01-15 13:41:11,162+0000 INFO (periodic/2) [vdsm.api] FINISH multipath_health return={} from=internal, task_id=97c359aa-002e-46d8-9fc5-2477db0909b4 (api:52)

2019-01-15 13:41:12,210+0000 WARN (vdsm.Scheduler) [Executor] Worker blocked: <Worker name=jsonrpc/0 running <Task <JsonRpcTask {'params': {}, 'jsonrpc': '2.0', 'method': u'Host.getCapabilities', 'id': u'74b9dc62-22b2-4698-9d84-6a71c4f29763'} at 0x7f71dc31b0d0> timeout=60, duration=60 at 0x7f71dc31b110> task#=33 at 0x7f722003c890>, traceback:
File: "/usr/lib64/python2.7/threading.py", line 785, in __bootstrap
self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner
self.run()
File: "/usr/lib64/python2.7/threading.py", line 765, in run
self.__target(*self.__args, **self.__kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", line
194, in run
ret = func(*args, **kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 301, in _run
self._execute_task()
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in
_execute_task
task()
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in
__call__
self._callable()
File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 523,
in __call__
self._handler(self._ctx, self._req)
File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 566,
in _serveRequest
response = self._handle_request(req, ctx)
File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 606,
in _handle_request
res = method(**params)
File: "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 197, in
_dynamicMethod
result = fn(*methodArgs)
File: "<string>", line 2, in getCapabilities
File: "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in
method
ret = func(*args, **kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/API.py", line 1337, in
getCapabilities
c = caps.get()
File: "/usr/lib/python2.7/site-packages/vdsm/host/caps.py", line 168, in get
net_caps = supervdsm.getProxy().network_caps()
File: "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 55,
in __call__
return callMethod()
File: "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 53,
in <lambda>
**kwargs)
File: "<string>", line 2, in network_caps
File: "/usr/lib64/python2.7/multiprocessing/b", line 759, in _callmethod
kind, result = conn.recv() (executor:363)
On Mon, Feb 8, 2021 at 5:52 PM Nir Soffer <nsoffer(a)redhat.com> wrote:
On Mon, Feb 8, 2021 at 1:22 PM Yedidyah Bar David <didi(a)redhat.com> wrote:
>
> On Mon, Feb 8, 2021 at 9:05 AM Yedidyah Bar David <didi(a)redhat.com> wrote:
> >
> > Hi all,
> >
> > I ran a loop of [1] (from [2]). The loop succeeded for ~ 380
> > iterations, then failed with 'Too many open files'. First failure was:
> >
> > 2021-02-08 02:21:15,702+0100 ERROR (jsonrpc/4) [storage.HSM] Could not connect to storageServer (hsm:2446)
> > Traceback (most recent call last):
> >   File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 2443, in connectStorageServer
> >     conObj.connect()
> >   File "/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line 449, in connect
> >     return self._mountCon.connect()
> >   File "/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line 171, in connect
> >     self._mount.mount(self.options, self._vfsType, cgroup=self.CGROUP)
> >   File "/usr/lib/python3.6/site-packages/vdsm/storage/mount.py", line 210, in mount
> >     cgroup=cgroup)
> >   File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py", line 56, in __call__
> >     return callMethod()
> >   File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py", line 54, in <lambda>
> >     **kwargs)
> >   File "<string>", line 2, in mount
> >   File "/usr/lib64/python3.6/multiprocessing/managers.py", line 772, in _callmethod
> >     raise convert_to_error(kind, result)
> > OSError: [Errno 24] Too many open files
Maybe we have an fd leak in supervdsmd?

We know there is a small memory leak in multiprocessing, but we are not
aware of any fd leak.
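
For what it's worth, a quick way to check is to count the entries under
/proc/<pid>/fd of supervdsmd before and after a batch of iterations. A
minimal sketch (hypothetical helper, not vdsm code):

    import os
    import sys

    def count_fds(pid):
        # Each entry under /proc/<pid>/fd is a symlink for one open fd.
        return len(os.listdir("/proc/{}/fd".format(pid)))

    if __name__ == "__main__":
        # Usage: python count_fds.py <pid-of-supervdsmd>
        # Run before and after a batch of iterations; a count that keeps
        # growing across iterations points at a leak.
        print(count_fds(int(sys.argv[1])))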
> > But obviously, once it did, it continued failing for this reason on
> > many later operations.
Smells like an fd leak.
> > Is this considered a bug?
Generally yes, but the question is whether this happens in
real-world scenarios.
> Do we actively try to prevent such cases?
No, we don't have any code monitoring the number of open fds at runtime,
or tests checking this in the system tests.

We do have a health monitor in vdsm:
https://github.com/oVirt/vdsm/blob/master/lib/vdsm/health.py

It could be useful for this monitor to also log the number of fds
(e.g. ls -lh /proc/<pid>/fd).
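
Something along these lines could do it; this is only a minimal sketch of
the idea, not the actual health.py code, and the logger name and interval
are made up:

    import logging
    import os
    import threading
    import time

    log = logging.getLogger("health")  # logger name chosen for the example

    def _fd_monitor(interval):
        while True:
            # /proc/self/fd has one symlink per fd open in this process.
            nfds = len(os.listdir("/proc/self/fd"))
            log.info("process is using %d file descriptors", nfds)
            time.sleep(interval)

    def start_fd_monitor(interval=60):
        # Daemon thread so it never blocks process shutdown.
        t = threading.Thread(target=_fd_monitor, args=(interval,), daemon=True)
        t.start()
        return t

In vdsm it would more likely hook into the existing health monitor loop
instead of starting its own thread.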
We don't have any monitor in supervdsm; it could be useful to add one.
supervdsm itself is relatively simple, but the problem is that it runs
possibly complex code from vdsm, so "safe" changes in vdsm can cause
trouble when that code is run by supervdsm.
> > So should I open one and attach logs? Or can it be considered a "corner
> > case"?
Yes, please open a bug, and include the info you have.
Please include the output of "ls -lh /proc/<pid>/fd" for both vdsm and
supervdsm when you reproduce the issue, or during the long test if you
cannot reproduce.
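
If it helps, here is a small sketch (hypothetical, not part of vdsm) that
summarizes /proc/<pid>/fd by target, which can make the leaking resource
easier to spot than the raw ls output:

    import collections
    import os
    import sys

    def fd_summary(pid):
        # Resolve each fd symlink and count the targets (socket, pipe,
        # anon_inode, or a plain file path).
        counts = collections.Counter()
        fd_dir = "/proc/{}/fd".format(pid)
        for fd in os.listdir(fd_dir):
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # the fd was closed while we were looking
            counts[target.split(":")[0]] += 1
        return counts

    if __name__ == "__main__":
        # Usage: python fd_summary.py <pid>  (run for both vdsm and supervdsm)
        for target, n in fd_summary(int(sys.argv[1])).most_common():
            print("{:6d}  {}".format(n, target))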
> > Using vdsm-4.40.50.3-37.git7883b3b43.el8.x86_64 from
> > ost-images-el8-he-installed-1-202102021144.x86_64 .
> >
> > I can also give access to the machine(s) if needed, for now.
>
> Sorry, now cleaned this env. Can try to reproduce if there is interest.
It will help if you can reproduce.
Nir