[ovirt-users] ovirt-ha-agent and too many open files error

Gianluca Cecchi gianluca.cecchi at gmail.com
Tue Aug 9 14:59:29 UTC 2016


Hello,
I have a 4.0 test environment (single host with self hosted engine) where I
have 6 VMs defined (5 running) and not much activity.

I don't monitor this system very closely.

I have now connected to it to evaluate an upgrade to 4.0.1, and I see that
about 15 days ago ovirt-ha-agent died because of too many open files:

[root@ractor ovirt-hosted-engine-ha]# systemctl status ovirt-ha-agent -l
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Fri 2016-07-22 16:39:49 CEST; 2 weeks 4 days ago
 Main PID: 72795 (code=exited, status=0/SUCCESS)

Jul 22 16:39:47 ractor.mydomain ovirt-ha-agent[72795]: self.set_file(fd)
Jul 22 16:39:47 ractor.mydomain ovirt-ha-agent[72795]: File "/usr/lib64/python2.7/asyncore.py", line 657, in set_file
Jul 22 16:39:47 ractor.mydomain ovirt-ha-agent[72795]: self.socket = file_wrapper(fd)
Jul 22 16:39:47 ractor.mydomain ovirt-ha-agent[72795]: File "/usr/lib64/python2.7/asyncore.py", line 616, in __init__
Jul 22 16:39:47 ractor.mydomain ovirt-ha-agent[72795]: self.fd = os.dup(fd)
Jul 22 16:39:47 ractor.mydomain ovirt-ha-agent[72795]: OSError: [Errno 24] Too many open files
Jul 22 16:39:47 ractor.mydomain ovirt-ha-agent[72795]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Shutting down the agent because of 3 failures in a row!
Jul 22 16:39:47 ractor.mydomain ovirt-ha-agent[72795]: ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Shutting down the agent because of 3 failures in a row!
Jul 22 16:39:49 ractor.mydomain ovirt-ha-agent[72795]: WARNING:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:The VM is running locally or we have no data, keeping the domain monitor.
Jul 22 16:39:49 ractor.mydomain ovirt-ha-agent[72795]: INFO:ovirt_hosted_engine_ha.agent.agent.Agent:Agent shutting down

Is this a known problem, or is there any reason to investigate?
It seems very strange to have reached this limit.

I presume the agent runs as the vdsm user and that the oVirt installation
creates the file
/etc/security/limits.d/99-vdsm.conf

with the following content:
# This limits are intended for medium VDSM hosts, for large hosts scale these
# numbers appropriately.

# nproc should be the maximum amount of storage operations usage.
# VMs run by "qemu" user, vm processes are not relavent to "vdsm" user limits.
vdsm - nproc 4096

# nofile should be at least 3(stdin,stdour,stderr) * each external process.
# 3 * 4096 = 12288
vdsm - nofile 12288
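
The value in 99-vdsm.conf is not necessarily the limit that a running
process actually has: files under /etc/security/limits.d are read by
pam_limits for PAM sessions, and a service started by systemd may end up
with a different value. A minimal sketch, assuming a Linux host with /proc
mounted, that reads the effective "Max open files" limit of a given PID
(the function names are illustrative, not part of oVirt or VDSM):

```python
import sys


def parse_max_open_files(limits_text):
    """Return (soft, hard) as strings from the text of /proc/<pid>/limits."""
    prefix = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(prefix):
            # Remaining columns are: soft limit, hard limit, units.
            fields = line[len(prefix):].split()
            return fields[0], fields[1]
    raise ValueError("no 'Max open files' line found")


def max_open_files(pid="self"):
    """Read the effective nofile limit of a running process (Linux only)."""
    with open("/proc/%s/limits" % pid) as f:
        return parse_max_open_files(f.read())


if __name__ == "__main__":
    # Pass the PID of the process to inspect; defaults to this script itself.
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    soft, hard = max_open_files(pid)
    print("pid %s: soft=%s hard=%s" % (pid, soft, hard))
```

The values are returned as strings because the kernel may report
"unlimited" instead of a number. Comparing the soft limit reported here
with the 12288 above shows whether the limits.d setting applies to the
process in question.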

As a rough estimate (actually an overestimate, because of many duplicates),
I currently have:
# lsof -u vdsm | wc -l
488
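
Note that the nofile limit is enforced per process, not per user, and that
lsof also lists entries such as the current directory and memory-mapped
files that are not file descriptors. A per-process count may therefore be
more telling than the total above. A rough sketch, assuming Linux with /proc
and enough privileges to read other users' /proc/<pid>/fd (the function
names are illustrative):

```python
import os
import pwd
import sys


def count_open_fds(pid="self"):
    """Count the entries in /proc/<pid>/fd, one per open file descriptor."""
    return len(os.listdir("/proc/%s/fd" % pid))


def fd_counts_for_user(uid):
    """Map pid -> open fd count for every process whose /proc entry is owned by uid."""
    counts = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            if os.stat("/proc/" + entry).st_uid != uid:
                continue
            counts[int(entry)] = count_open_fds(entry)
        except OSError:
            # The process exited in the meantime, or access was denied.
            continue
    return counts


if __name__ == "__main__":
    # User name is taken from the command line; "vdsm" is only a default.
    user = sys.argv[1] if len(sys.argv) > 1 else "vdsm"
    counts = fd_counts_for_user(pwd.getpwnam(user).pw_uid)
    # Print the processes with the most open descriptors first.
    for pid, n in sorted(counts.items(), key=lambda item: -item[1]):
        print("%7d %6d" % (pid, n))
```

Running this periodically and watching whether one PID's count keeps
growing would show whether a process is leaking descriptors.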

Anything else to check?

Gianluca