[ovirt-users] ovirt-ha-agent and too many open files error
Simone Tiraboschi
stirabos at redhat.com
Tue Aug 9 16:54:39 UTC 2016
On Tue, Aug 9, 2016 at 4:59 PM, Gianluca Cecchi
<gianluca.cecchi at gmail.com> wrote:
> Hello,
> I have a 4.0 test environment (single host with self hosted engine) where I
> have 6 VMs defined (5 running) and not much activity.
>
> I don't monitor this system very much.
>
> Now I have connected to it to evaluate an upgrade to 4.0.1, and I see that
> about 15 days ago the ovirt-ha-agent died because of too many open files...
>
> [root@ractor ovirt-hosted-engine-ha]# systemctl status ovirt-ha-agent -l
> ● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring
> Agent
> Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled;
> vendor preset: disabled)
> Active: inactive (dead) since Fri 2016-07-22 16:39:49 CEST; 2 weeks 4
> days ago
> Main PID: 72795 (code=exited, status=0/SUCCESS)
>
> Jul 22 16:39:47 ractor.mydomain ovirt-ha-agent[72795]: self.set_file(fd)
> Jul 22 16:39:47 ractor.mydomain ovirt-ha-agent[72795]: File
> "/usr/lib64/python2.7/asyncore.py", line 657, in set_file
> Jul 22 16:39:47 ractor.mydomain ovirt-ha-agent[72795]: self.socket =
> file_wrapper(fd)
> Jul 22 16:39:47 ractor.mydomain ovirt-ha-agent[72795]: File
> "/usr/lib64/python2.7/asyncore.py", line 616, in __init__
> Jul 22 16:39:47 ractor.mydomain ovirt-ha-agent[72795]: self.fd = os.dup(fd)
> Jul 22 16:39:47 ractor.mydomain ovirt-ha-agent[72795]: OSError: [Errno 24]
> Too many open files
> Jul 22 16:39:47 ractor.mydomain ovirt-ha-agent[72795]: ovirt-ha-agent
> ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Shutting down
> the agent because of 3 failures in a row!
> Jul 22 16:39:47 ractor.mydomain ovirt-ha-agent[72795]:
> ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Shutting down
> the agent because of 3 failures in a row!
> Jul 22 16:39:49 ractor.mydomain ovirt-ha-agent[72795]:
> WARNING:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:The VM is
> running locally or we have no data, keeping the domain monitor.
> Jul 22 16:39:49 ractor.mydomain ovirt-ha-agent[72795]:
> INFO:ovirt_hosted_engine_ha.agent.agent.Agent:Agent shutting down
>
> Is this a known problem, or is there any reason to investigate?
> It seems very strange to have reached this limit.
>
> I presume the agent runs as vdsm user and that the oVirt installation
> creates the file
> /etc/security/limits.d/99-vdsm.conf
>
> with
> # This limits are intended for medium VDSM hosts, for large hosts scale
> these
> # numbers appropriately.
>
> # nproc should be the maximum amount of storage operations usage.
> # VMs run by "qemu" user, vm processes are not relavent to "vdsm" user
> limits.
> vdsm - nproc 4096
>
> # nofile should be at least 3(stdin,stdour,stderr) * each external process.
> # 3 * 4096 = 12288
> vdsm - nofile 12288
>
> As a rough estimate (an overestimate actually, due to many duplicates) I
> have now:
> # lsof -u vdsm | wc -l
> 488
>
> Anything else to check?
Ciao Gianluca,
can you please report which vdsm version you are using there?
We had a similar issue in the past, but it should already be solved:
https://bugzilla.redhat.com/1343005
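
On the lsof estimate above: lsof -u vdsm counts every vdsm-owned process and
also lists entries that are not file descriptors (cwd, memory maps), so it is
not directly comparable to the per-process nofile limit. Also, limits.d
settings are applied by pam_limits at session start, so the limit in effect
for a running service may differ from what 99-vdsm.conf says. A minimal
sketch for checking one process, assuming Linux /proc and enough privileges
to read the target's fd directory; the function names are made up for this
example and are not part of vdsm or ovirt-hosted-engine-ha:

```python
import os


def parse_nofile_limit(limits_text):
    """Return (soft, hard) for 'Max open files' from /proc/<pid>/limits text.

    A value of 'unlimited' is returned as None.
    """
    prefix = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(prefix):
            # Remaining columns are: soft limit, hard limit, units
            soft, hard = line[len(prefix):].split()[:2]
            to_int = lambda v: None if v == "unlimited" else int(v)
            return to_int(soft), to_int(hard)
    raise ValueError("no 'Max open files' line found")


def count_open_fds(pid):
    """Count the file descriptors currently open in one process."""
    # Each entry in /proc/<pid>/fd is one open descriptor of that process.
    return len(os.listdir("/proc/%d/fd" % pid))


def fd_usage(pid):
    """Return (open_fds, soft_limit, hard_limit) for the given PID."""
    with open("/proc/%d/limits" % pid) as f:
        soft, hard = parse_nofile_limit(f.read())
    return count_open_fds(pid), soft, hard


if __name__ == "__main__":
    # Demonstration on this script's own process; for the agent, pass the
    # PID that systemctl status reports as "Main PID" while it is running.
    used, soft, hard = fd_usage(os.getpid())
    print("open fds: %d, soft limit: %s, hard limit: %s" % (used, soft, hard))
```

If the count for the agent's PID keeps growing between runs while the VMs
are idle, that would point to a descriptor leak rather than to a limit that
is simply too low.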
> Gianluca