[ovirt-users] ovirt-ha-agent keeps quitting - 4.0.0

Yaniv Dary ydary at redhat.com
Sun Jul 17 11:31:43 UTC 2016


The other issue will be fixed in 4.0.2:
https://bugzilla.redhat.com/show_bug.cgi?id=1348907
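
For the "Too many open files" errors quoted below, one rough way to check
whether the agent is leaking file descriptors is to compare its open
descriptor count with its soft limit. The following is only a sketch and is
not part of ovirt-hosted-engine-ha: it assumes a Linux host with /proc
mounted, the helper names are hypothetical, and you must supply the PID
yourself.

```python
import os


def count_open_fds(pid):
    """Return how many file descriptors process `pid` currently has open."""
    # Each entry in /proc/<pid>/fd is one open descriptor (Linux only).
    return len(os.listdir("/proc/%d/fd" % pid))


def soft_nofile_limit(pid):
    """Return the soft 'Max open files' limit of `pid`, or None if unlimited."""
    with open("/proc/%d/limits" % pid) as limits:
        for line in limits:
            if line.startswith("Max open files"):
                # Columns: Max open files <soft> <hard> <units>
                soft = line.split()[3]
                return None if soft == "unlimited" else int(soft)
    raise RuntimeError("no 'Max open files' row for pid %d" % pid)


if __name__ == "__main__":
    import sys

    # Hypothetical usage: pass the agent's PID as the first argument.
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    print("pid %d: %d open fds, soft limit %s"
          % (pid, count_open_fds(pid), soft_nofile_limit(pid)))
```

Run it as root with the agent's PID (for example, the Main PID reported by
systemctl status ovirt-ha-agent). A count that keeps climbing toward the
limit across runs would be consistent with a descriptor leak.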

Yaniv Dary
Technical Product Manager
Red Hat Israel Ltd.
34 Jerusalem Road
Building A, 4th floor
Ra'anana, Israel 4350109

Tel : +972 (9) 7692306
        8272306
Email: ydary at redhat.com
IRC : ydary


On Sun, Jul 17, 2016 at 1:04 PM, Artyom Lukianov <alukiano at redhat.com>
wrote:

> We had a bug related to this issue:
> https://bugzilla.redhat.com/show_bug.cgi?id=1343005
> It should be fixed in recent versions.
> Best Regards
>
> On Thu, Jul 14, 2016 at 8:14 PM, Gervais de Montbrun <
> gervais at demontbrun.com> wrote:
>
>> Hey Folks,
>>
>> I upgraded my oVirt cluster from 3.6.7 to 4.0.0 yesterday and am
>> experiencing a bunch of issues.
>>
>> 1) I can't update the Compatibility Version to 4.0 because it tells me
>> that all my VMs have to be off to do so, but I have a hosted engine. I
>> found some info online about how you plan to fix this. Do we know if the
>> fix will be in 4.0.1?
>>
>> 2) More alarming... the ovirt-ha-agent keeps quitting. The agent.log
>> shows:
>>
>> MainThread::ERROR::2016-07-13
>> 16:38:57,100::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
>> Error: '[Errno 24] Too many open files' - trying to restart agent
>> MainThread::ERROR::2016-07-13
>> 16:39:02,104::config::122::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_load)
>> Configuration file '/etc/ovirt-hosted-engine/hosted-engine.conf' not
>> available [[Errno 24] Too many open files:
>> '/etc/ovirt-hosted-engine/hosted-engine.conf']
>> MainThread::ERROR::2016-07-13
>> 16:39:02,105::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
>> Error: '[Errno 24] Too many open files' - trying to restart agent
>> MainThread::ERROR::2016-07-13
>> 16:39:07,110::agent::210::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
>> Too many errors occurred, giving up. Please review the log and consider
>> filing a bug.
>> MainThread::ERROR::2016-07-13
>> 17:44:03,499::hosted_engine::493::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>> Shutting down the agent because of 3 failures in a row!
>> MainThread::ERROR::2016-07-13
>> 17:44:03,515::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
>> Error: '(24, 'Sanlock lockspace remove failure', 'Too many open files')' -
>> trying to restart agent
>> MainThread::ERROR::2016-07-13
>> 17:44:08,520::config::122::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_load)
>> Configuration file '/etc/ovirt-hosted-engine/hosted-engine.conf' not
>> available [[Errno 24] Too many open files:
>> '/etc/ovirt-hosted-engine/hosted-engine.conf']
>> MainThread::ERROR::2016-07-13
>> 17:44:08,523::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
>> Error: '[Errno 24] Too many open files' - trying to restart agent
>> MainThread::ERROR::2016-07-13
>> 17:44:13,529::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
>> Error: '[Errno 24] Too many open files' - trying to restart agent
>> MainThread::ERROR::2016-07-13
>> 17:44:18,535::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
>> Error: '[Errno 24] Too many open files' - trying to restart agent
>> MainThread::ERROR::2016-07-13
>> 17:44:23,541::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
>> Error: '[Errno 24] Too many open files' - trying to restart agent
>> MainThread::ERROR::2016-07-13
>> 17:44:28,546::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
>> Error: '[Errno 24] Too many open files' - trying to restart agent
>> MainThread::ERROR::2016-07-13
>> 17:44:33,552::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
>> Error: '[Errno 24] Too many open files' - trying to restart agent
>> MainThread::ERROR::2016-07-13
>> 17:44:38,556::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
>> Error: '[Errno 24] Too many open files' - trying to restart agent
>> MainThread::ERROR::2016-07-13
>> 17:44:43,561::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
>> Error: '[Errno 24] Too many open files' - trying to restart agent
>> MainThread::ERROR::2016-07-13
>> 17:44:48,566::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
>> Error: '[Errno 24] Too many open files' - trying to restart agent
>> MainThread::ERROR::2016-07-13
>> 17:44:53,571::agent::210::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
>> Too many errors occurred, giving up. Please review the log and consider
>> filing a bug.
>> MainThread::ERROR::2016-07-13
>> 18:47:40,048::hosted_engine::493::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>> Shutting down the agent because of 3 failures in a row!
>> MainThread::ERROR::2016-07-14
>> 10:32:29,184::hosted_engine::493::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>> Shutting down the agent because of 3 failures in a row!
>> MainThread::ERROR::2016-07-14
>> 11:10:07,223::brokerlink::279::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(_communicate)
>> Connection closed: Connection closed
>> MainThread::ERROR::2016-07-14
>> 11:10:07,224::brokerlink::148::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(get_monitor_status)
>> Exception getting monitor status: Connection closed
>> MainThread::ERROR::2016-07-14
>> 11:10:07,224::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
>> Error: 'Failed to get monitor status: Connection closed' - trying to
>> restart agent
>> MainThread::ERROR::2016-07-14
>> 12:10:26,772::hosted_engine::493::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>> Shutting down the agent because of 3 failures in a row!
>>
>> systemctl output:
>>
>> [root@cultivar3 ~]# systemctl status ovirt-ha-agent.service
>> ovirt-ha-broker.service vdsmd
>> ● ovirt-ha-agent.service - oVirt Hosted Engine High Availability
>> Monitoring Agent
>>    Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service;
>> enabled; vendor preset: disabled)
>>    Active: inactive (dead) since Thu 2016-07-14 12:10:29 ADT; 2h 3min ago
>>   Process: 19426
>> ExecStart=/usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon
>> (code=exited, status=0/SUCCESS)
>>  Main PID: 19426 (code=exited, status=0/SUCCESS)
>>
>> Jul 14 11:10:07 cultivar3.grove.silverorange.com ovirt-ha-agent[19426]:
>> ovirt-ha-agent ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR
>> Connection closed: Connection closed
>> Jul 14 11:10:07 cultivar3.grove.silverorange.com ovirt-ha-agent[19426]:
>> ovirt-ha-agent ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR
>> Exception getting monitor status: Connection closed
>> Jul 14 11:10:07 cultivar3.grove.silverorange.com ovirt-ha-agent[19426]:
>> ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Error:
>> 'Failed to get monitor status: Connection closed' - trying to restart agent
>> Jul 14 11:10:07 cultivar3.grove.silverorange.com ovirt-ha-agent[19426]:
>> ERROR:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Connection closed:
>> Connection closed
>> Jul 14 11:10:07 cultivar3.grove.silverorange.com ovirt-ha-agent[19426]:
>> ERROR:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Exception getting
>> monitor status: Connection closed
>> Jul 14 11:10:07 cultivar3.grove.silverorange.com ovirt-ha-agent[19426]:
>> ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Error: 'Failed to get
>> monitor status: Connection closed' - trying to restart agent
>> Jul 14 12:10:26 cultivar3.grove.silverorange.com ovirt-ha-agent[19426]:
>> Exception AttributeError: "'EventFD' object has no attribute '_fd'" in
>> <bound method EventFD.__del__ of <vdsm.infra.eventfd.EventFD object at
>> 0x2b035d0>> ignored
>> Jul 14 12:10:26 cultivar3.grove.silverorange.com ovirt-ha-agent[19426]:
>> ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine
>> ERROR Shutting down the agent because of 3 failures in a row!
>> Jul 14 12:10:26 cultivar3.grove.silverorange.com ovirt-ha-agent[19426]:
>> ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Shutting down
>> the agent because of 3 failures in a row!
>> Jul 14 12:10:28 cultivar3.grove.silverorange.com ovirt-ha-agent[19426]:
>> Exception AttributeError: "'EventFD' object has no attribute '_fd'" in
>> <bound method EventFD.__del__ of <vdsm.infra.eventfd.EventFD object at
>> 0x2b03f90>> ignored
>>
>> ● ovirt-ha-broker.service - oVirt Hosted Engine High Availability
>> Communications Broker
>>    Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service;
>> enabled; vendor preset: disabled)
>>    Active: active (running) since Thu 2016-07-14 11:10:09 ADT; 3h 3min ago
>>  Main PID: 19907 (ovirt-ha-broker)
>>    CGroup: /system.slice/ovirt-ha-broker.service
>>            └─19907 /usr/bin/python
>> /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon
>>
>> Jul 14 11:36:01 cultivar3.grove.silverorange.com ovirt-ha-broker[19907]:
>> reply: '354 End data with <CR><LF>.<CR><LF>\r\n'
>> Jul 14 11:36:01 cultivar3.grove.silverorange.com ovirt-ha-broker[19907]:
>> reply: retcode (354); Msg: End data with <CR><LF>.<CR><LF>
>> Jul 14 11:36:01 cultivar3.grove.silverorange.com ovirt-ha-broker[19907]:
>> data: (354, 'End data with <CR><LF>.<CR><LF>')
>> Jul 14 11:36:01 cultivar3.grove.silverorange.com ovirt-ha-broker[19907]:
>> send: 'From: root at cultivar.grove.silverorange.com\r\nTo:
>> sysadmin at silverorange.com\r\nSubject: ovirt-hosted-engine state
>> transition EngineUnexpectedlyDown-EngineDown\r\nDate: ...
>> Jul 14 11:36:01 cultivar3.grove.silverorange.com ovirt-ha-broker[19907]:
>> reply: '250 2.0.0 Ok: queued as 1B5F9C0064B90\r\n'
>> Jul 14 11:36:01 cultivar3.grove.silverorange.com ovirt-ha-broker[19907]:
>> reply: retcode (250); Msg: 2.0.0 Ok: queued as 1B5F9C0064B90
>> Jul 14 11:36:01 cultivar3.grove.silverorange.com ovirt-ha-broker[19907]:
>> data: (250, '2.0.0 Ok: queued as 1B5F9C0064B90')
>> Jul 14 11:36:01 cultivar3.grove.silverorange.com ovirt-ha-broker[19907]:
>> send: 'quit\r\n'
>> Jul 14 11:36:01 cultivar3.grove.silverorange.com ovirt-ha-broker[19907]:
>> reply: '221 2.0.0 Bye\r\n'
>> Jul 14 11:36:01 cultivar3.grove.silverorange.com ovirt-ha-broker[19907]:
>> reply: retcode (221); Msg: 2.0.0 Bye
>>
>> ● vdsmd.service - Virtual Desktop Server Manager
>>    Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor
>> preset: enabled)
>>    Active: active (running) since Thu 2016-07-14 09:31:06 ADT; 4h 42min
>> ago
>>   Process: 2236 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh
>> --pre-start (code=exited, status=0/SUCCESS)
>>  Main PID: 2356 (vdsm)
>>    CGroup: /system.slice/vdsmd.service
>>            ├─2356 /usr/bin/python /usr/share/vdsm/vdsm
>>            ├─2577 /usr/libexec/ioprocess --read-pipe-fd 82
>> --write-pipe-fd 81 --max-threads 10 --max-queued-requests 10
>>            ├─3180 /usr/libexec/ioprocess --read-pipe-fd 125
>> --write-pipe-fd 124 --max-threads 10 --max-queued-requests 10
>>            ├─3191 /usr/libexec/ioprocess --read-pipe-fd 130
>> --write-pipe-fd 127 --max-threads 10 --max-queued-requests 10
>>            └─3198 /usr/libexec/ioprocess --read-pipe-fd 138
>> --write-pipe-fd 136 --max-threads 10 --max-queued-requests 10
>>
>> Jul 14 14:13:04 cultivar3.grove.silverorange.com vdsm[2356]: vdsm
>> SchemaCache WARNING Provided parameters {'vcpuCount': '1', 'displayInfo':
>> [{'tlsPort': u'5905', 'ipAddress': '0', 'type': u'spice', 'port':
>> u'5904'}], 'hash': '242489...9-e01f21985049',
>> Jul 14 14:13:20 cultivar3.grove.silverorange.com vdsm[2356]: vdsm
>> SchemaCache WARNING Provided parameters {'displayInfo': [{'tlsPort':
>> u'5901', 'ipAddress': '0', 'type': u'spice', 'port': u'5900'}], 'memUsage':
>> '27', 'acpiEnable': u...eRuntimeInfo': {
>> Jul 14 14:13:20 cultivar3.grove.silverorange.com vdsm[2356]: vdsm
>> SchemaCache WARNING Provided parameters {'displayInfo': [{'tlsPort':
>> u'5903', 'ipAddress': '0', 'type': u'spice', 'port': u'5902'}], 'memUsage':
>> '19', 'acpiEnable': u...deRuntimeInfo':
>> Jul 14 14:13:20 cultivar3.grove.silverorange.com vdsm[2356]: vdsm
>> SchemaCache WARNING Provided parameters {'vcpuCount': '1', 'displayInfo':
>> [{'tlsPort': u'5905', 'ipAddress': '0', 'type': u'spice', 'port':
>> u'5904'}], 'hash': '242489...9-e01f21985049',
>> Jul 14 14:13:36 cultivar3.grove.silverorange.com vdsm[2356]: vdsm
>> SchemaCache WARNING Provided parameters {'displayInfo': [{'tlsPort':
>> u'5901', 'ipAddress': '0', 'type': u'spice', 'port': u'5900'}], 'memUsage':
>> '27', 'acpiEnable': u...eRuntimeInfo': {
>> Jul 14 14:13:36 cultivar3.grove.silverorange.com vdsm[2356]: vdsm
>> SchemaCache WARNING Provided parameters {'displayInfo': [{'tlsPort':
>> u'5903', 'ipAddress': '0', 'type': u'spice', 'port': u'5902'}], 'memUsage':
>> '19', 'acpiEnable': u...deRuntimeInfo':
>> Jul 14 14:13:36 cultivar3.grove.silverorange.com vdsm[2356]: vdsm
>> SchemaCache WARNING Provided parameters {'vcpuCount': '1', 'displayInfo':
>> [{'tlsPort': u'5905', 'ipAddress': '0', 'type': u'spice', 'port':
>> u'5904'}], 'hash': '242489...9-e01f21985049',
>> Jul 14 14:13:52 cultivar3.grove.silverorange.com vdsm[2356]: vdsm
>> SchemaCache WARNING Provided parameters {'displayInfo': [{'tlsPort':
>> u'5901', 'ipAddress': '0', 'type': u'spice', 'port': u'5900'}], 'memUsage':
>> '27', 'acpiEnable': u...eRuntimeInfo': {
>> Jul 14 14:13:52 cultivar3.grove.silverorange.com vdsm[2356]: vdsm
>> SchemaCache WARNING Provided parameters {'displayInfo': [{'tlsPort':
>> u'5903', 'ipAddress': '0', 'type': u'spice', 'port': u'5902'}], 'memUsage':
>> '19', 'acpiEnable': u...deRuntimeInfo':
>> Jul 14 14:13:52 cultivar3.grove.silverorange.com vdsm[2356]: vdsm
>> SchemaCache WARNING Provided parameters {'vcpuCount': '1', 'displayInfo':
>> [{'tlsPort': u'5905', 'ipAddress': '0', 'type': u'spice', 'port':
>> u'5904'}], 'hash': '242489...9-e01f21985049',
>> Hint: Some lines were ellipsized, use -l to show in full.
>>
>>
>> Cheers,
>> Gervais
>>
>>
>>
>> _______________________________________________
>> Users mailing list
>> Users at ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/users
>>