oVirt 4.0.1 Hosted Engine Agent stops and can't be started anymore

Hi guys,

I'm having an issue: all of a sudden my HA agent stops running, and after the following error it can't be restarted and won't start anymore. I am able to start the HE manually from the command line on each host.

Exception: Failed to start monitoring domain (sd_uuid=4093ad17-bef5-4e4b-9a16-259a98e20321, host_id=1): timeout during domain acquisition
MainThread::WARNING::2016-07-22 13:20:05,059::hosted_engine::477::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Error while monitoring engine: Failed to start monitoring domain (sd_uuid=4093ad17-bef5-4e4b-9a16-259a98e20321, host_id=1): timeout during domain acquisition
MainThread::WARNING::2016-07-22 13:20:05,059::hosted_engine::480::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Unexpected error
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 451, in start_monitoring
    self._initialize_domain_monitor()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 831, in _initialize_domain_monitor
    raise Exception(msg)
Exception: Failed to start monitoring domain (sd_uuid=4093ad17-bef5-4e4b-9a16-259a98e20321, host_id=1): timeout during domain acquisition
MainThread::ERROR::2016-07-22 13:20:05,060::hosted_engine::493::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Shutting down the agent because of 3 failures in a row!
MainThread::INFO::2016-07-22 13:20:07,096::hosted_engine::860::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) VDSM domain monitor status: PENDING
MainThread::INFO::2016-07-22 13:20:07,122::hosted_engine::786::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_domain_monitor) Failed to stop monitoring domain (sd_uuid=4093ad17-bef5-4e4b-9a16-259a98e20321): Error 900 from stopMonitoringDomain: Storage domain is member of pool: 'domain=4093ad17-bef5-4e4b-9a16-259a98e20321'
MainThread::INFO::2016-07-22 13:20:07,129::agent::143::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down

This part actually concerns me:

Error 900 from stopMonitoringDomain: Storage domain is member of pool: 'domain=4093ad17-bef5-4e4b-9a16-259a98e20321'

I'm running oVirt 4.0.1. Or do you guys want a bug report for this?

Thanks!
Matt
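To pick the relevant lines out of an agent log like the one above, a small parser can help. The sketch below is a hypothetical helper, not part of oVirt: it assumes the `thread::LEVEL::timestamp::module::lineno::logger::(function) message` layout inferred from the quoted lines, re-splits a log that was flattened into one paragraph (as happened when this post was archived), and reports WARNING/ERROR entries together with any `sd_uuid` and VDSM `Error <code> from <verb>` text it finds. The names `split_entries`, `parse_entry`, `summarize` and the script name `summarize_ha_log.py` are made up for illustration.

```python
import re
import sys
from typing import Dict, List, Optional

# Layout inferred from the agent log lines quoted above (an assumption, not a
# documented contract): thread::LEVEL::timestamp::module::lineno::logger::(function) message
ENTRY_RE = re.compile(
    r"^(?P<thread>[\w-]+)::(?P<level>[A-Z]+)::"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::"
    r"(?P<module>[^:]+)::(?P<lineno>\d+)::(?P<logger>[^:]+)::"
    r"\s?\((?P<function>[^)]*)\)\s?(?P<message>.*)$",
    re.DOTALL,  # let a message swallow any traceback lines that follow it
)

# Zero-width split point just before "<thread>::<LEVEL>::<year>-"; the
# lookbehind stops it from also matching in the middle of a thread name.
ENTRY_START_RE = re.compile(
    r"(?<![\w-])(?=[\w-]+::(?:DEBUG|INFO|WARNING|ERROR|CRITICAL)::\d{4}-)"
)
SD_UUID_RE = re.compile(r"sd_uuid=([0-9a-f-]{36})")
VDSM_ERROR_RE = re.compile(r"Error (\d+) from (\w+)")  # e.g. "Error 900 from stopMonitoringDomain"


def split_entries(text: str) -> List[str]:
    """Split a log excerpt, even one flattened into a single line, into entries."""
    return [part.strip() for part in ENTRY_START_RE.split(text) if part.strip()]


def parse_entry(entry: str) -> Optional[Dict[str, str]]:
    """Return the named fields of one entry, or None if it does not match."""
    match = ENTRY_RE.match(entry)
    return match.groupdict() if match else None


def summarize(text: str) -> List[Dict[str, Optional[str]]]:
    """Keep WARNING/ERROR/CRITICAL entries and any entry reporting a VDSM error code."""
    rows: List[Dict[str, Optional[str]]] = []
    for entry in split_entries(text):
        fields = parse_entry(entry)
        if fields is None:
            continue  # e.g. a traceback tail that precedes the first full entry
        vdsm = VDSM_ERROR_RE.search(fields["message"])
        if fields["level"] not in ("WARNING", "ERROR", "CRITICAL") and vdsm is None:
            continue
        sd_uuid = SD_UUID_RE.search(fields["message"])
        rows.append({
            "timestamp": fields["timestamp"],
            "level": fields["level"],
            "function": fields["function"],
            "vdsm_error": f"{vdsm.group(2)}:{vdsm.group(1)}" if vdsm else None,
            "sd_uuid": sd_uuid.group(1) if sd_uuid else None,
        })
    return rows


if __name__ == "__main__":
    # Hypothetical usage: python summarize_ha_log.py <log file> [<log file> ...]
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as handle:
            for row in summarize(handle.read()):
                print(row)
```

Fed the excerpt from this post, it keeps the two start_monitoring warnings, the shutdown error and the INFO entry carrying Error 900 from stopMonitoringDomain, and drops the PENDING status line and the final "Agent shutting down" line. Regular expressions were chosen over a plain `split("::")` because the timestamp itself contains colons.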

On Sun, Jul 24, 2016 at 12:19 AM, Matt . <yamakasi.014@gmail.com> wrote:
This part concerns me actually:
Error 900 from stopMonitoringDomain: Storage domain is member of pool: 'domain=4093ad17-bef5-4e4b-9a16-259a98e20321'
Can you please provide a full sos report from the failing host? Was this host upgraded from a previous version, or freshly installed?
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
-- 
Sandro Bonazzola
participants (2)
- Matt .
- Sandro Bonazzola