On Sun, Jul 24, 2016 at 12:19 AM, Matt . <yamakasi.014@gmail.com> wrote:
Hi Guys,

I suddenly have an issue where my HA agent stops running and can't be
restarted after the following error.

I'm able to start the HE manually on the command line on each host.

Exception: Failed to start monitoring domain (sd_uuid=4093ad17-bef5-4e4b-9a16-259a98e20321, host_id=1): timeout during domain acquisition
MainThread::WARNING::2016-07-22 13:20:05,059::hosted_engine::477::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Error while monitoring engine: Failed to start monitoring domain (sd_uuid=4093ad17-bef5-4e4b-9a16-259a98e20321, host_id=1): timeout during domain acquisition
MainThread::WARNING::2016-07-22 13:20:05,059::hosted_engine::480::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Unexpected error
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 451, in start_monitoring
    self._initialize_domain_monitor()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 831, in _initialize_domain_monitor
    raise Exception(msg)
Exception: Failed to start monitoring domain (sd_uuid=4093ad17-bef5-4e4b-9a16-259a98e20321, host_id=1): timeout during domain acquisition
MainThread::ERROR::2016-07-22 13:20:05,060::hosted_engine::493::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Shutting down the agent because of 3 failures in a row!
MainThread::INFO::2016-07-22 13:20:07,096::hosted_engine::860::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) VDSM domain monitor status: PENDING
MainThread::INFO::2016-07-22 13:20:07,122::hosted_engine::786::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_domain_monitor) Failed to stop monitoring domain (sd_uuid=4093ad17-bef5-4e4b-9a16-259a98e20321): Error 900 from stopMonitoringDomain: Storage domain is member of pool: 'domain=4093ad17-bef5-4e4b-9a16-259a98e20321'
MainThread::INFO::2016-07-22 13:20:07,129::agent::143::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down

This part concerns me actually:

Error 900 from stopMonitoringDomain: Storage domain is member of pool:
'domain=4093ad17-bef5-4e4b-9a16-259a98e20321'


Can you please provide a full sos report from the failing host?
Has this host been upgraded from a previous version or freshly installed?
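
In the meantime, the WARNING and ERROR entries can be pulled out of the
agent log to see what precedes the shutdown. The following is only a
minimal sketch, not part of ovirt-hosted-engine-ha: it assumes the
"::"-separated entry layout seen in the excerpt quoted in this thread,
and the names parse_entries and problems are made up for illustration.
It also tolerates entries that a mail client has wrapped across lines.

```python
import re

# Start of an agent log entry, e.g. "MainThread::WARNING::2016-07-22".
ENTRY_START = re.compile(r"^\w+::[A-Z]+::\d{4}-\d{2}-\d{2}", re.MULTILINE)

# Layout assumed from the quoted excerpt:
# thread::LEVEL::date time,ms::module::lineno::logger::(function) message
ENTRY_RE = re.compile(
    r"(?P<thread>\w+)::(?P<level>[A-Z]+)::"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::"
    r"(?P<module>\w+)::(?P<lineno>\d+)::(?P<logger>[\w.]+)::"
    r"\s*\((?P<function>\w+)\)\s*(?P<message>.*)"
)


def parse_entries(text):
    """Split log text into entries and return one dict per entry.

    Everything between two entry headers (continuation lines, tracebacks)
    is folded into the message of the preceding entry.
    """
    starts = [m.start() for m in ENTRY_START.finditer(text)]
    entries = []
    for begin, end in zip(starts, starts[1:] + [len(text)]):
        # Collapse line breaks and repeated whitespace into single spaces.
        flat = " ".join(text[begin:end].split())
        match = ENTRY_RE.match(flat)
        if match:
            entries.append(match.groupdict())
    return entries


def problems(entries):
    """Keep only the WARNING and ERROR entries."""
    return [e for e in entries if e["level"] in ("WARNING", "ERROR")]
```

To use it, read the agent log into a string and pass it to
parse_entries(). On a default install I would expect the log at
/var/log/ovirt-hosted-engine-ha/agent.log, but please verify that path
on your host.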


 

I'm running oVirt 4.0.1

Or do you guys want a bug report for this?


Thanks!

Matt



--
Sandro Bonazzola