Hello,
Attached is my vdsm.log from the host with hosted-engine HA, covering the time frame of the agent timeout. The host no longer works for the hosted engine (it still works in oVirt and is active); it simply stopped working for hosted-engine HA after the update.
At 2017-02-02 19:25:34,248 you'll find an error corresponding to the agent timeout.
Bye
Three of my hosts have the hosted engine deployed for HA. At first, all three were marked with a crown (gold for the host running the engine, silver for the others). After the upgrade, hosted-engine HA is no longer active on the third host.
I can't get this host back with a working ovirt-ha-agent/broker. I already rebooted and manually restarted the services, but it still isn't able to get the cluster state according to "hosted-engine --vm-status". The other hosts report this host's status as "unknown stale-data". I already shut down all agents on all hosts and issued a "hosted-engine --reinitialize-lockspace", but that didn't help.
The agent stops working after a timeout error, according to the log:
MainThread::INFO::2017-02-02 19:24:52,040::hosted_engine::841::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) VDSM domain monitor status: PENDING
MainThread::INFO::2017-02-02 19:24:59,185::hosted_engine::841::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) VDSM domain monitor status: PENDING
MainThread::INFO::2017-02-02 19:25:06,333::hosted_engine::841::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) VDSM domain monitor status: PENDING
MainThread::INFO::2017-02-02 19:25:13,554::hosted_engine::841::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) VDSM domain monitor status: PENDING
MainThread::INFO::2017-02-02 19:25:20,710::hosted_engine::841::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) VDSM domain monitor status: PENDING
MainThread::INFO::2017-02-02 19:25:27,865::hosted_engine::841::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) VDSM domain monitor status: PENDING
MainThread::ERROR::2017-02-02 19:25:27,866::hosted_engine::815::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_domain_monitor) Failed to start monitoring domain (sd_uuid=7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96, host_id=3): timeout during domain acquisition
MainThread::WARNING::2017-02-02 19:25:27,866::hosted_engine::469::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Error while monitoring engine: Failed to start monitoring domain (sd_uuid=7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96, host_id=3): timeout during domain acquisition
MainThread::WARNING::2017-02-02 19:25:27,866::hosted_engine::472::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Unexpected error
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 443, in start_monitoring
    self._initialize_domain_monitor()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 816, in _initialize_domain_monitor
    raise Exception(msg)
Exception: Failed to start monitoring domain (sd_uuid=7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96, host_id=3): timeout during domain acquisition
MainThread::ERROR::2017-02-02 19:25:27,866::hosted_engine::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Shutting down the agent because of 3 failures in a row!
MainThread::INFO::2017-02-02 19:25:32,087::hosted_engine::841::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) VDSM domain monitor status: PENDING
MainThread::INFO::2017-02-02 19:25:34,250::hosted_engine::769::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_domain_monitor) Failed to stop monitoring domain (sd_uuid=7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96): Storage domain is member of pool: u'domain=7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96'
MainThread::INFO::2017-02-02 19:25:34,254::agent::143::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down

Simone, Martin, can you please follow up on this?
Ralf, could you please attach vdsm logs from one of your hosts for the relevant time frame?
Ralf Schenk
Phone +49 (0) 24 05 / 40 83 70
Fax +49 (0) 24 05 / 40 83 759
Mail rs@databay.de

Databay AG
Jens-Otto-Krag-Straße 11
D-52146 Würselen
www.databay.de

Registered office/court of registration: Aachen • HRB: 8437 • VAT ID: DE 210844202
Executive Board: Ralf Schenk, Dipl.-Ing. Jens Conze, Aresch Yavari, Dipl.-Kfm. Philipp Hermanns
Chairman of the Supervisory Board: Wilhelm Dohmen