
Hello,
I also put the host in maintenance and restarted vdsm while ovirt-ha-agent was running. I can mount the gluster volume "engine" manually on the host.
I get this repeatedly in /var/log/vdsm.log:
2017-02-03 15:29:28,891 INFO (MainThread) [vds] Exiting (vdsm:167)
2017-02-03 15:29:30,974 INFO (MainThread) [vds] (PID: 11456) I am the actual vdsm 4.19.4-1.el7.centos microcloud27 (3.10.0-514.6.1.el7.x86_64) (vdsm:145)
2017-02-03 15:29:30,974 INFO (MainThread) [vds] VDSM will run with cpu affinity: frozenset([1]) (vdsm:251)
2017-02-03 15:29:31,013 INFO (MainThread) [storage.check] Starting check service (check:91)
2017-02-03 15:29:31,017 INFO (MainThread) [storage.Dispatcher] Starting StorageDispatcher... (dispatcher:47)
2017-02-03 15:29:31,017 INFO (check/loop) [storage.asyncevent] Starting <EventLoop running=True closed=False at 0x37480464> (asyncevent:122)
2017-02-03 15:29:31,156 INFO (MainThread) [dispatcher] Run and protect: registerDomainStateChangeCallback(callbackFunc=<functools.partial object at 0x2881fc8>) (logUtils:49)
2017-02-03 15:29:31,156 INFO (MainThread) [dispatcher] Run and protect: registerDomainStateChangeCallback, Return response: None (logUtils:52)
2017-02-03 15:29:31,160 INFO (MainThread) [MOM] Preparing MOM interface (momIF:49)
2017-02-03 15:29:31,161 INFO (MainThread) [MOM] Using named unix socket /var/run/vdsm/mom-vdsm.sock (momIF:58)
2017-02-03 15:29:31,162 INFO (MainThread) [root] Unregistering all secrets (secret:91)
2017-02-03 15:29:31,164 INFO (MainThread) [vds] Setting channels' timeout to 30 seconds. (vmchannels:223)
2017-02-03 15:29:31,165 INFO (MainThread) [vds.MultiProtocolAcceptor] Listening at :::54321 (protocoldetector:185)
2017-02-03 15:29:31,354 INFO (vmrecovery) [vds] recovery: completed in 0s (clientIF:495)
2017-02-03 15:29:31,371 INFO (BindingXMLRPC) [vds] XMLRPC server running (bindingxmlrpc:63)
2017-02-03 15:29:31,471 INFO (periodic/1) [dispatcher] Run and protect: repoStats(options=None) (logUtils:49)
2017-02-03 15:29:31,472 INFO (periodic/1) [dispatcher] Run and protect: repoStats, Return response: {} (logUtils:52)
2017-02-03 15:29:31,472 WARN (periodic/1) [MOM] MOM not available. (momIF:116)
2017-02-03 15:29:31,473 WARN (periodic/1) [MOM] MOM not available, KSM stats will be missing. (momIF:79)
2017-02-03 15:29:31,474 ERROR (periodic/1) [root] failed to retrieve Hosted Engine HA info (api:252)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 231, in _getHaInfo
    stats = instance.get_all_stats()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 103, in get_all_stats
    self._configure_broker_conn(broker)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 180, in _configure_broker_conn
    dom_type=dom_type)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 177, in set_storage_domain
    .format(sd_type, options, e))
RequestError: Failed to set storage domain FilesystemBackend, options {'dom_type': 'glusterfs', 'sd_uuid': '7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96'}: Request failed: <class 'ovirt_hosted_engine_ha.lib.storage_backends.BackendFailureException'>
2017-02-03 15:29:35,920 INFO (Reactor thread) [ProtocolDetector.AcceptorImpl] Accepted connection from ::1:49506 (protocoldetector:72)
2017-02-03 15:29:35,929 INFO (Reactor thread) [ProtocolDetector.Detector] Detected protocol stomp from ::1:49506 (protocoldetector:127)
2017-02-03 15:29:35,930 INFO (Reactor thread) [Broker.StompAdapter] Processing CONNECT request (stompreactor:102)
2017-02-03 15:29:35,930 INFO (JsonRpc (StompReactor)) [Broker.StompAdapter] Subscribe command received (stompreactor:129)
2017-02-03 15:29:36,067 INFO (jsonrpc/0) [jsonrpc.JsonRpcServer] RPC call Host.ping succeeded in 0.00 seconds (__init__:515)
2017-02-03 15:29:36,071 INFO (jsonrpc/1) [throttled] Current getAllVmStats: {} (throttledlog:105)
2017-02-03 15:29:36,071 INFO (jsonrpc/1) [jsonrpc.JsonRpcServer] RPC call Host.getAllVmStats succeeded in 0.00 seconds (__init__:515)
2017-02-03 15:29:46,435 INFO (periodic/0) [dispatcher] Run and protect: repoStats(options=None) (logUtils:49)
2017-02-03 15:29:46,435 INFO (periodic/0) [dispatcher] Run and protect: repoStats, Return response: {} (logUtils:52)
2017-02-03 15:29:46,439 ERROR (periodic/0) [root] failed to retrieve Hosted Engine HA info (api:252)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 231, in _getHaInfo
    stats = instance.get_all_stats()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 103, in get_all_stats
    self._configure_broker_conn(broker)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 180, in _configure_broker_conn
    dom_type=dom_type)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 177, in set_storage_domain
    .format(sd_type, options, e))
RequestError: Failed to set storage domain FilesystemBackend, options {'dom_type': 'glusterfs', 'sd_uuid': '7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96'}: Request failed: <class 'ovirt_hosted_engine_ha.lib.storage_backends.BackendFailureException'>
2017-02-03 15:29:51,095 INFO (jsonrpc/2) [jsonrpc.JsonRpcServer] RPC call Host.getAllVmStats succeeded in 0.00 seconds (__init__:515)
2017-02-03 15:29:51,219 INFO (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC call Host.setKsmTune succeeded in 0.00 seconds (__init__:515)
2017-02-03 15:30:01,444 INFO (periodic/1) [dispatcher] Run and protect: repoStats(options=None) (logUtils:49)
2017-02-03 15:30:01,444 INFO (periodic/1) [dispatcher] Run and protect: repoStats, Return response: {} (logUtils:52)
2017-02-03 15:30:01,448 ERROR (periodic/1) [root] failed to retrieve Hosted Engine HA info (api:252)
On 03.02.2017 at 13:39, Simone Tiraboschi wrote:
I see an ERROR on stopMonitoringDomain there, but I cannot see the corresponding startMonitoringDomain; could you please look for it?
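One way to look for the matching calls is to scan the log for both verb names. The following is only a minimal sketch: the helper names `find_monitoring_calls` and `scan_file` are hypothetical, and the log path is left to the caller because it is an assumption about the host's setup.

```python
import re

# Verb names are taken from the message above; everything else is illustrative.
PATTERN = re.compile(r"(startMonitoringDomain|stopMonitoringDomain)")


def find_monitoring_calls(lines):
    """Return (verb, line) pairs for every log line that mentions either verb."""
    hits = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            hits.append((match.group(1), line.rstrip("\n")))
    return hits


def scan_file(path):
    """Apply find_monitoring_calls to a log file given by the caller."""
    # errors="replace" keeps the scan going if the log holds odd bytes.
    with open(path, errors="replace") as handle:
        return find_monitoring_calls(handle)
```

Calling `scan_file` on the attached vdsm.log would list every start and stop call in order, which makes a stop without a preceding start easy to spot.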
On Fri, Feb 3, 2017 at 1:16 PM, Ralf Schenk <rs@databay.de> wrote:
Hello,
Attached is my vdsm.log from the hosted-engine HA host that no longer works for the engine, covering the time frame of the agent timeout (the host itself works in oVirt and is active). It simply stopped working for engine HA after the update.
At 2017-02-02 19:25:34,248 you'll find an error corresponding to the agent timeout error.
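To pull only the entries near that timestamp out of a large log, a small filter along these lines could help. It is a sketch that assumes every entry starts with a timestamp in the form shown above; the helper names `parse_timestamp` and `lines_around` are hypothetical.

```python
from datetime import datetime, timedelta

# Assumed timestamp layout at the start of each entry, e.g. 2017-02-02 19:25:34,248
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
TS_LENGTH = len("2017-02-02 19:25:34,248")


def parse_timestamp(line):
    """Return the leading timestamp of a log line, or None (e.g. traceback lines)."""
    try:
        return datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
    except ValueError:
        return None


def lines_around(lines, center, seconds=60):
    """Yield lines whose timestamp lies within +/- `seconds` of `center`.

    Lines without a timestamp (traceback continuations) follow the
    decision made for the most recent timestamped line.
    """
    window = timedelta(seconds=seconds)
    keep = False
    for line in lines:
        stamp = parse_timestamp(line)
        if stamp is not None:
            keep = abs(stamp - center) <= window
        if keep:
            yield line
```

Passing `parse_timestamp("2017-02-02 19:25:34,248")` as `center` keeps the error and its traceback together with the surrounding minute of context.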
Bye
On 03.02.2017 at 11:28, Simone Tiraboschi wrote:
3. Three of my hosts have the hosted engine deployed for HA. At first all three were marked by a crown (the running one was gold and the others were silver). After upgrading host 3, the hosted-engine HA deployed on it is no longer active.
I can't get this host back with a working ovirt-ha-agent/broker. I already rebooted and manually restarted the services, but it isn't able to get the cluster state according to "hosted-engine --vm-status". The other hosts report this host's status as "unknown stale-data".
I already shut down all agents on all hosts and issued a "hosted-engine --reinitialize-lockspace" but that didn't help.
The agent stops working after a timeout error, according to the log:
MainThread::INFO::2017-02-02 19:24:52,040::hosted_engine::841::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) VDSM domain monitor status: PENDING
MainThread::INFO::2017-02-02 19:24:59,185::hosted_engine::841::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) VDSM domain monitor status: PENDING
MainThread::INFO::2017-02-02 19:25:06,333::hosted_engine::841::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) VDSM domain monitor status: PENDING
MainThread::INFO::2017-02-02 19:25:13,554::hosted_engine::841::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) VDSM domain monitor status: PENDING
MainThread::INFO::2017-02-02 19:25:20,710::hosted_engine::841::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) VDSM domain monitor status: PENDING
MainThread::INFO::2017-02-02 19:25:27,865::hosted_engine::841::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) VDSM domain monitor status: PENDING
MainThread::ERROR::2017-02-02 19:25:27,866::hosted_engine::815::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_domain_monitor) Failed to start monitoring domain (sd_uuid=7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96, host_id=3): timeout during domain acquisition
MainThread::WARNING::2017-02-02 19:25:27,866::hosted_engine::469::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Error while monitoring engine: Failed to start monitoring domain (sd_uuid=7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96, host_id=3): timeout during domain acquisition
MainThread::WARNING::2017-02-02 19:25:27,866::hosted_engine::472::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Unexpected error
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 443, in start_monitoring
    self._initialize_domain_monitor()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 816, in _initialize_domain_monitor
    raise Exception(msg)
Exception: Failed to start monitoring domain (sd_uuid=7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96, host_id=3): timeout during domain acquisition
MainThread::ERROR::2017-02-02 19:25:27,866::hosted_engine::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Shutting down the agent because of 3 failures in a row!
MainThread::INFO::2017-02-02 19:25:32,087::hosted_engine::841::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) VDSM domain monitor status: PENDING
MainThread::INFO::2017-02-02 19:25:34,250::hosted_engine::769::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_domain_monitor) Failed to stop monitoring domain (sd_uuid=7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96): Storage domain is member of pool: u'domain=7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96'
MainThread::INFO::2017-02-02 19:25:34,254::agent::143::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
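To see how long the domain monitor stayed in PENDING before the agent gave up, the agent log entries can be split on their `::` separators. This is a rough sketch that assumes the seven-field layout visible in the excerpt above (thread, level, timestamp, module, line number, logger, message); the helper names `parse_agent_line` and `pending_span` are hypothetical.

```python
from datetime import datetime

# Assumed timestamp layout, e.g. 2017-02-02 19:24:52,040
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
PENDING_SUFFIX = "VDSM domain monitor status: PENDING"


def parse_agent_line(line):
    """Split one agent log entry into a dict, or return None if it does not match."""
    parts = line.rstrip("\n").split("::", 6)
    if len(parts) != 7:
        return None  # traceback lines and other continuations
    thread, level, stamp, module, lineno, logger, message = parts
    try:
        when = datetime.strptime(stamp, TS_FORMAT)
        number = int(lineno)
    except ValueError:
        return None
    return {"thread": thread, "level": level, "time": when, "module": module,
            "lineno": number, "logger": logger, "message": message}


def pending_span(lines):
    """Return (count, seconds) for all PENDING status reports in `lines`.

    `seconds` is the time between the first and the last PENDING report.
    """
    stamps = []
    for line in lines:
        entry = parse_agent_line(line)
        if entry and entry["message"].endswith(PENDING_SUFFIX):
            stamps.append(entry["time"])
    if not stamps:
        return 0, 0.0
    return len(stamps), (max(stamps) - min(stamps)).total_seconds()
```

Feeding the agent log to `pending_span` line by line gives the number of PENDING reports and the interval they cover, which can then be compared with the moment the "timeout during domain acquisition" error is raised.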
Simone, Martin, can you please follow up on this?
Ralf, could you please attach vdsm logs from one of your hosts for the relevant time frame?
--
*Ralf Schenk* phone +49 (0) 24 05 / 40 83 70 fax +49 (0) 24 05 / 40 83 759 mail *rs@databay.de*
*Databay AG* Jens-Otto-Krag-Straße 11 D-52146 Würselen *www.databay.de*
Registered office/District Court: Aachen • HRB:8437 • VAT ID: DE 210844202 Executive Board: Ralf Schenk, Dipl.-Ing. Jens Conze, Aresch Yavari, Dipl.-Kfm. Philipp Hermanns Chairman of the Supervisory Board: Wilhelm Dohmen
------------------------------------------------------------------------