<div dir="ltr">The hosted-engine storage domain is definitely mounted,<div>but the issue is here:<br><div>Exception: Failed to start monitoring domain (sd_uuid=7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96, host_id=3): timeout during domain acquisition<br></div></div><div><br></div><div>The point is that in the VDSM logs I only see something like:</div><div><div>2017-02-02 21:05:22,283 INFO (jsonrpc/1) [dispatcher] Run and protect: repoStats(options=None) (logUtils:49)</div><div>2017-02-02 21:05:22,285 INFO (jsonrpc/1) [dispatcher] Run and protect: repoStats, Return response: {u'a7fbaaad-7043-4391-9523-3bedcdc4fb0d': {'code': 0, 'actual': True, 'version': 0, 'acquired': True, 'delay': '0.000748727', 'lastCheck': '0.1', 'valid': True}, u'2b2a44fc-f2bd-47cd-b7af-00be59e30a35': {'code': 0, 'actual': True, 'version': 0, 'acquired': True, 'delay': '0.00082529', 'lastCheck': '0.1', 'valid': True}, u'5d99af76-33b5-47d8-99da-1f32413c7bb0': {'code': 0, 'actual': True, 'version': 4, 'acquired': True, 'delay': '0.000349356', 'lastCheck': '5.3', 'valid': True}, u'7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96': {'code': 0, 'actual': True, 'version': 4, 'acquired': False, 'delay': '0.000377052', 'lastCheck': '0.6', 'valid': True}} (logUtils:52)</div></div><div><br></div><div>The other storage domains report 'acquired': True, while the hosted-engine storage domain always reports 'acquired': False.</div><div><br></div><div>Could you please share your /var/log/sanlock.log from the same host and the output of </div><div> sanlock client status</div><div>?</div><div><br></div><div><br></div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Feb 3, 2017 at 3:52 PM, Ralf Schenk <span dir="ltr">&lt;<a href="mailto:rs@databay.de" target="_blank">rs@databay.de</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<p>Hello,</p>
<p>I also put the host into maintenance and restarted vdsm while
ovirt-ha-agent was running. I can mount the Gluster volume "engine"
manually on the host.<br>
</p>
<p>I get this repeatedly in /var/log/vdsm.log:</p>
<p><tt>2017-02-03 15:29:28,891 INFO (MainThread) [vds] Exiting
(vdsm:167)</tt><tt><br>
</tt><tt>2017-02-03 15:29:30,974 INFO (MainThread) [vds] (PID:
11456) I am the actual vdsm 4.19.4-1.el7.centos microcloud27
(3.10.0-514.6.1.el7.x86_64) (vdsm:145)</tt><tt><br>
</tt><tt>2017-02-03 15:29:30,974 INFO (MainThread) [vds] VDSM
will run with cpu affinity: frozenset([1]) (vdsm:251)</tt><tt><br>
</tt><tt>2017-02-03 15:29:31,013 INFO (MainThread)
[storage.check] Starting check service (check:91)</tt><tt><br>
</tt><tt>2017-02-03 15:29:31,017 INFO (MainThread)
[storage.Dispatcher] Starting StorageDispatcher...
(dispatcher:47)</tt><tt><br>
</tt><tt>2017-02-03 15:29:31,017 INFO (check/loop)
[storage.asyncevent] Starting &lt;EventLoop running=True
closed=False at 0x37480464&gt; (asyncevent:122)</tt><tt><br>
</tt><tt>2017-02-03 15:29:31,156 INFO (MainThread) [dispatcher]
Run and protect:
registerDomainStateChangeCallb<wbr>ack(callbackFunc=&lt;functools.<wbr>partial
object at 0x2881fc8&gt;) (logUtils:49)</tt><tt><br>
</tt><tt>2017-02-03 15:29:31,156 INFO (MainThread) [dispatcher]
Run and protect: registerDomainStateChangeCallb<wbr>ack, Return
response: None (logUtils:52)</tt><tt><br>
</tt><tt>2017-02-03 15:29:31,160 INFO (MainThread) [MOM]
Preparing MOM interface (momIF:49)</tt><tt><br>
</tt><tt>2017-02-03 15:29:31,161 INFO (MainThread) [MOM] Using
named unix socket /var/run/vdsm/mom-vdsm.sock (momIF:58)</tt><tt><br>
</tt><tt>2017-02-03 15:29:31,162 INFO (MainThread) [root]
Unregistering all secrets (secret:91)</tt><tt><br>
</tt><tt>2017-02-03 15:29:31,164 INFO (MainThread) [vds] Setting
channels' timeout to 30 seconds. (vmchannels:223)</tt><tt><br>
</tt><tt>2017-02-03 15:29:31,165 INFO (MainThread)
[vds.MultiProtocolAcceptor] Listening at :::54321
(protocoldetector:185)</tt><tt><br>
</tt><tt>2017-02-03 15:29:31,354 INFO (vmrecovery) [vds]
recovery: completed in 0s (clientIF:495)</tt><tt><br>
</tt><tt>2017-02-03 15:29:31,371 INFO (BindingXMLRPC) [vds]
XMLRPC server running (bindingxmlrpc:63)</tt><tt><br>
</tt><tt>2017-02-03 15:29:31,471 INFO (periodic/1) [dispatcher]
Run and protect: repoStats(options=None) (logUtils:49)</tt><tt><br>
</tt><tt>2017-02-03 15:29:31,472 INFO (periodic/1) [dispatcher]
Run and protect: repoStats, Return response: {} (logUtils:52)</tt><tt><br>
</tt><tt>2017-02-03 15:29:31,472 WARN (periodic/1) [MOM] MOM not
available. (momIF:116)</tt><tt><br>
</tt><tt>2017-02-03 15:29:31,473 WARN (periodic/1) [MOM] MOM not
available, KSM stats will be missing. (momIF:79)</tt><tt><br>
</tt><tt>2017-02-03 15:29:31,474 ERROR (periodic/1) [root] failed
to retrieve Hosted Engine HA info (api:252)</tt><span class=""><tt><br>
</tt><tt>Traceback (most recent call last):</tt><tt><br>
</tt></span><tt> File
"/usr/lib/python2.7/site-<wbr>packages/vdsm/host/api.py", line 231,
in _getHaInfo</tt><tt><br>
</tt><tt> stats = instance.get_all_stats()</tt><tt><br>
</tt><tt> File
"/usr/lib/python2.7/site-<wbr>packages/ovirt_hosted_engine_<wbr>ha/client/client.py",
line 103, in get_all_stats</tt><tt><br>
</tt><tt> self._configure_broker_conn(<wbr>broker)</tt><tt><br>
</tt><tt> File
"/usr/lib/python2.7/site-<wbr>packages/ovirt_hosted_engine_<wbr>ha/client/client.py",
line 180, in _configure_broker_conn</tt><tt><br>
</tt><tt> dom_type=dom_type)</tt><tt><br>
</tt><tt> File
"/usr/lib/python2.7/site-<wbr>packages/ovirt_hosted_engine_<wbr>ha/lib/brokerlink.py",
line 177, in set_storage_domain</tt><tt><br>
</tt><tt> .format(sd_type, options, e))</tt><tt><br>
</tt><tt>RequestError: Failed to set storage domain
FilesystemBackend, options {'dom_type': 'glusterfs', 'sd_uuid':
'7c8deaa8-be02-4aaf-b9b4-<wbr>ddc8da99ad96'}: Request failed:
&lt;class 'ovirt_hosted_engine_ha.lib.storage_<wbr>backends.<wbr>BackendFailureException'&gt;</tt><tt><br>
</tt><tt>2017-02-03 15:29:35,920 INFO (Reactor thread)
[ProtocolDetector.<wbr>AcceptorImpl] Accepted connection from
::1:49506 (protocoldetector:72)</tt><tt><br>
</tt><tt>2017-02-03 15:29:35,929 INFO (Reactor thread)
[ProtocolDetector.Detector] Detected protocol stomp from
::1:49506 (protocoldetector:127)</tt><tt><br>
</tt><tt>2017-02-03 15:29:35,930 INFO (Reactor thread)
[Broker.StompAdapter] Processing CONNECT request
(stompreactor:102)</tt><tt><br>
</tt><tt>2017-02-03 15:29:35,930 INFO (JsonRpc (StompReactor))
[Broker.StompAdapter] Subscribe command received
(stompreactor:129)</tt><tt><br>
</tt><tt>2017-02-03 15:29:36,067 INFO (jsonrpc/0)
[jsonrpc.JsonRpcServer] RPC call Host.ping succeeded in 0.00
seconds (__init__:515)</tt><tt><br>
</tt><tt>2017-02-03 15:29:36,071 INFO (jsonrpc/1) [throttled]
Current getAllVmStats: {} (throttledlog:105)</tt><tt><br>
</tt><tt>2017-02-03 15:29:36,071 INFO (jsonrpc/1)
[jsonrpc.JsonRpcServer] RPC call Host.getAllVmStats succeeded in
0.00 seconds (__init__:515)</tt><tt><br>
</tt><tt>2017-02-03 15:29:46,435 INFO (periodic/0) [dispatcher]
Run and protect: repoStats(options=None) (logUtils:49)</tt><tt><br>
</tt><tt>2017-02-03 15:29:46,435 INFO (periodic/0) [dispatcher]
Run and protect: repoStats, Return response: {} (logUtils:52)</tt><tt><br>
</tt><tt>2017-02-03 15:29:46,439 ERROR (periodic/0) [root] failed
to retrieve Hosted Engine HA info (api:252)</tt><span class=""><tt><br>
</tt><tt>Traceback (most recent call last):</tt><tt><br>
</tt></span><tt> File
"/usr/lib/python2.7/site-<wbr>packages/vdsm/host/api.py", line 231,
in _getHaInfo</tt><tt><br>
</tt><tt> stats = instance.get_all_stats()</tt><tt><br>
</tt><tt> File
"/usr/lib/python2.7/site-<wbr>packages/ovirt_hosted_engine_<wbr>ha/client/client.py",
line 103, in get_all_stats</tt><tt><br>
</tt><tt> self._configure_broker_conn(<wbr>broker)</tt><tt><br>
</tt><tt> File
"/usr/lib/python2.7/site-<wbr>packages/ovirt_hosted_engine_<wbr>ha/client/client.py",
line 180, in _configure_broker_conn</tt><tt><br>
</tt><tt> dom_type=dom_type)</tt><tt><br>
</tt><tt> File
"/usr/lib/python2.7/site-<wbr>packages/ovirt_hosted_engine_<wbr>ha/lib/brokerlink.py",
line 177, in set_storage_domain</tt><tt><br>
</tt><tt> .format(sd_type, options, e))</tt><tt><br>
</tt><tt>RequestError: Failed to set storage domain
FilesystemBackend, options {'dom_type': 'glusterfs', 'sd_uuid':
'7c8deaa8-be02-4aaf-b9b4-<wbr>ddc8da99ad96'}: Request failed:
&lt;class 'ovirt_hosted_engine_ha.lib.storage_<wbr>backends.<wbr>BackendFailureException'&gt;</tt><tt><br>
</tt><tt>2017-02-03 15:29:51,095 INFO (jsonrpc/2)
[jsonrpc.JsonRpcServer] RPC call Host.getAllVmStats succeeded in
0.00 seconds (__init__:515)</tt><tt><br>
</tt><tt>2017-02-03 15:29:51,219 INFO (jsonrpc/3)
[jsonrpc.JsonRpcServer] RPC call Host.setKsmTune succeeded in
0.00 seconds (__init__:515)</tt><tt><br>
</tt><tt>2017-02-03 15:30:01,444 INFO (periodic/1) [dispatcher]
Run and protect: repoStats(options=None) (logUtils:49)</tt><tt><br>
</tt><tt>2017-02-03 15:30:01,444 INFO (periodic/1) [dispatcher]
Run and protect: repoStats, Return response: {} (logUtils:52)</tt><tt><br>
</tt><tt>2017-02-03 15:30:01,448 ERROR (periodic/1) [root] failed
to retrieve Hosted Engine HA info (api:252)</tt><br>
<br>
</p><span class="">
<p><br>
</p>
<br>
<div class="m_2083194111166540231moz-cite-prefix">On 03.02.2017 at 13:39, Simone
Tiraboschi wrote:<br>
</div>
</span><div><div class="h5"><blockquote type="cite">
<div dir="ltr">I see an ERROR on stopMonitoringDomain there, but I
cannot find the corresponding startMonitoringDomain; could you
please look for it?</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Fri, Feb 3, 2017 at 1:16 PM, Ralf
Schenk <span dir="ltr">&lt;<a href="mailto:rs@databay.de" target="_blank">rs@databay.de</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<p>Hello,</p>
<p>Attached is my vdsm.log from the host whose hosted-engine HA
no longer works for the engine, covering the time frame of
the agent timeout (the host itself works in
oVirt and is active). It simply stopped working for
engine HA after the update.</p>
<p>At 2017-02-02 19:25:34,248 you'll find an error
corresponding to the agent timeout error.</p>
<p>Bye<br>
</p>
<div>
<div class="m_2083194111166540231h5">
<p><br>
</p>
<br>
<div class="m_2083194111166540231m_-5371711976759655950moz-cite-prefix">On
03.02.2017 at 11:28, Simone Tiraboschi wrote:<br>
</div>
<blockquote type="cite">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote"><span>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<p>3. Three of my hosts have the
hosted engine deployed for HA. At first,
all three were marked with a crown
(the running one was gold and the others were
silver). After upgrading the 3 hosts,
the deployed hosted-engine HA is no longer
active.</p>
<p>I can't get this host back with a
working ovirt-ha-agent/broker. I
already rebooted and manually restarted
the services, but it isn't able to
get the cluster state according to <br>
"hosted-engine --vm-status". The
other hosts report this host's status as
"unknown stale-data".</p>
<p>I already shut down all agents on
all hosts and issued a
"hosted-engine
--reinitialize-lockspace" but that
didn't help.<br>
</p>
<p>The agent stops working after a
timeout error, according to the log:</p>
<p><tt>MainThread::INFO::2017-02-02
19:24:52,040::hosted_engine::8<wbr>41::ovirt_hosted_engine_ha.age<wbr>nt.hosted_engine.HostedEngine:<wbr>:(_get_domain_monitor_status)
VDSM domain monitor status:
PENDING</tt><tt><br>
</tt><tt>MainThread::INFO::2017-02-02
19:24:59,185::hosted_engine::8<wbr>41::ovirt_hosted_engine_ha.age<wbr>nt.hosted_engine.HostedEngine:<wbr>:(_get_domain_monitor_status)
VDSM domain monitor status:
PENDING</tt><tt><br>
</tt><tt>MainThread::INFO::2017-02-02
19:25:06,333::hosted_engine::8<wbr>41::ovirt_hosted_engine_ha.age<wbr>nt.hosted_engine.HostedEngine:<wbr>:(_get_domain_monitor_status)
VDSM domain monitor status:
PENDING</tt><tt><br>
</tt><tt>MainThread::INFO::2017-02-02
19:25:13,554::hosted_engine::8<wbr>41::ovirt_hosted_engine_ha.age<wbr>nt.hosted_engine.HostedEngine:<wbr>:(_get_domain_monitor_status)
VDSM domain monitor status:
PENDING</tt><tt><br>
</tt><tt>MainThread::INFO::2017-02-02
19:25:20,710::hosted_engine::8<wbr>41::ovirt_hosted_engine_ha.age<wbr>nt.hosted_engine.HostedEngine:<wbr>:(_get_domain_monitor_status)
VDSM domain monitor status:
PENDING</tt><tt><br>
</tt><tt>MainThread::INFO::2017-02-02
19:25:27,865::hosted_engine::8<wbr>41::ovirt_hosted_engine_ha.age<wbr>nt.hosted_engine.HostedEngine:<wbr>:(_get_domain_monitor_status)
VDSM domain monitor status:
PENDING</tt><tt><br>
</tt><tt>MainThread::ERROR::2017-02-02
19:25:27,866::hosted_engine::8<wbr>15::ovirt_hosted_engine_ha.age<wbr>nt.hosted_engine.HostedEngine:<wbr>:(_initialize_domain_monitor)
Failed to start monitoring domain
(sd_uuid=7c8deaa8-be02-4aaf-b9<wbr>b4-ddc8da99ad96,
host_id=3): timeout during domain
acquisition</tt><tt><br>
</tt><tt>MainThread::WARNING::2017-02-0<wbr>2
19:25:27,866::hosted_engine::4<wbr>69::ovirt_hosted_engine_ha.age<wbr>nt.hosted_engine.HostedEngine:<wbr>:(start_monitoring)
Error while monitoring engine:
Failed to start monitoring domain
(sd_uuid=7c8deaa8-be02-4aaf-b9<wbr>b4-ddc8da99ad96,
host_id=3): timeout during domain
acquisition</tt><tt><br>
</tt><tt>MainThread::WARNING::2017-02-0<wbr>2
19:25:27,866::hosted_engine::4<wbr>72::ovirt_hosted_engine_ha.age<wbr>nt.hosted_engine.HostedEngine:<wbr>:(start_monitoring)
Unexpected error</tt><tt><br>
</tt><tt>Traceback (most recent call
last):</tt><tt><br>
</tt><tt> File
"/usr/lib/python2.7/site-packa<wbr>ges/ovirt_hosted_engine_ha/age<wbr>nt/hosted_engine.py",
line 443, in start_monitoring</tt><tt><br>
</tt><tt>
self._initialize_domain_monito<wbr>r()</tt><tt><br>
</tt><tt> File
"/usr/lib/python2.7/site-packa<wbr>ges/ovirt_hosted_engine_ha/age<wbr>nt/hosted_engine.py",
line 816, in
_initialize_domain_monitor</tt><tt><br>
</tt><tt> raise Exception(msg)</tt><tt><br>
</tt><tt>Exception: Failed to start
monitoring domain
(sd_uuid=7c8deaa8-be02-4aaf-b9<wbr>b4-ddc8da99ad96,
host_id=3): timeout during domain
acquisition</tt><tt><br>
</tt><tt>MainThread::ERROR::2017-02-02
19:25:27,866::hosted_engine::4<wbr>85::ovirt_hosted_engine_ha.age<wbr>nt.hosted_engine.HostedEngine:<wbr>:(start_monitoring)
Shutting down the agent because of
3 failures in a row!</tt><tt><br>
</tt><tt>MainThread::INFO::2017-02-02
19:25:32,087::hosted_engine::8<wbr>41::ovirt_hosted_engine_ha.age<wbr>nt.hosted_engine.HostedEngine:<wbr>:(_get_domain_monitor_status)
VDSM domain monitor status:
PENDING</tt><tt><br>
</tt><tt>MainThread::INFO::2017-02-02
19:25:34,250::hosted_engine::7<wbr>69::ovirt_hosted_engine_ha.age<wbr>nt.hosted_engine.HostedEngine:<wbr>:(_stop_domain_monitor)
Failed to stop monitoring domain
(sd_uuid=7c8deaa8-be02-4aaf-b9<wbr>b4-ddc8da99ad96):
Storage domain is member of pool:
u'domain=7c8deaa8-be02-4aaf-b9<wbr>b4-ddc8da99ad96'</tt><tt><br>
</tt><tt>MainThread::INFO::2017-02-02
19:25:34,254::agent::143::ovir<wbr>t_hosted_engine_ha.agent.agent<wbr>.Agent::(run)
Agent shutting down</tt></p>
</div>
</blockquote>
</span>
<div>Simone, Martin, can you please follow
up on this?</div>
</div>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>Ralf, could you please attach the vdsm logs from
one of your hosts for the relevant time frame?</div>
</blockquote>
<br>
</div>
</div>
<span>
<div class="m_2083194111166540231m_-5371711976759655950moz-signature">-- <br>
<p> </p>
<table border="0" cellpadding="0" cellspacing="0">
<tbody>
<tr>
<td colspan="3"><img src="cid:part2.442CE625.84474DBE@databay.de" height="30" border="0" width="151"></td>
</tr>
<tr>
<td valign="top"> <font face="Verdana, Arial,
sans-serif" size="-1"><br>
<b>Ralf Schenk</b><br>
fon <a href="tel:+49%202405%20408370" value="+492405408370" target="_blank">+49
(0) 24 05 / 40 83 70</a><br>
fax <a href="tel:+49%202405%204083759" value="+4924054083759" target="_blank">+49
(0) 24 05 / 40 83 759</a><br>
mail <a href="mailto:rs@databay.de" target="_blank"><font color="#FF0000"><b>rs@databay.de</b></font></a><br>
</font> </td>
<td width="30"> </td>
<td valign="top"> <font face="Verdana, Arial,
sans-serif" size="-1"><br>
<b>Databay AG</b><br>
Jens-Otto-Krag-Straße 11<br>
D-52146 Würselen<br>
<a href="http://www.databay.de" target="_blank"><font color="#FF0000"><b>www.databay.de</b></font></a>
</font> </td>
</tr>
<tr>
<td colspan="3" valign="top"> <font face="Verdana, Arial, sans-serif" size="1"><br>
Registered office/District Court Aachen • HRB:8437 •
VAT ID: DE 210844202<br>
Executive Board: Ralf Schenk, Dipl.-Ing. Jens
Conze, Aresch Yavari, Dipl.-Kfm. Philipp
Hermanns<br>
Chairman of the Supervisory Board: Wilhelm Dohmen </font>
</td>
</tr>
</tbody>
</table>
<hr color="#000000" noshade size="1" width="100%"> </div>
</span></div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</div></div></div>
</blockquote></div><br></div>
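<div dir="ltr"><div>To check this quickly on other hosts, here is a minimal sketch in Python. It assumes the repoStats response from the log line at the top of this message has already been copied into a dictionary; the helper name unacquired_domains is hypothetical and is not part of VDSM. It only lists the storage domains that report 'acquired': False.</div>
<pre><code class="language-python"># Hypothetical helper (not part of VDSM): given a repoStats response
# dictionary, list the storage domains that report 'acquired': False.
def unacquired_domains(repo_stats):
    """Return the sorted UUIDs of domains whose 'acquired' flag is not True."""
    return sorted(
        sd_uuid
        for sd_uuid, stats in repo_stats.items()
        if not stats.get('acquired', False)
    )


# Trimmed copy of the repoStats response from the log (two of the four
# domains, only the fields relevant here).
repo_stats = {
    'a7fbaaad-7043-4391-9523-3bedcdc4fb0d': {'code': 0, 'acquired': True, 'valid': True},
    '7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96': {'code': 0, 'acquired': False, 'valid': True},
}

print(unacquired_domains(repo_stats))
</code></pre>
<div>With the values from the log, only the hosted-engine storage domain 7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96 should be listed.</div></div>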