Hello,
of course:
[root@microcloud27 mnt]# sanlock client status
daemon 8a93c9ea-e242-408c-a63d-a9356bb22df5.microcloud
p -1 helper
p -1 listener
p -1 status
sanlock.log is attached (it begins on 2017-01-27, when everything was still fine).
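For reading the status output above, here is a minimal sketch that extracts the lockspace entries from it. It assumes that `sanlock client status` prints each joined lockspace on a line starting with `s `; the helper name `lockspaces` is hypothetical and only for illustration. On the output pasted above it finds no lockspace at all.

```python
# Hypothetical helper for illustration; not part of sanlock or vdsm.
# Assumption: joined lockspaces appear as lines of the form "s <lockspace>".

def lockspaces(status_output):
    """Return the lockspace entries found in 'sanlock client status' text."""
    found = []
    for line in status_output.splitlines():
        parts = line.split(None, 1)
        # Keep only lines whose first token is exactly "s" and that carry a value.
        if len(parts) == 2 and parts[0] == "s":
            found.append(parts[1].strip())
    return found


# The output pasted above, copied verbatim.
status_text = """daemon 8a93c9ea-e242-408c-a63d-a9356bb22df5.microcloud
p -1 helper
p -1 listener
p -1 status
"""

print(lockspaces(status_text))
```

An empty result here would be consistent with the host not having joined any lockspace, but the sanlock.log is the place to confirm that.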
Bye
On 03.02.2017 at 16:12, Simone Tiraboschi wrote:
The hosted-engine storage domain is mounted for sure,
but the issue is here:
Exception: Failed to start monitoring domain
(sd_uuid=7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96, host_id=3): timeout
during domain acquisition
The point is that in the VDSM logs I only see something like:
2017-02-02 21:05:22,283 INFO (jsonrpc/1) [dispatcher] Run and protect: repoStats(options=None) (logUtils:49)
2017-02-02 21:05:22,285 INFO (jsonrpc/1) [dispatcher] Run and protect: repoStats, Return response:
{u'a7fbaaad-7043-4391-9523-3bedcdc4fb0d': {'code': 0, 'actual': True, 'version': 0, 'acquired': True, 'delay': '0.000748727', 'lastCheck': '0.1', 'valid': True},
 u'2b2a44fc-f2bd-47cd-b7af-00be59e30a35': {'code': 0, 'actual': True, 'version': 0, 'acquired': True, 'delay': '0.00082529', 'lastCheck': '0.1', 'valid': True},
 u'5d99af76-33b5-47d8-99da-1f32413c7bb0': {'code': 0, 'actual': True, 'version': 4, 'acquired': True, 'delay': '0.000349356', 'lastCheck': '5.3', 'valid': True},
 u'7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96': {'code': 0, 'actual': True, 'version': 4, 'acquired': False, 'delay': '0.000377052', 'lastCheck': '0.6', 'valid': True}}
(logUtils:52)
The other storage domains have 'acquired': True, while it is
always 'acquired': False for the hosted-engine storage domain.
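To make that comparison easier to repeat, here is a minimal sketch that lists the domains whose 'acquired' flag is False. It assumes the repoStats response has already been loaded into a Python dict with the shape shown in the log above; the helper name `unacquired_domains` is hypothetical, and the sample data keeps only a subset of the logged fields.

```python
# Hypothetical helper for illustration; not part of vdsm.
# Assumption: repo_stats maps storage domain UUID -> stats dict, as logged above.

def unacquired_domains(repo_stats):
    """Return the sorted UUIDs of domains whose 'acquired' flag is not True."""
    return sorted(
        sd_uuid
        for sd_uuid, stats in repo_stats.items()
        if not stats.get("acquired", False)
    )


# Values copied from the repoStats response in the log (subset of fields).
repo_stats = {
    "a7fbaaad-7043-4391-9523-3bedcdc4fb0d": {"code": 0, "acquired": True, "lastCheck": "0.1", "valid": True},
    "2b2a44fc-f2bd-47cd-b7af-00be59e30a35": {"code": 0, "acquired": True, "lastCheck": "0.1", "valid": True},
    "5d99af76-33b5-47d8-99da-1f32413c7bb0": {"code": 0, "acquired": True, "lastCheck": "5.3", "valid": True},
    "7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96": {"code": 0, "acquired": False, "lastCheck": "0.6", "valid": True},
}

for sd_uuid in unacquired_domains(repo_stats):
    print("not acquired:", sd_uuid)
```

With the values from the log, only the hosted-engine storage domain (7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96) is reported.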
Could you please share your /var/log/sanlock.log from the same host
and the output of
sanlock client status
?
On Fri, Feb 3, 2017 at 3:52 PM, Ralf Schenk <rs(a)databay.de> wrote:
Hello,
I also put the host into maintenance and restarted vdsm while
ovirt-ha-agent was running. I can mount the Gluster volume "engine"
manually on the host.
I get this repeatedly in /var/log/vdsm.log:
2017-02-03 15:29:28,891 INFO (MainThread) [vds] Exiting (vdsm:167)
2017-02-03 15:29:30,974 INFO (MainThread) [vds] (PID: 11456) I am
the actual vdsm 4.19.4-1.el7.centos microcloud27
(3.10.0-514.6.1.el7.x86_64) (vdsm:145)
2017-02-03 15:29:30,974 INFO (MainThread) [vds] VDSM will run
with cpu affinity: frozenset([1]) (vdsm:251)
2017-02-03 15:29:31,013 INFO (MainThread) [storage.check]
Starting check service (check:91)
2017-02-03 15:29:31,017 INFO (MainThread) [storage.Dispatcher]
Starting StorageDispatcher... (dispatcher:47)
2017-02-03 15:29:31,017 INFO (check/loop) [storage.asyncevent]
Starting <EventLoop running=True closed=False at 0x37480464>
(asyncevent:122)
2017-02-03 15:29:31,156 INFO (MainThread) [dispatcher] Run and
protect:
registerDomainStateChangeCallback(callbackFunc=<functools.partial
object at 0x2881fc8>) (logUtils:49)
2017-02-03 15:29:31,156 INFO (MainThread) [dispatcher] Run and
protect: registerDomainStateChangeCallback, Return response: None
(logUtils:52)
2017-02-03 15:29:31,160 INFO (MainThread) [MOM] Preparing MOM
interface (momIF:49)
2017-02-03 15:29:31,161 INFO (MainThread) [MOM] Using named unix
socket /var/run/vdsm/mom-vdsm.sock (momIF:58)
2017-02-03 15:29:31,162 INFO (MainThread) [root] Unregistering
all secrets (secret:91)
2017-02-03 15:29:31,164 INFO (MainThread) [vds] Setting channels'
timeout to 30 seconds. (vmchannels:223)
2017-02-03 15:29:31,165 INFO (MainThread)
[vds.MultiProtocolAcceptor] Listening at :::54321
(protocoldetector:185)
2017-02-03 15:29:31,354 INFO (vmrecovery) [vds] recovery:
completed in 0s (clientIF:495)
2017-02-03 15:29:31,371 INFO (BindingXMLRPC) [vds] XMLRPC server
running (bindingxmlrpc:63)
2017-02-03 15:29:31,471 INFO (periodic/1) [dispatcher] Run and
protect: repoStats(options=None) (logUtils:49)
2017-02-03 15:29:31,472 INFO (periodic/1) [dispatcher] Run and
protect: repoStats, Return response: {} (logUtils:52)
2017-02-03 15:29:31,472 WARN (periodic/1) [MOM] MOM not
available. (momIF:116)
2017-02-03 15:29:31,473 WARN (periodic/1) [MOM] MOM not
available, KSM stats will be missing. (momIF:79)
2017-02-03 15:29:31,474 ERROR (periodic/1) [root] failed to
retrieve Hosted Engine HA info (api:252)
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line
231, in _getHaInfo
stats = instance.get_all_stats()
File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py",
line 103, in get_all_stats
self._configure_broker_conn(broker)
File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py",
line 180, in _configure_broker_conn
dom_type=dom_type)
File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
line 177, in set_storage_domain
.format(sd_type, options, e))
RequestError: Failed to set storage domain FilesystemBackend,
options {'dom_type': 'glusterfs', 'sd_uuid':
'7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96'}: Request failed: <class
'ovirt_hosted_engine_ha.lib.storage_backends.BackendFailureException'>
2017-02-03 15:29:35,920 INFO (Reactor thread)
[ProtocolDetector.AcceptorImpl] Accepted connection from ::1:49506
(protocoldetector:72)
2017-02-03 15:29:35,929 INFO (Reactor thread)
[ProtocolDetector.Detector] Detected protocol stomp from ::1:49506
(protocoldetector:127)
2017-02-03 15:29:35,930 INFO (Reactor thread)
[Broker.StompAdapter] Processing CONNECT request (stompreactor:102)
2017-02-03 15:29:35,930 INFO (JsonRpc (StompReactor))
[Broker.StompAdapter] Subscribe command received (stompreactor:129)
2017-02-03 15:29:36,067 INFO (jsonrpc/0) [jsonrpc.JsonRpcServer]
RPC call Host.ping succeeded in 0.00 seconds (__init__:515)
2017-02-03 15:29:36,071 INFO (jsonrpc/1) [throttled] Current
getAllVmStats: {} (throttledlog:105)
2017-02-03 15:29:36,071 INFO (jsonrpc/1) [jsonrpc.JsonRpcServer]
RPC call Host.getAllVmStats succeeded in 0.00 seconds (__init__:515)
2017-02-03 15:29:46,435 INFO (periodic/0) [dispatcher] Run and
protect: repoStats(options=None) (logUtils:49)
2017-02-03 15:29:46,435 INFO (periodic/0) [dispatcher] Run and
protect: repoStats, Return response: {} (logUtils:52)
2017-02-03 15:29:46,439 ERROR (periodic/0) [root] failed to
retrieve Hosted Engine HA info (api:252)
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line
231, in _getHaInfo
stats = instance.get_all_stats()
File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py",
line 103, in get_all_stats
self._configure_broker_conn(broker)
File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py",
line 180, in _configure_broker_conn
dom_type=dom_type)
File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
line 177, in set_storage_domain
.format(sd_type, options, e))
RequestError: Failed to set storage domain FilesystemBackend,
options {'dom_type': 'glusterfs', 'sd_uuid':
'7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96'}: Request failed: <class
'ovirt_hosted_engine_ha.lib.storage_backends.BackendFailureException'>
2017-02-03 15:29:51,095 INFO (jsonrpc/2) [jsonrpc.JsonRpcServer]
RPC call Host.getAllVmStats succeeded in 0.00 seconds (__init__:515)
2017-02-03 15:29:51,219 INFO (jsonrpc/3) [jsonrpc.JsonRpcServer]
RPC call Host.setKsmTune succeeded in 0.00 seconds (__init__:515)
2017-02-03 15:30:01,444 INFO (periodic/1) [dispatcher] Run and
protect: repoStats(options=None) (logUtils:49)
2017-02-03 15:30:01,444 INFO (periodic/1) [dispatcher] Run and
protect: repoStats, Return response: {} (logUtils:52)
2017-02-03 15:30:01,448 ERROR (periodic/1) [root] failed to
retrieve Hosted Engine HA info (api:252)
On 03.02.2017 at 13:39, Simone Tiraboschi wrote:
> I see an ERROR there on stopMonitoringDomain, but I cannot see the
> corresponding startMonitoringDomain; could you please look for it?
>
> On Fri, Feb 3, 2017 at 1:16 PM, Ralf Schenk <rs(a)databay.de> wrote:
>
> Hello,
>
> attached is my vdsm.log from the host where hosted-engine HA
> no longer works, covering the time frame of the agent timeout.
> The host itself works in oVirt and is active; only engine HA
> has stopped working on it since the update.
>
> At 2017-02-02 19:25:34,248 you'll find an error
> corresponding to the agent timeout error.
>
> Bye
>
>
>
> On 03.02.2017 at 11:28, Simone Tiraboschi wrote:
>>
>> 3. Three of my hosts have the hosted engine deployed
>> for HA. At first all three were marked by a crown
>> (the running one was gold and the others were silver). After
>> upgrading the 3 Host deployed hosted engine ha is
>> not active anymore.
>>
>> I can't get this host back with a working
>> ovirt-ha-agent/broker. I already rebooted and manually
>> restarted the services, but it isn't able to get the
>> cluster state according to
>> "hosted-engine --vm-status". The other hosts report
>> the host status as "unknown stale-data".
>>
>> I already shut down all agents on all hosts and
>> issued a "hosted-engine --reinitialize-lockspace",
>> but that didn't help.
>>
>> The agent stops working after a timeout error according
>> to the log:
>>
>> MainThread::INFO::2017-02-02 19:24:52,040::hosted_engine::841::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) VDSM domain monitor status: PENDING
>> MainThread::INFO::2017-02-02 19:24:59,185::hosted_engine::841::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) VDSM domain monitor status: PENDING
>> MainThread::INFO::2017-02-02 19:25:06,333::hosted_engine::841::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) VDSM domain monitor status: PENDING
>> MainThread::INFO::2017-02-02 19:25:13,554::hosted_engine::841::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) VDSM domain monitor status: PENDING
>> MainThread::INFO::2017-02-02 19:25:20,710::hosted_engine::841::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) VDSM domain monitor status: PENDING
>> MainThread::INFO::2017-02-02 19:25:27,865::hosted_engine::841::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) VDSM domain monitor status: PENDING
>> MainThread::ERROR::2017-02-02 19:25:27,866::hosted_engine::815::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_domain_monitor) Failed to start monitoring domain (sd_uuid=7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96, host_id=3): timeout during domain acquisition
>> MainThread::WARNING::2017-02-02 19:25:27,866::hosted_engine::469::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Error while monitoring engine: Failed to start monitoring domain (sd_uuid=7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96, host_id=3): timeout during domain acquisition
>> MainThread::WARNING::2017-02-02 19:25:27,866::hosted_engine::472::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Unexpected error
>> Traceback (most recent call last):
>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 443, in start_monitoring
>>     self._initialize_domain_monitor()
>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 816, in _initialize_domain_monitor
>>     raise Exception(msg)
>> Exception: Failed to start monitoring domain (sd_uuid=7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96, host_id=3): timeout during domain acquisition
>> MainThread::ERROR::2017-02-02 19:25:27,866::hosted_engine::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Shutting down the agent because of 3 failures in a row!
>> MainThread::INFO::2017-02-02 19:25:32,087::hosted_engine::841::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) VDSM domain monitor status: PENDING
>> MainThread::INFO::2017-02-02 19:25:34,250::hosted_engine::769::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_domain_monitor) Failed to stop monitoring domain (sd_uuid=7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96): Storage domain is member of pool: u'domain=7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96'
>> MainThread::INFO::2017-02-02 19:25:34,254::agent::143::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
>>
>> Simone, Martin, can you please follow up on this?
>>
>>
>> Ralf, could you please attach the vdsm logs from one of your
>> hosts for the relevant time frame?
>
> --
>
>
> *Ralf Schenk*
> phone +49 (0) 24 05 / 40 83 70
> fax +49 (0) 24 05 / 40 83 759
> mail *rs(a)databay.de*
>
> *Databay AG*
> Jens-Otto-Krag-Straße 11
> D-52146 Würselen
> *www.databay.de* <http://www.databay.de>
>
> Registered office/District Court Aachen • HRB:8437 • VAT ID: DE 210844202
> Executive Board: Ralf Schenk, Dipl.-Ing. Jens Conze, Aresch Yavari,
> Dipl.-Kfm. Philipp Hermanns
> Chairman of the Supervisory Board: Wilhelm Dohmen
>
> ------------------------------------------------------------------------
>
>