[ovirt-users] [Call for feedback] did you install/update to 4.1.0?

Simone Tiraboschi stirabos at redhat.com
Fri Feb 3 18:23:24 UTC 2017


On Fri, Feb 3, 2017 at 7:20 PM, Simone Tiraboschi <stirabos at redhat.com>
wrote:

>
>
> On Fri, Feb 3, 2017 at 5:22 PM, Ralf Schenk <rs at databay.de> wrote:
>
>> Hello,
>>
>> of course:
>>
>> [root at microcloud27 mnt]# sanlock client status
>> daemon 8a93c9ea-e242-408c-a63d-a9356bb22df5.microcloud
>> p -1 helper
>> p -1 listener
>> p -1 status
>>
>> sanlock.log is attached (beginning 2017-01-27, when everything was still fine).
>>
> Thanks, the issue is here:
>
> 2017-02-02 19:01:22+0100 4848 [1048]: s36 lockspace 7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96:3:/rhev/data-center/mnt/glusterSD/glusterfs.rxmgmt.databay.de:_engine/7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96/dom_md/ids:0
> 2017-02-02 19:03:42+0100 4988 [12983]: s36 delta_acquire host_id 3 busy1 3 15 13129 7ad427b1-fbb6-4cee-b9ee-01f596fddfbb.microcloud
> 2017-02-02 19:03:43+0100 4989 [1048]: s36 add_lockspace fail result -262
>
> Could you please check if you have other hosts contending for the same ID
> (id=3 in this case).
>
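
A direct way to see who is currently holding IDs in that lockspace should be
the host_status action of the sanlock client, something like (the lockspace
string below is the same one reported in your log):

  sanlock client host_status -s 7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96:3:/rhev/data-center/mnt/glusterSD/glusterfs.rxmgmt.databay.de:_engine/7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96/dom_md/ids:0

If a host other than this one recently renewed a delta lease on host_id 3,
that would confirm the contention.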

Another option is to manually force a sanlock renewal on that host and
check what happens, something like:
sanlock client renewal -s 7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96:3:/rhev/data-center/mnt/glusterSD/glusterfs.rxmgmt.databay.de:_engine/7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96/dom_md/ids:0
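
If two of the HA hosts ended up with the same hosted-engine host_id after the
upgrade, that alone would explain the contention; assuming the default
configuration path, comparing the output of

  grep host_id /etc/ovirt-hosted-engine/hosted-engine.conf

on every HA host should show a different number on each of them.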


>
>
>> Bye
>>
>> Am 03.02.2017 um 16:12 schrieb Simone Tiraboschi:
>>
>> The hosted-engine storage domain is mounted for sure,
>> but the issue is here:
>> Exception: Failed to start monitoring domain
>> (sd_uuid=7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96, host_id=3): timeout
>> during domain acquisition
>>
>> The point is that in the VDSM logs I just see something like:
>> 2017-02-02 21:05:22,283 INFO  (jsonrpc/1) [dispatcher] Run and protect:
>> repoStats(options=None) (logUtils:49)
>> 2017-02-02 21:05:22,285 INFO  (jsonrpc/1) [dispatcher] Run and protect:
>> repoStats, Return response: {u'a7fbaaad-7043-4391-9523-3bedcdc4fb0d':
>> {'code': 0, 'actual': True, 'version': 0, 'acquired': True, 'delay':
>> '0.000748727', 'lastCheck': '0.1', 'valid': True},
>> u'2b2a44fc-f2bd-47cd-b7af-00be59e30a35': {'code': 0, 'actual': True,
>> 'version': 0, 'acquired': True, 'delay': '0.00082529', 'lastCheck': '0.1',
>> 'valid': True}, u'5d99af76-33b5-47d8-99da-1f32413c7bb0': {'code': 0,
>> 'actual': True, 'version': 4, 'acquired': True, 'delay': '0.000349356',
>> 'lastCheck': '5.3', 'valid': True}, u'7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96':
>> {'code': 0, 'actual': True, 'version': 4, 'acquired': False, 'delay':
>> '0.000377052', 'lastCheck': '0.6', 'valid': True}} (logUtils:52)
>>
>> Here the other storage domains have 'acquired': True, while it's
>> always 'acquired': False for the hosted-engine storage domain.
>>
>> Could you please share your /var/log/sanlock.log from the same host and
>> the output of
>>  sanlock client status
>> ?
>>
>>
>>
>>
>> On Fri, Feb 3, 2017 at 3:52 PM, Ralf Schenk <rs at databay.de> wrote:
>>
>>> Hello,
>>>
>>> I also put the host into Maintenance and restarted vdsm while ovirt-ha-agent
>>> was running. I can mount the gluster volume "engine" manually on the host.
>>>
>>> I get this repeatedly in /var/log/vdsm.log:
>>>
>>> 2017-02-03 15:29:28,891 INFO  (MainThread) [vds] Exiting (vdsm:167)
>>> 2017-02-03 15:29:30,974 INFO  (MainThread) [vds] (PID: 11456) I am the
>>> actual vdsm 4.19.4-1.el7.centos microcloud27 (3.10.0-514.6.1.el7.x86_64)
>>> (vdsm:145)
>>> 2017-02-03 15:29:30,974 INFO  (MainThread) [vds] VDSM will run with cpu
>>> affinity: frozenset([1]) (vdsm:251)
>>> 2017-02-03 15:29:31,013 INFO  (MainThread) [storage.check] Starting
>>> check service (check:91)
>>> 2017-02-03 15:29:31,017 INFO  (MainThread) [storage.Dispatcher] Starting
>>> StorageDispatcher... (dispatcher:47)
>>> 2017-02-03 15:29:31,017 INFO  (check/loop) [storage.asyncevent] Starting
>>> <EventLoop running=True closed=False at 0x37480464> (asyncevent:122)
>>> 2017-02-03 15:29:31,156 INFO  (MainThread) [dispatcher] Run and protect:
>>> registerDomainStateChangeCallback(callbackFunc=<functools.partial
>>> object at 0x2881fc8>) (logUtils:49)
>>> 2017-02-03 15:29:31,156 INFO  (MainThread) [dispatcher] Run and protect:
>>> registerDomainStateChangeCallback, Return response: None (logUtils:52)
>>> 2017-02-03 15:29:31,160 INFO  (MainThread) [MOM] Preparing MOM interface
>>> (momIF:49)
>>> 2017-02-03 15:29:31,161 INFO  (MainThread) [MOM] Using named unix socket
>>> /var/run/vdsm/mom-vdsm.sock (momIF:58)
>>> 2017-02-03 15:29:31,162 INFO  (MainThread) [root] Unregistering all
>>> secrets (secret:91)
>>> 2017-02-03 15:29:31,164 INFO  (MainThread) [vds] Setting channels'
>>> timeout to 30 seconds. (vmchannels:223)
>>> 2017-02-03 15:29:31,165 INFO  (MainThread) [vds.MultiProtocolAcceptor]
>>> Listening at :::54321 (protocoldetector:185)
>>> 2017-02-03 15:29:31,354 INFO  (vmrecovery) [vds] recovery: completed in
>>> 0s (clientIF:495)
>>> 2017-02-03 15:29:31,371 INFO  (BindingXMLRPC) [vds] XMLRPC server
>>> running (bindingxmlrpc:63)
>>> 2017-02-03 15:29:31,471 INFO  (periodic/1) [dispatcher] Run and protect:
>>> repoStats(options=None) (logUtils:49)
>>> 2017-02-03 15:29:31,472 INFO  (periodic/1) [dispatcher] Run and protect:
>>> repoStats, Return response: {} (logUtils:52)
>>> 2017-02-03 15:29:31,472 WARN  (periodic/1) [MOM] MOM not available.
>>> (momIF:116)
>>> 2017-02-03 15:29:31,473 WARN  (periodic/1) [MOM] MOM not available, KSM
>>> stats will be missing. (momIF:79)
>>> 2017-02-03 15:29:31,474 ERROR (periodic/1) [root] failed to retrieve
>>> Hosted Engine HA info (api:252)
>>> Traceback (most recent call last):
>>>   File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 231,
>>> in _getHaInfo
>>>     stats = instance.get_all_stats()
>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py",
>>> line 103, in get_all_stats
>>>     self._configure_broker_conn(broker)
>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py",
>>> line 180, in _configure_broker_conn
>>>     dom_type=dom_type)
>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
>>> line 177, in set_storage_domain
>>>     .format(sd_type, options, e))
>>> RequestError: Failed to set storage domain FilesystemBackend, options
>>> {'dom_type': 'glusterfs', 'sd_uuid': '7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96'}:
>>> Request failed: <class 'ovirt_hosted_engine_ha.lib.storage_backends.BackendFailureException'>
>>> 2017-02-03 15:29:35,920 INFO  (Reactor thread)
>>> [ProtocolDetector.AcceptorImpl] Accepted connection from ::1:49506
>>> (protocoldetector:72)
>>> 2017-02-03 15:29:35,929 INFO  (Reactor thread)
>>> [ProtocolDetector.Detector] Detected protocol stomp from ::1:49506
>>> (protocoldetector:127)
>>> 2017-02-03 15:29:35,930 INFO  (Reactor thread) [Broker.StompAdapter]
>>> Processing CONNECT request (stompreactor:102)
>>> 2017-02-03 15:29:35,930 INFO  (JsonRpc (StompReactor))
>>> [Broker.StompAdapter] Subscribe command received (stompreactor:129)
>>> 2017-02-03 15:29:36,067 INFO  (jsonrpc/0) [jsonrpc.JsonRpcServer] RPC
>>> call Host.ping succeeded in 0.00 seconds (__init__:515)
>>> 2017-02-03 15:29:36,071 INFO  (jsonrpc/1) [throttled] Current
>>> getAllVmStats: {} (throttledlog:105)
>>> 2017-02-03 15:29:36,071 INFO  (jsonrpc/1) [jsonrpc.JsonRpcServer] RPC
>>> call Host.getAllVmStats succeeded in 0.00 seconds (__init__:515)
>>> 2017-02-03 15:29:46,435 INFO  (periodic/0) [dispatcher] Run and protect:
>>> repoStats(options=None) (logUtils:49)
>>> 2017-02-03 15:29:46,435 INFO  (periodic/0) [dispatcher] Run and protect:
>>> repoStats, Return response: {} (logUtils:52)
>>> 2017-02-03 15:29:46,439 ERROR (periodic/0) [root] failed to retrieve
>>> Hosted Engine HA info (api:252)
>>> Traceback (most recent call last):
>>>   File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 231,
>>> in _getHaInfo
>>>     stats = instance.get_all_stats()
>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py",
>>> line 103, in get_all_stats
>>>     self._configure_broker_conn(broker)
>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py",
>>> line 180, in _configure_broker_conn
>>>     dom_type=dom_type)
>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
>>> line 177, in set_storage_domain
>>>     .format(sd_type, options, e))
>>> RequestError: Failed to set storage domain FilesystemBackend, options
>>> {'dom_type': 'glusterfs', 'sd_uuid': '7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96'}:
>>> Request failed: <class 'ovirt_hosted_engine_ha.lib.storage_backends.BackendFailureException'>
>>> 2017-02-03 15:29:51,095 INFO  (jsonrpc/2) [jsonrpc.JsonRpcServer] RPC
>>> call Host.getAllVmStats succeeded in 0.00 seconds (__init__:515)
>>> 2017-02-03 15:29:51,219 INFO  (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC
>>> call Host.setKsmTune succeeded in 0.00 seconds (__init__:515)
>>> 2017-02-03 15:30:01,444 INFO  (periodic/1) [dispatcher] Run and protect:
>>> repoStats(options=None) (logUtils:49)
>>> 2017-02-03 15:30:01,444 INFO  (periodic/1) [dispatcher] Run and protect:
>>> repoStats, Return response: {} (logUtils:52)
>>> 2017-02-03 15:30:01,448 ERROR (periodic/1) [root] failed to retrieve
>>> Hosted Engine HA info (api:252)
>>>
>>>
>>>
>>> Am 03.02.2017 um 13:39 schrieb Simone Tiraboschi:
>>>
>>> I see an ERROR there on stopMonitoringDomain but I cannot see the
>>> corresponding startMonitoringDomain; could you please look for it?
>>>
>>> On Fri, Feb 3, 2017 at 1:16 PM, Ralf Schenk <rs at databay.de> wrote:
>>>
>>>> Hello,
>>>>
>>>> attached is my vdsm.log from the hosted-engine-ha host around the time
>>>> frame of the agent timeout. That host no longer works for the engine (it is
>>>> active and otherwise works fine in oVirt); it simply isn't working for
>>>> engine-ha anymore after the update.
>>>>
>>>> At 2017-02-02 19:25:34,248 you'll find the error corresponding to the
>>>> agent timeout.
>>>>
>>>> Bye
>>>>
>>>>
>>>>
>>>> Am 03.02.2017 um 11:28 schrieb Simone Tiraboschi:
>>>>
>>>>>> 3. Three of my hosts have the hosted engine deployed for HA. At first all
>>>>>> three were marked with a crown (the running one gold, the others silver).
>>>>>> After the upgrade, hosted-engine HA is no longer active on the three
>>>>>> deployed hosts.
>>>>>>
>>>>>> I can't get this host back with a working ovirt-ha-agent/broker. I
>>>>>> already rebooted and manually restarted the services, but it isn't able to
>>>>>> get the cluster state according to "hosted-engine --vm-status". The other
>>>>>> hosts report this host's status as "unknown stale-data".
>>>>>>
>>>>>> I already shut down all agents on all hosts and issued a
>>>>>> "hosted-engine --reinitialize-lockspace", but that didn't help.
>>>>>>
>>>>>> The agent stops working after a timeout error, according to the log:
>>>>>>
>>>>>> MainThread::INFO::2017-02-02 19:24:52,040::hosted_engine::841::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) VDSM domain monitor status: PENDING
>>>>>> MainThread::INFO::2017-02-02 19:24:59,185::hosted_engine::841::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) VDSM domain monitor status: PENDING
>>>>>> MainThread::INFO::2017-02-02 19:25:06,333::hosted_engine::841::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) VDSM domain monitor status: PENDING
>>>>>> MainThread::INFO::2017-02-02 19:25:13,554::hosted_engine::841::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) VDSM domain monitor status: PENDING
>>>>>> MainThread::INFO::2017-02-02 19:25:20,710::hosted_engine::841::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) VDSM domain monitor status: PENDING
>>>>>> MainThread::INFO::2017-02-02 19:25:27,865::hosted_engine::841::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) VDSM domain monitor status: PENDING
>>>>>> MainThread::ERROR::2017-02-02 19:25:27,866::hosted_engine::815::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_domain_monitor) Failed to start monitoring domain (sd_uuid=7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96, host_id=3): timeout during domain acquisition
>>>>>> MainThread::WARNING::2017-02-02 19:25:27,866::hosted_engine::469::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Error while monitoring engine: Failed to start monitoring domain (sd_uuid=7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96, host_id=3): timeout during domain acquisition
>>>>>> MainThread::WARNING::2017-02-02 19:25:27,866::hosted_engine::472::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Unexpected error
>>>>>> Traceback (most recent call last):
>>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 443, in start_monitoring
>>>>>>     self._initialize_domain_monitor()
>>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 816, in _initialize_domain_monitor
>>>>>>     raise Exception(msg)
>>>>>> Exception: Failed to start monitoring domain (sd_uuid=7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96, host_id=3): timeout during domain acquisition
>>>>>> MainThread::ERROR::2017-02-02 19:25:27,866::hosted_engine::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Shutting down the agent because of 3 failures in a row!
>>>>>> MainThread::INFO::2017-02-02 19:25:32,087::hosted_engine::841::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) VDSM domain monitor status: PENDING
>>>>>> MainThread::INFO::2017-02-02 19:25:34,250::hosted_engine::769::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_domain_monitor) Failed to stop monitoring domain (sd_uuid=7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96): Storage domain is member of pool: u'domain=7c8deaa8-be02-4aaf-b9b4-ddc8da99ad96'
>>>>>> MainThread::INFO::2017-02-02 19:25:34,254::agent::143::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
>>>>>>
>>>>> Simone, Martin, can you please follow up on this?
>>>>>
>>>>
>>>> Ralf, could you please attach the vdsm logs from one of your hosts for the
>>>> relevant time frame?
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>> --
>>
>>
>> *Ralf Schenk*
>> fon +49 (0) 24 05 / 40 83 70 <+49%202405%20408370>
>> fax +49 (0) 24 05 / 40 83 759 <+49%202405%204083759>
>> mail *rs at databay.de* <rs at databay.de>
>>
>> *Databay AG*
>> Jens-Otto-Krag-Straße 11
>> D-52146 Würselen
>> *www.databay.de* <http://www.databay.de>
>>
>> Sitz/Amtsgericht Aachen • HRB:8437 • USt-IdNr.: DE 210844202
>> Vorstand: Ralf Schenk, Dipl.-Ing. Jens Conze, Aresch Yavari, Dipl.-Kfm.
>> Philipp Hermanns
>> Aufsichtsratsvorsitzender: Wilhelm Dohmen
>> ------------------------------
>>
>
>