I still can't resolve this issue.
I have a host that is stuck in a cycle; it will be marked non responsive, then come back
up, ending with an "finished activation" message in the GUI. Then it repeats.
The root cause seems to be sanlock. I'm just unclear on why it started or how to
resolve it. The only "approved" knob I'm aware of is
--reinitialize-lockspace and the manual equivalent, neither of which fix anything.
Anyone have a guess?
-j
- - - vdsm.log - - - -
2018-03-28 10:38:22,207-0700 INFO (monitor/b41eb20) [storage.SANLock] Acquiring host id
for domain b41eb20a-eafb-481b-9a50-a135cf42b15e (id=1, async=True) (clusterlock:284)
2018-03-28 10:38:22,208-0700 ERROR (monitor/b41eb20) [storage.Monitor] Error acquiring
host id 1 for domain b41eb20a-eafb-481b-9a50-a135cf42b15e (monitor:568)
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 565, in
_acquireHostId
self.domain.acquireHostId(self.hostId, async=True)
File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 828, in
acquireHostId
self._manifest.acquireHostId(hostId, async)
File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 453, in
acquireHostId
self._domainLock.acquireHostId(hostId, async)
File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line 315,
in acquireHostId
raise se.AcquireHostIdFailure(self._sdUUID, e)
AcquireHostIdFailure: Cannot acquire host id:
(u'b41eb20a-eafb-481b-9a50-a135cf42b15e', SanlockException(22, 'Sanlock
lockspace add failure', 'Invalid argument'))
2018-03-28 10:38:23,078-0700 INFO (jsonrpc/5) [jsonrpc.JsonRpcServer] RPC call Host.ping2
succeeded in 0.00 seconds (__init__:573)
2018-03-28 10:38:23,085-0700 INFO (jsonrpc/6) [vdsm.api] START
repoStats(domains=[u'b41eb20a-eafb-481b-9a50-a135cf42b15e']) from=::1,54450,
task_id=186d7e8b-7b4e-485d-a9e0-c0cb46eed621 (api:46)
2018-03-28 10:38:23,085-0700 INFO (jsonrpc/6) [vdsm.api] FINISH repoStats
return={u'b41eb20a-eafb-481b-9a50-a135cf42b15e': {'code': 0,
'actual': True, 'version': 4, 'acquired': False, 'delay':
'0.000812547', 'lastCheck': '0.4', 'valid': True}}
from=::1,54450, task_id=186d7e8b-7b4e-485d-a9e0-c0cb46eed621 (api:52)
2018-03-28 10:38:23,086-0700 INFO (jsonrpc/6) [jsonrpc.JsonRpcServer] RPC call
Host.getStorageRepoStats succeeded in 0.00 seconds (__init__:573)
2018-03-28 10:38:23,092-0700 WARN (vdsm.Scheduler) [Executor] Worker blocked: <Worker
name=periodic/6 running <Task <Operation action=<vdsm.virt.sampling.HostMonitor
object at 0x1d44110> at 0x1d44150> timeout=15, duration=150 at 0x7f076c05fb90>
task#=83985 at 0x7f082c08e510>, traceback:
File: "/usr/lib64/python2.7/threading.py", line 785, in __bootstrap
self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner
self.run()
File: "/usr/lib64/python2.7/threading.py", line 765, in run
self.__target(*self.__args, **self.__kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", line 194, in
run
ret = func(*args, **kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 301, in _run
self._execute_task()
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in
_execute_task
task()
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in
__call__
self._callable()
File: "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 213, in
__call__
self._func()
File: "/usr/lib/python2.7/site-packages/vdsm/virt/sampling.py", line 578, in
__call__
stats = hostapi.get_stats(self._cif, self._samples.stats())
File: "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 77, in
get_stats
ret['haStats'] = _getHaInfo()
File: "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 182, in
_getHaInfo
stats = instance.get_all_stats()
File:
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line
93, in get_all_stats
stats = broker.get_stats_from_storage()
File:
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
line 135, in get_stats_from_storage
result = self._proxy.get_stats()
File: "/usr/lib64/python2.7/xmlrpclib.py", line 1233, in __call__
return self.__send(self.__name, args)
File: "/usr/lib64/python2.7/xmlrpclib.py", line 1587, in __request
verbose=self.__verbose
File: "/usr/lib64/python2.7/xmlrpclib.py", line 1273, in request
return self.single_request(host, handler, request_body, verbose)
File: "/usr/lib64/python2.7/xmlrpclib.py", line 1303, in single_request
response = h.getresponse(buffering=True)
File: "/usr/lib64/python2.7/httplib.py", line 1089, in getresponse
response.begin()
File: "/usr/lib64/python2.7/httplib.py", line 444, in begin
version, status, reason = self._read_status()
File: "/usr/lib64/python2.7/httplib.py", line 400, in _read_status
line = self.fp.readline(_MAXLINE + 1)
File: "/usr/lib64/python2.7/socket.py", line 476, in readline
data = self._sock.recv(self._rbufsize) (executor:363)
2018-03-28 10:38:23,274-0700 INFO (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC call Host.ping2
succeeded in 0.00 seconds (__init__:573)
2018-03-28 10:38:24,297-0700 INFO (jsonrpc/0) [jsonrpc.JsonRpcServer] RPC call Host.ping2
succeeded in 0.00 seconds (__init__:573)
2018-03-28 10:38:24,306-0700 INFO (jsonrpc/2) [vdsm.api] START
repoStats(domains=[u'b41eb20a-eafb-481b-9a50-a135cf42b15e']) from=::1,54450,
task_id=6a60e316-e4d7-415d-970a-a998710a5899 (api:46)
2018-03-28 10:38:24,306-0700 INFO (jsonrpc/2) [vdsm.api] FINISH repoStats
return={u'b41eb20a-eafb-481b-9a50-a135cf42b15e': {'code': 0,
'actual': True, 'version': 4, 'acquired': False, 'delay':
'0.000812547', 'lastCheck': '1.6', 'valid': True}}
from=::1,54450, task_id=6a60e316-e4d7-415d-970a-a998710a5899 (api:52)
2018-03-28 10:38:24,307-0700 INFO (jsonrpc/2) [jsonrpc.JsonRpcServer] RPC call
Host.getStorageRepoStats succeeded in 0.00 seconds (__init__:573)
2018-03-28 10:38:24,374-0700 INFO (jsonrpc/7) [api.host] START getAllVmStats()
from=::ffff:10.181.26.150,46064 (api:46)
2018-03-28 10:38:24,377-0700 INFO (jsonrpc/7) [api.host] FINISH getAllVmStats
return={'status': {'message': 'Done', 'code': 0},
'statsList': (suppressed)} from=::ffff:10.181.26.150,46064 (api:52)
2018-03-28 10:38:24,379-0700 INFO (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC call
Host.getAllVmStats succeeded in 0.00 seconds (__init__:573)
2018-03-28 10:38:24,529-0700 INFO (jsonrpc/5) [api.host] START getAllVmStats()
from=::1,54454 (api:46)
2018-03-28 10:38:24,532-0700 INFO (jsonrpc/5) [api.host] FINISH getAllVmStats
return={'status': {'message': 'Done', 'code': 0},
'statsList': (suppressed)} from=::1,54454 (api:52)
2018-03-28 10:38:24,533-0700 INFO (jsonrpc/5) [jsonrpc.JsonRpcServer] RPC call
Host.getAllVmStats succeeded in 0.01 seconds (__init__:573)
2018-03-28 10:38:24,545-0700 INFO (jsonrpc/6) [api.host] START getAllVmIoTunePolicies()
from=::1,54454 (api:46)
2018-03-28 10:38:24,546-0700 INFO (jsonrpc/6) [api.host] FINISH getAllVmIoTunePolicies
return={'status': {'message': 'Done', 'code': 0},
'io_tune_policies_dict': {'588a1394-4f28-4fb8-bcad-5b08d78ecd00':
{'policy': [], 'current_values': [{'ioTune':
{'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L,
'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec':
0L}, 'path':
u'/var/run/vdsm/storage/b41eb20a-eafb-481b-9a50-a135cf42b15e/a9d01d59-f146-47e5-b514-d10f8867678e/8f0c9f7a-ae6a-476e-b6f3-a830dcb79e87',
'name': 'vda'}]}}} from=::1,54454 (api:52)
2018-03-28 10:38:24,547-0700 INFO (jsonrpc/6) [jsonrpc.JsonRpcServer] RPC call
Host.getAllVmIoTunePolicies succeeded in 0.00 seconds (__init__:573)
2018-03-28 10:38:29,319-0700 INFO (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC call Host.ping2
succeeded in 0.00 seconds (__init__:573)
2018-03-28 10:38:29,327-0700 INFO (jsonrpc/0) [vdsm.api] START
repoStats(domains=[u'b41eb20a-eafb-481b-9a50-a135cf42b15e']) from=::1,54450,
task_id=c27c5e13-3b31-4182-9c14-11463c9b590a (api:46)
2018-03-28 10:38:29,327-0700 INFO (jsonrpc/0) [vdsm.api] FINISH repoStats
return={u'b41eb20a-eafb-481b-9a50-a135cf42b15e': {'code': 0,
'actual': True, 'version': 4, 'acquired': False, 'delay':
'0.000812547', 'lastCheck': '6.6', 'valid': True}}
from=::1,54450, task_id=c27c5e13-3b31-4182-9c14-11463c9b590a (api:52)
2018-03-28 10:38:29,328-0700 INFO (jsonrpc/0) [jsonrpc.JsonRpcServer] RPC call
Host.getStorageRepoStats succeeded in 0.00 seconds (__init__:573)
2018-03-28 10:38:30,471-0700 INFO (jsonrpc/2) [jsonrpc.JsonRpcServer] RPC call Host.ping2
succeeded in 0.00 seconds (__init__:573)
2018-03-28 10:38:30,475-0700 INFO (jsonrpc/7) [api.host] START getCapabilities()
from=::1,54450 (api:46)