On Thursday, 3 September 2020 22:49:17 CEST Gillingham, Eric J (US 393D) via Users
wrote:
I recently removed a host from my cluster to upgrade it to 4.4. After I
removed the host from the datacenter, VMs started to pause on the second
system they had all migrated to. Investigating via the engine showed the
storage domain as "unknown"; when I try to activate it via the engine, it
cycles to locked and then back to unknown.
/var/log/sanlock.log contains a repeating:
add_lockspace e1270474-108c-4cae-83d6-51698cffebbf:1:/dev/e1270474-108c-4cae-83d6-51698cffebbf/ids:0 conflicts with name of list1 s1 e1270474-108c-4cae-83d6-51698cffebbf:3:/dev/e1270474-108c-4cae-83d6-51698cffebbf/ids:0
How did you remove the first host? Did you put it into maintenance first? I
wonder how this situation (two lockspaces with conflicting names) can occur.
You can try to re-initialize the lockspace directly using the sanlock command
(see man sanlock), but it would be good to understand the situation first.
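To make the conflict in the sanlock message concrete: sanlock writes a lockspace as name:host_id:path:offset, so the two strings in the log can be compared field by field. The following is only a minimal sketch and not part of sanlock or vdsm; the names Lockspace and parse_lockspace are made up for illustration, and the strings are copied from the log line with the wrapped path joined.

```python
from collections import namedtuple

# Hypothetical helper type for illustration; sanlock documents a lockspace
# as "name:host_id:path:offset".
Lockspace = namedtuple("Lockspace", "name host_id path offset")


def parse_lockspace(text):
    """Split a sanlock lockspace string into its four fields."""
    name, host_id, rest = text.split(":", 2)
    # Split the offset off from the right so a ':' inside the path
    # would not break parsing.
    path, offset = rest.rsplit(":", 1)
    return Lockspace(name, int(host_id), path, int(offset))


# The lockspace the host tried to add (first string in the log message).
requested = parse_lockspace(
    "e1270474-108c-4cae-83d6-51698cffebbf:1:"
    "/dev/e1270474-108c-4cae-83d6-51698cffebbf/ids:0"
)
# The lockspace reported as already present (second string in the message).
existing = parse_lockspace(
    "e1270474-108c-4cae-83d6-51698cffebbf:3:"
    "/dev/e1270474-108c-4cae-83d6-51698cffebbf/ids:0"
)

same_name = requested.name == existing.name
same_host_id = requested.host_id == existing.host_id
print("same lockspace name:", same_name)
print("same host id:", same_host_id)
print("requested host id:", requested.host_id,
      "existing host id:", existing.host_id)
```

Read this way, the name, path and offset are identical and only the host id differs (1 requested, 3 already present). That the host still holds the lockspace under id 3 is my reading of the message, not something confirmed.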
vdsm.log contains these (maybe related) snippets:
---
2020-09-03 20:19:53,483+0000 INFO (jsonrpc/6) [vdsm.api] FINISH getAllTasksStatuses error=Secured object is not in safe state from=::ffff:137.79.52.43,36326, flow_id=18031a91, task_id=8e92f059-743a-48c8-aa9d-e7c4c836337b (api:52)
2020-09-03 20:19:53,483+0000 ERROR (jsonrpc/6) [storage.TaskManager.Task] (Task='8e92f059-743a-48c8-aa9d-e7c4c836337b') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "<string>", line 2, in getAllTasksStatuses
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 50, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 2201, in getAllTasksStatuses
    allTasksStatus = self._pool.getAllTasksStatuses()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 77, in wrapper
    raise SecureError("Secured object is not in safe state")
SecureError: Secured object is not in safe state
2020-09-03 20:19:53,483+0000 INFO (jsonrpc/6) [storage.TaskManager.Task] (Task='8e92f059-743a-48c8-aa9d-e7c4c836337b') aborting: Task is aborted: u'Secured object is not in safe state' - code 100 (task:1181)
2020-09-03 20:19:53,483+0000 ERROR (jsonrpc/6) [storage.Dispatcher] FINISH getAllTasksStatuses error=Secured object is not in safe state (dispatcher:87)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/dispatcher.py", line 74, in wrapper
    result = ctask.prepare(func, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 108, in wrapper
    return m(self, *a, **kw)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 1189, in prepare
    raise self.error
SecureError: Secured object is not in safe state
---
2020-09-03 20:44:23,252+0000 INFO (tasks/2) [storage.ThreadPool.WorkerThread] START task 76415a77-9d29-4b72-ade1-53207cfc503b (cmd=<bound method Task.commit of <vdsm.storage.task.Task instance at 0x7fe99c27dea8>>, args=None) (threadPool:208)
2020-09-03 20:44:23,266+0000 INFO (tasks/2) [storage.SANLock] Acquiring host id for domain e1270474-108c-4cae-83d6-51698cffebbf (id=1, wait=True) (clusterlock:313)
2020-09-03 20:44:23,267+0000 ERROR (tasks/2) [storage.TaskManager.Task] (Task='76415a77-9d29-4b72-ade1-53207cfc503b') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 336, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 317, in startSpm
    self.masterDomain.acquireHostId(self.id)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 957, in acquireHostId
    self._manifest.acquireHostId(hostId, wait)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 501, in acquireHostId
    self._domainLock.acquireHostId(hostId, wait)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line 344, in acquireHostId
    raise se.AcquireHostIdFailure(self._sdUUID, e)
AcquireHostIdFailure: Cannot acquire host id: ('e1270474-108c-4cae-83d6-51698cffebbf', SanlockException(22, 'Sanlock lockspace add failure', 'Invalid argument'))
---
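The 22 in the SanlockException above is an errno value. A small sketch for decoding it with the Python standard library follows; the variable names are mine, and the message text returned by os.strerror depends on the platform.

```python
import errno
import os

# Error number copied from the SanlockException in the traceback above.
sanlock_errno = 22

symbol = errno.errorcode[sanlock_errno]  # symbolic errno name
message = os.strerror(sanlock_errno)     # platform-specific message text
print(symbol, "-", message)
```

The symbolic name is EINVAL, which matches the 'Invalid argument' text in the log. That would be consistent with sanlock rejecting the add_lockspace request itself rather than failing on storage I/O, though the log alone does not prove it.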
Another symptom: in the hosts view of the engine, SPM bounces between
"Normal" and "Contending". When it's "Normal" and I select Management ->
Select as SPM, I get "Error while executing action: Cannot force select SPM.
Unknown Data Center status."
I've tried rebooting the one remaining host in the cluster, to no avail;
hosted-engine --reinitialize-lockspace also does not seem to solve the issue.
I'm kind of stumped as to what else to try and would appreciate any guidance
on how to resolve this.
Thank you