Hi all,
I'm testing the new oVirt version 4.2.6, with ovirt-engine installed as a service
on a physical host.
I restored an engine-backup from a previous oVirt deployment, where the
ovirt-engine was self-hosted inside a VM, and everything seemed to work properly: the
cluster, hosts, and network configuration were all correctly redeployed.
The problem appeared when, in the new cluster, I put all the nodes into Maintenance to
upgrade them. When they came back to the Up state, the data center was in Non Responsive
status because the Master Storage Domain is Inactive, and if I try to reactivate it, I get
this error message:
Failed Activating Storage Domain gpfs_kvm on Data Center BSC-CNS
No host in the cluster is currently acting as SPM.
Given this situation, I have some questions:
Is it possible to force one host to become the SPM?
Was it a mistake to put all the nodes into maintenance for the upgrade? Is it better to
keep one host active as SPM to protect the cluster and the storage?
Could the config restored from a different architecture (hosted-engine vs. engine installed
on a host) affect the storage? (Previously the Master Storage Domain was on NFS, but I
deleted it and moved the Master role to a POSIX domain that I had added.)
In the vdsm log on one compute node, I can see these events:
2018-09-19 11:40:11,878+0200 INFO (jsonrpc/5) [vdsm.api] FINISH connectStoragePool
error=Cannot find master domain: u'spUUID=32168096-b763-11e8-a7aa-000af7b8b6ba,
msdUUID=cb99c414-0d93-41b9-9396-d8a607652b49' from=::ffff:10.2.1.101,58538,
task_id=eedeba4d-99bc-4e18-8f24-f501b73e0da6 (api:50)
2018-09-19 11:40:11,879+0200 ERROR (jsonrpc/5) [storage.TaskManager.Task]
(Task='eedeba4d-99bc-4e18-8f24-f501b73e0da6') Unexpected error (task:875)
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in
_run
return fn(*args, **kargs)
File "<string>", line 2, in connectStoragePool
File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in
method
ret = func(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 1035, in
connectStoragePool
spUUID, hostID, msdUUID, masterVersion, domainsMap)
File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 1097, in
_connectStoragePool
res = pool.connect(hostID, msdUUID, masterVersion)
File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 700, in
connect
self.__rebuild(msdUUID=msdUUID, masterVersion=masterVersion)
File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 1274, in
__rebuild
self.setMasterDomain(msdUUID, masterVersion)
File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 1495, in
setMasterDomain
raise se.StoragePoolMasterNotFound(self.spUUID, msdUUID)
StoragePoolMasterNotFound: Cannot find master domain:
u'spUUID=32168096-b763-11e8-a7aa-000af7b8b6ba,
msdUUID=cb99c414-0d93-41b9-9396-d8a607652b49'
2018-09-19 11:40:11,879+0200 INFO (jsonrpc/5) [storage.TaskManager.Task]
(Task='eedeba4d-99bc-4e18-8f24-f501b73e0da6') aborting: Task is aborted:
"Cannot find master domain: u'spUUID=32168096-b763-11e8-a7aa-000af7b8b6ba,
msdUUID=cb99c414-0d93-41b9-9396-d8a607652b49'" - code 304 (task:1181)
2018-09-19 11:40:11,879+0200 ERROR (jsonrpc/5) [storage.Dispatcher] FINISH
connectStoragePool error=Cannot find master domain:
u'spUUID=32168096-b763-11e8-a7aa-000af7b8b6ba,
msdUUID=cb99c414-0d93-41b9-9396-d8a607652b49' (dispatcher:82)
2018-09-19 11:40:11,880+0200 INFO (jsonrpc/5) [jsonrpc.JsonRpcServer] RPC call
StoragePool.connect failed (error 304) in 0.31 seconds (__init__:573)
2018-09-19 11:40:11,907+0200 INFO (jsonrpc/1) [vdsm.api] START
getSpmStatus(spUUID=u'32168096-b763-11e8-a7aa-000af7b8b6ba', options=None)
from=::ffff:10.2.1.101,58538, task_id=4133fa8c-88f8-44c5-a8b8-18ac6c67771e (api:46)
2018-09-19 11:40:11,907+0200 INFO (jsonrpc/1) [vdsm.api] FINISH getSpmStatus
error=Unknown pool id, pool not connected:
(u'32168096-b763-11e8-a7aa-000af7b8b6ba',) from=::ffff:10.2.1.101,58538,
task_id=4133fa8c-88f8-44c5-a8b8-18ac6c67771e (api:50)
2018-09-19 11:40:11,908+0200 ERROR (jsonrpc/1) [storage.TaskManager.Task]
(Task='4133fa8c-88f8-44c5-a8b8-18ac6c67771e') Unexpected error (task:875)
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in
_run
return fn(*args, **kargs)
File "<string>", line 2, in getSpmStatus
File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in
method
ret = func(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 634, in
getSpmStatus
pool = self.getPool(spUUID)
File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 350, in
getPool
raise se.StoragePoolUnknown(spUUID)
StoragePoolUnknown: Unknown pool id, pool not connected:
(u'32168096-b763-11e8-a7aa-000af7b8b6ba',)
2018-09-19 11:40:11,908+0200 INFO (jsonrpc/1) [storage.TaskManager.Task]
(Task='4133fa8c-88f8-44c5-a8b8-18ac6c67771e') aborting: Task is aborted:
"Unknown pool id, pool not connected:
(u'32168096-b763-11e8-a7aa-000af7b8b6ba',)" - code 309 (task:1181)
2018-09-19 11:40:11,908+0200 ERROR (jsonrpc/1) [storage.Dispatcher] FINISH getSpmStatus
error=Unknown pool id, pool not connected:
(u'32168096-b763-11e8-a7aa-000af7b8b6ba',) (dispatcher:82)
2018-09-19 11:40:11,908+0200 INFO (jsonrpc/1) [jsonrpc.JsonRpcServer] RPC call
StoragePool.getSpmStatus failed (error 309) in 0.01 seconds (__init__:573)
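In case it helps anyone matching these errors against their own setup, here is a minimal sketch that pulls the pool and master domain UUIDs out of log text like the excerpt above. The helper name find_master_domain_errors is my own, not part of vdsm, and the sample text is just a few lines copied from this log (still line-wrapped as in the mail).

```python
import re

# A few lines copied from the log excerpt above, wrapped as in the mail.
SAMPLE = """\
2018-09-19 11:40:11,879+0200 ERROR (jsonrpc/5) [storage.Dispatcher] FINISH
connectStoragePool error=Cannot find master domain:
u'spUUID=32168096-b763-11e8-a7aa-000af7b8b6ba,
msdUUID=cb99c414-0d93-41b9-9396-d8a607652b49' (dispatcher:82)
"""

UUID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
# \s* lets the match span the line break between spUUID and msdUUID.
PATTERN = re.compile(r"spUUID=(%s),\s*msdUUID=(%s)" % (UUID, UUID))


def find_master_domain_errors(text):
    """Return the set of (spUUID, msdUUID) pairs mentioned in the text.

    Hypothetical helper for illustration only; it is not a vdsm API.
    """
    return set(PATTERN.findall(text))


if __name__ == "__main__":
    for sp_uuid, msd_uuid in sorted(find_master_domain_errors(SAMPLE)):
        print("pool=%s master_domain=%s" % (sp_uuid, msd_uuid))
```

The msdUUID found this way is the one the host is being asked to use as master, so it can be compared with the UUID of the domain that is actually reachable from the hosts.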
I compared my situation with cases that other people have reported before, but none of them
exactly matches the errors in my logs.
Do you have any suggestions? The infrastructure is still in testing, and I can
redeploy it if necessary.
Thanks in advance.