
Hi all,

We are developing support for Ceph as a new storage domain type. Creating and using the domain works well, but we cannot add a new host to the engine when a ceph-StorageDomain already exists. The engine reports "VDSM node1 command ConnectStoragePoolVDS failed: Cannot find master domain" and sets the node to "Non Operational" status.

The vdsm log reports an error like this:

INFO (jsonrpc/0) [vdsm.api] FINISH connectStoragePool error=Cannot find master domain: u'spUUID=5bc95ba9-01f7-0307-0342-000000000033, msdUUID=a3d6903d-07d5-4794-9667-9dddc3c84fe6' from=::ffff:192.168.122.22,40166, flow_id=43b39e00, task_id=4128f9d4-3c92-46ed-a230-2505d3a8ddb9 (api:50)
2018-10-24 11:06:22,227+0800 ERROR (jsonrpc/0) [storage.TaskManager.Task] (Task='4128f9d4-3c92-46ed-a230-2505d3a8ddb9') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "<string>", line 2, in connectStoragePool
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 1032, in connectStoragePool
    spUUID, hostID, msdUUID, masterVersion, domainsMap)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 1094, in _connectStoragePool
    res = pool.connect(hostID, msdUUID, masterVersion)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 704, in connect
    self.__rebuild(msdUUID=msdUUID, masterVersion=masterVersion)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 1275, in __rebuild
    self.setMasterDomain(msdUUID, masterVersion)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 1488, in setMasterDomain
    raise se.StoragePoolMasterNotFound(self.spUUID, msdUUID)
StoragePoolMasterNotFound: Cannot find master domain: u'spUUID=5bc95ba9-01f7-0307-0342-000000000033, msdUUID=a3d6903d-07d5-4794-9667-9dddc3c84fe6'

After tracing back, I found that there is no local path under /rhev/data-center/mnt for the storage domain on the new host, so the storage domain cannot be found when ConnectStorageServer runs. This only happens when adding a new host to the engine while a ceph-StorageDomain already exists in oVirt.

So how does a new host synchronize the existing storage domains? And what can I do to fix this?

Thanks in advance!

Yours Sincerely,
Tianyuan
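
For anyone reproducing this, the missing-path check described above can be sketched roughly as follows. This is only a minimal sketch, not vdsm code: find_domain_dirs and max_depth are hypothetical names, and it assumes that a connected domain shows up as a directory named after its UUID a few levels below /rhev/data-center/mnt, as file-based domains do. A custom Ceph domain type may use a different layout.

```python
import os

MNT_ROOT = "/rhev/data-center/mnt"  # mount root quoted in the post
MSD_UUID = "a3d6903d-07d5-4794-9667-9dddc3c84fe6"  # msdUUID from the log


def find_domain_dirs(mnt_root, sd_uuid, max_depth=3):
    """Return directories named sd_uuid at most max_depth levels below mnt_root.

    Hypothetical helper for illustration; it is not part of vdsm.
    """
    matches = []
    root_depth = mnt_root.rstrip(os.sep).count(os.sep)
    for dirpath, dirnames, _files in os.walk(mnt_root):
        depth = dirpath.rstrip(os.sep).count(os.sep) - root_depth
        if sd_uuid in dirnames:
            matches.append(os.path.join(dirpath, sd_uuid))
            dirnames.remove(sd_uuid)  # no need to walk inside the domain itself
        if depth + 1 >= max_depth:
            del dirnames[:]  # stop descending below max_depth
    return matches


if __name__ == "__main__":
    found = find_domain_dirs(MNT_ROOT, MSD_UUID)
    if found:
        print("domain directory present: %s" % ", ".join(found))
    else:
        print("no directory for %s under %s" % (MSD_UUID, MNT_ROOT))
```

Running it on a working host and on the newly added host should show whether the domain directory is missing only on the new one.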