[Users] SPM problems after upgrade to 3.1

I upgraded my oVirt setup to 3.1 and it went OK. I followed http://wiki.ovirt.org/wiki/OVirt_3.0_to_3.1_upgrade and have it running on CentOS 6.2. (Note: I needed to copy a few more files from /etc/pki/ovirt-engine-old, notably generatesshkeys; that may have been due to my previous version.) Things went wrong when I tried to upgrade one of the nodes to the latest vdsm as well. It happened to be the SPM at the time, and when I put it into maintenance, the SPM started bouncing between the other two nodes that were still up. The logs are full of these error messages, but this seems to be the important bit:

AcquireHostIdFailure: Cannot acquire host id: ('e6ba97ae-7ccc-42ed-8739-f05b7a90d82c', SanlockException(90, 'Sanlock lockspace add failure', 'Message too long'))

I've since finished updating the vdsm node and it's up and running, although it has the same issue. Additionally, it drops out of active with the message that it can't access one of the storage domains or the data center object. I've confirmed that all nodes can access all the data centers. In this case, I suspect it means the DC object, but I can't find any specific error messages to indicate that.

Any thoughts on repairing the issue? Let me know if you want more specific data. This vdsm.log excerpt is repeated on all 3 nodes. I have active VMs on the two old nodes, so I'm hesitant to shut everything down and see if that helps, but if I've got to...
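A small aside on the error quoted above: the 90 in SanlockException(90, ...) looks like an OS errno value, since the text that comes with it is the standard strerror string for EMSGSIZE on Linux. The sketch below assumes only the message shape pasted in this post and pulls out the domain UUID and the errno so they can be compared across nodes. The names `parse_acquire_failure` and `PATTERN` are made up for illustration; they are not part of vdsm or sanlock.

```python
import errno
import os
import re

# Error text copied from the vdsm.log excerpt in this message.
MESSAGE = (
    "AcquireHostIdFailure: Cannot acquire host id: "
    "('e6ba97ae-7ccc-42ed-8739-f05b7a90d82c', "
    "SanlockException(90, 'Sanlock lockspace add failure', 'Message too long'))"
)

# Hypothetical pattern written for this one message shape; not a vdsm API.
PATTERN = re.compile(
    r"\('(?P<domain>[0-9a-f-]{36})', "
    r"SanlockException\((?P<code>\d+), '(?P<what>[^']*)', '(?P<detail>[^']*)'\)\)"
)


def parse_acquire_failure(message):
    """Return (domain UUID, errno value, sanlock text, OS text) or None."""
    match = PATTERN.search(message)
    if match is None:
        return None
    return (match["domain"], int(match["code"]), match["what"], match["detail"])


domain, code, what, detail = parse_acquire_failure(MESSAGE)
# errno.errorcode maps numbers to symbolic names for the platform running
# this script, so the lookup is only meaningful on the same OS as the host.
print(domain, code, errno.errorcode.get(code, "unknown"), os.strerror(code))
```

Running the same parser over each node's vdsm.log would show whether every failure names the same domain and the same errno, which is the claim made above about all 3 nodes.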
Thread-82::INFO::2012-08-20 08:23:57,617::safelease::160::SANLock::(acquireHostId) Acquiring host id for domain 5e47082f-a404-41ed-9109-a722270b86c3 (id: 1)
Thread-82::DEBUG::2012-08-20 08:23:57,621::safelease::178::SANLock::(acquireHostId) Host id for domain 5e47082f-a404-41ed-9109-a722270b86c3 successfully acquired (id: 1)
77895d39-873d-4e7a-8560-3edaf3103656::ERROR::2012-08-20 08:23:58,099::task::833::TaskManager.Task::(_setError) Task=`77895d39-873d-4e7a-8560-3edaf3103656`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 840, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 307, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/share/vdsm/storage/sp.py", line 250, in startSpm
    self.masterDomain.acquireHostId(self.id)
  File "/usr/share/vdsm/storage/sd.py", line 427, in acquireHostId
    self._clusterLock.acquireHostId(hostId, async)
  File "/usr/share/vdsm/storage/safelease.py", line 175, in acquireHostId
    raise se.AcquireHostIdFailure(self._sdUUID, e)
AcquireHostIdFailure: Cannot acquire host id: ('e6ba97ae-7ccc-42ed-8739-f05b7a90d82c', SanlockException(90, 'Sanlock lockspace add failure', 'Message too long'))
77895d39-873d-4e7a-8560-3edaf3103656::DEBUG::2012-08-20 08:23:58,100::task::852::TaskManager.Task::(_run) Task=`77895d39-873d-4e7a-8560-3edaf3103656`::Task._run: 77895d39-873d-4e7a-8560-3edaf3103656 () {} failed - stopping task
77895d39-873d-4e7a-8560-3edaf3103656::DEBUG::2012-08-20 08:23:58,100::task::1177::TaskManager.Task::(stop) Task=`77895d39-873d-4e7a-8560-3edaf3103656`::stopping in state running (force False)
77895d39-873d-4e7a-8560-3edaf3103656::DEBUG::2012-08-20 08:23:58,100::task::957::TaskManager.Task::(_decref) Task=`77895d39-873d-4e7a-8560-3edaf3103656`::ref 1 aborting True
77895d39-873d-4e7a-8560-3edaf3103656::DEBUG::2012-08-20 08:23:58,100::task::883::TaskManager.Task::(_runJobs) Task=`77895d39-873d-4e7a-8560-3edaf3103656`::aborting: Task is aborted: 'Cannot acquire host id' - code 661
77895d39-873d-4e7a-8560-3edaf3103656::DEBUG::2012-08-20 08:23:58,101::task::957::TaskManager.Task::(_decref) Task=`77895d39-873d-4e7a-8560-3edaf3103656`::ref 0 aborting True
77895d39-873d-4e7a-8560-3edaf3103656::DEBUG::2012-08-20 08:23:58,101::task::892::TaskManager.Task::(_doAbort) Task=`77895d39-873d-4e7a-8560-3edaf3103656`::Task._doAbort: force False
77895d39-873d-4e7a-8560-3edaf3103656::DEBUG::2012-08-20 08:23:58,101::resourceManager::844::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
77895d39-873d-4e7a-8560-3edaf3103656::DEBUG::2012-08-20 08:23:58,101::task::568::TaskManager.Task::(_updateState) Task=`77895d39-873d-4e7a-8560-3edaf3103656`::moving from state running -> state aborting
77895d39-873d-4e7a-8560-3edaf3103656::DEBUG::2012-08-20 08:23:58,102::task::523::TaskManager.Task::(__state_aborting) Task=`77895d39-873d-4e7a-8560-3edaf3103656`::_aborting: recover policy auto
77895d39-873d-4e7a-8560-3edaf3103656::DEBUG::2012-08-20 08:23:58,102::task::568::TaskManager.Task::(_updateState) Task=`77895d39-873d-4e7a-8560-3edaf3103656`::moving from state aborting -> state racquiring
77895d39-873d-4e7a-8560-3edaf3103656::DEBUG::2012-08-20 08:23:58,102::task::568::TaskManager.Task::(_updateState) Task=`77895d39-873d-4e7a-8560-3edaf3103656`::moving from state racquiring -> state recovering
77895d39-873d-4e7a-8560-3edaf3103656::DEBUG::2012-08-20 08:23:58,102::task::765::TaskManager.Task::(_recover) Task=`77895d39-873d-4e7a-8560-3edaf3103656`::_recover
77895d39-873d-4e7a-8560-3edaf3103656::DEBUG::2012-08-20 08:23:58,103::task::772::TaskManager.Task::(_recover) Task=`77895d39-873d-4e7a-8560-3edaf3103656`::running recovery None
77895d39-873d-4e7a-8560-3edaf3103656::DEBUG::2012-08-20 08:23:58,103::task::753::TaskManager.Task::(_recoverDone) Task=`77895d39-873d-4e7a-8560-3edaf3103656`::Recover Done: state recovering
77895d39-873d-4e7a-8560-3edaf3103656::DEBUG::2012-08-20 08:23:58,103::task::568::TaskManager.Task::(_updateState) Task=`77895d39-873d-4e7a-8560-3edaf3103656`::moving from state recovering -> state recovered
77895d39-873d-4e7a-8560-3edaf3103656::DEBUG::2012-08-20 08:23:58,103::resourceManager::809::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {'Storage.de911214-832b-11e1-ab21-00188bf945ff': < ResourceRef 'Storage.de911214-832b-11e1-ab21-00188bf945ff', isValid: 'True' obj: 'None'>}
77895d39-873d-4e7a-8560-3edaf3103656::DEBUG::2012-08-20 08:23:58,104::resourceManager::844::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
77895d39-873d-4e7a-8560-3edaf3103656::DEBUG::2012-08-20 08:23:58,104::resourceManager::538::ResourceManager::(releaseResource) Trying to release resource 'Storage.de911214-832b-11e1-ab21-00188bf945ff'
77895d39-873d-4e7a-8560-3edaf3103656::DEBUG::2012-08-20 08:23:58,104::resourceManager::553::ResourceManager::(releaseResource) Released resource 'Storage.de911214-832b-11e1-ab21-00188bf945ff' (0 active users)
77895d39-873d-4e7a-8560-3edaf3103656::DEBUG::2012-08-20 08:23:58,105::resourceManager::558::ResourceManager::(releaseResource) Resource 'Storage.de911214-832b-11e1-ab21-00188bf945ff' is free, finding out if anyone is waiting for it.
77895d39-873d-4e7a-8560-3edaf3103656::DEBUG::2012-08-20 08:23:58,105::resourceManager::565::ResourceManager::(releaseResource) No one is waiting for resource 'Storage.de911214-832b-11e1-ab21-00188bf945ff', Clearing records.
77895d39-873d-4e7a-8560-3edaf3103656::DEBUG::2012-08-20 08:23:58,105::threadPool::67::Misc.ThreadPool::(setRunningTask) Number of running tasks: 0

dom_md/metadata for the master storage pool:

CLASS=Data
DESCRIPTION=production
IOOPTIMEOUTSEC=10
LEASERETRIES=3
LEASETIMESEC=60
LOCKPOLICY=
LOCKRENEWALINTERVALSEC=5
MASTER_VERSION=65
POOL_DESCRIPTION=Default
POOL_DOMAINS=e6ba97ae-7ccc-42ed-8739-f05b7a90d82c:Active,228f3315-0057-4a2d-b493-7d93938188f3:Active,fb3b55ac-c01a-47b2-9391-77bff2a7ad16:Active,66263b64-66a9-4f0e-b904-71d815d0fa71:Active,afa535c6-34d9-4a04-8f0a-c74ad08f094c:Active,5e47082f-a404-41ed-9109-a722270b86c3:Active,7ae6036e-939c-41de-bd35-1a448d864987:Active
POOL_SPM_ID=1
POOL_SPM_LVER=48
POOL_UUID=de911214-832b-11e1-ab21-00188bf945ff
REMOTE_PATH=172.16.50.1:/volumes/vol1/production
ROLE=Master
SDUUID=e6ba97ae-7ccc-42ed-8739-f05b7a90d82c
TYPE=NFS
VERSION=3
_SHA_CKSUM=02634ef22893ae00421372ea6c692d0ae8d67a4f

It's trying to reconstruct the datacenter and failing, using different volumes from the pool too. Here's some samples from engine.log:

2012-08-20 10:06:09,850 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-78) [40e7204d] IrsBroker::Failed::GetStoragePoolInfoVDS
2012-08-20 10:06:09,851 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-78) [40e7204d] Exception: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: 'SD=e6ba97ae-7ccc-42ed-8739-f05b7a90d82c, pool=de911214-832b-11e1-ab21-00188bf945ff'
2012-08-20 10:06:09,952 INFO [org.ovirt.engine.core.bll.storage.ReconstructMasterDomainCommand] (QuartzScheduler_Worker-78) [4da5651c] Running command: ReconstructMasterDomainCommand internal: true.
Entities affected : ID: e6ba97ae-7ccc-42ed-8739-f05b7a90d82c Type: Storage
2012-08-20 10:06:10,007 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.MarkPoolInReconstructModeVDSCommand] (QuartzScheduler_Worker-78) [4da5651c] START, MarkPoolInReconstructModeVDSCommand(storagePoolId = de911214-832b-11e1-ab21-00188bf945ff, ignoreFailoverLimit = false, compatabilityVersion = null, reconstructMarkAction = ClearJobs), log id: 786fd419
2012-08-20 10:06:10,008 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-78) [4da5651c] clear domain error-timers for pool de911214-832b-11e1-ab21-00188bf945ff
2012-08-20 10:06:10,009 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.MarkPoolInReconstructModeVDSCommand] (QuartzScheduler_Worker-78) [4da5651c] FINISH, MarkPoolInReconstructModeVDSCommand, log id: 786fd419
2012-08-20 10:06:10,015 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DisconnectStoragePoolVDSCommand] (QuartzScheduler_Worker-78) [4da5651c] START, DisconnectStoragePoolVDSCommand(vdsId = 5f82de2c-84ba-11e1-8a7a-00188bf945ff, storagePoolId = de911214-832b-11e1-ab21-00188bf945ff, vds_spm_id = 2, masterDomainId = 00000000-0000-0000-0000-000000000000, masterVersion = 0), log id: 5733fdc9
2012-08-20 10:06:10,027 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DisconnectStoragePoolVDSCommand] (QuartzScheduler_Worker-78) [4da5651c] FINISH, DisconnectStoragePoolVDSCommand, log id: 5733fdc9
2012-08-20 10:06:10,033 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ReconstructMasterVDSCommand] (QuartzScheduler_Worker-78) [4da5651c] START, ReconstructMasterVDSCommand(vdsId = 5f82de2c-84ba-11e1-8a7a-00188bf945ff, vdsSpmId = 2, storagePoolId = de911214-832b-11e1-ab21-00188bf945ff, storagePoolName = Default, masterDomainId = 5e47082f-a404-41ed-9109-a722270b86c3, masterVersion = 178, domainsList = [{ domainId: 5e47082f-a404-41ed-9109-a722270b86c3, status: Active };{ domainId: fb3b55ac-c01a-47b2-9391-77bff2a7ad16, status: Active };{ domainId: 66263b64-66a9-4f0e-b904-71d815d0fa71, status: Active };{ domainId: 7ae6036e-939c-41de-bd35-1a448d864987, status: Active };{ domainId: d9ffaa0a-845c-4ad1-8450-ececaa3f236c, status: Active };{ domainId: 228f3315-0057-4a2d-b493-7d93938188f3, status: Active };{ domainId: afa535c6-34d9-4a04-8f0a-c74ad08f094c, status: Active };{ domainId: e6ba97ae-7ccc-42ed-8739-f05b7a90d82c, status: Active };]), log id: 19aad8d6
2012-08-20 10:06:11,313 WARN [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (QuartzScheduler_Worker-78) [4da5651c] Weird return value: Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusForXmlRpc mCode 661 mMessage Cannot acquire host id: ('5e47082f-a404-41ed-9109-a722270b86c3', SanlockException(90, 'Sanlock lockspace add failure', 'Message too long'))
2012-08-20 10:06:11,315 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (QuartzScheduler_Worker-78) [4da5651c] Failed in ReconstructMasterVDS method
2012-08-20 10:06:11,315 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (QuartzScheduler_Worker-78) [4da5651c] Error code unexpected and error message VDSGenericException: VDSErrorException: Failed to ReconstructMasterVDS, error = Cannot acquire host id: ('5e47082f-a404-41ed-9109-a722270b86c3', SanlockException(90, 'Sanlock lockspace add failure', 'Message too long'))
2012-08-20 10:06:11,317 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (QuartzScheduler_Worker-78) [4da5651c] Command org.ovirt.engine.core.vdsbroker.vdsbroker.ReconstructMasterVDSCommand return value Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusOnlyReturnForXmlRpc mStatus Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusForXmlRpc mCode 661 mMessage Cannot acquire host id: ('5e47082f-a404-41ed-9109-a722270b86c3', SanlockException(90, 'Sanlock lockspace add failure', 'Message too long'))
2012-08-20 10:06:11,319 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (QuartzScheduler_Worker-78) [4da5651c] Vds: virt1.ch1
2012-08-20 10:06:11,319 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (QuartzScheduler_Worker-78) [4da5651c] Command ReconstructMasterVDS execution failed. Exception: VDSErrorException: VDSGenericException: VDSErrorException: Failed to ReconstructMasterVDS, error = Cannot acquire host id: ('5e47082f-a404-41ed-9109-a722270b86c3', SanlockException(90, 'Sanlock lockspace add failure', 'Message too long'))
2012-08-20 10:06:11,321 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ReconstructMasterVDSCommand] (QuartzScheduler_Worker-78) [4da5651c] FINISH, ReconstructMasterVDSCommand, log id: 19aad8d6
2012-08-20 10:06:11,322 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.MarkPoolInReconstructModeVDSCommand] (QuartzScheduler_Worker-78) [4da5651c] START, MarkPoolInReconstructModeVDSCommand(storagePoolId = de911214-832b-11e1-ab21-00188bf945ff, ignoreFailoverLimit = false, compatabilityVersion = null, reconstructMarkAction = ClearCache), log id: 99f5d8
2012-08-20 10:06:11,323 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-78) [4da5651c] clearing cache for problematic entities in pool de911214-832b-11e1-ab21-00188bf945ff
2012-08-20 10:06:11,324 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.MarkPoolInReconstructModeVDSCommand] (QuartzScheduler_Worker-78) [4da5651c] FINISH, MarkPoolInReconstructModeVDSCommand, log id: 99f5d8
2012-08-20 10:06:11,324 ERROR [org.ovirt.engine.core.bll.storage.ReconstructMasterDomainCommand] (QuartzScheduler_Worker-78) [4da5651c] Command org.ovirt.engine.core.bll.storage.ReconstructMasterDomainCommand throw Vdc Bll exception. With error message VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to ReconstructMasterVDS, error = Cannot acquire host id: ('5e47082f-a404-41ed-9109-a722270b86c3', SanlockException(90, 'Sanlock lockspace add failure', 'Message too long'))
2012-08-20 10:06:11,329 INFO [org.ovirt.engine.core.bll.storage.ReconstructMasterDomainCommand] (QuartzScheduler_Worker-78) [4da5651c] Command [id=53793c2c-93db-4538-8217-45a9f8cc7e48]: Compensating CHANGED_ENTITY of org.ovirt.engine.core.common.businessentities.storage_domain_static; snapshot: id=e6ba97ae-7ccc-42ed-8739-f05b7a90d82c.
2012-08-20 10:06:11,333 INFO [org.ovirt.engine.core.bll.storage.ReconstructMasterDomainCommand] (QuartzScheduler_Worker-78) [4da5651c] Command [id=53793c2c-93db-4538-8217-45a9f8cc7e48]: Compensating CHANGED_ENTITY of org.ovirt.engine.core.common.businessentities.storage_domain_static; snapshot: id=5e47082f-a404-41ed-9109-a722270b86c3.
2012-08-20 10:06:15,320 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-5) [7995024e] domain e6ba97ae-7ccc-42ed-8739-f05b7a90d82c in problem. vds: virt2.ch1
2012-08-20 10:06:15,321 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-5) [7995024e] domain 66263b64-66a9-4f0e-b904-71d815d0fa71 in problem. vds: virt2.ch1
2012-08-20 10:06:15,322 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-5) [7995024e] domain fb3b55ac-c01a-47b2-9391-77bff2a7ad16 in problem. vds: virt2.ch1
2012-08-20 10:06:15,323 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-5) [7995024e] domain afa535c6-34d9-4a04-8f0a-c74ad08f094c in problem. vds: virt2.ch1
2012-08-20 10:06:15,323 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-5) [7995024e] domain d9ffaa0a-845c-4ad1-8450-ececaa3f236c in problem. vds: virt2.ch1
2012-08-20 10:06:15,324 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-5) [7995024e] domain 228f3315-0057-4a2d-b493-7d93938188f3 in problem. vds: virt2.ch1
2012-08-20 10:06:15,325 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-5) [7995024e] domain 7ae6036e-939c-41de-bd35-1a448d864987 in problem. vds: virt2.ch1
2012-08-20 10:06:15,326 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-5) [7995024e] domain 5e47082f-a404-41ed-9109-a722270b86c3 in problem. vds: virt2.ch1
2012-08-20 10:06:21,421 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-100) [f98d712] hostFromVds::selectedVds - virt1.ch1, spmStatus Unknown_Pool, storage pool Default
2012-08-20 10:06:21,495 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (QuartzScheduler_Worker-100) [f98d712] START, ConnectStoragePoolVDSCommand(vdsId = 5f82de2c-84ba-11e1-8a7a-00188bf945ff, storagePoolId = de911214-832b-11e1-ab21-00188bf945ff, vds_spm_id = 2, masterDomainId = e6ba97ae-7ccc-42ed-8739-f05b7a90d82c, masterVersion = 178), log id: 42708bc8
2012-08-20 10:06:22,046 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (QuartzScheduler_Worker-100) [f98d712] Command org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand return value Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusOnlyReturnForXmlRpc mStatus Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusForXmlRpc mCode 324 mMessage Wrong Master domain or its version: 'SD=e6ba97ae-7ccc-42ed-8739-f05b7a90d82c, pool=de911214-832b-11e1-ab21-00188bf945ff'
2012-08-20 10:06:22,049 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (QuartzScheduler_Worker-100) [f98d712] Vds: virt1.ch1
2012-08-20 10:06:22,049 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (QuartzScheduler_Worker-100) [f98d712] Command ConnectStoragePoolVDS execution failed. Exception: IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: 'SD=e6ba97ae-7ccc-42ed-8739-f05b7a90d82c, pool=de911214-832b-11e1-ab21-00188bf945ff'
2012-08-20 10:06:22,051 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (QuartzScheduler_Worker-100) [f98d712] FINISH, ConnectStoragePoolVDSCommand, log id: 42708bc8
2012-08-20 10:06:22,051 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-100) [f98d712] IrsBroker::Failed::GetStoragePoolInfoVDS
2012-08-20 10:06:22,052 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-100) [f98d712] Exception: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: 'SD=e6ba97ae-7ccc-42ed-8739-f05b7a90d82c, pool=de911214-832b-11e1-ab21-00188bf945ff'
2012-08-20 10:06:22,081 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-100) [f98d712] Irs placed on server null failed. Proceed Failover
2012-08-20 10:06:22,109 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-100) [f98d712] hostFromVds::selectedVds - virt2.ch1, spmStatus Unknown_Pool, storage pool Default
2012-08-20 10:06:22,170 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (QuartzScheduler_Worker-100) [f98d712] START, ConnectStoragePoolVDSCommand(vdsId = d38856e2-c2f1-11e1-9952-00188bf945ff, storagePoolId = de911214-832b-11e1-ab21-00188bf945ff, vds_spm_id = 3, masterDomainId = e6ba97ae-7ccc-42ed-8739-f05b7a90d82c, masterVersion = 178), log id: 51638c25
2012-08-20 10:06:22,737 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (QuartzScheduler_Worker-100) [f98d712] Command org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand return value Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusOnlyReturnForXmlRpc mStatus Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusForXmlRpc mCode 324 mMessage Wrong Master domain or its version: 'SD=e6ba97ae-7ccc-42ed-8739-f05b7a90d82c, pool=de911214-832b-11e1-ab21-00188bf945ff'
2012-08-20 10:06:22,739 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (QuartzScheduler_Worker-100) [f98d712] Vds: virt2.ch1
2012-08-20 10:06:22,740 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (QuartzScheduler_Worker-100) [f98d712] Command ConnectStoragePoolVDS execution failed. Exception: IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: 'SD=e6ba97ae-7ccc-42ed-8739-f05b7a90d82c, pool=de911214-832b-11e1-ab21-00188bf945ff'
2012-08-20 10:06:22,741 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (QuartzScheduler_Worker-100) [f98d712] FINISH, ConnectStoragePoolVDSCommand, log id: 51638c25

Darrell Budic
Bigwells Technology LLC
office: 312.529.7816
cell: 608.239.4628
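One thing the pasted data allows checking is whether the on-disk metadata and the engine agree about the pool. What follows is a minimal sketch, assuming only the values transcribed from the dom_md/metadata dump and from the ReconstructMasterVDSCommand line in engine.log above. `parse_metadata` and `pool_domains` are hypothetical helpers written for this comparison, not vdsm code, and the script does not talk to vdsm or the engine. The two excerpts were not necessarily captured at the same moment, so a mismatch is something to look into rather than a diagnosis.

```python
# Subset of the dom_md/metadata dump from this message.
METADATA = """\
MASTER_VERSION=65
POOL_DOMAINS=e6ba97ae-7ccc-42ed-8739-f05b7a90d82c:Active,228f3315-0057-4a2d-b493-7d93938188f3:Active,fb3b55ac-c01a-47b2-9391-77bff2a7ad16:Active,66263b64-66a9-4f0e-b904-71d815d0fa71:Active,afa535c6-34d9-4a04-8f0a-c74ad08f094c:Active,5e47082f-a404-41ed-9109-a722270b86c3:Active,7ae6036e-939c-41de-bd35-1a448d864987:Active
POOL_SPM_ID=1
ROLE=Master
SDUUID=e6ba97ae-7ccc-42ed-8739-f05b7a90d82c
VERSION=3
"""

# masterVersion and domainsList as logged by ReconstructMasterVDSCommand.
ENGINE_MASTER_VERSION = 178
ENGINE_DOMAINS = {
    "5e47082f-a404-41ed-9109-a722270b86c3",
    "fb3b55ac-c01a-47b2-9391-77bff2a7ad16",
    "66263b64-66a9-4f0e-b904-71d815d0fa71",
    "7ae6036e-939c-41de-bd35-1a448d864987",
    "d9ffaa0a-845c-4ad1-8450-ececaa3f236c",
    "228f3315-0057-4a2d-b493-7d93938188f3",
    "afa535c6-34d9-4a04-8f0a-c74ad08f094c",
    "e6ba97ae-7ccc-42ed-8739-f05b7a90d82c",
}


def parse_metadata(text):
    """Parse KEY=VALUE lines into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def pool_domains(fields):
    """Return the set of domain UUIDs listed in POOL_DOMAINS."""
    entries = fields.get("POOL_DOMAINS", "").split(",")
    return {entry.split(":")[0] for entry in entries if entry}


fields = parse_metadata(METADATA)
on_disk_version = int(fields["MASTER_VERSION"])
only_in_engine = ENGINE_DOMAINS - pool_domains(fields)
only_on_disk = pool_domains(fields) - ENGINE_DOMAINS

print("master version on disk:", on_disk_version, "engine:", ENGINE_MASTER_VERSION)
print("domains only in engine list:", sorted(only_in_engine))
print("domains only in metadata:", sorted(only_on_disk))
```

With the values above, the metadata says MASTER_VERSION=65 while the engine sends masterVersion = 178, and the engine lists one domain (d9ffaa0a-845c-4ad1-8450-ececaa3f236c) that POOL_DOMAINS does not. Whether either difference is related to the "Wrong Master domain or its version" error is not something this post establishes.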

After many trials and annoyances, I started tearing down my old DC and building a new one, which is working out OK, if slow. I wasn't able to remove the old one cleanly, but I am getting all my data back online. My symptoms resemble an exchange from a few months ago, some of which I'll attach below. Not sure how I got there, but I had v3 storage domains on NFS that refused to activate. I found things in my logs similar to the LVM errors Rene encountered, so I'm wondering if I had the same problem with the v3 update.

  -Darrell

----- Original Message -----
From: "Rene Rosenberger" <r.rosenber...@netbiscuits.com>
To: "Saggi Mizrahi" <smizr...@redhat.com>, rvak...@redhat.com
Cc: users@ovirt.org
Sent: Monday, April 2, 2012 2:30:16 AM
Subject: AW: AW: [Users] storage domain reactivate not working

Hi,

ok, but how can I delete it if nothing works? I want to generate a new storage domain.

----- Original Message -----
From: Saggi Mizrahi [mailto:smizr...@redhat.com]
Sent: Friday, March 30, 2012 21:00
To: rvak...@redhat.com
Cc: users@ovirt.org; Rene Rosenberger
Subject: Re: AW: [Users] storage domain reactivate not working

I am currently working on patches to fix the issues with upgraded domains. I've been ill for most of last week, so it is taking a bit more time than it should.

----- Original Message -----
From: "Rami Vaknin" <rvak...@redhat.com>
To: "Saggi Mizrahi" <smizr...@redhat.com>, "Rene Rosenberger" <r.rosenber...@netbiscuits.com>
Cc: users@ovirt.org
Sent: Thursday, March 29, 2012 11:57:08 AM
Subject: Fwd: AW: [Users] storage domain reactivate not working

Rene, VDSM can't read the storage domain's metadata. The problem is that vdsm tries to read the metadata using the 'dd' command, which applies to the old version of storage domains; in the new format the metadata is saved as VG tags. Are you using a storage domain version lower than V2? Can you attach the full log?

Saggi, any thoughts on that?

-------- Original Message --------
Subject: AW: [Users] storage domain reactivate not working
Date: Thu, 29 Mar 2012 06:33:27 -0400
From: Rene Rosenberger <r.rosenber...@netbiscuits.com>
To: rvak...@redhat.com <rvak...@redhat.com>, users@ovirt.org <users@ovirt.org>

On Aug 20, 2012, at 5:09 AM, Darrell Budic wrote:

> I upgraded my oVirt setup to 3.1 and it went ok (following http://wiki.ovirt.org/wiki/OVirt_3.0_to_3.1_upgrade, I've got it running on CentOS 6.2. Note a need to copy a few more files from /etc/pki/ovirt-engine-old, notably generatesshkeys; may have been due to my previous version) until I tried to upgrade one of the nodes to the latest vdsm as well. It happened to be the SPM at the time, and when I put it into maintenance, the SPM started bouncing between the other two nodes that were still up. The logs are full of these error messages, but this seems to be the important bit:
>
> AcquireHostIdFailure: Cannot acquire host id: ('e6ba97ae-7ccc-42ed-8739-f05b7a90d82c', SanlockException(90, 'Sanlock lockspace add failure', 'Message too long'))
>
> I've since finished updating the vdsm node and it's up and running, although it has the same issue. Additionally, it drops out of active with the message that it can't access one of the storage domains or the data center object. I've confirmed that all nodes can access all the data centers. In this case, I suspect it means the DC object, but I can't find any specific error messages to indicate that.
>
> Any thoughts on repairing the issue? Let me know if you want more specific data. This vdsm.log excerpt is repeated on all 3 nodes. I have active VMs on the two old nodes, so I'm hesitant to shut everything down and see if that helps, but if I've got to...
Darrell Budic
Bigwells Technology LLC
office: 312.529.7816
cell: 608.239.4628