[Users] Master domain locked, error code 304

We had a hard crash (network, then power) on our two-node oVirt cluster. We have an NFS datastore on CentOS 6 (3.2.0-1.39.el6). We can no longer get the hosts to activate; they are unable to activate the "master" domain. The master storage domain shows "Locked" while the other storage domains show Unknown (disks) and Inactive (ISO). All the domains are on the same NFS server, we are able to mount it, and the permissions are good. We believe we might be getting bitten by https://bugzilla.redhat.com/show_bug.cgi?id=920694 or http://gerrit.ovirt.org/#/c/13709/ which says to cease working on it:

Michael Kublin, Apr 10
Patch Set 5: Do not submit
Liron, please abandon this work. This interacts with host life cycle which will be changed; during that change the following problem will be solved as well.

So, we were wondering what we can do to get our oVirt back online, or rather what the correct way is to solve this. We have a few VMs that are down which we are looking to recover as quickly as possible.

Thanks in advance,
Tommy

Here are the ovirt-engine logs:

2013-04-23 21:30:04,041 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (pool-3-thread-49) Command ConnectStoragePoolVDS execution failed. Exception: IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Cannot find master domain: 'spUUID=0f63de0e-7d98-48ce-99ec-add109f83c4f, msdUUID=774e3604-f449-4b3e-8c06-7cd16f98720c'
2013-04-23 21:30:04,043 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (pool-3-thread-49) FINISH, ConnectStoragePoolVDSCommand, log id: 50524b34
2013-04-23 21:30:04,049 WARN [org.ovirt.engine.core.bll.storage.ReconstructMasterDomainCommand] (pool-3-thread-49) [7c5867d6] CanDoAction of action ReconstructMasterDomain failed. Reasons:VAR__ACTION__RECONSTRUCT_MASTER,VAR__TYPE__STORAGE__DOMAIN,ACTION_TYPE_FAILED_STORAGE_DOMAIN_STATUS_ILLEGAL2,$status Locked

Here are the logs from vdsm:

Thread-29::DEBUG::2013-04-23 21:36:05,906::misc::84::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n /bin/mount -t nfs -o soft,nosharecache,timeo=600,retrans=6,nfsvers=3 10.101.0.148:/c/vpt1-vmdisks1 /rhev/data-center/mnt/10.101.0.148:_c_vpt1-vmdisks1' (cwd None)
Thread-29::DEBUG::2013-04-23 21:36:06,008::misc::84::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n /bin/mount -t nfs -o soft,nosharecache,timeo=600,retrans=6,nfsvers=3 10.101.0.148:/c/vpool-iso /rhev/data-center/mnt/10.101.0.148:_c_vpool-iso' (cwd None)
Thread-29::INFO::2013-04-23 21:36:06,065::logUtils::44::dispatcher::(wrapper) Run and protect: connectStorageServer, Return response: {'statuslist': [{'status': 0, 'id': '7c19bd42-c3dc-41b9-b81b-d9b75214b8dc'}, {'status': 0, 'id': 'eff2ef61-0b12-4429-b087-8742be17ae90'}]}
Thread-29::DEBUG::2013-04-23 21:36:06,071::task::1151::TaskManager.Task::(prepare) Task=`48337e40-2446-4357-b6dc-2c86f4da67e2`::finished: {'statuslist': [{'status': 0, 'id': '7c19bd42-c3dc-41b9-b81b-d9b75214b8dc'}, {'status': 0, 'id': 'eff2ef61-0b12-4429-b087-8742be17ae90'}]}
Thread-29::DEBUG::2013-04-23 21:36:06,071::task::568::TaskManager.Task::(_updateState) Task=`48337e40-2446-4357-b6dc-2c86f4da67e2`::moving from state preparing -> state finished
Thread-29::DEBUG::2013-04-23 21:36:06,071::resourceManager::830::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-29::DEBUG::2013-04-23 21:36:06,072::resourceManager::864::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-29::DEBUG::2013-04-23 21:36:06,072::task::957::TaskManager.Task::(_decref) Task=`48337e40-2446-4357-b6dc-2c86f4da67e2`::ref 0 aborting False
Thread-30::DEBUG::2013-04-23 21:36:06,112::BindingXMLRPC::161::vds::(wrapper) [10.101.0.197]
Thread-30::DEBUG::2013-04-23 21:36:06,112::task::568::TaskManager.Task::(_updateState) Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::moving from state init -> state preparing
Thread-30::INFO::2013-04-23 21:36:06,113::logUtils::41::dispatcher::(wrapper) Run and protect: connectStoragePool(spUUID='0f63de0e-7d98-48ce-99ec-add109f83c4f', hostID=1, scsiKey='0f63de0e-7d98-48ce-99ec-add109f83c4f', msdUUID='774e3604-f449-4b3e-8c06-7cd16f98720c', masterVersion=73, options=None)
Thread-30::DEBUG::2013-04-23 21:36:06,113::resourceManager::190::ResourceManager.Request::(__init__) ResName=`Storage.0f63de0e-7d98-48ce-99ec-add109f83c4f`ReqID=`ee74329a-0a92-465a-be50-b8acc6d7246a`::Request was made in '/usr/share/vdsm/storage/resourceManager.py' line '189' at '__init__'
Thread-30::DEBUG::2013-04-23 21:36:06,114::resourceManager::504::ResourceManager::(registerResource) Trying to register resource 'Storage.0f63de0e-7d98-48ce-99ec-add109f83c4f' for lock type 'exclusive'
Thread-30::DEBUG::2013-04-23 21:36:06,114::resourceManager::547::ResourceManager::(registerResource) Resource 'Storage.0f63de0e-7d98-48ce-99ec-add109f83c4f' is free. Now locking as 'exclusive' (1 active user)
Thread-30::DEBUG::2013-04-23 21:36:06,114::resourceManager::227::ResourceManager.Request::(grant) ResName=`Storage.0f63de0e-7d98-48ce-99ec-add109f83c4f`ReqID=`ee74329a-0a92-465a-be50-b8acc6d7246a`::Granted request
Thread-30::INFO::2013-04-23 21:36:06,115::sp::625::Storage.StoragePool::(connect) Connect host #1 to the storage pool 0f63de0e-7d98-48ce-99ec-add109f83c4f with master domain: 774e3604-f449-4b3e-8c06-7cd16f98720c (ver = 73)
Thread-30::DEBUG::2013-04-23 21:36:06,116::lvm::477::OperationMutex::(_invalidateAllPvs) Operation 'lvm invalidate operation' got the operation mutex
Thread-30::DEBUG::2013-04-23 21:36:06,116::lvm::479::OperationMutex::(_invalidateAllPvs) Operation 'lvm invalidate operation' released the operation mutex
Thread-30::DEBUG::2013-04-23 21:36:06,117::lvm::488::OperationMutex::(_invalidateAllVgs) Operation 'lvm invalidate operation' got the operation mutex
Thread-30::DEBUG::2013-04-23 21:36:06,117::lvm::490::OperationMutex::(_invalidateAllVgs) Operation 'lvm invalidate operation' released the operation mutex
Thread-30::DEBUG::2013-04-23 21:36:06,117::lvm::508::OperationMutex::(_invalidateAllLvs) Operation 'lvm invalidate operation' got the operation mutex
Thread-30::DEBUG::2013-04-23 21:36:06,118::lvm::510::OperationMutex::(_invalidateAllLvs) Operation 'lvm invalidate operation' released the operation mutex
Thread-30::DEBUG::2013-04-23 21:36:06,118::misc::1054::SamplingMethod::(__call__) Trying to enter sampling method (storage.sdc.refreshStorage)
Thread-30::DEBUG::2013-04-23 21:36:06,118::misc::1056::SamplingMethod::(__call__) Got in to sampling method
Thread-30::DEBUG::2013-04-23 21:36:06,119::misc::1054::SamplingMethod::(__call__) Trying to enter sampling method (storage.iscsi.rescan)
Thread-30::DEBUG::2013-04-23 21:36:06,119::misc::1056::SamplingMethod::(__call__) Got in to sampling method
Thread-30::DEBUG::2013-04-23 21:36:06,119::misc::84::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n /sbin/iscsiadm -m session -R' (cwd None)
Thread-30::DEBUG::2013-04-23 21:36:06,136::misc::84::Storage.Misc.excCmd::(<lambda>) FAILED: <err> = 'iscsiadm: No session found.\n'; <rc> = 21
Thread-30::DEBUG::2013-04-23 21:36:06,136::misc::1064::SamplingMethod::(__call__) Returning last result
MainProcess|Thread-30::DEBUG::2013-04-23 21:36:06,139::misc::84::Storage.Misc.excCmd::(<lambda>) '/bin/dd of=/sys/class/scsi_host/host0/scan' (cwd None)
MainProcess|Thread-30::DEBUG::2013-04-23 21:36:06,142::misc::84::Storage.Misc.excCmd::(<lambda>) '/bin/dd of=/sys/class/scsi_host/host1/scan' (cwd None)
MainProcess|Thread-30::DEBUG::2013-04-23 21:36:06,146::misc::84::Storage.Misc.excCmd::(<lambda>) '/bin/dd of=/sys/class/scsi_host/host2/scan' (cwd None)
MainProcess|Thread-30::DEBUG::2013-04-23 21:36:06,149::iscsi::402::Storage.ISCSI::(forceIScsiScan) Performing SCSI scan, this will take up to 30 seconds
Thread-30::DEBUG::2013-04-23 21:36:08,152::misc::84::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n /sbin/multipath' (cwd None)
Thread-30::DEBUG::2013-04-23 21:36:08,254::misc::84::Storage.Misc.excCmd::(<lambda>) SUCCESS: <err> = ''; <rc> = 0
Thread-30::DEBUG::2013-04-23 21:36:08,256::lvm::477::OperationMutex::(_invalidateAllPvs) Operation 'lvm invalidate operation' got the operation mutex
Thread-30::DEBUG::2013-04-23 21:36:08,256::lvm::479::OperationMutex::(_invalidateAllPvs) Operation 'lvm invalidate operation' released the operation mutex
Thread-30::DEBUG::2013-04-23 21:36:08,257::lvm::488::OperationMutex::(_invalidateAllVgs) Operation 'lvm invalidate operation' got the operation mutex
Thread-30::DEBUG::2013-04-23 21:36:08,257::lvm::490::OperationMutex::(_invalidateAllVgs) Operation 'lvm invalidate operation' released the operation mutex
Thread-30::DEBUG::2013-04-23 21:36:08,258::lvm::508::OperationMutex::(_invalidateAllLvs) Operation 'lvm invalidate operation' got the operation mutex
Thread-30::DEBUG::2013-04-23 21:36:08,258::lvm::510::OperationMutex::(_invalidateAllLvs) Operation 'lvm invalidate operation' released the operation mutex
Thread-30::DEBUG::2013-04-23 21:36:08,258::misc::1064::SamplingMethod::(__call__) Returning last result
Thread-30::DEBUG::2013-04-23 21:36:08,259::lvm::368::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' got the operation mutex
Thread-30::DEBUG::2013-04-23 21:36:08,261::misc::84::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n /sbin/lvm vgs --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ \\"r%.*%\\" ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 } backup { retain_min = 50 retain_days = 0 } " --noheadings --units b --nosuffix --separator | -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free 774e3604-f449-4b3e-8c06-7cd16f98720c' (cwd None)
Thread-30::DEBUG::2013-04-23 21:36:08,514::misc::84::Storage.Misc.excCmd::(<lambda>) FAILED: <err> = ' Volume group "774e3604-f449-4b3e-8c06-7cd16f98720c" not found\n'; <rc> = 5
Thread-30::WARNING::2013-04-23 21:36:08,516::lvm::373::Storage.LVM::(_reloadvgs) lvm vgs failed: 5 [] [' Volume group "774e3604-f449-4b3e-8c06-7cd16f98720c" not found']
Thread-30::DEBUG::2013-04-23 21:36:08,518::lvm::397::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' released the operation mutex
Thread-30::DEBUG::2013-04-23 21:36:08,524::resourceManager::557::ResourceManager::(releaseResource) Trying to release resource 'Storage.0f63de0e-7d98-48ce-99ec-add109f83c4f'
Thread-30::DEBUG::2013-04-23 21:36:08,525::resourceManager::573::ResourceManager::(releaseResource) Released resource 'Storage.0f63de0e-7d98-48ce-99ec-add109f83c4f' (0 active users)
Thread-30::DEBUG::2013-04-23 21:36:08,525::resourceManager::578::ResourceManager::(releaseResource) Resource 'Storage.0f63de0e-7d98-48ce-99ec-add109f83c4f' is free, finding out if anyone is waiting for it.
Thread-30::DEBUG::2013-04-23 21:36:08,525::resourceManager::585::ResourceManager::(releaseResource) No one is waiting for resource 'Storage.0f63de0e-7d98-48ce-99ec-add109f83c4f', Clearing records.
Thread-30::ERROR::2013-04-23 21:36:08,526::task::833::TaskManager.Task::(_setError) Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 840, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 42, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 926, in connectStoragePool
    masterVersion, options)
  File "/usr/share/vdsm/storage/hsm.py", line 973, in _connectStoragePool
    res = pool.connect(hostID, scsiKey, msdUUID, masterVersion)
  File "/usr/share/vdsm/storage/sp.py", line 642, in connect
    self.__rebuild(msdUUID=msdUUID, masterVersion=masterVersion)
  File "/usr/share/vdsm/storage/sp.py", line 1166, in __rebuild
    self.masterDomain = self.getMasterDomain(msdUUID=msdUUID, masterVersion=masterVersion)
  File "/usr/share/vdsm/storage/sp.py", line 1505, in getMasterDomain
    raise se.StoragePoolMasterNotFound(self.spUUID, msdUUID)
StoragePoolMasterNotFound: Cannot find master domain: 'spUUID=0f63de0e-7d98-48ce-99ec-add109f83c4f, msdUUID=774e3604-f449-4b3e-8c06-7cd16f98720c'
Thread-30::DEBUG::2013-04-23 21:36:08,527::task::852::TaskManager.Task::(_run) Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::Task._run: f551fa3f-9d8c-4de3-895a-964c821060d4 ('0f63de0e-7d98-48ce-99ec-add109f83c4f', 1, '0f63de0e-7d98-48ce-99ec-add109f83c4f', '774e3604-f449-4b3e-8c06-7cd16f98720c', 73) {} failed - stopping task
Thread-30::DEBUG::2013-04-23 21:36:08,528::task::1177::TaskManager.Task::(stop) Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::stopping in state preparing (force False)
Thread-30::DEBUG::2013-04-23 21:36:08,528::task::957::TaskManager.Task::(_decref) Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::ref 1 aborting True
Thread-30::INFO::2013-04-23 21:36:08,528::task::1134::TaskManager.Task::(prepare) Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::aborting: Task is aborted: 'Cannot find master domain' - code 304
Thread-30::DEBUG::2013-04-23 21:36:08,529::task::1139::TaskManager.Task::(prepare) Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::Prepare: aborted: Cannot find master domain
Thread-30::DEBUG::2013-04-23 21:36:08,529::task::957::TaskManager.Task::(_decref) Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::ref 0 aborting True
Thread-30::DEBUG::2013-04-23 21:36:08,529::task::892::TaskManager.Task::(_doAbort) Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::Task._doAbort: force False
Thread-30::DEBUG::2013-04-23 21:36:08,530::resourceManager::864::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-30::DEBUG::2013-04-23 21:36:08,530::task::568::TaskManager.Task::(_updateState) Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::moving from state preparing -> state aborting
Thread-30::DEBUG::2013-04-23 21:36:08,530::task::523::TaskManager.Task::(__state_aborting) Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::_aborting: recover policy none
Thread-30::DEBUG::2013-04-23 21:36:08,531::task::568::TaskManager.Task::(_updateState) Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::moving from state aborting -> state failed
Thread-30::DEBUG::2013-04-23 21:36:08,531::resourceManager::830::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-30::DEBUG::2013-04-23 21:36:08,531::resourceManager::864::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-30::ERROR::2013-04-23 21:36:08,532::dispatcher::67::Storage.Dispatcher.Protect::(run) {'status': {'message': "Cannot find master domain: 'spUUID=0f63de0e-7d98-48ce-99ec-add109f83c4f, msdUUID=774e3604-f449-4b3e-8c06-7cd16f98720c'", 'code': 304}}
[root@vmserver3 vdsm]#
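
(For anyone hitting the same failure: one rough way to sanity-check the master domain directly on the NFS export, independent of the engine, is sketched below. It is untested here; the export path 10.101.0.148:/c/vpt1-master is taken from the domain metadata quoted later in this thread, and the mount options are copied from the vdsm log above.)

    # Mount the master domain export by hand on a host in maintenance and make
    # sure the domain directory and its metadata survived the crash.
    mkdir -p /mnt/master-check
    mount -t nfs -o soft,nosharecache,timeo=600,retrans=6,nfsvers=3 \
        10.101.0.148:/c/vpt1-master /mnt/master-check
    ls /mnt/master-check
    # Expect a directory named after the storage domain UUID:
    cat /mnt/master-check/774e3604-f449-4b3e-8c06-7cd16f98720c/dom_md/metadata
    umount /mnt/master-check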

Hello Tommy,

I had a similar experience, and after trying to recover my storage domain I realized that my VMs were gone. You should verify that your VM disks are still inside your storage domain. In my case, I had to add a new storage domain as the master domain in order to remove the old VMs from the DB and reattach the old storage domain. I hope this is not your case. If you haven't lost your VMs, it should be possible to recover them.

Good luck,
Juanjo.
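
For a file-based (NFS) domain, the check can be done roughly as follows. This is only a sketch: the export name 10.101.0.148:/c/vpt1-vmdisks1 is taken from the mount commands in the vdsm log, and the directory layout is the standard file storage domain structure.

    # Each VM disk lives under <domain UUID>/images/<image UUID>/ and contains
    # the volume file(s) plus matching .meta and .lease files.
    ls -l /rhev/data-center/mnt/10.101.0.148:_c_vpt1-vmdisks1/*/images/
    # or, directly on the NFS server:
    ls -l /c/vpt1-vmdisks1/*/images/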

Hi Juan,

That sounds like a possible path to follow. Our "master" domain does not have any VMs in it. If no one else responds with an official path to resolution, I will try going into the database and hacking it like that. I think it has something to do with the version or the metadata?

[root@vmserver3 dom_md]# cat metadata
CLASS=Data
DESCRIPTION=SFOTestMaster1
IOOPTIMEOUTSEC=10
LEASERETRIES=3
LEASETIMESEC=60
LOCKPOLICY=
LOCKRENEWALINTERVALSEC=5
MASTER_VERSION=1
POOL_DESCRIPTION=SFODC01
POOL_DOMAINS=774e3604-f449-4b3e-8c06-7cd16f98720c:Active,758c0abb-ea9a-43fb-bcd9-435f75cd0baa:Active,baa42b1c-ae2e-4486-88a1-e09e1f7a59cb:Active
POOL_SPM_ID=1
POOL_SPM_LVER=4
POOL_UUID=0f63de0e-7d98-48ce-99ec-add109f83c4f
REMOTE_PATH=10.101.0.148:/c/vpt1-master
ROLE=Master
SDUUID=774e3604-f449-4b3e-8c06-7cd16f98720c
TYPE=NFS
VERSION=0
_SHA_CKSUM=fa8ef0e7cd5e50e107384a146e4bfc838d24ba08
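
Worth noting: the vdsm log earlier in the thread shows the engine connecting with masterVersion=73, while this metadata records MASTER_VERSION=1, which would be consistent with the version theory. A rough way to compare the two sides is sketched below; the database name "engine" and the column name are assumptions based on 3.2-era schemas, so verify them before relying on this.

    # On the engine host, as the postgres user: what master domain version does
    # the engine expect?
    psql engine -c "SELECT name, master_domain_version FROM storage_pool;"
    # On the NFS server (or wherever the master export is mounted): what does
    # the on-disk metadata say?
    grep MASTER_VERSION /c/vpt1-master/774e3604-f449-4b3e-8c06-7cd16f98720c/dom_md/metadata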

Hi,

Your problem is that the master domain is locked, so the engine does not send connectStorageServer to the vdsm host, and therefore the host does not see the master domain. You would need to change the status of the master domain in the DB from Locked while the host is in maintenance. This can be tricky and is not really recommended, because if you do it wrong you might corrupt the DB. Another, safer way that I recommend is to try connectStorageServer to the master SD from vdsClient on the vdsm host and see what happens; it might solve your problem.

-- Yeela
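
A rough illustration of both suggestions follows. It is a sketch only, not commands from this thread: the table name, the status enum handling, and the vdsClient argument order are assumptions to confirm against a 3.2 installation (for example via the usage text printed by running vdsClient with no arguments) before executing anything.

    # Option 1: inspect (and only then, carefully, update) the domain status in
    # the engine DB while the hosts are in maintenance. On 3.2-era schemas the
    # per-pool domain status lives in storage_pool_iso_map; the numeric value
    # for each state must be taken from the StorageDomainStatus enum of your
    # engine build, so it is left symbolic here. Run as the postgres user.
    psql engine -c "SELECT storage_id, storage_pool_id, status FROM storage_pool_iso_map;"
    # psql engine -c "UPDATE storage_pool_iso_map SET status = <value-for-Unknown-or-Active>
    #                  WHERE storage_id = '774e3604-f449-4b3e-8c06-7cd16f98720c';"

    # Option 2: on the vdsm host, connect the master domain's storage server
    # directly and ask vdsm what it sees. NFS is storage type 1 on 3.2-era
    # vdsm, if memory serves; the connection-list syntax below is approximate.
    vdsClient -s 0 connectStorageServer 1 0f63de0e-7d98-48ce-99ec-add109f83c4f \
        "connection=10.101.0.148:/c/vpt1-master,id=00000000-0000-0000-0000-000000000000"
    vdsClient -s 0 getStorageDomainsList
    vdsClient -s 0 getStorageDomainInfo 774e3604-f449-4b3e-8c06-7cd16f98720c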
From: "Tommy McNeely" <tommythekid@gmail.com> To: "Juan Jose" <jj197005@gmail.com> Cc: users@ovirt.org Sent: Wednesday, April 24, 2013 7:30:20 PM Subject: Re: [Users] Master domain locked, error code 304
Hi Juan,
That sounds like a possible path to follow. Our "master" domain does not have any VMs in it. If no one else responds with an official path to resolution, then I will try going into the database and hacking it like that. I think it has something to do with the version or the metadata??
[root@vmserver3 dom_md]# cat metadata CLASS=Data DESCRIPTION=SFOTestMaster1 IOOPTIMEOUTSEC=10 LEASERETRIES=3 LEASETIMESEC=60 LOCKPOLICY= LOCKRENEWALINTERVALSEC=5 MASTER_VERSION=1 POOL_DESCRIPTION=SFODC01 POOL_DOMAINS=774e3604-f449-4b3e-8c06-7cd16f98720c:Active,758c0abb-ea9a-43fb-bcd9-435f75cd0baa:Active,baa42b1c-ae2e-4486-88a1-e09e1f7a59cb:Active POOL_SPM_ID=1 POOL_SPM_LVER=4 POOL_UUID=0f63de0e-7d98-48ce-99ec-add109f83c4f REMOTE_PATH=10.101.0.148:/c/vpt1-master ROLE=Master SDUUID=774e3604-f449-4b3e-8c06-7cd16f98720c TYPE=NFS VERSION=0 _SHA_CKSUM=fa8ef0e7cd5e50e107384a146e4bfc838d24ba08
On Wed, Apr 24, 2013 at 5:57 AM, Juan Jose < jj197005@gmail.com > wrote:
Hello Tommy,
I had a similar experience and after try to recover my storage domain, I realized that my VMs had missed. You have to verify if your VM disks are inside of your storage domain. In my case, I had to add a new a new Storage domain as Master domain to be able to remove the old VMs from DB and reattach the old storage domain. I hope this were not your case. If you haven't lost your VMs it's possible that you can recover them.
Good luck,
Juanjo.
On Wed, Apr 24, 2013 at 6:43 AM, Tommy McNeely < tommythekid@gmail.com > wrote:
We had a hard crash (network, then power) on our 2 node Ovirt Cluster. We have NFS datastore on CentOS 6 (3.2.0-1.39.el6). We can no longer get the hosts to activate. They are unable to activate the "master" domain. The master storage domain show "locked" while the other storage domains show Unknown (disks) and inactive (ISO) All the domains are on the same NFS server, we are able to mount it, the permissions are good. We believe we might be getting bit by https://bugzilla.redhat.com/show_bug.cgi?id=920694 or http://gerrit.ovirt.org/#/c/13709/ which says to cease working on it:
Michael Kublin Apr 10
Patch Set 5: Do not submit
Liron, please abondon this work. This interacts with host life cycle which will be changed, during a change a following problem will be solved as well.
So, We were wondering what we can do to get our oVirt back online, or rather what the correct way is to solve this. We have a few VMs that are down which we are looking for ways to recover as quickly as possible.
Thanks in advance, Tommy
Here are the ovirt-engine logs:
2013-04-23 21:30:04,041 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (pool-3-thread-49) Command ConnectStoragePoolVDS execution failed. Exception: IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Cannot find master domain: 'spUUID=0f63de0e-7d98-48ce-99ec-add109f83c4f, msdUUID=774e3604-f449-4b3e-8c06-7cd16f98720c' 2013-04-23 21:30:04,043 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (pool-3-thread-49) FINISH, ConnectStoragePoolVDSCommand, log id: 50524b34 2013-04-23 21:30:04,049 WARN [org.ovirt.engine.core.bll.storage.ReconstructMasterDomainCommand] (pool-3-thread-49) [7c5867d6] CanDoAction of action ReconstructMasterDomain failed. Reasons:VAR__ACTION__RECONSTRUCT_MASTER,VAR__TYPE__STORAGE__DOMAIN,ACTION_TYPE_FAILED_STORAGE_DOMAIN_STATUS_ILLEGAL2,$status Locked
Here are the logs from vdsm:
Thread-29::DEBUG::2013-04-23 21:36:05,906::misc::84::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n /bin/mount -t nfs -o soft,nosharecache,timeo=600,retrans=6,nfsvers=3 10.101.0.148:/c/vpt1-vmdisks1 /rhev/data-center/mnt/10.101.0.148:_c_vpt1-vmdisks1' (cwd None) Thread-29::DEBUG::2013-04-23 21:36:06,008::misc::84::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n /bin/mount -t nfs -o soft,nosharecache,timeo=600,retrans=6,nfsvers=3 10.101.0.148:/c/vpool-iso /rhev/data-center/mnt/10.101.0.148:_c_vpool-iso' (cwd None) Thread-29::INFO::2013-04-23 21:36:06,065::logUtils::44::dispatcher::(wrapper) Run and protect: connectStorageServer, Return response: {'statuslist': [{'status': 0, 'id': '7c19bd42-c3dc-41b9-b81b-d9b75214b8dc'}, {'status': 0, 'id': 'eff2ef61-0b12-4429-b087-8742be17ae90'}]} Thread-29::DEBUG::2013-04-23 21:36:06,071::task::1151::TaskManager.Task::(prepare) Task=`48337e40-2446-4357-b6dc-2c86f4da67e2`::finished: {'statuslist': [{'status': 0, 'id': '7c19bd42-c3dc-41b9-b81b-d9b75214b8dc'}, {'status': 0, 'id': 'eff2ef61-0b12-4429-b087-8742be17ae90'}]} Thread-29::DEBUG::2013-04-23 21:36:06,071::task::568::TaskManager.Task::(_updateState) Task=`48337e40-2446-4357-b6dc-2c86f4da67e2`::moving from state preparing -> state finished Thread-29::DEBUG::2013-04-23 21:36:06,071::resourceManager::830::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {} Thread-29::DEBUG::2013-04-23 21:36:06,072::resourceManager::864::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {} Thread-29::DEBUG::2013-04-23 21:36:06,072::task::957::TaskManager.Task::(_decref) Task=`48337e40-2446-4357-b6dc-2c86f4da67e2`::ref 0 aborting False Thread-30::DEBUG::2013-04-23 21:36:06,112::BindingXMLRPC::161::vds::(wrapper) [10.101.0.197] Thread-30::DEBUG::2013-04-23 21:36:06,112::task::568::TaskManager.Task::(_updateState) Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::moving from state init -> state preparing Thread-30::INFO::2013-04-23 21:36:06,113::logUtils::41::dispatcher::(wrapper) Run and protect: connectStoragePool(spUUID='0f63de0e-7d98-48ce-99ec-add109f83c4f', hostID=1, scsiKey='0f63de0e-7d98-48ce-99ec-add109f83c4f', msdUUID='774e3604-f449-4b3e-8c06-7cd16f98720c', masterVersion=73, options=None) Thread-30::DEBUG::2013-04-23 21:36:06,113::resourceManager::190::ResourceManager.Request::(__init__) ResName=`Storage.0f63de0e-7d98-48ce-99ec-add109f83c4f`ReqID=`ee74329a-0a92-465a-be50-b8acc6d7246a`::Request was made in '/usr/share/vdsm/storage/resourceManager.py' line '189' at '__init__' Thread-30::DEBUG::2013-04-23 21:36:06,114::resourceManager::504::ResourceManager::(registerResource) Trying to register resource 'Storage.0f63de0e-7d98-48ce-99ec-add109f83c4f' for lock type 'exclusive' Thread-30::DEBUG::2013-04-23 21:36:06,114::resourceManager::547::ResourceManager::(registerResource) Resource 'Storage.0f63de0e-7d98-48ce-99ec-add109f83c4f' is free. 
Now locking as 'exclusive' (1 active user) Thread-30::DEBUG::2013-04-23 21:36:06,114::resourceManager::227::ResourceManager.Request::(grant) ResName=`Storage.0f63de0e-7d98-48ce-99ec-add109f83c4f`ReqID=`ee74329a-0a92-465a-be50-b8acc6d7246a`::Granted request Thread-30::INFO::2013-04-23 21:36:06,115::sp::625::Storage.StoragePool::(connect) Connect host #1 to the storage pool 0f63de0e-7d98-48ce-99ec-add109f83c4f with master domain: 774e3604-f449-4b3e-8c06-7cd16f98720c (ver = 73) Thread-30::DEBUG::2013-04-23 21:36:06,116::lvm::477::OperationMutex::(_invalidateAllPvs) Operation 'lvm invalidate operation' got the operation mutex Thread-30::DEBUG::2013-04-23 21:36:06,116::lvm::479::OperationMutex::(_invalidateAllPvs) Operation 'lvm invalidate operation' released the operation mutex Thread-30::DEBUG::2013-04-23 21:36:06,117::lvm::488::OperationMutex::(_invalidateAllVgs) Operation 'lvm invalidate operation' got the operation mutex Thread-30::DEBUG::2013-04-23 21:36:06,117::lvm::490::OperationMutex::(_invalidateAllVgs) Operation 'lvm invalidate operation' released the operation mutex Thread-30::DEBUG::2013-04-23 21:36:06,117::lvm::508::OperationMutex::(_invalidateAllLvs) Operation 'lvm invalidate operation' got the operation mutex Thread-30::DEBUG::2013-04-23 21:36:06,118::lvm::510::OperationMutex::(_invalidateAllLvs) Operation 'lvm invalidate operation' released the operation mutex Thread-30::DEBUG::2013-04-23 21:36:06,118::misc::1054::SamplingMethod::(__call__) Trying to enter sampling method (storage.sdc.refreshStorage) Thread-30::DEBUG::2013-04-23 21:36:06,118::misc::1056::SamplingMethod::(__call__) Got in to sampling method Thread-30::DEBUG::2013-04-23 21:36:06,119::misc::1054::SamplingMethod::(__call__) Trying to enter sampling method (storage.iscsi.rescan) Thread-30::DEBUG::2013-04-23 21:36:06,119::misc::1056::SamplingMethod::(__call__) Got in to sampling method Thread-30::DEBUG::2013-04-23 21:36:06,119::misc::84::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n /sbin/iscsiadm -m session -R' (cwd None) Thread-30::DEBUG::2013-04-23 21:36:06,136::misc::84::Storage.Misc.excCmd::(<lambda>) FAILED: <err> = 'iscsiadm: No session found.\n'; <rc> = 21 Thread-30::DEBUG::2013-04-23 21:36:06,136::misc::1064::SamplingMethod::(__call__) Returning last result MainProcess|Thread-30::DEBUG::2013-04-23 21:36:06,139::misc::84::Storage.Misc.excCmd::(<lambda>) '/bin/dd of=/sys/class/scsi_host/host0/scan' (cwd None) MainProcess|Thread-30::DEBUG::2013-04-23 21:36:06,142::misc::84::Storage.Misc.excCmd::(<lambda>) '/bin/dd of=/sys/class/scsi_host/host1/scan' (cwd None) MainProcess|Thread-30::DEBUG::2013-04-23 21:36:06,146::misc::84::Storage.Misc.excCmd::(<lambda>) '/bin/dd of=/sys/class/scsi_host/host2/scan' (cwd None) MainProcess|Thread-30::DEBUG::2013-04-23 21:36:06,149::iscsi::402::Storage.ISCSI::(forceIScsiScan) Performing SCSI scan, this will take up to 30 seconds Thread-30::DEBUG::2013-04-23 21:36:08,152::misc::84::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n /sbin/multipath' (cwd None) Thread-30::DEBUG::2013-04-23 21:36:08,254::misc::84::Storage.Misc.excCmd::(<lambda>) SUCCESS: <err> = ''; <rc> = 0 Thread-30::DEBUG::2013-04-23 21:36:08,256::lvm::477::OperationMutex::(_invalidateAllPvs) Operation 'lvm invalidate operation' got the operation mutex Thread-30::DEBUG::2013-04-23 21:36:08,256::lvm::479::OperationMutex::(_invalidateAllPvs) Operation 'lvm invalidate operation' released the operation mutex Thread-30::DEBUG::2013-04-23 21:36:08,257::lvm::488::OperationMutex::(_invalidateAllVgs) Operation 'lvm 
invalidate operation' got the operation mutex Thread-30::DEBUG::2013-04-23 21:36:08,257::lvm::490::OperationMutex::(_invalidateAllVgs) Operation 'lvm invalidate operation' released the operation mutex Thread-30::DEBUG::2013-04-23 21:36:08,258::lvm::508::OperationMutex::(_invalidateAllLvs) Operation 'lvm invalidate operation' got the operation mutex Thread-30::DEBUG::2013-04-23 21:36:08,258::lvm::510::OperationMutex::(_invalidateAllLvs) Operation 'lvm invalidate operation' released the operation mutex Thread-30::DEBUG::2013-04-23 21:36:08,258::misc::1064::SamplingMethod::(__call__) Returning last result Thread-30::DEBUG::2013-04-23 21:36:08,259::lvm::368::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' got the operation mutex Thread-30::DEBUG::2013-04-23 21:36:08,261::misc::84::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n /sbin/lvm vgs --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ \\"r%.*%\\" ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 } backup { retain_min = 50 retain_days = 0 } " --noheadings --units b --nosuffix --separator | -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free 774e3604-f449-4b3e-8c06-7cd16f98720c' (cwd None) Thread-30::DEBUG::2013-04-23 21:36:08,514::misc::84::Storage.Misc.excCmd::(<lambda>) FAILED: <err> = ' Volume group "774e3604-f449-4b3e-8c06-7cd16f98720c" not found\n'; <rc> = 5 Thread-30::WARNING::2013-04-23 21:36:08,516::lvm::373::Storage.LVM::(_reloadvgs) lvm vgs failed: 5 [] [' Volume group "774e3604-f449-4b3e-8c06-7cd16f98720c" not found'] Thread-30::DEBUG::2013-04-23 21:36:08,518::lvm::397::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' released the operation mutex Thread-30::DEBUG::2013-04-23 21:36:08,524::resourceManager::557::ResourceManager::(releaseResource) Trying to release resource 'Storage.0f63de0e-7d98-48ce-99ec-add109f83c4f' Thread-30::DEBUG::2013-04-23 21:36:08,525::resourceManager::573::ResourceManager::(releaseResource) Released resource 'Storage.0f63de0e-7d98-48ce-99ec-add109f83c4f' (0 active users) Thread-30::DEBUG::2013-04-23 21:36:08,525::resourceManager::578::ResourceManager::(releaseResource) Resource 'Storage.0f63de0e-7d98-48ce-99ec-add109f83c4f' is free, finding out if anyone is waiting for it. Thread-30::DEBUG::2013-04-23 21:36:08,525::resourceManager::585::ResourceManager::(releaseResource) No one is waiting for resource 'Storage.0f63de0e-7d98-48ce-99ec-add109f83c4f', Clearing records. 
Thread-30::ERROR::2013-04-23 21:36:08,526::task::833::TaskManager.Task::(_setError) Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::Unexpected error Traceback (most recent call last): File "/usr/share/vdsm/storage/task.py", line 840, in _run return fn(*args, **kargs) File "/usr/share/vdsm/logUtils.py", line 42, in wrapper res = f(*args, **kwargs) File "/usr/share/vdsm/storage/hsm.py", line 926, in connectStoragePool masterVersion, options) File "/usr/share/vdsm/storage/hsm.py", line 973, in _connectStoragePool res = pool.connect(hostID, scsiKey, msdUUID, masterVersion) File "/usr/share/vdsm/storage/sp.py", line 642, in connect self.__rebuild(msdUUID=msdUUID, masterVersion=masterVersion) File "/usr/share/vdsm/storage/sp.py", line 1166, in __rebuild self.masterDomain = self.getMasterDomain(msdUUID=msdUUID, masterVersion=masterVersion) File "/usr/share/vdsm/storage/sp.py", line 1505, in getMasterDomain raise se.StoragePoolMasterNotFound(self.spUUID, msdUUID) StoragePoolMasterNotFound: Cannot find master domain: 'spUUID=0f63de0e-7d98-48ce-99ec-add109f83c4f, msdUUID=774e3604-f449-4b3e-8c06-7cd16f98720c' Thread-30::DEBUG::2013-04-23 21:36:08,527::task::852::TaskManager.Task::(_run) Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::Task._run: f551fa3f-9d8c-4de3-895a-964c821060d4 ('0f63de0e-7d98-48ce-99ec-add109f83c4f', 1, '0f63de0e-7d98-48ce-99ec-add109f83c4f', '774e3604-f449-4b3e-8c06-7cd16f98720c', 73) {} failed - stopping task Thread-30::DEBUG::2013-04-23 21:36:08,528::task::1177::TaskManager.Task::(stop) Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::stopping in state preparing (force False) Thread-30::DEBUG::2013-04-23 21:36:08,528::task::957::TaskManager.Task::(_decref) Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::ref 1 aborting True Thread-30::INFO::2013-04-23 21:36:08,528::task::1134::TaskManager.Task::(prepare) Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::aborting: Task is aborted: 'Cannot find master domain' - code 304 Thread-30::DEBUG::2013-04-23 21:36:08,529::task::1139::TaskManager.Task::(prepare) Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::Prepare: aborted: Cannot find master domain Thread-30::DEBUG::2013-04-23 21:36:08,529::task::957::TaskManager.Task::(_decref) Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::ref 0 aborting True Thread-30::DEBUG::2013-04-23 21:36:08,529::task::892::TaskManager.Task::(_doAbort) Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::Task._doAbort: force False Thread-30::DEBUG::2013-04-23 21:36:08,530::resourceManager::864::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {} Thread-30::DEBUG::2013-04-23 21:36:08,530::task::568::TaskManager.Task::(_updateState) Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::moving from state preparing -> state aborting Thread-30::DEBUG::2013-04-23 21:36:08,530::task::523::TaskManager.Task::(__state_aborting) Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::_aborting: recover policy none Thread-30::DEBUG::2013-04-23 21:36:08,531::task::568::TaskManager.Task::(_updateState) Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::moving from state aborting -> state failed Thread-30::DEBUG::2013-04-23 21:36:08,531::resourceManager::830::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {} Thread-30::DEBUG::2013-04-23 21:36:08,531::resourceManager::864::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {} Thread-30::ERROR::2013-04-23 21:36:08,532::dispatcher::67::Storage.Dispatcher.Protect::(run) {'status': {'message': "Cannot find master domain: 'spUUID=0f63de0e-7d98-48ce-99ec-add109f83c4f, 
msdUUID=774e3604-f449-4b3e-8c06-7cd16f98720c'", 'code': 304}} [root@vmserver3 vdsm]#
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users

----- Original Message -----
From: "Yeela Kaplan" <ykaplan@redhat.com> To: "Tommy McNeely" <tommythekid@gmail.com> Cc: users@ovirt.org Sent: Thursday, April 25, 2013 10:08:56 AM Subject: Re: [Users] Master domain locked, error code 304
Hi, your problem is that the master domain is locked, so the engine does not send connectStorageServer to the vdsm host, and therefore the host does not see the master domain. You need to change the status of the master domain in the DB from locked while the host is in maintenance. This can be tricky and is not really recommended, because if you do it wrong you might corrupt the DB. Another, safer way that I recommend is to try running connectStorageServer for the master SD from vdsClient on the vdsm host and see what happens; it might solve your problem.
-- Yeela
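For reference, such a connectStorageServer call issued directly on the host with vdsClient might look roughly like the sketch below. The pool UUID and the NFS export path are taken from this thread; the NFS domain-type code (1) and the connection id are assumptions, so check vdsClient's usage output on the host before running anything:

# On the vdsm host; -s uses SSL, "0" addresses the local vdsmd.
# Arguments: <domain type> <spUUID> <connection list> -- the values below are a sketch, not confirmed for this setup.
vdsClient -s 0 connectStorageServer 1 0f63de0e-7d98-48ce-99ec-add109f83c4f \
    'connection=10.101.0.148:/c/vpt1-master,user=,password=,id=00000000-0000-0000-0000-000000000000'

If that returns status 0, connectStoragePool can be tried next to see whether the host can now reach the master domain.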
----- Original Message -----
From: "Tommy McNeely" <tommythekid@gmail.com> To: "Juan Jose" <jj197005@gmail.com> Cc: users@ovirt.org Sent: Wednesday, April 24, 2013 7:30:20 PM Subject: Re: [Users] Master domain locked, error code 304
Hi Juan,
That sounds like a possible path to follow. Our "master" domain does not have any VMs in it. If no one else responds with an official path to resolution, then I will try going into the database and hacking it like that. I think it has something to do with the version or the metadata??
[root@vmserver3 dom_md]# cat metadata
CLASS=Data
DESCRIPTION=SFOTestMaster1
IOOPTIMEOUTSEC=10
LEASERETRIES=3
LEASETIMESEC=60
LOCKPOLICY=
LOCKRENEWALINTERVALSEC=5
MASTER_VERSION=1
POOL_DESCRIPTION=SFODC01
POOL_DOMAINS=774e3604-f449-4b3e-8c06-7cd16f98720c:Active,758c0abb-ea9a-43fb-bcd9-435f75cd0baa:Active,baa42b1c-ae2e-4486-88a1-e09e1f7a59cb:Active
POOL_SPM_ID=1
POOL_SPM_LVER=4
POOL_UUID=0f63de0e-7d98-48ce-99ec-add109f83c4f
REMOTE_PATH=10.101.0.148:/c/vpt1-master
ROLE=Master
SDUUID=774e3604-f449-4b3e-8c06-7cd16f98720c
TYPE=NFS
VERSION=0
_SHA_CKSUM=fa8ef0e7cd5e50e107384a146e4bfc838d24ba08
On Wed, Apr 24, 2013 at 5:57 AM, Juan Jose < jj197005@gmail.com > wrote:
Hello Tommy,
I had a similar experience, and after trying to recover my storage domain I realized that my VMs were gone. You have to verify that your VM disks are still inside your storage domain. In my case, I had to add a new storage domain as the master domain to be able to remove the old VMs from the DB and reattach the old storage domain. I hope this is not your case; if you haven't lost your VMs, it should be possible to recover them.
Good luck,
Juanjo.
On Wed, Apr 24, 2013 at 6:43 AM, Tommy McNeely < tommythekid@gmail.com > wrote:
We had a hard crash (network, then power) on our 2 node Ovirt Cluster. We have NFS datastore on CentOS 6 (3.2.0-1.39.el6). We can no longer get the hosts to activate. They are unable to activate the "master" domain. The master storage domain show "locked" while the other storage domains show Unknown (disks) and inactive (ISO) All the domains are on the same NFS server, we are able to mount it, the permissions are good. We believe we might be getting bit by https://bugzilla.redhat.com/show_bug.cgi?id=920694 or http://gerrit.ovirt.org/#/c/13709/ which says to cease working on it:
Michael Kublin Apr 10
Patch Set 5: Do not submit
Liron, please abondon this work. This interacts with host life cycle which will be changed, during a change a following problem will be solved as well.
So, We were wondering what we can do to get our oVirt back online, or rather what the correct way is to solve this. We have a few VMs that are down which we are looking for ways to recover as quickly as possible.
Thanks in advance, Tommy
Here are the ovirt-engine logs:
2013-04-23 21:30:04,041 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (pool-3-thread-49) Command ConnectStoragePoolVDS execution failed. Exception: IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Cannot find master domain: 'spUUID=0f63de0e-7d98-48ce-99ec-add109f83c4f, msdUUID=774e3604-f449-4b3e-8c06-7cd16f98720c' 2013-04-23 21:30:04,043 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (pool-3-thread-49) FINISH, ConnectStoragePoolVDSCommand, log id: 50524b34 2013-04-23 21:30:04,049 WARN [org.ovirt.engine.core.bll.storage.ReconstructMasterDomainCommand] (pool-3-thread-49) [7c5867d6] CanDoAction of action ReconstructMasterDomain failed. Reasons:VAR__ACTION__RECONSTRUCT_MASTER,VAR__TYPE__STORAGE__DOMAIN,ACTION_TYPE_FAILED_STORAGE_DOMAIN_STATUS_ILLEGAL2,$status Locked
Hi, a domain stuck in status Locked is a bug and is not directly related to the discussed patch. No action in vdsm can help in such a situation; please do the following:
If domains are marked as Locked in the GUI, they should be unlocked in the DB. My advice is to put the host into maintenance and then run the following query: update storage_pool_iso_map set status = 0 where storage_id=... (info about the domains is located in the storage_domain_static table). Then activate a host; it should try to connect to all storages and to the pool again, reconstruct will run, and I hope it will succeed.
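In a psql session against the engine database this works out to roughly the following (a sketch only; the column names are recalled from the engine schema and worth double-checking, and the placeholder UUID must be replaced with the locked master domain's id), with the hosts in maintenance first:

engine=> select id, storage_name from storage_domain_static;
engine=> update storage_pool_iso_map set status = 0 where storage_id = '<master-domain-uuid>';

The follow-up later in this thread shows the same update run against the real master domain UUID.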
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users

Thanks Michael! That is exactly what we needed:

engine=> select * from storage_pool_iso_map;
              storage_id              |           storage_pool_id            | status | owner
--------------------------------------+--------------------------------------+--------+-------
 774e3604-f449-4b3e-8c06-7cd16f98720c | 0f63de0e-7d98-48ce-99ec-add109f83c4f |      5 |     0
 baa42b1c-ae2e-4486-88a1-e09e1f7a59cb | 0f63de0e-7d98-48ce-99ec-add109f83c4f |      0 |     0
 758c0abb-ea9a-43fb-bcd9-435f75cd0baa | 0f63de0e-7d98-48ce-99ec-add109f83c4f |      0 |     0
(3 rows)

engine=> update storage_pool_iso_map set status=0 where storage_id='774e3604-f449-4b3e-8c06-7cd16f98720c';
UPDATE 1

Now my hosts are active, and I can boot my VMs.
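For anyone repeating this fix, it may be worth re-running the select, restricted to the pool, before re-activating the hosts, to confirm that no domain is still marked with the locked status (the master showed status 5 above). A sketch of that check:

engine=> select storage_id, status from storage_pool_iso_map where storage_pool_id = '0f63de0e-7d98-48ce-99ec-add109f83c4f';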
From: "Yeela Kaplan" <ykaplan@redhat.com> To: "Tommy McNeely" <tommythekid@gmail.com> Cc: users@ovirt.org Sent: Thursday, April 25, 2013 10:08:56 AM Subject: Re: [Users] Master domain locked, error code 304
Hi, Your problem is that the master domain is locked, so the engine does not send connectStorageServer to the vdsm host, and therefore the host does not see the master domain. You need to change the status of the master domain in the db from locked while the host is in maintenance. This can be tricky and not very recommended because if you do it wrong you might corrupt the db. Another, safer, way that I recommend is try to do connectStorageServer to the masterSD from vdsClient on the vdsm host and see what happens, it might solve your problem.
-- Yeela
----- Original Message -----
From: "Tommy McNeely" <tommythekid@gmail.com> To: "Juan Jose" <jj197005@gmail.com> Cc: users@ovirt.org Sent: Wednesday, April 24, 2013 7:30:20 PM Subject: Re: [Users] Master domain locked, error code 304
Hi Juan,
That sounds like a possible path to follow. Our "master" domain does not have any VMs in it. If no one else responds with an official path to resolution, then I will try going into the database and hacking it like that. I
it has something to do with the version or the metadata??
[root@vmserver3 dom_md]# cat metadata CLASS=Data DESCRIPTION=SFOTestMaster1 IOOPTIMEOUTSEC=10 LEASERETRIES=3 LEASETIMESEC=60 LOCKPOLICY= LOCKRENEWALINTERVALSEC=5 MASTER_VERSION=1 POOL_DESCRIPTION=SFODC01
POOL_DOMAINS=774e3604-f449-4b3e-8c06-7cd16f98720c:Active,758c0abb-ea9a-43fb-bcd9-435f75cd0baa:Active,baa42b1c-ae2e-4486-88a1-e09e1f7a59cb:Active
POOL_SPM_ID=1 POOL_SPM_LVER=4 POOL_UUID=0f63de0e-7d98-48ce-99ec-add109f83c4f REMOTE_PATH=10.101.0.148:/c/vpt1-master ROLE=Master SDUUID=774e3604-f449-4b3e-8c06-7cd16f98720c TYPE=NFS VERSION=0 _SHA_CKSUM=fa8ef0e7cd5e50e107384a146e4bfc838d24ba08
On Wed, Apr 24, 2013 at 5:57 AM, Juan Jose < jj197005@gmail.com > wrote:
Hello Tommy,
I had a similar experience and after try to recover my storage domain, I realized that my VMs had missed. You have to verify if your VM disks are inside of your storage domain. In my case, I had to add a new a new Storage domain as Master domain to be able to remove the old VMs from DB and reattach the old storage domain. I hope this were not your case. If you haven't lost your VMs it's possible that you can recover them.
Good luck,
Juanjo.
On Wed, Apr 24, 2013 at 6:43 AM, Tommy McNeely < tommythekid@gmail.com> wrote:
We had a hard crash (network, then power) on our 2 node Ovirt Cluster. We have NFS datastore on CentOS 6 (3.2.0-1.39.el6). We can no longer get
----- Original Message ----- think the
hosts to activate. They are unable to activate the "master" domain. The master storage domain show "locked" while the other storage domains show Unknown (disks) and inactive (ISO) All the domains are on the same NFS server, we are able to mount it, the permissions are good. We believe we might be getting bit by https://bugzilla.redhat.com/show_bug.cgi?id=920694 or http://gerrit.ovirt.org/#/c/13709/ which says to cease working on it:
Michael Kublin Apr 10
Patch Set 5: Do not submit
Liron, please abondon this work. This interacts with host life cycle which will be changed, during a change a following problem will be solved as well.
So, We were wondering what we can do to get our oVirt back online, or rather what the correct way is to solve this. We have a few VMs that are down which we are looking for ways to recover as quickly as possible.
Thanks in advance, Tommy
Here are the ovirt-engine logs:
2013-04-23 21:30:04,041 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (pool-3-thread-49) Command ConnectStoragePoolVDS execution failed. Exception: IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Cannot find master domain: 'spUUID=0f63de0e-7d98-48ce-99ec-add109f83c4f, msdUUID=774e3604-f449-4b3e-8c06-7cd16f98720c' 2013-04-23 21:30:04,043 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand]
(pool-3-thread-49) FINISH, ConnectStoragePoolVDSCommand, log id: 50524b34 2013-04-23 21:30:04,049 WARN [org.ovirt.engine.core.bll.storage.ReconstructMasterDomainCommand] (pool-3-thread-49) [7c5867d6] CanDoAction of action ReconstructMasterDomain failed.
Reasons:VAR__ACTION__RECONSTRUCT_MASTER,VAR__TYPE__STORAGE__DOMAIN,ACTION_TYPE_FAILED_STORAGE_DOMAIN_STATUS_ILLEGAL2,$status
Locked
Hi, domain stuck in status Locked it is a bug and it is not directly related to discussed patch. No actions in vdsm can help in such situation, please do the following:
If domains are marked as Locked in GUI they should be unlocked in DB. My advice is to put host in maintainence, after that please run the following query : update storage_pool_iso_map set status = 0 where storage_id=... (Info about domains is located inside storage_domain_static table) Activate a host, after that host should try to connect to all storages and to pool again and reconstruct will run and I hope will success.
Here are the logs from vdsm:
Thread-29::DEBUG::2013-04-23 21:36:05,906::misc::84::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo
/bin/mount -t nfs -o soft,nosharecache,timeo=600,retrans=6,nfsvers=3 10.101.0.148:/c/vpt1-vmdisks1 /rhev/data-center/mnt/10.101.0.148:_c_vpt1-vmdisks1' (cwd None) Thread-29::DEBUG::2013-04-23 21:36:06,008::misc::84::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n /bin/mount -t nfs -o soft,nosharecache,timeo=600,retrans=6,nfsvers=3 10.101.0.148:/c/vpool-iso /rhev/data-center/mnt/10.101.0.148: _c_vpool-iso' (cwd None) Thread-29::INFO::2013-04-23 21:36:06,065::logUtils::44::dispatcher::(wrapper) Run and protect: connectStorageServer, Return response: {'statuslist': [{'status': 0, 'id': '7c19bd42-c3dc-41b9-b81b-d9b75214b8dc'}, {'status': 0, 'id': 'eff2ef61-0b12-4429-b087-8742be17ae90'}]} Thread-29::DEBUG::2013-04-23 21:36:06,071::task::1151::TaskManager.Task::(prepare) Task=`48337e40-2446-4357-b6dc-2c86f4da67e2`::finished: {'statuslist': [{'status': 0, 'id': '7c19bd42-c3dc-41b9-b81b-d9b75214b8dc'}, {'status': 0, 'id': 'eff2ef61-0b12-4429-b087-8742be17ae90'}]} Thread-29::DEBUG::2013-04-23 21:36:06,071::task::568::TaskManager.Task::(_updateState) Task=`48337e40-2446-4357-b6dc-2c86f4da67e2`::moving from state
-n preparing ->
state finished Thread-29::DEBUG::2013-04-23 21:36:06,071::resourceManager::830::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {} Thread-29::DEBUG::2013-04-23 21:36:06,072::resourceManager::864::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {} Thread-29::DEBUG::2013-04-23 21:36:06,072::task::957::TaskManager.Task::(_decref) Task=`48337e40-2446-4357-b6dc-2c86f4da67e2`::ref 0 aborting False Thread-30::DEBUG::2013-04-23 21:36:06,112::BindingXMLRPC::161::vds::(wrapper) [10.101.0.197] Thread-30::DEBUG::2013-04-23 21:36:06,112::task::568::TaskManager.Task::(_updateState) Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::moving from state init -> state preparing Thread-30::INFO::2013-04-23 21:36:06,113::logUtils::41::dispatcher::(wrapper) Run and protect: connectStoragePool(spUUID='0f63de0e-7d98-48ce-99ec-add109f83c4f', hostID=1, scsiKey='0f63de0e-7d98-48ce-99ec-add109f83c4f', msdUUID='774e3604-f449-4b3e-8c06-7cd16f98720c', masterVersion=73, options=None) Thread-30::DEBUG::2013-04-23 21:36:06,113::resourceManager::190::ResourceManager.Request::(__init__)
ResName=`Storage.0f63de0e-7d98-48ce-99ec-add109f83c4f`ReqID=`ee74329a-0a92-465a-be50-b8acc6d7246a`::Request
was made in '/usr/share/vdsm/storage/resourceManager.py' line '189' at '__init__' Thread-30::DEBUG::2013-04-23 21:36:06,114::resourceManager::504::ResourceManager::(registerResource) Trying to register resource 'Storage.0f63de0e-7d98-48ce-99ec-add109f83c4f' for lock type 'exclusive' Thread-30::DEBUG::2013-04-23 21:36:06,114::resourceManager::547::ResourceManager::(registerResource) Resource 'Storage.0f63de0e-7d98-48ce-99ec-add109f83c4f' is free. Now locking as 'exclusive' (1 active user) Thread-30::DEBUG::2013-04-23 21:36:06,114::resourceManager::227::ResourceManager.Request::(grant)
request Thread-30::INFO::2013-04-23 21:36:06,115::sp::625::Storage.StoragePool::(connect) Connect host #1 to the storage pool 0f63de0e-7d98-48ce-99ec-add109f83c4f with master domain: 774e3604-f449-4b3e-8c06-7cd16f98720c (ver = 73) Thread-30::DEBUG::2013-04-23 21:36:06,116::lvm::477::OperationMutex::(_invalidateAllPvs) Operation 'lvm invalidate operation' got the operation mutex Thread-30::DEBUG::2013-04-23 21:36:06,116::lvm::479::OperationMutex::(_invalidateAllPvs) Operation 'lvm invalidate operation' released the operation mutex Thread-30::DEBUG::2013-04-23 21:36:06,117::lvm::488::OperationMutex::(_invalidateAllVgs) Operation 'lvm invalidate operation' got the operation mutex Thread-30::DEBUG::2013-04-23 21:36:06,117::lvm::490::OperationMutex::(_invalidateAllVgs) Operation 'lvm invalidate operation' released the operation mutex Thread-30::DEBUG::2013-04-23 21:36:06,117::lvm::508::OperationMutex::(_invalidateAllLvs) Operation 'lvm invalidate operation' got the operation mutex Thread-30::DEBUG::2013-04-23 21:36:06,118::lvm::510::OperationMutex::(_invalidateAllLvs) Operation 'lvm invalidate operation' released the operation mutex Thread-30::DEBUG::2013-04-23 21:36:06,118::misc::1054::SamplingMethod::(__call__) Trying to enter sampling method (storage.sdc.refreshStorage) Thread-30::DEBUG::2013-04-23 21:36:06,118::misc::1056::SamplingMethod::(__call__) Got in to sampling method Thread-30::DEBUG::2013-04-23 21:36:06,119::misc::1054::SamplingMethod::(__call__) Trying to enter sampling method (storage.iscsi.rescan) Thread-30::DEBUG::2013-04-23 21:36:06,119::misc::1056::SamplingMethod::(__call__) Got in to sampling method Thread-30::DEBUG::2013-04-23 21:36:06,119::misc::84::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n /sbin/iscsiadm -m session -R' (cwd None) Thread-30::DEBUG::2013-04-23 21:36:06,136::misc::84::Storage.Misc.excCmd::(<lambda>) FAILED: <err> = 'iscsiadm: No session found.\n'; <rc> = 21 Thread-30::DEBUG::2013-04-23 21:36:06,136::misc::1064::SamplingMethod::(__call__) Returning last result MainProcess|Thread-30::DEBUG::2013-04-23 21:36:06,139::misc::84::Storage.Misc.excCmd::(<lambda>) '/bin/dd of=/sys/class/scsi_host/host0/scan' (cwd None) MainProcess|Thread-30::DEBUG::2013-04-23 21:36:06,142::misc::84::Storage.Misc.excCmd::(<lambda>) '/bin/dd of=/sys/class/scsi_host/host1/scan' (cwd None) MainProcess|Thread-30::DEBUG::2013-04-23 21:36:06,146::misc::84::Storage.Misc.excCmd::(<lambda>) '/bin/dd of=/sys/class/scsi_host/host2/scan' (cwd None) MainProcess|Thread-30::DEBUG::2013-04-23 21:36:06,149::iscsi::402::Storage.ISCSI::(forceIScsiScan) Performing SCSI scan, this will take up to 30 seconds Thread-30::DEBUG::2013-04-23 21:36:08,152::misc::84::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n /sbin/multipath' (cwd None) Thread-30::DEBUG::2013-04-23 21:36:08,254::misc::84::Storage.Misc.excCmd::(<lambda>) SUCCESS: <err> = ''; <rc> = 0 Thread-30::DEBUG::2013-04-23 21:36:08,256::lvm::477::OperationMutex::(_invalidateAllPvs) Operation 'lvm invalidate operation' got the operation mutex Thread-30::DEBUG::2013-04-23 21:36:08,256::lvm::479::OperationMutex::(_invalidateAllPvs) Operation 'lvm invalidate operation' released the operation mutex Thread-30::DEBUG::2013-04-23 21:36:08,257::lvm::488::OperationMutex::(_invalidateAllVgs) Operation 'lvm invalidate operation' got the operation mutex Thread-30::DEBUG::2013-04-23 21:36:08,257::lvm::490::OperationMutex::(_invalidateAllVgs) Operation 'lvm invalidate operation' released the operation mutex Thread-30::DEBUG::2013-04-23 
21:36:08,258::lvm::508::OperationMutex::(_invalidateAllLvs) Operation 'lvm invalidate operation' got the operation mutex Thread-30::DEBUG::2013-04-23 21:36:08,258::lvm::510::OperationMutex::(_invalidateAllLvs) Operation 'lvm invalidate operation' released the operation mutex Thread-30::DEBUG::2013-04-23 21:36:08,258::misc::1064::SamplingMethod::(__call__) Returning last result Thread-30::DEBUG::2013-04-23 21:36:08,259::lvm::368::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' got the operation mutex Thread-30::DEBUG::2013-04-23 21:36:08,261::misc::84::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n /sbin/lvm vgs --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ \\"r%.*%\\" ] } global { locking_type=1
ResName=`Storage.0f63de0e-7d98-48ce-99ec-add109f83c4f`ReqID=`ee74329a-0a92-465a-be50-b8acc6d7246a`::Granted prioritise_write_locks=1
wait_for_locks=1 } backup { retain_min = 50 retain_days = 0 } " --noheadings --units b --nosuffix --separator | -o
774e3604-f449-4b3e-8c06-7cd16f98720c' (cwd None) Thread-30::DEBUG::2013-04-23 21:36:08,514::misc::84::Storage.Misc.excCmd::(<lambda>) FAILED: <err> = ' Volume group "774e3604-f449-4b3e-8c06-7cd16f98720c" not found\n'; <rc> = 5 Thread-30::WARNING::2013-04-23 21:36:08,516::lvm::373::Storage.LVM::(_reloadvgs) lvm vgs failed: 5 [] [' Volume group "774e3604-f449-4b3e-8c06-7cd16f98720c" not found'] Thread-30::DEBUG::2013-04-23 21:36:08,518::lvm::397::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' released the operation mutex Thread-30::DEBUG::2013-04-23 21:36:08,524::resourceManager::557::ResourceManager::(releaseResource) Trying to release resource 'Storage.0f63de0e-7d98-48ce-99ec-add109f83c4f' Thread-30::DEBUG::2013-04-23 21:36:08,525::resourceManager::573::ResourceManager::(releaseResource) Released resource 'Storage.0f63de0e-7d98-48ce-99ec-add109f83c4f' (0 active users) Thread-30::DEBUG::2013-04-23 21:36:08,525::resourceManager::578::ResourceManager::(releaseResource) Resource 'Storage.0f63de0e-7d98-48ce-99ec-add109f83c4f' is free, finding out if anyone is waiting for it. Thread-30::DEBUG::2013-04-23 21:36:08,525::resourceManager::585::ResourceManager::(releaseResource) No one is waiting for resource 'Storage.0f63de0e-7d98-48ce-99ec-add109f83c4f', Clearing records. Thread-30::ERROR::2013-04-23 21:36:08,526::task::833::TaskManager.Task::(_setError) Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::Unexpected error Traceback (most recent call last): File "/usr/share/vdsm/storage/task.py", line 840, in _run return fn(*args, **kargs) File "/usr/share/vdsm/logUtils.py", line 42, in wrapper res = f(*args, **kwargs) File "/usr/share/vdsm/storage/hsm.py", line 926, in connectStoragePool masterVersion, options) File "/usr/share/vdsm/storage/hsm.py", line 973, in _connectStoragePool res = pool.connect(hostID, scsiKey, msdUUID, masterVersion) File "/usr/share/vdsm/storage/sp.py", line 642, in connect self.__rebuild(msdUUID=msdUUID, masterVersion=masterVersion) File "/usr/share/vdsm/storage/sp.py", line 1166, in __rebuild self.masterDomain = self.getMasterDomain(msdUUID=msdUUID, masterVersion=masterVersion) File "/usr/share/vdsm/storage/sp.py", line 1505, in getMasterDomain raise se.StoragePoolMasterNotFound(self.spUUID, msdUUID) StoragePoolMasterNotFound: Cannot find master domain: 'spUUID=0f63de0e-7d98-48ce-99ec-add109f83c4f, msdUUID=774e3604-f449-4b3e-8c06-7cd16f98720c' Thread-30::DEBUG::2013-04-23 21:36:08,527::task::852::TaskManager.Task::(_run) Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::Task._run: f551fa3f-9d8c-4de3-895a-964c821060d4 ('0f63de0e-7d98-48ce-99ec-add109f83c4f', 1, '0f63de0e-7d98-48ce-99ec-add109f83c4f', '774e3604-f449-4b3e-8c06-7cd16f98720c', 73) {} failed - stopping task Thread-30::DEBUG::2013-04-23 21:36:08,528::task::1177::TaskManager.Task::(stop) Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::stopping in state
(force False) Thread-30::DEBUG::2013-04-23 21:36:08,528::task::957::TaskManager.Task::(_decref) Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::ref 1 aborting True Thread-30::INFO::2013-04-23 21:36:08,528::task::1134::TaskManager.Task::(prepare) Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::aborting: Task is aborted: 'Cannot find master domain' - code 304 Thread-30::DEBUG::2013-04-23 21:36:08,529::task::1139::TaskManager.Task::(prepare) Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::Prepare: aborted: Cannot find master domain Thread-30::DEBUG::2013-04-23 21:36:08,529::task::957::TaskManager.Task::(_decref) Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::ref 0 aborting True Thread-30::DEBUG::2013-04-23 21:36:08,529::task::892::TaskManager.Task::(_doAbort) Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::Task._doAbort: force False Thread-30::DEBUG::2013-04-23 21:36:08,530::resourceManager::864::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {} Thread-30::DEBUG::2013-04-23 21:36:08,530::task::568::TaskManager.Task::(_updateState) Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::moving from state
uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free preparing preparing ->
state aborting Thread-30::DEBUG::2013-04-23 21:36:08,530::task::523::TaskManager.Task::(__state_aborting) Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::_aborting: recover policy none Thread-30::DEBUG::2013-04-23 21:36:08,531::task::568::TaskManager.Task::(_updateState) Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::moving from state aborting -> state failed Thread-30::DEBUG::2013-04-23 21:36:08,531::resourceManager::830::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {} Thread-30::DEBUG::2013-04-23 21:36:08,531::resourceManager::864::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {} Thread-30::ERROR::2013-04-23 21:36:08,532::dispatcher::67::Storage.Dispatcher.Protect::(run) {'status': {'message': "Cannot find master domain: 'spUUID=0f63de0e-7d98-48ce-99ec-add109f83c4f, msdUUID=774e3604-f449-4b3e-8c06-7cd16f98720c'", 'code': 304}} [root@vmserver3 vdsm]#
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
participants (4)
- Juan Jose
- Michael Kublin
- Tommy McNeely
- Yeela Kaplan