[Users] Master domain locked, error code 304

Tommy McNeely tommythekid at gmail.com
Wed Apr 24 12:30:20 EDT 2013


Hi Juan,

That sounds like a possible path to follow. Our "master" domain does not
have any VMs in it. If no one else responds with an official path to
resolution, I will try going into the database and changing it by hand as
you describe. I suspect the problem has something to do with the master
version or the domain metadata. Here is the metadata file from the master
domain:

[root at vmserver3 dom_md]# cat metadata
CLASS=Data
DESCRIPTION=SFOTestMaster1
IOOPTIMEOUTSEC=10
LEASERETRIES=3
LEASETIMESEC=60
LOCKPOLICY=
LOCKRENEWALINTERVALSEC=5
MASTER_VERSION=1
POOL_DESCRIPTION=SFODC01
POOL_DOMAINS=774e3604-f449-4b3e-8c06-7cd16f98720c:Active,758c0abb-ea9a-43fb-bcd9-435f75cd0baa:Active,baa42b1c-ae2e-4486-88a1-e09e1f7a59cb:Active
POOL_SPM_ID=1
POOL_SPM_LVER=4
POOL_UUID=0f63de0e-7d98-48ce-99ec-add109f83c4f
REMOTE_PATH=10.101.0.148:/c/vpt1-master
ROLE=Master
SDUUID=774e3604-f449-4b3e-8c06-7cd16f98720c
TYPE=NFS
VERSION=0
_SHA_CKSUM=fa8ef0e7cd5e50e107384a146e4bfc838d24ba08
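
One thing that stands out: the metadata above has MASTER_VERSION=1, while the
vdsm log quoted below shows the engine calling connectStoragePool with
masterVersion=73. I do not know whether that mismatch is the actual cause, but
it is easy to check for. Below is a minimal sketch, not an official tool, that
compares the two values. The function names (parse_metadata,
requested_master_version) are made up for illustration, and the sample strings
are cut down from the output in this mail.

```python
import re


def parse_metadata(text):
    """Parse KEY=VALUE lines (as in dom_md/metadata) into a dict."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, _, value = line.partition("=")
        result[key] = value
    return result


def requested_master_version(log_text):
    """Return the masterVersion passed to connectStoragePool, or None."""
    # The pasted log is hard-wrapped by the mail client, so collapse
    # all whitespace (including newlines) before searching.
    flat = " ".join(log_text.split())
    match = re.search(r"connectStoragePool\(.*?masterVersion=(\d+)", flat)
    return int(match.group(1)) if match else None


# Sample data trimmed from the metadata file and vdsm log in this thread.
metadata_text = """\
MASTER_VERSION=1
ROLE=Master
SDUUID=774e3604-f449-4b3e-8c06-7cd16f98720c
"""

log_text = """\
Run and protect:
connectStoragePool(spUUID='0f63de0e-7d98-48ce-99ec-add109f83c4f', hostID=1,
scsiKey='0f63de0e-7d98-48ce-99ec-add109f83c4f',
msdUUID='774e3604-f449-4b3e-8c06-7cd16f98720c', masterVersion=73,
options=None)
"""

on_disk = int(parse_metadata(metadata_text)["MASTER_VERSION"])
requested = requested_master_version(log_text)
print("on disk:", on_disk, "requested:", requested, "match:", on_disk == requested)
```

With the values from this thread it reports 1 on disk versus 73 requested. To
run it against real files, read the metadata file and vdsm.log into the two
strings instead of using the inline samples.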


On Wed, Apr 24, 2013 at 5:57 AM, Juan Jose <jj197005 at gmail.com> wrote:

> Hello Tommy,
>
> I had a similar experience, and after trying to recover my storage domain I
> realized that my VMs were missing. You have to verify whether your VM disks
> are still inside your storage domain. In my case, I had to add a new storage
> domain as the master domain to be able to remove the old VMs from the DB and
> reattach the old storage domain. I hope this is not your case. If you
> haven't lost your VMs, it's possible that you can recover them.
>
> Good luck,
>
> Juanjo.
>
>
> On Wed, Apr 24, 2013 at 6:43 AM, Tommy McNeely <tommythekid at gmail.com> wrote:
>
>>
>> We had a hard crash (network, then power) on our 2-node oVirt cluster. We
>> have an NFS datastore on CentOS 6 (3.2.0-1.39.el6). We can no longer get the
>> hosts to activate. They are unable to activate the "master" domain. The
>> master storage domain shows "locked" while the other storage domains show
>> Unknown (disks) and inactive (ISO). All the domains are on the same NFS
>> server; we are able to mount it, and the permissions are good. We believe we
>> might be getting bitten by
>> https://bugzilla.redhat.com/show_bug.cgi?id=920694 or
>> http://gerrit.ovirt.org/#/c/13709/ where a review comment says to cease
>> working on the patch:
>>
>> Michael Kublin Apr 10
>>
>> Patch Set 5: Do not submit
>>
>> Liron, please abandon this work. This interacts with host life cycle
>> which will be changed, during a change a following problem will be solved
>> as well.
>>
>>
>> So, we were wondering what we can do to get our oVirt back online, or
>> rather what the correct way is to solve this. We have a few VMs that are
>> down which we are looking for ways to recover as quickly as possible.
>>
>> Thanks in advance,
>> Tommy
>>
>> Here are the ovirt-engine logs:
>>
>> 2013-04-23 21:30:04,041 ERROR
>> [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (pool-3-thread-49) Command
>> ConnectStoragePoolVDS execution failed. Exception:
>> IRSNoMasterDomainException: IRSGenericException: IRSErrorException:
>> IRSNoMasterDomainException: Cannot find master domain:
>> 'spUUID=0f63de0e-7d98-48ce-99ec-add109f83c4f,
>> msdUUID=774e3604-f449-4b3e-8c06-7cd16f98720c'
>> 2013-04-23 21:30:04,043 INFO
>>  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand]
>> (pool-3-thread-49) FINISH, ConnectStoragePoolVDSCommand, log id: 50524b34
>> 2013-04-23 21:30:04,049 WARN
>>  [org.ovirt.engine.core.bll.storage.ReconstructMasterDomainCommand]
>> (pool-3-thread-49) [7c5867d6] CanDoAction of action ReconstructMasterDomain
>> failed.
>> Reasons:VAR__ACTION__RECONSTRUCT_MASTER,VAR__TYPE__STORAGE__DOMAIN,ACTION_TYPE_FAILED_STORAGE_DOMAIN_STATUS_ILLEGAL2,$status
>> Locked
>>
>>
>>
>> Here are the logs from vdsm:
>>
>> Thread-29::DEBUG::2013-04-23
>> 21:36:05,906::misc::84::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n
>> /bin/mount -t nfs -o soft,nosharecache,timeo=600,retrans=6,nfsvers=3
>> 10.101.0.148:/c/vpt1-vmdisks1 /rhev/data-center/mnt/10.101.0.148:_c_vpt1-vmdisks1'
>> (cwd None)
>> Thread-29::DEBUG::2013-04-23
>> 21:36:06,008::misc::84::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n
>> /bin/mount -t nfs -o soft,nosharecache,timeo=600,retrans=6,nfsvers=3
>> 10.101.0.148:/c/vpool-iso /rhev/data-center/mnt/10.101.0.148:_c_vpool-iso'
>> (cwd None)
>> Thread-29::INFO::2013-04-23
>> 21:36:06,065::logUtils::44::dispatcher::(wrapper) Run and protect:
>> connectStorageServer, Return response: {'statuslist': [{'status': 0, 'id':
>> '7c19bd42-c3dc-41b9-b81b-d9b75214b8dc'}, {'status': 0, 'id':
>> 'eff2ef61-0b12-4429-b087-8742be17ae90'}]}
>> Thread-29::DEBUG::2013-04-23
>> 21:36:06,071::task::1151::TaskManager.Task::(prepare)
>> Task=`48337e40-2446-4357-b6dc-2c86f4da67e2`::finished: {'statuslist':
>> [{'status': 0, 'id': '7c19bd42-c3dc-41b9-b81b-d9b75214b8dc'}, {'status': 0,
>> 'id': 'eff2ef61-0b12-4429-b087-8742be17ae90'}]}
>> Thread-29::DEBUG::2013-04-23
>> 21:36:06,071::task::568::TaskManager.Task::(_updateState)
>> Task=`48337e40-2446-4357-b6dc-2c86f4da67e2`::moving from state preparing ->
>> state finished
>> Thread-29::DEBUG::2013-04-23
>> 21:36:06,071::resourceManager::830::ResourceManager.Owner::(releaseAll)
>> Owner.releaseAll requests {} resources {}
>> Thread-29::DEBUG::2013-04-23
>> 21:36:06,072::resourceManager::864::ResourceManager.Owner::(cancelAll)
>> Owner.cancelAll requests {}
>> Thread-29::DEBUG::2013-04-23
>> 21:36:06,072::task::957::TaskManager.Task::(_decref)
>> Task=`48337e40-2446-4357-b6dc-2c86f4da67e2`::ref 0 aborting False
>> Thread-30::DEBUG::2013-04-23
>> 21:36:06,112::BindingXMLRPC::161::vds::(wrapper) [10.101.0.197]
>> Thread-30::DEBUG::2013-04-23
>> 21:36:06,112::task::568::TaskManager.Task::(_updateState)
>> Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::moving from state init ->
>> state preparing
>> Thread-30::INFO::2013-04-23
>> 21:36:06,113::logUtils::41::dispatcher::(wrapper) Run and protect:
>> connectStoragePool(spUUID='0f63de0e-7d98-48ce-99ec-add109f83c4f', hostID=1,
>> scsiKey='0f63de0e-7d98-48ce-99ec-add109f83c4f',
>> msdUUID='774e3604-f449-4b3e-8c06-7cd16f98720c', masterVersion=73,
>> options=None)
>> Thread-30::DEBUG::2013-04-23
>> 21:36:06,113::resourceManager::190::ResourceManager.Request::(__init__)
>> ResName=`Storage.0f63de0e-7d98-48ce-99ec-add109f83c4f`ReqID=`ee74329a-0a92-465a-be50-b8acc6d7246a`::Request
>> was made in '/usr/share/vdsm/storage/resourceManager.py' line '189' at
>> '__init__'
>> Thread-30::DEBUG::2013-04-23
>> 21:36:06,114::resourceManager::504::ResourceManager::(registerResource)
>> Trying to register resource 'Storage.0f63de0e-7d98-48ce-99ec-add109f83c4f'
>> for lock type 'exclusive'
>> Thread-30::DEBUG::2013-04-23
>> 21:36:06,114::resourceManager::547::ResourceManager::(registerResource)
>> Resource 'Storage.0f63de0e-7d98-48ce-99ec-add109f83c4f' is free. Now
>> locking as 'exclusive' (1 active user)
>> Thread-30::DEBUG::2013-04-23
>> 21:36:06,114::resourceManager::227::ResourceManager.Request::(grant)
>> ResName=`Storage.0f63de0e-7d98-48ce-99ec-add109f83c4f`ReqID=`ee74329a-0a92-465a-be50-b8acc6d7246a`::Granted
>> request
>> Thread-30::INFO::2013-04-23
>> 21:36:06,115::sp::625::Storage.StoragePool::(connect) Connect host #1 to
>> the storage pool 0f63de0e-7d98-48ce-99ec-add109f83c4f with master domain:
>> 774e3604-f449-4b3e-8c06-7cd16f98720c (ver = 73)
>> Thread-30::DEBUG::2013-04-23
>> 21:36:06,116::lvm::477::OperationMutex::(_invalidateAllPvs) Operation 'lvm
>> invalidate operation' got the operation mutex
>> Thread-30::DEBUG::2013-04-23
>> 21:36:06,116::lvm::479::OperationMutex::(_invalidateAllPvs) Operation 'lvm
>> invalidate operation' released the operation mutex
>> Thread-30::DEBUG::2013-04-23
>> 21:36:06,117::lvm::488::OperationMutex::(_invalidateAllVgs) Operation 'lvm
>> invalidate operation' got the operation mutex
>> Thread-30::DEBUG::2013-04-23
>> 21:36:06,117::lvm::490::OperationMutex::(_invalidateAllVgs) Operation 'lvm
>> invalidate operation' released the operation mutex
>> Thread-30::DEBUG::2013-04-23
>> 21:36:06,117::lvm::508::OperationMutex::(_invalidateAllLvs) Operation 'lvm
>> invalidate operation' got the operation mutex
>> Thread-30::DEBUG::2013-04-23
>> 21:36:06,118::lvm::510::OperationMutex::(_invalidateAllLvs) Operation 'lvm
>> invalidate operation' released the operation mutex
>> Thread-30::DEBUG::2013-04-23
>> 21:36:06,118::misc::1054::SamplingMethod::(__call__) Trying to enter
>> sampling method (storage.sdc.refreshStorage)
>> Thread-30::DEBUG::2013-04-23
>> 21:36:06,118::misc::1056::SamplingMethod::(__call__) Got in to sampling
>> method
>> Thread-30::DEBUG::2013-04-23
>> 21:36:06,119::misc::1054::SamplingMethod::(__call__) Trying to enter
>> sampling method (storage.iscsi.rescan)
>> Thread-30::DEBUG::2013-04-23
>> 21:36:06,119::misc::1056::SamplingMethod::(__call__) Got in to sampling
>> method
>> Thread-30::DEBUG::2013-04-23
>> 21:36:06,119::misc::84::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n
>> /sbin/iscsiadm -m session -R' (cwd None)
>> Thread-30::DEBUG::2013-04-23
>> 21:36:06,136::misc::84::Storage.Misc.excCmd::(<lambda>) FAILED: <err> =
>> 'iscsiadm: No session found.\n'; <rc> = 21
>> Thread-30::DEBUG::2013-04-23
>> 21:36:06,136::misc::1064::SamplingMethod::(__call__) Returning last result
>> MainProcess|Thread-30::DEBUG::2013-04-23
>> 21:36:06,139::misc::84::Storage.Misc.excCmd::(<lambda>) '/bin/dd
>> of=/sys/class/scsi_host/host0/scan' (cwd None)
>> MainProcess|Thread-30::DEBUG::2013-04-23
>> 21:36:06,142::misc::84::Storage.Misc.excCmd::(<lambda>) '/bin/dd
>> of=/sys/class/scsi_host/host1/scan' (cwd None)
>>  MainProcess|Thread-30::DEBUG::2013-04-23
>> 21:36:06,146::misc::84::Storage.Misc.excCmd::(<lambda>) '/bin/dd
>> of=/sys/class/scsi_host/host2/scan' (cwd None)
>> MainProcess|Thread-30::DEBUG::2013-04-23
>> 21:36:06,149::iscsi::402::Storage.ISCSI::(forceIScsiScan) Performing SCSI
>> scan, this will take up to 30 seconds
>> Thread-30::DEBUG::2013-04-23
>> 21:36:08,152::misc::84::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n
>> /sbin/multipath' (cwd None)
>> Thread-30::DEBUG::2013-04-23
>> 21:36:08,254::misc::84::Storage.Misc.excCmd::(<lambda>) SUCCESS: <err> =
>> ''; <rc> = 0
>> Thread-30::DEBUG::2013-04-23
>> 21:36:08,256::lvm::477::OperationMutex::(_invalidateAllPvs) Operation 'lvm
>> invalidate operation' got the operation mutex
>> Thread-30::DEBUG::2013-04-23
>> 21:36:08,256::lvm::479::OperationMutex::(_invalidateAllPvs) Operation 'lvm
>> invalidate operation' released the operation mutex
>> Thread-30::DEBUG::2013-04-23
>> 21:36:08,257::lvm::488::OperationMutex::(_invalidateAllVgs) Operation 'lvm
>> invalidate operation' got the operation mutex
>> Thread-30::DEBUG::2013-04-23
>> 21:36:08,257::lvm::490::OperationMutex::(_invalidateAllVgs) Operation 'lvm
>> invalidate operation' released the operation mutex
>> Thread-30::DEBUG::2013-04-23
>> 21:36:08,258::lvm::508::OperationMutex::(_invalidateAllLvs) Operation 'lvm
>> invalidate operation' got the operation mutex
>> Thread-30::DEBUG::2013-04-23
>> 21:36:08,258::lvm::510::OperationMutex::(_invalidateAllLvs) Operation 'lvm
>> invalidate operation' released the operation mutex
>> Thread-30::DEBUG::2013-04-23
>> 21:36:08,258::misc::1064::SamplingMethod::(__call__) Returning last result
>> Thread-30::DEBUG::2013-04-23
>> 21:36:08,259::lvm::368::OperationMutex::(_reloadvgs) Operation 'lvm reload
>> operation' got the operation mutex
>> Thread-30::DEBUG::2013-04-23
>> 21:36:08,261::misc::84::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n
>> /sbin/lvm vgs --config " devices { preferred_names = [\\"^/dev/mapper/\\"]
>> ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3
>> filter = [ \\"r%.*%\\" ] }  global {  locking_type=1
>>  prioritise_write_locks=1  wait_for_locks=1 }  backup {  retain_min = 50
>>  retain_days = 0 } " --noheadings --units b --nosuffix --separator | -o
>> uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free
>> 774e3604-f449-4b3e-8c06-7cd16f98720c' (cwd None)
>> Thread-30::DEBUG::2013-04-23
>> 21:36:08,514::misc::84::Storage.Misc.excCmd::(<lambda>) FAILED: <err> = '
>>  Volume group "774e3604-f449-4b3e-8c06-7cd16f98720c" not found\n'; <rc> = 5
>> Thread-30::WARNING::2013-04-23
>> 21:36:08,516::lvm::373::Storage.LVM::(_reloadvgs) lvm vgs failed: 5 [] ['
>>  Volume group "774e3604-f449-4b3e-8c06-7cd16f98720c" not found']
>> Thread-30::DEBUG::2013-04-23
>> 21:36:08,518::lvm::397::OperationMutex::(_reloadvgs) Operation 'lvm reload
>> operation' released the operation mutex
>> Thread-30::DEBUG::2013-04-23
>> 21:36:08,524::resourceManager::557::ResourceManager::(releaseResource)
>> Trying to release resource 'Storage.0f63de0e-7d98-48ce-99ec-add109f83c4f'
>> Thread-30::DEBUG::2013-04-23
>> 21:36:08,525::resourceManager::573::ResourceManager::(releaseResource)
>> Released resource 'Storage.0f63de0e-7d98-48ce-99ec-add109f83c4f' (0 active
>> users)
>> Thread-30::DEBUG::2013-04-23
>> 21:36:08,525::resourceManager::578::ResourceManager::(releaseResource)
>> Resource 'Storage.0f63de0e-7d98-48ce-99ec-add109f83c4f' is free, finding
>> out if anyone is waiting for it.
>> Thread-30::DEBUG::2013-04-23
>> 21:36:08,525::resourceManager::585::ResourceManager::(releaseResource) No
>> one is waiting for resource 'Storage.0f63de0e-7d98-48ce-99ec-add109f83c4f',
>> Clearing records.
>> Thread-30::ERROR::2013-04-23
>> 21:36:08,526::task::833::TaskManager.Task::(_setError)
>> Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::Unexpected error
>> Traceback (most recent call last):
>>   File "/usr/share/vdsm/storage/task.py", line 840, in _run
>>     return fn(*args, **kargs)
>>   File "/usr/share/vdsm/logUtils.py", line 42, in wrapper
>>     res = f(*args, **kwargs)
>>   File "/usr/share/vdsm/storage/hsm.py", line 926, in connectStoragePool
>>     masterVersion, options)
>>   File "/usr/share/vdsm/storage/hsm.py", line 973, in _connectStoragePool
>>     res = pool.connect(hostID, scsiKey, msdUUID, masterVersion)
>>   File "/usr/share/vdsm/storage/sp.py", line 642, in connect
>>     self.__rebuild(msdUUID=msdUUID, masterVersion=masterVersion)
>>   File "/usr/share/vdsm/storage/sp.py", line 1166, in __rebuild
>>     self.masterDomain = self.getMasterDomain(msdUUID=msdUUID,
>> masterVersion=masterVersion)
>>   File "/usr/share/vdsm/storage/sp.py", line 1505, in getMasterDomain
>>     raise se.StoragePoolMasterNotFound(self.spUUID, msdUUID)
>> StoragePoolMasterNotFound: Cannot find master domain:
>> 'spUUID=0f63de0e-7d98-48ce-99ec-add109f83c4f,
>> msdUUID=774e3604-f449-4b3e-8c06-7cd16f98720c'
>> Thread-30::DEBUG::2013-04-23
>> 21:36:08,527::task::852::TaskManager.Task::(_run)
>> Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::Task._run:
>> f551fa3f-9d8c-4de3-895a-964c821060d4
>> ('0f63de0e-7d98-48ce-99ec-add109f83c4f', 1,
>> '0f63de0e-7d98-48ce-99ec-add109f83c4f',
>> '774e3604-f449-4b3e-8c06-7cd16f98720c', 73) {} failed - stopping task
>> Thread-30::DEBUG::2013-04-23
>> 21:36:08,528::task::1177::TaskManager.Task::(stop)
>> Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::stopping in state preparing
>> (force False)
>>  Thread-30::DEBUG::2013-04-23
>> 21:36:08,528::task::957::TaskManager.Task::(_decref)
>> Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::ref 1 aborting True
>> Thread-30::INFO::2013-04-23
>> 21:36:08,528::task::1134::TaskManager.Task::(prepare)
>> Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::aborting: Task is aborted:
>> 'Cannot find master domain' - code 304
>> Thread-30::DEBUG::2013-04-23
>> 21:36:08,529::task::1139::TaskManager.Task::(prepare)
>> Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::Prepare: aborted: Cannot find
>> master domain
>> Thread-30::DEBUG::2013-04-23
>> 21:36:08,529::task::957::TaskManager.Task::(_decref)
>> Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::ref 0 aborting True
>> Thread-30::DEBUG::2013-04-23
>> 21:36:08,529::task::892::TaskManager.Task::(_doAbort)
>> Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::Task._doAbort: force False
>> Thread-30::DEBUG::2013-04-23
>> 21:36:08,530::resourceManager::864::ResourceManager.Owner::(cancelAll)
>> Owner.cancelAll requests {}
>> Thread-30::DEBUG::2013-04-23
>> 21:36:08,530::task::568::TaskManager.Task::(_updateState)
>> Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::moving from state preparing ->
>> state aborting
>> Thread-30::DEBUG::2013-04-23
>> 21:36:08,530::task::523::TaskManager.Task::(__state_aborting)
>> Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::_aborting: recover policy none
>>  Thread-30::DEBUG::2013-04-23
>> 21:36:08,531::task::568::TaskManager.Task::(_updateState)
>> Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::moving from state aborting ->
>> state failed
>> Thread-30::DEBUG::2013-04-23
>> 21:36:08,531::resourceManager::830::ResourceManager.Owner::(releaseAll)
>> Owner.releaseAll requests {} resources {}
>> Thread-30::DEBUG::2013-04-23
>> 21:36:08,531::resourceManager::864::ResourceManager.Owner::(cancelAll)
>> Owner.cancelAll requests {}
>> Thread-30::ERROR::2013-04-23
>> 21:36:08,532::dispatcher::67::Storage.Dispatcher.Protect::(run) {'status':
>> {'message': "Cannot find master domain:
>> 'spUUID=0f63de0e-7d98-48ce-99ec-add109f83c4f,
>> msdUUID=774e3604-f449-4b3e-8c06-7cd16f98720c'", 'code': 304}}
>> [root at vmserver3 vdsm]#