Hi,
Your problem is that the master domain is locked, so the engine does not send
connectStorageServer to the VDSM host, and therefore the host does not see the
master domain.
One option is to change the status of the master domain in the DB from locked
while the host is in maintenance. This can be tricky and is not recommended,
because if you get it wrong you might corrupt the DB.
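If you do attempt the database route anyway, here is a minimal sketch of the kind of query involved. Everything here is an assumption to be verified against your own engine version: the database name ("engine"), the table name (storage_pool_iso_map), and especially the numeric status codes, which come from the engine's StorageDomainStatus enum. Stop ovirt-engine and take a pg_dump backup first.

```shell
# Hedged sketch only -- stop ovirt-engine and back up the DB before touching it.
# Table/DB names are assumptions from the engine schema; the numeric status
# codes come from the StorageDomainStatus enum, so look them up in your
# version's source before writing anything.
su - postgres -c "psql engine -c \"SELECT storage_id, storage_pool_id, status \
  FROM storage_pool_iso_map \
  WHERE storage_id = '774e3604-f449-4b3e-8c06-7cd16f98720c';\""

# Then, only once you have confirmed the correct numeric code for the target
# status in your version:
# su - postgres -c "psql engine -c \"UPDATE storage_pool_iso_map \
#   SET status = <code> \
#   WHERE storage_id = '774e3604-f449-4b3e-8c06-7cd16f98720c';\""
```

The point of the SELECT-before-UPDATE shape is to see the current stored value before changing anything.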
Another, safer way, which I recommend, is to run connectStorageServer for the
master SD with vdsClient on the VDSM host and see what happens; it might solve
your problem.
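Something along these lines, run on the VDSM host itself. The argument shape is from memory, so confirm it with "vdsClient -s 0 help connectStorageServer" first; the serverType value 1 is assumed to mean NFS, the id= value is a made-up placeholder, and the spUUID and export path are taken from the metadata quoted below. Drop "-s" if your VDSM is not configured for SSL.

```shell
# Hedged sketch -- verify the exact argument list with:
#   vdsClient -s 0 help connectStorageServer
# serverType 1 is assumed to be NFS; the id= UUID is a placeholder.
vdsClient -s 0 connectStorageServer 1 0f63de0e-7d98-48ce-99ec-add109f83c4f \
    'id=00000000-0000-0000-0000-000000000000,connection=10.101.0.148:/c/vpt1-master'
```

If that returns status 0, the host can reach the master domain's export and the problem is on the engine side.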
--
Yeela
----- Original Message -----
From: "Tommy McNeely" <tommythekid(a)gmail.com>
To: "Juan Jose" <jj197005(a)gmail.com>
Cc: users(a)ovirt.org
Sent: Wednesday, April 24, 2013 7:30:20 PM
Subject: Re: [Users] Master domain locked, error code 304
Hi Juan,
That sounds like a possible path to follow. Our "master" domain does not have
any VMs in it. If no one else responds with an official path to resolution,
then I will try going into the database and changing it by hand. I suspect it
has something to do with the version or the metadata:
[root@vmserver3 dom_md]# cat metadata
CLASS=Data
DESCRIPTION=SFOTestMaster1
IOOPTIMEOUTSEC=10
LEASERETRIES=3
LEASETIMESEC=60
LOCKPOLICY=
LOCKRENEWALINTERVALSEC=5
MASTER_VERSION=1
POOL_DESCRIPTION=SFODC01
POOL_DOMAINS=774e3604-f449-4b3e-8c06-7cd16f98720c:Active,758c0abb-ea9a-43fb-bcd9-435f75cd0baa:Active,baa42b1c-ae2e-4486-88a1-e09e1f7a59cb:Active
POOL_SPM_ID=1
POOL_SPM_LVER=4
POOL_UUID=0f63de0e-7d98-48ce-99ec-add109f83c4f
REMOTE_PATH=10.101.0.148:/c/vpt1-master
ROLE=Master
SDUUID=774e3604-f449-4b3e-8c06-7cd16f98720c
TYPE=NFS
VERSION=0
_SHA_CKSUM=fa8ef0e7cd5e50e107384a146e4bfc838d24ba08
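One hypothetical way to sanity-check that metadata file is to recompute the checksum. The algorithm here is an assumption (if I recall correctly, vdsm's persistentDict computes the sha1 over the metadata lines with the _SHA_CKSUM line itself removed) and should be verified against your vdsm version's storage/persistentDict.py before drawing conclusions. Illustrated on a stand-in file; point MD at your real dom_md/metadata instead.

```shell
# Assumption to verify against vdsm's storage/persistentDict.py: _SHA_CKSUM is
# the sha1 of the metadata content with the _SHA_CKSUM line itself stripped.
# Stand-in file for illustration; use your real dom_md/metadata.
MD=$(mktemp)
printf 'CLASS=Data\nROLE=Master\n_SHA_CKSUM=0000\n' > "$MD"
# Strip the checksum line and hash the rest.
grep -v '^_SHA_CKSUM' "$MD" | sha1sum | awk '{print $1}'
```

If the recomputed value matches the stored _SHA_CKSUM, the metadata file itself is at least internally consistent.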
On Wed, Apr 24, 2013 at 5:57 AM, Juan Jose < jj197005(a)gmail.com > wrote:
Hello Tommy,
I had a similar experience, and after trying to recover my storage domain, I
realized that my VMs were missing. You have to verify that your VM disks are
still inside your storage domain. In my case, I had to add a new storage
domain as the master domain in order to remove the old VMs from the DB and
reattach the old storage domain. I hope this is not your case; if you haven't
lost your VMs, it should be possible to recover them.
Good luck,
Juanjo.
On Wed, Apr 24, 2013 at 6:43 AM, Tommy McNeely < tommythekid(a)gmail.com >
wrote:
We had a hard crash (network, then power) on our 2-node oVirt cluster. We
have an NFS datastore on CentOS 6 (3.2.0-1.39.el6). We can no longer get the
hosts to activate: they are unable to activate the "master" domain. The
master storage domain shows "Locked" while the other storage domains show
Unknown (disks) and Inactive (ISO). All the domains are on the same NFS
server; we are able to mount it, and the permissions are good. We believe we
might be getting bitten by
https://bugzilla.redhat.com/show_bug.cgi?id=920694
or
http://gerrit.ovirt.org/#/c/13709/ which says to cease working on it:
Michael Kublin Apr 10
Patch Set 5: Do not submit
Liron, please abondon this work. This interacts with host life cycle which
will be changed, during a change a following problem will be solved as well.
So, we were wondering what we can do to get our oVirt back online, or rather,
what the correct way to solve this is. We have a few VMs that are down, which
we are looking to recover as quickly as possible.
Thanks in advance,
Tommy
Here are the ovirt-engine logs:
2013-04-23 21:30:04,041 ERROR
[org.ovirt.engine.core.vdsbroker.VDSCommandBase] (pool-3-thread-49) Command
ConnectStoragePoolVDS execution failed. Exception:
IRSNoMasterDomainException: IRSGenericException: IRSErrorException:
IRSNoMasterDomainException: Cannot find master domain:
'spUUID=0f63de0e-7d98-48ce-99ec-add109f83c4f,
msdUUID=774e3604-f449-4b3e-8c06-7cd16f98720c'
2013-04-23 21:30:04,043 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand]
(pool-3-thread-49) FINISH, ConnectStoragePoolVDSCommand, log id: 50524b34
2013-04-23 21:30:04,049 WARN
[org.ovirt.engine.core.bll.storage.ReconstructMasterDomainCommand]
(pool-3-thread-49) [7c5867d6] CanDoAction of action ReconstructMasterDomain
failed.
Reasons:VAR__ACTION__RECONSTRUCT_MASTER,VAR__TYPE__STORAGE__DOMAIN,ACTION_TYPE_FAILED_STORAGE_DOMAIN_STATUS_ILLEGAL2,$status
Locked
Here are the logs from vdsm:
Thread-29::DEBUG::2013-04-23
21:36:05,906::misc::84::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n
/bin/mount -t nfs -o soft,nosharecache,timeo=600,retrans=6,nfsvers=3
10.101.0.148:/c/vpt1-vmdisks1
/rhev/data-center/mnt/10.101.0.148:_c_vpt1-vmdisks1' (cwd None)
Thread-29::DEBUG::2013-04-23
21:36:06,008::misc::84::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n
/bin/mount -t nfs -o soft,nosharecache,timeo=600,retrans=6,nfsvers=3
10.101.0.148:/c/vpool-iso /rhev/data-center/mnt/10.101.0.148:_c_vpool-iso'
(cwd None)
Thread-29::INFO::2013-04-23 21:36:06,065::logUtils::44::dispatcher::(wrapper)
Run and protect: connectStorageServer, Return response: {'statuslist':
[{'status': 0, 'id': '7c19bd42-c3dc-41b9-b81b-d9b75214b8dc'},
{'status': 0,
'id': 'eff2ef61-0b12-4429-b087-8742be17ae90'}]}
Thread-29::DEBUG::2013-04-23
21:36:06,071::task::1151::TaskManager.Task::(prepare)
Task=`48337e40-2446-4357-b6dc-2c86f4da67e2`::finished: {'statuslist':
[{'status': 0, 'id': '7c19bd42-c3dc-41b9-b81b-d9b75214b8dc'},
{'status': 0,
'id': 'eff2ef61-0b12-4429-b087-8742be17ae90'}]}
Thread-29::DEBUG::2013-04-23
21:36:06,071::task::568::TaskManager.Task::(_updateState)
Task=`48337e40-2446-4357-b6dc-2c86f4da67e2`::moving from state preparing ->
state finished
Thread-29::DEBUG::2013-04-23
21:36:06,071::resourceManager::830::ResourceManager.Owner::(releaseAll)
Owner.releaseAll requests {} resources {}
Thread-29::DEBUG::2013-04-23
21:36:06,072::resourceManager::864::ResourceManager.Owner::(cancelAll)
Owner.cancelAll requests {}
Thread-29::DEBUG::2013-04-23
21:36:06,072::task::957::TaskManager.Task::(_decref)
Task=`48337e40-2446-4357-b6dc-2c86f4da67e2`::ref 0 aborting False
Thread-30::DEBUG::2013-04-23 21:36:06,112::BindingXMLRPC::161::vds::(wrapper)
[10.101.0.197]
Thread-30::DEBUG::2013-04-23
21:36:06,112::task::568::TaskManager.Task::(_updateState)
Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::moving from state init -> state
preparing
Thread-30::INFO::2013-04-23 21:36:06,113::logUtils::41::dispatcher::(wrapper)
Run and protect:
connectStoragePool(spUUID='0f63de0e-7d98-48ce-99ec-add109f83c4f', hostID=1,
scsiKey='0f63de0e-7d98-48ce-99ec-add109f83c4f',
msdUUID='774e3604-f449-4b3e-8c06-7cd16f98720c', masterVersion=73,
options=None)
Thread-30::DEBUG::2013-04-23
21:36:06,113::resourceManager::190::ResourceManager.Request::(__init__)
ResName=`Storage.0f63de0e-7d98-48ce-99ec-add109f83c4f`ReqID=`ee74329a-0a92-465a-be50-b8acc6d7246a`::Request
was made in '/usr/share/vdsm/storage/resourceManager.py' line '189' at
'__init__'
Thread-30::DEBUG::2013-04-23
21:36:06,114::resourceManager::504::ResourceManager::(registerResource)
Trying to register resource 'Storage.0f63de0e-7d98-48ce-99ec-add109f83c4f'
for lock type 'exclusive'
Thread-30::DEBUG::2013-04-23
21:36:06,114::resourceManager::547::ResourceManager::(registerResource)
Resource 'Storage.0f63de0e-7d98-48ce-99ec-add109f83c4f' is free. Now locking
as 'exclusive' (1 active user)
Thread-30::DEBUG::2013-04-23
21:36:06,114::resourceManager::227::ResourceManager.Request::(grant)
ResName=`Storage.0f63de0e-7d98-48ce-99ec-add109f83c4f`ReqID=`ee74329a-0a92-465a-be50-b8acc6d7246a`::Granted
request
Thread-30::INFO::2013-04-23
21:36:06,115::sp::625::Storage.StoragePool::(connect) Connect host #1 to the
storage pool 0f63de0e-7d98-48ce-99ec-add109f83c4f with master domain:
774e3604-f449-4b3e-8c06-7cd16f98720c (ver = 73)
Thread-30::DEBUG::2013-04-23
21:36:06,116::lvm::477::OperationMutex::(_invalidateAllPvs) Operation 'lvm
invalidate operation' got the operation mutex
Thread-30::DEBUG::2013-04-23
21:36:06,116::lvm::479::OperationMutex::(_invalidateAllPvs) Operation 'lvm
invalidate operation' released the operation mutex
Thread-30::DEBUG::2013-04-23
21:36:06,117::lvm::488::OperationMutex::(_invalidateAllVgs) Operation 'lvm
invalidate operation' got the operation mutex
Thread-30::DEBUG::2013-04-23
21:36:06,117::lvm::490::OperationMutex::(_invalidateAllVgs) Operation 'lvm
invalidate operation' released the operation mutex
Thread-30::DEBUG::2013-04-23
21:36:06,117::lvm::508::OperationMutex::(_invalidateAllLvs) Operation 'lvm
invalidate operation' got the operation mutex
Thread-30::DEBUG::2013-04-23
21:36:06,118::lvm::510::OperationMutex::(_invalidateAllLvs) Operation 'lvm
invalidate operation' released the operation mutex
Thread-30::DEBUG::2013-04-23
21:36:06,118::misc::1054::SamplingMethod::(__call__) Trying to enter
sampling method (storage.sdc.refreshStorage)
Thread-30::DEBUG::2013-04-23
21:36:06,118::misc::1056::SamplingMethod::(__call__) Got in to sampling
method
Thread-30::DEBUG::2013-04-23
21:36:06,119::misc::1054::SamplingMethod::(__call__) Trying to enter
sampling method (storage.iscsi.rescan)
Thread-30::DEBUG::2013-04-23
21:36:06,119::misc::1056::SamplingMethod::(__call__) Got in to sampling
method
Thread-30::DEBUG::2013-04-23
21:36:06,119::misc::84::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n
/sbin/iscsiadm -m session -R' (cwd None)
Thread-30::DEBUG::2013-04-23
21:36:06,136::misc::84::Storage.Misc.excCmd::(<lambda>) FAILED: <err> =
'iscsiadm: No session found.\n'; <rc> = 21
Thread-30::DEBUG::2013-04-23
21:36:06,136::misc::1064::SamplingMethod::(__call__) Returning last result
MainProcess|Thread-30::DEBUG::2013-04-23
21:36:06,139::misc::84::Storage.Misc.excCmd::(<lambda>) '/bin/dd
of=/sys/class/scsi_host/host0/scan' (cwd None)
MainProcess|Thread-30::DEBUG::2013-04-23
21:36:06,142::misc::84::Storage.Misc.excCmd::(<lambda>) '/bin/dd
of=/sys/class/scsi_host/host1/scan' (cwd None)
MainProcess|Thread-30::DEBUG::2013-04-23
21:36:06,146::misc::84::Storage.Misc.excCmd::(<lambda>) '/bin/dd
of=/sys/class/scsi_host/host2/scan' (cwd None)
MainProcess|Thread-30::DEBUG::2013-04-23
21:36:06,149::iscsi::402::Storage.ISCSI::(forceIScsiScan) Performing SCSI
scan, this will take up to 30 seconds
Thread-30::DEBUG::2013-04-23
21:36:08,152::misc::84::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n
/sbin/multipath' (cwd None)
Thread-30::DEBUG::2013-04-23
21:36:08,254::misc::84::Storage.Misc.excCmd::(<lambda>) SUCCESS: <err> =
'';
<rc> = 0
Thread-30::DEBUG::2013-04-23
21:36:08,256::lvm::477::OperationMutex::(_invalidateAllPvs) Operation 'lvm
invalidate operation' got the operation mutex
Thread-30::DEBUG::2013-04-23
21:36:08,256::lvm::479::OperationMutex::(_invalidateAllPvs) Operation 'lvm
invalidate operation' released the operation mutex
Thread-30::DEBUG::2013-04-23
21:36:08,257::lvm::488::OperationMutex::(_invalidateAllVgs) Operation 'lvm
invalidate operation' got the operation mutex
Thread-30::DEBUG::2013-04-23
21:36:08,257::lvm::490::OperationMutex::(_invalidateAllVgs) Operation 'lvm
invalidate operation' released the operation mutex
Thread-30::DEBUG::2013-04-23
21:36:08,258::lvm::508::OperationMutex::(_invalidateAllLvs) Operation 'lvm
invalidate operation' got the operation mutex
Thread-30::DEBUG::2013-04-23
21:36:08,258::lvm::510::OperationMutex::(_invalidateAllLvs) Operation 'lvm
invalidate operation' released the operation mutex
Thread-30::DEBUG::2013-04-23
21:36:08,258::misc::1064::SamplingMethod::(__call__) Returning last result
Thread-30::DEBUG::2013-04-23
21:36:08,259::lvm::368::OperationMutex::(_reloadvgs) Operation 'lvm reload
operation' got the operation mutex
Thread-30::DEBUG::2013-04-23
21:36:08,261::misc::84::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n
/sbin/lvm vgs --config " devices { preferred_names =
[\\"^/dev/mapper/\\"]
ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3
filter = [ \\"r%.*%\\" ] } global { locking_type=1 prioritise_write_locks=1
wait_for_locks=1 } backup { retain_min = 50 retain_days = 0 } " --noheadings
--units b --nosuffix --separator | -o
uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free
774e3604-f449-4b3e-8c06-7cd16f98720c' (cwd None)
Thread-30::DEBUG::2013-04-23
21:36:08,514::misc::84::Storage.Misc.excCmd::(<lambda>) FAILED: <err> =
'
Volume group "774e3604-f449-4b3e-8c06-7cd16f98720c" not found\n';
<rc> = 5
Thread-30::WARNING::2013-04-23
21:36:08,516::lvm::373::Storage.LVM::(_reloadvgs) lvm vgs failed: 5 [] ['
Volume group "774e3604-f449-4b3e-8c06-7cd16f98720c" not found']
Thread-30::DEBUG::2013-04-23
21:36:08,518::lvm::397::OperationMutex::(_reloadvgs) Operation 'lvm reload
operation' released the operation mutex
Thread-30::DEBUG::2013-04-23
21:36:08,524::resourceManager::557::ResourceManager::(releaseResource)
Trying to release resource 'Storage.0f63de0e-7d98-48ce-99ec-add109f83c4f'
Thread-30::DEBUG::2013-04-23
21:36:08,525::resourceManager::573::ResourceManager::(releaseResource)
Released resource 'Storage.0f63de0e-7d98-48ce-99ec-add109f83c4f' (0 active
users)
Thread-30::DEBUG::2013-04-23
21:36:08,525::resourceManager::578::ResourceManager::(releaseResource)
Resource 'Storage.0f63de0e-7d98-48ce-99ec-add109f83c4f' is free, finding out
if anyone is waiting for it.
Thread-30::DEBUG::2013-04-23
21:36:08,525::resourceManager::585::ResourceManager::(releaseResource) No
one is waiting for resource 'Storage.0f63de0e-7d98-48ce-99ec-add109f83c4f',
Clearing records.
Thread-30::ERROR::2013-04-23
21:36:08,526::task::833::TaskManager.Task::(_setError)
Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::Unexpected error
Traceback (most recent call last):
File "/usr/share/vdsm/storage/task.py", line 840, in _run
return fn(*args, **kargs)
File "/usr/share/vdsm/logUtils.py", line 42, in wrapper
res = f(*args, **kwargs)
File "/usr/share/vdsm/storage/hsm.py", line 926, in connectStoragePool
masterVersion, options)
File "/usr/share/vdsm/storage/hsm.py", line 973, in _connectStoragePool
res = pool.connect(hostID, scsiKey, msdUUID, masterVersion)
File "/usr/share/vdsm/storage/sp.py", line 642, in connect
self.__rebuild(msdUUID=msdUUID, masterVersion=masterVersion)
File "/usr/share/vdsm/storage/sp.py", line 1166, in __rebuild
self.masterDomain = self.getMasterDomain(msdUUID=msdUUID,
masterVersion=masterVersion)
File "/usr/share/vdsm/storage/sp.py", line 1505, in getMasterDomain
raise se.StoragePoolMasterNotFound(self.spUUID, msdUUID)
StoragePoolMasterNotFound: Cannot find master domain:
'spUUID=0f63de0e-7d98-48ce-99ec-add109f83c4f,
msdUUID=774e3604-f449-4b3e-8c06-7cd16f98720c'
Thread-30::DEBUG::2013-04-23
21:36:08,527::task::852::TaskManager.Task::(_run)
Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::Task._run:
f551fa3f-9d8c-4de3-895a-964c821060d4
('0f63de0e-7d98-48ce-99ec-add109f83c4f', 1,
'0f63de0e-7d98-48ce-99ec-add109f83c4f',
'774e3604-f449-4b3e-8c06-7cd16f98720c', 73) {} failed - stopping task
Thread-30::DEBUG::2013-04-23
21:36:08,528::task::1177::TaskManager.Task::(stop)
Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::stopping in state preparing
(force False)
Thread-30::DEBUG::2013-04-23
21:36:08,528::task::957::TaskManager.Task::(_decref)
Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::ref 1 aborting True
Thread-30::INFO::2013-04-23
21:36:08,528::task::1134::TaskManager.Task::(prepare)
Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::aborting: Task is aborted:
'Cannot find master domain' - code 304
Thread-30::DEBUG::2013-04-23
21:36:08,529::task::1139::TaskManager.Task::(prepare)
Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::Prepare: aborted: Cannot find
master domain
Thread-30::DEBUG::2013-04-23
21:36:08,529::task::957::TaskManager.Task::(_decref)
Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::ref 0 aborting True
Thread-30::DEBUG::2013-04-23
21:36:08,529::task::892::TaskManager.Task::(_doAbort)
Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::Task._doAbort: force False
Thread-30::DEBUG::2013-04-23
21:36:08,530::resourceManager::864::ResourceManager.Owner::(cancelAll)
Owner.cancelAll requests {}
Thread-30::DEBUG::2013-04-23
21:36:08,530::task::568::TaskManager.Task::(_updateState)
Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::moving from state preparing ->
state aborting
Thread-30::DEBUG::2013-04-23
21:36:08,530::task::523::TaskManager.Task::(__state_aborting)
Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::_aborting: recover policy none
Thread-30::DEBUG::2013-04-23
21:36:08,531::task::568::TaskManager.Task::(_updateState)
Task=`f551fa3f-9d8c-4de3-895a-964c821060d4`::moving from state aborting ->
state failed
Thread-30::DEBUG::2013-04-23
21:36:08,531::resourceManager::830::ResourceManager.Owner::(releaseAll)
Owner.releaseAll requests {} resources {}
Thread-30::DEBUG::2013-04-23
21:36:08,531::resourceManager::864::ResourceManager.Owner::(cancelAll)
Owner.cancelAll requests {}
Thread-30::ERROR::2013-04-23
21:36:08,532::dispatcher::67::Storage.Dispatcher.Protect::(run) {'status':
{'message': "Cannot find master domain:
'spUUID=0f63de0e-7d98-48ce-99ec-add109f83c4f,
msdUUID=774e3604-f449-4b3e-8c06-7cd16f98720c'", 'code': 304}}
[root@vmserver3 vdsm]#
_______________________________________________
Users mailing list
Users(a)ovirt.org
http://lists.ovirt.org/mailman/listinfo/users