[Users] Error creating the first storage domain (NFS)

Brian Vetter bjvetter at gmail.com
Wed Oct 24 00:02:31 UTC 2012


I reinstalled my system and ran the setup again. This time I configured both my host and the ovirt-engine systems to use NFSv3 (it was using NFSv4 by default). After getting all of the iptables rules straightened out (NFSv3 apparently ignores the port settings in /etc/sysconfig/nfs and instead looks at /etc/services), I was able to mount between the two systems.
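
For reference, the end state I was aiming for looks roughly like the
sketch below. The variable names are the stock Fedora ones and the port
numbers are only examples; since my services seemed to follow
/etc/services instead, rpcinfo is the way to see what actually got bound:

    # /etc/sysconfig/nfs - pin the NFSv3 side services to fixed ports
    MOUNTD_PORT=892
    STATD_PORT=662
    LOCKD_TCPPORT=32803
    LOCKD_UDPPORT=32769

    # confirm what the services really bound
    rpcinfo -p

    # then open those ports, plus 2049 (nfsd) and 111 (rpcbind)
    iptables -A INPUT -p tcp -m multiport --dports 111,662,892,2049,32803 -j ACCEPT
    iptables -A INPUT -p udp -m multiport --dports 111,662,892,2049,32769 -j ACCEPT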

When I attempt to add the storage domain, I am getting the same error as before (from sanlock.log):

2012-10-23 18:45:16-0500 8418 [979]: s1 lockspace 42c7d146-86e1-403f-97de-1da0dcbf95ec:250:/rhev/data-center/mnt/eos.dcc.mobi:_home_vmstorage/42c7d146-86e1-403f-97de-1da0dcbf95ec/dom_md/ids:0
2012-10-23 18:45:16-0500 8418 [4285]: open error -13 /rhev/data-center/mnt/eos.dcc.mobi:_home_vmstorage/42c7d146-86e1-403f-97de-1da0dcbf95ec/dom_md/ids
2012-10-23 18:45:16-0500 8418 [4285]: s1 open_disk /rhev/data-center/mnt/eos.dcc.mobi:_home_vmstorage/42c7d146-86e1-403f-97de-1da0dcbf95ec/dom_md/ids error -13
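
(Error -13 there is -EACCES: sanlock was denied permission opening the
ids file. A few quick checks from the host, assuming the stock Fedora
setup where the daemon runs as the sanlock user:

    # try the same read the daemon would do
    sudo -u sanlock dd if='/rhev/data-center/mnt/eos.dcc.mobi:_home_vmstorage/42c7d146-86e1-403f-97de-1da0dcbf95ec/dom_md/ids' bs=512 count=1 of=/dev/null

    # the ids file is 0660 vdsm:kvm, so sanlock needs kvm group membership
    id sanlock

    # SELinux is the other usual suspect for EACCES on NFS
    getenforce
    getsebool -a | grep nfs

If the dd fails the same way, it's plain Unix permissions or root
squashing; if it works but sanlock still can't open the file, SELinux is
the more likely culprit.)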

It all goes downhill from there. So it doesn't appear to be an NFSv4 vs. NFSv3 issue.
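
(Since vdsm does the mounting under /rhev itself, it's worth confirming
the new mounts really negotiated v3. A quick check:

    nfsstat -m    # shows each NFS mount with its negotiated options, e.g. vers=3

The mount output further down shows vers=4, which is exactly what I was
trying to get away from.)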

I can send more logs, but they are pretty much the same as what I sent before. Also, in case it wasn't clear before, I'm running ovirt-engine on a full Fedora 17 system, and the host is a minimal FC17 system with its kernel held at 3.3.4-5.fc17 (to avoid the prior NFS hanging issues).

Brian

On Oct 23, 2012, at 4:38 AM, Vered Volansky wrote:

> Hi Brian,
> 
> We'll need your engine & host (full) logs at the very least to look into the problem.
> Can you try it with nfs3 and tell us if it works?
> 
> Note, more comments in the email body.
> 
> Regards,
> Vered
> 
> ----- Original Message -----
>> From: "Brian Vetter" <bjvetter at gmail.com>
>> To: users at ovirt.org
>> Sent: Tuesday, October 23, 2012 5:06:06 AM
>> Subject: [Users] Error creating the first storage domain (NFS)
>> 
>> 
>> I have reinstalled my ovirt installation using the nightly builds so
>> that I can try out non-admin REST API access to ovirt. After
>> installing the engine, connecting to my directory system, creating a
>> domain, and adding a host (all successfully), I tried to add my
>> first storage domain (NFS).
>> 
>> While creating the storage domain, I get an error at the end along
>> with a couple of events that say:
>> 
>> "Failed to attach Storage Domains to Data Center DCC. (User:
>> admin at internal)"
>> 
>> followed by:
>> 
>> "Failed to attach Storage Domain DCVMStorage to Data Center DCC.
>> (User: admin at internal)"
>> 
>> I see the following in the engine.log file:
>> 
>> 2012-10-22 20:17:57,617 WARN
>> [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase]
>> (ajp--127.0.0.1-8009-7) [7d1ffd97] Weird return value: Class Name:
>> org.ovirt.engine.core.vdsbroker.vdsbroker.StatusForXmlRpc
>> mCode 661
>> mMessage Cannot acquire host id:
>> ('b97019e9-bd43-46d8-afd0-421d6768271b', SanlockException(19,
>> 'Sanlock lockspace add failure', 'No such device'))
>> 
>> 2012-10-22 20:17:57,619 WARN
>> [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase]
>> (ajp--127.0.0.1-8009-7) [7d1ffd97] Weird return value: Class Name:
>> org.ovirt.engine.core.vdsbroker.vdsbroker.StatusForXmlRpc
>> mCode 661
>> mMessage Cannot acquire host id:
>> ('b97019e9-bd43-46d8-afd0-421d6768271b', SanlockException(19,
>> 'Sanlock lockspace add failure', 'No such device'))
>> 
>> 2012-10-22 20:17:57,620 ERROR
>> [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase]
>> (ajp--127.0.0.1-8009-7) [7d1ffd97] Failed in CreateStoragePoolVDS
>> method
>> 
>> 2012-10-22 20:17:57,620 ERROR
>> [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase]
>> (ajp--127.0.0.1-8009-7) [7d1ffd97] Error code unexpected and error
>> message VDSGenericException: VDSErrorException: Failed to
>> CreateStoragePoolVDS, error = Cannot acquire host id:
>> ('b97019e9-bd43-46d8-afd0-421d6768271b', SanlockException(19,
>> 'Sanlock lockspace add failure', 'No such device'))
>> 
>> On the host where it tried to install from, I see the following in
>> the vdsm.log:
>> 
>> Thread-243::INFO::2012-10-22
>> 20:17:56,624::safelease::156::SANLock::(acquireHostId) Acquiring
>> host id for domain b97019e9-bd43-46d8-afd0-421d6768271b (id: 250)
>> Thread-243::ERROR::2012-10-22
>> 20:17:57,628::task::853::TaskManager.Task::(_setError)
>> Task=`1ead54dc-407c-4d0b-96f4-8dc56c74d4cf`::Unexpected error
>> Traceback (most recent call last):
>>   File "/usr/share/vdsm/storage/task.py", line 861, in _run
>>     return fn(*args, **kargs)
>>   File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
>>     res = f(*args, **kwargs)
>>   File "/usr/share/vdsm/storage/hsm.py", line 790, in createStoragePool
>>     return sp.StoragePool(spUUID, self.taskMng).create(poolName,
>>         masterDom, domList, masterVersion, safeLease)
>>   File "/usr/share/vdsm/storage/sp.py", line 567, in create
>>     self._acquireTemporaryClusterLock(msdUUID, safeLease)
>>   File "/usr/share/vdsm/storage/sp.py", line 508, in _acquireTemporaryClusterLock
>>     msd.acquireHostId(self.id)
>>   File "/usr/share/vdsm/storage/sd.py", line 407, in acquireHostId
>>     self._clusterLock.acquireHostId(hostId)
>>   File "/usr/share/vdsm/storage/safelease.py", line 162, in acquireHostId
>>     raise se.AcquireHostIdFailure(self._sdUUID, e)
>> AcquireHostIdFailure: Cannot acquire host id:
>> ('b97019e9-bd43-46d8-afd0-421d6768271b', SanlockException(19,
>> 'Sanlock lockspace add failure', 'No such device'))
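>> 
>> (Aside: errno 19 in that SanlockException is ENODEV. My reading is that
>> once sanlock fails to open the ids file, the add_lockspace call gets
>> reported upward as "No such device" even though the real failure was the
>> open itself - the sanlock.log excerpt at the top of this message shows
>> exactly that open failing with -13. The daemon's view can be checked
>> with:
>> 
>>     sanlock client status    # lists the lockspaces the daemon currently holds
>> 
>> which should list the domain's lockspace once things work.)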
>> 
>> After I got this error, I logged into the host and saw that the NFS
>> mount was present:
>> 
>> eos.dcc.mobi:/home/vmstorage on
>> /rhev/data-center/mnt/eos.dcc.mobi:_home_vmstorage type nfs4
>> (rw,relatime,vers=4,rsize=1048576,wsize=1048576,namlen=255,soft,nosharecache,proto=tcp,port=0,timeo=600,retrans=6,sec=sys,clientaddr=10.1.1.12,minorversion=0,local_lock=none,addr=10.1.1.11)
>> 
>> And when I look at the directory, I see the following:
>> 
>> [root at mech ~]# ls -laR /rhev/data-center/mnt/eos.dcc.mobi:_home_vmstorage
>> /rhev/data-center/mnt/eos.dcc.mobi:_home_vmstorage:
>> total 12
>> drwxr-xr-x. 3 vdsm kvm 4096 Oct 22 20:17 .
>> drwxr-xr-x. 6 vdsm kvm 4096 Oct 22 20:17 ..
>> drwxr-xr-x. 4 vdsm kvm 4096 Oct 22 20:17 b97019e9-bd43-46d8-afd0-421d6768271b
>> 
>> /rhev/data-center/mnt/eos.dcc.mobi:_home_vmstorage/b97019e9-bd43-46d8-afd0-421d6768271b:
>> total 16
>> drwxr-xr-x. 4 vdsm kvm 4096 Oct 22 20:17 .
>> drwxr-xr-x. 3 vdsm kvm 4096 Oct 22 20:17 ..
>> drwxr-xr-x. 2 vdsm kvm 4096 Oct 22 20:17 dom_md
>> drwxr-xr-x. 2 vdsm kvm 4096 Oct 22 20:17 images
>> 
>> /rhev/data-center/mnt/eos.dcc.mobi:_home_vmstorage/b97019e9-bd43-46d8-afd0-421d6768271b/dom_md:
>> total 2060
>> drwxr-xr-x. 2 vdsm kvm    4096 Oct 22 20:17 .
>> drwxr-xr-x. 4 vdsm kvm    4096 Oct 22 20:17 ..
>> -rw-rw----. 1 vdsm kvm 1048576 Oct 22 20:17 ids
>> -rw-rw----. 1 vdsm kvm       0 Oct 22 20:17 inbox
>> -rw-rw----. 1 vdsm kvm 1048576 Oct 22 20:17 leases
>> -rw-r--r--. 1 vdsm kvm     308 Oct 22 20:17 metadata
>> -rw-rw----. 1 vdsm kvm       0 Oct 22 20:17 outbox
>> 
>> /rhev/data-center/mnt/eos.dcc.mobi:_home_vmstorage/b97019e9-bd43-46d8-afd0-421d6768271b/images:
>> total 8
>> drwxr-xr-x. 2 vdsm kvm 4096 Oct 22 20:17 .
>> drwxr-xr-x. 4 vdsm kvm 4096 Oct 22 20:17 ..
>> 
>> It looks like it was able to mount the directory and create a bunch
>> of files and directories owned by vdsm:kvm.
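>> 
>> (The ownership looks right from the client, but NFS permission checks
>> happen on the server against numeric uid/gid, so the export options
>> matter too. On the server, something like:
>> 
>>     exportfs -v    # shows the active options for each export
>> 
>> is worth a look; the usual oVirt advice is anonuid=36,anongid=36 (the
>> vdsm uid and kvm gid) when root squashing is on, since squashed
>> accesses land as nobody and get EACCES on these 0660 lease files.)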
>> 
>> So after all this, I was stuck with a Storage domain that wasn't
>> assigned to my data center. When I tried to attach it to my Data
>> Center, I got another error:
>> 
>> "Failed to attach Storage Domains to Data Center dcc. (User:
>> admin at internal)"
>> 
>> And I saw this in engine.log:
>> 
>> 2012-10-22 21:30:53,788 ERROR
>> [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase]
>> (pool-3-thread-50) [4eaa9670] Failed in CreateStoragePoolVDS method
>> 2012-10-22 21:30:53,789 ERROR
>> [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase]
>> (pool-3-thread-50) [4eaa9670] Error code unexpected and error
>> message VDSGenericException: VDSErrorException: Failed to
>> CreateStoragePoolVDS, error = Cannot acquire host id:
>> ('b97019e9-bd43-46d8-afd0-421d6768271b', SanlockException(19,
>> 'Sanlock lockspace add failure', 'No such device'))
>> 2012-10-22 21:30:53,790 INFO
>> [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase]
>> (pool-3-thread-50) [4eaa9670] Command
>> org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand
>> return value
>> Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusOnlyReturnForXmlRpc
>> mStatus Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusForXmlRpc
>> mCode 661
>> mMessage Cannot acquire host id:
>> ('b97019e9-bd43-46d8-afd0-421d6768271b', SanlockException(19,
>> 'Sanlock lockspace add failure', 'No such device'))
>> 2012-10-22 21:30:53,791 INFO
>> [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase]
>> (pool-3-thread-50) [4eaa9670] Vds: mechis3
>> 2012-10-22 21:30:53,792 ERROR
>> [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (pool-3-thread-50)
>> [4eaa9670] Command CreateStoragePoolVDS execution failed. Exception:
>> VDSErrorException: VDSGenericException: VDSErrorException: Failed to
>> CreateStoragePoolVDS, error = Cannot acquire host id:
>> ('b97019e9-bd43-46d8-afd0-421d6768271b', SanlockException(19,
>> 'Sanlock lockspace add failure', 'No such device'))
>> 2012-10-22 21:30:53,793 INFO
>> [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand]
>> (pool-3-thread-50) [4eaa9670] FINISH, CreateStoragePoolVDSCommand,
>> log id: 4015ca0d
>> 
>> This all looks familiar - as does the vdsm.log file (not repeated).
>> 
>> Now, my system is in a different state. It now shows that the storage
>> domain is associated with my Data Center: if I click on the data
>> center in the UI and look at the storage tab below, I see that the
>> NFS storage domain is listed with this data center, and its status in
>> the data center is reported as "locked". I don't see any way to
>> "unlock" it, although I suspect that if I did, I'd get the same error
>> as above (SanlockException).
>> 
>> If I try to destroy/delete the storage domain, I get an error that
>> says I can't destroy the master storage domain.
> 
> Make sure your destruction request is included in the logs. This issue seems unrelated to the previous one; make sure you're not using the pool (or still have it mounted) while trying to destroy the domain.
> 
>> 
>> So how do I get out of this mess?
>> 
>> As to versions, I see the following ovirt packages when I dump the
>> ovirt version info for my ovirt-engine system:
>> 
>> ovirt-engine.noarch                       3.1.0-3.1345126685.git7649eed.fc17  @ovirt-nightly
>> ovirt-engine-backend.noarch               3.1.0-3.1345126685.git7649eed.fc17  @ovirt-nightly
>> ovirt-engine-cli.noarch                   3.2.0.5-1.20121015.git4189352.fc17  @ovirt-nightly
>> ovirt-engine-config.noarch                3.1.0-3.1345126685.git7649eed.fc17  @ovirt-nightly
>> ovirt-engine-dbscripts.noarch             3.1.0-3.1345126685.git7649eed.fc17  @ovirt-nightly
>> ovirt-engine-genericapi.noarch            3.1.0-3.1345126685.git7649eed.fc17  @ovirt-nightly
>> ovirt-engine-notification-service.noarch  3.1.0-3.1345126685.git7649eed.fc17  @ovirt-nightly
>> ovirt-engine-restapi.noarch               3.1.0-3.1345126685.git7649eed.fc17  @ovirt-nightly
>> ovirt-engine-sdk.noarch                   3.2.0.2-1.20120927.git663b765.fc17  @ovirt-nightly
>> ovirt-engine-setup.noarch                 3.1.0-3.1345126685.git7649eed.fc17  @ovirt-nightly
>> ovirt-engine-tools-common.noarch          3.1.0-3.1345126685.git7649eed.fc17  @ovirt-nightly
>> ovirt-engine-userportal.noarch            3.1.0-3.1345126685.git7649eed.fc17  @ovirt-nightly
>> ovirt-engine-webadmin-portal.noarch       3.1.0-3.1345126685.git7649eed.fc17  @ovirt-nightly
>> ovirt-image-uploader.noarch               3.1.0-0.git9c42c8.fc17              @ovirt-stable
>> ovirt-iso-uploader.noarch                 3.1.0-0.git1841d9.fc17              @ovirt-stable
>> ovirt-log-collector.noarch                3.1.0-0.git10d719.fc17              @ovirt-stable
>> ovirt-release-fedora.noarch               4-2                                 @/ovirt-release-fedora.noarch
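>> 
>> (A listing in this shape can be pulled with something like:
>> 
>>     yum list installed 'ovirt*'
>> 
>> though I may have used a slightly different invocation.)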
>> 
>> These are a few of the relevant packages on my VM host:
>> 
>> libvirt.x86_64                         0.9.11.5-3.fc17  @updates
>> libvirt-client.x86_64                  0.9.11.5-3.fc17  @updates
>> libvirt-daemon.x86_64                  0.9.11.5-3.fc17  @updates
>> libvirt-daemon-config-network.x86_64   0.9.11.5-3.fc17  @updates
>> libvirt-daemon-config-nwfilter.x86_64  0.9.11.5-3.fc17  @updates
>> libvirt-lock-sanlock.x86_64            0.9.11.5-3.fc17  @updates
>> libvirt-python.x86_64                  0.9.11.5-3.fc17  @updates
>> 
>> sanlock.x86_64                         2.4-2.fc17       @updates
>> sanlock-lib.x86_64                     2.4-2.fc17       @updates
>> sanlock-python.x86_64                  2.4-2.fc17       @updates
>> 
>> vdsm.x86_64                            4.10.0-10.fc17   @updates
>> vdsm-cli.noarch                        4.10.0-10.fc17   @updates
>> vdsm-python.x86_64                     4.10.0-10.fc17   @updates
>> vdsm-xmlrpc.noarch                     4.10.0-10.fc17   @updates
>> 
>> _______________________________________________
>> Users mailing list
>> Users at ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/users
>> 
