[Users] Local storage domain fails to attach after host reboot

Patrick Hurrelmann patrick.hurrelmann at lobster.de
Fri Jan 25 17:13:05 UTC 2013


On 24.01.2013 18:05, Patrick Hurrelmann wrote:
> Hi list,
> 
> After rebooting one host (single-host DC with local storage), the local
> storage domain can't be attached again. The host was set to maintenance
> mode and all running VMs were shut down prior to the reboot.
> 
> Vdsm keeps logging the following errors:
> 
> Thread-1266::ERROR::2013-01-24
> 17:51:46,042::task::853::TaskManager.Task::(_setError)
> Task=`a0c11f61-8bcf-4f76-9923-43e8b9cc1424`::Unexpected error
> Traceback (most recent call last):
>   File "/usr/share/vdsm/storage/task.py", line 861, in _run
>     return fn(*args, **kargs)
>   File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
>     res = f(*args, **kwargs)
>   File "/usr/share/vdsm/storage/hsm.py", line 817, in connectStoragePool
>     return self._connectStoragePool(spUUID, hostID, scsiKey, msdUUID,
> masterVersion, options)
>   File "/usr/share/vdsm/storage/hsm.py", line 859, in _connectStoragePool
>     res = pool.connect(hostID, scsiKey, msdUUID, masterVersion)
>   File "/usr/share/vdsm/storage/sp.py", line 641, in connect
>     self.__rebuild(msdUUID=msdUUID, masterVersion=masterVersion)
>   File "/usr/share/vdsm/storage/sp.py", line 1109, in __rebuild
>     self.masterDomain = self.getMasterDomain(msdUUID=msdUUID,
> masterVersion=masterVersion)
>   File "/usr/share/vdsm/storage/sp.py", line 1448, in getMasterDomain
>     raise se.StoragePoolMasterNotFound(self.spUUID, msdUUID)
> StoragePoolMasterNotFound: Cannot find master domain:
> 'spUUID=c9b86219-0d51-44c3-a7de-e0fe07e2c9e6,
> msdUUID=00ed91f3-43be-41be-8c05-f3786588a1ad'
> 
> and
> 
> Thread-1268::ERROR::2013-01-24
> 17:51:49,073::task::853::TaskManager.Task::(_setError)
> Task=`95b7f58b-afe0-47bd-9ebd-21d3224f5165`::Unexpected error
> Traceback (most recent call last):
>   File "/usr/share/vdsm/storage/task.py", line 861, in _run
>     return fn(*args, **kargs)
>   File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
>     res = f(*args, **kwargs)
>   File "/usr/share/vdsm/storage/hsm.py", line 528, in getSpmStatus
>     pool = self.getPool(spUUID)
>   File "/usr/share/vdsm/storage/hsm.py", line 265, in getPool
>     raise se.StoragePoolUnknown(spUUID)
> StoragePoolUnknown: Unknown pool id, pool not connected:
> ('c9b86219-0d51-44c3-a7de-e0fe07e2c9e6',)
> 
> while the engine logs:
> 
> 2013-01-24 17:51:46,050 INFO
> [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase]
> (QuartzScheduler_Worker-43) [49026692] Command
> org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand
> return value
>  Class Name:
> org.ovirt.engine.core.vdsbroker.vdsbroker.StatusOnlyReturnForXmlRpc
> mStatus                       Class Name:
> org.ovirt.engine.core.vdsbroker.vdsbroker.StatusForXmlRpc
> mCode                         304
> mMessage                      Cannot find master domain:
> 'spUUID=c9b86219-0d51-44c3-a7de-e0fe07e2c9e6,
> msdUUID=00ed91f3-43be-41be-8c05-f3786588a1ad'
> 
> 
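> To see what vdsm actually finds on disk, the domain metadata can be
> dumped directly. A minimal sketch (assuming the usual file-domain
> layout <mount>/<sdUUID>/dom_md/metadata; the mount path below is a
> made-up stand-in for the real local storage path):
> 
> # Dump the keys vdsm matches during connectStoragePool.
> # Hypothetical path -- substitute the real local storage mount point.
> MD = "/data/images/00ed91f3-43be-41be-8c05-f3786588a1ad/dom_md/metadata"
> 
> with open(MD) as f:
>     meta = dict(l.strip().split("=", 1) for l in f if "=" in l)
> 
> # StoragePoolMasterNotFound usually means one of these no longer matches
> # what the engine sends in connectStoragePool (spUUID/msdUUID/masterVersion).
> for key in ("SDUUID", "POOL_UUID", "ROLE", "MASTER_VERSION"):
>     print key, "=", meta.get(key, "<missing>")
> 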
> Vdsm and engine logs are also attached. I set the affected host back to
> maintenance. How can I recover from this and attach the storage domain
> again? If more information is needed, please do not hesitate to request it.
> 
> This is on CentOS 6.3 using Dreyou's RPMs. Installed versions on the host:
> 
> vdsm.x86_64                                 4.10.0-0.44.14.el6
> vdsm-cli.noarch                             4.10.0-0.44.14.el6
> vdsm-python.x86_64                          4.10.0-0.44.14.el6
> vdsm-xmlrpc.noarch                          4.10.0-0.44.14.el6
> 
> Engine:
> 
> ovirt-engine.noarch                         3.1.0-3.19.el6
> ovirt-engine-backend.noarch                 3.1.0-3.19.el6
> ovirt-engine-cli.noarch                     3.1.0.7-1.el6
> ovirt-engine-config.noarch                  3.1.0-3.19.el6
> ovirt-engine-dbscripts.noarch               3.1.0-3.19.el6
> ovirt-engine-genericapi.noarch              3.1.0-3.19.el6
> ovirt-engine-jbossas711.x86_64              1-0
> ovirt-engine-notification-service.noarch    3.1.0-3.19.el6
> ovirt-engine-restapi.noarch                 3.1.0-3.19.el6
> ovirt-engine-sdk.noarch                     3.1.0.5-1.el6
> ovirt-engine-setup.noarch                   3.1.0-3.19.el6
> ovirt-engine-tools-common.noarch            3.1.0-3.19.el6
> ovirt-engine-userportal.noarch              3.1.0-3.19.el6
> ovirt-engine-webadmin-portal.noarch         3.1.0-3.19.el6
> ovirt-image-uploader.noarch                 3.1.0-16.el6
> ovirt-iso-uploader.noarch                   3.1.0-16.el6
> ovirt-log-collector.noarch                  3.1.0-16.el6
> 
> 
> Thanks and regards
> Patrick

Ok, managed to solve it. I force-removed the datacenter and reinstalled
the host. Then I added a new local storage domain and re-created the VMs
(the disk images were moved and renamed from the old, non-working local
storage; rough sketch below).
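In case it helps someone later, the move-and-rename step was essentially
the following. A rough sketch only (all paths and UUIDs are made-up
placeholders, and it assumes the file-based layout
<domain>/images/<imgUUID>/<volUUID> with engine-generated .meta/.lease
sidecars next to each volume):

import shutil

# Hypothetical old->new volume mapping, one entry per re-created disk:
# for each VM a fresh (empty) disk was first created via the engine.
moves = {
    "/data/old-sd/images/OLD_IMG_UUID/OLD_VOL_UUID":
        "/data/new-sd/images/NEW_IMG_UUID/NEW_VOL_UUID",
}

for old_vol, new_vol in moves.items():
    # copyfile rewrites the destination's contents in place, so the new
    # volume keeps the ownership/permissions vdsm set up (vdsm:kvm), and
    # the freshly generated .meta/.lease sidecars stay untouched.
    shutil.copyfile(old_vol, new_vol)

The point being: only the raw volume data is replaced; all the new
bookkeeping files the engine created are left alone.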

So this host is up and running again.

Regards
Patrick


-- 
Lobster LOGsuite GmbH, Münchner Straße 15a, D-82319 Starnberg

HRB 178831, Amtsgericht München
Geschäftsführer: Dr. Martin Fischer, Rolf Henrich


