[Users] Local storage domain fails to attach after host reboot

Hi list,

after rebooting one host (single-host DC with local storage) the local storage domain can't be attached again. The host was set to maintenance mode and all running VMs were shut down prior to the reboot.

Vdsm keeps logging the following errors:

Thread-1266::ERROR::2013-01-24 17:51:46,042::task::853::TaskManager.Task::(_setError) Task=`a0c11f61-8bcf-4f76-9923-43e8b9cc1424`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 861, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 817, in connectStoragePool
    return self._connectStoragePool(spUUID, hostID, scsiKey, msdUUID, masterVersion, options)
  File "/usr/share/vdsm/storage/hsm.py", line 859, in _connectStoragePool
    res = pool.connect(hostID, scsiKey, msdUUID, masterVersion)
  File "/usr/share/vdsm/storage/sp.py", line 641, in connect
    self.__rebuild(msdUUID=msdUUID, masterVersion=masterVersion)
  File "/usr/share/vdsm/storage/sp.py", line 1109, in __rebuild
    self.masterDomain = self.getMasterDomain(msdUUID=msdUUID, masterVersion=masterVersion)
  File "/usr/share/vdsm/storage/sp.py", line 1448, in getMasterDomain
    raise se.StoragePoolMasterNotFound(self.spUUID, msdUUID)
StoragePoolMasterNotFound: Cannot find master domain: 'spUUID=c9b86219-0d51-44c3-a7de-e0fe07e2c9e6, msdUUID=00ed91f3-43be-41be-8c05-f3786588a1ad'

and

Thread-1268::ERROR::2013-01-24 17:51:49,073::task::853::TaskManager.Task::(_setError) Task=`95b7f58b-afe0-47bd-9ebd-21d3224f5165`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 861, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 528, in getSpmStatus
    pool = self.getPool(spUUID)
  File "/usr/share/vdsm/storage/hsm.py", line 265, in getPool
    raise se.StoragePoolUnknown(spUUID)
StoragePoolUnknown: Unknown pool id, pool not connected: ('c9b86219-0d51-44c3-a7de-e0fe07e2c9e6',)

while engine logs:

2013-01-24 17:51:46,050 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (QuartzScheduler_Worker-43) [49026692] Command org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand return value
  Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusOnlyReturnForXmlRpc
  mStatus Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusForXmlRpc
  mCode 304
  mMessage Cannot find master domain: 'spUUID=c9b86219-0d51-44c3-a7de-e0fe07e2c9e6, msdUUID=00ed91f3-43be-41be-8c05-f3786588a1ad'

Vdsm and engine logs are also attached. I set the affected host back to maintenance. How can I recover from this and attach the storage domain again? If more information is needed, please do not hesitate to request it.

This is on CentOS 6.3 using Dreyou's rpms. Installed versions on host:

vdsm.x86_64 4.10.0-0.44.14.el6
vdsm-cli.noarch 4.10.0-0.44.14.el6
vdsm-python.x86_64 4.10.0-0.44.14.el6
vdsm-xmlrpc.noarch 4.10.0-0.44.14.el6

Engine:

ovirt-engine.noarch 3.1.0-3.19.el6
ovirt-engine-backend.noarch 3.1.0-3.19.el6
ovirt-engine-cli.noarch 3.1.0.7-1.el6
ovirt-engine-config.noarch 3.1.0-3.19.el6
ovirt-engine-dbscripts.noarch 3.1.0-3.19.el6
ovirt-engine-genericapi.noarch 3.1.0-3.19.el6
ovirt-engine-jbossas711.x86_64 1-0
ovirt-engine-notification-service.noarch 3.1.0-3.19.el6
ovirt-engine-restapi.noarch 3.1.0-3.19.el6
ovirt-engine-sdk.noarch 3.1.0.5-1.el6
ovirt-engine-setup.noarch 3.1.0-3.19.el6
ovirt-engine-tools-common.noarch 3.1.0-3.19.el6
ovirt-engine-userportal.noarch 3.1.0-3.19.el6
ovirt-engine-webadmin-portal.noarch 3.1.0-3.19.el6
ovirt-image-uploader.noarch 3.1.0-16.el6
ovirt-iso-uploader.noarch 3.1.0-16.el6
ovirt-log-collector.noarch 3.1.0-16.el6

Thanks and regards
Patrick

--
Lobster LOGsuite GmbH, Münchner Straße 15a, D-82319 Starnberg
HRB 178831, Amtsgericht München
Geschäftsführer: Dr. Martin Fischer, Rolf Henrich

On 24.01.2013 18:05, Patrick Hurrelmann wrote:
> [...]
OK, managed to solve it. I force-removed the datacenter and reinstalled the host. I then added a new local storage domain to it and re-created the VMs (the disk images were moved over from the old, non-working local storage and renamed). So this host is up and running again.

Regards
Patrick
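
For anyone searching their own vdsm.log for the same failure, the pool and master domain UUIDs can be pulled out of the error message with a short script. This is only a minimal sketch, assuming the message has exactly the format shown in the logs above; the function name `parse_master_not_found` is illustrative and not part of vdsm.

```python
import re

# Matches the message format seen in the vdsm log above.
# Assumption: both UUIDs are lowercase hex in the usual 36-character form.
MASTER_NOT_FOUND = re.compile(
    r"Cannot find master domain: "
    r"'spUUID=(?P<spUUID>[0-9a-f-]{36}), msdUUID=(?P<msdUUID>[0-9a-f-]{36})'"
)


def parse_master_not_found(line):
    """Return (spUUID, msdUUID) if the line carries the error, else None."""
    match = MASTER_NOT_FOUND.search(line)
    if match is None:
        return None
    return match.group("spUUID"), match.group("msdUUID")


# Sample line copied from the traceback in this thread.
sample = (
    "StoragePoolMasterNotFound: Cannot find master domain: "
    "'spUUID=c9b86219-0d51-44c3-a7de-e0fe07e2c9e6, "
    "msdUUID=00ed91f3-43be-41be-8c05-f3786588a1ad'"
)
print(parse_master_not_found(sample))
```

The msdUUID is the storage domain vdsm expects to find as master, so it is the identifier to look for when checking whether the local storage path is still present on the host after the reboot.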