[Users] Gluster volume options change, crash, vdsm reports SD does not exist

I was playing with some gluster performance options on my storage volume and managed to crash it. On recovery (and heal) my storage domain won't come up.

I'm getting these errors in vdsm.log:

Thread-24::WARNING::2014-03-13 15:51:18,741::lvm::391::Storage.LVM::(_reloadvgs) lvm vgs failed: 5 [] [' Volume group "47df1e9f-6a83-46b9-a7ec-9568abd10d1a" not found', ' Skipping volume group 47df1e9f-6a83-46b9-a7ec-9568abd10d1a']
Thread-24::DEBUG::2014-03-13 15:51:18,741::lvm::428::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' released the operation mutex
Thread-24::ERROR::2014-03-13 15:51:18,748::sdc::143::Storage.StorageDomainCache::(_findDomain) domain 47df1e9f-6a83-46b9-a7ec-9568abd10d1a not found
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 171, in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: ('47df1e9f-6a83-46b9-a7ec-9568abd10d1a',)
Thread-24::ERROR::2014-03-13 15:51:18,748::domainMonitor::231::Storage.DomainMonitorThread::(_monitorDomain) Error while collecting domain 47df1e9f-6a83-46b9-a7ec-9568abd10d1a monitoring information
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 196, in _monitorDomain
    self.domain = sdCache.produce(self.sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 98, in produce
    domain.getRealDomain()
  File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 122, in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 171, in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: ('47df1e9f-6a83-46b9-a7ec-9568abd10d1a',)

I destroyed (UI) the storage domain (47df1e9f-6a83-46b9-a7ec-9568abd10d1a was an iso domain) but still get the same errors. I've tried re-installing the hosts without change. Is there a local cache vdsm builds that it might be referencing that I can clear out?

I'm only showing one SD in the db which is my gluster SD:

engine=# select * from storage_domain_static;
                  id                  |               storage                | storage_name | storage_domain_type | storage_type | storage_domain_format_type |         _create_date          |         _update_date          | recoverable | last_time_used_as_master | storage_description | storage_comment
--------------------------------------+--------------------------------------+--------------+---------------------+--------------+----------------------------+-------------------------------+-------------------------------+-------------+--------------------------+---------------------+-----------------
 0de3b516-6c74-4ad8-8958-d3f571ceda8d | e329ad38-b5d8-47a3-ac01-3dfdd5193032 | rep2         |                   0 |            7 | 3                          | 2014-02-23 16:20:42.327492-05 | 2014-02-23 16:20:49.094751-05 | t           |                        0 |                     |
(1 row)

engine=# select * from storage_domain_;
storage_domain_dynamic      storage_domain_file_repos   storage_domain_static       storage_domain_static_view

engine=# select * from storage_domain_dynamic;
                  id                  | available_disk_size | used_disk_size |         _update_date
--------------------------------------+---------------------+----------------+-------------------------------
 0de3b516-6c74-4ad8-8958-d3f571ceda8d |                1887 |            160 | 2014-03-12 01:59:52.258384-04
(1 row)

[root@ovirt001 vdsm]# vdsClient -s 0 getStorageDomainsList

[root@ovirt001 vdsm]#

engine.log contains this repeating:

TaskStatusListReturnForXmlRpc [mStatus=StatusForXmlRpc [mCode=654, mMessage=Not SPM]]
2014-03-13 16:02:03,538 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (DefaultQuartzScheduler_Worker-21) [4f649953] HostName = ovirt001
2014-03-13 16:02:03,538 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (DefaultQuartzScheduler_Worker-21) [4f649953] Command HSMGetAllTasksStatusesVDS execution failed. Exception: IRSNonOperationalException: IRSGenericException: IRSErrorException: IRSNonOperationalException: Not SPM
2014-03-13 16:02:03,539 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (DefaultQuartzScheduler_Worker-21) [4f649953] FINISH, SpmStopVDSCommand, log id: 41fe5668
2014-03-13 16:02:03,539 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-21) [4f649953] spm stop succeeded, continuing with spm selection
2014-03-13 16:02:03,560 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-21) [4f649953] starting spm on vds ovirt002, storage pool IT, prevId 1, LVER 7
2014-03-13 16:02:03,561 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (DefaultQuartzScheduler_Worker-21) [4f649953] START, SpmStartVDSCommand(HostName = ovirt002, HostId = fac716fc-baff-46fe-9323-fd581a26a983, storagePoolId = 8da661c0-8125-4efb-851e-c9320d268578, prevId=1, prevLVER=7, storagePoolFormatType=V3, recoveryMode=Manual, SCSIFencing=true), log id: 53385b6b
2014-03-13 16:02:03,572 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (DefaultQuartzScheduler_Worker-21) [4f649953] spmStart polling started: taskId = 0fb302dd-32ff-4442-aa98-5eecaf571da4
2014-03-13 16:02:05,598 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetTaskStatusVDSCommand] (DefaultQuartzScheduler_Worker-21) [4f649953] Failed in HSMGetTaskStatusVDS method
2014-03-13 16:02:05,599 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetTaskStatusVDSCommand] (DefaultQuartzScheduler_Worker-21) [4f649953] Error code AcquireHostIdFailure and error message VDSGenericException: VDSErrorException: Failed to HSMGetTaskStatusVDS, error = Cannot acquire host id
2014-03-13 16:02:05,599 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (DefaultQuartzScheduler_Worker-21) [4f649953] spmStart polling ended: taskId = 0fb302dd-32ff-4442-aa98-5eecaf571da4 task status = finished
2014-03-13 16:02:05,600 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (DefaultQuartzScheduler_Worker-21) [4f649953] Start SPM Task failed - result: cleanSuccess, message: VDSGenericException: VDSErrorException: Failed to HSMGetTaskStatusVDS, error = Cannot acquire host id
2014-03-13 16:02:05,606 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (DefaultQuartzScheduler_Worker-21) [4f649953] spmStart polling ended, spm status: Free
2014-03-13 16:02:05,607 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand] (DefaultQuartzScheduler_Worker-21) [4f649953] START, HSMClearTaskVDSCommand(HostName = ovirt002, HostId = fac716fc-baff-46fe-9323-fd581a26a983, taskId=0fb302dd-32ff-4442-aa98-5eecaf571da4), log id: 6e65ac77
2014-03-13 16:02:05,613 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand] (DefaultQuartzScheduler_Worker-21) [4f649953] FINISH, HSMClearTaskVDSCommand, log id: 6e65ac77
2014-03-13 16:02:05,613 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (DefaultQuartzScheduler_Worker-21) [4f649953] FINISH, SpmStartVDSCommand, return: org.ovirt.engine.core.common.businessentities.SpmStatusResult@798917ef, log id: 53385b6b
2014-03-13 16:02:05,615 INFO [org.ovirt.engine.core.bll.storage.SetStoragePoolStatusCommand] (DefaultQuartzScheduler_Worker-21) [17828a3f] Running command: SetStoragePoolStatusCommand internal: true. Entities affected : ID: 8da661c0-8125-4efb-851e-c9320d268578 Type: StoragePool
2014-03-13 16:02:05,656 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-21) [17828a3f] IrsBroker::Failed::GetStoragePoolInfoVDS due to: IrsSpmStartFailedException: IRSGenericException: IRSErrorException: SpmStart failed
2014-03-13 16:02:15,674 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (DefaultQuartzScheduler_Worker-39) [7b210b3d] Command org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand return value TaskStatusListReturnForXmlRpc [mStatus=StatusForXmlRpc [mCode=654, mMessage=Not SPM]]

Any help appreciated. vdsm.log from both hosts and engine.log attached.

Steve Dainard
IT Infrastructure Manager
Miovision <http://miovision.com/>