I was experimenting with some Gluster performance options on my storage volume and managed to crash it.

After recovering the volume (and letting it heal), my storage domain won't come up.

I'm getting these errors in vdsm.log:

Thread-24::WARNING::2014-03-13 15:51:18,741::lvm::391::Storage.LVM::(_reloadvgs) lvm vgs failed: 5 [] ['  Volume group "47df1e9f-6a83-46b9-a7ec-9568abd10d1a" not found', '  Skipping volume group 47df1e9f-6a83-46b9-a7ec-9568abd10d1a']
Thread-24::DEBUG::2014-03-13 15:51:18,741::lvm::428::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' released the operation mutex
Thread-24::ERROR::2014-03-13 15:51:18,748::sdc::143::Storage.StorageDomainCache::(_findDomain) domain 47df1e9f-6a83-46b9-a7ec-9568abd10d1a not found
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 171, in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: ('47df1e9f-6a83-46b9-a7ec-9568abd10d1a',)
Thread-24::ERROR::2014-03-13 15:51:18,748::domainMonitor::231::Storage.DomainMonitorThread::(_monitorDomain) Error while collecting domain 47df1e9f-6a83-46b9-a7ec-9568abd10d1a monitoring information
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 196, in _monitorDomain
    self.domain = sdCache.produce(self.sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 98, in produce
    domain.getRealDomain()
  File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 122, in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 171, in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: ('47df1e9f-6a83-46b9-a7ec-9568abd10d1a',)
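
To confirm which domain UUIDs vdsm keeps asking for, the exception lines can be tallied with a short script. This is only a rough sketch: the function name `count_missing_domains` and the `SAMPLE` list are made up for illustration, and the regex assumes the exception line always has the exact shape shown in the excerpt above.

```python
import re
from collections import Counter

# Assumes the exception line always looks like the ones pasted above:
# StorageDomainDoesNotExist: Storage domain does not exist: ('<uuid>',)
MISSING_DOMAIN = re.compile(
    r"StorageDomainDoesNotExist: Storage domain does not exist: "
    r"\('([0-9a-f-]{36})',\)"
)


def count_missing_domains(lines):
    """Return a Counter mapping each missing domain UUID to its hit count."""
    counts = Counter()
    for line in lines:
        match = MISSING_DOMAIN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample built from lines in the excerpt above.
SAMPLE = [
    "Thread-24::ERROR::2014-03-13 15:51:18,748::sdc::143::Storage.StorageDomainCache::(_findDomain) domain 47df1e9f-6a83-46b9-a7ec-9568abd10d1a not found",
    "StorageDomainDoesNotExist: Storage domain does not exist: ('47df1e9f-6a83-46b9-a7ec-9568abd10d1a',)",
    "StorageDomainDoesNotExist: Storage domain does not exist: ('47df1e9f-6a83-46b9-a7ec-9568abd10d1a',)",
]

for uuid, hits in count_missing_domains(SAMPLE).most_common():
    print(uuid, hits)
```

Against a real log, the same function can be fed an open file object instead of `SAMPLE`, since it only iterates over lines.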

I destroyed the storage domain through the UI (47df1e9f-6a83-46b9-a7ec-9568abd10d1a was an ISO domain) but still get the same errors. Re-installing the hosts made no difference. Does vdsm build a local cache that it might still be referencing, and that I can clear out?

The engine database shows only one storage domain, which is my Gluster SD:

engine=# select * from storage_domain_static;
                  id                  |               storage                | storage_name | storage_domain_type | storage_type | storage_domain_format_type |         _create_date          |         _update_date          | recoverable | last_time_used_as_master | storage_description | storage_comment 
--------------------------------------+--------------------------------------+--------------+---------------------+--------------+----------------------------+-------------------------------+-------------------------------+-------------+--------------------------+---------------------+-----------------
 0de3b516-6c74-4ad8-8958-d3f571ceda8d | e329ad38-b5d8-47a3-ac01-3dfdd5193032 | rep2         |                   0 |            7 | 3                          | 2014-02-23 16:20:42.327492-05 | 2014-02-23 16:20:49.094751-05 | t           |                        0 |                     | 
(1 row)

engine=# select * from storage_domain_dynamic;
                  id                  | available_disk_size | used_disk_size |         _update_date          
--------------------------------------+---------------------+----------------+-------------------------------
 0de3b516-6c74-4ad8-8958-d3f571ceda8d |                1887 |            160 | 2014-03-12 01:59:52.258384-04
(1 row)


[root@ovirt001 vdsm]# vdsClient -s 0 getStorageDomainsList

[root@ovirt001 vdsm]# 

engine.log repeats the following:

TaskStatusListReturnForXmlRpc [mStatus=StatusForXmlRpc [mCode=654, mMessage=Not SPM]]

2014-03-13 16:02:03,538 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (DefaultQuartzScheduler_Worker-21) [4f649953] HostName = ovirt001
2014-03-13 16:02:03,538 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (DefaultQuartzScheduler_Worker-21) [4f649953] Command HSMGetAllTasksStatusesVDS execution failed. Exception: IRSNonOperationalException: IRSGenericException: IRSErrorException: IRSNonOperationalException: Not SPM
2014-03-13 16:02:03,539 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (DefaultQuartzScheduler_Worker-21) [4f649953] FINISH, SpmStopVDSCommand, log id: 41fe5668
2014-03-13 16:02:03,539 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-21) [4f649953] spm stop succeeded, continuing with spm selection
2014-03-13 16:02:03,560 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-21) [4f649953] starting spm on vds ovirt002, storage pool IT, prevId 1, LVER 7
2014-03-13 16:02:03,561 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (DefaultQuartzScheduler_Worker-21) [4f649953] START, SpmStartVDSCommand(HostName = ovirt002, HostId = fac716fc-baff-46fe-9323-fd581a26a983, storagePoolId = 8da661c0-8125-4efb-851e-c9320d268578, prevId=1, prevLVER=7, storagePoolFormatType=V3, recoveryMode=Manual, SCSIFencing=true), log id: 53385b6b
2014-03-13 16:02:03,572 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (DefaultQuartzScheduler_Worker-21) [4f649953] spmStart polling started: taskId = 0fb302dd-32ff-4442-aa98-5eecaf571da4
2014-03-13 16:02:05,598 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetTaskStatusVDSCommand] (DefaultQuartzScheduler_Worker-21) [4f649953] Failed in HSMGetTaskStatusVDS method
2014-03-13 16:02:05,599 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetTaskStatusVDSCommand] (DefaultQuartzScheduler_Worker-21) [4f649953] Error code AcquireHostIdFailure and error message VDSGenericException: VDSErrorException: Failed to HSMGetTaskStatusVDS, error = Cannot acquire host id
2014-03-13 16:02:05,599 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (DefaultQuartzScheduler_Worker-21) [4f649953] spmStart polling ended: taskId = 0fb302dd-32ff-4442-aa98-5eecaf571da4 task status = finished
2014-03-13 16:02:05,600 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (DefaultQuartzScheduler_Worker-21) [4f649953] Start SPM Task failed - result: cleanSuccess, message: VDSGenericException: VDSErrorException: Failed to HSMGetTaskStatusVDS, error = Cannot acquire host id
2014-03-13 16:02:05,606 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (DefaultQuartzScheduler_Worker-21) [4f649953] spmStart polling ended, spm status: Free
2014-03-13 16:02:05,607 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand] (DefaultQuartzScheduler_Worker-21) [4f649953] START, HSMClearTaskVDSCommand(HostName = ovirt002, HostId = fac716fc-baff-46fe-9323-fd581a26a983, taskId=0fb302dd-32ff-4442-aa98-5eecaf571da4), log id: 6e65ac77
2014-03-13 16:02:05,613 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand] (DefaultQuartzScheduler_Worker-21) [4f649953] FINISH, HSMClearTaskVDSCommand, log id: 6e65ac77
2014-03-13 16:02:05,613 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (DefaultQuartzScheduler_Worker-21) [4f649953] FINISH, SpmStartVDSCommand, return: org.ovirt.engine.core.common.businessentities.SpmStatusResult@798917ef, log id: 53385b6b
2014-03-13 16:02:05,615 INFO  [org.ovirt.engine.core.bll.storage.SetStoragePoolStatusCommand] (DefaultQuartzScheduler_Worker-21) [17828a3f] Running command: SetStoragePoolStatusCommand internal: true. Entities affected :  ID: 8da661c0-8125-4efb-851e-c9320d268578 Type: StoragePool
2014-03-13 16:02:05,656 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-21) [17828a3f] IrsBroker::Failed::GetStoragePoolInfoVDS due to: IrsSpmStartFailedException: IRSGenericException: IRSErrorException: SpmStart failed
2014-03-13 16:02:15,674 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (DefaultQuartzScheduler_Worker-39) [7b210b3d] Command org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand return value 
 
TaskStatusListReturnForXmlRpc [mStatus=StatusForXmlRpc [mCode=654, mMessage=Not SPM]]
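
Since the engine.log excerpt is noisy, a small script can pull out just the ERROR lines and tally them by the command class that logged them. This is a sketch under the assumption that every entry follows the "timestamp LEVEL [logger] (thread) [correlation id] message" layout seen above. The names `ENGINE_LINE`, `errors_by_command` and `SAMPLE` are made up for illustration.

```python
import re
from collections import Counter

# Assumed layout, based only on the excerpt above:
# <timestamp> <LEVEL> [<logger class>] (<thread>) [<correlation id>] <message>
ENGINE_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[\w.]+)\]\s+"
    r"\((?P<thread>[^)]*)\)\s+"
    r"(?:\[(?P<corr>\w+)\]\s+)?"
    r"(?P<msg>.*)$"
)


def errors_by_command(lines):
    """Count ERROR entries per logger, keyed by the short class name."""
    counts = Counter()
    for line in lines:
        match = ENGINE_LINE.match(line)
        if match and match.group("level") == "ERROR":
            short_name = match.group("logger").rsplit(".", 1)[-1]
            counts[short_name] += 1
    return counts


# Hypothetical sample: three lines copied from the excerpt above.
SAMPLE = [
    "2014-03-13 16:02:03,539 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-21) [4f649953] spm stop succeeded, continuing with spm selection",
    "2014-03-13 16:02:05,598 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetTaskStatusVDSCommand] (DefaultQuartzScheduler_Worker-21) [4f649953] Failed in HSMGetTaskStatusVDS method",
    "2014-03-13 16:02:05,656 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-21) [17828a3f] IrsBroker::Failed::GetStoragePoolInfoVDS due to: IrsSpmStartFailedException: IRSGenericException: IRSErrorException: SpmStart failed",
]

for name, hits in errors_by_command(SAMPLE).most_common():
    print(name, hits)
```

Lines that do not match the layout, such as the bare TaskStatusListReturnForXmlRpc continuation lines, are simply skipped.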


Any help is appreciated. I've attached vdsm.log from both hosts and engine.log.


Steve Dainard 
IT Infrastructure Manager
Miovision | Rethink Traffic
