[Users] Gluster volume options change, crash, vdsm reports SD does not exist

Steve Dainard sdainard at miovision.com
Thu Mar 13 16:08:59 EDT 2014


I was playing with some gluster performance options on my storage volume
and managed to crash it.

After recovery (and heal), my storage domain won't come up.

I'm getting these errors in vdsm.log:

Thread-24::WARNING::2014-03-13 15:51:18,741::lvm::391::Storage.LVM::(_reloadvgs) lvm vgs failed: 5 [] ['  Volume group "47df1e9f-6a83-46b9-a7ec-9568abd10d1a" not found', '  Skipping volume group 47df1e9f-6a83-46b9-a7ec-9568abd10d1a']
Thread-24::DEBUG::2014-03-13 15:51:18,741::lvm::428::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' released the operation mutex
Thread-24::ERROR::2014-03-13 15:51:18,748::sdc::143::Storage.StorageDomainCache::(_findDomain) domain 47df1e9f-6a83-46b9-a7ec-9568abd10d1a not found
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 171, in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: ('47df1e9f-6a83-46b9-a7ec-9568abd10d1a',)
Thread-24::ERROR::2014-03-13 15:51:18,748::domainMonitor::231::Storage.DomainMonitorThread::(_monitorDomain) Error while collecting domain 47df1e9f-6a83-46b9-a7ec-9568abd10d1a monitoring information
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 196, in _monitorDomain
    self.domain = sdCache.produce(self.sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 98, in produce
    domain.getRealDomain()
  File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 122, in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 171, in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: ('47df1e9f-6a83-46b9-a7ec-9568abd10d1a',)
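
For anyone going through the attached vdsm.log, here is a minimal sketch (not part of vdsm) that tallies which storage domain UUIDs are reported as "not found". The function name `missing_domains` and the regex are assumptions based only on the message format in the excerpt above; whitespace is collapsed first because log lines pasted into mail get hard-wrapped.

```python
import re
from collections import Counter

# Matches the "domain <uuid> not found" message logged from sdc.py's
# _findDomain, as seen in the excerpt above. The pattern is an assumption
# derived from that excerpt, not from the vdsm source.
MISSING_SD = re.compile(
    r"\(_findDomain\) domain\s+"
    r"([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})"
    r"\s+not found"
)


def missing_domains(log_text):
    """Count how often each storage domain UUID is reported as not found."""
    # Collapse every whitespace run (including newlines) to a single space
    # so that hard-wrapped log entries still match.
    flat = " ".join(log_text.split())
    return Counter(MISSING_SD.findall(flat))
```

Reading the log file into a string and passing it to `missing_domains()` returns a `Counter` keyed by UUID. On a default install the log is usually at /var/log/vdsm/vdsm.log, but treat that path as an assumption for your hosts.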

I destroyed the storage domain through the UI
(47df1e9f-6a83-46b9-a7ec-9568abd10d1a was an ISO domain), but I still get
the same errors. Re-installing the hosts made no difference. Does vdsm
build a local cache that it might still be referencing, and can I clear it
out?

The db shows only one SD, which is my gluster SD:

engine=# select * from storage_domain_static;
id                         | 0de3b516-6c74-4ad8-8958-d3f571ceda8d
storage                    | e329ad38-b5d8-47a3-ac01-3dfdd5193032
storage_name               | rep2
storage_domain_type        | 0
storage_type               | 7
storage_domain_format_type | 3
_create_date               | 2014-02-23 16:20:42.327492-05
_update_date               | 2014-02-23 16:20:49.094751-05
recoverable                | t
last_time_used_as_master   | 0
storage_description        |
storage_comment            |
(1 row)

engine=# select * from storage_domain_;
storage_domain_dynamic      storage_domain_file_repos
storage_domain_static       storage_domain_static_view
engine=# select * from storage_domain_dynamic;
                  id                  | available_disk_size | used_disk_size |         _update_date
--------------------------------------+---------------------+----------------+-------------------------------
 0de3b516-6c74-4ad8-8958-d3f571ceda8d |                1887 |            160 | 2014-03-12 01:59:52.258384-04
(1 row)


[root@ovirt001 vdsm]# vdsClient -s 0 getStorageDomainsList

[root@ovirt001 vdsm]#

engine.log repeats the following:

TaskStatusListReturnForXmlRpc [mStatus=StatusForXmlRpc [mCode=654, mMessage=Not SPM]]

2014-03-13 16:02:03,538 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (DefaultQuartzScheduler_Worker-21) [4f649953] HostName = ovirt001
2014-03-13 16:02:03,538 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (DefaultQuartzScheduler_Worker-21) [4f649953] Command HSMGetAllTasksStatusesVDS execution failed. Exception: IRSNonOperationalException: IRSGenericException: IRSErrorException: IRSNonOperationalException: Not SPM
2014-03-13 16:02:03,539 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (DefaultQuartzScheduler_Worker-21) [4f649953] FINISH, SpmStopVDSCommand, log id: 41fe5668
2014-03-13 16:02:03,539 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-21) [4f649953] spm stop succeeded, continuing with spm selection
2014-03-13 16:02:03,560 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-21) [4f649953] starting spm on vds ovirt002, storage pool IT, prevId 1, LVER 7
2014-03-13 16:02:03,561 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (DefaultQuartzScheduler_Worker-21) [4f649953] START, SpmStartVDSCommand(HostName = ovirt002, HostId = fac716fc-baff-46fe-9323-fd581a26a983, storagePoolId = 8da661c0-8125-4efb-851e-c9320d268578, prevId=1, prevLVER=7, storagePoolFormatType=V3, recoveryMode=Manual, SCSIFencing=true), log id: 53385b6b
2014-03-13 16:02:03,572 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (DefaultQuartzScheduler_Worker-21) [4f649953] spmStart polling started: taskId = 0fb302dd-32ff-4442-aa98-5eecaf571da4
2014-03-13 16:02:05,598 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetTaskStatusVDSCommand] (DefaultQuartzScheduler_Worker-21) [4f649953] Failed in HSMGetTaskStatusVDS method
2014-03-13 16:02:05,599 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetTaskStatusVDSCommand] (DefaultQuartzScheduler_Worker-21) [4f649953] Error code AcquireHostIdFailure and error message VDSGenericException: VDSErrorException: Failed to HSMGetTaskStatusVDS, error = Cannot acquire host id
2014-03-13 16:02:05,599 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (DefaultQuartzScheduler_Worker-21) [4f649953] spmStart polling ended: taskId = 0fb302dd-32ff-4442-aa98-5eecaf571da4 task status = finished
2014-03-13 16:02:05,600 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (DefaultQuartzScheduler_Worker-21) [4f649953] Start SPM Task failed - result: cleanSuccess, message: VDSGenericException: VDSErrorException: Failed to HSMGetTaskStatusVDS, error = Cannot acquire host id
2014-03-13 16:02:05,606 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (DefaultQuartzScheduler_Worker-21) [4f649953] spmStart polling ended, spm status: Free
2014-03-13 16:02:05,607 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand] (DefaultQuartzScheduler_Worker-21) [4f649953] START, HSMClearTaskVDSCommand(HostName = ovirt002, HostId = fac716fc-baff-46fe-9323-fd581a26a983, taskId=0fb302dd-32ff-4442-aa98-5eecaf571da4), log id: 6e65ac77
2014-03-13 16:02:05,613 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand] (DefaultQuartzScheduler_Worker-21) [4f649953] FINISH, HSMClearTaskVDSCommand, log id: 6e65ac77
2014-03-13 16:02:05,613 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (DefaultQuartzScheduler_Worker-21) [4f649953] FINISH, SpmStartVDSCommand, return: org.ovirt.engine.core.common.businessentities.SpmStatusResult@798917ef, log id: 53385b6b
2014-03-13 16:02:05,615 INFO  [org.ovirt.engine.core.bll.storage.SetStoragePoolStatusCommand] (DefaultQuartzScheduler_Worker-21) [17828a3f] Running command: SetStoragePoolStatusCommand internal: true. Entities affected :  ID: 8da661c0-8125-4efb-851e-c9320d268578 Type: StoragePool
2014-03-13 16:02:05,656 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-21) [17828a3f] IrsBroker::Failed::GetStoragePoolInfoVDS due to: IrsSpmStartFailedException: IRSGenericException: IRSErrorException: SpmStart failed
2014-03-13 16:02:15,674 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (DefaultQuartzScheduler_Worker-39) [7b210b3d] Command org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand return value

TaskStatusListReturnForXmlRpc [mStatus=StatusForXmlRpc [mCode=654, mMessage=Not SPM]]
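
Since the cycle above repeats, a rough way to see which engine commands keep failing is to count ERROR entries per logger class. This is only a sketch: the function name `error_sources` and the regex are assumptions based on the entry layout in the excerpt (timestamp, level, bracketed class name), not on any oVirt tooling.

```python
import re
from collections import Counter

# Timestamp, the ERROR level, then the bracketed logger class, following
# the entry layout in the engine.log excerpt above (an assumption drawn
# from that excerpt only).
ENGINE_ERROR = re.compile(
    r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} ERROR \[([\w.$]+)\]"
)


def error_sources(log_text):
    """Count ERROR entries per logger class (short name, package dropped)."""
    # Collapse whitespace so entries that were hard-wrapped between the
    # level and the class name still match.
    flat = " ".join(log_text.split())
    return Counter(
        name.rsplit(".", 1)[-1] for name in ENGINE_ERROR.findall(flat)
    )
```

Passing the contents of engine.log to `error_sources()` returns a `Counter` keyed by command class, which makes it easier to spot whether the failures cluster around the SPM start path.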


Any help appreciated. vdsm.log from both hosts and engine.log attached.


Steve Dainard
-------------- next part --------------
A non-text attachment was scrubbed...
Name: logs.tar.gz
Type: application/x-gzip
Size: 2187849 bytes
Desc: not available
URL: <http://lists.ovirt.org/pipermail/users/attachments/20140313/fe3fe532/attachment-0001.gz>

