[Users] "Volume Group does not exist". Blame device-mapper ?

Nicolas Ecarnot nicolas at ecarnot.net
Mon Jan 27 07:26:31 UTC 2014


On 26/01/2014 23:23, Itamar Heim wrote:
> On 01/20/2014 12:06 PM, Nicolas Ecarnot wrote:
>> Hi,
>>
>> oVirt 3.3, no big issue since the recent snapshot joke, and all in all
>> running fine.
>>
>> All my VMs are stored on an iSCSI SAN. Each VM usually uses only one
>> or two disks (1: system, 2: data) and that works fine.
>>
>> On Friday, I created a new LUN. From inside a VM, I connected to it via
>> iscsiadm and successfully logged in to the LUN (session, automatic
>> attach on boot, read, write): nice.
>>
>> Then, after detaching it and shutting down the VM, I tried for the
>> first time to use the "direct attach" feature to attach the disk
>> directly from oVirt, logging in to the session via oVirt.
>> It connected nicely and I saw the disk appear in my VM as /dev/sda or
>> whatever. I was able to mount it, read and write.
>>
>> Then disaster struck: many nodes suddenly became unresponsive and
>> quickly migrated their VMs to the remaining nodes.
>> Fortunately, the migrations ran fine and I lost no VMs and had no
>> downtime, but I had to reboot every affected node (other actions failed).
>>
>> On the failing nodes, /var/log/messages showed the log you can read at
>> the end of this message.
>> I first get device-mapper warnings, then the host becomes unable to
>> work with the logical volumes.
>>
>> The three volumes are the three main storage domains where I store my
>> oVirt VMs, perfectly up and running.
>>
>> My thoughts:
>> - I'm not sure device-mapper is to blame. I frequently see device-mapper
>> complaining while nothing gets worse (and not only with oVirt).
>> - I have not changed my network settings for months (bonding, linking...).
>> The only new factor is the use of a directly attached LUN.
>> - This morning I was able to reproduce the bug just by trying this
>> attachment again and booting the VM. No mounting of the LUN; just
>> booting the VM and waiting is enough to crash oVirt.
>> - When the disaster happens, usually only three of the nodes get hit:
>> the ones that are running VMs. Obviously, after migration, different
>> nodes are hosting the VMs, and those new nodes are the ones that then
>> get hit.
>>
>> This is quite reproducible.
>>
>> And frightening.
>>
>>
>> The log :
>>
>> Jan 20 10:20:45 serv-vm-adm11 kernel: device-mapper: table: 253:36:
>> multipath: error getting device
>> Jan 20 10:20:45 serv-vm-adm11 kernel: device-mapper: ioctl: error adding
>> target to table
>> Jan 20 10:20:45 serv-vm-adm11 kernel: device-mapper: table: 253:36:
>> multipath: error getting device
>> Jan 20 10:20:45 serv-vm-adm11 kernel: device-mapper: ioctl: error adding
>> target to table
>> Jan 20 10:20:47 serv-vm-adm11 vdsm TaskManager.Task ERROR
>> Task=`847653e6-8b23-4429-ab25-257538b35293`::Unexpected error
>> Traceback (most recent call last):
>>   File "/usr/share/vdsm/storage/task.py", line 857, in _run
>>     return fn(*args, **kargs)
>>   File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
>>     res = f(*args, **kwargs)
>>   File "/usr/share/vdsm/storage/hsm.py", line 3053, in getVolumeSize
>>     volUUID, bs=1))
>>   File "/usr/share/vdsm/storage/volume.py", line 333, in getVSize
>>     mysd = sdCache.produce(sdUUID=sdUUID)
>>   File "/usr/share/vdsm/storage/sdc.py", line 98, in produce
>>     domain.getRealDomain()
>>   File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain
>>     return self._cache._realProduce(self._sdUUID)
>>   File "/usr/share/vdsm/storage/sdc.py", line 122, in _realProduce
>>     domain = self._findDomain(sdUUID)
>>   File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain
>>     dom = findMethod(sdUUID)
>>   File "/usr/share/vdsm/storage/blockSD.py", line 1288, in findDomain
>>     return BlockStorageDomain(BlockStorageDomain.findDomainPath(sdUUID))
>>   File "/usr/share/vdsm/storage/blockSD.py", line 414, in __init__
>>     lvm.checkVGBlockSizes(sdUUID, (self.logBlkSize, self.phyBlkSize))
>>   File "/usr/share/vdsm/storage/lvm.py", line 976, in checkVGBlockSizes
>>     raise se.VolumeGroupDoesNotExist("vg_uuid: %s" % vgUUID)
>> VolumeGroupDoesNotExist: Volume Group does not exist:
>> ('vg_uuid: 1429ffe2-4137-416c-bb38-63fd73f4bcc1',)
>> Jan 20 10:20:47 serv-vm-adm11 vdsm vm.Vm ERROR
>> vmId=`2c0bbb51-0f94-4bf1-9579-4e897260f88e`::Unable to update the volume
>> 80bac371-6899-4fbe-a8e1-272037186bfb (domain:
>> 1429ffe2-4137-416c-bb38-63fd73f4bcc1 image:
>> a5995c25-cdc9-4499-b9b4-08394a38165c) for the drive vda
>> Jan 20 10:20:48 serv-vm-adm11 vdsm TaskManager.Task ERROR
>> Task=`886e07bd-637b-4286-8a44-08dce5c8b207`::Unexpected error
>> Traceback (most recent call last):
>>   File "/usr/share/vdsm/storage/task.py", line 857, in _run
>>     return fn(*args, **kargs)
>>   File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
>>     res = f(*args, **kwargs)
>>   File "/usr/share/vdsm/storage/hsm.py", line 3053, in getVolumeSize
>>     volUUID, bs=1))
>>   File "/usr/share/vdsm/storage/volume.py", line 333, in getVSize
>>     mysd = sdCache.produce(sdUUID=sdUUID)
>>   File "/usr/share/vdsm/storage/sdc.py", line 98, in produce
>>     domain.getRealDomain()
>>   File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain
>>     return self._cache._realProduce(self._sdUUID)
>>   File "/usr/share/vdsm/storage/sdc.py", line 122, in _realProduce
>>     domain = self._findDomain(sdUUID)
>>   File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain
>>     dom = findMethod(sdUUID)
>>   File "/usr/share/vdsm/storage/blockSD.py", line 1288, in findDomain
>>     return BlockStorageDomain(BlockStorageDomain.findDomainPath(sdUUID))
>>   File "/usr/share/vdsm/storage/blockSD.py", line 414, in __init__
>>     lvm.checkVGBlockSizes(sdUUID, (self.logBlkSize, self.phyBlkSize))
>>   File "/usr/share/vdsm/storage/lvm.py", line 976, in checkVGBlockSizes
>>     raise se.VolumeGroupDoesNotExist("vg_uuid: %s" % vgUUID)
>> VolumeGroupDoesNotExist: Volume Group does not exist:
>> ('vg_uuid: 1429ffe2-4137-416c-bb38-63fd73f4bcc1',)
>> Jan 20 10:20:48 serv-vm-adm11 vdsm vm.Vm ERROR
>> vmId=`2c0bbb51-0f94-4bf1-9579-4e897260f88e`::Unable to update the volume
>> ea9c8f12-4eb6-42de-b6d6-6296555d0ac0 (domain:
>> 1429ffe2-4137-416c-bb38-63fd73f4bcc1 image:
>> f42e0c9d-ad1b-4337-b82c-92914153ff44) for the drive vdb
>> Jan 20 10:21:03 serv-vm-adm11 vdsm TaskManager.Task ERROR
>> Task=`27bb14f9-0cd1-4316-95b0-736d162d5681`::Unexpected error
>> Traceback (most recent call last):
>>   File "/usr/share/vdsm/storage/task.py", line 857, in _run
>>     return fn(*args, **kargs)
>>   File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
>>     res = f(*args, **kwargs)
>>   File "/usr/share/vdsm/storage/hsm.py", line 3053, in getVolumeSize
>>     volUUID, bs=1))
>>   File "/usr/share/vdsm/storage/volume.py", line 333, in getVSize
>>     mysd = sdCache.produce(sdUUID=sdUUID)
>>   File "/usr/share/vdsm/storage/sdc.py", line 98, in produce
>>     domain.getRealDomain()
>>   File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain
>>     return self._cache._realProduce(self._sdUUID)
>>   File "/usr/share/vdsm/storage/sdc.py", line 122, in _realProduce
>>     domain = self._findDomain(sdUUID)
>>   File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain
>>     dom = findMethod(sdUUID)
>>   File "/usr/share/vdsm/storage/blockSD.py", line 1288, in findDomain
>>     return BlockStorageDomain(BlockStorageDomain.findDomainPath(sdUUID))
>>   File "/usr/share/vdsm/storage/blockSD.py", line 414, in __init__
>>     lvm.checkVGBlockSizes(sdUUID, (self.logBlkSize, self.phyBlkSize))
>>   File "/usr/share/vdsm/storage/lvm.py", line 976, in checkVGBlockSizes
>>     raise se.VolumeGroupDoesNotExist("vg_uuid: %s" % vgUUID)
>> VolumeGroupDoesNotExist: Volume Group does not exist:
>> ('vg_uuid: 83d39199-d4e4-474c-b232-7088c76a2811',)
>>
>>
>>
>
> was this diagnosed/resolved?

- Diagnosed: I found no deeper way to diagnose this issue.
- Resolved: I neither found nor received any way to solve it.
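For the record, here is the kind of first-pass check I run on an affected node. This is only a sketch: the error line and VG UUID are taken from the log above, the lvm/multipath/iscsiadm commands are the standard tools, and the exact output obviously depends on the setup.

```shell
#!/bin/sh
# Sketch of a first-pass diagnostic for "VolumeGroupDoesNotExist" on a node.
# The log line below is copied from the vdsm error in /var/log/messages.
LOG_LINE="VolumeGroupDoesNotExist: Volume Group does not exist: ('vg_uuid: 1429ffe2-4137-416c-bb38-63fd73f4bcc1',)"

# 1. Extract the UUID of the storage-domain VG that vdsm can no longer see.
VG_UUID=$(echo "$LOG_LINE" | sed -n 's/.*vg_uuid: \([0-9a-f-]*\).*/\1/p')
echo "missing VG: $VG_UUID"

# 2. On the node itself (as root), ask LVM whether the VG is visible at all:
#      vgs --noheadings -o vg_name,vg_uuid | grep "$VG_UUID"
# 3. Check whether the multipath maps and iSCSI sessions behind it are healthy:
#      multipath -ll
#      iscsiadm -m session -P 3
```

If the VG shows up in `vgs` but vdsm still raises the error, the problem is more likely on the vdsm/multipath side than in LVM itself.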

-- 
Nicolas Ecarnot
