[Users] "Volume Group does not exist". Blame device-mapper?

Nicolas Ecarnot nicolas at ecarnot.net
Wed Jan 29 12:35:09 UTC 2014


On 29/01/2014 13:29, Maor Lipchuk wrote:
> Hi Nicolas,
>
> Can you please attach the VDSM logs of the problematic and valid
> nodes, the engine log, and also the sanlock log.
>
> You wrote that many nodes suddenly began to become
> unresponsive.
> Do you mean that the hosts switched to non-responsive status in the engine?
> I'm asking because non-responsive status indicates that the engine
> could not communicate with the hosts. It could be related to sanlock,
> since if a host encounters a problem writing to the master domain,
> sanlock restarts VDSM, which makes the host non-responsive.
>
> regards,
> Maor

It will be hard work to provide these logs, but I will try ASAP.
But to answer your question: the engine saw the failing nodes as 
unresponsive, yet I was always fully able to ping them and log in to them via SSH.
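
For reference, here are a few read-only checks I can run on an affected node
to see whether the kernel and LVM still see the storage-domain devices. These
are generic LVM/multipath commands, nothing oVirt-specific; the VG name used
below is the storage-domain UUID taken from the log at the end of this message:

```shell
# Read-only diagnostics on an affected node (run as root).

multipath -ll                              # are all iSCSI paths still active?
pvs -o pv_name,vg_name,vg_uuid             # does LVM still see the PVs and their VGs?
vgs 1429ffe2-4137-416c-bb38-63fd73f4bcc1   # the VG that vdsm claims does not exist
dmsetup table | head                       # which device-mapper tables are loaded
```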

Is there some place I could read further doc about sanlock?
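
(For anyone else looking: besides the docs, sanlock's own CLI can show its
current view of the lockspaces. These are standard sanlock subcommands, run
as root on a hypervisor:)

```shell
# Inspect sanlock state on a host (run as root).
sanlock client status             # lockspaces and resources sanlock currently holds
sanlock client log_dump           # dump sanlock's internal in-memory log
tail -n 50 /var/log/sanlock.log   # recent on-disk sanlock messages
```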

Nicolas Ecarnot

>
> On 01/27/2014 09:26 AM, Nicolas Ecarnot wrote:
>> On 26/01/2014 23:23, Itamar Heim wrote:
>>> On 01/20/2014 12:06 PM, Nicolas Ecarnot wrote:
>>>> Hi,
>>>>
>>>> oVirt 3.3, no big issues since the recent snapshot joke; all in all
>>>> running fine.
>>>>
>>>> All my VMs are stored on an iSCSI SAN. The VMs usually use only one
>>>> or two disks (1: system, 2: data) and that works fine.
>>>>
>>>> On Friday, I created a new LUN. Inside a VM, I connected to it via iscsiadm
>>>> and successfully logged in to the LUN (session, automatic attach on boot,
>>>> read, write): nice.
>>>>
>>>> Then, after detaching it and shutting down the VM, I tried for the first
>>>> time the "direct attach" feature, to attach the disk directly from oVirt
>>>> and log in to the session via oVirt.
>>>> The connection went fine and I saw the disk appear in my VM as /dev/sda or
>>>> whatever. I was able to mount it, read and write.
>>>>
>>>> Then disaster struck: many nodes suddenly became
>>>> unresponsive, quickly migrating their VMs to the remaining nodes.
>>>> Fortunately, the migrations ran fine and I lost no VMs and had no
>>>> downtime, but I had to reboot every affected node (other actions failed).
>>>>
>>>> On the failing nodes, /var/log/messages shows the log you can read at
>>>> the end of this message.
>>>> First come device-mapper warnings, then the host becomes unable to work
>>>> with the logical volumes.
>>>>
>>>> The 3 volume groups are my three main storage domains, perfectly up and
>>>> running, where I store my oVirt VMs.
>>>>
>>>> My thoughts:
>>>> - I'm not sure device-mapper is to blame. I frequently see device-mapper
>>>> complaining and nothing gets worse (not oVirt-specifically).
>>>> - I have not changed my network settings for months (bonding, linking...).
>>>> The only new factor is the use of the direct-attach LUN.
>>>> - This morning I was able to reproduce the bug just by trying this
>>>> attachment again and booting the VM. No mounting of the LUN; just booting
>>>> the VM and waiting is enough to crash oVirt.
>>>> - When the disaster happens, usually only three nodes get struck: the
>>>> ones that run VMs. Obviously, after migration, different nodes host the
>>>> VMs, and those new nodes are the ones that then get struck.
>>>>
>>>> This is quite reproducible.
>>>>
>>>> And frightening.
>>>>
>>>>
>>>> The log :
>>>>
>>>> Jan 20 10:20:45 serv-vm-adm11 kernel: device-mapper: table: 253:36:
>>>> multipath: error getting device
>>>> Jan 20 10:20:45 serv-vm-adm11 kernel: device-mapper: ioctl: error adding
>>>> target to table
>>>> Jan 20 10:20:45 serv-vm-adm11 kernel: device-mapper: table: 253:36:
>>>> multipath: error getting device
>>>> Jan 20 10:20:45 serv-vm-adm11 kernel: device-mapper: ioctl: error adding
>>>> target to table
>>>> Jan 20 10:20:47 serv-vm-adm11 vdsm TaskManager.Task ERROR
>>>> Task=`847653e6-8b23-4429-ab25-257538b35293`::Unexpected
>>>> error#012Traceback (most recent call last):#012  File
>>>> "/usr/share/vdsm/storage/task.py", line 857, in _run#012    return
>>>> fn(*args, **kargs)#012  File "/usr/share/vdsm/logUtils.py", line 45, in
>>>> wrapper#012    res = f(*args, **kwargs)#012  File
>>>> "/usr/share/vdsm/storage/hsm.py", line 3053, in getVolumeSize#012
>>>> volUUID, bs=1))#012  File "/usr/share/vdsm/storage/volume.py", line 333,
>>>> in getVSize#012    mysd = sdCache.produce(sdUUID=sdUUID)#012  File
>>>> "/usr/share/vdsm/storage/sdc.py", line 98, in produce#012
>>>> domain.getRealDomain()#012  File "/usr/share/vdsm/storage/sdc.py", line
>>>> 52, in getRealDomain#012    return
>>>> self._cache._realProduce(self._sdUUID)#012  File
>>>> "/usr/share/vdsm/storage/sdc.py", line 122, in _realProduce#012 domain =
>>>> self._findDomain(sdUUID)#012  File "/usr/share/vdsm/storage/sdc.py",
>>>> line 141, in _findDomain#012    dom = findMethod(sdUUID)#012  File
>>>> "/usr/share/vdsm/storage/blockSD.py", line 1288, in findDomain#012
>>>> return
>>>> BlockStorageDomain(BlockStorageDomain.findDomainPath(sdUUID))#012  File
>>>> "/usr/share/vdsm/storage/blockSD.py", line 414, in __init__#012
>>>> lvm.checkVGBlockSizes(sdUUID, (self.logBlkSize, self.phyBlkSize))#012
>>>> File "/usr/share/vdsm/storage/lvm.py", line 976, in
>>>> checkVGBlockSizes#012    raise se.VolumeGroupDoesNotExist("vg_uuid: %s"
>>>> % vgUUID)#012VolumeGroupDoesNotExist: Volume Group does not exist:
>>>> ('vg_uuid: 1429ffe2-4137-416c-bb38-63fd73f4bcc1',)
>>>> Jan 20 10:20:47 serv-vm-adm11 vdsm vm.Vm ERROR
>>>> vmId=`2c0bbb51-0f94-4bf1-9579-4e897260f88e`::Unable to update the volume
>>>> 80bac371-6899-4fbe-a8e1-272037186bfb (domain:
>>>> 1429ffe2-4137-416c-bb38-63fd73f4bcc1 image:
>>>> a5995c25-cdc9-4499-b9b4-08394a38165c) for the drive vda
>>>> Jan 20 10:20:48 serv-vm-adm11 vdsm TaskManager.Task ERROR
>>>> Task=`886e07bd-637b-4286-8a44-08dce5c8b207`::Unexpected
>>>> error#012Traceback (most recent call last):#012  File
>>>> "/usr/share/vdsm/storage/task.py", line 857, in _run#012    return
>>>> fn(*args, **kargs)#012  File "/usr/share/vdsm/logUtils.py", line 45, in
>>>> wrapper#012    res = f(*args, **kwargs)#012  File
>>>> "/usr/share/vdsm/storage/hsm.py", line 3053, in getVolumeSize#012
>>>> volUUID, bs=1))#012  File "/usr/share/vdsm/storage/volume.py", line 333,
>>>> in getVSize#012    mysd = sdCache.produce(sdUUID=sdUUID)#012  File
>>>> "/usr/share/vdsm/storage/sdc.py", line 98, in produce#012
>>>> domain.getRealDomain()#012  File "/usr/share/vdsm/storage/sdc.py", line
>>>> 52, in getRealDomain#012    return
>>>> self._cache._realProduce(self._sdUUID)#012  File
>>>> "/usr/share/vdsm/storage/sdc.py", line 122, in _realProduce#012 domain =
>>>> self._findDomain(sdUUID)#012  File "/usr/share/vdsm/storage/sdc.py",
>>>> line 141, in _findDomain#012    dom = findMethod(sdUUID)#012  File
>>>> "/usr/share/vdsm/storage/blockSD.py", line 1288, in findDomain#012
>>>> return
>>>> BlockStorageDomain(BlockStorageDomain.findDomainPath(sdUUID))#012  File
>>>> "/usr/share/vdsm/storage/blockSD.py", line 414, in __init__#012
>>>> lvm.checkVGBlockSizes(sdUUID, (self.logBlkSize, self.phyBlkSize))#012
>>>> File "/usr/share/vdsm/storage/lvm.py", line 976, in
>>>> checkVGBlockSizes#012    raise se.VolumeGroupDoesNotExist("vg_uuid: %s"
>>>> % vgUUID)#012VolumeGroupDoesNotExist: Volume Group does not exist:
>>>> ('vg_uuid: 1429ffe2-4137-416c-bb38-63fd73f4bcc1',)
>>>> Jan 20 10:20:48 serv-vm-adm11 vdsm vm.Vm ERROR
>>>> vmId=`2c0bbb51-0f94-4bf1-9579-4e897260f88e`::Unable to update the volume
>>>> ea9c8f12-4eb6-42de-b6d6-6296555d0ac0 (domain:
>>>> 1429ffe2-4137-416c-bb38-63fd73f4bcc1 image:
>>>> f42e0c9d-ad1b-4337-b82c-92914153ff44) for the drive vdb
>>>> Jan 20 10:21:03 serv-vm-adm11 vdsm TaskManager.Task ERROR
>>>> Task=`27bb14f9-0cd1-4316-95b0-736d162d5681`::Unexpected
>>>> error#012Traceback (most recent call last):#012  File
>>>> "/usr/share/vdsm/storage/task.py", line 857, in _run#012    return
>>>> fn(*args, **kargs)#012  File "/usr/share/vdsm/logUtils.py", line 45, in
>>>> wrapper#012    res = f(*args, **kwargs)#012  File
>>>> "/usr/share/vdsm/storage/hsm.py", line 3053, in getVolumeSize#012
>>>> volUUID, bs=1))#012  File "/usr/share/vdsm/storage/volume.py", line 333,
>>>> in getVSize#012    mysd = sdCache.produce(sdUUID=sdUUID)#012  File
>>>> "/usr/share/vdsm/storage/sdc.py", line 98, in produce#012
>>>> domain.getRealDomain()#012  File "/usr/share/vdsm/storage/sdc.py", line
>>>> 52, in getRealDomain#012    return
>>>> self._cache._realProduce(self._sdUUID)#012  File
>>>> "/usr/share/vdsm/storage/sdc.py", line 122, in _realProduce#012 domain =
>>>> self._findDomain(sdUUID)#012  File "/usr/share/vdsm/storage/sdc.py",
>>>> line 141, in _findDomain#012    dom = findMethod(sdUUID)#012  File
>>>> "/usr/share/vdsm/storage/blockSD.py", line 1288, in findDomain#012
>>>> return
>>>> BlockStorageDomain(BlockStorageDomain.findDomainPath(sdUUID))#012  File
>>>> "/usr/share/vdsm/storage/blockSD.py", line 414, in __init__#012
>>>> lvm.checkVGBlockSizes(sdUUID, (self.logBlkSize, self.phyBlkSize))#012
>>>> File "/usr/share/vdsm/storage/lvm.py", line 976, in
>>>> checkVGBlockSizes#012    raise se.VolumeGroupDoesNotExist("vg_uuid: %s"
>>>> % vgUUID)#012VolumeGroupDoesNotExist: Volume Group does not exist:
>>>> ('vg_uuid: 83d39199-d4e4-474c-b232-7088c76a2811',)
>>>>
>>>>
>>>>
>>>
>>> was this diagnosed/resolved?
>>
>> - Diagnosed: I found no deeper way to diagnose this issue.
>> - Resolved: I neither found nor received any way to solve it.
>>
>
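
A side note on reading the tracebacks quoted above: rsyslog flattens
multi-line messages into a single line, encoding each newline as #012. A
small sed filter restores them (shown here on a shortened sample line):

```shell
# A vdsm log line in the same shape as above, shortened for the example:
line='Unexpected error#012Traceback (most recent call last):#012  File "/usr/share/vdsm/storage/task.py", line 857, in _run'

# rsyslog encodes embedded newlines as "#012"; sed turns them back:
printf '%s\n' "$line" | sed 's/#012/\n/g'

# On a host, the same filter makes the whole file readable:
#   sed 's/#012/\n/g' /var/log/messages | less
```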


-- 
Nicolas Ecarnot
