[Users] "Volume Group does not exist". Blame device-mapper ?

Itamar Heim iheim at redhat.com
Wed Jan 29 12:36:25 UTC 2014


On 01/29/2014 02:35 PM, Nicolas Ecarnot wrote:
> On 29/01/2014 13:29, Maor Lipchuk wrote:
>> Hi Nicolas,
>>
>> Could you please attach the VDSM logs from the problematic nodes and the
>> working nodes, the engine log, and also the sanlock log?
>>
>> You wrote that many nodes suddenly became unresponsive.
>> Do you mean that the hosts switched to non-responsive status in the
>> engine?
>> I'm asking because non-responsive status indicates that the engine
>> could not communicate with the hosts. It could be related to sanlock:
>> if a host has a problem writing to the master domain, sanlock restarts
>> VDSM, which makes the host non-responsive.

Non-responsive status in the engine reflects whether it can reach vdsm.
To check locally whether vdsm itself is up and responsive, run:

# vdsClient -s 0 getVdsCaps
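
If vdsm itself looks fine, it is also worth a quick look at sanlock on the
affected host, given the restart scenario Maor described. A rough sketch,
assuming the default log locations (adjust the paths to your setup):

# sanlock client status
# tail -n 100 /var/log/sanlock.log
# tail -n 100 /var/log/vdsm/vdsm.log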

>>
>> regards,
>> Maor
>
> It will be hard work to provide these logs, but I will try ASAP.
> But to answer your question: the engine saw the failing nodes as
> unresponsive, yet I was always fully able to ping them and ssh into them.
>
> Is there somewhere I could read more documentation about sanlock?
>
> Nicolas Ecarnot
>
>>
>> On 01/27/2014 09:26 AM, Nicolas Ecarnot wrote:
>>> On 26/01/2014 23:23, Itamar Heim wrote:
>>>> On 01/20/2014 12:06 PM, Nicolas Ecarnot wrote:
>>>>> Hi,
>>>>>
>>>>> oVirt 3.3, no big issues since the recent snapshot joke; all in all
>>>>> it is running fine.
>>>>>
>>>>> All my VMs are stored on an iSCSI SAN. The VMs usually use only one
>>>>> or two disks (1: system, 2: data) and that is OK.
>>>>>
>>>>> On Friday, I created a new LUN. Inside a VM, I connected to it via
>>>>> iscsiadm and successfully logged in to the LUN (session, automatic
>>>>> attach on boot, read, write): nice.
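>>>>>
>>>>> (For reference, the manual attach was the usual iscsiadm sequence,
>>>>> roughly like the following; the portal address and target IQN below
>>>>> are placeholders, not the real ones:)
>>>>>
>>>>> # iscsiadm -m discovery -t sendtargets -p <portal_ip>:3260
>>>>> # iscsiadm -m node -T <target_iqn> -p <portal_ip>:3260 --login
>>>>> # iscsiadm -m node -T <target_iqn> -p <portal_ip>:3260 --op update -n node.startup -v automatic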
>>>>>
>>>>> Then, after detaching it and shutting down the VM, I tried for the
>>>>> first time to use the "direct attach" feature to attach the disk
>>>>> directly from oVirt, logging in to the iSCSI session via oVirt.
>>>>> The connection went fine and I saw the disk appear in my VM as /dev/sda
>>>>> or whatever. I was able to mount it, read and write.
>>>>>
>>>>> Then disaster struck: many nodes suddenly became unresponsive, quickly
>>>>> migrating their VMs to the remaining nodes.
>>>>> Fortunately, the migrations ran fine and I lost no VMs and had no
>>>>> downtime, but I had to reboot every affected node (other actions
>>>>> failed).
>>>>>
>>>>> On the failing nodes, /var/log/messages showed the log you can read at
>>>>> the end of this message.
>>>>> I first get device-mapper warnings, then the host is unable to work
>>>>> with the logical volumes.
>>>>>
>>>>> The 3 volumes are the three main storage domains, perfectly up and
>>>>> running where I store my oVirt VMs.
>>>>>
>>>>> My thoughts:
>>>>> - I'm not sure device-mapper is to blame. I frequently see device-mapper
>>>>> complaining and nothing gets worse (not oVirt specifically).
>>>>> - I have not changed my network settings for months (bonding,
>>>>> linking...). The only new factor is the use of the direct-attach LUN.
>>>>> - This morning I was able to reproduce the bug, just by trying this
>>>>> attachment again and booting the VM. No mounting of the LUN, just
>>>>> booting the VM and waiting, and this is enough to crash oVirt.
>>>>> - When the disaster happens, usually only three nodes amongst them get
>>>>> struck: the only ones that run VMs. Obviously, after migration,
>>>>> different nodes are hosting the VMs, and those new nodes are the ones
>>>>> that then get struck.
>>>>>
>>>>> This is quite reproducible.
>>>>>
>>>>> And frightening.
>>>>>
>>>>>
>>>>> The log:
>>>>>
>>>>> Jan 20 10:20:45 serv-vm-adm11 kernel: device-mapper: table: 253:36: multipath: error getting device
>>>>> Jan 20 10:20:45 serv-vm-adm11 kernel: device-mapper: ioctl: error adding target to table
>>>>> Jan 20 10:20:45 serv-vm-adm11 kernel: device-mapper: table: 253:36: multipath: error getting device
>>>>> Jan 20 10:20:45 serv-vm-adm11 kernel: device-mapper: ioctl: error adding target to table
>>>>> Jan 20 10:20:47 serv-vm-adm11 vdsm TaskManager.Task ERROR Task=`847653e6-8b23-4429-ab25-257538b35293`::Unexpected error
>>>>> Traceback (most recent call last):
>>>>>   File "/usr/share/vdsm/storage/task.py", line 857, in _run
>>>>>     return fn(*args, **kargs)
>>>>>   File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
>>>>>     res = f(*args, **kwargs)
>>>>>   File "/usr/share/vdsm/storage/hsm.py", line 3053, in getVolumeSize
>>>>>     volUUID, bs=1))
>>>>>   File "/usr/share/vdsm/storage/volume.py", line 333, in getVSize
>>>>>     mysd = sdCache.produce(sdUUID=sdUUID)
>>>>>   File "/usr/share/vdsm/storage/sdc.py", line 98, in produce
>>>>>     domain.getRealDomain()
>>>>>   File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain
>>>>>     return self._cache._realProduce(self._sdUUID)
>>>>>   File "/usr/share/vdsm/storage/sdc.py", line 122, in _realProduce
>>>>>     domain = self._findDomain(sdUUID)
>>>>>   File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain
>>>>>     dom = findMethod(sdUUID)
>>>>>   File "/usr/share/vdsm/storage/blockSD.py", line 1288, in findDomain
>>>>>     return BlockStorageDomain(BlockStorageDomain.findDomainPath(sdUUID))
>>>>>   File "/usr/share/vdsm/storage/blockSD.py", line 414, in __init__
>>>>>     lvm.checkVGBlockSizes(sdUUID, (self.logBlkSize, self.phyBlkSize))
>>>>>   File "/usr/share/vdsm/storage/lvm.py", line 976, in checkVGBlockSizes
>>>>>     raise se.VolumeGroupDoesNotExist("vg_uuid: %s" % vgUUID)
>>>>> VolumeGroupDoesNotExist: Volume Group does not exist: ('vg_uuid: 1429ffe2-4137-416c-bb38-63fd73f4bcc1',)
>>>>> Jan 20 10:20:47 serv-vm-adm11 vdsm vm.Vm ERROR vmId=`2c0bbb51-0f94-4bf1-9579-4e897260f88e`::Unable to update the volume 80bac371-6899-4fbe-a8e1-272037186bfb (domain: 1429ffe2-4137-416c-bb38-63fd73f4bcc1 image: a5995c25-cdc9-4499-b9b4-08394a38165c) for the drive vda
>>>>> Jan 20 10:20:48 serv-vm-adm11 vdsm TaskManager.Task ERROR Task=`886e07bd-637b-4286-8a44-08dce5c8b207`::Unexpected error
>>>>> Traceback (most recent call last):
>>>>>   File "/usr/share/vdsm/storage/task.py", line 857, in _run
>>>>>     return fn(*args, **kargs)
>>>>>   File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
>>>>>     res = f(*args, **kwargs)
>>>>>   File "/usr/share/vdsm/storage/hsm.py", line 3053, in getVolumeSize
>>>>>     volUUID, bs=1))
>>>>>   File "/usr/share/vdsm/storage/volume.py", line 333, in getVSize
>>>>>     mysd = sdCache.produce(sdUUID=sdUUID)
>>>>>   File "/usr/share/vdsm/storage/sdc.py", line 98, in produce
>>>>>     domain.getRealDomain()
>>>>>   File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain
>>>>>     return self._cache._realProduce(self._sdUUID)
>>>>>   File "/usr/share/vdsm/storage/sdc.py", line 122, in _realProduce
>>>>>     domain = self._findDomain(sdUUID)
>>>>>   File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain
>>>>>     dom = findMethod(sdUUID)
>>>>>   File "/usr/share/vdsm/storage/blockSD.py", line 1288, in findDomain
>>>>>     return BlockStorageDomain(BlockStorageDomain.findDomainPath(sdUUID))
>>>>>   File "/usr/share/vdsm/storage/blockSD.py", line 414, in __init__
>>>>>     lvm.checkVGBlockSizes(sdUUID, (self.logBlkSize, self.phyBlkSize))
>>>>>   File "/usr/share/vdsm/storage/lvm.py", line 976, in checkVGBlockSizes
>>>>>     raise se.VolumeGroupDoesNotExist("vg_uuid: %s" % vgUUID)
>>>>> VolumeGroupDoesNotExist: Volume Group does not exist: ('vg_uuid: 1429ffe2-4137-416c-bb38-63fd73f4bcc1',)
>>>>> Jan 20 10:20:48 serv-vm-adm11 vdsm vm.Vm ERROR vmId=`2c0bbb51-0f94-4bf1-9579-4e897260f88e`::Unable to update the volume ea9c8f12-4eb6-42de-b6d6-6296555d0ac0 (domain: 1429ffe2-4137-416c-bb38-63fd73f4bcc1 image: f42e0c9d-ad1b-4337-b82c-92914153ff44) for the drive vdb
>>>>> Jan 20 10:21:03 serv-vm-adm11 vdsm TaskManager.Task ERROR Task=`27bb14f9-0cd1-4316-95b0-736d162d5681`::Unexpected error
>>>>> Traceback (most recent call last):
>>>>>   File "/usr/share/vdsm/storage/task.py", line 857, in _run
>>>>>     return fn(*args, **kargs)
>>>>>   File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
>>>>>     res = f(*args, **kwargs)
>>>>>   File "/usr/share/vdsm/storage/hsm.py", line 3053, in getVolumeSize
>>>>>     volUUID, bs=1))
>>>>>   File "/usr/share/vdsm/storage/volume.py", line 333, in getVSize
>>>>>     mysd = sdCache.produce(sdUUID=sdUUID)
>>>>>   File "/usr/share/vdsm/storage/sdc.py", line 98, in produce
>>>>>     domain.getRealDomain()
>>>>>   File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain
>>>>>     return self._cache._realProduce(self._sdUUID)
>>>>>   File "/usr/share/vdsm/storage/sdc.py", line 122, in _realProduce
>>>>>     domain = self._findDomain(sdUUID)
>>>>>   File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain
>>>>>     dom = findMethod(sdUUID)
>>>>>   File "/usr/share/vdsm/storage/blockSD.py", line 1288, in findDomain
>>>>>     return BlockStorageDomain(BlockStorageDomain.findDomainPath(sdUUID))
>>>>>   File "/usr/share/vdsm/storage/blockSD.py", line 414, in __init__
>>>>>     lvm.checkVGBlockSizes(sdUUID, (self.logBlkSize, self.phyBlkSize))
>>>>>   File "/usr/share/vdsm/storage/lvm.py", line 976, in checkVGBlockSizes
>>>>>     raise se.VolumeGroupDoesNotExist("vg_uuid: %s" % vgUUID)
>>>>> VolumeGroupDoesNotExist: Volume Group does not exist: ('vg_uuid: 83d39199-d4e4-474c-b232-7088c76a2811',)
>>>>>
>>>>>
>>>>>
>>>>
>>>> was this diagnosed/resolved?
>>>
>>> - Diagnosed: I have found no further or deeper way to diagnose this issue.
>>> - Resolved: I have neither found nor received any way to solve it.
>>>
>>
>
>
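
When a host hits the "Volume Group does not exist" error, it may also be
worth checking, before rebooting it, whether multipath and LVM on that host
still see the storage domain at all. A rough sketch (the UUID is the one from
your log; adjust as needed):

# iscsiadm -m session
# multipath -ll
# pvs -o pv_name,vg_name,vg_uuid
# vgs -o vg_name,vg_uuid | grep 1429ffe2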



