[Users] SD Disk's Logical Volume not visible/activated on some nodes

Boyan Tabakov blade at alslayer.net
Tue Mar 4 13:53:24 UTC 2014


On Tue Mar  4 14:46:33 2014, Nir Soffer wrote:
> ----- Original Message -----
>> From: "Nir Soffer" <nsoffer at redhat.com>
>> To: "Boyan Tabakov" <blade at alslayer.net>
>> Cc: users at ovirt.org, "Zdenek Kabelac" <zkabelac at redhat.com>
>> Sent: Monday, March 3, 2014 9:39:47 PM
>> Subject: Re: [Users] SD Disk's Logical Volume not visible/activated on some nodes
>>
>> Hi Zdenek, can you look into this strange incident?
>>
>> When a user creates a disk on one host (creating a new lv), the lv is not seen
>> on another host in the cluster.
>>
>> Calling multipath -r causes the new lv to appear on the other host.
>>
>> Finally, lvs tells us that vg_mda_free is zero - maybe unrelated, but unusual.
>>
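For reference, the metadata area usage that the vdsm warning further down complains about can also be checked directly with vgs; the VG name here is just the storage domain UUID taken from that warning, used as an illustration:

    # show metadata area size/free/count for the storage domain VG
    # (substitute your own VG name)
    vgs -o vg_name,vg_mda_size,vg_mda_free,vg_mda_count 3307f6fa-dd58-43db-ab23-b1fb299006c7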
>> ----- Original Message -----
>>> From: "Boyan Tabakov" <blade at alslayer.net>
>>> To: "Nir Soffer" <nsoffer at redhat.com>
>>> Cc: users at ovirt.org
>>> Sent: Monday, March 3, 2014 9:51:05 AM
>>> Subject: Re: [Users] SD Disk's Logical Volume not visible/activated on some
>>> nodes
>>>>>>>>> Consequently, when creating/booting
>>>>>>>>> a VM with the said disk attached, the VM fails to start on host2,
>>>>>>>>> because host2 can't see the LV. Similarly, if the VM is started on
>>>>>>>>> host1, it fails to migrate to host2. Extract from host2 log is in
>>>>>>>>> the
>>>>>>>>> end. The LV in question is 6b35673e-7062-4716-a6c8-d5bf72fe3280.
>>>>>>>>>
>>>>>>>>> As far as I could quickly track in the vdsm code, there is only a
>>>>>>>>> call to lvs and not to lvscan or lvchange, so host2's LVM doesn't
>>>>>>>>> fully refresh.
>>>>
>>>> lvs should see any change on the shared storage.
>>>>
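For what it's worth, a manual way to make LVM re-read the metadata from the shared storage and reload the device-mapper tables would be something along these lines (the VG/LV names are placeholders):

    # re-scan for volume groups, then refresh or activate the new logical volume
    vgscan
    lvchange --refresh <vg_name>/<lv_name>   # re-read metadata and reload the dm table
    lvchange -ay <vg_name>/<lv_name>         # or activate it if it is not active yet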
>>>>>>>>> The only workaround so far has been to restart VDSM on host2, which
>>>>>>>>> makes it refresh all LVM data properly.
>>>>
>>>> When vdsm starts, it calls multipath -r, which ensures that we see all
>>>> physical volumes.
>>>>
>>>>>>>>>
>>>>>>>>> When is host2 supposed to pick up any newly created LVs in the SD
>>>>>>>>> VG?
>>>>>>>>> Any suggestions where the problem might be?
>>>>>>>>
>>>>>>>> When you create a new lv on the shared storage, the new lv should be
>>>>>>>> visible on the other host. Let's start by verifying that you do see
>>>>>>>> the new lv after a disk is created.
>>>>>>>>
>>>>>>>> Try this:
>>>>>>>>
>>>>>>>> 1. Create a new disk, and check the disk uuid in the engine ui
>>>>>>>> 2. On another machine, run this command:
>>>>>>>>
>>>>>>>> lvs -o vg_name,lv_name,tags
>>>>>>>>
>>>>>>>> You can identify the new lv using tags, which should contain the new
>>>>>>>> disk uuid (see the example after these instructions).
>>>>>>>>
>>>>>>>> If you don't see the new lv from the other host, please provide
>>>>>>>> /var/log/messages
>>>>>>>> and /var/log/sanlock.log.
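For example, with a freshly created disk the check on the second host could look like this (the disk uuid is hypothetical; oVirt stores it in the IU_ tag, as in the lvs output further down):

    # list LVs with their oVirt tags and filter by the image uuid of the new disk
    lvs -o vg_name,lv_name,tags --noheadings | grep IU_<disk-uuid>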
>>>>>>>
>>>>>>> Just tried that. The disk is not visible on the non-SPM node.
>>>>>>
>>>>>> This means that the storage is not accessible from this host.
>>>>>
>>>>> Generally, the storage seems accessible. For example, if I restart
>>>>> vdsmd, all volumes get picked up correctly (they become visible in lvs
>>>>> output and VMs can be started with them).
>>>>
>>>> Let's repeat this test, but now, if you do not see the new lv, please
>>>> run:
>>>>
>>>>     multipath -r
>>>>
>>>> And report the results.
>>>>
>>>
>>> Running multipath -r helped and the disk was properly picked up by the
>>> second host.
>>>
>>> Is running multipath -r safe while the host is not in maintenance mode?
>>
>> It should be safe; vdsm uses it in some cases.
>>
>>> If yes, as a temporary workaround I can patch vdsmd to run multipath -r,
>>> e.g. when monitoring the storage domain.
>>
>> I suggested running multipath as a debugging aid; normally this is not needed.
>>
>> You should see the lv on the shared storage without running multipath.
>>
>> Zdenek, can you explain this?
>>
>>>>> One warning that I keep seeing in vdsm logs on both nodes is this:
>>>>>
>>>>> Thread-1617881::WARNING::2014-02-24
>>>>> 16:57:50,627::sp::1553::Storage.StoragePool::(getInfo) VG
>>>>> 3307f6fa-dd58-43db-ab23-b1fb299006c7's metadata size exceeded
>>>>>  critical size: mdasize=134217728 mdafree=0
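For clarity, mdasize in that warning is in bytes, i.e. the 128 MiB metadata area also reported as VMdaSize in the lvs output below; mdafree=0 is the part vdsm complains about:

    # 134217728 bytes = 128 MiB
    echo $((134217728 / 1024 / 1024))   # prints 128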
>>>>
>>>> Can you share the output of the command below?
>>>>
>>>>     lvs -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name
>>>
>>> Here's the output for both hosts.
>>>
>>> host1:
>>> [root at host1 ~]# lvs -o uuid,name,attr,size,vg_free,vg_extent_size,vg_extent_count,vg_free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count
>>>   LV UUID                                 LV                                    Attr       LSize  VFree    Ext      #Ext  Free  LV Tags                                                                                 VMdaSize  VMdaFree  #LV  #PV
>>>   jGEpVm-oPW8-XyxI-l2yi-YF4X-qteQ-dm8SqL  3d362bf2-20f4-438d-9ba9-486bd2e8cedf  -wi-ao---  2.00g  114.62g  128.00m  1596  917   IU_0227da98-34b2-4b0c-b083-d42e7b760036,MD_5,PU_f4231952-76c5-4764-9c8b-ac73492ac465   128.00m   0         13   2
>>
>> This looks wrong - your vg_mda_free is zero - as vdsm complains.
>>
>> Zdenek, how can we debug this further?
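One way to see how much metadata the VG actually carries is to dump it with vgcfgbackup and look at the size of the resulting file (the VG name is again the storage domain UUID from the warning above, and the output path is arbitrary):

    # write the current VG metadata to a text file and check its size
    vgcfgbackup -f /tmp/sd-vg-metadata.txt 3307f6fa-dd58-43db-ab23-b1fb299006c7
    ls -l /tmp/sd-vg-metadata.txt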
>
> I see the same issue in Fedora 19.
>
> Can you share with us the output of:
>
> cat /etc/redhat-release
> uname -a
> lvm version
>
> Nir

$ cat /etc/redhat-release
Fedora release 19 (Schrödinger’s Cat)
$ uname -a
Linux blizzard.mgmt.futurice.com 3.12.6-200.fc19.x86_64.debug #1 SMP 
Mon Dec 23 16:24:32 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
$ lvm version
  LVM version:     2.02.98(2) (2012-10-15)
  Library version: 1.02.77 (2012-10-15)
  Driver version:  4.26.0
