[Users] SD Disk's Logical Volume not visible/activated on some nodes
Boyan Tabakov
blade at alslayer.net
Wed Mar 5 13:38:25 UTC 2014
Hello Nir,
On Wed Mar 5 14:37:17 2014, Nir Soffer wrote:
> ----- Original Message -----
>> From: "Boyan Tabakov" <blade at alslayer.net>
>> To: "Nir Soffer" <nsoffer at redhat.com>
>> Cc: users at ovirt.org
>> Sent: Tuesday, March 4, 2014 3:53:24 PM
>> Subject: Re: [Users] SD Disk's Logical Volume not visible/activated on some nodes
>>
>> On Tue Mar 4 14:46:33 2014, Nir Soffer wrote:
>>> ----- Original Message -----
>>>> From: "Nir Soffer" <nsoffer at redhat.com>
>>>> To: "Boyan Tabakov" <blade at alslayer.net>
>>>> Cc: users at ovirt.org, "Zdenek Kabelac" <zkabelac at redhat.com>
>>>> Sent: Monday, March 3, 2014 9:39:47 PM
>>>> Subject: Re: [Users] SD Disk's Logical Volume not visible/activated on
>>>> some nodes
>>>>
>>>> Hi Zdenek, can you look into this strange incident?
>>>>
>>>> When a user creates a disk on one host (creating a new lv), the lv is not seen
>>>> on another host in the cluster.
>>>>
>>>> Calling multipath -r causes the new lv to appear on the other host.
>>>>
>>>> Finally, lvs tells us that vg_mda_free is zero - maybe unrelated, but
>>>> unusual.
>>>>
>>>> ----- Original Message -----
>>>>> From: "Boyan Tabakov" <blade at alslayer.net>
>>>>> To: "Nir Soffer" <nsoffer at redhat.com>
>>>>> Cc: users at ovirt.org
>>>>> Sent: Monday, March 3, 2014 9:51:05 AM
>>>>> Subject: Re: [Users] SD Disk's Logical Volume not visible/activated on
>>>>> some
>>>>> nodes
>>>>>>>>>>> Consequently, when creating/booting
>>>>>>>>>>> a VM with the said disk attached, the VM fails to start on host2,
>>>>>>>>>>> because host2 can't see the LV. Similarly, if the VM is started on
>>>>>>>>>>> host1, it fails to migrate to host2. An extract from the host2 log is at
>>>>>>>>>>> the end. The LV in question is 6b35673e-7062-4716-a6c8-d5bf72fe3280.
>>>>>>>>>>>
>>>>>>>>>>> As far as I could quickly track in the vdsm code, there is only a call
>>>>>>>>>>> to lvs and not to lvscan or lvchange, so the host2 LVM doesn't fully
>>>>>>>>>>> refresh.
>>>>>>
>>>>>> lvs should see any change on the shared storage.
>>>>>>
>>>>>>>>>>> The only workaround so far has been to restart VDSM on host2, which
>>>>>>>>>>> makes it refresh all LVM data properly.
>>>>>>
>>>>>> When vdsm starts, it calls multipath -r, which ensures that we see all
>>>>>> physical volumes.
>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> When is host2 supposed to pick up any newly created LVs in the SD
>>>>>>>>>>> VG?
>>>>>>>>>>> Any suggestions where the problem might be?
>>>>>>>>>>
>>>>>>>>>> When you create a new lv on the shared storage, the new lv should be
>>>>>>>>>> visible on the other host. Let's start by verifying that you do see
>>>>>>>>>> the new lv after a disk was created.
>>>>>>>>>>
>>>>>>>>>> Try this:
>>>>>>>>>>
>>>>>>>>>> 1. Create a new disk, and check the disk uuid in the engine ui
>>>>>>>>>> 2. On another machine, run this command:
>>>>>>>>>>
>>>>>>>>>> lvs -o vg_name,lv_name,tags
>>>>>>>>>>
>>>>>>>>>> You can identify the new lv using tags, which should contain the new
>>>>>>>>>> disk
>>>>>>>>>> uuid.
>>>>>>>>>>
>>>>>>>>>> If you don't see the new lv from the other host, please provide
>>>>>>>>>> /var/log/messages
>>>>>>>>>> and /var/log/sanlock.log.
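>>>>>>>>>>
>>>>>>>>>> For example, to check only the new lv (the uuid below is just a
>>>>>>>>>> placeholder), something like this should do:
>>>>>>>>>>
>>>>>>>>>>     lvs -o vg_name,lv_name,tags | grep <disk-uuid>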
>>>>>>>>>
>>>>>>>>> Just tried that. The disk is not visible on the non-SPM node.
>>>>>>>>
>>>>>>>> This means that storage is not accessible from this host.
>>>>>>>
>>>>>>> Generally, the storage seems accessible ok. For example, if I restart
>>>>>>> the vdsmd, all volumes get picked up correctly (become visible in lvs
>>>>>>> output and VMs can be started with them).
>>>>>>
>>>>>> Let's repeat this test, but now, if you do not see the new lv, please
>>>>>> run:
>>>>>>
>>>>>> multipath -r
>>>>>>
>>>>>> And report the results.
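>>>>>>
>>>>>> That is, roughly (again with a placeholder uuid):
>>>>>>
>>>>>>     lvs -o vg_name,lv_name,tags | grep <disk-uuid>   # lv not visible yet
>>>>>>     multipath -r
>>>>>>     lvs -o vg_name,lv_name,tags | grep <disk-uuid>   # does it show up now?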
>>>>>>
>>>>>
>>>>> Running multipath -r helped and the disk was properly picked up by the
>>>>> second host.
>>>>>
>>>>> Is running multipath -r safe while host is not in maintenance mode?
>>>>
>>>> It should be safe; vdsm uses it in some cases.
>>>>
>>>>> If yes, as a temporary workaround I can patch vdsmd to run multipath -r
>>>>> when e.g. monitoring the storage domain.
>>>>
>>>> I suggested running multipath as a debugging aid; normally this is not
>>>> needed.
>>>>
>>>> You should see the lv on the shared storage without running multipath.
>>>>
>>>> Zdenek, can you explain this?
>>>>
>>>>>>> One warning that I keep seeing in vdsm logs on both nodes is this:
>>>>>>>
>>>>>>> Thread-1617881::WARNING::2014-02-24
>>>>>>> 16:57:50,627::sp::1553::Storage.StoragePool::(getInfo) VG
>>>>>>> 3307f6fa-dd58-43db-ab23-b1fb299006c7's metadata size exceeded
>>>>>>> critical size: mdasize=134217728 mdafree=0
>>>>>>
>>>>>> Can you share the output of the command below?
>>>>>>
>>>>>> lvs -o
>>>>>> uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name
>>>>>
>>>>> Here's the output for both hosts.
>>>>>
>>>>> host1:
>>>>> [root at host1 ~]# lvs -o
>>>>> uuid,name,attr,size,vg_free,vg_extent_size,vg_extent_count,vg_free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count
>>>>> LV UUID   jGEpVm-oPW8-XyxI-l2yi-YF4X-qteQ-dm8SqL
>>>>> LV        3d362bf2-20f4-438d-9ba9-486bd2e8cedf
>>>>> Attr      -wi-ao---
>>>>> LSize     2.00g
>>>>> VFree     114.62g
>>>>> Ext       128.00m
>>>>> #Ext      1596
>>>>> Free      917
>>>>> LV Tags   IU_0227da98-34b2-4b0c-b083-d42e7b760036,MD_5,PU_f4231952-76c5-4764-9c8b-ac73492ac465
>>>>> VMdaSize  128.00m
>>>>> VMdaFree  0
>>>>> #LV       13
>>>>> #PV       2
>>>>
>>>> This looks wrong - your vg_mda_free is zero - as vdsm complains.
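>>>>
>>>> To watch just the metadata fields, something like this should be enough:
>>>>
>>>>     vgs -o vg_name,vg_mda_size,vg_mda_free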
>
> Patch http://gerrit.ovirt.org/25408 should solve this issue.
>
> It may also solve the other issue with the missing lv - I have not been
> able to reproduce it yet.
>
> Can you try to apply this patch and report the results?
>
> Thanks,
> Nir
This patch helped, indeed! I tried it on the non-SPM node (as that's
the node that I can currently easily put in maintenance) and the node
started picking up newly created volumes correctly. I also set
use_lvmetad to 0 in the main lvm.conf, because without that, manually
running e.g. lvs was still using the metadata daemon.
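In other words, the relevant bit of the main lvm.conf now reads (the
option lives in the global section):

    global {
        use_lvmetad = 0
    }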
I can't confirm yet that this helps with the metadata volume warning,
as that warning appears only on the SPM. I'll be able to put the SPM
node in maintenance soon and will report later.
This issue on Fedora makes me think - is Fedora still a fully supported
platform?
Best regards,
Boyan