[Users] SD Disk's Logical Volume not visible/activated on some nodes
Boyan Tabakov
blade at alslayer.net
Wed Mar 5 14:44:51 UTC 2014
On 5.3.2014, 16:01, Nir Soffer wrote:
> ----- Original Message -----
>> From: "Boyan Tabakov" <blade at alslayer.net>
>> To: "Nir Soffer" <nsoffer at redhat.com>
>> Cc: users at ovirt.org
>> Sent: Wednesday, March 5, 2014 3:38:25 PM
>> Subject: Re: [Users] SD Disk's Logical Volume not visible/activated on some nodes
>>
>> Hello Nir,
>>
>> On Wed Mar 5 14:37:17 2014, Nir Soffer wrote:
>>> ----- Original Message -----
>>>> From: "Boyan Tabakov" <blade at alslayer.net>
>>>> To: "Nir Soffer" <nsoffer at redhat.com>
>>>> Cc: users at ovirt.org
>>>> Sent: Tuesday, March 4, 2014 3:53:24 PM
>>>> Subject: Re: [Users] SD Disk's Logical Volume not visible/activated on
>>>> some nodes
>>>>
>>>> On Tue Mar 4 14:46:33 2014, Nir Soffer wrote:
>>>>> ----- Original Message -----
>>>>>> From: "Nir Soffer" <nsoffer at redhat.com>
>>>>>> To: "Boyan Tabakov" <blade at alslayer.net>
>>>>>> Cc: users at ovirt.org, "Zdenek Kabelac" <zkabelac at redhat.com>
>>>>>> Sent: Monday, March 3, 2014 9:39:47 PM
>>>>>> Subject: Re: [Users] SD Disk's Logical Volume not visible/activated on
>>>>>> some nodes
>>>>>>
>>>>>> Hi Zdenek, can you look into this strange incident?
>>>>>>
>>>>>> When a user creates a disk on one host (creating a new lv), the lv is
>>>>>> not seen on another host in the cluster.
>>>>>>
>>>>>> Calling multipath -r causes the new lv to appear on the other host.
>>>>>>
>>>>>> Finally, lvs tells us that vg_mda_free is zero - maybe unrelated, but
>>>>>> unusual.
>>>>>>
>>>>>> ----- Original Message -----
>>>>>>> From: "Boyan Tabakov" <blade at alslayer.net>
>>>>>>> To: "Nir Soffer" <nsoffer at redhat.com>
>>>>>>> Cc: users at ovirt.org
>>>>>>> Sent: Monday, March 3, 2014 9:51:05 AM
>>>>>>> Subject: Re: [Users] SD Disk's Logical Volume not visible/activated on
>>>>>>> some
>>>>>>> nodes
>>>>>>>>>>>>> Consequently, when creating/booting
>>>>>>>>>>>>> a VM with the said disk attached, the VM fails to start on host2,
>>>>>>>>>>>>> because host2 can't see the LV. Similarly, if the VM is started
>>>>>>>>>>>>> on
>>>>>>>>>>>>> host1, it fails to migrate to host2. An extract from the host2
>>>>>>>>>>>>> log is at the end. The LV in question is
>>>>>>>>>>>>> 6b35673e-7062-4716-a6c8-d5bf72fe3280.
>>>>>>>>>>>>>
>>>>>>>>>>>>> As far as I could quickly trace the vdsm code, there is only a
>>>>>>>>>>>>> call to lvs and not to lvscan or lvchange, so the host2 LVM
>>>>>>>>>>>>> doesn't fully refresh.
>>>>>>>>
>>>>>>>> lvs should see any change on the shared storage.
>>>>>>>>
>>>>>>>>>>>>> The only workaround so far has been to restart VDSM on host2,
>>>>>>>>>>>>> which
>>>>>>>>>>>>> makes it refresh all LVM data properly.
>>>>>>>>
>>>>>>>> When vdsm starts, it calls multipath -r, which ensures that we see all
>>>>>>>> physical volumes.
>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> When is host2 supposed to pick up any newly created LVs in the SD
>>>>>>>>>>>>> VG?
>>>>>>>>>>>>> Any suggestions where the problem might be?
>>>>>>>>>>>>
>>>>>>>>>>>> When you create a new lv on the shared storage, the new lv should
>>>>>>>>>>>> be
>>>>>>>>>>>> visible on the other host. Let's start by verifying that you do see
>>>>>>>>>>>> the new lv after a disk was created.
>>>>>>>>>>>>
>>>>>>>>>>>> Try this:
>>>>>>>>>>>>
>>>>>>>>>>>> 1. Create a new disk, and check the disk uuid in the engine ui
>>>>>>>>>>>> 2. On another machine, run this command:
>>>>>>>>>>>>
>>>>>>>>>>>> lvs -o vg_name,lv_name,tags
>>>>>>>>>>>>
>>>>>>>>>>>> You can identify the new lv using tags, which should contain the
>>>>>>>>>>>> new
>>>>>>>>>>>> disk
>>>>>>>>>>>> uuid.
>>>>>>>>>>>>
>>>>>>>>>>>> If you don't see the new lv from the other host, please provide
>>>>>>>>>>>> /var/log/messages
>>>>>>>>>>>> and /var/log/sanlock.log.
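
The check described above can be scripted. This is only a rough sketch, not something vdsm ships: it assumes the lvs output was collected with `lvs --noheadings --separator '|' -o vg_name,lv_name,tags`, and the helper name `find_lvs_by_tag` and the sample line (assembled from values quoted in this thread) are made up for illustration.

```python
def find_lvs_by_tag(lvs_output, disk_uuid, separator="|"):
    """Return (vg_name, lv_name) pairs whose tags mention disk_uuid.

    lvs_output is assumed to be the text printed by:
        lvs --noheadings --separator '|' -o vg_name,lv_name,tags
    """
    matches = []
    for line in lvs_output.splitlines():
        fields = [field.strip() for field in line.split(separator)]
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        vg_name, lv_name, tags = fields
        # Tags are comma separated, e.g. IU_<uuid>,MD_5,PU_<uuid>
        if any(disk_uuid in tag for tag in tags.split(",")):
            matches.append((vg_name, lv_name))
    return matches


# Illustrative input only, assembled from values quoted in this thread.
sample = (
    "  3307f6fa-dd58-43db-ab23-b1fb299006c7|"
    "3d362bf2-20f4-438d-9ba9-486bd2e8cedf|"
    "IU_0227da98-34b2-4b0c-b083-d42e7b760036,MD_5,"
    "PU_f4231952-76c5-4764-9c8b-ac73492ac465\n"
)

found = find_lvs_by_tag(sample, "0227da98-34b2-4b0c-b083-d42e7b760036")
print(found)
```

An empty result on the second host, while the first host lists the lv, would match the symptom reported in this thread.
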
>>>>>>>>>>>
>>>>>>>>>>> Just tried that. The disk is not visible on the non-SPM node.
>>>>>>>>>>
>>>>>>>>>> This means that storage is not accessible from this host.
>>>>>>>>>
>>>>>>>>> Generally, the storage seems accessible ok. For example, if I restart
>>>>>>>>> the vdsmd, all volumes get picked up correctly (become visible in lvs
>>>>>>>>> output and VMs can be started with them).
>>>>>>>>
>>>>>>>> Let's repeat this test, but now, if you do not see the new lv, please
>>>>>>>> run:
>>>>>>>>
>>>>>>>> multipath -r
>>>>>>>>
>>>>>>>> And report the results.
>>>>>>>>
>>>>>>>
>>>>>>> Running multipath -r helped and the disk was properly picked up by the
>>>>>>> second host.
>>>>>>>
>>>>>>> Is running multipath -r safe while host is not in maintenance mode?
>>>>>>
>>>>>> It should be safe; vdsm uses it in some cases.
>>>>>>
>>>>>>> If yes, as a temporary workaround I can patch vdsmd to run multipath -r
>>>>>>> when e.g. monitoring the storage domain.
>>>>>>
>>>>>> I suggested running multipath as a debugging aid; normally this is not
>>>>>> needed.
>>>>>>
>>>>>> You should see the lv on the shared storage without running multipath.
>>>>>>
>>>>>> Zdenek, can you explain this?
>>>>>>
>>>>>>>>> One warning that I keep seeing in vdsm logs on both nodes is this:
>>>>>>>>>
>>>>>>>>> Thread-1617881::WARNING::2014-02-24
>>>>>>>>> 16:57:50,627::sp::1553::Storage.StoragePool::(getInfo) VG
>>>>>>>>> 3307f6fa-dd58-43db-ab23-b1fb299006c7's metadata size exceeded
>>>>>>>>> critical size: mdasize=134217728 mdafree=0
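
For anyone who wants to watch for this warning in the logs, here is a small sketch that pulls the numbers out of it. It assumes the warning is a single line in the actual vdsm log (it is only wrapped in this mail), and the regular expression and function name are my own, not part of vdsm.

```python
import re

# Pattern written against the warning text quoted above; an assumption,
# not an official vdsm log format definition.
MDA_WARNING = re.compile(
    r"VG (?P<vg>[0-9a-f-]+)'s metadata size exceeded critical size: "
    r"mdasize=(?P<mdasize>\d+) mdafree=(?P<mdafree>\d+)"
)


def parse_mda_warning(line):
    """Return (vg_uuid, mdasize_bytes, mdafree_bytes), or None if no match."""
    match = MDA_WARNING.search(line)
    if match is None:
        return None
    return (
        match.group("vg"),
        int(match.group("mdasize")),
        int(match.group("mdafree")),
    )


line = (
    "Thread-1617881::WARNING::2014-02-24 16:57:50,627::sp::1553::"
    "Storage.StoragePool::(getInfo) VG 3307f6fa-dd58-43db-ab23-b1fb299006c7's "
    "metadata size exceeded critical size: mdasize=134217728 mdafree=0"
)

vg, mdasize, mdafree = parse_mda_warning(line)
print(vg, mdasize // (1024 * 1024), "MiB total,", mdafree, "bytes free")
```

The mdasize of 134217728 bytes is 128 MiB, which matches the 128.00m VMdaSize column in the lvs output further down.
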
>>>>>>>>
>>>>>>>> Can you share the output of the command below?
>>>>>>>>
>>>>>>>> lvs -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name
>>>>>>>
>>>>>>> Here's the output for both hosts.
>>>>>>>
>>>>>>> host1:
>>>>>>> [root@host1 ~]# lvs -o uuid,name,attr,size,vg_free,vg_extent_size,vg_extent_count,vg_free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count
>>>>>>> LV UUID:  jGEpVm-oPW8-XyxI-l2yi-YF4X-qteQ-dm8SqL
>>>>>>> LV:       3d362bf2-20f4-438d-9ba9-486bd2e8cedf
>>>>>>> Attr:     -wi-ao---
>>>>>>> LSize:    2.00g
>>>>>>> VFree:    114.62g
>>>>>>> Ext:      128.00m
>>>>>>> #Ext:     1596
>>>>>>> Free:     917
>>>>>>> LV Tags:  IU_0227da98-34b2-4b0c-b083-d42e7b760036,MD_5,PU_f4231952-76c5-4764-9c8b-ac73492ac465
>>>>>>> VMdaSize: 128.00m
>>>>>>> VMdaFree: 0
>>>>>>> #LV:      13
>>>>>>> #PV:      2
>>>>>>
>>>>>> This looks wrong - your vg_mda_free is zero - as vdsm complains.
>>>
>>> Patch http://gerrit.ovirt.org/25408 should solve this issue.
>>>
>>> It may also solve the other issue with the missing lv - I could
>>> not reproduce it yet.
>>>
>>> Can you try to apply this patch and report the results?
>>>
>>> Thanks,
>>> Nir
>>
>> This patch helped, indeed! I tried it on the non-SPM node (as that's
>> the node that I can currently easily put in maintenance) and the node
>> started picking up newly created volumes correctly. I also set
>> use_lvmetad to 0 in the main lvm.conf, because otherwise running
>> e.g. lvs manually was still using the metadata daemon.
>>
>> I can't confirm yet that this helps with the metadata volume warning,
>> as that warning appears only on the SPM. I'll be able to put the SPM
>> node in maintenance soon and will report later.
>>
>> This issue on Fedora makes me think - is Fedora still a fully supported
>> platform?
>
> It is supported, but probably not tested properly.
>
> Nir
>
Alright! Thanks a lot for the help!
BR,
Boyan