From: "Boyan Tabakov" <blade(a)alslayer.net>
To: "Nir Soffer" <nsoffer(a)redhat.com>
Cc: users(a)ovirt.org
Sent: Wednesday, March 5, 2014 3:38:25 PM
Subject: Re: [Users] SD Disk's Logical Volume not visible/activated on some nodes
Hello Nir,
On Wed Mar 5 14:37:17 2014, Nir Soffer wrote:
> ----- Original Message -----
>> From: "Boyan Tabakov" <blade(a)alslayer.net>
>> To: "Nir Soffer" <nsoffer(a)redhat.com>
>> Cc: users(a)ovirt.org
>> Sent: Tuesday, March 4, 2014 3:53:24 PM
>> Subject: Re: [Users] SD Disk's Logical Volume not visible/activated on
>> some nodes
>>
>> On Tue Mar 4 14:46:33 2014, Nir Soffer wrote:
>>> ----- Original Message -----
>>>> From: "Nir Soffer" <nsoffer(a)redhat.com>
>>>> To: "Boyan Tabakov" <blade(a)alslayer.net>
>>>> Cc: users(a)ovirt.org, "Zdenek Kabelac" <zkabelac(a)redhat.com>
>>>> Sent: Monday, March 3, 2014 9:39:47 PM
>>>> Subject: Re: [Users] SD Disk's Logical Volume not visible/activated on
>>>> some nodes
>>>>
>>>> Hi Zdenek, can you look into this strange incident?
>>>>
>>>> When a user creates a disk on one host (creating a new lv), the lv is
>>>> not seen on another host in the cluster.
>>>>
>>>> Calling multipath -r causes the new lv to appear on the other host.
>>>>
>>>> Finally, lvs tells us that vg_mda_free is zero - maybe unrelated, but
>>>> unusual.
>>>>
>>>> ----- Original Message -----
>>>>> From: "Boyan Tabakov" <blade(a)alslayer.net>
>>>>> To: "Nir Soffer" <nsoffer(a)redhat.com>
>>>>> Cc: users(a)ovirt.org
>>>>> Sent: Monday, March 3, 2014 9:51:05 AM
>>>>> Subject: Re: [Users] SD Disk's Logical Volume not visible/activated on
>>>>> some nodes
>>>>>>>>>>> Consequently, when creating/booting a VM with the said disk
>>>>>>>>>>> attached, the VM fails to start on host2, because host2 can't
>>>>>>>>>>> see the LV. Similarly, if the VM is started on host1, it fails
>>>>>>>>>>> to migrate to host2. An extract from the host2 log is at the
>>>>>>>>>>> end. The LV in question is 6b35673e-7062-4716-a6c8-d5bf72fe3280.
>>>>>>>>>>>
>>>>>>>>>>> As far as I could quickly track in the vdsm code, there is only
>>>>>>>>>>> a call to lvs and not to lvscan or lvchange, so the LVM state on
>>>>>>>>>>> host2 doesn't fully refresh.
>>>>>>
>>>>>> lvs should see any change on the shared storage.
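
For reference, one way to double-check that from the other host while
bypassing any local metadata cache (use_lvmetad is a standard lvm.conf
option; whether the cache is involved here is only a guess at this point):

    lvs --config 'global {use_lvmetad=0}' -o vg_name,lv_name,tags

If the new lv shows up with the cache bypassed but not with a plain lvs,
that would point at stale cached metadata rather than at the storage itself.
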
>>>>>>
>>>>>>>>>>> The only workaround so far has been to restart VDSM on host2,
>>>>>>>>>>> which makes it refresh all LVM data properly.
>>>>>>
>>>>>> When vdsm starts, it calls multipath -r, which ensures that we see
>>>>>> all physical volumes.
>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> When is host2 supposed to pick up any newly created LVs in the
>>>>>>>>>>> SD VG? Any suggestions where the problem might be?
>>>>>>>>>>
>>>>>>>>>> When you create a new lv on the shared storage, the new lv should
>>>>>>>>>> be visible on the other host. Let's start by verifying that you
>>>>>>>>>> do see the new lv after a disk was created.
>>>>>>>>>>
>>>>>>>>>> Try this:
>>>>>>>>>>
>>>>>>>>>> 1. Create a new disk, and check the disk uuid in the engine UI.
>>>>>>>>>> 2. On another machine, run this command:
>>>>>>>>>>
>>>>>>>>>> lvs -o vg_name,lv_name,tags
>>>>>>>>>>
>>>>>>>>>> You can identify the new lv using tags, which should contain the
>>>>>>>>>> new disk uuid.
>>>>>>>>>>
>>>>>>>>>> If you don't see the new lv from the other host, please provide
>>>>>>>>>> /var/log/messages and /var/log/sanlock.log.
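
As an aside, a quick way to filter that output for the new disk, assuming
the disk uuid from the engine shows up in the IU_ tag (as in the output
further down; <disk-uuid> is just a placeholder):

    lvs -o vg_name,lv_name,tags | grep <disk-uuid>
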
>>>>>>>>>
>>>>>>>>> Just tried that. The disk is not visible on the non-SPM node.
>>>>>>>>
>>>>>>>> This means that storage is not accessible from this host.
>>>>>>>
>>>>>>> Generally, the storage seems accessible. For example, if I restart
>>>>>>> vdsmd, all volumes get picked up correctly (they become visible in
>>>>>>> lvs output and VMs can be started with them).
>>>>>>
>>>>>> Let's repeat this test, but now, if you do not see the new lv, please
>>>>>> run:
>>>>>>
>>>>>> multipath -r
>>>>>>
>>>>>> And report the results.
>>>>>>
>>>>>
>>>>> Running multipath -r helped and the disk was properly picked up by the
>>>>> second host.
>>>>>
>>>>> Is running multipath -r safe while the host is not in maintenance mode?
>>>>
>>>> It should be safe; vdsm uses it in some cases.
>>>>
>>>>> If yes, as a temporary workaround I can patch vdsmd to run multipath -r
>>>>> when e.g. monitoring the storage domain.
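
Purely as an illustration of that stopgap idea - this is not vdsm code,
just a standalone sketch that could be run (or cron'ed) on the affected
host until a proper fix is in place:

    # rescan multipath maps every 5 minutes so newly created LVs on the
    # shared VG become visible without restarting vdsmd
    while true; do
        /usr/sbin/multipath -r >/dev/null 2>&1
        sleep 300
    done
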
>>>>
>>>> I suggested running multipath as a debugging aid; normally this is not
>>>> needed.
>>>>
>>>> You should see the lv on the shared storage without running multipath.
>>>>
>>>> Zdenek, can you explain this?
>>>>
>>>>>>> One warning that I keep seeing in vdsm logs on both nodes is this:
>>>>>>>
>>>>>>> Thread-1617881::WARNING::2014-02-24
>>>>>>> 16:57:50,627::sp::1553::Storage.StoragePool::(getInfo) VG
>>>>>>> 3307f6fa-dd58-43db-ab23-b1fb299006c7's metadata size exceeded
>>>>>>> critical size: mdasize=134217728 mdafree=0
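
One way to cross-check those numbers directly with LVM (vg_mda_size and
vg_mda_free are standard vgs/lvs fields; whether they line up exactly with
what vdsm reports here is an assumption):

    vgs -o vg_name,vg_mda_size,vg_mda_free,vg_mda_count 3307f6fa-dd58-43db-ab23-b1fb299006c7
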
>>>>>>
>>>>>> Can you share the output of the command below?
>>>>>>
>>>>>> lvs -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name
>>>>>
>>>>> Here's the output for both hosts.
>>>>>
>>>>> host1:
>>>>> [root@host1 ~]# lvs -o uuid,name,attr,size,vg_free,vg_extent_size,vg_extent_count,vg_free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count
>>>>> LV UUID                                 LV                                    Attr      LSize VFree   Ext     #Ext Free LV Tags                                                                               VMdaSize VMdaFree #LV #PV
>>>>> jGEpVm-oPW8-XyxI-l2yi-YF4X-qteQ-dm8SqL 3d362bf2-20f4-438d-9ba9-486bd2e8cedf -wi-ao--- 2.00g 114.62g 128.00m 1596 917 IU_0227da98-34b2-4b0c-b083-d42e7b760036,MD_5,PU_f4231952-76c5-4764-9c8b-ac73492ac465 128.00m 0 13 2
>>>>
>>>> This looks wrong - your vg_mda_free is zero - as vdsm complains.
>
> Patch http://gerrit.ovirt.org/25408 should solve this issue.
>
> It may also solve the other issue with the missing lv - I could
> not reproduce it yet.
>
> Can you try to apply this patch and report the results?
>
> Thanks,
> Nir
This patch helped, indeed! I tried it on the non-SPM node (as that's
the node that I can currently easily put in maintenance) and the node
started picking up newly created volumes correctly. I also set
use_lvmetad to 0 in the main lvm.conf, because without it, manually
running e.g. lvs was still using the metadata daemon.
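
For reference, the change amounts to the following in /etc/lvm/lvm.conf
(that path is the usual default; adjust if this setup keeps it elsewhere):

    global {
        # don't use the lvmetad metadata cache; always scan the shared
        # storage directly
        use_lvmetad = 0
    }
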
I can't confirm yet that this helps with the metadata volume warning,
as that warning appears only on the SPM. I'll be able to put the SPM
node in maintenance soon and will report later.
This issue on Fedora makes me think - is Fedora still a fully supported
platform?