Hi Zdenek, can you look into this strange incident?
When a user creates a disk on one host (creating a new lv), the lv is not seen
on another host in the cluster.
Calling multipath -r causes the new lv to appear on the other host.
Finally, lvs tells us that vg_mda_free is zero - maybe unrelated, but unusual.
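A rough sketch of what was done to reproduce it (the commands are the ones
used in the thread below; the disk uuid comes from the engine UI):

  # host1: create a new disk through the engine
  # host2: the new lv is not listed
  lvs -o vg_name,lv_name,tags
  # host2: after forcing a multipath rescan, the lv shows up
  multipath -r
  lvs -o vg_name,lv_name,tags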
----- Original Message -----
From: "Boyan Tabakov" <blade(a)alslayer.net>
To: "Nir Soffer" <nsoffer(a)redhat.com>
Cc: users(a)ovirt.org
Sent: Monday, March 3, 2014 9:51:05 AM
Subject: Re: [Users] SD Disk's Logical Volume not visible/activated on some nodes
>>>>>> Consequently, when creating/booting
>>>>>> a VM with the said disk attached, the VM fails to start on host2,
>>>>>> because host2 can't see the LV. Similarly, if the VM is started on
>>>>>> host1, it fails to migrate to host2. Extract from host2 log is in the
>>>>>> end. The LV in question is 6b35673e-7062-4716-a6c8-d5bf72fe3280.
>>>>>>
>>>>>> As far as I could track quickly the vdsm code, there is only call to lvs
>>>>>> and not to lvscan or lvchange so the host2 LVM doesn't fully refresh.
>
> lvs should see any change on the shared storage.
>
>>>>>> The only workaround so far has been to restart VDSM on host2, which
>>>>>> makes it refresh all LVM data properly.
>
> When vdsm starts, it calls multipath -r, which ensures that we see all
> physical volumes.
>
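For context, the manual equivalent of that startup refresh is roughly this
(my approximation, not vdsm's exact code path):

  multipath -r                 # rescan multipath maps so all PV paths are visible
  lvs -o vg_name,lv_name,tags  # lvs then reads the VG metadata and lists the new lv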
>>>>>>
>>>>>> When is host2 supposed to pick up any newly created LVs in the SD VG?
>>>>>> Any suggestions where the problem might be?
>>>>>
>>>>> When you create a new lv on the shared storage, the new lv should be
>>>>> visible on the other host. Let's start by verifying that you do see
>>>>> the new lv after a disk was created.
>>>>>
>>>>> Try this:
>>>>>
>>>>> 1. Create a new disk, and check the disk uuid in the engine ui
>>>>> 2. On another machine, run this command:
>>>>>
>>>>> lvs -o vg_name,lv_name,tags
>>>>>
>>>>> You can identify the new lv using tags, which should contain the new
>>>>> disk uuid.
>>>>>
>>>>> If you don't see the new lv from the other host, please provide
>>>>> /var/log/messages
>>>>> and /var/log/sanlock.log.
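For example, once the disk uuid from step 1 is known (placeholder <disk-uuid>
below), the new lv can be located with something like:

  lvs -o vg_name,lv_name,tags | grep <disk-uuid>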
>>>>
>>>> Just tried that. The disk is not visible on the non-SPM node.
>>>
>>> This means that storage is not accessible from this host.
>>
>> Generally, the storage seems accessible ok. For example, if I restart
>> the vdsmd, all volumes get picked up correctly (become visible in lvs
>> output and VMs can be started with them).
>
> Let's repeat this test, but now, if you do not see the new lv, please
> run:
>
> multipath -r
>
> And report the results.
>
Running multipath -r helped and the disk was properly picked up by the
second host.
Is running multipath -r safe while host is not in maintenance mode?
It should be safe; vdsm uses it in some cases.
If yes, as a temporary workaround I can patch vdsmd to run multipath -r
when e.g. monitoring the storage domain.
I suggested running multipath as a debugging aid; normally this is not needed.
You should see the lv on the shared storage without running multipath.
Zdenek, can you explain this?
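One way to narrow this down (my suggestion only, not something vdsm does today):
check whether a plain LVM rescan, without touching multipath, is enough for
host2 to see the new lv:

  pvscan                       # rescan block devices for physical volumes
  vgscan                       # re-read volume group metadata
  lvs -o vg_name,lv_name,tags  # is the new lv listed now?
  # If it only appears after multipath -r, the stale piece is the multipath
  # device map, not the LVM metadata itself.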
>> One warning that I keep seeing in vdsm logs on both nodes is this:
>>
>> Thread-1617881::WARNING::2014-02-24
>> 16:57:50,627::sp::1553::Storage.StoragePool::(getInfo) VG
>> 3307f6fa-dd58-43db-ab23-b1fb299006c7's metadata size exceeded
>> critical size: mdasize=134217728 mdafree=0
>
> Can you share the output of the command below?
>
> lvs -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name
Here's the output for both hosts.
host1:
[root@host1 ~]# lvs -o uuid,name,attr,size,vg_free,vg_extent_size,vg_extent_count,vg_free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count
  LV UUID                                LV                                   Attr      LSize VFree   Ext     #Ext Free LV Tags                                                                               VMdaSize VMdaFree #LV #PV
  jGEpVm-oPW8-XyxI-l2yi-YF4X-qteQ-dm8SqL 3d362bf2-20f4-438d-9ba9-486bd2e8cedf -wi-ao--- 2.00g 114.62g 128.00m 1596  917  IU_0227da98-34b2-4b0c-b083-d42e7b760036,MD_5,PU_f4231952-76c5-4764-9c8b-ac73492ac465 128.00m  0        13   2
This looks wrong - your vg_mda_free is zero - as vdsm complains.
Zdenek, how can we debug this further?
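A few commands that might help narrow it down (the VG name is the one from the
warning above; the backup file name is just an example):

  vgs -o vg_name,vg_mda_count,vg_mda_size,vg_mda_free 3307f6fa-dd58-43db-ab23-b1fb299006c7
  pvs -o pv_name,vg_name,pv_mda_count,pv_mda_free
  # dump the current VG metadata to a file for inspection
  vgcfgbackup -f /tmp/3307f6fa.vg 3307f6fa-dd58-43db-ab23-b1fb299006c7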
Nir