On Tue Mar 4 14:46:33 2014, Nir Soffer wrote:
----- Original Message -----
> From: "Nir Soffer" <nsoffer(a)redhat.com>
> To: "Boyan Tabakov" <blade(a)alslayer.net>
> Cc: users(a)ovirt.org, "Zdenek Kabelac" <zkabelac(a)redhat.com>
> Sent: Monday, March 3, 2014 9:39:47 PM
> Subject: Re: [Users] SD Disk's Logical Volume not visible/activated on=
some nodes
>
> Hi Zdenek, can you look into this strange incident?
>
> When a user creates a disk on one host (creating a new LV), the LV is
> not seen on another host in the cluster.
>
> Calling multipath -r causes the new LV to appear on the other host.
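>
> A minimal sketch of what we observe (the VG name below is a placeholder
> standing in for the storage domain uuid):
>
>     # on host2, after the disk was created from host1
>     lvs -o vg_name,lv_name,tags <sd-vg>   # new lv missing
>     multipath -r                          # reload multipath maps
>     lvs -o vg_name,lv_name,tags <sd-vg>   # new lv now listed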
>
> Finally, lvs tells us that vg_mda_free is zero - maybe unrelated, but
> unusual.
>
> ----- Original Message -----
>> From: "Boyan Tabakov" <blade(a)alslayer.net>
>> To: "Nir Soffer" <nsoffer(a)redhat.com>
>> Cc: users(a)ovirt.org
>> Sent: Monday, March 3, 2014 9:51:05 AM
>> Subject: Re: [Users] SD Disk's Logical Volume not visible/activated o=
n some
>> nodes
>>>>>>>> Consequently, when creating/booting a VM with the said disk
>>>>>>>> attached, the VM fails to start on host2, because host2 can't see
>>>>>>>> the LV. Similarly, if the VM is started on host1, it fails to
>>>>>>>> migrate to host2. Extract from host2 log is at the end. The LV in
>>>>>>>> question is 6b35673e-7062-4716-a6c8-d5bf72fe3280.
>>>>>>>>
>>>>>>>> As far as I could quickly track in the vdsm code, there is only a
>>>>>>>> call to lvs and not to lvscan or lvchange, so host2's LVM doesn't
>>>>>>>> fully refresh.
>>>
>>> lvs should see any change on the shared storage.
>>>
>>>>>>>> The only workaround so far has been to restart VDSM on host2,
>>>>>>>> which makes it refresh all LVM data properly.
>>>
>>> When vdsm starts, it calls multipath -r, which ensures that we see all
>>> physical volumes.
>>>
>>>>>>>>
>>>>>>>> When is host2 supposed to pick up any newly created LVs in the SD
>>>>>>>> VG? Any suggestions where the problem might be?
>>>>>>>
>>>>>>> When you create a new lv on the shared storage, the new lv should
>>>>>>> be visible on the other host. Let's start by verifying that you do
>>>>>>> see the new lv after a disk was created.
>>>>>>>
>>>>>>> Try this:
>>>>>>>
>>>>>>> 1. Create a new disk, and check the disk uuid in the engine ui
>>>>>>> 2. On another machine, run this command:
>>>>>>>
>>>>>>> lvs -o vg_name,lv_name,tags
>>>>>>>
>>>>>>> You can identify the new lv using tags, which should contain the
>>>>>>> new disk uuid.
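>>>>>>>
>>>>>>> Illustrative output shape only (the uuids here are placeholders,
>>>>>>> not real values):
>>>>>>>
>>>>>>> VG        LV        LV Tags
>>>>>>> <sd-uuid> <lv-uuid> IU_<disk-uuid>,MD_<slot>,PU_<parent-uuid>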
>>>>>>>
>>>>>>> If you don't see the new lv from the other host, please provide
>>>>>>> /var/log/messages and /var/log/sanlock.log.
>>>>>>
>>>>>> Just tried that. The disk is not visible on the non-SPM node.
>>>>>
>>>>> This means that the storage is not accessible from this host.
>>>>
>>>> Generally, the storage seems to be accessible. For example, if I
>>>> restart vdsmd, all volumes get picked up correctly (they become
>>>> visible in lvs output and VMs can be started with them).
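>>>> (Concretely: systemctl restart vdsmd on these Fedora hosts.)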
>>>
>>> Let's repeat this test, but now, if you do not see the new lv, please
>>> run:
>>>
>>> multipath -r
>>>
>>> and report the results.
>>>
>>
>> Running multipath -r helped and the disk was properly picked up by the
>> second host.
>>
>> Is running multipath -r safe while the host is not in maintenance mode?
>
> It should be safe; vdsm uses it in some cases.
>
>> If yes, as a temporary workaround I can patch vdsmd to run multipath -r
>> when e.g. monitoring the storage domain.
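>>
>> (A cruder stand-in under the same assumption would be a periodic reload
>> from cron, e.g. an /etc/crontab line like
>>
>>     */5 * * * * root /sbin/multipath -r
>>
>> though that is just a sketch of the idea.)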
>
> I suggested running multipath as a debugging aid; normally it is not
> needed.
>
> You should see the lv on the shared storage without running multipath.
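>
> (To check whether host2 at least sees the physical volumes without the
> reload, something like pvs -o pv_name,vg_name,pv_size should list the
> storage domain's multipath devices; those are standard LVM report
> fields.)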
>
> Zdenek, can you explain this?
>
>>>> One warning that I keep seeing in vdsm logs on both nodes is this:
>>>>
>>>> Thread-1617881::WARNING::2014-02-24
>>>> 16:57:50,627::sp::1553::Storage.StoragePool::(getInfo) VG
>>>> 3307f6fa-dd58-43db-ab23-b1fb299006c7's metadata size exceeded
>>>> critical size: mdasize=134217728 mdafree=0
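>>>>
>>>> (For scale: mdasize=134217728 bytes = 128 MiB, which matches the
>>>> 128.00m VMdaSize in the lvs output below; mdafree=0 means LVM reports
>>>> no free space left in that metadata area.)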
>>>
>>> Can you share the output of the command below?
>>>
>>> lvs -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name
>>
>> Here's the output for both hosts.
>>
>> host1:
>> [root@host1 ~]# lvs -o uuid,name,attr,size,vg_free,vg_extent_size,vg_extent_count,vg_free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count
>> LV UUID                                LV                                   Attr      LSize VFree   Ext     #Ext Free LV Tags                                                                               VMdaSize VMdaFree #LV #PV
>> jGEpVm-oPW8-XyxI-l2yi-YF4X-qteQ-dm8SqL 3d362bf2-20f4-438d-9ba9-486bd2e8cedf -wi-ao--- 2.00g 114.62g 128.00m 1596 917  IU_0227da98-34b2-4b0c-b083-d42e7b760036,MD_5,PU_f4231952-76c5-4764-9c8b-ac73492ac465 128.00m  0        13   2
>
> This looks wrong - your vg_mda_free is zero - as vdsm complains.
>
> Zdenek, how can we debug this further?
I see the same issue in Fedora 19.
Can you share with us the output of:
cat /etc/redhat-release
uname -a
lvm version
Nir
$ cat /etc/redhat-release
Fedora release 19 (Schrödinger's Cat)
$ uname -a
Linux blizzard.mgmt.futurice.com 3.12.6-200.fc19.x86_64.debug #1 SMP Mon Dec 23 16:24:32 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
$ lvm version
LVM version: 2.02.98(2) (2012-10-15)
Library version: 1.02.77 (2012-10-15)
Driver version: 4.26.0