Hello Nir,
On Wed Mar 5 14:37:17 2014, Nir Soffer wrote:
----- Original Message -----
> From: "Boyan Tabakov" <blade(a)alslayer.net>
> To: "Nir Soffer" <nsoffer(a)redhat.com>
> Cc: users(a)ovirt.org
> Sent: Tuesday, March 4, 2014 3:53:24 PM
> Subject: Re: [Users] SD Disk's Logical Volume not visible/activated on some nodes
>
> On Tue Mar 4 14:46:33 2014, Nir Soffer wrote:
>> ----- Original Message -----
>>> From: "Nir Soffer" <nsoffer(a)redhat.com>
>>> To: "Boyan Tabakov" <blade(a)alslayer.net>
>>> Cc: users(a)ovirt.org, "Zdenek Kabelac" <zkabelac(a)redhat.com>
>>> Sent: Monday, March 3, 2014 9:39:47 PM
>>> Subject: Re: [Users] SD Disk's Logical Volume not visible/activated on some nodes
>>>
>>> Hi Zdenek, can you look into this strange incident?
>>>
>>> When a user creates a disk on one host (creating a new lv), the lv is not
>>> seen on another host in the cluster.
>>>
>>> Calling multipath -r causes the new lv to appear on the other host.
>>>
>>> Finally, lvs tells us that vg_mda_free is zero - maybe unrelated, but
>>> unusual.
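
A bare-bones way to reproduce the symptom outside of vdsm would be roughly the
following, assuming a plain VG on the shared storage (the VG and LV names are
only placeholders):

    # on host1: create a new LV in the shared VG
    lvcreate -n testlv -L 1G shared_vg

    # on host2: list LVs; in the failing case the new LV is not shown
    lvs shared_vg

    # on host2: rescan multipath maps, then list again - now it appears
    multipath -r
    lvs shared_vg
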
>>>
>>> ----- Original Message -----
>>>> From: "Boyan Tabakov" <blade(a)alslayer.net>
>>>> To: "Nir Soffer" <nsoffer(a)redhat.com>
>>>> Cc: users(a)ovirt.org
>>>> Sent: Monday, March 3, 2014 9:51:05 AM
>>>> Subject: Re: [Users] SD Disk's Logical Volume not visible/activated on some nodes
>>>>>>>>>> Consequently, when creating/booting a VM with the said disk
>>>>>>>>>> attached, the VM fails to start on host2, because host2 can't see
>>>>>>>>>> the LV. Similarly, if the VM is started on host1, it fails to
>>>>>>>>>> migrate to host2. Extract from host2 log is in the end. The LV in
>>>>>>>>>> question is 6b35673e-7062-4716-a6c8-d5bf72fe3280.
>>>>>>>>>>
>>>>>>>>>> As far as I could quickly track in the vdsm code, there is only a
>>>>>>>>>> call to lvs and not to lvscan or lvchange, so the host2 LVM doesn't
>>>>>>>>>> fully refresh.
>>>>>
>>>>> lvs should see any change on the shared storage.
>>>>>
>>>>>>>>>> The only workaround so far has been to restart VDSM on host2, which
>>>>>>>>>> makes it refresh all LVM data properly.
>>>>>
>>>>> When vdsm starts, it calls multipath -r, which ensures that we see all
>>>>> physical volumes.
>>>>>
>>>>>>>>>>
>>>>>>>>>> When is host2 supposed to pick up any newly created LVs in the SD VG?
>>>>>>>>>> Any suggestions where the problem might be?
>>>>>>>>>
>>>>>>>>> When you create a new lv on the shared storage, the new lv should be
>>>>>>>>> visible on the other host. Let's start by verifying that you do see
>>>>>>>>> the new lv after a disk was created.
>>>>>>>>>
>>>>>>>>> Try this:
>>>>>>>>>
>>>>>>>>> 1. Create a new disk, and check the disk uuid in the engine ui
>>>>>>>>> 2. On another machine, run this command:
>>>>>>>>>
>>>>>>>>> lvs -o vg_name,lv_name,tags
>>>>>>>>>
>>>>>>>>> You can identify the new lv using tags, which should contain the new
>>>>>>>>> disk uuid.
>>>>>>>>>
>>>>>>>>> If you don't see the new lv from the other host, please provide
>>>>>>>>> /var/log/messages and /var/log/sanlock.log.
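
In practice the check on the second host amounts to something like this (the
grep pattern is just the disk uuid as shown in the engine ui; the value here
is a placeholder):

    # list all LVs with their tags and filter by the new disk's uuid
    lvs -o vg_name,lv_name,tags | grep <disk-uuid-from-engine>

If the LV is visible on that host, the matching line shows the disk uuid in
the tags column.
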
>>>>>>>>
>>>>>>>> Just tried that. The disk is not visible on the non-SPM node.
>>>>>>>
>>>>>>> This means that storage is not accessible from this host.
>>>>>>
>>>>>> Generally, the storage seems accessible ok. For example, if I restart
>>>>>> the vdsmd, all volumes get picked up correctly (become visible in lvs
>>>>>> output and VMs can be started with them).
>>>>>
>>>>> Let's repeat this test, but now, if you do not see the new lv, please
>>>>> run:
>>>>>
>>>>> multipath -r
>>>>>
>>>>> And report the results.
>>>>>
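
For the record, the sequence run on the second host was essentially:

    multipath -r                     # rescan multipath maps
    lvs -o vg_name,lv_name,tags      # check whether the new LV is now listed

with the result reported just below.
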
>>>>
>>>> Running multipath -r helped and the disk was properly picked up by the
>>>> second host.
>>>>
>>>> Is running multipath -r safe while the host is not in maintenance mode?
>>>
>>> It should be safe; vdsm uses it in some cases.
>>>
>>>> If yes, as a temporary workaround I can patch vdsmd to run multipath -r
>>>> when e.g. monitoring the storage domain.
>>>
>>> I suggested running multipath as a debugging aid; normally this is not
>>> needed.
>>>
>>> You should see the lv on the shared storage without running multipath.
>>>
>>> Zdenek, can you explain this?
>>>
>>>>>> One warning that I keep seeing in vdsm logs on both nodes is this:
>>>>>>
>>>>>> Thread-1617881::WARNING::2014-02-24
>>>>>> 16:57:50,627::sp::1553::Storage.StoragePool::(getInfo) VG
>>>>>> 3307f6fa-dd58-43db-ab23-b1fb299006c7's metadata size exceeded
>>>>>> critical size: mdasize=134217728 mdafree=0
>>>>>
>>>>> Can you share the output of the command below?
>>>>>
>>>>> lvs -o
>>>>> uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name
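
If only the metadata headroom is of interest, a shorter query along these
lines should report the same vg_mda_size/vg_mda_free values per VG:

    # per-VG metadata area size, free space and count
    vgs -o vg_name,vg_mda_size,vg_mda_free,vg_mda_count

The full lvs output requested above still gives the complete picture.
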
>>>>
>>>> Here's the output for both hosts.
>>>>
>>>> host1:
>>>> [root@host1 ~]# lvs -o
>>>> uuid,name,attr,size,vg_free,vg_extent_size,vg_extent_count,vg_free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count
>>>>   LV UUID                                LV                                   Attr      LSize VFree   Ext     #Ext Free LV Tags                                                                               VMdaSize VMdaFree #LV #PV
>>>>   jGEpVm-oPW8-XyxI-l2yi-YF4X-qteQ-dm8SqL 3d362bf2-20f4-438d-9ba9-486bd2e8cedf -wi-ao--- 2.00g 114.62g 128.00m 1596 917  IU_0227da98-34b2-4b0c-b083-d42e7b760036,MD_5,PU_f4231952-76c5-4764-9c8b-ac73492ac465 128.00m  0        13   2
>>>
>>> This looks wrong - your vg_mda_free is zero - as vdsm complains.
Patch http://gerrit.ovirt.org/25408 should solve this issue. It may also
solve the other issue with the missing lv - I could not reproduce it yet.
Can you try to apply this patch and report the results?
Thanks,
Nir
This patch helped, indeed! I tried it on the non-SPM node (as that's the node
that I can currently easily put in maintenance) and the node started picking
up newly created volumes correctly. I also set use_lvmetad to 0 in the main
lvm.conf, because without it manually running e.g. lvs was still using the
metadata daemon.
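
For completeness, the lvm.conf change was just the one setting; stopping the
lvmetad daemon as well is my assumption about what keeps stale cached metadata
from being served, not something taken from the patch:

    # /etc/lvm/lvm.conf, in the global section
    global {
        # scan disks directly instead of using the lvmetad caching daemon,
        # so newly created LVs on the shared storage show up in lvs
        use_lvmetad = 0
    }

    # assumption: also stop the daemon so it no longer answers queries
    systemctl stop lvm2-lvmetad.socket lvm2-lvmetad.service
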
I can't confirm yet that this helps with the metadata volume warning, as that
warning appears only on the SPM. I'll be able to put the SPM node in
maintenance soon and will report later.
This issue on Fedora makes me think - is Fedora still a fully supported
platform?
Best regards,
Boyan