On 5.3.2014, 16:01, Nir Soffer wrote:
----- Original Message -----
> From: "Boyan Tabakov" <blade(a)alslayer.net>
> To: "Nir Soffer" <nsoffer(a)redhat.com>
> Cc: users(a)ovirt.org
> Sent: Wednesday, March 5, 2014 3:38:25 PM
> Subject: Re: [Users] SD Disk's Logical Volume not visible/activated on some nodes
>
> Hello Nir,
>
> On Wed Mar 5 14:37:17 2014, Nir Soffer wrote:
>> ----- Original Message -----
>>> From: "Boyan Tabakov" <blade(a)alslayer.net>
>>> To: "Nir Soffer" <nsoffer(a)redhat.com>
>>> Cc: users(a)ovirt.org
>>> Sent: Tuesday, March 4, 2014 3:53:24 PM
>>> Subject: Re: [Users] SD Disk's Logical Volume not visible/activated on some nodes
>>>
>>> On Tue Mar 4 14:46:33 2014, Nir Soffer wrote:
>>>> ----- Original Message -----
>>>>> From: "Nir Soffer" <nsoffer(a)redhat.com>
>>>>> To: "Boyan Tabakov" <blade(a)alslayer.net>
>>>>> Cc: users(a)ovirt.org, "Zdenek Kabelac" <zkabelac(a)redhat.com>
>>>>> Sent: Monday, March 3, 2014 9:39:47 PM
>>>>> Subject: Re: [Users] SD Disk's Logical Volume not visible/activated on some nodes
>>>>>
>>>>> Hi Zdenek, can you look into this strange incident?
>>>>>
>>>>> When a user creates a disk on one host (creating a new lv), the lv is
>>>>> not seen on another host in the cluster.
>>>>>
>>>>> Calling multipath -r causes the new lv to appear on the other host.
>>>>>
>>>>> Finally, lvs tells us that vg_mda_free is zero - maybe unrelated, but
>>>>> unusual.
>>>>>
>>>>> ----- Original Message -----
>>>>>> From: "Boyan Tabakov" <blade(a)alslayer.net>
>>>>>> To: "Nir Soffer" <nsoffer(a)redhat.com>
>>>>>> Cc: users(a)ovirt.org
>>>>>> Sent: Monday, March 3, 2014 9:51:05 AM
>>>>>> Subject: Re: [Users] SD Disk's Logical Volume not visible/activated on some nodes
>>>>>>>>>>>> Consequently, when creating/booting a VM with the said disk
>>>>>>>>>>>> attached, the VM fails to start on host2, because host2 can't see
>>>>>>>>>>>> the LV. Similarly, if the VM is started on host1, it fails to
>>>>>>>>>>>> migrate to host2. An extract from the host2 log is at the end.
>>>>>>>>>>>> The LV in question is 6b35673e-7062-4716-a6c8-d5bf72fe3280.
>>>>>>>>>>>>
>>>>>>>>>>>> As far as I could quickly track through the vdsm code, there is
>>>>>>>>>>>> only a call to lvs and not to lvscan or lvchange, so the host2 LVM
>>>>>>>>>>>> doesn't fully refresh.
>>>>>>>
>>>>>>> lvs should see any change on the shared storage.
>>>>>>>
>>>>>>>>>>>> The only workaround so far has been to restart VDSM on host2,
>>>>>>>>>>>> which makes it refresh all LVM data properly.
>>>>>>>
>>>>>>> When vdsm starts, it calls multipath -r, which ensures that we see all
>>>>>>> physical volumes.
>>>>>>>
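For reference, a minimal sketch of reproducing that refresh by hand on a host, using only the standard multipath and lvm2 commands already mentioned in this thread (nothing vdsm-specific):

    # Reload multipath maps so newly exposed devices and paths show up
    multipath -r
    # Then list LVs to confirm the new volume is now visible on this host
    lvs -o vg_name,lv_name,tags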
>>>>>>>>>>>>
>>>>>>>>>>>> When is host2 supposed to pick up any newly created LVs in the SD
>>>>>>>>>>>> VG? Any suggestions where the problem might be?
>>>>>>>>>>>
>>>>>>>>>>> When you create a new lv on the shared storage, the new lv should
>>>>>>>>>>> be visible on the other host. Let's start by verifying that you do
>>>>>>>>>>> see the new lv after a disk was created.
>>>>>>>>>>>
>>>>>>>>>>> Try this:
>>>>>>>>>>>
>>>>>>>>>>> 1. Create a new disk, and check the disk uuid in the engine ui
>>>>>>>>>>> 2. On another machine, run this command:
>>>>>>>>>>>
>>>>>>>>>>> lvs -o vg_name,lv_name,tags
>>>>>>>>>>>
>>>>>>>>>>> You can identify the new lv using tags, which should contain the
>>>>>>>>>>> new disk uuid (see the example sketch below).
>>>>>>>>>>>
>>>>>>>>>>> If you don't see the new lv from the other host, please provide
>>>>>>>>>>> /var/log/messages and /var/log/sanlock.log.
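A minimal sketch of step 2 above, for reference; the disk UUID shown is a made-up placeholder, not one from this thread:

    # Placeholder disk UUID - substitute the one shown in the engine UI
    DISK_UUID=01234567-89ab-cdef-0123-456789abcdef
    # List LVs with their VG and tags, keeping only rows that mention the disk
    lvs -o vg_name,lv_name,tags | grep "$DISK_UUID"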
>>>>>>>>>>
>>>>>>>>>> Just tried that. The disk is not visible on the non-SPM node.
>>>>>>>>>
>>>>>>>>> This means that the storage is not accessible from this host.
>>>>>>>>
>>>>>>>> Generally, the storage seems accessible OK. For example, if I restart
>>>>>>>> vdsmd, all volumes get picked up correctly (they become visible in lvs
>>>>>>>> output and VMs can be started with them).
>>>>>>>
>>>>>>> Let's repeat this test, but now, if you do not see the new lv, please
>>>>>>> run:
>>>>>>>
>>>>>>> multipath -r
>>>>>>>
>>>>>>> And report the results.
>>>>>>>
>>>>>>
>>>>>> Running multipath -r helped and the disk was properly picked up by the
>>>>>> second host.
>>>>>>
>>>>>> Is running multipath -r safe while the host is not in maintenance mode?
>>>>>
>>>>> It should be safe; vdsm uses it in some cases.
>>>>>
>>>>>> If yes, as a temporary workaround I can patch vdsmd to run multipath -r
>>>>>> when e.g. monitoring the storage domain.
>>>>>
>>>>> I suggested running multipath as a debugging aid; normally this is not
>>>>> needed.
>>>>>
>>>>> You should see the lv on the shared storage without running multipath.
>>>>>
>>>>> Zdenek, can you explain this?
>>>>>
>>>>>>>> One warning that I keep seeing in vdsm logs on both nodes is this:
>>>>>>>>
>>>>>>>> Thread-1617881::WARNING::2014-02-24
>>>>>>>> 16:57:50,627::sp::1553::Storage.StoragePool::(getInfo) VG
>>>>>>>> 3307f6fa-dd58-43db-ab23-b1fb299006c7's metadata size exceeded
>>>>>>>> critical size: mdasize=134217728 mdafree=0
>>>>>>>
>>>>>>> Can you share the output of the command below?
>>>>>>>
>>>>>>> lvs -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name
>>>>>>
>>>>>> Here's the output for both hosts.
>>>>>>
>>>>>> host1:
>>>>>> [root@host1 ~]# lvs -o uuid,name,attr,size,vg_free,vg_extent_size,vg_extent_count,vg_free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count
>>>>>>   LV UUID                                 LV                                    Attr      LSize VFree   Ext     #Ext Free LV Tags                                                                                VMdaSize VMdaFree #LV #PV
>>>>>>   jGEpVm-oPW8-XyxI-l2yi-YF4X-qteQ-dm8SqL  3d362bf2-20f4-438d-9ba9-486bd2e8cedf  -wi-ao--- 2.00g 114.62g 128.00m 1596 917  IU_0227da98-34b2-4b0c-b083-d42e7b760036,MD_5,PU_f4231952-76c5-4764-9c8b-ac73492ac465  128.00m  0        13   2
>>>>>
>>>>> This looks wrong - your vg_mda_free is zero - as vdsm complains.
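As a side note, one way to watch that value directly is a vgs report on the storage domain VG; a sketch using standard lvm2 reporting fields, not a command taken from this thread:

    # Report the metadata area size and free space for the storage domain VG
    vgs -o vg_name,vg_mda_size,vg_mda_free,vg_mda_count 3307f6fa-dd58-43db-ab23-b1fb299006c7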
>>
>> Patch http://gerrit.ovirt.org/25408 should solve this issue.
>>
>> It may also solve the other issue with the missing lv - I could
>> not reproduce it yet.
>>
>> Can you try to apply this patch and report the results?
>>
>> Thanks,
>> Nir
>
> This patch helped, indeed! I tried it on the non-SPM node (as that's
> the node that I can currently easily put in maintenance) and the node
> started picking up newly created volumes correctly. I also set
> use_lvmetad to 0 in the main lvm.conf, because without it, manually
> running e.g. lvs was still using the metadata daemon.
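For reference, a minimal sketch of that lvm.conf change; the exact file layout varies between lvm2 versions, so treat this as an illustration only:

    # /etc/lvm/lvm.conf (sketch; the option lives in the "global" section)
    global {
        # Do not use the lvmetad metadata cache, so every lvm command
        # scans the shared storage directly.
        use_lvmetad = 0
    }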
>
> I can't confirm yet that this helps with the metadata volume warning,
> as that warning appears only on the SPM. I'll be able to put the SPM
> node in maintenance soon and will report later.
>
> This issue on Fedora makes me think - is Fedora still a fully supported
> platform?

It is supported, but probably not tested properly.

Nir

Alright! Thanks a lot for the help!
BR,
Boyan