[Users] SD Disk's Logical Volume not visible/activated on some nodes

Nir Soffer nsoffer at redhat.com
Wed Mar 5 14:01:08 UTC 2014


----- Original Message -----
> From: "Boyan Tabakov" <blade at alslayer.net>
> To: "Nir Soffer" <nsoffer at redhat.com>
> Cc: users at ovirt.org
> Sent: Wednesday, March 5, 2014 3:38:25 PM
> Subject: Re: [Users] SD Disk's Logical Volume not visible/activated on some nodes
> 
> Hello Nir,
> 
> On Wed Mar  5 14:37:17 2014, Nir Soffer wrote:
> > ----- Original Message -----
> >> From: "Boyan Tabakov" <blade at alslayer.net>
> >> To: "Nir Soffer" <nsoffer at redhat.com>
> >> Cc: users at ovirt.org
> >> Sent: Tuesday, March 4, 2014 3:53:24 PM
> >> Subject: Re: [Users] SD Disk's Logical Volume not visible/activated on
> >> some nodes
> >>
> >> On Tue Mar  4 14:46:33 2014, Nir Soffer wrote:
> >>> ----- Original Message -----
> >>>> From: "Nir Soffer" <nsoffer at redhat.com>
> >>>> To: "Boyan Tabakov" <blade at alslayer.net>
> >>>> Cc: users at ovirt.org, "Zdenek Kabelac" <zkabelac at redhat.com>
> >>>> Sent: Monday, March 3, 2014 9:39:47 PM
> >>>> Subject: Re: [Users] SD Disk's Logical Volume not visible/activated on
> >>>> some nodes
> >>>>
> >>>> Hi Zdenek, can you look into this strange incident?
> >>>>
> >>>> When a user creates a disk on one host (creating a new lv), the lv
> >>>> is not seen on another host in the cluster.
> >>>>
> >>>> Calling multipath -r causes the new lv to appear on the other host.
> >>>>
> >>>> Finally, lvs tells us that vg_mda_free is zero - maybe unrelated, but
> >>>> unusual.
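> >>>>
> >>>> Roughly what was observed, as a sketch (the vg name is a placeholder):
> >>>>
> >>>>     # on the second host, after a disk was created from the engine
> >>>>     lvs <vg-name>        # the new lv is not listed
> >>>>     multipath -r         # reload multipath maps
> >>>>     lvs <vg-name>        # the new lv now shows up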
> >>>>
> >>>> ----- Original Message -----
> >>>>> From: "Boyan Tabakov" <blade at alslayer.net>
> >>>>> To: "Nir Soffer" <nsoffer at redhat.com>
> >>>>> Cc: users at ovirt.org
> >>>>> Sent: Monday, March 3, 2014 9:51:05 AM
> >>>>> Subject: Re: [Users] SD Disk's Logical Volume not visible/activated on
> >>>>> some
> >>>>> nodes
> >>>>>>>>>>> Consequently, when creating/booting a VM with the said disk
> >>>>>>>>>>> attached, the VM fails to start on host2, because host2 can't
> >>>>>>>>>>> see the LV. Similarly, if the VM is started on host1, it fails
> >>>>>>>>>>> to migrate to host2. An extract from the host2 log is at the
> >>>>>>>>>>> end. The LV in question is 6b35673e-7062-4716-a6c8-d5bf72fe3280.
> >>>>>>>>>>>
> >>>>>>>>>>> As far as I could quickly track in the vdsm code, there is
> >>>>>>>>>>> only a call to lvs and not to lvscan or lvchange, so the host2
> >>>>>>>>>>> LVM doesn't fully refresh.
> >>>>>>
> >>>>>> lvs should see any change on the shared storage.
> >>>>>>
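> >>>>>> As a rough illustration of the two operations being discussed here
> >>>>>> (vg/lv names are placeholders), listing and activation are separate:
> >>>>>>
> >>>>>>     lvs -o vg_name,lv_name <vg-name>     # reads VG metadata and lists the lvs
> >>>>>>     lvchange -ay <vg-name>/<lv-name>     # activates an lv so its /dev node appears
> >>>>>>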
> >>>>>>>>>>> The only workaround so far has been to restart VDSM on host2,
> >>>>>>>>>>> which
> >>>>>>>>>>> makes it refresh all LVM data properly.
> >>>>>>
> >>>>>> When vdsm starts, it calls multipath -r, which ensures that we see
> >>>>>> all physical volumes.
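> >>>>>>
> >>>>>> For example, a sketch of checking the same thing by hand, outside vdsm:
> >>>>>>
> >>>>>>     multipath -r     # reload the multipath maps
> >>>>>>     multipath -ll    # list the resulting multipath devices
> >>>>>>     pvs              # the PVs on those devices should be listed here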
> >>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> When is host2 supposed to pick up any newly created LVs in
> >>>>>>>>>>> the SD VG? Any suggestions where the problem might be?
> >>>>>>>>>>
> >>>>>>>>>> When you create a new lv on the shared storage, the new lv should
> >>>>>>>>>> be
> >>>>>>>>>> visible on the other host. Let's start by verifying that you do see
> >>>>>>>>>> the new lv after a disk was created.
> >>>>>>>>>>
> >>>>>>>>>> Try this:
> >>>>>>>>>>
> >>>>>>>>>> 1. Create a new disk, and check the disk uuid in the engine ui
> >>>>>>>>>> 2. On another machine, run this command:
> >>>>>>>>>>
> >>>>>>>>>> lvs -o vg_name,lv_name,tags
> >>>>>>>>>>
> >>>>>>>>>> You can identify the new lv using tags, which should contain
> >>>>>>>>>> the new disk uuid.
> >>>>>>>>>>
> >>>>>>>>>> If you don't see the new lv from the other host, please provide
> >>>>>>>>>> /var/log/messages
> >>>>>>>>>> and /var/log/sanlock.log.
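> >>>>>>>>>>
> >>>>>>>>>> For example, a sketch of step 2 (the disk uuid is a placeholder; in
> >>>>>>>>>> this setup the tags look like IU_<uuid>,MD_<n>,PU_<uuid>, as in the
> >>>>>>>>>> lvs output further down the thread):
> >>>>>>>>>>
> >>>>>>>>>>     lvs -o vg_name,lv_name,tags | grep <disk-uuid>
> >>>>>>>>>>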
> >>>>>>>>>
> >>>>>>>>> Just tried that. The disk is not visible on the non-SPM node.
> >>>>>>>>
> >>>>>>>> This means that storage is not accessible from this host.
> >>>>>>>
> >>>>>>> Generally, the storage seems accessible. For example, if I restart
> >>>>>>> vdsmd, all volumes get picked up correctly (become visible in lvs
> >>>>>>> output and VMs can be started with them).
> >>>>>>
> >>>>>> Let's repeat this test, but now, if you do not see the new lv, please
> >>>>>> run:
> >>>>>>
> >>>>>>     multipath -r
> >>>>>>
> >>>>>> And report the results.
> >>>>>>
> >>>>>
> >>>>> Running multipath -r helped and the disk was properly picked up by the
> >>>>> second host.
> >>>>>
> >>>>> Is running multipath -r safe while the host is not in maintenance mode?
> >>>>
> >>>> It should be safe; vdsm uses it in some cases.
> >>>>
> >>>>> If yes, as a temporary workaround I can patch vdsmd to run multipath -r
> >>>>> when e.g. monitoring the storage domain.
> >>>>
> >>>> I suggested running multipath as a debugging aid; normally this is not
> >>>> needed.
> >>>>
> >>>> You should see the lv on the shared storage without running multipath.
> >>>>
> >>>> Zdenek, can you explain this?
> >>>>
> >>>>>>> One warning that I keep seeing in vdsm logs on both nodes is this:
> >>>>>>>
> >>>>>>> Thread-1617881::WARNING::2014-02-24
> >>>>>>> 16:57:50,627::sp::1553::Storage.StoragePool::(getInfo) VG
> >>>>>>> 3307f6fa-dd58-43db-ab23-b1fb299006c7's metadata size exceeded
> >>>>>>>  critical size: mdasize=134217728 mdafree=0
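> >>>>>>>
> >>>>>>> (For reference: mdasize=134217728 bytes is 128 MiB, i.e. the 128.00m
> >>>>>>> VMdaSize that lvs reports further down, and mdafree=0 means LVM sees
> >>>>>>> no free space left in that metadata area.)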
> >>>>>>
> >>>>>> Can you share the output of the command below?
> >>>>>>
> >>>>>>     lvs -o
> >>>>>>     uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name
> >>>>>
> >>>>> Here's the output for both hosts.
> >>>>>
> >>>>> host1:
> >>>>> [root at host1 ~]# lvs -o uuid,name,attr,size,vg_free,vg_extent_size,vg_extent_count,vg_free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count
> >>>>>   LV UUID                                LV                                   Attr      LSize   VFree   Ext     #Ext  Free  LV Tags                                                                                VMdaSize  VMdaFree  #LV #PV
> >>>>>   jGEpVm-oPW8-XyxI-l2yi-YF4X-qteQ-dm8SqL 3d362bf2-20f4-438d-9ba9-486bd2e8cedf -wi-ao---   2.00g 114.62g 128.00m 1596   917  IU_0227da98-34b2-4b0c-b083-d42e7b760036,MD_5,PU_f4231952-76c5-4764-9c8b-ac73492ac465   128.00m         0   13   2
> >>>>
> >>>> This looks wrong - your vg_mda_free is zero - as vdsm complains.
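> >>>>
> >>>> A quick way to look at just the metadata area numbers, as a sketch (the
> >>>> vg name is a placeholder):
> >>>>
> >>>>     vgs -o vg_name,vg_mda_size,vg_mda_free,vg_mda_count <vg-name>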
> >
> > Patch http://gerrit.ovirt.org/25408 should solve this issue.
> >
> > It may also solve the other issue with the missing lv - I could
> > not reproduce it yet.
> >
> > Can you try to apply this patch and report the results?
> >
> > Thanks,
> > Nir
> 
> This patch helped, indeed! I tried it on the non-SPM node (as that's
> the node that I can currently easily put in maintenance) and the node
> started picking up newly created volumes correctly. I also set the
> use_lvmetad to 0 in the main lvm.conf, because without it, manually
> running e.g. lvs was still using the metadata daemon.
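> 
> For reference, the relevant setting (assuming the stock /etc/lvm/lvm.conf
> layout, in the global section):
> 
>     use_lvmetad = 0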
> 
> I can't confirm yet that this helps with the metadata volume warning,
> as that warning appears only on the SPM. I'll be able to put the SPM
> node in maintenance soon and will report later.
> 
> This issue on Fedora makes me think - is Fedora still a fully supported
> platform?

It is supported, but probably not tested properly.

Nir


