[Users] SD Disk's Logical Volume not visible/activated on some nodes

Nir Soffer nsoffer at redhat.com
Tue Mar 4 12:46:33 UTC 2014


----- Original Message -----
> From: "Nir Soffer" <nsoffer at redhat.com>
> To: "Boyan Tabakov" <blade at alslayer.net>
> Cc: users at ovirt.org, "Zdenek Kabelac" <zkabelac at redhat.com>
> Sent: Monday, March 3, 2014 9:39:47 PM
> Subject: Re: [Users] SD Disk's Logical Volume not visible/activated on some nodes
> 
> Hi Zdenek, can you look into this strange incident?
> 
> When a user creates a disk on one host (creating a new lv), the lv is not
> seen on another host in the cluster.
> 
> Calling multipath -r causes the new lv to appear on the other host.
> 
> Finally, lvs tells us that vg_mda_free is zero - maybe unrelated, but unusual.
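> 
> To look at just the metadata area usage, something like this should work
> (the vg name is a placeholder for the storage domain's vg):
> 
>     vgs -o vg_name,vg_mda_size,vg_mda_free,vg_mda_count <vg_name>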
> 
> ----- Original Message -----
> > From: "Boyan Tabakov" <blade at alslayer.net>
> > To: "Nir Soffer" <nsoffer at redhat.com>
> > Cc: users at ovirt.org
> > Sent: Monday, March 3, 2014 9:51:05 AM
> > Subject: Re: [Users] SD Disk's Logical Volume not visible/activated on some
> > nodes
> > >>>>>> Consequently, when creating/booting
> > >>>>>> a VM with said disk attached, the VM fails to start on host2,
> > >>>>>> because host2 can't see the LV. Similarly, if the VM is started on
> > >>>>>> host1, it fails to migrate to host2. An extract from the host2 log
> > >>>>>> is at the end. The LV in question is
> > >>>>>> 6b35673e-7062-4716-a6c8-d5bf72fe3280.
> > >>>>>>
> > >>>>>> As far as I could quickly track in the vdsm code, there is only a
> > >>>>>> call to lvs and not to lvscan or lvchange, so host2's LVM state
> > >>>>>> doesn't fully refresh.
> > > 
> > > lvs should see any change on the shared storage.
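> > > 
> > > If an lv shows up in lvs output but is not active on a host, it can be
> > > activated manually, for example (vg and lv names are placeholders):
> > > 
> > >     lvchange -ay <vg_name>/<lv_name>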
> > > 
> > >>>>>> The only workaround so far has been to restart VDSM on host2, which
> > >>>>>> makes it refresh all LVM data properly.
> > > 
> > > When vdsm starts, it calls multipath -r, which ensures that we see all
> > > physical volumes.
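> > > 
> > > If you want to reproduce that refresh by hand, a rough sketch (not
> > > vdsm's exact sequence) is:
> > > 
> > >     multipath -r
> > >     lvs -o vg_name,lv_name,tags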
> > > 
> > >>>>>>
> > >>>>>> When is host2 supposed to pick up any newly created LVs in the SD
> > >>>>>> VG?
> > >>>>>> Any suggestions where the problem might be?
> > >>>>>
> > >>>>> When you create a new lv on the shared storage, the new lv should be
> > >>>>> visible on the other host. Let's start by verifying that you do see
> > >>>>> the new lv after a disk was created.
> > >>>>>
> > >>>>> Try this:
> > >>>>>
> > >>>>> 1. Create a new disk, and check the disk uuid in the engine ui
> > >>>>> 2. On another machine, run this command:
> > >>>>>
> > >>>>> lvs -o vg_name,lv_name,tags
> > >>>>>
> > >>>>> You can identify the new lv using its tags, which should contain the
> > >>>>> new disk uuid.
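> > >>>>>
> > >>>>> For example, to find the lv for a given disk (the uuid below is a
> > >>>>> placeholder for the disk uuid shown in the engine ui):
> > >>>>>
> > >>>>>     lvs -o vg_name,lv_name,tags | grep <disk-uuid>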
> > >>>>>
> > >>>>> If you don't see the new lv from the other host, please provide
> > >>>>> /var/log/messages and /var/log/sanlock.log.
> > >>>>
> > >>>> Just tried that. The disk is not visible on the non-SPM node.
> > >>>
> > >>> This means that storage is not accessible from this host.
> > >>
> > >> Generally, the storage seems to be accessible. For example, if I
> > >> restart vdsmd, all volumes get picked up correctly (they become visible
> > >> in lvs output and VMs can be started with them).
> > > 
> > > Let's repeat this test, but now, if you do not see the new lv, please
> > > run:
> > > 
> > >     multipath -r
> > > 
> > > And report the results.
> > > 
> > 
> > Running multipath -r helped and the disk was properly picked up by the
> > second host.
> > 
> > Is running multipath -r safe while the host is not in maintenance mode?
> 
> It should be safe; vdsm itself uses it in some cases.
> 
> > If yes, as a temporary workaround I can patch vdsmd to run multipath -r
> > when e.g. monitoring the storage domain.
> 
> I suggested running multipath as a debugging aid; normally this is not needed.
> 
> You should see the new lv on the shared storage without running multipath.
> 
> Zdenek, can you explain this?
> 
> > >> One warning that I keep seeing in vdsm logs on both nodes is this:
> > >>
> > >> Thread-1617881::WARNING::2014-02-24
> > >> 16:57:50,627::sp::1553::Storage.StoragePool::(getInfo) VG
> > >> 3307f6fa-dd58-43db-ab23-b1fb299006c7's metadata size exceeded
> > >>  critical size: mdasize=134217728 mdafree=0
> > > 
> > > Can you share the output of the command below?
> > > 
> > >     lvs -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name
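> > > 
> > > For reference, mdasize=134217728 in the warning is a 128 MiB metadata
> > > area, and mdafree=0 means lvm is reporting no free space there. A more
> > > focused check could be (using the vg name from the warning):
> > > 
> > >     vgs -o vg_name,vg_mda_size,vg_mda_free 3307f6fa-dd58-43db-ab23-b1fb299006c7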
> > 
> > Here's the output for both hosts.
> > 
> > host1:
> > [root@host1 ~]# lvs -o uuid,name,attr,size,vg_free,vg_extent_size,vg_extent_count,vg_free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count
> >   LV UUID                                 LV                                    Attr       LSize  VFree    Ext      #Ext  Free  LV Tags                                                                                VMdaSize  VMdaFree  #LV  #PV
> >   jGEpVm-oPW8-XyxI-l2yi-YF4X-qteQ-dm8SqL  3d362bf2-20f4-438d-9ba9-486bd2e8cedf  -wi-ao---  2.00g  114.62g  128.00m  1596  917   IU_0227da98-34b2-4b0c-b083-d42e7b760036,MD_5,PU_f4231952-76c5-4764-9c8b-ac73492ac465  128.00m   0         13   2
> 
> This looks wrong - your vg_mda_free is zero - as vdsm complains.
> 
> Zdenek, how can we debug this further?

I see the same issue on Fedora 19.

Can you share with us the output of:

cat /etc/redhat-release
uname -a
lvm version

Nir


