
----- Original Message -----
From: "Boyan Tabakov" <blade@alslayer.net> To: "Nir Soffer" <nsoffer@redhat.com> Cc: users@ovirt.org Sent: Tuesday, March 4, 2014 3:53:24 PM Subject: Re: [Users] SD Disk's Logical Volume not visible/activated on some nodes
On Tue Mar 4 14:46:33 2014, Nir Soffer wrote:
----- Original Message -----
From: "Nir Soffer" <nsoffer@redhat.com> To: "Boyan Tabakov" <blade@alslayer.net> Cc: users@ovirt.org, "Zdenek Kabelac" <zkabelac@redhat.com> Sent: Monday, March 3, 2014 9:39:47 PM Subject: Re: [Users] SD Disk's Logical Volume not visible/activated on some nodes
Hi Zdenek, can you look into this strange incident?
When a user creates a disk on one host (creating a new lv), the lv is not seen on another host in the cluster.
Calling multipath -r causes the new lv to appear on the other host.
Finally, lvs tells us that vg_mda_free is zero - maybe unrelated, but unusual.
----- Original Message -----
From: "Boyan Tabakov" <blade@alslayer.net> To: "Nir Soffer" <nsoffer@redhat.com> Cc: users@ovirt.org Sent: Monday, March 3, 2014 9:51:05 AM Subject: Re: [Users] SD Disk's Logical Volume not visible/activated on some nodes
>>>> Consequently, when creating/booting a VM with the said disk attached,
>>>> the VM fails to start on host2, because host2 can't see the LV.
>>>> Similarly, if the VM is started on host1, it fails to migrate to
>>>> host2. Extract from host2 log is in the end. The LV in question is
>>>> 6b35673e-7062-4716-a6c8-d5bf72fe3280.
>>>>
>>>> As far as I could track quickly in the vdsm code, there is only a call
>>>> to lvs and not to lvscan or lvchange, so the host2 LVM doesn't fully
>>>> refresh.
lvs should see any change on the shared storage.
>>>> The only workaround so far has been to restart VDSM on host2, which
>>>> makes it refresh all LVM data properly.
When vdsm starts, it calls multipath -r, which ensures that we see all physical volumes.
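(As a quick sanity check, independent of vdsm, you can verify that the host sees all multipath devices and physical volumes with something like the following; the exact output is of course site-specific:

    multipath -ll
    pvs -o pv_name,vg_name,pv_size

Both commands only report state and do not change anything.)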
>>>>
>>>> When is host2 supposed to pick up any newly created LVs in the SD VG?
>>>> Any suggestions where the problem might be?
>>>
>>> When you create a new lv on the shared storage, the new lv should be
>>> visible on the other host. Let's start by verifying that you do see
>>> the new lv after a disk was created.
>>>
>>> Try this:
>>>
>>> 1. Create a new disk, and check the disk uuid in the engine ui
>>> 2. On another machine, run this command:
>>>
>>>     lvs -o vg_name,lv_name,tags
>>>
>>> You can identify the new lv using tags, which should contain the new
>>> disk uuid.
>>>
>>> If you don't see the new lv from the other host, please provide
>>> /var/log/messages and /var/log/sanlock.log.
>>
>> Just tried that. The disk is not visible on the non-SPM node.
>
> This means that storage is not accessible from this host.
Generally, the storage seems accessible. For example, if I restart vdsmd, all volumes get picked up correctly (they become visible in the lvs output and VMs can be started with them).
Let's repeat this test, but now, if you do not see the new lv, please run:
multipath -r
And report the results.
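For example, roughly (with <DISK_UUID> standing in for the uuid shown in the engine ui, as in step 1 above):

    # on the non-SPM host, check whether the new lv is visible
    lvs -o vg_name,lv_name,tags | grep <DISK_UUID>

    # if it is not, refresh the multipath maps and check again
    multipath -r
    lvs -o vg_name,lv_name,tags | grep <DISK_UUID>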
Running multipath -r helped and the disk was properly picked up by the second host.
Is running multipath -r safe while host is not in maintenance mode?
It should be safe; vdsm uses it in some cases.
If yes, as a temporary workaround I can patch vdsmd to run multipath -r when e.g. monitoring the storage domain.
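(Just to sketch the idea - this would not be the actual vdsm patch, only a crude external stand-in until there is a proper fix: a cron entry that refreshes the multipath maps periodically, e.g.

    # /etc/cron.d/multipath-refresh - illustrative only; path and interval are guesses
    */5 * * * * root /sbin/multipath -r

but I'd rather do it inside vdsm's storage domain monitoring if that's acceptable.)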
I suggested running multipath as a debugging aid; normally this is not needed.
You should see the lv on the shared storage without running multipath.
Zdenek, can you explain this?
One warning that I keep seeing in vdsm logs on both nodes is this:
Thread-1617881::WARNING::2014-02-24 16:57:50,627::sp::1553::Storage.StoragePool::(getInfo) VG 3307f6fa-dd58-43db-ab23-b1fb299006c7's metadata size exceeded critical size: mdasize=134217728 mdafree=0
Can you share the output of the command below?
lvs -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name
Here's the output for both hosts.
host1:

[root@host1 ~]# lvs -o uuid,name,attr,size,vg_free,vg_extent_size,vg_extent_count,vg_free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count
  LV UUID                                LV                                   Attr      LSize VFree   Ext     #Ext Free LV Tags                                                                             VMdaSize VMdaFree #LV #PV
  jGEpVm-oPW8-XyxI-l2yi-YF4X-qteQ-dm8SqL 3d362bf2-20f4-438d-9ba9-486bd2e8cedf -wi-ao--- 2.00g 114.62g 128.00m 1596  917 IU_0227da98-34b2-4b0c-b083-d42e7b760036,MD_5,PU_f4231952-76c5-4764-9c8b-ac73492ac465 128.00m  0        13   2
This looks wrong - your vg_mda_free is zero - as vdsm complains.
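(Off the top of my head, the per-PV metadata areas can be inspected with something like

    pvs -o pv_name,vg_name,pv_mda_count,pv_mda_size,pv_mda_free

which may help show where the metadata space went; I'm quoting the field names from memory, so check the pvs man page if they don't match.)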
Patch http://gerrit.ovirt.org/25408 should solve this issue. It may also solve the other issue with the missing lv - I could not reproduce it yet.

Can you try to apply this patch and report the results?

Thanks,
Nir