[ovirt-users] VMs becoming non-responsive sporadically

nicolas at devels.es nicolas at devels.es
Sun May 1 12:31:07 UTC 2016


On 2016-04-30 23:22, Nir Soffer wrote:
> On Sun, May 1, 2016 at 12:48 AM,  <nicolas at devels.es> wrote:
>> On 2016-04-30 22:37, Nir Soffer wrote:
>>> 
>>> On Sat, Apr 30, 2016 at 10:28 PM, Nir Soffer <nsoffer at redhat.com> 
>>> wrote:
>>>> 
>>>> On Sat, Apr 30, 2016 at 7:16 PM,  <nicolas at devels.es> wrote:
>>>>> 
>>>>>> On 2016-04-30 16:55, Nir Soffer wrote:
>>>>>> 
>>>>>> 
>>>>>> On Sat, Apr 30, 2016 at 11:33 AM, Nicolás <nicolas at devels.es> 
>>>>>> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> Hi Nir,
>>>>>>> 
>>>>>>>> On 29/04/16 at 22:34, Nir Soffer wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Fri, Apr 29, 2016 at 9:17 PM,  <nicolas at devels.es> wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> We're running oVirt 3.6.5.3-1 and lately we're experiencing some
>>>>>>>>> issues with some VMs being paused because they're marked as
>>>>>>>>> non-responsive. Mostly they recover after a few seconds, but we want
>>>>>>>>> to debug this problem precisely so we can fix it for good.
>>>>>>>>> 
>>>>>>>>> Our scenario is the following:
>>>>>>>>> 
>>>>>>>>> ~495 VMs, of which ~120 are constantly up
>>>>>>>>> 3 datastores, all of them iSCSI-based:
>>>>>>>>>    * ds1: 2T, currently has 276 disks
>>>>>>>>>    * ds2: 2T, currently has 179 disks
>>>>>>>>>    * ds3: 500G, currently has 65 disks
>>>>>>>>> 7 hosts: all have mostly the same hardware. CPU and memory are
>>>>>>>>> currently very lightly used (< 10%).
>>>>>>>>> 
>>>>>>>>>    ds1 and ds2 are physically the same backend, which exports two
>>>>>>>>> 2TB volumes. ds3 is a different storage backend to which we're
>>>>>>>>> currently migrating some disks from ds1 and ds2.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> What is the storage backend behind ds1 and ds2?
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> The storage backend for ds1 and ds2 is the iSCSI-based HP LeftHand
>>>>>>> P4000 G2.
>>>>>>> 
>>>>>>>>> Usually, when VMs become unresponsive, the whole host where they run
>>>>>>>>> gets unresponsive too, which gives a hint about the problem: my bet
>>>>>>>>> is that the culprit is somewhere on the host side and not on the VM
>>>>>>>>> side.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Probably the VM became unresponsive because the connection to the
>>>>>>>> host was lost.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> I forgot to mention that, less commonly, we have situations where the
>>>>>>> host doesn't become unresponsive but the VMs on it do, and they never
>>>>>>> become responsive again, so we have to forcibly power them off and
>>>>>>> start them on a different host. But in this case the connection with
>>>>>>> the host never gets lost (so basically the host is Up, but any VM
>>>>>>> running on it is unresponsive).
>>>>>>> 
>>>>>>> 
>>>>>>>>> When that happens, the host itself becomes non-responsive and is
>>>>>>>>> only recoverable after a reboot, since it's unable to reconnect.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Piotr, can you check the engine log and explain why the host is not
>>>>>>>> reconnected?
>>>>>>>> 
>>>>>>>>> I must say this is not specific to this oVirt version; the same
>>>>>>>>> happened when we were using v3.6.4. It's also worth mentioning that
>>>>>>>>> we haven't made any configuration changes and everything had been
>>>>>>>>> working quite well for a long time.
>>>>>>>>> 
>>>>>>>>> We were monitoring the performance of the physical backend behind
>>>>>>>>> ds1 and ds2, and we suspect we've run out of IOPS, since we're
>>>>>>>>> reaching the maximum specified by the manufacturer; probably at
>>>>>>>>> certain times the host cannot perform a storage operation within
>>>>>>>>> some time limit and it marks VMs as unresponsive. That's why we've
>>>>>>>>> set up ds3 and we're migrating disks from ds1 and ds2 to ds3. When
>>>>>>>>> we run out of space on ds3 we'll create more small volumes to keep
>>>>>>>>> migrating.
>>>>>>>>> 
>>>>>>>>> On the host side, when this happens, we've run repoplot on the vdsm
>>>>>>>>> log and I'm attaching the result. Clearly there's a *huge* LVM
>>>>>>>>> response time (~30 secs.).
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Indeed the log shows very slow vgck and vgs commands - these are
>>>>>>>> called every 5 minutes for checking the vg health and refreshing the
>>>>>>>> vdsm lvm cache.
>>>>>>>> 
>>>>>>>> 1. starting vgck
>>>>>>>> 
>>>>>>>> Thread-96::DEBUG::2016-04-29 13:17:48,682::lvm::290::Storage.Misc.excCmd::(cmd)
>>>>>>>> /usr/bin/taskset --cpu-list 0-23 /usr/bin/sudo -n /usr/sbin/lvm vgck
>>>>>>>> --config ' devices { preferred_names = ["^/dev/mapper/"]
>>>>>>>> ignore_suspended_devices=1 write_cache_state=0
>>>>>>>> disable_after_error_count=3 filter = [
>>>>>>>> '\''a|/dev/mapper/36000eb3a4f1acbc20000000000000043|'\'', '\''r|.*|'\'' ] }
>>>>>>>> global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1
>>>>>>>> use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } '
>>>>>>>> 5de4a000-a9c4-489c-8eee-10368647c413 (cwd None)
>>>>>>>> 
>>>>>>>> 2. vgck ends after 55 seconds
>>>>>>>> 
>>>>>>>> Thread-96::DEBUG::2016-04-29
>>>>>>>> 13:18:43,173::lvm::290::Storage.Misc.excCmd::(cmd) SUCCESS: 
>>>>>>>> <err> = '
>>>>>>>> WARNING: lvmetad is running but disabled. Restart lvmetad before
>>>>>>>> enabling it!\n'; <rc> = 0
>>>>>>>> 
>>>>>>>> 3. starting vgs
>>>>>>>> 
>>>>>>>> Thread-96::DEBUG::2016-04-29 13:17:11,963::lvm::290::Storage.Misc.excCmd::(cmd)
>>>>>>>> /usr/bin/taskset --cpu-list 0-23 /usr/bin/sudo -n /usr/sbin/lvm vgs
>>>>>>>> --config ' devices { preferred_names = ["^/dev/mapper/"]
>>>>>>>> ignore_suspended_devices=1 write_cache_state=0
>>>>>>>> disable_after_error_count=3 filter = [
>>>>>>>> '\''a|/dev/mapper/36000eb3a4f1acbc20000000000000043|/dev/mapper/36000eb3a4f1acbc200000000000000b9|/dev/mapper/360014056f0dc8930d744f83af8ddc709|/dev/mapper/WDC_WD5003ABYZ-011FA0_WD-WMAYP0J73DU6|'\'',
>>>>>>>> '\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1
>>>>>>>> wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50
>>>>>>>> retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|'
>>>>>>>> --ignoreskippedcluster -o
>>>>>>>> uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name
>>>>>>>> 5de4a000-a9c4-489c-8eee-10368647c413 (cwd None)
>>>>>>>> 
>>>>>>>> 4. vgs finished after 37 seconds
>>>>>>>> 
>>>>>>>> Thread-96::DEBUG::2016-04-29
>>>>>>>> 13:17:48,680::lvm::290::Storage.Misc.excCmd::(cmd) SUCCESS: 
>>>>>>>> <err> = '
>>>>>>>> WARNING: lvmetad is running but disabled. Restart lvmetad before
>>>>>>>> enabling it!\n'; <rc> = 0
>>>>>>>> 
>>>>>>>> Zdenek, how do you suggest debugging these slow lvm commands?
>>>>>>>> 
>>>>>>>> Can you run the following commands on a host in trouble, and on 
>>>>>>>> some
>>>>>>>> other
>>>>>>>> hosts in the same timeframe?
>>>>>>>> 
>>>>>>>> time vgck -vvvv --config ' devices { filter =
>>>>>>>> ['\''a|/dev/mapper/36000eb3a4f1acbc20000000000000043|'\'',
>>>>>>>> '\''r|.*|'\'' ] }  global {  locking_type=1  
>>>>>>>> prioritise_write_locks=1
>>>>>>>> wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50
>>>>>>>> retain_days = 0 } ' 5de4a000-a9c4-489c-8eee-10368647c413
>>>>>>>> 
>>>>>>>> time vgs -vvvv --config ' global { locking_type=1
>>>>>>>> prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  
>>>>>>>> backup {
>>>>>>>> retain_min = 50  retain_days = 0 } '
>>>>>>>> 5de4a000-a9c4-489c-8eee-10368647c413
>>>>>>>> 
>>>>>>>> Note that I added -vvvv to both commands; this will generate a huge
>>>>>>>> amount of debugging info. Please share the output of these commands.
>>>>>>>> 
>>>>>>>> You may need to fix the commands. You can always copy and paste them
>>>>>>>> directly from the vdsm log into the shell and add the -vvvv flag.
>>>>>>>> 
>>>>>>> 
>>>>>>> Indeed, there seems to be a big difference on hosts 5 and 6. I'm
>>>>>>> attaching the results of running both commands on all hosts. Both
>>>>>>> commands produce a much bigger output on hosts 5 and 6, and also take
>>>>>>> much longer to run. Times are also attached in a file called TIMES.
>>>>>>> 
>>>>>>>>> Our host storage network is correctly configured and on a 1G
>>>>>>>>> interface; there are no errors on the host itself, switches, etc.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 1G serving about 20-70 vms per host (495 vms, 120 always up, 7
>>>>>>>> hosts)?
>>>>>>>> 
>>>>>>>> Do you have separate networks for management and storage, or do both
>>>>>>>> use this 1G interface?
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Yes, we have separate networks for management, storage and 
>>>>>>> motion.
>>>>>>> Storage
>>>>>>> and motion have 1G each (plus, for storage we use a bond of 2
>>>>>>> interfaces
>>>>>>> in
>>>>>>> ALB-mode (6)). Currently no host has more than 30 VMs at a time.
>>>>>>> 
>>>>>>>>> We've also limited storage in QoS to use 10MB/s and 40 IOPS,
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> How did you configure this limit?
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> In the Data Center tab, I chose our DC and a QoS sub-tab appears, just
>>>>>>> as described here:
>>>>>>> http://www.ovirt.org/documentation/sla/network-qos/
>>>>>>> 
>>>>>>>> 
>>>>>>>>> but this issue still happens, which leads me to wonder whether this
>>>>>>>>> is not just an IOPS issue; each host handles about 600 LVs. Could
>>>>>>>>> this be an issue too? I should point out that the LVM response times
>>>>>>>>> are low under normal conditions (~1-2 seconds).
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> We recommend limiting the number of lvs per vg to 350. If you have
>>>>>>>> 276 disks on ds1, and the disks are using snapshots, you may have too
>>>>>>>> many lvs, which can cause slowdowns in lvm operations.
>>>>>>>> 
>>>>>>>> Can you share the output of:
>>>>>>>> 
>>>>>>>> vdsm-tool dump-volume-chains 
>>>>>>>> 5de4a000-a9c4-489c-8eee-10368647c413
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Ok, right now no VG has more than 350 LVs, so I guess this is not
>>>>>>> currently the problem. Unfortunately, I ran the vdsm-tool command but
>>>>>>> it didn't finish nor provide any output in about 1 hour, so I guess it
>>>>>>> was hung and I stopped it. If you confirm this is a normal execution
>>>>>>> time I can leave it running for however long it takes.
>>>>>>> 
>>>>>>> host5:~# vgs
>>>>>>>   VG                                   #PV #LV #SN Attr   VSize 
>>>>>>> VFree
>>>>>>>   0927c050-6fb6-463c-bb37-8b8da641dcd3   1  63   0 wz--n- 499,62g
>>>>>>> 206,50g
>>>>>>>   5de4a000-a9c4-489c-8eee-10368647c413   1 335   0 wz--n-   2,00t
>>>>>>> 518,62g
>>>>>>>   b13b9eac-1f2e-4a7e-bcd9-49f5f855c3d8   1 215   0 wz--n-   2,00t
>>>>>>> 495,25g
>>>>>>>   sys_vg                                 1   6   0 wz--n- 136,44g
>>>>>>> 122,94g
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> I'm attaching the vdsm.log, engine.log and repoplot PDF;
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> This is very useful, thanks. Can you also send the vdsm logs and
>>>>>>>> repoplots from the other hosts for the same timeframe?
>>>>>>>> 
>>>>>>>>> if someone could give a hint about any additional problems in them
>>>>>>>>> and shed some light on the above thoughts, I'd be very grateful.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Do you have sar configured on the host? Having sar logs can reveal
>>>>>>>> more info about this timeframe.
>>>>>>>> 
>>>>>>>> Do you have information about the amount of io from the vms during
>>>>>>>> this timeframe? The amount of io on the storage backend during this
>>>>>>>> timeframe?
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Not currently, but I'll watch out for the next time this happens,
>>>>>>> check some sar commands related to I/O, and provide feedback.
>>>>>>> Nevertheless, at the time I ran the commands above no machine was
>>>>>>> unresponsive and I was still getting such huge execution times. I
>>>>>>> tried running iotop now and I see there are 2 vgck and vgs processes
>>>>>>> with a read rate of ~500 KB/s each.
>>>>>>> 
>>>>>>> root     24446  1.3  0.0  57008  6824 ?        D<   09:15   0:00
>>>>>>> /usr/sbin/lvm vgs --config  devices { preferred_names = ["^/dev/mapper/"]
>>>>>>> ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3
>>>>>>> filter = [
>>>>>>> 'a|/dev/mapper/36000eb3a4f1acbc20000000000000043|/dev/mapper/36000eb3a4f1acbc200000000000000b9|/dev/mapper/360014056f0dc8930d744f83af8ddc709|',
>>>>>>> 'r|.*|' ] }  global {  locking_type=1  prioritise_write_locks=1
>>>>>>> wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50
>>>>>>> retain_days = 0 }  --noheadings --units b --nosuffix --separator |
>>>>>>> --ignoreskippedcluster -o
>>>>>>> uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name
>>>>>>> b13b9eac-1f2e-4a7e-bcd9-49f5f855c3d
>>>>>>> 
>>>>>>> I'm also attaching the iostat output for host 5 in a file called
>>>>>>> iostat-host5.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Interesting, all hosts are checking the same storage, but on host5 and
>>>>>> host6 the output of vgs and vgck is 10 times bigger - this explains why
>>>>>> they take about 10 times longer to run.
>>>>>> 
>>>>>> $ ls -lh vgs-*
>>>>>> -rw-r--r--. 1 nsoffer nsoffer 221K Apr 30 08:55 vgs-host1
>>>>>> -rw-r--r--. 1 nsoffer nsoffer 239K Apr 30 08:55 vgs-host2
>>>>>> -rw-r--r--. 1 nsoffer nsoffer 228K Apr 30 08:55 vgs-host3
>>>>>> -rw-r--r--. 1 nsoffer nsoffer 288K Apr 30 08:55 vgs-host4
>>>>>> -rw-r--r--. 1 nsoffer nsoffer 2.2M Apr 30 08:55 vgs-host5
>>>>>> -rw-r--r--. 1 nsoffer nsoffer 2.2M Apr 30 08:55 vgs-host6
>>>>>> -rw-r--r--. 1 nsoffer nsoffer 232K Apr 30 08:55 vgs-host7
>>>>>> $ ls -lh vgck-*
>>>>>> -rw-r--r--. 1 nsoffer nsoffer 238K Apr 30 08:55 vgck-host1
>>>>>> -rw-r--r--. 1 nsoffer nsoffer 254K Apr 30 08:55 vgck-host2
>>>>>> -rw-r--r--. 1 nsoffer nsoffer 244K Apr 30 08:55 vgck-host3
>>>>>> -rw-r--r--. 1 nsoffer nsoffer 298K Apr 30 08:55 vgck-host4
>>>>>> -rw-r--r--. 1 nsoffer nsoffer 2.0M Apr 30 08:55 vgck-host5
>>>>>> -rw-r--r--. 1 nsoffer nsoffer 2.0M Apr 30 08:55 vgck-host6
>>>>>> -rw-r--r--. 1 nsoffer nsoffer 248K Apr 30 08:55 vgck-host7
>>>>>> 
>>>>>> Can you collect the output of "dmsetup table -v" on hosts 1, 5, and 6?
>>>>>> 
>>>>>> Nir
>>>>> 
>>>>> 
>>>>> 
>>>>> Sure, please find attached the results of the command. Once again, for
>>>>> hosts 5 and 6 the size is 10 times bigger than for host 1.
>>>> 
>>>> 
>>>> I think the issue causing the slow vgs and vgck commands is stale
>>>> lvs on hosts 5 and 6. This may be the root cause of the paused vms, but
>>>> I cannot explain how it is related yet.
>>>> 
>>>> Comparing vgck on host1 and host5, we can see that on host1 vgck opens
>>>> 213 /dev/mapper/* devices - actually 53 unique devices, most of them
>>>> opened 4 times. On host5, vgck opens 2489 devices (622 unique).
>>>> This explains why the operation takes about 10 times longer.
>>>> 
>>>> Checking the dmsetup table output, we can see that host1 has 53 devices
>>>> and host5 has 622 devices.
>>>> 
>>>> Checking the device open count, host1 has 15 stale devices
>>>> (Open count: 0), but host5 has 597 stale devices.
>>>> 
>>>> Leaving stale devices behind is a known issue, but we never had evidence
>>>> that it causes trouble other than warnings in lvm commands.
>>>> 
>>>> Please open an ovirt bug about this issue, and include all the files you
>>>> have sent so far in this thread, as well as all the vdsm logs from host5.
>>>> 
>>>> To remove the stale devices, you can do:
>>>> 
>>>> # remove every device-mapper entry whose open count is 0
>>>> for name in `dmsetup info -c -o open,name | awk '/ 0 / {print $2}'`; do
>>>>     echo "removing stale device: $name"
>>>>     dmsetup remove "$name"
>>>> done
>>> 
>>> 
>>> Note that this cleanup is safe only in maintenance mode, since it will
>>> also remove the mappings for the vdsm special lvs (ids, leases, inbox,
>>> outbox, master, metadata) that are accessed by vdsm regularly and are not
>>> stale.
>>> 
>>> This is also not safe if you use hosted engine; the hosted engine agent
>>> should maintain its vg.
>>> 
>>> The safest way to remove stale devices is to restart vdsm - it
>>> deactivates all unused lvs (except the special lvs) during startup.
>>> 
>> 
>> Hm, I ran it without even putting the host into maintenance, but I will
>> next time. Fortunately it seems it didn't do any harm and everything is
>> working. I'll probably write a script that puts the host into maintenance,
>> cleans up or restarts vdsm, and activates it again (this is not a hosted
>> engine) if the number of stale devices is bigger than a threshold.
> 
> I would not do anything automatic at this point except monitoring this 
> issue.
> 
> For cleanup - you have 2 options:
> 
> - host is up - restart vdsm
> - host in maintenance - remove stale devices
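
(For the record, for the "host is up" option I'm simply doing the following
on our CentOS 7.2 hosts - assuming the standard service name:

    systemctl restart vdsmd

and for the maintenance case, the dmsetup remove loop from your earlier
mail.)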
> 
> Based on local testing, I think the issue is this:
> 
> - When connecting to storage, all lvs become active - this may be a change
>   in lvm during el 7 development that we did not notice.
> 
> - Each time vdsm is restarted, all active lvs that should not be active
>   are deactivated during vdsm startup.
> 
> - When deactivating a storage domain, vdsm does not deactivate the lvs,
>   leaving stale devices.
> 
> Can you confirm that it works like this on your system?
> 
> To test this:
> 

I did these tests on host5. Before switching the host to maintenance, I
ran the vgck command and the execution time was 0m1.347s. The stale
devices at that time were:

5de4a000--a9c4--489c--8eee--10368647c413-master
5de4a000--a9c4--489c--8eee--10368647c413-outbox
0927c050--6fb6--463c--bb37--8b8da641dcd3-metadata
5de4a000--a9c4--489c--8eee--10368647c413-ids
0927c050--6fb6--463c--bb37--8b8da641dcd3-leases
5de4a000--a9c4--489c--8eee--10368647c413-inbox
b13b9eac--1f2e--4a7e--bcd9--49f5f855c3d8-leases
b13b9eac--1f2e--4a7e--bcd9--49f5f855c3d8-metadata
b13b9eac--1f2e--4a7e--bcd9--49f5f855c3d8-ids
b13b9eac--1f2e--4a7e--bcd9--49f5f855c3d8-inbox
5de4a000--a9c4--489c--8eee--10368647c413-leases
0927c050--6fb6--463c--bb37--8b8da641dcd3-master
5de4a000--a9c4--489c--8eee--10368647c413-metadata
b13b9eac--1f2e--4a7e--bcd9--49f5f855c3d8-master
0927c050--6fb6--463c--bb37--8b8da641dcd3-outbox
0927c050--6fb6--463c--bb37--8b8da641dcd3-inbox
b13b9eac--1f2e--4a7e--bcd9--49f5f855c3d8-outbox
0927c050--6fb6--463c--bb37--8b8da641dcd3-ids

(it seems there haven't been any new stale devices since running the loop
yesterday).
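
For the record, this is how I'm counting the stale devices now - it's just
the pipeline from your cleanup loop with a count at the end, nothing is
removed:

    # count device-mapper entries whose open count is 0
    dmsetup info -c -o open,name | awk '/ 0 / {print $2}' | wc -l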

> 1. Put one host into maintenance
> 
> You should see some stale devices - the special lvs
> (ids, leases, master, metadata, inbox, outbox)
> 
> And probably 2 OVF_STORE lvs per vg (128m lvs used to store
> vm ovf)
> 

I still had the same devices as above.

> 2. Cleanup the stale devices
> 

After cleaning up the stale devices, these remained:

36000eb3a4f1acbc20000000000000043
360014056f0dc8930d744f83af8ddc709
36000eb3a4f1acbc200000000000000b9
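
Those three match the LUN WWIDs from the lvm filter above, so I take it they
are the multipath maps for the LUNs themselves and are supposed to stay. The
way I double-checked that (just a sketch, not something from this thread) was
to look at the device-mapper target types, since the LUN maps use the
"multipath" target while the lvs show up as "linear":

    # print "<target type> <device name>" for every device-mapper entry
    dmsetup table | awk '{print $4, $1}' | sort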

> 3. Activate host
> 
> You should again see about 600 stale devices, and lvm commands
> will probably run much slower, as you reported.
> 

Current stale devices were these:

5de4a000--a9c4--489c--8eee--10368647c413-master
5de4a000--a9c4--489c--8eee--10368647c413-outbox
0927c050--6fb6--463c--bb37--8b8da641dcd3-metadata
0927c050--6fb6--463c--bb37--8b8da641dcd3-leases
5de4a000--a9c4--489c--8eee--10368647c413-inbox
b13b9eac--1f2e--4a7e--bcd9--49f5f855c3d8-leases
b13b9eac--1f2e--4a7e--bcd9--49f5f855c3d8-metadata
b13b9eac--1f2e--4a7e--bcd9--49f5f855c3d8-inbox
5de4a000--a9c4--489c--8eee--10368647c413-leases
0927c050--6fb6--463c--bb37--8b8da641dcd3-master
5de4a000--a9c4--489c--8eee--10368647c413-metadata
b13b9eac--1f2e--4a7e--bcd9--49f5f855c3d8-master
0927c050--6fb6--463c--bb37--8b8da641dcd3-outbox
b13b9eac--1f2e--4a7e--bcd9--49f5f855c3d8-outbox
0927c050--6fb6--463c--bb37--8b8da641dcd3-inbox

> 4. Restart vdsm
> 
> All the regular lvs should be inactive now; no stale devices
> should exist, except the special lvs (expected).
> 

Same result as above. The strange thing is that now, when running the 'vgs'
command, I don't see any VGs other than the system's:

host5:~# vgs
   VG     #PV #LV #SN Attr   VSize   VFree
   sys_vg   1   6   0 wz--n- 136,44g 122,94g

However, the machines running on that host have no issues and storage is
working well. vgscan doesn't do the trick either. I tested this on host6 too
and the same thing happens: after reactivating the host I cannot see the VGs
with the vgs command - however, everything seems to work quite well.

This is CentOS Linux release 7.2.1511 FWIW.
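
PS: Following your advice to only monitor this for now, I'll probably hook
something like the following into our monitoring (a rough sketch - the
threshold and the alert command are placeholders, not anything from this
thread):

    #!/bin/bash
    # warn when the number of device-mapper entries with open count 0
    # grows beyond a threshold (no automatic cleanup)
    THRESHOLD=50   # arbitrary placeholder value
    stale=$(dmsetup info -c -o open,name | awk '/ 0 / {print $2}' | wc -l)
    if [ "$stale" -gt "$THRESHOLD" ]; then
        echo "WARNING: $stale possibly stale dm devices on $(hostname)"
    fi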

> Nir



