[ovirt-users] VMs becoming non-responsive sporadically

Sun May 1 14:04:48 UTC 2016

El 2016-05-01 14:01, Nir Soffer escribió:
> On Sun, May 1, 2016 at 3:31 PM,  <nicolas at devels.es> wrote:
>> El 2016-04-30 23:22, Nir Soffer escribió:
>>> 
>>> On Sun, May 1, 2016 at 12:48 AM,  <nicolas at devels.es> wrote:
>>>> 
>>>> El 2016-04-30 22:37, Nir Soffer escribió:
>>>>> 
>>>>> 
>>>>> On Sat, Apr 30, 2016 at 10:28 PM, Nir Soffer <nsoffer at redhat.com> 
>>>>> wrote:
>>>>>> 
>>>>>> 
>>>>>> On Sat, Apr 30, 2016 at 7:16 PM,  <nicolas at devels.es> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> El 2016-04-30 16:55, Nir Soffer escribió:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Sat, Apr 30, 2016 at 11:33 AM, Nicolás <nicolas at devels.es> 
>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Hi Nir,
>>>>>>>>> 
>>>>>>>>> El 29/04/16 a las 22:34, Nir Soffer escribió:
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Fri, Apr 29, 2016 at 9:17 PM,  <nicolas at devels.es> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Hi,
>>>>>>>>>>> 
>>>>>>>>>>> We're running oVirt 3.6.5.3-1 and lately we're experiencing 
>>>>>>>>>>> some
>>>>>>>>>>> issues
>>>>>>>>>>> with
>>>>>>>>>>> some VMs being paused because they're marked as 
>>>>>>>>>>> non-responsive.
>>>>>>>>>>> Mostly,
>>>>>>>>>>> after a few seconds they recover, but we want to debug 
>>>>>>>>>>> precisely
>>>>>>>>>>> this
>>>>>>>>>>> problem so we can fix it consistently.
>>>>>>>>>>> 
>>>>>>>>>>> Our scenario is the following:
>>>>>>>>>>> 
>>>>>>>>>>> ~495 VMs, of which ~120 are constantly up
>>>>>>>>>>> 3 datastores, all of them iSCSI-based:
>>>>>>>>>>>    * ds1: 2T, currently has 276 disks
>>>>>>>>>>>    * ds2: 2T, currently has 179 disks
>>>>>>>>>>>    * ds3: 500G, currently has 65 disks
>>>>>>>>>>> 7 hosts: All have mostly the same hardware. CPU and memory 
>>>>>>>>>>> are
>>>>>>>>>>> currently
>>>>>>>>>>> very lowly used (< 10%).
>>>>>>>>>>> 
>>>>>>>>>>>    ds1 and ds2 are physically the same backend which exports 
>>>>>>>>>>> two
>>>>>>>>>>> 2TB
>>>>>>>>>>> volumes.
>>>>>>>>>>> ds3 is a different storage backend where we're currently 
>>>>>>>>>>> migrating
>>>>>>>>>>> some
>>>>>>>>>>> disks from ds1 and ds2.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> What the the storage backend behind ds1 and 2?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> The storage backend for ds1 and ds2 is the iSCSI-based HP 
>>>>>>>>> LeftHand
>>>>>>>>> P4000
>>>>>>>>> G2.
>>>>>>>>> 
>>>>>>>>>>> Usually, when VMs become unresponsive, the whole host where 
>>>>>>>>>>> they
>>>>>>>>>>> run
>>>>>>>>>>> gets
>>>>>>>>>>> unresponsive too, so that gives a hint about the problem, my 
>>>>>>>>>>> bet
>>>>>>>>>>> is
>>>>>>>>>>> the
>>>>>>>>>>> culprit is somewhere on the host side and not on the VMs 
>>>>>>>>>>> side.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Probably the vm became unresponsive because connection to the 
>>>>>>>>>> host
>>>>>>>>>> was
>>>>>>>>>> lost.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> I forgot to mention that less commonly we have situations where 
>>>>>>>>> the
>>>>>>>>> host
>>>>>>>>> doesn't get unresponsive but the VMs on it do and they don't 
>>>>>>>>> become
>>>>>>>>> responsive ever again, so we have to forcibly power them off 
>>>>>>>>> and
>>>>>>>>> start
>>>>>>>>> them
>>>>>>>>> on a different host. But in this case the connection with the 
>>>>>>>>> host
>>>>>>>>> doesn't
>>>>>>>>> ever get lost (so basically the host is Up, but any VM run on 
>>>>>>>>> them
>>>>>>>>> is
>>>>>>>>> unresponsive).
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>>> When that
>>>>>>>>>>> happens, the host itself gets non-responsive and only 
>>>>>>>>>>> recoverable
>>>>>>>>>>> after
>>>>>>>>>>> reboot, since it's unable to reconnect.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Piotr, can you check engine log and explain why host is not
>>>>>>>>>> reconnected?
>>>>>>>>>> 
>>>>>>>>>>> I must say this is not specific to
>>>>>>>>>>> this oVirt version, when we were using v.3.6.4 the same 
>>>>>>>>>>> happened,
>>>>>>>>>>> and
>>>>>>>>>>> it's
>>>>>>>>>>> also worthy mentioning we've not done any configuration 
>>>>>>>>>>> changes
>>>>>>>>>>> and
>>>>>>>>>>> everything had been working quite well for a long time.
>>>>>>>>>>> 
>>>>>>>>>>> We were monitoring our ds1 and ds2 physical backend to see
>>>>>>>>>>> performance
>>>>>>>>>>> and
>>>>>>>>>>> we suspect we've run out of IOPS since we're reaching the 
>>>>>>>>>>> maximum
>>>>>>>>>>> specified
>>>>>>>>>>> by the manufacturer, probably at certain times the host 
>>>>>>>>>>> cannot
>>>>>>>>>>> perform
>>>>>>>>>>> a
>>>>>>>>>>> storage operation within some time limit and it marks VMs as
>>>>>>>>>>> unresponsive.
>>>>>>>>>>> That's why we've set up ds3 and we're migrating ds1 and ds2 
>>>>>>>>>>> to
>>>>>>>>>>> ds3.
>>>>>>>>>>> When
>>>>>>>>>>> we
>>>>>>>>>>> run out of space on ds3 we'll create more smaller volumes to 
>>>>>>>>>>> keep
>>>>>>>>>>> migrating.
>>>>>>>>>>> 
>>>>>>>>>>> On the host side, when this happens, we've run repoplot on 
>>>>>>>>>>> the
>>>>>>>>>>> vdsm
>>>>>>>>>>> log
>>>>>>>>>>> and
>>>>>>>>>>> I'm attaching the result. Clearly there's a *huge* LVM 
>>>>>>>>>>> response
>>>>>>>>>>> time
>>>>>>>>>>> (~30
>>>>>>>>>>> secs.).
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Indeed the log show very slow vgck and vgs commands - these 
>>>>>>>>>> are
>>>>>>>>>> called
>>>>>>>>>> every
>>>>>>>>>> 5 minutes for checking the vg health and refreshing vdsm lvm 
>>>>>>>>>> cache.
>>>>>>>>>> 
>>>>>>>>>> 1. starting vgck
>>>>>>>>>> 
>>>>>>>>>> Thread-96::DEBUG::2016-04-29
>>>>>>>>>> 13:17:48,682::lvm::290::Storage.Misc.excCmd::(cmd) 
>>>>>>>>>> /usr/bin/taskset
>>>>>>>>>> --cpu-list 0-23 /usr/bin/sudo -n /usr/sbin/lvm vgck --config '
>>>>>>>>>> devices
>>>>>>>>>> { pre
>>>>>>>>>> ferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1
>>>>>>>>>> write_cache_state=0 disable_after_error_count=3 filter = [
>>>>>>>>>> '\''a|/dev/mapper/36000eb3a4f1acbc20000000000000043|'\
>>>>>>>>>> '', '\''r|.*|'\'' ] }  global {  locking_type=1
>>>>>>>>>> prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  
>>>>>>>>>> backup
>>>>>>>>>> {
>>>>>>>>>> retain_min = 50  retain_days = 0 } ' 5de4a000-a9c4-48
>>>>>>>>>> 9c-8eee-10368647c413 (cwd None)
>>>>>>>>>> 
>>>>>>>>>> 2. vgck ends after 55 seconds
>>>>>>>>>> 
>>>>>>>>>> Thread-96::DEBUG::2016-04-29
>>>>>>>>>> 13:18:43,173::lvm::290::Storage.Misc.excCmd::(cmd) SUCCESS: 
>>>>>>>>>> <err> =
>>>>>>>>>> '
>>>>>>>>>> WARNING: lvmetad is running but disabled. Restart lvmetad 
>>>>>>>>>> before
>>>>>>>>>> enabling it!\n'; <rc> = 0
>>>>>>>>>> 
>>>>>>>>>> 3. starting vgs
>>>>>>>>>> 
>>>>>>>>>> Thread-96::DEBUG::2016-04-29
>>>>>>>>>> 13:17:11,963::lvm::290::Storage.Misc.excCmd::(cmd) 
>>>>>>>>>> /usr/bin/taskset
>>>>>>>>>> --cpu-list 0-23 /usr/bin/sudo -n /usr/sbin/lvm vgs --config '
>>>>>>>>>> devices
>>>>>>>>>> { pref
>>>>>>>>>> erred_names = ["^/dev/mapper/"] ignore_suspended_devices=1
>>>>>>>>>> write_cache_state=0 disable_after_error_count=3 filter = [
>>>>>>>>>> '\''a|/dev/mapper/36000eb3a4f1acbc20000000000000043|/de
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> v/mapper/36000eb3a4f1acbc200000000000000b9|/dev/mapper/360014056f0dc8930d744f83af8ddc709|/dev/mapper/WDC_WD5003ABYZ-011FA0_WD-WMAYP0J73DU6|'\'',
>>>>>>>>>> '\''r|.*|'\'' ] }  global {
>>>>>>>>>>   locking_type=1  prioritise_write_locks=1  wait_for_locks=1
>>>>>>>>>> use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } 
>>>>>>>>>> '
>>>>>>>>>> --noheadings --units b --nosuffix --separator '|
>>>>>>>>>> ' --ignoreskippedcluster -o
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name
>>>>>>>>>> 5de4a000-a9c4-489c-8eee-10368
>>>>>>>>>> 647c413 (cwd None)
>>>>>>>>>> 
>>>>>>>>>> 4. vgs finished after 37 seconds
>>>>>>>>>> 
>>>>>>>>>> Thread-96::DEBUG::2016-04-29
>>>>>>>>>> 13:17:48,680::lvm::290::Storage.Misc.excCmd::(cmd) SUCCESS: 
>>>>>>>>>> <err> =
>>>>>>>>>> '
>>>>>>>>>> WARNING: lvmetad is running but disabled. Restart lvmetad 
>>>>>>>>>> before
>>>>>>>>>> enabling it!\n'; <rc> = 0
>>>>>>>>>> 
>>>>>>>>>> Zdenek, how do you suggest to debug this slow lvm commands?
>>>>>>>>>> 
>>>>>>>>>> Can you run the following commands on a host in trouble, and 
>>>>>>>>>> on
>>>>>>>>>> some
>>>>>>>>>> other
>>>>>>>>>> hosts in the same timeframe?
>>>>>>>>>> 
>>>>>>>>>> time vgck -vvvv --config ' devices { filter =
>>>>>>>>>> ['\''a|/dev/mapper/36000eb3a4f1acbc20000000000000043|'\'',
>>>>>>>>>> '\''r|.*|'\'' ] }  global {  locking_type=1
>>>>>>>>>> prioritise_write_locks=1
>>>>>>>>>> wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50
>>>>>>>>>> retain_days = 0 } ' 5de4a000-a9c4-489c-8eee-10368647c413
>>>>>>>>>> 
>>>>>>>>>> time vgs -vvvv --config ' global { locking_type=1
>>>>>>>>>> prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  
>>>>>>>>>> backup
>>>>>>>>>> {
>>>>>>>>>> retain_min = 50  retain_days = 0 } '
>>>>>>>>>> 5de4a000-a9c4-489c-8eee-10368647c413
>>>>>>>>>> 
>>>>>>>>>> Note that I added -vvvv to both commands, this will generate 
>>>>>>>>>> huge
>>>>>>>>>> amount
>>>>>>>>>> of debugging info. Please share the output of these commands.
>>>>>>>>>> 
>>>>>>>>>> You may need to fix the commands. You can always copy and 
>>>>>>>>>> paste
>>>>>>>>>> directly
>>>>>>>>>> from vdsm log into the shell and add the -vvvv flag.
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Indeed, there seems to be a big difference on hosts 5 and 6. 
>>>>>>>>> I'm
>>>>>>>>> attaching
>>>>>>>>> the results of the execution of both commands on all hosts. 
>>>>>>>>> Both
>>>>>>>>> commands
>>>>>>>>> show a pretty bigger output on hosts 5 and 6, and also a much 
>>>>>>>>> bigger
>>>>>>>>> execution time. Times are also attached in a file called TIMES.
>>>>>>>>> 
>>>>>>>>>>> Our host storage network is correctly configured and on a 1G
>>>>>>>>>>> interface, no errors on the host itself, switches, etc.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 1G serving about 20-70 vms per host? (495 vms, 120 always up, 
>>>>>>>>>> 7
>>>>>>>>>> hosts)?
>>>>>>>>>> 
>>>>>>>>>> Do you have separate network for management and storage, or 
>>>>>>>>>> both
>>>>>>>>>> use this 1G interface?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Yes, we have separate networks for management, storage and 
>>>>>>>>> motion.
>>>>>>>>> Storage
>>>>>>>>> and motion have 1G each (plus, for storage we use a bond of 2
>>>>>>>>> interfaces
>>>>>>>>> in
>>>>>>>>> ALB-mode (6)). Currently no host has more than 30 VMs at a 
>>>>>>>>> time.
>>>>>>>>> 
>>>>>>>>>>> We've also limited storage in QoS to use 10MB/s and 40 IOPS,
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> How did you configure this limit?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> In the Data Center tab, I chose our DC and a QoS sub-tab 
>>>>>>>>> appears,
>>>>>>>>> just
>>>>>>>>> like
>>>>>>>>> described here: 
>>>>>>>>> http://www.ovirt.org/documentation/sla/network-qos/
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> but this issue
>>>>>>>>>>> still happens, which leads me to be concerned whether this is 
>>>>>>>>>>> not
>>>>>>>>>>> just
>>>>>>>>>>> an
>>>>>>>>>>> IOPS issue; each host handles about cca. 600 LVs. Could this 
>>>>>>>>>>> be an
>>>>>>>>>>> issue
>>>>>>>>>>> too? I remark the LVM response times are low in normal 
>>>>>>>>>>> conditions
>>>>>>>>>>> (~1-2
>>>>>>>>>>> seconds).
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> We recommend to limit number lvs per vg to 350. If you have 
>>>>>>>>>> 276
>>>>>>>>>> disks
>>>>>>>>>> on
>>>>>>>>>> ds1, and the disks are using snapshots, you may have too many 
>>>>>>>>>> lvs,
>>>>>>>>>> which
>>>>>>>>>> can cause slowdowns in lvm operations.
>>>>>>>>>> 
>>>>>>>>>> Can you share the output of:
>>>>>>>>>> 
>>>>>>>>>> vdsm-tool dump-volume-chains 
>>>>>>>>>> 5de4a000-a9c4-489c-8eee-10368647c413
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Ok, right now no VG has more than 350 VGs so I guess this is 
>>>>>>>>> not
>>>>>>>>> currently
>>>>>>>>> the problem. Unfortunately, I run the vdsm-tool command but it
>>>>>>>>> didn't
>>>>>>>>> end
>>>>>>>>> nor provide any output in cca. 1 hour, so I guess it was hanged 
>>>>>>>>> and
>>>>>>>>> I
>>>>>>>>> stopped it. If you confirm this is a normal execution time I 
>>>>>>>>> can
>>>>>>>>> leave
>>>>>>>>> it
>>>>>>>>> running whatever time it takes.
>>>>>>>>> 
>>>>>>>>> host5:~# vgs
>>>>>>>>>   VG                                   #PV #LV #SN Attr   VSize
>>>>>>>>> VFree
>>>>>>>>>   0927c050-6fb6-463c-bb37-8b8da641dcd3   1  63   0 wz--n- 
>>>>>>>>> 499,62g
>>>>>>>>> 206,50g
>>>>>>>>>   5de4a000-a9c4-489c-8eee-10368647c413   1 335   0 wz--n-   
>>>>>>>>> 2,00t
>>>>>>>>> 518,62g
>>>>>>>>>   b13b9eac-1f2e-4a7e-bcd9-49f5f855c3d8   1 215   0 wz--n-   
>>>>>>>>> 2,00t
>>>>>>>>> 495,25g
>>>>>>>>>   sys_vg                                 1   6   0 wz--n- 
>>>>>>>>> 136,44g
>>>>>>>>> 122,94g
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> I'm attaching the vdsm.log, engine.log and repoplot PDF;
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> This is very useful, thanks. Can you send also the vdsm logs 
>>>>>>>>>> and
>>>>>>>>>> repoplots
>>>>>>>>>> from other hosts for the same timeframe?
>>>>>>>>>> 
>>>>>>>>>>> if someone could
>>>>>>>>>>> give a hint on some additional problems in them and shed some
>>>>>>>>>>> light
>>>>>>>>>>> on
>>>>>>>>>>> the
>>>>>>>>>>> above thoughts I'd be very grateful.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Do you have sar configured on the host? having sar logs can 
>>>>>>>>>> reveal
>>>>>>>>>> more
>>>>>>>>>> info about this timeframe.
>>>>>>>>>> 
>>>>>>>>>> Do you have information about amount of io from vms during 
>>>>>>>>>> this
>>>>>>>>>> timeframe?
>>>>>>>>>> amount of io on the storage backend during this timeframe?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Not currently, but I'll be alert for the time this happens 
>>>>>>>>> again and
>>>>>>>>> I'll
>>>>>>>>> check some sar commands related to I/O and I'll provide 
>>>>>>>>> feedback.
>>>>>>>>> Nevertheless, by the time I run the commands above no machine 
>>>>>>>>> was
>>>>>>>>> unresponsive and I'm still getting such huge execution times. I
>>>>>>>>> tried
>>>>>>>>> running iotop now and I see there are 2 vgck and vgs processes 
>>>>>>>>> with
>>>>>>>>> a
>>>>>>>>> rate
>>>>>>>>> of ~500Kb/s each for reading.
>>>>>>>>> 
>>>>>>>>> root     24446  1.3  0.0  57008  6824 ?        D<   09:15   
>>>>>>>>> 0:00
>>>>>>>>> /usr/sbin/lvm vgs --config  devices { preferred_names =
>>>>>>>>> ["^/dev/mapper/"]
>>>>>>>>> ignore_suspended_devices=1 write_cache_state=0
>>>>>>>>> disable_after_error_count=3
>>>>>>>>> filter = [
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 'a|/dev/mapper/36000eb3a4f1acbc20000000000000043|/dev/mapper/36000eb3a4f1acbc200000000000000b9|/dev/mapper/360014056f0dc8930d744f83af8ddc709|',
>>>>>>>>> 'r|.*|' ] }  global {  locking_type=1  prioritise_write_locks=1
>>>>>>>>> wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50
>>>>>>>>> retain_days
>>>>>>>>> = 0
>>>>>>>>> }  --noheadings --units b --nosuffix --separator |
>>>>>>>>> --ignoreskippedcluster
>>>>>>>>> -o
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name
>>>>>>>>> b13b9eac-1f2e-4a7e-bcd9-49f5f855c3d
>>>>>>>>> 
>>>>>>>>> I'm also attaching the iostat output for host 5 in a file 
>>>>>>>>> called
>>>>>>>>> iostat-host5.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Interesting, all hosts are checking the same storage, but on 
>>>>>>>> host5
>>>>>>>> and
>>>>>>>> host6
>>>>>>>> the output of vgs and vgck is 10 times bigger - this explain why 
>>>>>>>> they
>>>>>>>> take
>>>>>>>> about
>>>>>>>> 10 times longer to run.
>>>>>>>> 
>>>>>>>> $ ls -lh vgs-*
>>>>>>>> -rw-r--r--. 1 nsoffer nsoffer 221K Apr 30 08:55 vgs-host1
>>>>>>>> -rw-r--r--. 1 nsoffer nsoffer 239K Apr 30 08:55 vgs-host2
>>>>>>>> -rw-r--r--. 1 nsoffer nsoffer 228K Apr 30 08:55 vgs-host3
>>>>>>>> -rw-r--r--. 1 nsoffer nsoffer 288K Apr 30 08:55 vgs-host4
>>>>>>>> -rw-r--r--. 1 nsoffer nsoffer 2.2M Apr 30 08:55 vgs-host5
>>>>>>>> -rw-r--r--. 1 nsoffer nsoffer 2.2M Apr 30 08:55 vgs-host6
>>>>>>>> -rw-r--r--. 1 nsoffer nsoffer 232K Apr 30 08:55 vgs-host7
>>>>>>>> $ ls -lh vgck-*
>>>>>>>> -rw-r--r--. 1 nsoffer nsoffer 238K Apr 30 08:55 vgck-host1
>>>>>>>> -rw-r--r--. 1 nsoffer nsoffer 254K Apr 30 08:55 vgck-host2
>>>>>>>> -rw-r--r--. 1 nsoffer nsoffer 244K Apr 30 08:55 vgck-host3
>>>>>>>> -rw-r--r--. 1 nsoffer nsoffer 298K Apr 30 08:55 vgck-host4
>>>>>>>> -rw-r--r--. 1 nsoffer nsoffer 2.0M Apr 30 08:55 vgck-host5
>>>>>>>> -rw-r--r--. 1 nsoffer nsoffer 2.0M Apr 30 08:55 vgck-host6
>>>>>>>> -rw-r--r--. 1 nsoffer nsoffer 248K Apr 30 08:55 vgck-host7
>>>>>>>> 
>>>>>>>> Can you do collect the output of "dmsetup table -v" on host 1, 
>>>>>>>> 5, and
>>>>>>>> 6?
>>>>>>>> 
>>>>>>>> Nir
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Sure, please find attached results of the command. Once again, 
>>>>>>> for
>>>>>>> hosts
>>>>>>> 5
>>>>>>> and 6 the size is 10 times bigger than for host 1.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> I think the issue causing slow vgs and vgck commands is stale
>>>>>> lvs on host 5 and 6. This may be the root caused for paused vms, 
>>>>>> but I
>>>>>> cannot
>>>>>> explain how it is related yet.
>>>>>> 
>>>>>> Comparing vgck on host1 and host5 - we can see that vgck opens
>>>>>> 213 /dev/mapper/* devices, actually 53 uniq devices, most of them
>>>>>> are opened 4 times. On host5,  vgck opens 2489 devices (622 uniq).
>>>>>> This explains why the operation takes about 10 times longer.
>>>>>> 
>>>>>> Checking dmsteup table output, we can see that host1 has 53 
>>>>>> devices,
>>>>>> and host5 622 devices.
>>>>>> 
>>>>>> Checking the device open count, host1 has 15 stale devices
>>>>>> (Open count: 0), but host5 has 597 stale devices.
>>>>>> 
>>>>>> Leaving stale devices is a known issue, but we never had evidence 
>>>>>> that
>>>>>> it
>>>>>> cause
>>>>>> trouble except warnings in lvm commands.
>>>>>> 
>>>>>> Please open an ovirt bug about this issue, and include all the 
>>>>>> files
>>>>>> you
>>>>>> sent
>>>>>> so far in this thread, and all the vdsm logs on host5.
>>>>>> 
>>>>>> To remove the stale devices, you can do:
>>>>>> 
>>>>>> for name in `dmsetup info -c -o open,name | awk '/ 0 / {print 
>>>>>> $2}'`; do
>>>>>>     echo "removing stale device: $name"
>>>>>>     dmsetup remove $name
>>>>>> done
>>>>> 
>>>>> 
>>>>> 
>>>>> Note that this cleanup is safe only in maintenance mode, since it 
>>>>> will
>>>>> also remove
>>>>> the mapping for vdsm special lvs (ids, leases, inbox, outbox, 
>>>>> master,
>>>>> metatada)
>>>>> that are accessed by vdsm regularly, and are not stale.
>>>>> 
>>>>> This is also not safe if you use hosted engine, hosted engine agent
>>>>> should maintain
>>>>> its vg.
>>>>> 
>>>>> The safest way to remove stale devices is to restart vdsm - it
>>>>> deactivates all unused
>>>>> lvs (except special lvs) during startup.
>>>>> 
>>>> 
>>>> Hm, I run it without even putting the host on maintenance, but I 
>>>> will
>>>> next
>>>> time. Fortunately it seems it didn't do any harm and everything is
>>>> working.
>>>> I'll probably do a script that will put host on maintenance, cleanup 
>>>> or
>>>> restart vdsm and start again (this is not a hosted engine) if the 
>>>> number
>>>> of
>>>> stale devices is bigger than a threshold.
>>> 
>>> 
>>> I would not do anything automatic at this point except monitoring 
>>> this
>>> issue.
>>> 
>>> For cleanup - you have 2 options:
>>> 
>>> - host is up - restart vdsm
>>> - host in maintenance - remove stale devices
>>> 
>>> Based on local testing, I think the issue is this:
>>> 
>>> - When connecting to storage, all lvs becomes active - this may be a
>>> change
>>>   in lvm during el 7 development that we did not notice.
>>> 
>>> - Each time vdsm is restarted, all active lvs that should not be 
>>> active
>>> are
>>>   deactivated during vdsm startup
>>> 
>>> - When deactivating storage domain, vdsm does not deactivate the lvs
>>>    leaving stale devices.
>>> 
>>> Can you confirm that works like this on your system?
>>> 
>>> To test this:
>>> 
>> 
>> I did these tests on host5. Before switching the host to maintenance, 
>> I run
>> the vgck command and execution time was: 0m1.347s. Current stale 
>> devices at
>> that time were:
>> 
>> 5de4a000--a9c4--489c--8eee--10368647c413-master
>> 5de4a000--a9c4--489c--8eee--10368647c413-outbox
>> 0927c050--6fb6--463c--bb37--8b8da641dcd3-metadata
>> 5de4a000--a9c4--489c--8eee--10368647c413-ids
>> 0927c050--6fb6--463c--bb37--8b8da641dcd3-leases
>> 5de4a000--a9c4--489c--8eee--10368647c413-inbox
>> b13b9eac--1f2e--4a7e--bcd9--49f5f855c3d8-leases
>> b13b9eac--1f2e--4a7e--bcd9--49f5f855c3d8-metadata
>> b13b9eac--1f2e--4a7e--bcd9--49f5f855c3d8-ids
>> b13b9eac--1f2e--4a7e--bcd9--49f5f855c3d8-inbox
>> 5de4a000--a9c4--489c--8eee--10368647c413-leases
>> 0927c050--6fb6--463c--bb37--8b8da641dcd3-master
>> 5de4a000--a9c4--489c--8eee--10368647c413-metadata
>> b13b9eac--1f2e--4a7e--bcd9--49f5f855c3d8-master
>> 0927c050--6fb6--463c--bb37--8b8da641dcd3-outbox
>> 0927c050--6fb6--463c--bb37--8b8da641dcd3-inbox
>> b13b9eac--1f2e--4a7e--bcd9--49f5f855c3d8-outbox
>> 0927c050--6fb6--463c--bb37--8b8da641dcd3-ids
>> 
>> (seems there were not any new stale devices since yesterday after 
>> running
>> the loop).
> 
> Expected
> 
> - vdsm access master, metadata, inbox and outbox from time to time, 
> they must
>   be active when host is up
> - sanlock access ids when host is up - this lv should have open count 
> of 1
> 
>> 
>>> 1. Put one host to maintenance
>>> 
>>> You should see some stale devices - the special lvs
>>> (ids, leases, master, metadata, inbox, outbox)
>>> 
>>> And probably 2 OVF_STORE lvs per vg (128m lvs used to store
>>> vm ovf)
>>> 
>> 
>> Still had the same devices as above.
> 
> Expected bug - we don't cleanup lvs properly when entering maintenance.
> 
>> 
>>> 2. Cleanup the stale devices
>>> 
>> 
>> After cleaning the stale devices, those remained:
>> 
>> 36000eb3a4f1acbc20000000000000043
>> 360014056f0dc8930d744f83af8ddc709
>> 36000eb3a4f1acbc200000000000000b9
> 
> Expected bug.
> 
> These are the multipath mappings, running multipath -F should remove 
> them.
> 
> If we clean the lvs properly when deactivating domain during 
> maintenance, these
> mappings will be automatically removed.
> 
>>> 3. Activate host
>>> 
>>> You should see again about 600 stale devices, and lvm commands
>>> will probably run much slower as you reported.
>>> 
>> 
>> Current stale devices were these:
>> 
>> 5de4a000--a9c4--489c--8eee--10368647c413-master
>> 5de4a000--a9c4--489c--8eee--10368647c413-outbox
>> 0927c050--6fb6--463c--bb37--8b8da641dcd3-metadata
>> 0927c050--6fb6--463c--bb37--8b8da641dcd3-leases
>> 5de4a000--a9c4--489c--8eee--10368647c413-inbox
>> b13b9eac--1f2e--4a7e--bcd9--49f5f855c3d8-leases
>> b13b9eac--1f2e--4a7e--bcd9--49f5f855c3d8-metadata
>> b13b9eac--1f2e--4a7e--bcd9--49f5f855c3d8-inbox
>> 5de4a000--a9c4--489c--8eee--10368647c413-leases
>> 0927c050--6fb6--463c--bb37--8b8da641dcd3-master
>> 5de4a000--a9c4--489c--8eee--10368647c413-metadata
>> b13b9eac--1f2e--4a7e--bcd9--49f5f855c3d8-master
>> 0927c050--6fb6--463c--bb37--8b8da641dcd3-outbox
>> b13b9eac--1f2e--4a7e--bcd9--49f5f855c3d8-outbox
>> 0927c050--6fb6--463c--bb37--8b8da641dcd3-inbox
> 
> Strange, on my test setup (RHEL 7.2), *all* the lvs in the vg get
> activated.
> 
> Did you check the output of lvs?
> 
> pvscan --cache
> lvs
> 
> I guess you will see about 600 lvs active (-wi-a-----)
> 
> Maybe you did not wait until host connected to all storage domains?
> 

Ok, that was the problem, I didn't run the 'pvscan --cache' command 
prior to see the output. After activating the host again, I see 620 LVs. 
However, in my case all LVs except the special ones are inactive:

host5:~# lvscan | grep -v inactive
   ACTIVE            '/dev/0927c050-6fb6-463c-bb37-8b8da641dcd3/metadata' 
[512,00 MiB] inherit
   ACTIVE            '/dev/0927c050-6fb6-463c-bb37-8b8da641dcd3/outbox' 
[128,00 MiB] inherit
   ACTIVE            '/dev/0927c050-6fb6-463c-bb37-8b8da641dcd3/leases' 
[2,00 GiB] inherit
   ACTIVE            '/dev/0927c050-6fb6-463c-bb37-8b8da641dcd3/ids' 
[128,00 MiB] inherit
   ACTIVE            '/dev/0927c050-6fb6-463c-bb37-8b8da641dcd3/inbox' 
[128,00 MiB] inherit
   ACTIVE            '/dev/0927c050-6fb6-463c-bb37-8b8da641dcd3/master' 
[1,00 GiB] inherit
   ACTIVE            '/dev/sys_vg/audit' [512,00 MiB] inherit
   ACTIVE            '/dev/sys_vg/tmp' [1,00 GiB] inherit
   ACTIVE            '/dev/sys_vg/var' [4,00 GiB] inherit
   ACTIVE            '/dev/sys_vg/home' [1,00 GiB] inherit
   ACTIVE            '/dev/sys_vg/root' [6,00 GiB] inherit
   ACTIVE            '/dev/sys_vg/swap' [1,00 GiB] inherit
   ACTIVE            '/dev/5de4a000-a9c4-489c-8eee-10368647c413/metadata' 
[512,00 MiB] inherit
   ACTIVE            '/dev/5de4a000-a9c4-489c-8eee-10368647c413/outbox' 
[128,00 MiB] inherit
   ACTIVE            '/dev/5de4a000-a9c4-489c-8eee-10368647c413/leases' 
[2,00 GiB] inherit
   ACTIVE            '/dev/5de4a000-a9c4-489c-8eee-10368647c413/ids' 
[128,00 MiB] inherit
   ACTIVE            '/dev/5de4a000-a9c4-489c-8eee-10368647c413/inbox' 
[128,00 MiB] inherit
   ACTIVE            '/dev/5de4a000-a9c4-489c-8eee-10368647c413/master' 
[1,00 GiB] inherit
   ACTIVE            '/dev/b13b9eac-1f2e-4a7e-bcd9-49f5f855c3d8/metadata' 
[512,00 MiB] inherit
   ACTIVE            '/dev/b13b9eac-1f2e-4a7e-bcd9-49f5f855c3d8/outbox' 
[128,00 MiB] inherit
   ACTIVE            '/dev/b13b9eac-1f2e-4a7e-bcd9-49f5f855c3d8/leases' 
[2,00 GiB] inherit
   ACTIVE            '/dev/b13b9eac-1f2e-4a7e-bcd9-49f5f855c3d8/ids' 
[128,00 MiB] inherit
   ACTIVE            '/dev/b13b9eac-1f2e-4a7e-bcd9-49f5f855c3d8/inbox' 
[128,00 MiB] inherit
   ACTIVE            '/dev/b13b9eac-1f2e-4a7e-bcd9-49f5f855c3d8/master' 
[1,00 GiB] inherit

Others are all 'inactive':

   inactive          
'/dev/b13b9eac-1f2e-4a7e-bcd9-49f5f855c3d8/d256c965-25dd-40f2-aeae-4a3ae10befcf' 
[3,12 GiB] inherit
   inactive          
'/dev/b13b9eac-1f2e-4a7e-bcd9-49f5f855c3d8/6cfafa3a-c335-47bf-805e-bddb850dc914' 
[3,12 GiB] inherit
   inactive          
'/dev/b13b9eac-1f2e-4a7e-bcd9-49f5f855c3d8/0d60a745-6952-42e7-8bcf-a12d36b2c39e' 
[5,25 GiB] inherit
   inactive          
'/dev/b13b9eac-1f2e-4a7e-bcd9-49f5f855c3d8/38d94545-b4b0-4384-ae3a-fa26c40e40e3' 
[5,25 GiB] inherit

I tested this on both hosts (5 and 6) with same result.

>>> 4. Restart vdsm
>>> 
>>> All the regular lvs should be inactive now, no stale devices
>>> should exists, except the special lvs (expected)
>>> 
>> 
>> Same result as above. The strange thing is that now I don't see any 
>> VGs
>> running the 'vgs' command but the system's:
>> 
>> host5:~# vgs
>>   VG     #PV #LV #SN Attr   VSize   VFree
>>   sys_vg   1   6   0 wz--n- 136,44g 122,94g
> 
> Vdsm bypass lvmetads, so all lvm commands from the shell show incorrect
> results. You must update lvmetad about changes made by vdsm using
> 
> pvscan --cache
> vgs
> 
> Or using vdsm way:
> 
> vgs --config 'global {use_lvmetad=0}'
> 

(Same here, after using 'pvscan --cache' I can see the VGs).

>> However, machines running on that host have no issues and storage is 
>> working
>> well. vgscan doesn't do the trick either. I tested this on host6 too 
>> and the
>> same happens: After reactivating host I cannot see VGs with the vgs 
>> command
>> - however, everything seems to work quite well.
>> 
>> This is CentOS Linux release 7.2.1511 FWIW.
> 
> Nir