On Sun, May 1, 2016 at 3:31 PM, <nicolas(a)devels.es> wrote:
> On 2016-04-30 23:22, Nir Soffer wrote:
>>
>> On Sun, May 1, 2016 at 12:48 AM, <nicolas(a)devels.es> wrote:
>>>
>>> On 2016-04-30 22:37, Nir Soffer wrote:
>>>>
>>>>
>>>> On Sat, Apr 30, 2016 at 10:28 PM, Nir Soffer <nsoffer(a)redhat.com>
>>>> wrote:
>>>>>
>>>>>
>>>>> On Sat, Apr 30, 2016 at 7:16 PM, <nicolas(a)devels.es> wrote:
>>>>>>
>>>>>>
>>>>>> On 2016-04-30 16:55, Nir Soffer wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sat, Apr 30, 2016 at 11:33 AM, Nicolás <nicolas(a)devels.es> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi Nir,
>>>>>>>>
>>>>>>>> On 29/04/16 at 22:34, Nir Soffer wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Apr 29, 2016 at 9:17 PM, <nicolas(a)devels.es> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> We're running oVirt 3.6.5.3-1 and lately we're experiencing some
>>>>>>>>>> issues with some VMs being paused because they're marked as
>>>>>>>>>> non-responsive. Mostly, after a few seconds they recover, but we
>>>>>>>>>> want to debug precisely this problem so we can fix it consistently.
>>>>>>>>>>
>>>>>>>>>> Our scenario is the following:
>>>>>>>>>>
>>>>>>>>>> ~495 VMs, of which ~120 are constantly up
>>>>>>>>>> 3 datastores, all of them iSCSI-based:
>>>>>>>>>>   * ds1: 2T, currently has 276 disks
>>>>>>>>>>   * ds2: 2T, currently has 179 disks
>>>>>>>>>>   * ds3: 500G, currently has 65 disks
>>>>>>>>>> 7 hosts: All have mostly the same hardware. CPU and memory usage is
>>>>>>>>>> currently very low (< 10%).
>>>>>>>>>>
>>>>>>>>>> ds1 and ds2 are physically the same backend, which exports two 2TB
>>>>>>>>>> volumes. ds3 is a different storage backend where we're currently
>>>>>>>>>> migrating some disks from ds1 and ds2.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> What is the storage backend behind ds1 and ds2?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> The storage backend for ds1 and ds2 is the iSCSI-based HP LeftHand
>>>>>>>> P4000 G2.
>>>>>>>>
>>>>>>>>>> Usually, when VMs become unresponsive, the whole host where they
>>>>>>>>>> run gets unresponsive too, so that gives a hint about the problem;
>>>>>>>>>> my bet is the culprit is somewhere on the host side and not on the
>>>>>>>>>> VMs side.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Probably the vm became unresponsive because the connection to the
>>>>>>>>> host was lost.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I forgot to mention that, less commonly, we have situations where the
>>>>>>>> host doesn't get unresponsive but the VMs on it do, and they don't
>>>>>>>> become responsive ever again, so we have to forcibly power them off
>>>>>>>> and start them on a different host. But in this case the connection
>>>>>>>> with the host doesn't ever get lost (so basically the host is Up, but
>>>>>>>> any VM running on it is unresponsive).
>>>>>>>>
>>>>>>>>
>>>>>>>>>> When that happens, the host itself gets non-responsive and is only
>>>>>>>>>> recoverable after a reboot, since it's unable to reconnect.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Piotr, can you check the engine log and explain why the host is not
>>>>>>>>> reconnected?
>>>>>>>>>
>>>>>>>>>> I must say this is not specific to this oVirt version; when we were
>>>>>>>>>> using v.3.6.4 the same happened, and it's also worth mentioning that
>>>>>>>>>> we've not done any configuration changes and everything had been
>>>>>>>>>> working quite well for a long time.
>>>>>>>>>>
>>>>>>>>>> We were monitoring our ds1 and ds2 physical backend to see
>>>>>>>>>> performance, and we suspect we've run out of IOPS since we're
>>>>>>>>>> reaching the maximum specified by the manufacturer; probably at
>>>>>>>>>> certain times the host cannot perform a storage operation within
>>>>>>>>>> some time limit and it marks VMs as unresponsive. That's why we've
>>>>>>>>>> set up ds3 and we're migrating ds1 and ds2 to ds3. When we run out
>>>>>>>>>> of space on ds3 we'll create more, smaller volumes to keep
>>>>>>>>>> migrating.
>>>>>>>>>>
>>>>>>>>>> On the host side, when this happens, we've run repoplot on the vdsm
>>>>>>>>>> log and I'm attaching the result. Clearly there's a *huge* LVM
>>>>>>>>>> response time (~30 secs.).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Indeed the log shows very slow vgck and vgs commands - these are
>>>>>>>>> called every 5 minutes for checking the vg health and refreshing the
>>>>>>>>> vdsm lvm cache.
>>>>>>>>>
>>>>>>>>> 1. starting vgck
>>>>>>>>>
>>>>>>>>> Thread-96::DEBUG::2016-04-29
>>>>>>>>> 13:17:48,682::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset
>>>>>>>>> --cpu-list 0-23 /usr/bin/sudo -n /usr/sbin/lvm vgck --config ' devices
>>>>>>>>> { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1
>>>>>>>>> write_cache_state=0 disable_after_error_count=3 filter = [
>>>>>>>>> '\''a|/dev/mapper/36000eb3a4f1acbc20000000000000043|'\'',
>>>>>>>>> '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1
>>>>>>>>> wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50
>>>>>>>>> retain_days = 0 } ' 5de4a000-a9c4-489c-8eee-10368647c413 (cwd None)
>>>>>>>>>
>>>>>>>>> 2. vgck ends after 55 seconds
>>>>>>>>>
>>>>>>>>> Thread-96::DEBUG::2016-04-29
>>>>>>>>> 13:18:43,173::lvm::290::Storage.Misc.excCmd::(cmd) SUCCESS: <err> =
>>>>>>>>> '  WARNING: lvmetad is running but disabled. Restart lvmetad before
>>>>>>>>> enabling it!\n'; <rc> = 0
>>>>>>>>>
>>>>>>>>> 3. starting vgs
>>>>>>>>>
>>>>>>>>> Thread-96::DEBUG::2016-04-29
>>>>>>>>> 13:17:11,963::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset
>>>>>>>>> --cpu-list 0-23 /usr/bin/sudo -n /usr/sbin/lvm vgs --config ' devices
>>>>>>>>> { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1
>>>>>>>>> write_cache_state=0 disable_after_error_count=3 filter = [
>>>>>>>>> '\''a|/dev/mapper/36000eb3a4f1acbc20000000000000043|/dev/mapper/36000eb3a4f1acbc200000000000000b9|/dev/mapper/360014056f0dc8930d744f83af8ddc709|/dev/mapper/WDC_WD5003ABYZ-011FA0_WD-WMAYP0J73DU6|'\'',
>>>>>>>>> '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1
>>>>>>>>> wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50
>>>>>>>>> retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|'
>>>>>>>>> --ignoreskippedcluster -o
>>>>>>>>> uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name
>>>>>>>>> 5de4a000-a9c4-489c-8eee-10368647c413 (cwd None)
>>>>>>>>>
>>>>>>>>> 4. vgs finished after 37 seconds
>>>>>>>>>
>>>>>>>>> Thread-96::DEBUG::2016-04-29
>>>>>>>>> 13:17:48,680::lvm::290::Storage.Misc.excCmd::(cmd) SUCCESS: <err> =
>>>>>>>>> '  WARNING: lvmetad is running but disabled. Restart lvmetad before
>>>>>>>>> enabling it!\n'; <rc> = 0
>>>>>>>>>
>>>>>>>>> Zdenek, how do you suggest debugging these slow lvm commands?
>>>>>>>>>
>>>>>>>>> Can you run the following commands on a host in trouble, and on some
>>>>>>>>> other hosts in the same timeframe?
>>>>>>>>>
>>>>>>>>> time vgck -vvvv --config ' devices { filter =
>>>>>>>>> ['\''a|/dev/mapper/36000eb3a4f1acbc20000000000000043|'\'',
>>>>>>>>> '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1
>>>>>>>>> wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50
>>>>>>>>> retain_days = 0 } ' 5de4a000-a9c4-489c-8eee-10368647c413
>>>>>>>>>
>>>>>>>>> time vgs -vvvv --config ' global { locking_type=1
>>>>>>>>> prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup {
>>>>>>>>> retain_min = 50 retain_days = 0 } '
>>>>>>>>> 5de4a000-a9c4-489c-8eee-10368647c413
>>>>>>>>>
>>>>>>>>> Note that I added -vvvv to both commands; this will generate a huge
>>>>>>>>> amount of debugging info. Please share the output of these commands.
>>>>>>>>>
>>>>>>>>> You may need to fix the commands. You can always copy and paste
>>>>>>>>> directly from the vdsm log into the shell and add the -vvvv flag.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Indeed, there seems to be a big difference on hosts 5 and 6. I'm
>>>>>>>> attaching the results of the execution of both commands on all hosts.
>>>>>>>> Both commands show a considerably bigger output on hosts 5 and 6, and
>>>>>>>> also a much bigger execution time. Times are also attached in a file
>>>>>>>> called TIMES.
>>>>>>>>
>>>>>>>>>> Our host storage network is correctly configured and on a 1G
>>>>>>>>>> interface; no errors on the host itself, switches, etc.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 1G serving about 20-70 vms per host (495 vms, 120 always up, 7
>>>>>>>>> hosts)?
>>>>>>>>>
>>>>>>>>> Do you have a separate network for management and storage, or do
>>>>>>>>> both use this 1G interface?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Yes, we have separate networks for management, storage and motion.
>>>>>>>> Storage and motion have 1G each (plus, for storage we use a bond of 2
>>>>>>>> interfaces in ALB-mode (6)). Currently no host has more than 30 VMs
>>>>>>>> at a time.
>>>>>>>>
>>>>>>>>>> We've also limited storage in QoS to use 10MB/s and 40 IOPS,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> How did you configure this limit?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> In the Data Center tab, I chose our DC and a QoS sub-tab appears,
>>>>>>>> just like described here:
>>>>>>>> http://www.ovirt.org/documentation/sla/network-qos/
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> but this issue still happens, which makes me wonder whether this is
>>>>>>>>>> really just an IOPS issue; each host handles about ca. 600 LVs.
>>>>>>>>>> Could this be an issue too? I should note that the LVM response
>>>>>>>>>> times are low in normal conditions (~1-2 seconds).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> We recommend limiting the number of lvs per vg to 350. If you have
>>>>>>>>> 276 disks on ds1, and the disks are using snapshots, you may have
>>>>>>>>> too many lvs, which can cause slowdowns in lvm operations.
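A quick way to check the lv count per vg on a host - just a sketch, reusing
the same use_lvmetad=0 override that vdsm passes in the commands above - is:

    vgs --noheadings -o vg_name,lv_count --config 'global { use_lvmetad=0 }'

The "#LV" column in the plain vgs output quoted below shows the same numbers.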
>>>>>>>>>
>>>>>>>>> Can you share the output of:
>>>>>>>>>
>>>>>>>>> vdsm-tool dump-volume-chains 5de4a000-a9c4-489c-8eee-10368647c413
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Ok, right now no VG has more than 350 LVs so I guess this is not
>>>>>>>> currently the problem. Unfortunately, I ran the vdsm-tool command but
>>>>>>>> it didn't finish nor provide any output in ca. 1 hour, so I guess it
>>>>>>>> was hung and I stopped it. If you confirm this is a normal execution
>>>>>>>> time I can leave it running for whatever time it takes.
>>>>>>>>
>>>>>>>> host5:~# vgs
>>>>>>>>   VG                                   #PV #LV #SN Attr   VSize   VFree
>>>>>>>>   0927c050-6fb6-463c-bb37-8b8da641dcd3   1  63   0 wz--n- 499,62g 206,50g
>>>>>>>>   5de4a000-a9c4-489c-8eee-10368647c413   1 335   0 wz--n-   2,00t 518,62g
>>>>>>>>   b13b9eac-1f2e-4a7e-bcd9-49f5f855c3d8   1 215   0 wz--n-   2,00t 495,25g
>>>>>>>>   sys_vg                                 1   6   0 wz--n- 136,44g 122,94g
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I'm attaching the vdsm.log, engine.log and repoplot PDF;
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This is very useful, thanks. Can you also send the vdsm logs and
>>>>>>>>> repoplots from other hosts for the same timeframe?
>>>>>>>>>
>>>>>>>>>> if someone could give a hint on some additional problems in them
>>>>>>>>>> and shed some light on the above thoughts I'd be very grateful.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Do you have sar configured on the host? Having sar logs can reveal
>>>>>>>>> more info about this timeframe.
>>>>>>>>>
>>>>>>>>> Do you have information about the amount of io from vms during this
>>>>>>>>> timeframe? And the amount of io on the storage backend during this
>>>>>>>>> timeframe?
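If sysstat is collecting data, something along these lines could pull the
per-device statistics for that window (the file name and time range below are
only illustrative, matching the Apr 29 ~13:17 incident in the logs above):

    sar -d -p -f /var/log/sa/sa29 -s 13:00:00 -e 13:30:00

High await/%util on the multipath devices backing the storage domains would
support the IOPS-saturation theory discussed above.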
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Not currently, but I'll be alert for the next time this happens and
>>>>>>>> I'll check some sar commands related to I/O and provide feedback.
>>>>>>>> Nevertheless, at the time I ran the commands above no machine was
>>>>>>>> unresponsive and I'm still getting such huge execution times. I tried
>>>>>>>> running iotop now and I see there are 2 vgck and vgs processes with a
>>>>>>>> rate of ~500Kb/s each for reading.
>>>>>>>>
>>>>>>>> root 24446 1.3 0.0 57008 6824 ? D< 09:15 0:00
>>>>>>>> /usr/sbin/lvm vgs --config devices { preferred_names =
>>>>>>>> ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0
>>>>>>>> disable_after_error_count=3 filter = [
>>>>>>>> 'a|/dev/mapper/36000eb3a4f1acbc20000000000000043|/dev/mapper/36000eb3a4f1acbc200000000000000b9|/dev/mapper/360014056f0dc8930d744f83af8ddc709|',
>>>>>>>> 'r|.*|' ] } global { locking_type=1 prioritise_write_locks=1
>>>>>>>> wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days
>>>>>>>> = 0 } --noheadings --units b --nosuffix --separator |
>>>>>>>> --ignoreskippedcluster -o
>>>>>>>> uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name
>>>>>>>> b13b9eac-1f2e-4a7e-bcd9-49f5f855c3d
>>>>>>>>
>>>>>>>> I'm also attaching the iostat output for host 5 in a file called
>>>>>>>> iostat-host5.
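For catching the next occurrence non-interactively, a batch-mode capture such
as the following can be left running and correlated with the vdsm log
timestamps (paths, intervals and counts here are arbitrary):

    iotop -obtqq -d 5 -n 720 >> /var/tmp/iotop-capture.log
    iostat -dxmt 5 720 >> /var/tmp/iostat-capture.log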
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Interesting, all hosts are checking the same storage, but on host5 and
>>>>>>> host6 the output of vgs and vgck is 10 times bigger - this explains
>>>>>>> why they take about 10 times longer to run.
>>>>>>>
>>>>>>> $ ls -lh vgs-*
>>>>>>> -rw-r--r--. 1 nsoffer nsoffer 221K Apr 30 08:55 vgs-host1
>>>>>>> -rw-r--r--. 1 nsoffer nsoffer 239K Apr 30 08:55 vgs-host2
>>>>>>> -rw-r--r--. 1 nsoffer nsoffer 228K Apr 30 08:55 vgs-host3
>>>>>>> -rw-r--r--. 1 nsoffer nsoffer 288K Apr 30 08:55 vgs-host4
>>>>>>> -rw-r--r--. 1 nsoffer nsoffer 2.2M Apr 30 08:55 vgs-host5
>>>>>>> -rw-r--r--. 1 nsoffer nsoffer 2.2M Apr 30 08:55 vgs-host6
>>>>>>> -rw-r--r--. 1 nsoffer nsoffer 232K Apr 30 08:55 vgs-host7
>>>>>>> $ ls -lh vgck-*
>>>>>>> -rw-r--r--. 1 nsoffer nsoffer 238K Apr 30 08:55 vgck-host1
>>>>>>> -rw-r--r--. 1 nsoffer nsoffer 254K Apr 30 08:55 vgck-host2
>>>>>>> -rw-r--r--. 1 nsoffer nsoffer 244K Apr 30 08:55 vgck-host3
>>>>>>> -rw-r--r--. 1 nsoffer nsoffer 298K Apr 30 08:55 vgck-host4
>>>>>>> -rw-r--r--. 1 nsoffer nsoffer 2.0M Apr 30 08:55 vgck-host5
>>>>>>> -rw-r--r--. 1 nsoffer nsoffer 2.0M Apr 30 08:55 vgck-host6
>>>>>>> -rw-r--r--. 1 nsoffer nsoffer 248K Apr 30 08:55 vgck-host7
>>>>>>>
>>>>>>> Can you collect the output of "dmsetup table -v" on hosts 1, 5, and 6?
>>>>>>>
>>>>>>> Nir
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Sure, please find attached the results of the command. Once again, for
>>>>>> hosts 5 and 6 the size is 10 times bigger than for host 1.
>>>>>
>>>>>
>>>>>
>>>>> I think the issue causing the slow vgs and vgck commands is stale
>>>>> lvs on hosts 5 and 6. This may be the root cause of the paused vms, but
>>>>> I cannot explain how it is related yet.
>>>>>
>>>>> Comparing vgck on host1 and host5 - we can see that vgck opens
>>>>> 213 /dev/mapper/* devices, actually 53 uniq devices, most of them
>>>>> are opened 4 times. On host5, vgck opens 2489 devices (622 uniq).
>>>>> This explains why the operation takes about 10 times longer.
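A rough way to reproduce these numbers from the saved -vvvv outputs - assuming
the debug log names each device it opens by its /dev/mapper path, which is
what the comparison above counts - would be:

    grep -o '/dev/mapper/[^ "]*' vgck-host5 | wc -l           # total opens
    grep -o '/dev/mapper/[^ "]*' vgck-host5 | sort -u | wc -l  # unique devices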
>>>>>
>>>>> Checking the dmsetup table output, we can see that host1 has 53 devices,
>>>>> and host5 622 devices.
>>>>>
>>>>> Checking the device open count, host1 has 15 stale devices
>>>>> (Open count: 0), but host5 has 597 stale devices.
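These counts can be reproduced on a host with something like the following
(a sketch using the same dmsetup fields as the cleanup loop below):

    dmsetup info -c --noheadings -o name,open | awk '$2 == 0' | wc -l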
>>>>>
>>>>> Leaving stale devices is a known issue, but we never had evidence that
>>>>> it causes trouble except warnings in lvm commands.
>>>>>
>>>>> Please open an ovirt bug about this issue, and include all the files you
>>>>> sent so far in this thread, and all the vdsm logs on host5.
>>>>>
>>>>> To remove the stale devices, you can do:
>>>>>
>>>>> for name in `dmsetup info -c -o open,name | awk '/ 0 / {print $2}'`; do
>>>>>     echo "removing stale device: $name"
>>>>>     dmsetup remove $name
>>>>> done
>>>>
>>>>
>>>>
>>>> Note that this cleanup is safe only in maintenance mode, since it will
>>>> also remove the mappings for the vdsm special lvs (ids, leases, inbox,
>>>> outbox, master, metadata) that are accessed by vdsm regularly, and are
>>>> not stale.
>>>>
>>>> This is also not safe if you use hosted engine; the hosted engine agent
>>>> should maintain its vg.
>>>>
>>>> The safest way to remove stale devices is to restart vdsm - it
>>>> deactivates all unused lvs (except special lvs) during startup.
>>>>
>>>
>>> Hm, I ran it without even putting the host on maintenance, but I will next
>>> time. Fortunately it seems it didn't do any harm and everything is
>>> working. I'll probably do a script that will put the host on maintenance,
>>> clean up or restart vdsm, and activate it again (this is not a hosted
>>> engine) if the number of stale devices is bigger than a threshold.
>>
>>
>> I would not do anything automatic at this point except monitoring this
>> issue.
>>
>> For cleanup - you have 2 options:
>>
>> - host is up - restart vdsm
>> - host in maintenance - remove stale devices
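A rough sketch of those two paths, assuming an EL7 host where the vdsm service
unit is named vdsmd (illustrative, not a vetted procedure):

    # host is up: restart vdsm, which deactivates unused lvs during startup
    systemctl restart vdsmd

    # host in maintenance: drop unused dm mappings, then the multipath maps
    dmsetup info -c --noheadings -o open,name | awk '$1 == 0 {print $2}' |
        while read name; do dmsetup remove "$name"; done
    multipath -F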
>>
>> Based on local testing, I think the issue is this:
>>
>> - When connecting to storage, all lvs become active - this may be a change
>>   in lvm during el 7 development that we did not notice.
>>
>> - Each time vdsm is restarted, all active lvs that should not be active are
>>   deactivated during vdsm startup.
>>
>> - When deactivating a storage domain, vdsm does not deactivate the lvs,
>>   leaving stale devices.
>>
>> Can you confirm that it works like this on your system?
>>
>> To test this:
>>
>
> I did these tests on host5. Before switching the host to maintenance, I ran
> the vgck command and the execution time was 0m1.347s. Current stale devices
> at that time were:
>
> 5de4a000--a9c4--489c--8eee--10368647c413-master
> 5de4a000--a9c4--489c--8eee--10368647c413-outbox
> 0927c050--6fb6--463c--bb37--8b8da641dcd3-metadata
> 5de4a000--a9c4--489c--8eee--10368647c413-ids
> 0927c050--6fb6--463c--bb37--8b8da641dcd3-leases
> 5de4a000--a9c4--489c--8eee--10368647c413-inbox
> b13b9eac--1f2e--4a7e--bcd9--49f5f855c3d8-leases
> b13b9eac--1f2e--4a7e--bcd9--49f5f855c3d8-metadata
> b13b9eac--1f2e--4a7e--bcd9--49f5f855c3d8-ids
> b13b9eac--1f2e--4a7e--bcd9--49f5f855c3d8-inbox
> 5de4a000--a9c4--489c--8eee--10368647c413-leases
> 0927c050--6fb6--463c--bb37--8b8da641dcd3-master
> 5de4a000--a9c4--489c--8eee--10368647c413-metadata
> b13b9eac--1f2e--4a7e--bcd9--49f5f855c3d8-master
> 0927c050--6fb6--463c--bb37--8b8da641dcd3-outbox
> 0927c050--6fb6--463c--bb37--8b8da641dcd3-inbox
> b13b9eac--1f2e--4a7e--bcd9--49f5f855c3d8-outbox
> 0927c050--6fb6--463c--bb37--8b8da641dcd3-ids
>
> (It seems there weren't any new stale devices since yesterday after running
> the loop.)
Expected:
- vdsm accesses master, metadata, inbox and outbox from time to time; they
  must be active when the host is up
- sanlock accesses ids when the host is up - this lv should have an open count
  of 1
>
>> 1. Put one host to maintenance
>>
>> You should see some stale devices - the special lvs
>> (ids, leases, master, metadata, inbox, outbox)
>>
>> And probably 2 OVF_STORE lvs per vg (128m lvs used to store vm ovf)
>>
>
> Still had the same devices as above.
Expected bug - we don't clean up the lvs properly when entering maintenance.
>
>> 2. Cleanup the stale devices
>>
>
> After cleaning the stale devices, those remained:
>
> 36000eb3a4f1acbc20000000000000043
> 360014056f0dc8930d744f83af8ddc709
> 36000eb3a4f1acbc200000000000000b9
Expected bug.
These are the multipath mappings; running multipath -F should remove them.
If we cleaned the lvs properly when deactivating the domain during
maintenance, these mappings would be removed automatically.
>> 3. Activate host
>>
>> You should again see about 600 stale devices, and lvm commands will
>> probably run much slower, as you reported.
>>
>
> Current stale devices were these:
>
> 5de4a000--a9c4--489c--8eee--10368647c413-master
> 5de4a000--a9c4--489c--8eee--10368647c413-outbox
> 0927c050--6fb6--463c--bb37--8b8da641dcd3-metadata
> 0927c050--6fb6--463c--bb37--8b8da641dcd3-leases
> 5de4a000--a9c4--489c--8eee--10368647c413-inbox
> b13b9eac--1f2e--4a7e--bcd9--49f5f855c3d8-leases
> b13b9eac--1f2e--4a7e--bcd9--49f5f855c3d8-metadata
> b13b9eac--1f2e--4a7e--bcd9--49f5f855c3d8-inbox
> 5de4a000--a9c4--489c--8eee--10368647c413-leases
> 0927c050--6fb6--463c--bb37--8b8da641dcd3-master
> 5de4a000--a9c4--489c--8eee--10368647c413-metadata
> b13b9eac--1f2e--4a7e--bcd9--49f5f855c3d8-master
> 0927c050--6fb6--463c--bb37--8b8da641dcd3-outbox
> b13b9eac--1f2e--4a7e--bcd9--49f5f855c3d8-outbox
> 0927c050--6fb6--463c--bb37--8b8da641dcd3-inbox
Strange, on my test setup (RHEL 7.2), *all* the lvs in the vg get activated.
Did you check the output of lvs?

pvscan --cache
lvs

I guess you will see about 600 lvs active (-wi-a-----).
Maybe you did not wait until the host connected to all storage domains?
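A quick way to count the active data lvs after refreshing lvmetad - assuming
the regular lvs show the usual -wi-a----- attribute string when active - is:

    pvscan --cache
    lvs --noheadings -o lv_attr | grep -c -- '-wi-a'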
Ok, that was the problem, I didn't run the 'pvscan --cache' command before
checking the output. After activating the host again, I see 620 LVs. However,
in my case all LVs except the special ones are inactive:
host5:~# lvscan | grep -v inactive
  ACTIVE   '/dev/0927c050-6fb6-463c-bb37-8b8da641dcd3/metadata' [512,00 MiB] inherit
  ACTIVE   '/dev/0927c050-6fb6-463c-bb37-8b8da641dcd3/outbox' [128,00 MiB] inherit
  ACTIVE   '/dev/0927c050-6fb6-463c-bb37-8b8da641dcd3/leases' [2,00 GiB] inherit
  ACTIVE   '/dev/0927c050-6fb6-463c-bb37-8b8da641dcd3/ids' [128,00 MiB] inherit
  ACTIVE   '/dev/0927c050-6fb6-463c-bb37-8b8da641dcd3/inbox' [128,00 MiB] inherit
  ACTIVE   '/dev/0927c050-6fb6-463c-bb37-8b8da641dcd3/master' [1,00 GiB] inherit
  ACTIVE   '/dev/sys_vg/audit' [512,00 MiB] inherit
  ACTIVE   '/dev/sys_vg/tmp' [1,00 GiB] inherit
  ACTIVE   '/dev/sys_vg/var' [4,00 GiB] inherit
  ACTIVE   '/dev/sys_vg/home' [1,00 GiB] inherit
  ACTIVE   '/dev/sys_vg/root' [6,00 GiB] inherit
  ACTIVE   '/dev/sys_vg/swap' [1,00 GiB] inherit
  ACTIVE   '/dev/5de4a000-a9c4-489c-8eee-10368647c413/metadata' [512,00 MiB] inherit
  ACTIVE   '/dev/5de4a000-a9c4-489c-8eee-10368647c413/outbox' [128,00 MiB] inherit
  ACTIVE   '/dev/5de4a000-a9c4-489c-8eee-10368647c413/leases' [2,00 GiB] inherit
  ACTIVE   '/dev/5de4a000-a9c4-489c-8eee-10368647c413/ids' [128,00 MiB] inherit
  ACTIVE   '/dev/5de4a000-a9c4-489c-8eee-10368647c413/inbox' [128,00 MiB] inherit
  ACTIVE   '/dev/5de4a000-a9c4-489c-8eee-10368647c413/master' [1,00 GiB] inherit
  ACTIVE   '/dev/b13b9eac-1f2e-4a7e-bcd9-49f5f855c3d8/metadata' [512,00 MiB] inherit
  ACTIVE   '/dev/b13b9eac-1f2e-4a7e-bcd9-49f5f855c3d8/outbox' [128,00 MiB] inherit
  ACTIVE   '/dev/b13b9eac-1f2e-4a7e-bcd9-49f5f855c3d8/leases' [2,00 GiB] inherit
  ACTIVE   '/dev/b13b9eac-1f2e-4a7e-bcd9-49f5f855c3d8/ids' [128,00 MiB] inherit
  ACTIVE   '/dev/b13b9eac-1f2e-4a7e-bcd9-49f5f855c3d8/inbox' [128,00 MiB] inherit
  ACTIVE   '/dev/b13b9eac-1f2e-4a7e-bcd9-49f5f855c3d8/master' [1,00 GiB] inherit
Others are all 'inactive':
  inactive '/dev/b13b9eac-1f2e-4a7e-bcd9-49f5f855c3d8/d256c965-25dd-40f2-aeae-4a3ae10befcf' [3,12 GiB] inherit
  inactive '/dev/b13b9eac-1f2e-4a7e-bcd9-49f5f855c3d8/6cfafa3a-c335-47bf-805e-bddb850dc914' [3,12 GiB] inherit
  inactive '/dev/b13b9eac-1f2e-4a7e-bcd9-49f5f855c3d8/0d60a745-6952-42e7-8bcf-a12d36b2c39e' [5,25 GiB] inherit
  inactive '/dev/b13b9eac-1f2e-4a7e-bcd9-49f5f855c3d8/38d94545-b4b0-4384-ae3a-fa26c40e40e3' [5,25 GiB] inherit
I tested this on both hosts (5 and 6) with the same result.
>> 4. Restart vdsm
>>
>> All the regular lvs should be inactive now; no stale devices should exist,
>> except the special lvs (expected)
>>
>
> Same result as above. The strange thing is that now I don't see any VGs when
> running the 'vgs' command except the system's:
>
> host5:~# vgs
>   VG     #PV #LV #SN Attr   VSize   VFree
>   sys_vg   1   6   0 wz--n- 136,44g 122,94g
Vdsm bypasses lvmetad, so all lvm commands run from the shell show incorrect
results. You must update lvmetad about the changes made by vdsm using

pvscan --cache
vgs

or use the vdsm way:

vgs --config 'global {use_lvmetad=0}'
> However, machines running on that host have no issues and storage is working
> well. vgscan doesn't do the trick either. I tested this on host6 too and the
> same happens: after reactivating the host I cannot see VGs with the vgs
> command - however, everything seems to work quite well.
>
> This is CentOS Linux release 7.2.1511 FWIW.
Nir