[ovirt-users] Ovirt host activation and lvm looping with high CPU load trying to mount iSCSI storage

Yaniv Kaul ykaul at redhat.com
Thu Jan 12 13:03:35 UTC 2017


On Thu, Jan 12, 2017 at 12:02 PM, Mark Greenall <m.greenall at iontrading.com>
wrote:

> Firstly, thanks @Yaniv and thanks @Nir for your responses.
>
> @Yaniv, in answer to this:
>
> >> Why do you have 1 SD per VM?
>
> It's a combination of performance and ease of management. We ran some IO
> tests with various configurations and settled on this one as a balance
> between reduced IO contention and manageability. If there is a better
> recommended way of handling these then I'm all ears. If you believe having
> a large number of storage domains adds to the problem then we can also
> review the setup.
>

I don't see how it can improve performance. Having several iSCSI
connections to a (single!) target may help, but certainly not by much.
Just from looking at your /var/log/messages:
Jan 11 15:07:11 uk1-ion-ovm-08 iscsid: Connection1:0 to [target:
iqn.2001-05.com.equallogic:4-42a846-37a238a33-4e21185c70857594-uk1-amd-cluster2-template-dstore01,
portal: 10.100.214.77,3260] through [iface: bond1.10] is operational now
Jan 11 15:07:11 uk1-ion-ovm-08 iscsid: Connection2:0 to [target:
iqn.2001-05.com.equallogic:4-42a846-37a238a33-4e21185c70857594-uk1-amd-cluster2-template-dstore01,
portal: 10.100.214.77,3260] through [iface: default] is operational now
Jan 11 15:07:11 uk1-ion-ovm-08 iscsid: Connection3:0 to [target:
iqn.2001-05.com.equallogic:4-42a846-192238a33-1f71185c70b57598-cuuk1ionhurap02-dstore01,
portal: 10.100.214.77,3260] through [iface: bond1.10] is operational now
Jan 11 15:07:11 uk1-ion-ovm-08 iscsid: Connection4:0 to [target:
iqn.2001-05.com.equallogic:4-42a846-192238a33-1f71185c70b57598-cuuk1ionhurap02-dstore01,
portal: 10.100.214.77,3260] through [iface: default] is operational now
Jan 11 15:07:11 uk1-ion-ovm-08 iscsid: Connection5:0 to [target:
iqn.2001-05.com.equallogic:4-42a846-223238a33-7301185c70e57598-cuuk1ionhurdb02-dstore01,
portal: 10.100.214.77,3260] through [iface: bond1.10] is operational now
Jan 11 15:07:11 uk1-ion-ovm-08 iscsid: Connection6:0 to [target:
iqn.2001-05.com.equallogic:4-42a846-223238a33-7301185c70e57598-cuuk1ionhurdb02-dstore01,
portal: 10.100.214.77,3260] through [iface: default] is operational now
Jan 11 15:07:11 uk1-ion-ovm-08 iscsid: Connection7:0 to [target:
iqn.2001-05.com.equallogic:4-42a846-212238a33-2a61185c719576bd-lnd-ion-anv-test-lin-64-dstore01,
portal: 10.100.214.77,3260] through [iface: bond1.10] is operational now
Jan 11 15:07:11 uk1-ion-ovm-08 iscsid: Connection8:0 to [target:
iqn.2001-05.com.equallogic:4-42a846-212238a33-2a61185c719576bd-lnd-ion-anv-test-lin-64-dstore01,
portal: 10.100.214.77,3260] through [iface: default] is operational now
Jan 11 15:07:11 uk1-ion-ovm-08 iscsid: Connection9:0 to [target:
iqn.2001-05.com.equallogic:4-42a846-ad4238a33-1b31185c75157c7e-lnd-ion-lindev-14-dstore01,
portal: 10.100.214.77,3260] through [iface: bond1.10] is operational now
Jan 11 15:07:11 uk1-ion-ovm-08 iscsid: Connection10:0 to [target:
iqn.2001-05.com.equallogic:4-42a846-ad4238a33-1b31185c75157c7e-lnd-ion-lindev-14-dstore01,
portal: 10.100.214.77,3260] through [iface: default] is operational now
Jan 11 15:07:11 uk1-ion-ovm-08 iscsid: Connection11:0 to [target:
iqn.2001-05.com.equallogic:4-42a846-b99479033-9a788b6aa6857d3b-lnd-anv-sup-03-dstore01,
portal: 10.100.214.77,3260] through [iface: bond1.10] is operational now
Jan 11 15:07:11 uk1-ion-ovm-08 iscsid: Connection12:0 to [target:
iqn.2001-05.com.equallogic:4-42a846-b99479033-9a788b6aa6857d3b-lnd-anv-sup-03-dstore01,
portal: 10.100.214.77,3260] through [iface: default] is operational now
Jan 11 15:07:11 uk1-ion-ovm-08 iscsid: Connection13:0 to [target:
iqn.2001-05.com.equallogic:4-42a846-cd9479033-ffc88b6aa6b57d3b-lnd-linsup-02-dstore01,
portal: 10.100.214.77,3260] through [iface: bond1.10] is operational now
Jan 11 15:07:11 uk1-ion-ovm-08 iscsid: Connection14:0 to [target:
iqn.2001-05.com.equallogic:4-42a846-cd9479033-ffc88b6aa6b57d3b-lnd-linsup-02-dstore01,
portal: 10.100.214.77,3260] through [iface: default] is operational now
Jan 11 15:07:11 uk1-ion-ovm-08 iscsid: Connection15:0 to [target:
iqn.2001-05.com.equallogic:4-42a846-db8479033-96f88b6aa6e57d3b-lnd-linsup-03-dstore01,
portal: 10.100.214.77,3260] through [iface: bond1.10] is operational now
Jan 11 15:07:11 uk1-ion-ovm-08 iscsid: Connection16:0 to [target:
iqn.2001-05.com.equallogic:4-42a846-db8479033-96f88b6aa6e57d3b-lnd-linsup-03-dstore01,
portal: 10.100.214.77,3260] through [iface: default] is operational now
Jan 11 15:07:11 uk1-ion-ovm-08 iscsid: Connection17:0 to [target:
iqn.2001-05.com.equallogic:4-42a846-eae479033-f6588b6aa7157d3b-lnd-linsup-04-dstore01,
portal: 10.100.214.77,3260] through [iface: bond1.10] is operational now
Jan 11 15:07:11 uk1-ion-ovm-08 iscsid: Connection18:0 to [target:
iqn.2001-05.com.equallogic:4-42a846-eae479033-f6588b6aa7157d3b-lnd-linsup-04-dstore01,
portal: 10.100.214.77,3260] through [iface: default] is operational now
Jan 11 15:07:11 uk1-ion-ovm-08 iscsid: Connection19:0 to [target:
iqn.2001-05.com.equallogic:4-42a846-fac479033-bf888b6aa7757d3b-lnd-linsup-u01-dstore01,
portal: 10.100.214.77,3260] through [iface: bond1.10] is operational now
Jan 11 15:07:11 uk1-ion-ovm-08 iscsid: Connection20:0 to [target:
iqn.2001-05.com.equallogic:4-42a846-fac479033-bf888b6aa7757d3b-lnd-linsup-u01-dstore01,
portal: 10.100.214.77,3260] through [iface: default] is operational now


1. There is no point in so many connections.
2. They certainly should not all go to the same portal - you really should
have more than one.
3. Note that some go via bond1.10 and some via the 'default' interface. Is
that intended?
4. Your multipath.conf is using rr_min_io, where it most likely should be
using rr_min_io_rq - see the sketch below.
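
For point 4, here is a minimal sketch of the kind of device stanza I mean.
The vendor/product strings and the value of 10 are placeholders taken from
common EqualLogic examples, not from your configuration - please check
Dell's guide for the values they actually recommend for your arrays:

devices {
    device {
        vendor                  "EQLOGIC"
        product                 "100E-00"
        path_grouping_policy    multibus
        path_selector           "round-robin 0"
        # dm-multipath on EL7 is request-based, so the per-path switching
        # threshold is rr_min_io_rq rather than rr_min_io
        rr_min_io_rq            10
        failback                immediate
    }
}

For point 3, 'iscsiadm -m session -P 1' on the host lists the iface each
session is bound to, which should make it easy to see whether the sessions
going through 'default' are intentional.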


Unrelated, your engine.log is quite flooded with:
2017-01-11 15:07:46,085 WARN
 [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerObjectsBuilder]
(DefaultQuartzScheduler9) [31a71bf5] Invalid or unknown guest architecture
type '' received from guest agent

Any idea what kind of guest you are running?


You have a lot of host devices - we have patches to improve their
enumeration (coming in 4.0.7)
Y.


> >> Can you try and disable (mask) the lvmetad service on the hosts and see
> if it improves matters?
>
> Disabled and masked the lvmetad service and tried again this morning. The
> initial activation of the host seemed quicker and put less load on it, but
> the end result was still the same. Just under 10 minutes later the node
> went non-operational and the cycle began again. By 09:27 we had the high
> CPU load and the repeating lvm cycle.
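>
> (For reference, the standard way to do that on an EL7 host is roughly:
>
> systemctl stop lvm2-lvmetad.service lvm2-lvmetad.socket
> systemctl mask lvm2-lvmetad.service lvm2-lvmetad.socket
>
> optionally together with use_lvmetad = 0 in /etc/lvm/lvm.conf.)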
>
> Host Activation: 09:06
> Host Up: 09:08
> Non-Operational: 09:16
> LVM Load: 09:27
> Host Reboot: 09:30
>
> From yesterday and today I've attached the messages, sanlock.log and
> multipath.conf files too, although I'm not sure the messages file will be
> of much use as it looks like log rate limiting kicked in and suppressed
> messages for the duration of the process. I'm booted into the kernel with
> debugging enabled, but maybe that's generating too much info? Let me know
> if you want me to change anything here to get additional information.
>
> As additional configuration information, we also have the following
> settings from the EqualLogic and Linux install guide:
>
> /etc/sysctl.conf:
>
> # Prevent ARP Flux for multiple NICs on the same subnet:
> net.ipv4.conf.all.arp_ignore = 1
> net.ipv4.conf.all.arp_announce = 2
> # Loosen RP Filter to allow multiple iSCSI connections
> net.ipv4.conf.all.rp_filter = 2
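>
> (These live in /etc/sysctl.conf and are applied at boot; after editing
> they can be re-applied on a running host with sysctl -p.)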
>
>
> And the following /lib/udev/rules.d/99-eqlsd.rules:
>
> #-----------------------------------------------------------------------------
> #  Copyright (c) 2010-2012 by Dell, Inc.
> #
> # All rights reserved.  This software may not be copied, disclosed,
> # transferred, or used except in accordance with a license granted
> # by Dell, Inc.  This software embodies proprietary information
> # and trade secrets of Dell, Inc.
> #
> #-----------------------------------------------------------------------------
> #
> # Various Settings for Dell Equallogic disks based on Dell Optimizing SAN Environment for Linux Guide
> #
> # Modify disk scheduler mode to noop
> ACTION=="add|change", SUBSYSTEM=="block", ATTRS{vendor}=="EQLOGIC", RUN+="/bin/sh -c 'echo noop > /sys/${DEVPATH}/queue/scheduler'"
> # Modify disk timeout value to 60 seconds
> ACTION!="remove", SUBSYSTEM=="block", ATTRS{vendor}=="EQLOGIC", RUN+="/bin/sh -c 'echo 60 > /sys/%p/device/timeout'"
> # Modify read ahead value to 1024
> ACTION!="remove", SUBSYSTEM=="block", ATTRS{vendor}=="EQLOGIC", RUN+="/bin/sh -c 'echo 1024 > /sys/${DEVPATH}/bdi/read_ahead_kb'"
>
> Many Thanks,
> Mark
>