[ovirt-users] Ovirt host activation and lvm looping with high CPU load trying to mount iSCSI storage

Nir Soffer nsoffer at redhat.com
Thu Jan 12 23:08:59 UTC 2017


On Thu, Jan 12, 2017 at 12:02 PM, Mark Greenall
<m.greenall at iontrading.com> wrote:
> Firstly, thanks @Yaniv and thanks @Nir for your responses.
>
> @Yaniv, in answer to this:
>
>>> Why do you have 1 SD per VM?
>
> It's a combination of performance and ease of management. We ran some IO tests with various configurations and settled on this one for a balance of reduced IO contention and ease of management. If there is a better recommended way of handling these then I'm all ears. If you believe having a large number of storage domains adds to the problem then we can also review the setup.
>
>>> Can you try and disable (mask) the lvmetad service on the hosts and see if it improves matters?
>
> Disabled and masked the lvmetad service and tried again this morning. The load seemed lower and the initial activation of the host was quicker, but the end result was still the same. Just under 10 minutes later the node went non-operational and the cycle began again. By 09:27 we had the high CPU load and the repeating lvm cycle.
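
Side note: fully disabling lvmetad usually means masking the socket as well as the service, and telling the lvm tools not to expect the daemon. A minimal sketch, assuming the stock lvm2 unit names on EL7:

    # stop and mask both the service and its socket so nothing re-activates it
    systemctl stop lvm2-lvmetad.service lvm2-lvmetad.socket
    systemctl mask lvm2-lvmetad.service lvm2-lvmetad.socket

    # tell the lvm tools not to expect the daemon:
    # set use_lvmetad = 0 in the global section of /etc/lvm/lvm.conf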
>
> Host Activation: 09:06
> Host Up: 09:08
> Non-Operational: 09:16
> LVM Load: 09:27
> Host Reboot: 09:30
>
> From yesterday and today I've attached the messages, sanlock.log and multipath.conf files too. Although I'm not sure the messages file will be of much use, as it looks like log rate limiting kicked in and suppressed messages for the duration of the process. I'm booted into the kernel with debugging enabled, but maybe that's generating too much info? Let me know if you want me to change anything here to get additional information.
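
If journald or rsyslog rate limiting is what suppressed the messages, it can
be relaxed while you are debugging. A rough sketch, assuming systemd-journald
and the rsyslog imjournal input (the key names vary a little between versions):

    # /etc/systemd/journald.conf - disable journald rate limiting
    RateLimitInterval=0
    RateLimitBurst=0

    # /etc/rsyslog.conf - relax imjournal rate limiting
    $imjournalRatelimitInterval 0

    # pick up the changes
    systemctl restart systemd-journald rsyslog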
>
> As added configuration information we also have the following settings from the Equallogic and Linux install guide:
>
> /etc/sysctl.conf:
>
> # Prevent ARP Flux for multiple NICs on the same subnet:
> net.ipv4.conf.all.arp_ignore = 1
> net.ipv4.conf.all.arp_announce = 2
> # Loosen RP Filter to allow multiple iSCSI connections
> net.ipv4.conf.all.rp_filter = 2
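
For reference, these can be applied without a reboot and the effective values
checked with sysctl, for example:

    # reload /etc/sysctl.conf and verify the values that matter here
    sysctl -p
    sysctl net.ipv4.conf.all.arp_ignore net.ipv4.conf.all.arp_announce net.ipv4.conf.all.rp_filter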
>
>
> And the following /lib/udev/rules.d/99-eqlsd.rules:
>
> #-----------------------------------------------------------------------------
> #  Copyright (c) 2010-2012 by Dell, Inc.
> #
> # All rights reserved.  This software may not be copied, disclosed,
> # transferred, or used except in accordance with a license granted
> # by Dell, Inc.  This software embodies proprietary information
> # and trade secrets of Dell, Inc.
> #
> #-----------------------------------------------------------------------------
> #
> # Various Settings for Dell Equallogic disks based on Dell Optimizing SAN Environment for Linux Guide
> #
> # Modify disk scheduler mode to noop
> ACTION=="add|change", SUBSYSTEM=="block", ATTRS{vendor}=="EQLOGIC", RUN+="/bin/sh -c 'echo noop > /sys/${DEVPATH}/queue/scheduler'"
> # Modify disk timeout value to 60 seconds
> ACTION!="remove", SUBSYSTEM=="block", ATTRS{vendor}=="EQLOGIC", RUN+="/bin/sh -c 'echo 60 > /sys/%p/device/timeout'"

This 60 second timeout may cause long delays in vdsm commands accessing
storage, timeouts in various flows, and may cause your storage domain to
become inactive - and since this applies to all of your EQLOGIC devices,
and therefore all storage domains, it may cause the entire host to become
non-operational.

I recommend removing this rule.
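
For example, after dropping the timeout line from 99-eqlsd.rules you can
reload udev and confirm the kernel default (normally 30 seconds) is back,
where sdX below is a placeholder for one of your EQLOGIC devices:

    # reload the udev rules and re-trigger the block devices
    udevadm control --reload-rules
    udevadm trigger --subsystem-match=block

    # check the per-device SCSI command timeout (sdX is a placeholder)
    cat /sys/block/sdX/device/timeout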

> # Modify read ahead value to 1024
> ACTION!="remove", SUBSYSTEM=="block", ATTRS{vendor}=="EQLOGIC", RUN+="/bin/sh -c 'echo 1024 > /sys/${DEVPATH}/bdi/read_ahead_kb'"

In your multipath.conf, I see that you changed a lot of the defaults
recommended by ovirt:

defaults {
    deferred_remove             yes
    dev_loss_tmo                30
    fast_io_fail_tmo            5
    flush_on_last_del           yes
    max_fds                     4096
    no_path_retry               fail
    polling_interval            5
    user_friendly_names         no
}

You are using:

defaults {

You are not using "deferred_remove", so you get the default value ("no").
Do you have any reason to change this?

You are not using "dev_loss_tmo", so you get the default value
Do you have any reason to change this?

You are not using "fast_io_fail_tmo", so you will get the default
value  (hopefully 5).
Do you have any reason to change this?

You are not using "flush_on_last_del " - any reason to change this?

       failback                immediate
       max_fds                 8192
       no_path_retry           fail

I guess these are the settings recommended for your storage?

       path_checker            tur
       path_grouping_policy    multibus
       path_selector           "round-robin 0"

       polling_interval        10

This means multipathd will check paths every 10-40 seconds
(max_polling_interval defaults to 4x polling_interval). You should
use the default of 5, which causes multipathd to check paths every
5-20 seconds.

       rr_min_io               10
       rr_weight               priorities
       user_friendly_names     no
}

Also, you are mixing ovirt defaults with settings that you need for your
specific devices.

You should leave the defaults section unchanged, and create a device
section for your device:

devices {
    device {
        vendor XXX
        product YYY

        # ovirt specific settings
        deferred_remove             yes
        dev_loss_tmo                30
        fast_io_fail_tmo            5
        flush_on_last_del           yes
        no_path_retry               fail
        polling_interval            5
        user_friendly_names         no

        # device specific settings
        max_fds                     8192
        path_checker                tur
        path_grouping_policy        multibus
        path_selector               "round-robin 0"
    }

}

Note that you must copy the ovirt defaults into the device section,
otherwise you will get multipathd's built-in defaults, which are not
the same.
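
After editing multipath.conf you can reload the configuration and check the
result without a reboot - a sketch:

    # re-read multipath.conf (on EL7 the multipathd unit defines a reload action)
    systemctl reload multipathd.service

    # confirm the effective configuration
    multipathd show config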

Can you also share the output of:

multipath -ll

In its output you can see the vendor and product names.
Using these names, find the effective configuration of your
multipath devices with this command:

multipathd show config

If the device is not listed in the output, you are using
the defaults.

Please share the configuration for your device, or the defaults,
from the output of multipathd show config.
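
If the output is long, one way to pull out just your device stanza (assuming
the vendor string is EQLOGIC, as in your udev rules; check multipath -ll for
the exact strings):

    multipathd show config | grep -B 2 -A 20 EQLOGIC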

For example, here is my test storage:

# multipath -ll
3600140549f3b93968d440ac9129d124f dm-11 LIO-ORG ,target1-12
size=50G features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  `- 22:0:0:12 sdi 8:128 active ready running

multipathd does not have any setting for LIO-ORG, so we get the defaults:

defaults {
        verbosity 2
        polling_interval 5
        max_polling_interval 20
        reassign_maps "yes"
        multipath_dir "/lib64/multipath"
        path_selector "service-time 0"
        path_grouping_policy "failover"
        uid_attribute "ID_SERIAL"
        prio "const"
        prio_args ""
        features "0"
        path_checker "directio"
        alias_prefix "mpath"
        failback "manual"
        rr_min_io 1000
        rr_min_io_rq 1
        max_fds 4096
        rr_weight "uniform"
        no_path_retry "fail"
        queue_without_daemon "no"
        flush_on_last_del "yes"
        user_friendly_names "no"
        fast_io_fail_tmo 5
        dev_loss_tmo 30
        bindings_file "/etc/multipath/bindings"
        wwids_file /etc/multipath/wwids
        log_checker_err always
        find_multipaths no
        retain_attached_hw_handler no
        detect_prio no
        hw_str_match no
        force_sync no
        deferred_remove yes
        ignore_new_boot_devs no
        skip_kpartx no
        config_dir "/etc/multipath/conf.d"
        delay_watch_checks no
        delay_wait_checks no
        retrigger_tries 3
        retrigger_delay 10
        missing_uev_wait_timeout 30
        new_bindings_in_boot no
}

Nir

