VM has been paused due to storage I/O problem

Hello,
my test environment is composed of 2 old HP BL685c G1 blades (ovmsrv05 and ovmsrv06), connected over a SAN with FC switches to an old IBM DS4700 storage array. Apart from being old, they all seem fine from a hardware point of view.
I have configured oVirt 4.0.6 and an FCP storage domain. The hosts are plain CentOS 7.3 servers, fully updated. It is not a hosted-engine environment: the manager is a VM outside the cluster. I have configured power management on both hosts and it works well.

I have only one VM for testing at the moment, and it is doing almost nothing.

Starting point: ovmsrv05 has been in maintenance for about 2 days and the VM is running on ovmsrv06. I update the qemu-kvm package on ovmsrv05 and then restart it from the web admin GUI:
Power Mgmt --> Restart

Sequence of events in the pane, including the problem in the subject:

Jan 31, 2017 10:29:43 AM  Host ovmsrv05 power management was verified successfully.
Jan 31, 2017 10:29:43 AM  Status of host ovmsrv05 was set to Up.
Jan 31, 2017 10:29:38 AM  Executing power management status on Host ovmsrv05 using Proxy Host ovmsrv06 and Fence Agent ilo:10.4.192.212.
Jan 31, 2017 10:29:29 AM  Activation of host ovmsrv05 initiated by admin@internal-authz.
Jan 31, 2017 10:28:05 AM  VM ol65 has recovered from paused back to up.
Jan 31, 2017 10:27:55 AM  VM ol65 has been paused due to storage I/O problem.
Jan 31, 2017 10:27:55 AM  VM ol65 has been paused.
Jan 31, 2017 10:25:52 AM  Host ovmsrv05 was restarted by admin@internal-authz.
Jan 31, 2017 10:25:52 AM  Host ovmsrv05 was started by admin@internal-authz.
Jan 31, 2017 10:25:52 AM  Power management start of Host ovmsrv05 succeeded.
Jan 31, 2017 10:25:50 AM  Executing power management status on Host ovmsrv05 using Proxy Host ovmsrv06 and Fence Agent ilo:10.4.192.212.
Jan 31, 2017 10:25:37 AM  Executing power management start on Host ovmsrv05 using Proxy Host ovmsrv06 and Fence Agent ilo:10.4.192.212.
Jan 31, 2017 10:25:37 AM  Power management start of Host ovmsrv05 initiated.
Jan 31, 2017 10:25:37 AM  Auto fence for host ovmsrv05 was started.
Jan 31, 2017 10:25:37 AM  All VMs' status on Non Responsive Host ovmsrv05 were changed to 'Down' by admin@internal-authz
Jan 31, 2017 10:25:36 AM  Host ovmsrv05 was stopped by admin@internal-authz.
Jan 31, 2017 10:25:36 AM  Power management stop of Host ovmsrv05 succeeded.
Jan 31, 2017 10:25:34 AM  Executing power management status on Host ovmsrv05 using Proxy Host ovmsrv06 and Fence Agent ilo:10.4.192.212.
Jan 31, 2017 10:25:15 AM  Executing power management stop on Host ovmsrv05 using Proxy Host ovmsrv06 and Fence Agent ilo:10.4.192.212.
Jan 31, 2017 10:25:15 AM  Power management stop of Host ovmsrv05 initiated.
Jan 31, 2017 10:25:12 AM  Executing power management status on Host ovmsrv05 using Proxy Host ovmsrv06 and Fence Agent ilo:10.4.192.212.

Watching the timestamps, the culprit seems to be the reboot of ovmsrv05, which detects some LUNs in "owned" state and others in "unowned".

Full messages of both hosts here:
https://drive.google.com/file/d/0BwoPbcrMv8mvekZQT1pjc0NMRlU/view?usp=sharing
and
https://drive.google.com/file/d/0BwoPbcrMv8mvcjBCYVdFZWdXTms/view?usp=sharing

At this time there are 4 LUNs seen globally by the two hosts, but only 1 of them is currently configured as the only storage domain in the oVirt cluster.
[root@ovmsrv05 ~]# multipath -l | grep ^36
3600a0b8000299aa80000d08b55014119 dm-5 IBM     ,1814      FAStT
3600a0b80002999020000cd3c5501458f dm-3 IBM     ,1814      FAStT
3600a0b80002999020000ccf855011198 dm-2 IBM     ,1814      FAStT
3600a0b8000299aa80000d08955014098 dm-4 IBM     ,1814      FAStT

The configured one:

[root@ovmsrv05 ~]# multipath -l 3600a0b8000299aa80000d08b55014119
3600a0b8000299aa80000d08b55014119 dm-5 IBM     ,1814      FAStT
size=4.0T features='0' hwhandler='1 rdac' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| |- 0:0:1:3 sdl 8:176 active undef running
| `- 2:0:1:3 sdp 8:240 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
  |- 0:0:0:3 sdd 8:48  active undef running
  `- 2:0:0:3 sdi 8:128 active undef running

In the messages of the booting node, around the problem registered by the storage:

[root@ovmsrv05 ~]# grep owned /var/log/messages
Jan 31 10:27:38 ovmsrv05 kernel: scsi 0:0:0:1: rdac: LUN 1 (RDAC) (owned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 0:0:0:2: rdac: LUN 2 (RDAC) (owned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 0:0:0:3: rdac: LUN 3 (RDAC) (unowned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 2:0:0:1: rdac: LUN 1 (RDAC) (owned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 0:0:0:4: rdac: LUN 4 (RDAC) (unowned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 2:0:0:2: rdac: LUN 2 (RDAC) (owned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 0:0:1:1: rdac: LUN 1 (RDAC) (unowned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 2:0:0:3: rdac: LUN 3 (RDAC) (unowned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 2:0:0:4: rdac: LUN 4 (RDAC) (unowned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 0:0:1:2: rdac: LUN 2 (RDAC) (unowned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 2:0:1:1: rdac: LUN 1 (RDAC) (unowned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 0:0:1:3: rdac: LUN 3 (RDAC) (owned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 2:0:1:2: rdac: LUN 2 (RDAC) (unowned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 0:0:1:4: rdac: LUN 4 (RDAC) (owned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 2:0:1:3: rdac: LUN 3 (RDAC) (owned)
Jan 31 10:27:39 ovmsrv05 kernel: scsi 2:0:1:4: rdac: LUN 4 (RDAC) (owned)

I don't know exactly the meaning of owned/unowned in the output above. Possibly it detects the 0:0:1:3 and 2:0:1:3 paths (those of the active group) as "owned", and this could have created problems with the active node?

On the active node, strangely, I don't lose all the paths, but the VM has been paused anyway:

[root@ovmsrv06 log]# grep "remaining active path" /var/log/messages
Jan 31 10:27:48 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119: remaining active paths: 3
Jan 31 10:27:49 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119: remaining active paths: 2
Jan 31 10:27:56 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119: remaining active paths: 3
Jan 31 10:27:56 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119: remaining active paths: 2
Jan 31 10:27:56 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119: remaining active paths: 1
Jan 31 10:27:57 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119: remaining active paths: 2
Jan 31 10:28:01 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119: remaining active paths: 3
Jan 31 10:28:01 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119: remaining active paths: 4

I'm not an expert on this storage array in particular, or on the rdac hardware handler in general.

What I see is that multipath.conf on both nodes is:

# VDSM REVISION 1.3

defaults {
    polling_interval            5
    no_path_retry               fail
    user_friendly_names         no
    flush_on_last_del           yes
    fast_io_fail_tmo            5
    dev_loss_tmo                30
    max_fds                     4096
}

devices {
    device {
        # These settings overrides built-in devices settings. It does not apply
        # to devices without built-in settings (these use the settings in the
        # "defaults" section), or to devices defined in the "devices" section.
        # Note: This is not available yet on Fedora 21. For more info see
        # https://bugzilla.redhat.com/1253799
        all_devs                yes
        no_path_retry           fail
    }
}

Beginning of /proc/scsi/scsi:

[root@ovmsrv06 ~]# cat /proc/scsi/scsi
Attached devices:
Host: scsi1 Channel: 01 Id: 00 Lun: 00
  Vendor: HP       Model: LOGICAL VOLUME   Rev: 1.86
  Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi0 Channel: 00 Id: 00 Lun: 01
  Vendor: IBM      Model: 1814      FAStT  Rev: 0916
  Type:   Direct-Access                    ANSI  SCSI revision: 05
...

To get the default acquired config for this storage:

multipathd -k
> show config

I can see:

device {
    vendor "IBM"
    product "^1814"
    product_blacklist "Universal Xport"
    path_grouping_policy "group_by_prio"
    path_checker "rdac"
    features "0"
    hardware_handler "1 rdac"
    prio "rdac"
    failback immediate
    rr_weight "uniform"
    no_path_retry "fail"
}

and

defaults {
    verbosity 2
    polling_interval 5
    max_polling_interval 20
    reassign_maps "yes"
    multipath_dir "/lib64/multipath"
    path_selector "service-time 0"
    path_grouping_policy "failover"
    uid_attribute "ID_SERIAL"
    prio "const"
    prio_args ""
    features "0"
    path_checker "directio"
    alias_prefix "mpath"
    failback "manual"
    rr_min_io 1000
    rr_min_io_rq 1
    max_fds 4096
    rr_weight "uniform"
    no_path_retry "fail"
    queue_without_daemon "no"
    flush_on_last_del "yes"
    user_friendly_names "no"
    fast_io_fail_tmo 5
    dev_loss_tmo 30
    bindings_file "/etc/multipath/bindings"
    wwids_file /etc/multipath/wwids
    log_checker_err always
    find_multipaths no
    retain_attached_hw_handler no
    detect_prio no
    hw_str_match no
    force_sync no
    deferred_remove no
    ignore_new_boot_devs no
    skip_kpartx no
    config_dir "/etc/multipath/conf.d"
    delay_watch_checks no
    delay_wait_checks no
    retrigger_tries 3
    retrigger_delay 10
    missing_uev_wait_timeout 30
    new_bindings_in_boot no
}

Any hint on how to tune multipath.conf so that a powering-on server doesn't create problems for running VMs?

Thanks in advance,
Gianluca
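As a side note (my own reading of the logs, not something confirmed in the thread): with `no_path_retry fail`, I/O fails as soon as the map momentarily has zero usable paths, so even a brief dip below the 1-path minimum seen above would be enough for qemu to pause the guest on the resulting I/O error. A small sketch that extracts the lowest concurrent path count from the multipathd lines quoted above:

```shell
# Compute the minimum "remaining active paths" value seen by multipathd
# on ovmsrv06 (sample lines copied from /var/log/messages above).
min=$(awk '/remaining active paths/ {
    n = $NF + 0
    if (min == "" || n < min) min = n
} END { print min }' <<'EOF'
Jan 31 10:27:48 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119: remaining active paths: 3
Jan 31 10:27:49 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119: remaining active paths: 2
Jan 31 10:27:56 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119: remaining active paths: 3
Jan 31 10:27:56 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119: remaining active paths: 2
Jan 31 10:27:56 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119: remaining active paths: 1
Jan 31 10:27:57 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119: remaining active paths: 2
Jan 31 10:28:01 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119: remaining active paths: 3
Jan 31 10:28:01 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119: remaining active paths: 4
EOF
)
echo "minimum concurrent active paths logged: $min"
```

The logged minimum is 1, but multipathd only logs at checker granularity (polling_interval 5), so a transient window with 0 paths between polls would not appear here.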

Exactly the same issue here with an FC EMC storage domain...

--
Nathanaël Blanchet
Supervision réseau
Pôle Infrastructures Informatiques
227 avenue Professeur-Jean-Louis-Viala
34193 MONTPELLIER CEDEX 5
Tél. 33 (0)4 67 54 84 55
Fax 33 (0)4 67 54 84 14
blanchet@abes.fr


On Tue, Jan 31, 2017 at 6:09 PM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Tue, Jan 31, 2017 at 3:23 PM, Nathanaël Blanchet <blanchet@abes.fr> wrote:
exactly the same issue over here with an FC EMC storage domain...
I'm trying to mitigate it by inserting a timeout for my SAN devices, but I'm not sure of its effectiveness, as the CentOS 7 behavior of "multipathd -k" followed by "show config" seems different from CentOS 6.x. In fact, my attempt at multipath.conf is this:
# VDSM REVISION 1.3
# VDSM PRIVATE

defaults {
    polling_interval 5
    no_path_retry fail
    user_friendly_names no
    flush_on_last_del yes
    fast_io_fail_tmo 5
    dev_loss_tmo 30
    max_fds 4096
}

# Remove devices entries when overrides section is available.
devices {
    device {
        # These settings override built-in device settings. They do not apply
        # to devices without built-in settings (these use the settings in the
        # "defaults" section), or to devices defined in the "devices" section.
        # Note: This is not available yet on Fedora 21. For more info see
        # https://bugzilla.redhat.com/1253799
        all_devs yes
        no_path_retry fail
    }
    device {
        vendor "IBM"
        product "^1814"
        product_blacklist "Universal Xport"
        path_grouping_policy "group_by_prio"
        path_checker "rdac"
        features "0"
        hardware_handler "1 rdac"
        prio "rdac"
        failback immediate
        rr_weight "uniform"
        no_path_retry "12"
Hi Gianluca, This should be a number, not a string, maybe multipath is having trouble parsing this and it ignores your value?
    }
}
So I put exactly the default device config for my IBM/1814 device but no_path_retry set to 12.
Why 12? This will do 12 retries, 5 seconds each, when no path is available. This will block lvm commands for 60 seconds when no path is available, blocking other stuff in vdsm. Vdsm is not designed to handle this. I recommend a value of 4. But note that this is not related to the fact that your devices are not initialized properly after boot.
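Nir's arithmetic can be written out explicitly (an illustrative Python sketch, not vdsm code; polling_interval 5 is taken from the defaults section quoted earlier in the thread):

```python
# With all paths down, multipath retries the path checkers no_path_retry
# times, once per polling_interval, before failing queued I/O. Anything
# touching the device meanwhile (e.g. lvm commands run by vdsm) blocks
# for roughly this long.
POLLING_INTERVAL = 5  # seconds, from the defaults section in this thread

def blocking_seconds(no_path_retry: int) -> int:
    """Worst-case time I/O stays queued when no path is available."""
    return no_path_retry * POLLING_INTERVAL

print(blocking_seconds(12))  # 60 -> too long, vdsm is not designed for this
print(blocking_seconds(4))   # 20 -> the recommended grace time
```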
In CentOS 6.x, when you do something like this, "show config" gives you the modified entry only for your device section. Instead, in CentOS 7.3 it seems I get the default one for IBM/1814 anyway, and also the customized one at the end of the output....
Maybe your device configuration does not match exactly the builtin config.
Two facts:

- Before, I could reproduce the problem if I selected:
  Maintenance
  Power Mgmt --> Restart
  (tried 3 times with the same behavior)

- Instead, if I executed it in separate steps:
  Maintenance
  Power Mgmt --> Stop
  wait a moment
  Power Mgmt --> Start

  I didn't get problems (tried only one time...)
Maybe waiting a moment helps the storage/switches to clean up properly after a server is shut down? Does your power management trigger a proper shutdown? I would avoid using it for normal shutdown.
With this "new" multipath config (to be confirmed if in effect, how?) I don't get the VM paused problem even with Restart option of Power Mgmt In active host messages I see these ones when the other reboots:
Jan 31 16:50:01 ovmsrv06 systemd: Started Session 705 of user root.
Jan 31 16:50:01 ovmsrv06 systemd: Starting Session 705 of user root.
Jan 31 16:53:47 ovmsrv06 multipathd: 3600a0b8000299aa80000d08955014098: sde - rdac checker reports path is up
Jan 31 16:53:47 ovmsrv06 multipathd: 8:64: reinstated
Jan 31 16:53:47 ovmsrv06 multipathd: 3600a0b8000299aa80000d08955014098: load table [0 41943040 multipath 1 queue_if_no_path 1 rdac 2 1 service-time 0 2 1 8:224 1 65:0 1 service-time 0 2 1 8:64 1 8:160 1]
Jan 31 16:53:47 ovmsrv06 multipathd: 3600a0b8000299aa80000d08955014098: sdo - rdac checker reports path is ghost
Jan 31 16:53:47 ovmsrv06 multipathd: 8:224: reinstated
Jan 31 16:53:47 ovmsrv06 multipathd: 3600a0b8000299aa80000d08955014098: sdk - rdac checker reports path is up
Jan 31 16:53:47 ovmsrv06 multipathd: 8:160: reinstated
Jan 31 16:53:47 ovmsrv06 kernel: sd 0:0:1:4: rdac: array Z1_DS4700, ctlr 1, queueing MODE_SELECT command
Jan 31 16:53:47 ovmsrv06 multipathd: 3600a0b8000299aa80000d08955014098: sdq - rdac checker reports path is ghost
Jan 31 16:53:47 ovmsrv06 multipathd: 65:0: reinstated
Jan 31 16:53:48 ovmsrv06 kernel: sd 0:0:1:4: rdac: array Z1_DS4700, ctlr 1, MODE_SELECT returned with sense 05/91/36
Jan 31 16:53:48 ovmsrv06 kernel: sd 0:0:1:4: rdac: array Z1_DS4700, ctlr 1, queueing MODE_SELECT command
Jan 31 16:53:49 ovmsrv06 kernel: sd 0:0:1:4: rdac: array Z1_DS4700, ctlr 1, MODE_SELECT returned with sense 05/91/36
Jan 31 16:53:49 ovmsrv06 kernel: sd 0:0:1:4: rdac: array Z1_DS4700, ctlr 1, queueing MODE_SELECT command
Jan 31 16:53:49 ovmsrv06 kernel: sd 0:0:1:4: rdac: array Z1_DS4700, ctlr 1, MODE_SELECT completed
Jan 31 16:53:49 ovmsrv06 kernel: sd 2:0:1:4: rdac: array Z1_DS4700, ctlr 1, queueing MODE_SELECT command
Jan 31 16:53:49 ovmsrv06 kernel: sd 2:0:1:4: rdac: array Z1_DS4700, ctlr 1, MODE_SELECT completed
Jan 31 16:53:52 ovmsrv06 multipathd: 3600a0b8000299aa80000d08955014098: sde - rdac checker reports path is ghost
Jan 31 16:53:52 ovmsrv06 multipathd: 8:64: reinstated
Jan 31 16:53:52 ovmsrv06 multipathd: 3600a0b8000299aa80000d08955014098: load table [0 41943040 multipath 1 queue_if_no_path 1 rdac 2 1 service-time 0 2 1 8:224 1 65:0 1 service-time 0 2 1 8:64 1 8:160 1]
Jan 31 16:53:52 ovmsrv06 multipathd: 3600a0b8000299aa80000d08955014098: sdo - rdac checker reports path is up
Jan 31 16:53:52 ovmsrv06 multipathd: 8:224: reinstated
Jan 31 16:53:52 ovmsrv06 multipathd: 3600a0b8000299aa80000d08955014098: sdk - rdac checker reports path is ghost
Jan 31 16:53:52 ovmsrv06 multipathd: 8:160: reinstated
Jan 31 16:53:52 ovmsrv06 multipathd: 3600a0b8000299aa80000d08955014098: sdq - rdac checker reports path is up
Jan 31 16:53:52 ovmsrv06 multipathd: 65:0: reinstated
But they are not related to the multipath device dedicated to the oVirt storage domain in this case.... What makes me optimistic is the difference in these lines:
Before I got:

Jan 31 10:27:47 ovmsrv06 multipathd: 3600a0b8000299aa80000d08955014098: load table [0 41943040 multipath 0 1 rdac 2 1 service-time 0 2 1 8:224 1 65:0 1 service-time 0 2 1 8:64 1 8:160 1]

Now I get:

Jan 31 16:53:52 ovmsrv06 multipathd: 3600a0b8000299aa80000d08955014098: load table [0 41943040 multipath 1 queue_if_no_path 1 rdac 2 1 service-time 0 2 1 8:224 1 65:0 1 service-time 0 2 1 8:64 1 8:160 1]

That is:

multipath 0 1 rdac
vs
multipath 1 queue_if_no_path 1 rdac
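The difference can also be checked mechanically: in a device-mapper multipath table line, the token right after the word "multipath" is the number of feature arguments that follow. A small helper of my own (not part of multipath-tools) to pull out the features list:

```python
def multipath_features(table_line: str) -> list[str]:
    """Extract the features list from a multipath 'load table' line.

    Layout: <start> <length> multipath <n_features> <feature...> <hwhandler...>
    """
    tokens = table_line.split()
    i = tokens.index("multipath")
    n = int(tokens[i + 1])           # count of feature tokens that follow
    return tokens[i + 2 : i + 2 + n]

# Before: no features, so I/O fails as soon as all paths are down
print(multipath_features("0 41943040 multipath 0 1 rdac 2 1 service-time 0"))  # []
# Now: queue_if_no_path means unlimited queueing
print(multipath_features("0 41943040 multipath 1 queue_if_no_path 1 rdac 2 1 service-time 0"))  # ['queue_if_no_path']
```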
This is not expected, multipath is using unlimited queueing, which is the worst setup for ovirt. Maybe this is the result of using "12" instead of 12?

Anyway, looking in the multipath source, this is the default configuration for your device:

	/* DS3950 / DS4200 / DS4700 / DS5020 */
	.vendor        = "IBM",
	.product       = "^1814",
	.bl_product    = "Universal Xport",
	.pgpolicy      = GROUP_BY_PRIO,
	.checker_name  = RDAC,
	.features      = "2 pg_init_retries 50",
	.hwhandler     = "1 rdac",
	.prio_name     = PRIO_RDAC,
	.pgfailback    = -FAILBACK_IMMEDIATE,
	.no_path_retry = 30,
	},

and this is the commit that updated this (and other rdac devices):
http://git.opensvc.com/gitweb.cgi?p=multipath-tools/.git;a=commit;h=c1ed393b...

So I would try this configuration:

device {
    vendor "IBM"
    product "^1814"

    # defaults from multipathd show config
    product_blacklist "Universal Xport"
    path_grouping_policy "group_by_prio"
    path_checker "rdac"
    hardware_handler "1 rdac"
    prio "rdac"
    failback immediate
    rr_weight "uniform"

    # Based on multipath commit c1ed393b91acace284901f16954ba5c1c0d943c9
    features "2 pg_init_retries 50"

    # Default is 30 seconds, ovirt recommended value is 4 to avoid
    # blocking in vdsm. This gives 20 seconds (4 * polling_interval)
    # grace time when no path is available.
    no_path_retry 4
}

Ben, do you have any other ideas on debugging this issue and improving multipath configuration?

Nir

On Wed, Feb 1, 2017 at 8:39 AM, Nir Soffer <nsoffer@redhat.com> wrote:
Hi Gianluca,
This should be a number, not a string, maybe multipath is having trouble parsing this and it ignores your value?
I don't think so. Also because, reading the DM Multipath guide at
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/DM_Multipath/multipath_config_confirm.html
it seems that in RHEL 7.3 the "show config" command has this behaviour:

"For example, the following command sequence displays the multipath configuration, including the defaults, before exiting the console.

# multipathd -k
show config
CTRL-D
"
So the output has to include the defaults too. Anyway I changed it, see below.

<<<<< BEGIN OF PARENTHESIS
In theory it should be the same on RHEL 6.8 (see
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/DM_Multipath/multipath_config_confirm.html)
but it is not so for me on a system that is on 6.5, with device-mapper-multipath-0.4.9-93.el6.x86_64 and connected to a Netapp array.

In /usr/share/doc/device-mapper-multipath-0.4.9/multipath.conf.defaults:

#	vendor "NETAPP"
#	product "LUN.*"
#	path_grouping_policy group_by_prio
#	getuid_callout "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
#	path_selector "round-robin 0"
#	path_checker tur
#	features "3 queue_if_no_path pg_init_retries 50"
#	hardware_handler "0"
#	prio ontap
#	failback immediate
#	rr_weight uniform
#	rr_min_io 128
#	rr_min_io_rq 1
#	flush_on_last_del yes
#	fast_io_fail_tmo 5
#	dev_loss_tmo infinity
#	retain_attached_hw_handler yes
#	detect_prio yes
#	reload_readwrite yes
# }

My customization in multipath.conf, based on Netapp guidelines and my Netapp storage array setup:

devices {
    device {
        vendor "NETAPP"
        product "LUN.*"
        getuid_callout "/lib/udev/scsi_id -g -u -d /dev/%n"
        hardware_handler "1 alua"
        prio alua
    }
}

If I run "multipathd show config" I see only one entry for the NETAPP/LUN vendor/product, and it is a merge of the default and my custom settings:

device {
    vendor "NETAPP"
    product "LUN.*"
    path_grouping_policy group_by_prio
    getuid_callout "/lib/udev/scsi_id -g -u -d /dev/%n"
    path_selector "round-robin 0"
    path_checker tur
    features "3 queue_if_no_path pg_init_retries 50"
    hardware_handler "1 alua"
    prio alua
    failback immediate
    rr_weight uniform
    rr_min_io 128
    rr_min_io_rq 1
    flush_on_last_del yes
    fast_io_fail_tmo 5
    dev_loss_tmo infinity
    retain_attached_hw_handler yes
    detect_prio yes
    reload_readwrite yes
}

So this difference confused me when configuring multipath in CentOS 7.3. I have to see, when I update from 6.5 to 6.8, whether this changes.
<<<<< END OF PARENTHESIS
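The merge behaviour described here can be pictured as a plain dictionary update: builtin device settings survive unless the user's device section overrides the same key. A toy Python model (my own illustration of the observed 6.x behaviour, not multipathd's actual parser):

```python
# Builtin NETAPP/LUN entry (abridged from multipath.conf.defaults above)
builtin = {
    "vendor": "NETAPP",
    "product": "LUN.*",
    "getuid_callout": "/lib/udev/scsi_id --whitelisted --device=/dev/%n",
    "hardware_handler": "0",
    "prio": "ontap",
}
# User's device section from multipath.conf
custom = {
    "vendor": "NETAPP",
    "product": "LUN.*",
    "getuid_callout": "/lib/udev/scsi_id -g -u -d /dev/%n",
    "hardware_handler": "1 alua",
    "prio": "alua",
}
# One merged entry, as "show config" prints on the CentOS 6.x box:
# custom keys win, everything else is inherited from the builtin.
merged = {**builtin, **custom}
print(merged["hardware_handler"])  # 1 alua
```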
    }
}
So I put exactly the default device config for my IBM/1814 device but no_path_retry set to 12.
Why 12?
This will do 12 retries, 5 seconds each when no path is available. This will block lvm commands for 60 seconds when no path is available, blocking other stuff in vdsm. Vdsm is not designed to handle this.
I recommend value of 4.
OK.
But note that this is not related to the fact that your devices are not initialized properly after boot.
In fact it could also be an overall DS4700 problem.... The LUNs are configured with the LNX CLUSTER host type, which should be OK in theory, even if this kind of storage was never well supported with Linux. Initially one had to use proprietary IBM kernel modules/drivers. I will assess consistency and robustness through testing. I have to do a POC, and this is the hw I have, so I should at least try to get a working solution on it.
In CentOS 6.x, when you do something like this, "show config" gives you the modified entry only for your device section. Instead, in CentOS 7.3 it seems I get the default one for IBM/1814 anyway, and also the customized one at the end of the output....
Maybe your device configuration does not match exactly the builtin config.
I think it is the different behaviour as outlined above. I think you can confirm in another system where some customization has been done too...
Maybe waiting a moment helps the storage/switches to clean up properly after a server is shut down?
I think so too. When possible, if the errors repeat with the new config, I'll arrange to do stop/start instead of restart.
Does your power management trigger a proper shutdown? I would avoid using it for normal shutdown.
I have not understood what you mean exactly here... Can you elaborate?

Suppose I have to power off one hypervisor (yum update, patching, fw update or planned server room maintenance, ...); my workflow is this one, all from inside the web admin GUI:

Move running VMs in charge of the host (or delegate that to the following step)
Put host into Maintenance
Power Mgmt --> Stop

When the planned maintenance has finished:

Power Mgmt --> Start
I should see the host in maintenance
Activate

Or do you mean I should do everything from the host itself and not the GUI?
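For the record, the same workflow can be driven without the GUI; a hypothetical dry-run sketch follows, where the endpoint paths reflect my understanding of the oVirt 4.x REST API and ENGINE / HOST_ID are placeholder values, not data from this thread:

```python
# Dry-run sketch of the maintenance workflow as oVirt REST API actions.
# Each step is a POST of an <action> XML body to a host sub-resource;
# here we only build the requests instead of sending them.
ENGINE = "https://engine.example.com/ovirt-engine/api"  # placeholder
HOST_ID = "123"                                         # placeholder

def action(verb: str, body: str = "<action/>") -> tuple[str, str, str]:
    """Return (method, url, xml_body) for one workflow step."""
    return ("POST", f"{ENGINE}/hosts/{HOST_ID}/{verb}", body)

workflow = [
    action("deactivate"),                                               # Maintenance
    action("fence", "<action><fence_type>stop</fence_type></action>"),  # Power Mgmt --> Stop
    # ... planned maintenance happens here ...
    action("fence", "<action><fence_type>start</fence_type></action>"), # Power Mgmt --> Start
    action("activate"),                                                 # Activate
]
for method, url, body in workflow:
    print(method, url, body)
```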
multipath 0 1 rdac vs multipath 1 queue_if_no_path 1 rdac
This is not expected, multipath is using unlimited queueing, which is the worst setup for ovirt.
Maybe this is the result of using "12" instead of 12?
Anyway, looking in multipath source, this is the default configuration for your device:
	/* DS3950 / DS4200 / DS4700 / DS5020 */
	.vendor        = "IBM",
	.product       = "^1814",
	.bl_product    = "Universal Xport",
	.pgpolicy      = GROUP_BY_PRIO,
	.checker_name  = RDAC,
	.features      = "2 pg_init_retries 50",
	.hwhandler     = "1 rdac",
	.prio_name     = PRIO_RDAC,
	.pgfailback    = -FAILBACK_IMMEDIATE,
	.no_path_retry = 30,
	},
and this is the commit that updated this (and other rdac devices):
http://git.opensvc.com/gitweb.cgi?p=multipath-tools/.git;a=commit;h=c1ed393b91acace284901f16954ba5c1c0d943c9
So I would try this configuration:
device {
    vendor "IBM"
    product "^1814"

    # defaults from multipathd show config
    product_blacklist "Universal Xport"
    path_grouping_policy "group_by_prio"
    path_checker "rdac"
    hardware_handler "1 rdac"
    prio "rdac"
    failback immediate
    rr_weight "uniform"

    # Based on multipath commit c1ed393b91acace284901f16954ba5c1c0d943c9
    features "2 pg_init_retries 50"

    # Default is 30 seconds, ovirt recommended value is 4 to avoid
    # blocking in vdsm. This gives 20 seconds (4 * polling_interval)
    # grace time when no path is available.
    no_path_retry 4
}
Ben, do you have any other ideas on debugging this issue and improving multipath configuration?
Nir
OK. In the meantime I have applied your suggested config and restarted the 2 nodes. Let me test and see if I find any problems, running also some I/O tests. Thanks in the meantime, Gianluca

On Wed, Feb 1, 2017 at 8:22 PM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
OK. In the meantime I have applied your suggested config and restarted the 2 nodes. Let me test and see if I find any problems, running also some I/O tests. Thanks in the meantime, Gianluca
Quick test, without much success.

Inside the guest I run this loop:

while true
do
  time dd if=/dev/zero bs=1024k count=1024 of=/home/g.cecchi/testfile
  sleep 5
done

BTW: my home is inside the / filesystem on the guest, which has space to accommodate the 1 GB written by the dd command:

[g.cecchi@ol65 ~]$ df -h /home/g.cecchi/
Filesystem                   Size  Used Avail Use% Mounted on
/dev/mapper/vg_ol65-lv_root   20G  4.9G   14G  27% /
[g.cecchi@ol65 ~]$

After about 7 rounds I get this in messages of the host where the VM is running:

Feb  1 23:31:39 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb  1 23:31:39 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb  1 23:31:44 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb  1 23:31:45 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb  1 23:31:45 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb  1 23:31:47 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb  1 23:31:50 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb  1 23:31:50 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb  1 23:31:56 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb  1 23:31:57 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!

Nothing on the other host.

In the web admin events pane:

Feb 1, 2017 11:31:44 PM VM ol65 has been paused due to no Storage space error.
Feb 1, 2017 11:31:44 PM VM ol65 has been paused.
I stop the dd loop and after some seconds:

Feb 1, 2017 11:32:32 PM VM ol65 has recovered from paused back to up

Multipath status for my device:

3600a0b8000299aa80000d08b55014119 dm-2 IBM     ,1814      FAStT
size=4.0T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| |- 0:0:1:3 sdj 8:144 active undef running
| `- 2:0:1:3 sdp 8:240 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
  |- 0:0:0:3 sdd 8:48  active undef running
  `- 2:0:0:3 sdk 8:160 active undef running

In engine.log:

2017-02-01 23:22:01,449 INFO  [org.ovirt.engine.core.bll.aaa.CreateUserSessionCommand] (default task-15) [530aee87] Running command: CreateUserSessionCommand internal: false.
2017-02-01 23:22:04,011 INFO  [org.ovirt.engine.docs.utils.servlet.ContextSensitiveHelpMappingServlet] (default task-12) [] Context-sensitive help is not installed. Manual directory doesn't exist: /usr/share/ovirt-engine/manual
2017-02-01 23:31:43,936 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-10) [10c15e39] VM '932db7c7-4121-4cbe-ad8d-09e4e99b3cdd'(ol65) moved from 'Up' --> 'Paused'
2017-02-01 23:31:44,087 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-10) [10c15e39] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM ol65 has been paused.
2017-02-01 23:31:44,227 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-10) [10c15e39] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM ol65 has been paused due to no Storage space error.
2017-02-01 23:32:32,100 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler5) [1c02a371] VM '932db7c7-4121-4cbe-ad8d-09e4e99b3cdd'(ol65) moved from 'Paused' --> 'Up'
2017-02-01 23:32:32,259 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler5) [1c02a371] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM ol65 has recovered from paused back to up.

In vdsm.log of the host where the VM is running I see this:

mailbox.SPMMonitor::DEBUG::2017-02-01 23:31:37,729::storage_mailbox::733::Storage.Misc.excCmd::(_checkForMail) /usr/bin/taskset --cpu-list 0-7 dd if=/rhev/data-center/588237b8-0031-02f6-035d-000000000136/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000 (cwd None)
mailbox.SPMMonitor::DEBUG::2017-02-01 23:31:37,764::storage_mailbox::733::Storage.Misc.excCmd::(_checkForMail) SUCCESS: <err> = '1+0 records in\n1+0 records out\n1024000 bytes (1.0 MB) copied, 0.0061866 s, 166 MB/s\n'; <rc> = 0
periodic/8::INFO::2017-02-01 23:31:39,699::vm::1028::virt.vm::(extendDrivesIfNeeded) vmId=`932db7c7-4121-4cbe-ad8d-09e4e99b3cdd`::Requesting extension for volume 66615920-619f-4f52-ad5a-3f062c054094 on domain 922b5269-ab56-4c4d-838f-49d33427e2ab (apparent: 12884901888, capacity: 96636764160, allocated: 12355698688, physical: 12884901888)
periodic/8::DEBUG::2017-02-01 23:31:39,699::vm::1093::virt.vm::(__extendDriveVolume) vmId=`932db7c7-4121-4cbe-ad8d-09e4e99b3cdd`::Requesting an extension for the volume: {'newSize': 13958643712, 'domainID': u'922b5269-ab56-4c4d-838f-49d33427e2ab', 'name': u'vda', 'volumeID': u'66615920-619f-4f52-ad5a-3f062c054094', 'imageID': u'32b0e94d-0c8e-4d3d-a4eb-448885297e93', 'internal': False, 'poolID': u'588237b8-0031-02f6-035d-000000000136'}
periodic/8::DEBUG::2017-02-01 23:31:39,700::task::597::Storage.TaskManager.Task::(_updateState) Task=`d5169f55-8fa6-494e-bf3c-b077b922e06c`::moving from state init -> state preparing
periodic/8::INFO::2017-02-01 23:31:39,700::logUtils::49::dispatcher::(wrapper) Run and protect: sendExtendMsg(spUUID=u'588237b8-0031-02f6-035d-000000000136', volDict={'newSize': 13958643712, 'domainID': u'922b5269-ab56-4c4d-838f-49d33427e2ab', 'name': u'vda', 'volumeID': u'66615920-619f-4f52-ad5a-3f062c054094', 'imageID': u'32b0e94d-0c8e-4d3d-a4eb-448885297e93', 'internal': False, 'poolID': u'588237b8-0031-02f6-035d-000000000136'}, newSize=13958643712, callbackFunc=<bound method Vm.__afterVolumeExtension of <virt.vm.Vm object at 0x2feb5d0>>)
periodic/8::DEBUG::2017-02-01 23:31:39,701::storage_mailbox::125::Storage.SPM.Messages.Extend::(__init__) new extend msg created: domain: 922b5269-ab56-4c4d-838f-49d33427e2ab, volume: 66615920-619f-4f52-ad5a-3f062c054094
periodic/8::INFO::2017-02-01 23:31:39,701::logUtils::52::dispatcher::(wrapper) Run and protect: sendExtendMsg, Return response: None
periodic/8::DEBUG::2017-02-01 23:31:39,701::task::1193::Storage.TaskManager.Task::(prepare) Task=`d5169f55-8fa6-494e-bf3c-b077b922e06c`::finished: None
periodic/8::DEBUG::2017-02-01 23:31:39,701::task::597::Storage.TaskManager.Task::(_updateState) Task=`d5169f55-8fa6-494e-bf3c-b077b922e06c`::moving from state preparing -> state finished
periodic/8::DEBUG::2017-02-01 23:31:39,701::resourceManager::952::Storage.ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
periodic/8::DEBUG::2017-02-01 23:31:39,701::resourceManager::989::Storage.ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
periodic/8::DEBUG::2017-02-01 23:31:39,701::task::995::Storage.TaskManager.Task::(_decref) Task=`d5169f55-8fa6-494e-bf3c-b077b922e06c`::ref 0 aborting False
mailbox.HSMMonitor::DEBUG::2017-02-01 23:31:39,702::storage_mailbox::428::Storage.MailBox.HsmMailMonitor::(_handleMessage) HSM_MailMonitor - start: 64, end: 128, len: 4096, message(1/63): "1xtnd\xab\xe2'4\xd3I\x8f\x83MLV\xabiR+\x92\x94@\x05,\x06?Z\xadRO\x9fa Yaf000000000000340000000000000"
mailbox.HSMMonitor::DEBUG::2017-02-01 23:31:39,702::storage_mailbox::375::Storage.Misc.excCmd::(_checkForMail) /usr/bin/taskset --cpu-list 0-7 /usr/bin/dd if=/rhev/data-center/588237b8-0031-02f6-035d-000000000136/mastersd/dom_md/outbox iflag=direct,fullblock bs=512 count=8 skip=16 (cwd None)
mailbox.SPMMonitor::DEBUG::2017-02-01 23:31:39,790::storage_mailbox::733::Storage.Misc.excCmd::(_checkForMail) /usr/bin/taskset --cpu-list 0-7 dd if=/rhev/data-center/588237b8-0031-02f6-035d-000000000136/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000 (cwd None)
mailbox.HSMMonitor::DEBUG::2017-02-01 23:31:39,815::storage_mailbox::375::Storage.Misc.excCmd::(_checkForMail) SUCCESS: <err> = '8+0 records in\n8+0 records out\n4096 bytes (4.1 kB) copied, 0.0900633 s, 45.5 kB/s\n'; <rc> = 0
mailbox.HSMMonitor::INFO::2017-02-01 23:31:39,816::storage_mailbox::387::Storage.MailBox.HsmMailMonitor::(_sendMail) HSM_MailMonitor sending mail to SPM - ['/usr/bin/dd', 'of=/rhev/data-center/588237b8-0031-02f6-035d-000000000136/mastersd/dom_md/inbox', 'iflag=fullblock', 'oflag=direct', 'conv=notrunc', 'bs=512', 'seek=16']
mailbox.HSMMonitor::DEBUG::2017-02-01 23:31:39,816::storage_mailbox::86::Storage.Misc.excCmd::(_mboxExecCmd) /usr/bin/taskset --cpu-list 0-7 /usr/bin/dd of=/rhev/data-center/588237b8-0031-02f6-035d-000000000136/mastersd/dom_md/inbox iflag=fullblock oflag=direct conv=notrunc bs=512 seek=16 (cwd None)
mailbox.SPMMonitor::DEBUG::2017-02-01 23:31:39,844::storage_mailbox::733::Storage.Misc.excCmd::(_checkForMail) SUCCESS: <err> = '1+0 records in\n1+0 records out\n1024000 bytes (1.0 MB) copied, 0.0290379 s, 35.3 MB/s\n'; <rc> = 0
periodic/7::INFO::2017-02-01 23:31:41,700::vm::1028::virt.vm::(extendDrivesIfNeeded) vmId=`932db7c7-4121-4cbe-ad8d-09e4e99b3cdd`::Requesting extension for volume 66615920-619f-4f52-ad5a-3f062c054094 on domain 922b5269-ab56-4c4d-838f-49d33427e2ab (apparent: 12884901888, capacity: 96636764160, allocated: 12617187328, physical: 12884901888)
...
mailbox.SPMMonitor::DEBUG::2017-02-01 23:31:42,181::storage_mailbox::733::Storage.Misc.excCmd::(_checkForMail) SUCCESS: <err> = '1+0 records in\n1+0 records out\n1024000 bytes (1.0 MB) copied, 0.30031 s, 3.4 MB/s\n'; <rc> = 0
mailbox.SPMMonitor::ERROR::2017-02-01 23:31:42,182::storage_mailbox::619::Storage.MailBox.SpmMailMonitor::(_validateMailbox) SPM_MailMonitor: mailbox 2 checksum failed, not clearing mailbox, clearing newMail.
jsonrpc.Executor/6::DEBUG::2017-02-01 23:31:42,587::__init__::530::jsonrpc.JsonRpcServer::(_handle_request) Calling 'Host.getAllVmStats' in bridge with {}
jsonrpc.Executor/6::DEBUG::2017-02-01 23:31:42,589::__init__::555::jsonrpc.JsonRpcServer::(_handle_request) Return 'Host.getAllVmStats' in bridge with (suppressed)
jsonrpc.Executor/6::INFO::2017-02-01 23:31:42,590::__init__::513::jsonrpc.JsonRpcServer::(_serveRequest) RPC call Host.getAllVmStats succeeded in 0.00 seconds
libvirtEventLoop::INFO::2017-02-01 23:31:43,586::vm::4041::virt.vm::(onIOError) vmId=`932db7c7-4121-4cbe-ad8d-09e4e99b3cdd`::abnormal vm stop device virtio-disk0 error enospc
libvirtEventLoop::INFO::2017-02-01 23:31:43,586::vm::4877::virt.vm::(_logGuestCpuStatus) vmId=`932db7c7-4121-4cbe-ad8d-09e4e99b3cdd`::CPU stopped: onIOError
periodic/6::WARNING::2017-02-01 23:31:43,698::periodic::277::virt.periodic.VmDispatcher::(__call__) could not run <class 'vdsm.virt.periodic.DriveWatermarkMonitor'> on [u'932db7c7-4121-4cbe-ad8d-09e4e99b3cdd']
mailbox.HSMMonitor::DEBUG::2017-02-01 23:31:43,790::storage_mailbox::86::Storage.Misc.excCmd::(_mboxExecCmd) SUCCESS: <err> = '8+0 records in\n8+0 records out\n4096 bytes (4.1 kB) copied, 3.95001 s, 1.0 kB/s\n'; <rc> = 0

The disk of the VM is 90 GB in virtual size, thin provisioned.

Output of the commands inside the guest until Ctrl+C (I interrupted the loop as I received the VM paused error...):

[g.cecchi@ol65 ~]$ while true; do time dd if=/dev/zero bs=1024k count=1024 of=/home/g.cecchi/testfile; sleep 5; done
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 9.27353 s, 116 MB/s

real	0m9.353s
user	0m0.007s
sys	0m3.174s
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 9.68965 s, 111 MB/s

real	0m9.929s
user	0m0.000s
sys	0m2.972s
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 9.16642 s, 117 MB/s

real	0m9.412s
user	0m0.008s
sys	0m2.766s
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 2.68209 s, 400 MB/s

real	0m2.931s
user	0m0.004s
sys	0m2.927s
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 6.11601 s, 176 MB/s

real	0m9.158s
user	0m0.002s
sys	0m2.922s
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 2.67673 s, 401 MB/s

real	0m2.921s
user	0m0.004s
sys	0m2.916s
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 4.10537 s, 262 MB/s

real	0m6.885s
user	0m0.009s
sys	0m2.984s
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 5.56343 s, 193 MB/s

real	0m7.866s
user	0m0.004s
sys	0m3.016s
^C

For the VM disk I get this info:

[root@ovmsrv06 vdsm]# qemu-img info /rhev/data-center/mnt/blockSD/922b5269-ab56-4c4d-838f-49d33427e2ab/images/32b0e94d-0c8e-4d3d-a4eb-448885297e93/66615920-619f-4f52-ad5a-3f062c054094
image: /rhev/data-center/mnt/blockSD/922b5269-ab56-4c4d-838f-49d33427e2ab/images/32b0e94d-0c8e-4d3d-a4eb-448885297e93/66615920-619f-4f52-ad5a-3f062c054094
file format: qcow2
virtual size: 90G (96636764160 bytes)
disk size: 0
cluster_size: 65536
Format specific information:
    compat: 0.10
    refcount bits: 16
[root@ovmsrv06 vdsm]#

Apparently disk size 0?????
In cat /rhev/data-center/mnt/blockSD/922b5269-ab56-4c4d-838f-49d33427e2ab/dom_md/metadata I get EOF DOMAIN=922b5269-ab56-4c4d-838f-49d33427e2ab CTIME=1485272737 FORMAT=COW DISKTYPE=2 LEGALITY=LEGAL SIZE=188743680 VOLTYPE=LEAF DESCRIPTION={"DiskAlias":"ol65_Disk1","DiskDescription":""} IMAGE=32b0e94d-0c8e-4d3d-a4eb-448885297e93 PUUID=00000000-0000-0000-0000-000000000000 MTIME=0 POOL_UUID= TYPE=SPARSE EOF

On Thu, Feb 2, 2017 at 1:11 AM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Wed, Feb 1, 2017 at 8:22 PM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
OK. In the meantime I have applied your suggested config and restarted the 2 nodes. Let me test and see if I find any problems running some I/O tests as well. Thanks in the meantime, Gianluca
Quick test without much success
Inside the guest I run this loop:

while true
do
  time dd if=/dev/zero bs=1024k count=1024 of=/home/g.cecchi/testfile
  sleep 5
done
I don't think this test is related to the issues you reported earlier. What you test here is how fast oVirt thin provisioning can extend your disk when writing zeros. We don't handle that very well: each extend needs about 4-6 seconds to complete before we refresh the LV on the host running the VM, and this is the *best* case. In the worst case it can take much longer. Also, what you tested here is how fast you can write to your VM buffer cache, since you are not using direct I/O. A better way to perform this test is:

time dd if=/dev/zero bs=1024k count=1024 of=/home/g.cecchi/testfile oflag=direct

This will give you the time to actually write data to storage. If you have a real issue with VMs pausing during writes when the VM disk has to be extended, you can enlarge the extend chunk, 1GiB by default. To use chunks of 2GiB, set:

[irs]
volume_utilization_percent = 50
volume_utilization_chunk_mb = 2048

This will extend the drive when free space is less than 1024MiB (volume_utilization_chunk_mb * (100 - volume_utilization_percent) / 100). If this is not enough, you can also use a lower volume_utilization_percent; for example, this will extend the disk in 2GiB chunks when free space is below 1536MiB:

[irs]
volume_utilization_percent = 25
volume_utilization_chunk_mb = 2048
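To make the chunk arithmetic above concrete, here is a minimal sketch of the threshold formula (the helper name threshold_mb is mine, not a vdsm function):

```shell
# Free-space threshold (in MiB) that triggers an extend, per the formula:
# volume_utilization_chunk_mb * (100 - volume_utilization_percent) / 100
threshold_mb() {
    local pct=$1 chunk=$2
    echo $(( chunk * (100 - pct) / 100 ))
}
threshold_mb 50 1024   # defaults: extend when free space drops under 512 MiB
threshold_mb 50 2048   # 2GiB chunks: threshold 1024 MiB
threshold_mb 25 2048   # lower percent: threshold 1536 MiB
```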
BTW: my home is inside / filesystem on guest that has space to accomodate 1Gb of the dd command: [g.cecchi@ol65 ~]$ df -h /home/g.cecchi/ Filesystem Size Used Avail Use% Mounted on /dev/mapper/vg_ol65-lv_root 20G 4.9G 14G 27% / [g.cecchi@ol65 ~]$
After about 7 rounds I get this in messages of the host where the VM is running:
Feb 1 23:31:39 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 1 23:31:39 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 1 23:31:44 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 1 23:31:45 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 1 23:31:45 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 1 23:31:47 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 1 23:31:50 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 1 23:31:50 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 1 23:31:56 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 1 23:31:57 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
This is interesting. We have seen these messages before, but could never detect the flow causing them. Are you sure you see this each time you extend your disk? If you can reproduce this, please file a bug.
Nothing on the other host. In the web admin events pane:

Feb 1, 2017 11:31:44 PM VM ol65 has been paused due to no Storage space error.
Feb 1, 2017 11:31:44 PM VM ol65 has been paused.
I stop the dd loop and after some seconds: Feb 1, 2017 11:32:32 PM VM ol65 has recovered from paused back to up
Multipath status for my device:
3600a0b8000299aa80000d08b55014119 dm-2 IBM ,1814      FAStT
size=4.0T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| |- 0:0:1:3 sdj 8:144 active undef running
| `- 2:0:1:3 sdp 8:240 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
  |- 0:0:0:3 sdd 8:48  active undef running
  `- 2:0:0:3 sdk 8:160 active undef running
In engine.log
2017-02-01 23:22:01,449 INFO [org.ovirt.engine.core.bll.aaa.CreateUserSessionCommand] (default task-15) [530aee87] Running command: CreateUserSessionCommand internal: false.
2017-02-01 23:22:04,011 INFO [org.ovirt.engine.docs.utils.servlet.ContextSensitiveHelpMappingServlet] (default task-12) [] Context-sensitive help is not installed. Manual directory doesn't exist: /usr/share/ovirt-engine/manual
2017-02-01 23:31:43,936 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-10) [10c15e39] VM '932db7c7-4121-4cbe-ad8d-09e4e99b3cdd'(ol65) moved from 'Up' --> 'Paused'
2017-02-01 23:31:44,087 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-10) [10c15e39] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM ol65 has been paused.
2017-02-01 23:31:44,227 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-10) [10c15e39] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM ol65 has been paused due to no Storage space error.
2017-02-01 23:32:32,100 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler5) [1c02a371] VM '932db7c7-4121-4cbe-ad8d-09e4e99b3cdd'(ol65) moved from 'Paused' --> 'Up'
2017-02-01 23:32:32,259 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler5) [1c02a371] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM ol65 has recovered from paused back to up.
In vdsm.log of the host where the VM is running I see this:
mailbox.SPMMonitor::DEBUG::2017-02-01 23:31:37,729::storage_mailbox::733::Storage.Misc.excCmd::(_checkForMail) /usr/bin/taskset --cpu-list 0-7 dd if=/rhev/data-center/588237b8-0031-02f6-035d-000000000136/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000 (cwd None)
mailbox.SPMMonitor::DEBUG::2017-02-01 23:31:37,764::storage_mailbox::733::Storage.Misc.excCmd::(_checkForMail) SUCCESS: <err> = '1+0 records in\n1+0 records out\n1024000 bytes (1.0 MB) copied, 0.0061866 s, 166 MB/s\n'; <rc> = 0
periodic/8::INFO::2017-02-01 23:31:39,699::vm::1028::virt.vm::(extendDrivesIfNeeded) vmId=`932db7c7-4121-4cbe-ad8d-09e4e99b3cdd`::Requesting extension for volume 66615920-619f-4f52-ad5a-3f062c054094 on domain 922b5269-ab56-4c4d-838f-49d33427e2ab (apparent: 12884901888, capacity: 96636764160, allocated: 12355698688, physical: 12884901888)
periodic/8::DEBUG::2017-02-01 23:31:39,699::vm::1093::virt.vm::(__extendDriveVolume) vmId=`932db7c7-4121-4cbe-ad8d-09e4e99b3cdd`::Requesting an extension for the volume: {'newSize': 13958643712, 'domainID': u'922b5269-ab56-4c4d-838f-49d33427e2ab', 'name': u'vda', 'volumeID': u'66615920-619f-4f52-ad5a-3f062c054094', 'imageID': u'32b0e94d-0c8e-4d3d-a4eb-448885297e93', 'internal': False, 'poolID': u'588237b8-0031-02f6-035d-000000000136'}
periodic/8::DEBUG::2017-02-01 23:31:39,700::task::597::Storage.TaskManager.Task::(_updateState) Task=`d5169f55-8fa6-494e-bf3c-b077b922e06c`::moving from state init -> state preparing
periodic/8::INFO::2017-02-01 23:31:39,700::logUtils::49::dispatcher::(wrapper) Run and protect: sendExtendMsg(spUUID=u'588237b8-0031-02f6-035d-000000000136', volDict={'newSize': 13958643712, 'domainID': u'922b5269-ab56-4c4d-838f-49d33427e2ab', 'name': u'vda', 'volumeID': u'66615920-619f-4f52-ad5a-3f062c054094', 'imageID': u'32b0e94d-0c8e-4d3d-a4eb-448885297e93', 'internal': False, 'poolID': u'588237b8-0031-02f6-035d-000000000136'}, newSize=13958643712, callbackFunc=<bound method Vm.__afterVolumeExtension of <virt.vm.Vm object at 0x2feb5d0>>)
periodic/8::DEBUG::2017-02-01 23:31:39,701::storage_mailbox::125::Storage.SPM.Messages.Extend::(__init__) new extend msg created: domain: 922b5269-ab56-4c4d-838f-49d33427e2ab, volume: 66615920-619f-4f52-ad5a-3f062c054094
periodic/8::INFO::2017-02-01 23:31:39,701::logUtils::52::dispatcher::(wrapper) Run and protect: sendExtendMsg, Return response: None
periodic/8::DEBUG::2017-02-01 23:31:39,701::task::1193::Storage.TaskManager.Task::(prepare) Task=`d5169f55-8fa6-494e-bf3c-b077b922e06c`::finished: None
periodic/8::DEBUG::2017-02-01 23:31:39,701::task::597::Storage.TaskManager.Task::(_updateState) Task=`d5169f55-8fa6-494e-bf3c-b077b922e06c`::moving from state preparing -> state finished
periodic/8::DEBUG::2017-02-01 23:31:39,701::resourceManager::952::Storage.ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
periodic/8::DEBUG::2017-02-01 23:31:39,701::resourceManager::989::Storage.ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
periodic/8::DEBUG::2017-02-01 23:31:39,701::task::995::Storage.TaskManager.Task::(_decref) Task=`d5169f55-8fa6-494e-bf3c-b077b922e06c`::ref 0 aborting False
mailbox.HSMMonitor::DEBUG::2017-02-01 23:31:39,702::storage_mailbox::428::Storage.MailBox.HsmMailMonitor::(_handleMessage) HSM_MailMonitor - start: 64, end: 128, len: 4096, message(1/63): "1xtnd\xab\xe2'4\xd3I\x8f\x83MLV\xabiR+\x92\x94@\x05,\x06?Z\xadRO\x9fa Yaf000000000000340000000000000"
mailbox.HSMMonitor::DEBUG::2017-02-01 23:31:39,702::storage_mailbox::375::Storage.Misc.excCmd::(_checkForMail) /usr/bin/taskset --cpu-list 0-7 /usr/bin/dd if=/rhev/data-center/588237b8-0031-02f6-035d-000000000136/mastersd/dom_md/outbox iflag=direct,fullblock bs=512 count=8 skip=16 (cwd None)
mailbox.SPMMonitor::DEBUG::2017-02-01 23:31:39,790::storage_mailbox::733::Storage.Misc.excCmd::(_checkForMail) /usr/bin/taskset --cpu-list 0-7 dd if=/rhev/data-center/588237b8-0031-02f6-035d-000000000136/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000 (cwd None)
mailbox.HSMMonitor::DEBUG::2017-02-01 23:31:39,815::storage_mailbox::375::Storage.Misc.excCmd::(_checkForMail) SUCCESS: <err> = '8+0 records in\n8+0 records out\n4096 bytes (4.1 kB) copied, 0.0900633 s, 45.5 kB/s\n'; <rc> = 0
mailbox.HSMMonitor::INFO::2017-02-01 23:31:39,816::storage_mailbox::387::Storage.MailBox.HsmMailMonitor::(_sendMail) HSM_MailMonitor sending mail to SPM - ['/usr/bin/dd', 'of=/rhev/data-center/588237b8-0031-02f6-035d-000000000136/mastersd/dom_md/inbox', 'iflag=fullblock', 'oflag=direct', 'conv=notrunc', 'bs=512', 'seek=16']
mailbox.HSMMonitor::DEBUG::2017-02-01 23:31:39,816::storage_mailbox::86::Storage.Misc.excCmd::(_mboxExecCmd) /usr/bin/taskset --cpu-list 0-7 /usr/bin/dd of=/rhev/data-center/588237b8-0031-02f6-035d-000000000136/mastersd/dom_md/inbox iflag=fullblock oflag=direct conv=notrunc bs=512 seek=16 (cwd None)
mailbox.SPMMonitor::DEBUG::2017-02-01 23:31:39,844::storage_mailbox::733::Storage.Misc.excCmd::(_checkForMail) SUCCESS: <err> = '1+0 records in\n1+0 records out\n1024000 bytes (1.0 MB) copied, 0.0290379 s, 35.3 MB/s\n'; <rc> = 0
periodic/7::INFO::2017-02-01 23:31:41,700::vm::1028::virt.vm::(extendDrivesIfNeeded) vmId=`932db7c7-4121-4cbe-ad8d-09e4e99b3cdd`::Requesting extension for volume 66615920-619f-4f52-ad5a-3f062c054094 on domain 922b5269-ab56-4c4d-838f-49d33427e2ab (apparent: 12884901888, capacity: 96636764160, allocated: 12617187328, physical: 12884901888)
...
mailbox.SPMMonitor::DEBUG::2017-02-01 23:31:42,181::storage_mailbox::733::Storage.Misc.excCmd::(_checkForMail) SUCCESS: <err> = '1+0 records in\n1+0 records out\n1024000 bytes (1.0 MB) copied, 0.30031 s, 3.4 MB/s\n'; <rc> = 0
mailbox.SPMMonitor::ERROR::2017-02-01 23:31:42,182::storage_mailbox::619::Storage.MailBox.SpmMailMonitor::(_validateMailbox) SPM_MailMonitor: mailbox 2 checksum failed, not clearing mailbox, clearing newMail.
jsonrpc.Executor/6::DEBUG::2017-02-01 23:31:42,587::__init__::530::jsonrpc.JsonRpcServer::(_handle_request) Calling 'Host.getAllVmStats' in bridge with {}
jsonrpc.Executor/6::DEBUG::2017-02-01 23:31:42,589::__init__::555::jsonrpc.JsonRpcServer::(_handle_request) Return 'Host.getAllVmStats' in bridge with (suppressed)
jsonrpc.Executor/6::INFO::2017-02-01 23:31:42,590::__init__::513::jsonrpc.JsonRpcServer::(_serveRequest) RPC call Host.getAllVmStats succeeded in 0.00 seconds
libvirtEventLoop::INFO::2017-02-01 23:31:43,586::vm::4041::virt.vm::(onIOError) vmId=`932db7c7-4121-4cbe-ad8d-09e4e99b3cdd`::abnormal vm stop device virtio-disk0 error enospc
libvirtEventLoop::INFO::2017-02-01 23:31:43,586::vm::4877::virt.vm::(_logGuestCpuStatus) vmId=`932db7c7-4121-4cbe-ad8d-09e4e99b3cdd`::CPU stopped: onIOError
periodic/6::WARNING::2017-02-01 23:31:43,698::periodic::277::virt.periodic.VmDispatcher::(__call__) could not run <class 'vdsm.virt.periodic.DriveWatermarkMonitor'> on [u'932db7c7-4121-4cbe-ad8d-09e4e99b3cdd']
mailbox.HSMMonitor::DEBUG::2017-02-01 23:31:43,790::storage_mailbox::86::Storage.Misc.excCmd::(_mboxExecCmd) SUCCESS: <err> = '8+0 records in\n8+0 records out\n4096 bytes (4.1 kB) copied, 3.95001 s, 1.0 kB/s\n'; <rc> = 0
The disk of the VM is 90Gb in virtual size, thin provisioned
Output of the commands inside the guest until Ctrl+C (I interrupted the loop as I received the VM paused error...):
[g.cecchi@ol65 ~]$ while true; do time dd if=/dev/zero bs=1024k count=1024 of=/home/g.cecchi/testfile; sleep 5; done
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 9.27353 s, 116 MB/s

real 0m9.353s
user 0m0.007s
sys 0m3.174s

1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 9.68965 s, 111 MB/s

real 0m9.929s
user 0m0.000s
sys 0m2.972s

1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 9.16642 s, 117 MB/s

real 0m9.412s
user 0m0.008s
sys 0m2.766s

1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 2.68209 s, 400 MB/s

real 0m2.931s
user 0m0.004s
sys 0m2.927s

1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 6.11601 s, 176 MB/s

real 0m9.158s
user 0m0.002s
sys 0m2.922s

1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 2.67673 s, 401 MB/s

real 0m2.921s
user 0m0.004s
sys 0m2.916s

1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 4.10537 s, 262 MB/s

real 0m6.885s
user 0m0.009s
sys 0m2.984s

1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 5.56343 s, 193 MB/s

real 0m7.866s
user 0m0.004s
sys 0m3.016s
^C
For the VM disk I get this info:
[root@ovmsrv06 vdsm]# qemu-img info /rhev/data-center/mnt/blockSD/922b5269-ab56-4c4d-838f-49d33427e2ab/images/32b0e94d-0c8e-4d3d-a4eb-448885297e93/66615920-619f-4f52-ad5a-3f062c054094
image: /rhev/data-center/mnt/blockSD/922b5269-ab56-4c4d-838f-49d33427e2ab/images/32b0e94d-0c8e-4d3d-a4eb-448885297e93/66615920-619f-4f52-ad5a-3f062c054094
file format: qcow2
virtual size: 90G (96636764160 bytes)
disk size: 0
cluster_size: 65536
Format specific information:
    compat: 0.10
    refcount bits: 16
[root@ovmsrv06 vdsm]#
Apparently disk size 0?????
Qemu probably cannot report the disk size on block devices.
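For illustration: "disk size" in qemu-img info is the allocated host size (roughly the allocated blocks times 512), not the apparent size. A sparse file shows the same effect; a block device similarly exposes no allocation information, which is one plausible reason for the "disk size: 0" above. A small sketch (temp file path is arbitrary):

```shell
# A sparse file has a large apparent size but (nearly) zero allocated blocks,
# mirroring the "disk size: 0" reported by qemu-img above.
f="$(mktemp)"
truncate -s 1G "$f"    # 1 GiB apparent size, nothing actually written
echo "apparent:  $(stat -c %s "$f") bytes"
echo "allocated: $(( $(stat -c %b "$f") * 512 )) bytes"
rm -f "$f"
```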
In cat /rhev/data-center/mnt/blockSD/922b5269-ab56-4c4d-838f-49d33427e2ab/dom_md/metadata
I get

EOF
DOMAIN=922b5269-ab56-4c4d-838f-49d33427e2ab
CTIME=1485272737
FORMAT=COW
DISKTYPE=2
LEGALITY=LEGAL
SIZE=188743680
VOLTYPE=LEAF
DESCRIPTION={"DiskAlias":"ol65_Disk1","DiskDescription":""}
IMAGE=32b0e94d-0c8e-4d3d-a4eb-448885297e93
PUUID=00000000-0000-0000-0000-000000000000
MTIME=0
POOL_UUID=
TYPE=SPARSE
EOF

On Thu, Feb 2, 2017 at 10:48 AM, Nir Soffer <nsoffer@redhat.com> wrote:
On Thu, Feb 2, 2017 at 1:11 AM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Wed, Feb 1, 2017 at 8:22 PM, Gianluca Cecchi < gianluca.cecchi@gmail.com> wrote:
OK. In the meantime I have applied your suggested config and restarted the 2 nodes. Let me test and see if I find any problems running some I/O tests as well. Thanks in the meantime, Gianluca
Quick test without much success
Inside the guest I run this loop:

while true
do
  time dd if=/dev/zero bs=1024k count=1024 of=/home/g.cecchi/testfile
  sleep 5
done
I don't think this test is related to the issues you reported earlier.
I thought the same too, and all the related comments you wrote. I'm going to test the suggested modifications for chunks. In general, do you recommend thin provisioning at all on SAN storage? I decided to switch to preallocated for further tests and confirm. So I created a snapshot and then a clone of the VM, changing the allocation policy of the disk to preallocated. So far so good.

Feb 2, 2017 10:40:23 AM VM ol65preallocated creation has been completed.
Feb 2, 2017 10:24:15 AM VM ol65preallocated creation was initiated by admin@internal-authz.
Feb 2, 2017 10:22:31 AM Snapshot 'for cloning' creation for VM 'ol65' has been completed.
Feb 2, 2017 10:22:31 AM Snapshot 'for cloning' creation for VM 'ol65' was initiated by admin@internal-authz.

So the throughput seems OK for this storage type (the LUNs are on RAID5 made with SATA disks): 16 minutes to write 90GB is about 96MBytes/s, as expected.

What I see in messages during the cloning phase from 10:24 to 10:40:

Feb 2 10:24:13 ovmsrv05 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 2 10:24:13 ovmsrv05 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 2 10:24:13 ovmsrv05 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 2 10:24:13 ovmsrv05 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 2 10:24:13 ovmsrv05 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 2 10:24:13 ovmsrv05 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 2 10:24:13 ovmsrv05 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 2 10:24:13 ovmsrv05 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 2 10:24:13 ovmsrv05 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 2 10:24:13 ovmsrv05 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 2 10:24:14 ovmsrv05 journal: vdsm root WARNING File: /rhev/data-center/588237b8-0031-02f6-035d-000000000136/922b5269-ab56-4c4d-838f-49d33427e2ab/images/9d1c977f-540d-436a-9d93-b1cb0816af2a/607dbf59-7d4d-4fc3-ae5f-e8824bf82648 already removed
Feb 2 10:24:14 ovmsrv05 multipathd: dm-15: remove map (uevent)
Feb 2 10:24:14 ovmsrv05 multipathd: dm-15: devmap not registered, can't remove
Feb 2 10:24:14 ovmsrv05 multipathd: dm-15: remove map (uevent)
Feb 2 10:24:17 ovmsrv05 kernel: blk_update_request: critical target error, dev dm-4, sector 44566529
Feb 2 10:24:17 ovmsrv05 kernel: dm-15: WRITE SAME failed. Manually zeroing.
Feb 2 10:40:07 ovmsrv05 kernel: scsi_verify_blk_ioctl: 16 callbacks suppressed
Feb 2 10:40:07 ovmsrv05 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 2 10:40:17 ovmsrv05 multipathd: dm-15: remove map (uevent)
Feb 2 10:40:17 ovmsrv05 multipathd: dm-15: devmap not registered, can't remove
Feb 2 10:40:17 ovmsrv05 multipathd: dm-15: remove map (uevent)
Feb 2 10:40:22 ovmsrv05 kernel: dd: sending ioctl 80306d02 to a partition!
After about 7 rounds I get this in messages of the host where the VM is running:
Feb 1 23:31:39 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 1 23:31:39 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 1 23:31:44 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 1 23:31:45 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 1 23:31:45 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 1 23:31:47 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 1 23:31:50 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 1 23:31:50 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 1 23:31:56 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 1 23:31:57 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
This is interesting. We have seen these messages before, but could never detect the flow causing them. Are you sure you see this each time you extend your disk?
If you can reproduce this, please file a bug.
OK, see also above the message registered during the clone phase. Gianluca

On Thu, Feb 2, 2017 at 12:04 PM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Thu, Feb 2, 2017 at 10:48 AM, Nir Soffer <nsoffer@redhat.com> wrote:
On Thu, Feb 2, 2017 at 1:11 AM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Wed, Feb 1, 2017 at 8:22 PM, Gianluca Cecchi < gianluca.cecchi@gmail.com> wrote:
OK. In the meantime I have applied your suggested config and restarted the 2 nodes. Let me test and see if I find any problems running some I/O tests as well. Thanks in the meantime, Gianluca
Quick test without much success
Inside the guest I run this loop while true do time dd if=/dev/zero bs=1024k count=1024 of=/home/g.cecchi/testfile
A single 'dd' rarely saturates high-performance storage. There are better utilities for testing ('fio', 'vdbench' and 'ddpt', for example). It's also testing a very theoretical scenario: you rarely write zeros, and you rarely write this much sequential IO with a fixed block size. So it's almost 'hero numbers'.
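For example, a roughly equivalent sequential-write job with fio might look like the sketch below (job name, target path and size are illustrative; guarded so it degrades gracefully on a host without fio or without direct I/O support on the target filesystem):

```shell
# Sequential 1MiB writes with direct I/O, the fio analogue of the dd loop above.
if command -v fio >/dev/null 2>&1; then
    fio --name=seqwrite --filename="$(mktemp)" \
        --rw=write --bs=1M --size=64M --direct=1 --ioengine=psync \
        || echo "direct I/O not supported on this filesystem"
else
    echo "fio not installed"
fi
```

fio also reports latency percentiles, which matter more than peak bandwidth when the question is whether writes stall during LV extends.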
sleep 5
done
I don't think this test is related to the issues you reported earlier.
I thought the same too, and all related comments you wrote. I'm going to test the suggested modifications for chunks. In general do you recommend thin provisioning at all on SAN storage?
Depends on your SAN. On a thin-provisioned one (with potentially inline dedup and compression, such as XtremIO, Pure, Nimble and others) I don't see great value in thin provisioning.
I decided to switch to preallocated for further tests and confirm. So I created a snapshot and then a clone of the VM, changing the allocation policy of the disk to preallocated. So far so good.
Feb 2, 2017 10:40:23 AM VM ol65preallocated creation has been completed.
Feb 2, 2017 10:24:15 AM VM ol65preallocated creation was initiated by admin@internal-authz.
Feb 2, 2017 10:22:31 AM Snapshot 'for cloning' creation for VM 'ol65' has been completed.
Feb 2, 2017 10:22:31 AM Snapshot 'for cloning' creation for VM 'ol65' was initiated by admin@internal-authz.
So the throughput seems OK for this storage type (the LUNs are on RAID5 made with SATA disks): 16 minutes to write 90GB is about 96MBytes/s, as expected.
What is your expectation? Is it FC, iSCSI? How many paths? What is the IO scheduler in the VM? Is it using virtio-blk or virtio-SCSI? Y.
What I see in messages during the cloning phase from 10:24 to 10:40:
Feb 2 10:24:13 ovmsrv05 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 2 10:24:13 ovmsrv05 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 2 10:24:13 ovmsrv05 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 2 10:24:13 ovmsrv05 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 2 10:24:13 ovmsrv05 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 2 10:24:13 ovmsrv05 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 2 10:24:13 ovmsrv05 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 2 10:24:13 ovmsrv05 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 2 10:24:13 ovmsrv05 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 2 10:24:13 ovmsrv05 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 2 10:24:14 ovmsrv05 journal: vdsm root WARNING File: /rhev/data-center/588237b8-0031-02f6-035d-000000000136/922b5269-ab56-4c4d-838f-49d33427e2ab/images/9d1c977f-540d-436a-9d93-b1cb0816af2a/607dbf59-7d4d-4fc3-ae5f-e8824bf82648 already removed
Feb 2 10:24:14 ovmsrv05 multipathd: dm-15: remove map (uevent)
Feb 2 10:24:14 ovmsrv05 multipathd: dm-15: devmap not registered, can't remove
Feb 2 10:24:14 ovmsrv05 multipathd: dm-15: remove map (uevent)
Feb 2 10:24:17 ovmsrv05 kernel: blk_update_request: critical target error, dev dm-4, sector 44566529
Feb 2 10:24:17 ovmsrv05 kernel: dm-15: WRITE SAME failed. Manually zeroing.
Feb 2 10:40:07 ovmsrv05 kernel: scsi_verify_blk_ioctl: 16 callbacks suppressed
Feb 2 10:40:07 ovmsrv05 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 2 10:40:17 ovmsrv05 multipathd: dm-15: remove map (uevent)
Feb 2 10:40:17 ovmsrv05 multipathd: dm-15: devmap not registered, can't remove
Feb 2 10:40:17 ovmsrv05 multipathd: dm-15: remove map (uevent)
Feb 2 10:40:22 ovmsrv05 kernel: dd: sending ioctl 80306d02 to a partition!
After about 7 rounds I get this in messages of the host where the VM is running:
Feb 1 23:31:39 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 1 23:31:39 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 1 23:31:44 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 1 23:31:45 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 1 23:31:45 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 1 23:31:47 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 1 23:31:50 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 1 23:31:50 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 1 23:31:56 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 1 23:31:57 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
This is interesting. We have seen these messages before, but could never detect the flow causing them. Are you sure you see this each time you extend your disk?
If you can reproduce this, please file a bug.
OK, see also above the message registered during the clone phase. Gianluca
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users

On Thu, Feb 2, 2017 at 12:09 PM, Yaniv Kaul <ykaul@redhat.com> wrote:
I decided to switch to preallocated for further tests and confirm. So I created a snapshot and then a clone of the VM, changing the allocation policy of the disk to preallocated. So far so good.
Feb 2, 2017 10:40:23 AM VM ol65preallocated creation has been completed.
Feb 2, 2017 10:24:15 AM VM ol65preallocated creation was initiated by admin@internal-authz.
Feb 2, 2017 10:22:31 AM Snapshot 'for cloning' creation for VM 'ol65' has been completed.
Feb 2, 2017 10:22:31 AM Snapshot 'for cloning' creation for VM 'ol65' was initiated by admin@internal-authz.
So the throughput seems OK for this storage type (the LUNs are on RAID5 made with SATA disks): 16 minutes to write 90GB is about 96MBytes/s, as expected.
What is your expectation? Is it FC, iSCSI? How many paths? What is the IO scheduler in the VM? Is it using virtio-blk or virtio-SCSI? Y.
Peak bandwidth no more than 140 MBytes/s, based on storage capabilities, but I don't need to do a crude performance test; I need stability. Each host has a mezzanine dual-port HBA (4 Gbit); each HBA is connected to a different FC-switch, and the multipath connection has 2 active paths (one for each HBA). I confirm that with the preallocated disk of the cloned VM I indeed don't have the previous problems. The same loop executed about 66 times in a 10-minute interval without any problem registered on the hosts: no message at all in /var/log/messages of either host, and my storage domain is not compromised. The question about thin provisioning and SAN LUNs (i.e. with LVM-based disks) remains important. In my opinion I shouldn't have to care about the kind of I/O done inside a VM, and in any case it shouldn't interfere with my storage domain, bringing down my hosts/VMs completely. In theory there could be an application inside a VM that generates something similar to my loop and so would cause problems. For sure I can then notify the VM owner about his/her workload, but it should not compromise my virtual infrastructure. I could have an RDBMS inside a VM and a user that creates a big datafile, and that would imply many extend operations if the disk is thin provisioned.... What about the [irs] values? Where are they located, in vdsm.conf? What are the defaults for volume_utilization_percent and volume_utilization_chunk_mb? Did they change from 3.6 to 4.0 to 4.1? What should I do after changing them to make them active? Thanks in advance, Gianluca

On Thu, Feb 2, 2017 at 1:34 PM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Thu, Feb 2, 2017 at 12:09 PM, Yaniv Kaul <ykaul@redhat.com> wrote:
I decided to switch to preallocated for further tests and confirm. So I created a snapshot and then a clone of the VM, changing the allocation policy of the disk to preallocated. So far so good.
Feb 2, 2017 10:40:23 AM VM ol65preallocated creation has been completed.
Feb 2, 2017 10:24:15 AM VM ol65preallocated creation was initiated by admin@internal-authz.
Feb 2, 2017 10:22:31 AM Snapshot 'for cloning' creation for VM 'ol65' has been completed.
Feb 2, 2017 10:22:31 AM Snapshot 'for cloning' creation for VM 'ol65' was initiated by admin@internal-authz.
So the throughput seems OK for this storage type (the LUNs are on RAID5 made with SATA disks): 16 minutes to write 90GB is about 96MBytes/s, as expected.
What is your expectation? Is it FC, iSCSI? How many paths? What is the IO scheduler in the VM? Is it using virtio-blk or virtio-SCSI? Y.
Peak bandwidth no more than 140 MBytes/s, based on storage capabilities, but I don't need to do a crude performance test; I need stability. Each host has a mezzanine dual-port HBA (4 Gbit); each HBA is connected to a different FC-switch, and the multipath connection has 2 active paths (one for each HBA).
I confirm that with the preallocated disk of the cloned VM I indeed don't have the previous problems. The same loop executed about 66 times in a 10-minute interval without any problem registered on the hosts: no message at all in /var/log/messages of either host, and my storage domain is not compromised. The question about thin provisioning and SAN LUNs (i.e. with LVM-based disks) remains important. In my opinion I shouldn't have to care about the kind of I/O done inside a VM, and in any case it shouldn't interfere with my storage domain, bringing down my hosts/VMs completely. In theory there could be an application inside a VM that generates something similar to my loop and so would cause problems. For sure I can then notify the VM owner about his/her workload, but it should not compromise my virtual infrastructure. I could have an RDBMS inside a VM and a user that creates a big datafile, and that would imply many extend operations if the disk is thin provisioned....
What about [irs] values? Where are they located, in vdsm.conf?
Yes but you should not modify them in vdsm.conf.
What are defaults for volume_utilization_percent and volume_utilization_chunk_mb? Did they change from 3.6 to 4.0 to 4.1?
No, the defaults did not change in the last 3.5 years. In 4.0 we introduced dropin support, and this is the recommended way to perform configuration changes. To change these values, create a file at /etc/vdsm/vdsm.conf.d/50_my.conf. The name of the file does not matter: vdsm will read all files in the vdsm.conf.d directory, sort them by name (this is why you should use the 50_ prefix), and apply the changes to the configuration. In this file you put the sections and options you need, like:

[irs]
volume_utilization_percent = 25
volume_utilization_chunk_mb = 2048
What should I do after changing them to make them active?
Restart vdsm. Using this method, you can provision the same file on all hosts using standard provisioning tools. It is not recommended to modify /etc/vdsm/vdsm.conf: if you do, you will have to manually merge changes from vdsm.conf.rpmnew after upgrading vdsm. Nir
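Nir's drop-in approach above can be sketched as a small provisioning script. This is a sketch using the paths and option values quoted in the thread; CONF_DIR defaults to a local directory so it can be tried without root, whereas on a real host it would be /etc/vdsm/vdsm.conf.d:

```shell
# Sketch: write a vdsm config drop-in, then restart vdsm to apply it.
# CONF_DIR is parameterized so this can be tried outside a real host.
CONF_DIR="${CONF_DIR:-./vdsm.conf.d}"
mkdir -p "$CONF_DIR"
cat > "$CONF_DIR/50_my.conf" <<'EOF'
[irs]
volume_utilization_percent = 25
volume_utilization_chunk_mb = 2048
EOF
# On a real host (as root), apply the change with:
#   systemctl restart vdsmd
```

Because the drop-in is a single self-contained file, the same file can be pushed to every host by any standard provisioning tool, which is exactly the point of the mechanism.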

On Thu, Feb 2, 2017 at 12:48 PM, Nir Soffer <nsoffer@redhat.com> wrote:
What about [irs] values? Where are they located, in vdsm.conf?
Yes but you should not modify them in vdsm.conf.
What are defaults for volume_utilization_percent and volume_utilization_chunk_mb? Did they change from 3.6 to 4.0 to 4.1?
No, the defaults did not change in the last 3.5 years.
OK. In a previous message of this thread you wrote that the default volume_utilization_chunk_mb is 1024. What about the default for volume_utilization_percent? 50?
What should I do after changing them to make them active?
Restart vdsm
With the host in maintenance mode, or can it be active? Thanks

On Thu, Feb 2, 2017 at 4:16 PM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Thu, Feb 2, 2017 at 12:48 PM, Nir Soffer <nsoffer@redhat.com> wrote:
What about [irs] values? Where are they located, in vdsm.conf?
Yes but you should not modify them in vdsm.conf.
What are defaults for volume_utilization_percent and volume_utilization_chunk_mb? Did they change from 3.6 to 4.0 to 4.1?
No, the defaults did not change in the last 3.5 years.
OK. In a previous message of this thread you wrote that the default volume_utilization_chunk_mb is 1024. What about the default for volume_utilization_percent? 50?
Yes, this should be a commented-out option in /etc/vdsm/vdsm.conf. If you don't have a vdsm.conf file, or the file is empty, you can generate a new file like this:

python /usr/lib64/python2.7/site-packages/vdsm/config.py > vdsm.conf.example

What should I do after changing them to make them active?
Restart vdsm
With the host in maintenance mode, or can it be active?
Thanks

On Thu, Feb 2, 2017 at 3:30 PM, Nir Soffer <nsoffer@redhat.com> wrote:
If you don't have a vdsm.conf file, or the file is empty, you can generate a new file like this:
python /usr/lib64/python2.7/site-packages/vdsm/config.py > vdsm.conf.example
Thanks. It seems that the package python-libs-2.7.5-48.el7.x86_64 actually uses the /usr/lib path instead of /usr/lib64... What worked:

python /usr/lib/python2.7/site-packages/vdsm/config.py > vdsm.conf.example
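As the lib vs lib64 confusion shows, the install path is better asked from Python itself than guessed. A small sketch of that technique, demonstrated with a stdlib module (on an oVirt host you would pass vdsm.config instead; note the thread's hosts run python2, while this uses python3 for illustration):

```shell
# Print the directory a Python module is installed in, instead of
# hardcoding /usr/lib vs /usr/lib64.
module_dir() {
    python3 -c 'import importlib, os, sys
m = importlib.import_module(sys.argv[1])
print(os.path.dirname(m.__file__))' "$1"
}

module_dir json   # stdlib example; on an oVirt host: module_dir vdsm.config
```

This sidesteps the distribution-specific choice of /usr/lib versus /usr/lib64 entirely.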
What should I do after changing them to make them active?
Restart vdsm
With the host in maintenance mode, or can it be active?
Thanks
Can you confirm that the host can be active when I restart the vdsmd service? Thanks Gianluca

On Thu, Feb 2, 2017 at 4:42 PM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Thu, Feb 2, 2017 at 3:30 PM, Nir Soffer <nsoffer@redhat.com> wrote:
If you don't have a vdsm.conf file, or the file is empty, you can generate a new file like this:
python /usr/lib64/python2.7/site-packages/vdsm/config.py > vdsm.conf.example
Thanks. It seems that the package python-libs-2.7.5-48.el7.x86_64 actually uses the /usr/lib path instead of /usr/lib64... What worked:
python /usr/lib/python2.7/site-packages/vdsm/config.py > vdsm.conf.example
Indeed, my mistake, we moved to lib several versions ago.
What should I do after changing them to make them active?
Restart vdsm
With the host in maintenance mode, or can it be active?
Thanks
Can you confirm that the host can be active when I restart the vdsmd service?
Sure. This may abort a storage operation if one is running when you restart vdsm, but vdsm is designed so you can restart or kill it safely. For example, if you abort a disk copy in the middle, the operation will fail and the destination disk will be deleted. If you want to avoid such issues, you can put the host into maintenance, but this requires migrating VMs to other hosts. Nir

On Thu, Feb 2, 2017 at 3:51 PM, Nir Soffer <nsoffer@redhat.com> wrote:
Can you confirm that the host can be active when I restart the vdsmd service?
Sure. This may abort a storage operation if one is running when you restart vdsm, but vdsm is designed so you can restart or kill it safely.
For example, if you abort a disk copy in the middle, the operation will fail and the destination disk will be deleted.
If you want to avoid such issues, you can put the host into maintenance, but this requires migrating VMs to other hosts.
Nir
OK. Created 50_thin_block_extension_rules.conf under /etc/vdsm/vdsm.conf.d and restarted vdsmd.

One last (probably the last... ;-) question: is it expected that if I restart vdsmd on the host that is the SPM, the SPM role is shifted to another node? When restarting vdsmd on the host that is not the SPM I didn't get any message in the web admin GUI, and the restart of vdsmd itself was very fast. Instead, on the host holding the SPM the command took several seconds and I got these events:

Feb 2, 2017 4:01:23 PM Host ovmsrv05 power management was verified successfully.
Feb 2, 2017 4:01:23 PM Status of host ovmsrv05 was set to Up.
Feb 2, 2017 4:01:19 PM Executing power management status on Host ovmsrv05 using Proxy Host ovmsrv06 and Fence Agent ilo:10.4.192.212.
Feb 2, 2017 4:01:18 PM Storage Pool Manager runs on Host ovmsrv06 (Address: ovmsrv06.datacenter.polimi.it).
Feb 2, 2017 4:01:13 PM VDSM ovmsrv05 command failed: Recovering from crash or Initializing
Feb 2, 2017 4:01:11 PM Host ovmsrv05 is initializing. Message: Recovering from crash or Initializing
Feb 2, 2017 4:01:11 PM VDSM ovmsrv05 command failed: Recovering from crash or Initializing
Feb 2, 2017 4:01:11 PM Invalid status on Data Center Default. Setting Data Center status to Non Responsive (On host ovmsrv05, Error: Recovering from crash or Initializing).
Feb 2, 2017 4:01:11 PM VDSM ovmsrv05 command failed: Recovering from crash or Initializing
Feb 2, 2017 4:01:05 PM Host ovmsrv05 is not responding. It will stay in Connecting state for a grace period of 80 seconds and after that an attempt to fence the host will be issued.
Feb 2, 2017 4:01:05 PM Host ovmsrv05 is not responding. It will stay in Connecting state for a grace period of 80 seconds and after that an attempt to fence the host will be issued.
Feb 2, 2017 4:01:05 PM VDSM ovmsrv05 command failed: Connection reset by peer

Thanks, Gianluca

On Thu, Feb 2, 2017 at 6:05 PM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Thu, Feb 2, 2017 at 3:51 PM, Nir Soffer <nsoffer@redhat.com> wrote:
Can you confirm that the host can be active when I restart vdsmd service?
Sure. This may abort a storage operation if one is running when you restart vdsm, but vdsm is designed so you can restart or kill it safely.
For example, if you abort a disk copy in the middle, the operation will fail and the destination disk will be deleted.
If you want to avoid such issue, you can put a host to maintenance, but this requires migration of vms to other hosts.
Nir
OK. Created 50_thin_block_extension_rules.conf under /etc/vdsm/vdsm.conf.d and restarted vdsmd.
One last (probably the last... ;-) question: is it expected that if I restart vdsmd on the host that is the SPM, the SPM is shifted to another node?
Yes, the engine will move the SPM to another host when the SPM fails, unless you have disabled the SPM role for every other host (see the host > SPM tab).
Because when restarting vdsmd on the host that is not the SPM I didn't get any message in the web admin GUI, and the restart of vdsmd itself was very fast. Instead, on the host with the SPM the command took several seconds and I got these events:
It is expected that restarting the SPM is slower, but we need to see vdsm logs to understand why.

Feb 2, 2017 4:01:23 PM Host ovmsrv05 power management was verified successfully.
Feb 2, 2017 4:01:23 PM Status of host ovmsrv05 was set to Up.
Feb 2, 2017 4:01:19 PM Executing power management status on Host ovmsrv05 using Proxy Host ovmsrv06 and Fence Agent ilo:10.4.192.212.
Feb 2, 2017 4:01:18 PM Storage Pool Manager runs on Host ovmsrv06 (Address: ovmsrv06.datacenter.polimi.it).
Feb 2, 2017 4:01:13 PM VDSM ovmsrv05 command failed: Recovering from crash or Initializing
Feb 2, 2017 4:01:11 PM Host ovmsrv05 is initializing. Message: Recovering from crash or Initializing
Feb 2, 2017 4:01:11 PM VDSM ovmsrv05 command failed: Recovering from crash or Initializing
Feb 2, 2017 4:01:11 PM Invalid status on Data Center Default. Setting Data Center status to Non Responsive (On host ovmsrv05, Error: Recovering from crash or Initializing).
Feb 2, 2017 4:01:11 PM VDSM ovmsrv05 command failed: Recovering from crash or Initializing
Feb 2, 2017 4:01:05 PM Host ovmsrv05 is not responding. It will stay in Connecting state for a grace period of 80 seconds and after that an attempt to fence the host will be issued.
Feb 2, 2017 4:01:05 PM Host ovmsrv05 is not responding. It will stay in Connecting state for a grace period of 80 seconds and after that an attempt to fence the host will be issued.
Feb 2, 2017 4:01:05 PM VDSM ovmsrv05 command failed: Connection reset by peer

It looks like the engine discovered that the SPM was down, and reconnected. It is expected that changes in the SPM status are detected early and that the engine tries to recover the SPM; the SPM role is critical in oVirt. Are you sure you did not get any message when restarting the other host? I would expect the engine to detect and report a restart of any host. If you can reproduce this (restarting vdsm not being detected by the engine and not reported in the engine event log), please file a bug. Nir

On Thu, Feb 2, 2017 at 12:04 PM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Thu, Feb 2, 2017 at 10:48 AM, Nir Soffer <nsoffer@redhat.com> wrote:
On Thu, Feb 2, 2017 at 1:11 AM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Wed, Feb 1, 2017 at 8:22 PM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
OK. In the meantime I have applied your suggested config and restarted the 2 nodes. Let me test and see if I find any problems, running also some I/O tests. Thanks in the meantime, Gianluca
Quick test without much success
Inside the guest I run this loop:

while true
do
    time dd if=/dev/zero bs=1024k count=1024 of=/home/g.cecchi/testfile
    sleep 5
done
I don't think this test is related to the issues you reported earlier.
I thought the same, and agree with all the related comments you wrote. I'm going to test the suggested modifications for the chunks. In general, do you recommend thin provisioning at all on SAN storage?
Only if your storage does not support thin provisioning, or you need snapshot support. If you don't need these features, using raw will be much more reliable and faster. Even if you use raw, you can still perform live storage migration: we create a snapshot using qcow2 format, copy the base raw volume to another storage, and finally delete the snapshot on the destination storage. In the future (oVirt 5?) we would like to use only smart storage thin provisioning and snapshot support.
I decided to switch to preallocated for further tests, and I confirm. I created a snapshot and then a clone of the VM, changing the allocation policy of the disk to preallocated. So far so good.
Feb 2, 2017 10:40:23 AM VM ol65preallocated creation has been completed.
Feb 2, 2017 10:24:15 AM VM ol65preallocated creation was initiated by admin@internal-authz.
Feb 2, 2017 10:22:31 AM Snapshot 'for cloning' creation for VM 'ol65' has been completed.
Feb 2, 2017 10:22:31 AM Snapshot 'for cloning' creation for VM 'ol65' was initiated by admin@internal-authz.
So the throughput seems OK for this storage type (the LUNs are on RAID5 made with SATA disks): 16 minutes to write 90 GB is about 96 MBytes/s, as expected.
What I see in messages during the cloning phase, from 10:24 to 10:40:
Feb 2 10:24:13 ovmsrv05 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 2 10:24:13 ovmsrv05 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 2 10:24:13 ovmsrv05 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 2 10:24:13 ovmsrv05 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 2 10:24:13 ovmsrv05 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 2 10:24:13 ovmsrv05 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 2 10:24:13 ovmsrv05 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 2 10:24:13 ovmsrv05 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 2 10:24:13 ovmsrv05 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 2 10:24:13 ovmsrv05 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 2 10:24:14 ovmsrv05 journal: vdsm root WARNING File: /rhev/data-center/588237b8-0031-02f6-035d-000000000136/922b5269-ab56-4c4d-838f-49d33427e2ab/images/9d1c977f-540d-436a-9d93-b1cb0816af2a/607dbf59-7d4d-4fc3-ae5f-e8824bf82648 already removed
Feb 2 10:24:14 ovmsrv05 multipathd: dm-15: remove map (uevent)
Feb 2 10:24:14 ovmsrv05 multipathd: dm-15: devmap not registered, can't remove
Feb 2 10:24:14 ovmsrv05 multipathd: dm-15: remove map (uevent)
Feb 2 10:24:17 ovmsrv05 kernel: blk_update_request: critical target error, dev dm-4, sector 44566529
Feb 2 10:24:17 ovmsrv05 kernel: dm-15: WRITE SAME failed. Manually zeroing.
Feb 2 10:40:07 ovmsrv05 kernel: scsi_verify_blk_ioctl: 16 callbacks suppressed
Feb 2 10:40:07 ovmsrv05 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 2 10:40:17 ovmsrv05 multipathd: dm-15: remove map (uevent)
Feb 2 10:40:17 ovmsrv05 multipathd: dm-15: devmap not registered, can't remove
Feb 2 10:40:17 ovmsrv05 multipathd: dm-15: remove map (uevent)
Feb 2 10:40:22 ovmsrv05 kernel: dd: sending ioctl 80306d02 to a partition!
Let's file a bug to investigate these "kernel: dd: sending ioctl 80306d02 to a partition!" messages. Please attach the vdsm log from the machine emitting these messages.
After about 7 rounds I get this in messages of the host where the VM is running:
Feb 1 23:31:39 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 1 23:31:39 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 1 23:31:44 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 1 23:31:45 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 1 23:31:45 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 1 23:31:47 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 1 23:31:50 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 1 23:31:50 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 1 23:31:56 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 1 23:31:57 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
This is interesting: we have seen these messages before, but could never pin down the flow causing them. Are you sure you see this each time you extend your disk?
If you can reproduce this, please file a bug.
OK, see also above the message registered during the clone phase. Gianluca

On Thu, Feb 2, 2017 at 10:48 AM, Nir Soffer <nsoffer@redhat.com> wrote:
After about 7 rounds I get this in messages of the host where the VM is running:
Feb 1 23:31:39 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 1 23:31:39 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 1 23:31:44 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 1 23:31:45 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 1 23:31:45 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 1 23:31:47 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 1 23:31:50 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 1 23:31:50 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 1 23:31:56 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 1 23:31:57 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
This is interesting: we have seen these messages before, but could never pin down the flow causing them. Are you sure you see this each time you extend your disk?
If you can reproduce this, please file a bug.
I got the same on 4.1.1 while testing on iSCSI, so I opened a Bugzilla now... let's see if we can find anything useful: https://bugzilla.redhat.com/show_bug.cgi?id=1438809 Gianluca

On Wed, Feb 01, 2017 at 09:39:45AM +0200, Nir Soffer wrote:
On Tue, Jan 31, 2017 at 6:09 PM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Tue, Jan 31, 2017 at 3:23 PM, Nathanaël Blanchet <blanchet@abes.fr> wrote:
Exactly the same issue over here with an FC EMC storage domain...
I'm trying to mitigate this by inserting a timeout for my SAN devices, but I'm not sure of its effectiveness, as the CentOS 7 behavior of "multipathd -k" and then "show config" seems different from CentOS 6.x. In fact my attempt at multipath.conf is this:
There was a significant change in how multipath deals with merging device configurations between RHEL6 and RHEL7. The short answer is: as long as you copy the entire existing configuration and just change what you want changed (like you did), you can ignore the change. Also, multipath doesn't care if you quote numbers.

If you want to verify that no_path_retry is being set as intended, you can run:

# multipath -r -v3 | grep no_path_retry

to reload your multipath devices with verbosity turned up. You should see lines like:

Feb 02 09:38:30 | mpatha: no_path_retry = 12 (controller setting)

That will tell you what no_path_retry is set to. The configuration Nir suggested at the end of this email looks good to me.

Now, here's the long answer: multipath allows you to merge device configurations. This means that as long as you put in the "vendor" and "product" strings, you only need to set the other values that you care about. On RHEL6, this would work:

device {
        vendor "IBM"
        product "^1814"
        no_path_retry 12
}

And it would create a configuration exactly the same as the builtin config for this device, except with no_path_retry set to 12. However, this wasn't as easy for users as it was supposed to be. Specifically, users would often add their device's vendor and product information, as well as whatever they wanted changed, and then be surprised when multipath didn't retain all the information from the builtin configuration as advertised. This is because they used the actual vendor and product strings for their device, but the builtin device configuration's vendor and product strings were regexes. In RHEL6, multipath only merged configurations if the vendor and product strings matched exactly. So users would try:

device {
        vendor "IBM"
        product "1814 FASt"
        no_path_retry 12
}

and it wouldn't work as expected, since the product strings didn't match.
To fix this, when RHEL7 checks whether a user configuration should be merged with a builtin configuration, all that is required is that the user configuration's vendor and product strings regex match the builtin. This means that the above configuration will work as expected in RHEL7. However the first configuration won't, because "^1814" doesn't regex match "^1814". Multipath would therefore treat it as a completely new configuration and not merge any values from the builtin one. You can re-enable the RHEL6 behaviour in RHEL7 by setting hw_str_match yes in the defaults section.

Now, because the builtin configurations can handle more than one device type per configuration (since they use regexes to match the vendor and product strings), multipath can't just remove the original builtin configuration when a user adds a new configuration that modifies it. Otherwise, devices that regex matched the builtin configuration's vendor and product strings but not the user configuration's would be left without any device configuration information. So multipath keeps the original builtin configuration as well as the new one. However, when it's time to assign a device configuration to a device, multipath looks through the device configuration list backwards and uses the first match. This means it will always use the user configuration instead of the builtin one (since new configurations are added to the end of the list).

Like I said before, if you add all the values you want set in your configuration, instead of relying on them being merged from the builtin configuration, then you don't need to worry about any of this. -Ben
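The matching rules Ben describes can be demonstrated with grep, treating the builtin's product pattern "^1814" as the regex. This is only an illustration of the regex semantics, not of multipath's actual code:

```shell
# RHEL7 merge check: the user's product string must regex-match the
# builtin's "^1814" pattern.
regex='^1814'

# The literal string "^1814" starts with '^', not '1', so it does NOT match:
if echo '^1814' | grep -qE "$regex"; then echo "merge"; else echo "no merge"; fi

# A real product string like "1814 FASt" does match, so it would merge:
if echo '1814 FASt' | grep -qE "$regex"; then echo "merge"; else echo "no merge"; fi
```

This is exactly why a user config that copies the builtin's regex verbatim merges on RHEL7 (regex against regex-as-string fails, but string equality was never required) yet fails to merge on RHEL6 only when the strings differ.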
# VDSM REVISION 1.3
# VDSM PRIVATE
defaults {
    polling_interval 5
    no_path_retry fail
    user_friendly_names no
    flush_on_last_del yes
    fast_io_fail_tmo 5
    dev_loss_tmo 30
    max_fds 4096
}
# Remove devices entries when overrides section is available.
devices {
    device {
        # These settings overrides built-in devices settings. It does not apply
        # to devices without built-in settings (these use the settings in the
        # "defaults" section), or to devices defined in the "devices" section.
        # Note: This is not available yet on Fedora 21. For more info see
        # https://bugzilla.redhat.com/1253799
        all_devs yes
        no_path_retry fail
    }
    device {
        vendor "IBM"
        product "^1814"
        product_blacklist "Universal Xport"
        path_grouping_policy "group_by_prio"
        path_checker "rdac"
        features "0"
        hardware_handler "1 rdac"
        prio "rdac"
        failback immediate
        rr_weight "uniform"
        no_path_retry "12"
Hi Gianluca,
This should be a number, not a string; maybe multipath has trouble parsing this and ignores your value?
    }
}
So I put exactly the default device config for my IBM/1814 device, but with no_path_retry set to 12.
Why 12?
This will do 12 retries, 5 seconds each, when no path is available. That will block lvm commands for 60 seconds when no path is available, blocking other stuff in vdsm; vdsm is not designed to handle this.
I recommend value of 4.
But note that this is not related to the fact that your devices are not initialized properly after boot.
In CentOS 6.x, when you do something like this, "show config" gives you the modified entry only for your device section. Instead, in CentOS 7.3 it seems I get the default one for IBM/1814 anyway, and also the customized one at the end of the output...
Maybe your device configuration does not match exactly the builtin config.
Two facts:
- Before, I could reproduce the problem if I selected Maintenance, then Power Mgmt --> Restart (tried 3 times with the same behavior).
- Instead, if I executed separate steps (Maintenance, Power Mgmt --> Stop, wait a moment, Power Mgmt --> Start), I didn't get problems (tried only one time...).
Maybe waiting a moment helps the storage/switches to clean up properly after a server is shut down?
Does your power management trigger a proper shutdown? I would avoid using it for normal shutdown.
With this "new" multipath config (to be confirmed whether it is in effect, how?) I don't get the VM-paused problem even with the Restart option of Power Mgmt. In the active host's messages I see these lines when the other host reboots:
Jan 31 16:50:01 ovmsrv06 systemd: Started Session 705 of user root.
Jan 31 16:50:01 ovmsrv06 systemd: Starting Session 705 of user root.
Jan 31 16:53:47 ovmsrv06 multipathd: 3600a0b8000299aa80000d08955014098: sde - rdac checker reports path is up
Jan 31 16:53:47 ovmsrv06 multipathd: 8:64: reinstated
Jan 31 16:53:47 ovmsrv06 multipathd: 3600a0b8000299aa80000d08955014098: load table [0 41943040 multipath 1 queue_if_no_path 1 rdac 2 1 service-time 0 2 1 8:224 1 65:0 1 service-time 0 2 1 8:64 1 8:160 1]
Jan 31 16:53:47 ovmsrv06 multipathd: 3600a0b8000299aa80000d08955014098: sdo - rdac checker reports path is ghost
Jan 31 16:53:47 ovmsrv06 multipathd: 8:224: reinstated
Jan 31 16:53:47 ovmsrv06 multipathd: 3600a0b8000299aa80000d08955014098: sdk - rdac checker reports path is up
Jan 31 16:53:47 ovmsrv06 multipathd: 8:160: reinstated
Jan 31 16:53:47 ovmsrv06 kernel: sd 0:0:1:4: rdac: array Z1_DS4700, ctlr 1, queueing MODE_SELECT command
Jan 31 16:53:47 ovmsrv06 multipathd: 3600a0b8000299aa80000d08955014098: sdq - rdac checker reports path is ghost
Jan 31 16:53:47 ovmsrv06 multipathd: 65:0: reinstated
Jan 31 16:53:48 ovmsrv06 kernel: sd 0:0:1:4: rdac: array Z1_DS4700, ctlr 1, MODE_SELECT returned with sense 05/91/36
Jan 31 16:53:48 ovmsrv06 kernel: sd 0:0:1:4: rdac: array Z1_DS4700, ctlr 1, queueing MODE_SELECT command
Jan 31 16:53:49 ovmsrv06 kernel: sd 0:0:1:4: rdac: array Z1_DS4700, ctlr 1, MODE_SELECT returned with sense 05/91/36
Jan 31 16:53:49 ovmsrv06 kernel: sd 0:0:1:4: rdac: array Z1_DS4700, ctlr 1, queueing MODE_SELECT command
Jan 31 16:53:49 ovmsrv06 kernel: sd 0:0:1:4: rdac: array Z1_DS4700, ctlr 1, MODE_SELECT completed
Jan 31 16:53:49 ovmsrv06 kernel: sd 2:0:1:4: rdac: array Z1_DS4700, ctlr 1, queueing MODE_SELECT command
Jan 31 16:53:49 ovmsrv06 kernel: sd 2:0:1:4: rdac: array Z1_DS4700, ctlr 1, MODE_SELECT completed
Jan 31 16:53:52 ovmsrv06 multipathd: 3600a0b8000299aa80000d08955014098: sde - rdac checker reports path is ghost
Jan 31 16:53:52 ovmsrv06 multipathd: 8:64: reinstated
Jan 31 16:53:52 ovmsrv06 multipathd: 3600a0b8000299aa80000d08955014098: load table [0 41943040 multipath 1 queue_if_no_path 1 rdac 2 1 service-time 0 2 1 8:224 1 65:0 1 service-time 0 2 1 8:64 1 8:160 1]
Jan 31 16:53:52 ovmsrv06 multipathd: 3600a0b8000299aa80000d08955014098: sdo - rdac checker reports path is up
Jan 31 16:53:52 ovmsrv06 multipathd: 8:224: reinstated
Jan 31 16:53:52 ovmsrv06 multipathd: 3600a0b8000299aa80000d08955014098: sdk - rdac checker reports path is ghost
Jan 31 16:53:52 ovmsrv06 multipathd: 8:160: reinstated
Jan 31 16:53:52 ovmsrv06 multipathd: 3600a0b8000299aa80000d08955014098: sdq - rdac checker reports path is up
Jan 31 16:53:52 ovmsrv06 multipathd: 65:0: reinstated
But they are not related to the multipath device dedicated to the oVirt storage domain in this case... What makes me optimistic is the difference in these lines:
Before I got:

Jan 31 10:27:47 ovmsrv06 multipathd: 3600a0b8000299aa80000d08955014098: load table [0 41943040 multipath 0 1 rdac 2 1 service-time 0 2 1 8:224 1 65:0 1 service-time 0 2 1 8:64 1 8:160 1]

Now I get:

Jan 31 16:53:52 ovmsrv06 multipathd: 3600a0b8000299aa80000d08955014098: load table [0 41943040 multipath 1 queue_if_no_path 1 rdac 2 1 service-time 0 2 1 8:224 1 65:0 1 service-time 0 2 1 8:64 1 8:160 1]

i.e. "multipath 0 1 rdac" vs "multipath 1 queue_if_no_path 1 rdac"
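The two table lines differ only in the feature field, so the state can be checked mechanically by looking for queue_if_no_path in the loaded table. A minimal sketch against the quoted strings; on a live host you would feed it lines from the output of dmsetup table:

```shell
# Report whether a device-mapper multipath table has unlimited queueing enabled.
has_unlimited_queueing() {
    case "$1" in
        *queue_if_no_path*) echo "unlimited queueing" ;;
        *)                  echo "no unlimited queueing" ;;
    esac
}

# The two tables quoted in the thread:
before='0 41943040 multipath 0 1 rdac 2 1 service-time 0 2 1 8:224 1 65:0 1 service-time 0 2 1 8:64 1 8:160 1'
after='0 41943040 multipath 1 queue_if_no_path 1 rdac 2 1 service-time 0 2 1 8:224 1 65:0 1 service-time 0 2 1 8:64 1 8:160 1'

has_unlimited_queueing "$before"
has_unlimited_queueing "$after"
```

As Nir notes just below, seeing queue_if_no_path here is the bad case for oVirt, since it means I/O can block indefinitely when all paths are down.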
This is not expected: multipath is using unlimited queueing, which is the worst setup for oVirt.
Maybe this is the result of using "12" instead of 12?
Anyway, looking in multipath source, this is the default configuration for your device:
/* DS3950 / DS4200 / DS4700 / DS5020 */
{
        .vendor        = "IBM",
        .product       = "^1814",
        .bl_product    = "Universal Xport",
        .pgpolicy      = GROUP_BY_PRIO,
        .checker_name  = RDAC,
        .features      = "2 pg_init_retries 50",
        .hwhandler     = "1 rdac",
        .prio_name     = PRIO_RDAC,
        .pgfailback    = -FAILBACK_IMMEDIATE,
        .no_path_retry = 30,
},
and this is the commit that updated this (and other rdac devices): http://git.opensvc.com/gitweb.cgi?p=multipath-tools/.git;a=commit;h=c1ed393b...
So I would try this configuration:
device {
    vendor "IBM"
    product "^1814"

    # defaults from "multipathd show config"
    product_blacklist "Universal Xport"
    path_grouping_policy "group_by_prio"
    path_checker "rdac"
    hardware_handler "1 rdac"
    prio "rdac"
    failback immediate
    rr_weight "uniform"

    # Based on multipath commit c1ed393b91acace284901f16954ba5c1c0d943c9
    features "2 pg_init_retries 50"

    # Default is 30 seconds, ovirt recommended value is 4 to avoid
    # blocking in vdsm. This gives 20 seconds (4 * polling_interval)
    # grace time when no path is available.
    no_path_retry 4
}
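The 20-second figure in the comment follows from no_path_retry being counted in polling intervals; a quick check with the values used in this thread:

```shell
# Grace time with all paths down = no_path_retry * polling_interval.
polling_interval=5   # seconds, from the defaults section above
no_path_retry=4      # oVirt-recommended retry count
grace=$(( no_path_retry * polling_interval ))
echo "${grace}s of queueing before I/O errors are returned"
```

The same arithmetic shows why the earlier no_path_retry of 12 was problematic: 12 * 5 = 60 seconds of blocked lvm commands, which vdsm is not designed to tolerate.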
Ben, do you have any other ideas on debugging this issue or improving the multipath configuration?
Nir

On Thu, Feb 2, 2017 at 10:53 PM, Benjamin Marzinski <bmarzins@redhat.com> wrote:
I'm trying to mitigate this by inserting a timeout for my SAN devices, but I'm not sure of its effectiveness, as the CentOS 7 behavior of "multipathd -k" and then "show config" seems different from CentOS 6.x. In fact my attempt at multipath.conf is this:
There was a significant change in how multipath deals with merging device configurations between RHEL6 and RHEL7. The short answer is, as long as you copy the entire existing configuration, and just change what you want changed (like you did), you can ignore the change. Also, multipath doesn't care if you quote numbers.
If you want to verify that no_path_retry is being set as intended, you can run:
# multipath -r -v3 | grep no_path_retry
Hi Benjamin, thank you very much for the explanations, especially the long one ;-) I tried and confirmed that I have no_path_retry = 4 as expected.

The regex matching is only for merging, correct? So in your example, if in RHEL 7 I put this:

device {
        vendor "IBM"
        product "^1814"
        no_path_retry 12
}

it would not match for merging, but it would match for applying to my device (because it is put at the end of the config, which is read backwards). And it would apply only the no_path_retry setting, while all the other ones would not be picked from the builtin configuration for the device, but from the general defaults. So for example it would set path_checker not this way:

path_checker "rdac"

but this way:

path_checker "directio"

which is the default... correct?

On Fri, Feb 03, 2017 at 12:31:49AM +0100, Gianluca Cecchi wrote:
On Thu, Feb 2, 2017 at 10:53 PM, Benjamin Marzinski <[1]bmarzins@redhat.com> wrote:
I'm trying to mitigate inserting a timeout for my SAN devices but I'm not sure of its effectiveness as CentOS 7 behavior of "multipathd -k" and then "show config" seems different from CentOS 6.x. In fact my attempt for multipath.conf is this
There was a significant change in how multipath deals with merging device configurations between RHEL6 and RHEL7. The short answer is, as long as you copy the entire existing configuration, and just change what you want changed (like you did), you can ignore the change. Also, multipath doesn't care if you quote numbers.
If you want to verify that no_path_retry is being set as intended, you can run:
# multipath -r -v3 | grep no_path_retry
Hi Benjamin, thank you very much for the explanations, especially the long one ;-) I tried and confirmed that I have no_path_retry = 4 as expected. The regex matching is only for merging, correct?
No. Both RHEL6 and RHEL7 use regex matching to determine which device configuration to use with your device, otherwise product "^1814" would never match any device, since there is no array with a literal product string of "^1814". RHEL7 also uses the same regex matching to determine which builtin device configuration a user-supplied device configuration should modify. RHEL6 uses string matching for this.
So in your example, if in RHEL 7 I put this:

device {
        vendor "IBM"
        product "^1814"
        no_path_retry 12
}

it would not match for merging, but it would match for applying to my device (because it is put at the end of the config, which is read backwards).
Correct. The confusing point is that in the merging case, "^1814" in the user-supplied configuration is being treated as a string that needs to regex match the regular expression "^1814" in the builtin configuration. These don't match. For matching the device configuration to the device, "^1814" in the user-supplied configuration is treated as a regular expression that needs to regex match the actual product string of the device.
And it would apply only the no_path_retry setting, while all the other ones would not be picked from the builtin configuration for the device, but from the general defaults. So for example it would set path_checker not this way: path_checker "rdac" but this way: path_checker "directio", which is the default... correct?
exactly. -Ben

I'm also seeing this error using a Dell MD3800i array. The multipath errors shown in our logs are different, however:

Feb 1 15:11:58 ovirt-node-production2 kernel: dd: sending ioctl 80306d02 to a partition!
Feb 1 15:21:01 ovirt-node-production2 multipathd: dm-31: remove map (uevent)
Feb 1 15:21:01 ovirt-node-production2 multipathd: dm-31: devmap not registered, can't remove
Feb 1 15:21:01 ovirt-node-production2 multipathd: dm-31: remove map (uevent)

The dd error seems to happen every time the SPM runs a test. On 01/31/2017 09:23 AM, Nathanaël Blanchet wrote:
Exactly the same issue over here with an FC EMC storage domain...
On 31/01/2017 at 15:20, Gianluca Cecchi wrote:
Hello, my test environment is composed of 2 old HP BL685c G1 blades (ovmsrv05 and ovmsrv06), connected through FC switches in a SAN to an old IBM DS4700 storage array. Apart from being old, they all seem OK from a hardware point of view. I have configured oVirt 4.0.6 and an FCP storage domain. The hosts are plain CentOS 7.3 servers, fully updated. It is not a hosted-engine environment: the manager is a VM outside of the cluster. I have configured power management on both hosts and it works well.
At the moment I have only one test VM, and it is doing practically nothing.
Starting point: ovmsrv05 is in maintenance (for about 2 days) and the VM is running on ovmsrv06. I updated the qemu-kvm package on ovmsrv05 and then restarted it from the web admin GUI: Power Mgmt --> Restart
Sequence of events in the pane, including the problem in the subject:

Jan 31, 2017 10:29:43 AM Host ovmsrv05 power management was verified successfully.
Jan 31, 2017 10:29:43 AM Status of host ovmsrv05 was set to Up.
Jan 31, 2017 10:29:38 AM Executing power management status on Host ovmsrv05 using Proxy Host ovmsrv06 and Fence Agent ilo:10.4.192.212.
Jan 31, 2017 10:29:29 AM Activation of host ovmsrv05 initiated by admin@internal-authz.
Jan 31, 2017 10:28:05 AM VM ol65 has recovered from paused back to up.
Jan 31, 2017 10:27:55 AM VM ol65 has been paused due to storage I/O problem.
Jan 31, 2017 10:27:55 AM VM ol65 has been paused.
Jan 31, 2017 10:25:52 AM Host ovmsrv05 was restarted by admin@internal-authz.
Jan 31, 2017 10:25:52 AM Host ovmsrv05 was started by admin@internal-authz.
Jan 31, 2017 10:25:52 AM Power management start of Host ovmsrv05 succeeded.
Jan 31, 2017 10:25:50 AM Executing power management status on Host ovmsrv05 using Proxy Host ovmsrv06 and Fence Agent ilo:10.4.192.212.
Jan 31, 2017 10:25:37 AM Executing power management start on Host ovmsrv05 using Proxy Host ovmsrv06 and Fence Agent ilo:10.4.192.212.
Jan 31, 2017 10:25:37 AM Power management start of Host ovmsrv05 initiated.
Jan 31, 2017 10:25:37 AM Auto fence for host ovmsrv05 was started.
Jan 31, 2017 10:25:37 AM All VMs' status on Non Responsive Host ovmsrv05 were changed to 'Down' by admin@internal-authz
Jan 31, 2017 10:25:36 AM Host ovmsrv05 was stopped by admin@internal-authz.
Jan 31, 2017 10:25:36 AM Power management stop of Host ovmsrv05 succeeded.
Jan 31, 2017 10:25:34 AM Executing power management status on Host ovmsrv05 using Proxy Host ovmsrv06 and Fence Agent ilo:10.4.192.212.
Jan 31, 2017 10:25:15 AM Executing power management stop on Host ovmsrv05 using Proxy Host ovmsrv06 and Fence Agent ilo:10.4.192.212.
Jan 31, 2017 10:25:15 AM Power management stop of Host ovmsrv05 initiated.
Jan 31, 2017 10:25:12 AM Executing power management status on Host ovmsrv05 using Proxy Host ovmsrv06 and Fence Agent ilo:10.4.192.212.
Watching the timestamps, the culprit seems to be the reboot of ovmsrv05, which detects some LUNs in "owned" state and others in "unowned". Full messages of both hosts here:
https://drive.google.com/file/d/0BwoPbcrMv8mvekZQT1pjc0NMRlU/view?usp=sharing
and
https://drive.google.com/file/d/0BwoPbcrMv8mvcjBCYVdFZWdXTms/view?usp=sharing
At this time there are 4 LUNs globally seen by the two hosts but only 1 of them is currently configured as the only storage domain in oVirt cluster.
[root@ovmsrv05 ~]# multipath -l | grep ^36
3600a0b8000299aa80000d08b55014119 dm-5 IBM ,1814 FAStT
3600a0b80002999020000cd3c5501458f dm-3 IBM ,1814 FAStT
3600a0b80002999020000ccf855011198 dm-2 IBM ,1814 FAStT
3600a0b8000299aa80000d08955014098 dm-4 IBM ,1814 FAStT
The configured one:

[root@ovmsrv05 ~]# multipath -l 3600a0b8000299aa80000d08b55014119
3600a0b8000299aa80000d08b55014119 dm-5 IBM ,1814 FAStT
size=4.0T features='0' hwhandler='1 rdac' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| |- 0:0:1:3 sdl 8:176 active undef running
| `- 2:0:1:3 sdp 8:240 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
  |- 0:0:0:3 sdd 8:48  active undef running
  `- 2:0:0:3 sdi 8:128 active undef running
In the messages of the booting node, around the time of the problem registered by the storage:

[root@ovmsrv05 ~]# grep owned /var/log/messages
Jan 31 10:27:38 ovmsrv05 kernel: scsi 0:0:0:1: rdac: LUN 1 (RDAC) (owned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 0:0:0:2: rdac: LUN 2 (RDAC) (owned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 0:0:0:3: rdac: LUN 3 (RDAC) (unowned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 2:0:0:1: rdac: LUN 1 (RDAC) (owned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 0:0:0:4: rdac: LUN 4 (RDAC) (unowned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 2:0:0:2: rdac: LUN 2 (RDAC) (owned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 0:0:1:1: rdac: LUN 1 (RDAC) (unowned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 2:0:0:3: rdac: LUN 3 (RDAC) (unowned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 2:0:0:4: rdac: LUN 4 (RDAC) (unowned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 0:0:1:2: rdac: LUN 2 (RDAC) (unowned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 2:0:1:1: rdac: LUN 1 (RDAC) (unowned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 0:0:1:3: rdac: LUN 3 (RDAC) (owned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 2:0:1:2: rdac: LUN 2 (RDAC) (unowned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 0:0:1:4: rdac: LUN 4 (RDAC) (owned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 2:0:1:3: rdac: LUN 3 (RDAC) (owned)
Jan 31 10:27:39 ovmsrv05 kernel: scsi 2:0:1:4: rdac: LUN 4 (RDAC) (owned)
I don't know the exact meaning of owned/unowned in the output above. Possibly it detects the 0:0:1:3 and 2:0:1:3 paths (those of the active group) as "owned", and this could have created problems for the active node?
On the active node, strangely, I don't lose all the paths, but the VM has been paused anyway:
[root@ovmsrv06 log]# grep "remaining active path" /var/log/messages
Jan 31 10:27:48 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119: remaining active paths: 3
Jan 31 10:27:49 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119: remaining active paths: 2
Jan 31 10:27:56 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119: remaining active paths: 3
Jan 31 10:27:56 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119: remaining active paths: 2
Jan 31 10:27:56 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119: remaining active paths: 1
Jan 31 10:27:57 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119: remaining active paths: 2
Jan 31 10:28:01 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119: remaining active paths: 3
Jan 31 10:28:01 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119: remaining active paths: 4
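One way to see how close the map came to losing all paths is to scan these multipathd lines for the lowest path count per WWID. An illustrative sketch (abbreviated log excerpt; not a real monitoring tool):

```python
import re

LOG = """\
Jan 31 10:27:48 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119: remaining active paths: 3
Jan 31 10:27:56 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119: remaining active paths: 1
Jan 31 10:28:01 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119: remaining active paths: 4
"""

PAT = re.compile(r"multipathd: (\S+): remaining active paths: (\d+)")

def minimum_paths(log):
    """Return the lowest 'remaining active paths' count seen per WWID."""
    low = {}
    for m in PAT.finditer(log):
        wwid, n = m.group(1), int(m.group(2))
        low[wwid] = min(low.get(wwid, n), n)
    return low

print(minimum_paths(LOG))
# {'3600a0b8000299aa80000d08b55014119': 1}
```

Here the map briefly dropped to a single active path, which is consistent with the VM being paused even though not all paths were lost at once.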
I'm not an expert of this storage array in particular, and of the rdac hardware handler in general.
What I see is that multipath.conf on both nodes contains:
# VDSM REVISION 1.3
defaults {
    polling_interval 5
    no_path_retry fail
    user_friendly_names no
    flush_on_last_del yes
    fast_io_fail_tmo 5
    dev_loss_tmo 30
    max_fds 4096
}
devices {
    device {
        # These settings override built-in device settings. They do not apply
        # to devices without built-in settings (these use the settings in the
        # "defaults" section), or to devices defined in the "devices" section.
        # Note: This is not available yet on Fedora 21. For more info see
        # https://bugzilla.redhat.com/1253799
        all_devs yes
        no_path_retry fail
    }
}
Beginning of /proc/scsi/scsi:
[root@ovmsrv06 ~]# cat /proc/scsi/scsi
Attached devices:
Host: scsi1 Channel: 01 Id: 00 Lun: 00
  Vendor: HP Model: LOGICAL VOLUME Rev: 1.86
  Type: Direct-Access ANSI SCSI revision: 05
Host: scsi0 Channel: 00 Id: 00 Lun: 01
  Vendor: IBM Model: 1814 FAStT Rev: 0916
  Type: Direct-Access ANSI SCSI revision: 05
...
To see the default config acquired for this storage:
multipathd -k
> show config
I can see:
device {
    vendor "IBM"
    product "^1814"
    product_blacklist "Universal Xport"
    path_grouping_policy "group_by_prio"
    path_checker "rdac"
    features "0"
    hardware_handler "1 rdac"
    prio "rdac"
    failback immediate
    rr_weight "uniform"
    no_path_retry "fail"
}
and
defaults {
    verbosity 2
    polling_interval 5
    max_polling_interval 20
    reassign_maps "yes"
    multipath_dir "/lib64/multipath"
    path_selector "service-time 0"
    path_grouping_policy "failover"
    uid_attribute "ID_SERIAL"
    prio "const"
    prio_args ""
    features "0"
    path_checker "directio"
    alias_prefix "mpath"
    failback "manual"
    rr_min_io 1000
    rr_min_io_rq 1
    max_fds 4096
    rr_weight "uniform"
    no_path_retry "fail"
    queue_without_daemon "no"
    flush_on_last_del "yes"
    user_friendly_names "no"
    fast_io_fail_tmo 5
    dev_loss_tmo 30
    bindings_file "/etc/multipath/bindings"
    wwids_file /etc/multipath/wwids
    log_checker_err always
    find_multipaths no
    retain_attached_hw_handler no
    detect_prio no
    hw_str_match no
    force_sync no
    deferred_remove no
    ignore_new_boot_devs no
    skip_kpartx no
    config_dir "/etc/multipath/conf.d"
    delay_watch_checks no
    delay_wait_checks no
    retrigger_tries 3
    retrigger_delay 10
    missing_uev_wait_timeout 30
    new_bindings_in_boot no
}
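The precedence discussed earlier in the thread (a matching device{} section wins; any option it leaves unset falls back to defaults{}) can be modeled as a simple dict merge. An illustrative sketch only, not the actual multipath-tools logic; the option subsets below are taken from the output above.

```python
# Subset of the defaults{} and the IBM "^1814" device{} sections shown above.
DEFAULTS = {"path_checker": "directio", "no_path_retry": "fail", "failback": "manual"}
IBM_1814 = {"path_checker": "rdac", "failback": "immediate"}

def effective(defaults, device_section):
    # Start from defaults, then let the matching device section override.
    merged = dict(defaults)
    merged.update(device_section)
    return merged

cfg = effective(DEFAULTS, IBM_1814)
print(cfg["path_checker"])   # rdac  -- from the device section
print(cfg["no_path_retry"])  # fail  -- falls back to defaults
```

This mirrors the exchange above: a user device section that fails to merge with the builtin entry would still apply, but everything it leaves unset comes from defaults{} (e.g. path_checker "directio"), not from the builtin device entry.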
Any hint on how to tune multipath.conf so that a server powering on doesn't create problems for running VMs?
Thanks in advance, Gianluca
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
-- Nathanaël Blanchet
Supervision réseau Pôle Infrastrutures Informatiques 227 avenue Professeur-Jean-Louis-Viala 34193 MONTPELLIER CEDEX 5 Tél. 33 (0)4 67 54 84 55 Fax 33 (0)4 67 54 84 14 blanchet@abes.fr
participants (6)
- Benjamin Marzinski
- Gianluca Cecchi
- Michael Watters
- Nathanaël Blanchet
- Nir Soffer
- Yaniv Kaul