[ovirt-users] VM has been paused due to storage I/O problem

Tue Jan 31 14:20:01 UTC 2017

Hello,
my test environment is composed by 2 old HP blades BL685c G1 (ovmsrv05 and
ovmsrv06) and they are connected in a SAN with FC-switches to an old IBM
DS4700 storage array.
Apart from being old, they seem all ok from an hw point of view.
I have configured oVirt 4.0.6 and an FCP storage domain.
The hosts are plain CentOS 7.3 servers fully updated.
It is not an hosted engine environment: the manager is a vm outside of the
cluster.
I have configured power mgmt on both and it works good.

I have at the moment  only one VM for test and it is doing quite nothing.

Starting point: ovmsrv05 is in maintenance (since about 2 days) and the VM
is running on ovmsrv06.
I update qemu-kvm package on ovmsrv05 and then I restart it from web admin
gui:
Power Mgmt --> Restart

Sequence of events in pane and the problem in subject:
Jan 31, 2017 10:29:43 AM Host ovmsrv05 power management was verified
successfully.
Jan 31, 2017 10:29:43 AM Status of host ovmsrv05 was set to Up.
Jan 31, 2017 10:29:38 AM Executing power management status on Host ovmsrv05
using Proxy Host ovmsrv06 and Fence Agent ilo:10.4.192.212.
Jan 31, 2017 10:29:29 AM Activation of host ovmsrv05 initiated by
admin at internal-authz.
Jan 31, 2017 10:28:05 AM VM ol65 has recovered from paused back to up.
Jan 31, 2017 10:27:55 AM VM ol65 has been paused due to storage I/O problem.
Jan 31, 2017 10:27:55 AM VM ol65 has been paused.
Jan 31, 2017 10:25:52 AM Host ovmsrv05 was restarted by
admin at internal-authz.
Jan 31, 2017 10:25:52 AM Host ovmsrv05 was started by admin at internal-authz.
Jan 31, 2017 10:25:52 AM Power management start of Host ovmsrv05 succeeded.
Jan 31, 2017 10:25:50 AM Executing power management status on Host ovmsrv05
using Proxy Host ovmsrv06 and Fence Agent ilo:10.4.192.212.
Jan 31, 2017 10:25:37 AM Executing power management start on Host ovmsrv05
using Proxy Host ovmsrv06 and Fence Agent ilo:10.4.192.212.
Jan 31, 2017 10:25:37 AM Power management start of Host ovmsrv05 initiated.
Jan 31, 2017 10:25:37 AM Auto fence for host ovmsrv05 was started.
Jan 31, 2017 10:25:37 AM All VMs' status on Non Responsive Host ovmsrv05
were changed to 'Down' by admin at internal-authz
Jan 31, 2017 10:25:36 AM Host ovmsrv05 was stopped by admin at internal-authz.
Jan 31, 2017 10:25:36 AM Power management stop of Host ovmsrv05 succeeded.
Jan 31, 2017 10:25:34 AM Executing power management status on Host ovmsrv05
using Proxy Host ovmsrv06 and Fence Agent ilo:10.4.192.212.
Jan 31, 2017 10:25:15 AM Executing power management stop on Host ovmsrv05
using Proxy Host ovmsrv06 and Fence Agent ilo:10.4.192.212.
Jan 31, 2017 10:25:15 AM Power management stop of Host ovmsrv05 initiated.
Jan 31, 2017 10:25:12 AM Executing power management status on Host ovmsrv05
using Proxy Host ovmsrv06 and Fence Agent ilo:10.4.192.212.

Watching the timestamps, the culprit seems the reboot time of ovmsrv05 that
detects some LUNs in owned state and other ones in unowned
Full messages of both hosts here:
https://drive.google.com/file/d/0BwoPbcrMv8mvekZQT1pjc0NMRlU/view?usp=sharing
and
https://drive.google.com/file/d/0BwoPbcrMv8mvcjBCYVdFZWdXTms/view?usp=sharing

At this time there are 4 LUNs globally seen by the two hosts but only 1 of
them is currently configured as the only storage domain in oVirt cluster.

[root at ovmsrv05 ~]# multipath -l | grep ^36
3600a0b8000299aa80000d08b55014119 dm-5 IBM     ,1814      FAStT
3600a0b80002999020000cd3c5501458f dm-3 IBM     ,1814      FAStT
3600a0b80002999020000ccf855011198 dm-2 IBM     ,1814      FAStT
3600a0b8000299aa80000d08955014098 dm-4 IBM     ,1814      FAStT

the configured one:
[root at ovmsrv05 ~]# multipath -l 3600a0b8000299aa80000d08b55014119
3600a0b8000299aa80000d08b55014119 dm-5 IBM     ,1814      FAStT
size=4.0T features='0' hwhandler='1 rdac' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| |- 0:0:1:3 sdl 8:176 active undef running
| `- 2:0:1:3 sdp 8:240 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
  |- 0:0:0:3 sdd 8:48  active undef running
  `- 2:0:0:3 sdi 8:128 active undef running

In mesages of booting node, arounf the problem registered by the storage:
[root at ovmsrv05 ~]# grep owned /var/log/messages
Jan 31 10:27:38 ovmsrv05 kernel: scsi 0:0:0:1: rdac: LUN 1 (RDAC) (owned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 0:0:0:2: rdac: LUN 2 (RDAC) (owned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 0:0:0:3: rdac: LUN 3 (RDAC) (unowned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 2:0:0:1: rdac: LUN 1 (RDAC) (owned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 0:0:0:4: rdac: LUN 4 (RDAC) (unowned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 2:0:0:2: rdac: LUN 2 (RDAC) (owned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 0:0:1:1: rdac: LUN 1 (RDAC) (unowned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 2:0:0:3: rdac: LUN 3 (RDAC) (unowned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 2:0:0:4: rdac: LUN 4 (RDAC) (unowned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 0:0:1:2: rdac: LUN 2 (RDAC) (unowned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 2:0:1:1: rdac: LUN 1 (RDAC) (unowned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 0:0:1:3: rdac: LUN 3 (RDAC) (owned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 2:0:1:2: rdac: LUN 2 (RDAC) (unowned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 0:0:1:4: rdac: LUN 4 (RDAC) (owned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 2:0:1:3: rdac: LUN 3 (RDAC) (owned)
Jan 31 10:27:39 ovmsrv05 kernel: scsi 2:0:1:4: rdac: LUN 4 (RDAC) (owned)

I don't know exactly the meaning of owned/unowned in the output above..
Possibly it detects the 0:0:1:3 and 2:0:1:3 paths (those of the active
group) as "owned" and this could have created problems with the active node?

On active node strangely I don't loose all the paths, but the VM has been
paused anyway

[root at ovmsrv06 log]# grep "remaining active path" /var/log/messages
Jan 31 10:27:48 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119:
remaining active paths: 3
Jan 31 10:27:49 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119:
remaining active paths: 2
Jan 31 10:27:56 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119:
remaining active paths: 3
Jan 31 10:27:56 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119:
remaining active paths: 2
Jan 31 10:27:56 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119:
remaining active paths: 1
Jan 31 10:27:57 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119:
remaining active paths: 2
Jan 31 10:28:01 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119:
remaining active paths: 3
Jan 31 10:28:01 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119:
remaining active paths: 4

I'm not an expert of this storage array in particular, and of the rdac
hardware handler in general.

What I see is that multipath.conf on both nodes:

# VDSM REVISION 1.3

defaults {
    polling_interval            5
    no_path_retry               fail
    user_friendly_names         no
    flush_on_last_del           yes
    fast_io_fail_tmo            5
    dev_loss_tmo                30
    max_fds                     4096
}

devices {
    device {
        # These settings overrides built-in devices settings. It does not
apply
        # to devices without built-in settings (these use the settings in
the
        # "defaults" section), or to devices defined in the "devices"
section.
        # Note: This is not available yet on Fedora 21. For more info see
        # https://bugzilla.redhat.com/1253799
        all_devs                yes
        no_path_retry           fail
    }
}

beginning of /proc/scsi/scsi

[root at ovmsrv06 ~]# cat /proc/scsi/scsi
Attached devices:
Host: scsi1 Channel: 01 Id: 00 Lun: 00
  Vendor: HP       Model: LOGICAL VOLUME   Rev: 1.86
  Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi0 Channel: 00 Id: 00 Lun: 01
  Vendor: IBM      Model: 1814      FAStT  Rev: 0916
  Type:   Direct-Access                    ANSI  SCSI revision: 05
...

To get default acquired config for this storage:

multpathd -k
> show config

I can see:

        device {
                vendor "IBM"
                product "^1814"
                product_blacklist "Universal Xport"
                path_grouping_policy "group_by_prio"
                path_checker "rdac"
                features "0"
                hardware_handler "1 rdac"
                prio "rdac"
                failback immediate
                rr_weight "uniform"
                no_path_retry "fail"
        }

and

defaults {
        verbosity 2
        polling_interval 5
        max_polling_interval 20
        reassign_maps "yes"
        multipath_dir "/lib64/multipath"
        path_selector "service-time 0"
        path_grouping_policy "failover"
        uid_attribute "ID_SERIAL"
        prio "const"
        prio_args ""
        features "0"
        path_checker "directio"
        alias_prefix "mpath"
        failback "manual"
        rr_min_io 1000
        rr_min_io_rq 1
        max_fds 4096
        rr_weight "uniform"
        no_path_retry "fail"
        queue_without_daemon "no"
        flush_on_last_del "yes"
        user_friendly_names "no"
        fast_io_fail_tmo 5
        dev_loss_tmo 30
        bindings_file "/etc/multipath/bindings"
        wwids_file /etc/multipath/wwids
        log_checker_err always
        find_multipaths no
        retain_attached_hw_handler no
        detect_prio no
        hw_str_match no
        force_sync no
        deferred_remove no
        ignore_new_boot_devs no
        skip_kpartx no
        config_dir "/etc/multipath/conf.d"
        delay_watch_checks no
        delay_wait_checks no
        retrigger_tries 3
        retrigger_delay 10
        missing_uev_wait_timeout 30
        new_bindings_in_boot no
}

Any hint on how to tune multipath.conf so that a powering on server doesn't
create problems to running VMs?

Thanks in advance,
Gianluca
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ovirt.org/pipermail/users/attachments/20170131/cecb1b03/attachment.html>