[ovirt-users] VM has been paused due to storage I/O problem
Gianluca Cecchi
gianluca.cecchi at gmail.com
Tue Jan 31 16:09:16 UTC 2017
On Tue, Jan 31, 2017 at 3:23 PM, Nathanaël Blanchet <blanchet at abes.fr>
wrote:
> exactly the same issue by there with FC EMC domain storage...
>
>
I'm trying to mitigate the problem by inserting a timeout for my SAN devices,
but I'm not sure of its effectiveness, because the CentOS 7 behavior of
"multipathd -k" followed by "show config" seems different from CentOS 6.x.
In fact my attempt at multipath.conf is this:
# VDSM REVISION 1.3
# VDSM PRIVATE

defaults {
    polling_interval            5
    no_path_retry               fail
    user_friendly_names         no
    flush_on_last_del           yes
    fast_io_fail_tmo            5
    dev_loss_tmo                30
    max_fds                     4096
}

# Remove devices entries when overrides section is available.
devices {
    device {
        # These settings overrides built-in devices settings. It does not apply
        # to devices without built-in settings (these use the settings in the
        # "defaults" section), or to devices defined in the "devices" section.
        # Note: This is not available yet on Fedora 21. For more info see
        # https://bugzilla.redhat.com/1253799
        all_devs                yes
        no_path_retry           fail
    }

    device {
        vendor                  "IBM"
        product                 "^1814"
        product_blacklist       "Universal Xport"
        path_grouping_policy    "group_by_prio"
        path_checker            "rdac"
        features                "0"
        hardware_handler        "1 rdac"
        prio                    "rdac"
        failback                immediate
        rr_weight               "uniform"
        no_path_retry           "12"
    }
}
So I have put in exactly the built-in device config for my IBM/1814 device,
but with no_path_retry set to 12.
In CentOS 6.x, when you do something like this, "show config" gives you only
the modified entry for your device section.
In CentOS 7.3, instead, I seem to get both the default IBM/1814 entry and the
customized one at the end of the output....
Two facts:
- Before, I could reproduce the problem if I selected
      Maintenance
      Power Mgmt --> Restart
  (tried 3 times with the same behavior).
  If instead I executed the steps separately:
      Maintenance
      Power Mgmt --> Stop
      wait a moment
      Power Mgmt --> Start
  I didn't get problems (tried only one time...).
- With this "new" multipath config (to be confirmed whether it is in effect,
  and how?) I don't get the VM-paused problem even with the Restart option of
  Power Mgmt.
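For what it's worth, a sketch of how one might check whether the new settings
are actually being applied (assuming the WWID from the logs below; substitute
your own):

```shell
# Dump the configuration multipathd is actually running with (the
# interactive "multipathd -k" prompt accepts "show config"; the same
# command can be passed on the command line) and look for our entry:
multipathd -k'show config' | grep -A 12 '"IBM"'

# The live device-mapper table is the authoritative check: with
# no_path_retry > 0 the features field should contain "queue_if_no_path"
# while paths are being retried after a failure:
dmsetup table 3600a0b8000299aa80000d08955014098
```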
In the active host's messages I see these lines while the other host reboots:
Jan 31 16:50:01 ovmsrv06 systemd: Started Session 705 of user root.
Jan 31 16:50:01 ovmsrv06 systemd: Starting Session 705 of user root.
Jan 31 16:53:47 ovmsrv06 multipathd: 3600a0b8000299aa80000d08955014098: sde - rdac checker reports path is up
Jan 31 16:53:47 ovmsrv06 multipathd: 8:64: reinstated
Jan 31 16:53:47 ovmsrv06 multipathd: 3600a0b8000299aa80000d08955014098: load table [0 41943040 multipath 1 queue_if_no_path 1 rdac 2 1 service-time 0 2 1 8:224 1 65:0 1 service-time 0 2 1 8:64 1 8:160 1]
Jan 31 16:53:47 ovmsrv06 multipathd: 3600a0b8000299aa80000d08955014098: sdo - rdac checker reports path is ghost
Jan 31 16:53:47 ovmsrv06 multipathd: 8:224: reinstated
Jan 31 16:53:47 ovmsrv06 multipathd: 3600a0b8000299aa80000d08955014098: sdk - rdac checker reports path is up
Jan 31 16:53:47 ovmsrv06 multipathd: 8:160: reinstated
Jan 31 16:53:47 ovmsrv06 kernel: sd 0:0:1:4: rdac: array Z1_DS4700, ctlr 1, queueing MODE_SELECT command
Jan 31 16:53:47 ovmsrv06 multipathd: 3600a0b8000299aa80000d08955014098: sdq - rdac checker reports path is ghost
Jan 31 16:53:47 ovmsrv06 multipathd: 65:0: reinstated
Jan 31 16:53:48 ovmsrv06 kernel: sd 0:0:1:4: rdac: array Z1_DS4700, ctlr 1, MODE_SELECT returned with sense 05/91/36
Jan 31 16:53:48 ovmsrv06 kernel: sd 0:0:1:4: rdac: array Z1_DS4700, ctlr 1, queueing MODE_SELECT command
Jan 31 16:53:49 ovmsrv06 kernel: sd 0:0:1:4: rdac: array Z1_DS4700, ctlr 1, MODE_SELECT returned with sense 05/91/36
Jan 31 16:53:49 ovmsrv06 kernel: sd 0:0:1:4: rdac: array Z1_DS4700, ctlr 1, queueing MODE_SELECT command
Jan 31 16:53:49 ovmsrv06 kernel: sd 0:0:1:4: rdac: array Z1_DS4700, ctlr 1, MODE_SELECT completed
Jan 31 16:53:49 ovmsrv06 kernel: sd 2:0:1:4: rdac: array Z1_DS4700, ctlr 1, queueing MODE_SELECT command
Jan 31 16:53:49 ovmsrv06 kernel: sd 2:0:1:4: rdac: array Z1_DS4700, ctlr 1, MODE_SELECT completed
Jan 31 16:53:52 ovmsrv06 multipathd: 3600a0b8000299aa80000d08955014098: sde - rdac checker reports path is ghost
Jan 31 16:53:52 ovmsrv06 multipathd: 8:64: reinstated
Jan 31 16:53:52 ovmsrv06 multipathd: 3600a0b8000299aa80000d08955014098: load table [0 41943040 multipath 1 queue_if_no_path 1 rdac 2 1 service-time 0 2 1 8:224 1 65:0 1 service-time 0 2 1 8:64 1 8:160 1]
Jan 31 16:53:52 ovmsrv06 multipathd: 3600a0b8000299aa80000d08955014098: sdo - rdac checker reports path is up
Jan 31 16:53:52 ovmsrv06 multipathd: 8:224: reinstated
Jan 31 16:53:52 ovmsrv06 multipathd: 3600a0b8000299aa80000d08955014098: sdk - rdac checker reports path is ghost
Jan 31 16:53:52 ovmsrv06 multipathd: 8:160: reinstated
Jan 31 16:53:52 ovmsrv06 multipathd: 3600a0b8000299aa80000d08955014098: sdq - rdac checker reports path is up
Jan 31 16:53:52 ovmsrv06 multipathd: 65:0: reinstated
But in this case they are not related to the multipath device dedicated to
the oVirt storage domain....
What makes me optimistic is the difference in these lines.
Before I got:
Jan 31 10:27:47 ovmsrv06 multipathd: 3600a0b8000299aa80000d08955014098: load table [0 41943040 multipath 0 1 rdac 2 1 service-time 0 2 1 8:224 1 65:0 1 service-time 0 2 1 8:64 1 8:160 1]
Now I get:
Jan 31 16:53:52 ovmsrv06 multipathd: 3600a0b8000299aa80000d08955014098: load table [0 41943040 multipath 1 queue_if_no_path 1 rdac 2 1 service-time 0 2 1 8:224 1 65:0 1 service-time 0 2 1 8:64 1 8:160 1]
That is:
multipath 0 1 rdac
vs
multipath 1 queue_if_no_path 1 rdac
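The difference can also be read mechanically: after "multipath", the table
line carries a feature count followed by that many feature words, so "0"
means fail I/O when all paths are down, while "1 queue_if_no_path" means
queue it. A minimal sketch of that check (the hypothetical helper
has_queueing is mine; the two table lines are the ones quoted above):

```shell
#!/bin/sh
# Report whether a dm-multipath table line (as printed by
# "dmsetup table <wwid>") has the queue_if_no_path feature set,
# i.e. whether I/O is queued instead of failed when all paths are down.
has_queueing() {
    case "$1" in
        *queue_if_no_path*) echo yes ;;
        *)                  echo no  ;;
    esac
}

# The two table lines observed before and after the config change:
old='0 41943040 multipath 0 1 rdac 2 1 service-time 0 2 1 8:224 1 65:0 1 service-time 0 2 1 8:64 1 8:160 1'
new='0 41943040 multipath 1 queue_if_no_path 1 rdac 2 1 service-time 0 2 1 8:224 1 65:0 1 service-time 0 2 1 8:64 1 8:160 1'

has_queueing "$old"   # prints "no"  -> I/O fails as soon as all paths drop
has_queueing "$new"   # prints "yes" -> I/O is queued while no_path_retry runs
```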
Any confirmation?
Thanks in advance