You need the EQL HIT Kit to make this work at least somewhat better, but
the HIT Kit requires multipathd to be disabled, and multipathd is a
dependency of oVirt.
So far, no real workaround seems to be known.
On 06.10.2016 at 09:19, Gary Lloyd wrote:
I asked on the Dell Storage Forum and they recommend the following:
/I recommend not using a numeric value for the "no_path_retry" variable
within /etc/multipath.conf as once that numeric value is reached, if no
healthy LUNs were discovered during that defined time multipath will
disable the I/O queue altogether./
/I do recommend, however, changing the variable value from "12" (or even
"60") to "queue" which will then allow multipathd to continue queuing
I/O until a healthy LUN is discovered (time of fail-over between
controllers) and I/O is allowed to flow once again./
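To make the difference between a numeric value and "queue" concrete, here
is a minimal Python sketch. It is only a simplified model, not multipath
code: the function name io_outcome and the outage_seconds input are
hypothetical, and it assumes that a numeric no_path_retry queues I/O for
roughly no_path_retry times polling_interval seconds.

```python
def io_outcome(no_path_retry, outage_seconds, polling_interval=5):
    """Simplified model of queued I/O while all paths to a device are down.

    no_path_retry: an int (number of retries), "queue", or "fail".
    outage_seconds: how long all paths stay down (hypothetical input).
    Returns "queued" if the I/O is still waiting when a path comes back,
    or "failed" if it was failed before that.
    """
    if no_path_retry == "queue":
        # Queue with no limit: I/O waits until a path returns.
        return "queued"
    if no_path_retry == "fail":
        # No queueing: I/O fails as soon as all paths are lost.
        return "failed"
    # Numeric value: queue only for the (assumed) retry window.
    window = int(no_path_retry) * polling_interval
    return "queued" if outage_seconds < window else "failed"
```

In this model, io_outcome(12, 90) gives "failed" while io_outcome("queue", 90)
gives "queued". The likely trade-off is that with "queue" the I/O never
fails, so if the storage does not come back, anything waiting on that I/O
may block indefinitely.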
Can you see any issues with this recommendation as far as oVirt is
concerned?
Thanks again
/Gary Lloyd/
________________________________________________
I.T. Systems:Keele University
Finance & IT Directorate
Keele:Staffs:IC1 Building:ST5 5NB:UK
+44 1782 733063
________________________________________________
On 4 October 2016 at 19:11, Nir Soffer <nsoffer(a)redhat.com> wrote:
On Tue, Oct 4, 2016 at 10:51 AM, Gary Lloyd <g.lloyd(a)keele.ac.uk> wrote:
Hi
We have oVirt 3.65 with a Dell EqualLogic SAN and we use Direct
LUNs for all our VMs.
At the weekend, during the early hours, an EqualLogic controller
failed over to its standby on one of our arrays, and this caused
about 20 of our VMs to be paused due to I/O problems.
I have also noticed that this happens during EqualLogic firmware
upgrades since we moved to oVirt 3.65.
As recommended by Dell, disk timeouts within the VMs are set to
60 seconds when they are hosted on an EqualLogic SAN.
Is there any other timeout value that we can configure in
vdsm.conf to stop VMs from getting paused when a controller
fails over?
You can set the timeout in multipath.conf.
With the current multipath configuration (deployed by vdsm), when all
paths to a device are lost (e.g. you take down all ports on the server
during an upgrade), all I/O will fail immediately.
If you want to allow a 60 second grace time in such a case, you can
configure:
no_path_retry 12
This will keep checking the paths 12 times, once every 5 seconds
(assuming polling_interval=5). If some path recovers during this time,
the I/O can complete and the VM will not be paused.
If no path is available after these retries, I/O will fail and VMs
with pending I/O will pause.
Note that this will also cause delays in various vdsm flows, increasing
the chance of timeouts on the engine side, or delays in storage domain
monitoring.
However, the 60 second delay is expected only the first time all paths
become faulty. Once the timeout has expired, any access to the device
will fail immediately.
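The arithmetic behind the value 12 can be sketched in Python. This is
just an illustration: the function names are made up here, and it assumes
the grace time is simply the retry count multiplied by polling_interval.

```python
import math


def grace_seconds(no_path_retry, polling_interval=5):
    """Approximate grace time for a numeric no_path_retry value."""
    return no_path_retry * polling_interval


def retries_for(grace, polling_interval=5):
    """Smallest retry count that covers at least `grace` seconds."""
    return math.ceil(grace / polling_interval)
```

For example, retries_for(60) gives 12 with a 5 second polling interval, and
grace_seconds(12) gives 60 again. If polling_interval is changed, the retry
count needs to be recomputed to keep the same grace time.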
To configure this, you must add the # VDSM PRIVATE tag as the second
line of multipath.conf, otherwise vdsm will overwrite your configuration
the next time you run vdsm-tool configure.
multipath.conf should look like this:
# VDSM REVISION 1.3
# VDSM PRIVATE

defaults {
    polling_interval            5
    no_path_retry               12
    user_friendly_names         no
    flush_on_last_del           yes
    fast_io_fail_tmo            5
    dev_loss_tmo                30
    max_fds                     4096
}

devices {
    device {
        all_devs                yes
        no_path_retry           12
    }
}
This applies a timeout of 12 retries (60 seconds) to every device. If
you would rather configure only your specific device, you can add a
device section for your specific server instead.
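As a rough way to check the tag placement described above before running
vdsm-tool configure, something like the following Python sketch could be
used. It only mirrors the rule as stated in this thread (the tag must be
the second line); the function names are hypothetical, and how vdsm itself
detects the tag is not shown here.

```python
def has_private_tag(conf_text):
    """Return True if the second line of the text is '# VDSM PRIVATE'."""
    lines = conf_text.splitlines()
    return len(lines) >= 2 and lines[1].strip() == "# VDSM PRIVATE"


def check_file(path="/etc/multipath.conf"):
    """Read a multipath.conf file and check the tag placement."""
    with open(path) as f:
        return has_private_tag(f.read())
```

Keeping the check on plain text (has_private_tag) separate from the file
access (check_file) makes it easy to try against a draft of the file
before copying it into place.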
Also, is there anything that we can tweak to automatically
unpause the VMs once connectivity with the arrays is
re-established?
Vdsm will resume the VMs when the storage monitor detects that storage
has become available again.
However, we cannot guarantee that storage monitoring will detect that
storage was down.
This should be improved in 4.0.
At the moment we are running a customized version of
storageServer.py, as oVirt has yet to include iSCSI multipath
support for Direct LUNs out of the box.
Would you like to share this code?
Nir