On Thu, Oct 6, 2016 at 10:19 AM, Gary Lloyd <g.lloyd@keele.ac.uk> wrote:
I asked on the Dell Storage Forum and they recommend the following:

I recommend not using a numeric value for the "no_path_retry" variable in /etc/multipath.conf: once that number of retries is reached, if no healthy LUNs were discovered during that time, multipath will disable the I/O queue altogether.

I do recommend, however, changing the variable value from "12" (or even "60") to "queue", which will then allow multipathd to continue queuing I/O until a healthy LUN is discovered (the time of fail-over between controllers) and I/O is allowed to flow once again.

Can you see any issues with this recommendation as far as oVirt is concerned?

Yes, we cannot work with an unlimited queue. It will block vdsm for an unlimited
time when the next command tries to access storage. Because we don't have
good isolation between different storage domains, this may cause other storage
domains to become faulty. Also, engine flows that have a timeout will fail with
a timeout.

If you are on 3.x, this will be very painful. On 4.0 it should be better, but it is
still not recommended.
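
For illustration only, a bounded setting could look like the fragment below. This is
a sketch, not a tested recommendation: the value 12 is simply the number quoted
above, and the polling_interval of 5 seconds is an assumed default. Under those
assumptions, I/O is queued for roughly 12 x 5 = 60 seconds after all paths are
lost and then fails, instead of blocking forever as with "queue".

```
# /etc/multipath.conf (illustrative fragment; values are assumptions)
defaults {
    # Seconds between path checks (assumed default)
    polling_interval 5
    # Number of retries before queuing is disabled and I/O fails;
    # a finite number keeps the blocking time bounded, unlike "queue"
    no_path_retry    12
}
```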

Nir