You can always check the queue_if_no_path multipath.conf option and give it a try.

But if the system that is queueing gets rebooted, the queued I/O is lost -> use at your own risk.

Don't forget that the higher you go in the I/O chain, the higher the timeout needs to be, so your VM should also use multipath with that option, in addition to the host.
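
For illustration, a host-side multipath.conf fragment enabling indefinite queueing might look like this (a sketch; "no_path_retry queue" is the current spelling of the legacy "features 1 queue_if_no_path" setting):

defaults {
    # Queue I/O while all paths are down, instead of failing it
    # up the stack after a fixed number of retries.
    no_path_retry    queue
}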

Still, we can't help you if you use that feature and you lose data.

Best Regards,
Strahil Nikolov

On Oct 10, 2019 10:55, Francesco Romani <fromani@redhat.com> wrote:
On 10/10/19 9:07 AM, Gianluca Cecchi wrote:

> How is the timeout used to put the VM in pause mode determined?


The VM is paused as soon as libvirt, through QEMU, reports an I/O error, to avoid data corruption. When libvirt reports this error depends largely on the timeout set in the storage configuration, which is done at the host level using system tools (i.e. it is not a Vdsm tunable).
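
Concretely, this pause-on-error behaviour corresponds to the disk error policy in the libvirt domain XML; a sketch of the relevant fragment (illustrative, not copied from an oVirt-generated domain) would be:

<disk type='block' device='disk'>
  <!-- error_policy='stop' pauses the guest on I/O errors instead of
       propagating them into the guest, so the VM can be resumed later -->
  <driver name='qemu' type='raw' error_policy='stop' cache='none'/>
</disk>

libvirt also accepts error_policy values such as 'report' and 'ignore', but those propagate or swallow the error instead of pausing, which defeats the data-corruption protection described above.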


For testing I have set this in multipath.conf on the host:

devices {
    device {
        all_devs                yes
        # Set queueing timeout to 5*28 = 140 seconds,
        # similar to the vSphere APD timeout
        # no_path_retry         fail
        no_path_retry           28
        polling_interval        5
    }
}
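
One way to check which values the running daemon actually applied (my suggestion, not part of the original message) is:

    multipathd show config | grep -E 'no_path_retry|polling_interval'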

So it should wait at least 140 seconds (28 retries x 5-second polling interval) before passing the error to the upper layer, correct?


AFAICT yes.

> Sometimes I see, after clearing the problem, that the VM is
> automatically un-paused, and sometimes not: how is this managed?


I noticed that if I set the disk as virtio-scsi (virtio seems to have no definable timeout and passes the error straight to the upper layer) and set the VM disk timeout to 180 seconds through a udev rule, I can block access to the storage for, say, 100 seconds and the host is able to reinstate the paths; the VM is then always unpaused.
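
For reference, a guest-side udev rule along those lines could look like this (an illustrative sketch; the rule file name is hypothetical):

    # /etc/udev/rules.d/99-scsi-timeout.rules
    # Raise the SCSI command timeout of every sd* disk to 180 seconds
    ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd[a-z]*", ATTR{device/timeout}="180"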
But I would like to prevent the VM from pausing at all.
What else can I tweak?


The only way Vdsm will not pause the VM is if libvirt+qemu never report any I/O error, which is something I'm not sure is possible, and that I'd never recommend anyway.

Vdsm always tries hard to be very careful with respect to possible data corruption.


Bests,


-- 
Francesco Romani
Senior SW Eng., Virtualization R&D
Red Hat
IRC: fromani github: @fromanirh