On Thu, Oct 10, 2019 at 9:56 AM Francesco Romani <fromani@redhat.com> wrote:

The only way Vdsm will not pause the VM is if libvirt+qemu never reports any ioerror, which is something I'm not sure is possible and that I'd never recommend anyway.

Vdsm always tries hard to be super-careful with respect to possible data corruption.


OK.
When storage is not accessible for a few seconds, it is more a matter of blocked I/O than of data corruption.
If no other host powers on the VM I think there is no risk of data corruption itself, or at least no more than when you have a physical server and for some reason the I/O operations to its physical disks (local or on a SAN) are blocked for some tens of seconds.
The host could always power off the VM itself, instead of leaving control to the underlying libvirt+qemu.

I see that by default the qemu-kvm process in my oVirt 4.3.6 is spawned with these options for every disk:
...,werror=stop,rerror=stop,...

Only for the ide channel of the CD device I have:
...,werror=report,rerror=report,readonly=on
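To check these settings across many drives, the option strings can be parsed mechanically. Below is a minimal sketch, assuming the usual comma-separated key=value syntax of the -drive option (with ",," as an escaped comma). The function names and the sample strings in the usage note are hypothetical, and the fallback values are the defaults quoted from the qemu-kvm manual page.

```python
def parse_drive_opts(optstr):
    """Split a QEMU-style 'key=value,key=value' option string into a dict."""
    placeholder = "\x00"  # stands in for an escaped comma (',,') while splitting
    opts = {}
    for item in optstr.replace(",,", placeholder).split(","):
        if not item:
            continue
        key, _sep, value = item.partition("=")
        opts[key] = value.replace(placeholder, ",")
    return opts


def error_policy(optstr):
    """Return (werror, rerror) for one drive option string.

    Falls back to the defaults documented in the manual page
    (werror=enospc, rerror=report) when an option is absent.
    """
    opts = parse_drive_opts(optstr)
    return opts.get("werror", "enospc"), opts.get("rerror", "report")
```

For example, error_policy("file=/path/to/disk.img,werror=stop,rerror=stop") gives ("stop", "stop"), while a string without either option falls back to ("enospc", "report").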

and the manual page for qemu-kvm says:

           werror=action,rerror=action
               Specify which action to take on write and read errors. Valid actions are: "ignore"
               (ignore the error and try to continue), "stop" (pause QEMU), "report" (report the
               error to the guest), "enospc" (pause QEMU only if the host disk is full; report
               the error to the guest otherwise).  The default setting is werror=enospc and
               rerror=report.
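To make the four actions concrete, here is a small Python model of the decision described in the manual page excerpt above. It is only an illustration of the documented semantics, not QEMU code; the function name and the returned labels are made up for this sketch.

```python
import errno


def on_io_error(action, err):
    """Model the outcome of a werror/rerror action for a given errno value.

    The returned labels ("continue", "pause", "report-to-guest") are
    hypothetical names used only in this sketch.
    """
    if action == "ignore":
        return "continue"            # ignore the error and try to continue
    if action == "stop":
        return "pause"               # pause QEMU
    if action == "report":
        return "report-to-guest"     # report the error to the guest
    if action == "enospc":
        # pause only if the host disk is full, report otherwise
        return "pause" if err == errno.ENOSPC else "report-to-guest"
    raise ValueError("unknown action: %r" % (action,))
```

With werror=stop, as in my disks, any error (for example errno.EIO from unreachable storage) maps to "pause", whereas with "report" the same error would be passed to the guest.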
 
So I think that if I want to modify this behavior at all, I have to change the options so that "report" is used for both write and read errors on virtual disks.
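At the libvirt level, these options correspond to the error_policy and rerror_policy attributes of the disk <driver> element. Below is a rough sketch of the XML change I have in mind, using only the Python standard library. The sample domain XML, the path, and the function name are made up for illustration, and how such a change would actually be applied in oVirt (Vdsm generates the domain XML) is left open here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed domain XML, for illustration only.
SAMPLE_DOMXML = """<domain type='kvm'>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' error_policy='stop'/>
      <source file='/path/to/disk.img'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw' error_policy='report'/>
      <target dev='hdc' bus='ide'/>
    </disk>
  </devices>
</domain>"""


def set_disk_error_policy(domxml, policy="report"):
    """Set error_policy and rerror_policy on every non-CD virtual disk."""
    root = ET.fromstring(domxml)
    for disk in root.findall("./devices/disk"):
        if disk.get("device") != "disk":
            continue  # leave cdrom and other devices untouched
        driver = disk.find("driver")
        if driver is None:
            continue  # no <driver> element to modify
        driver.set("error_policy", policy)
        driver.set("rerror_policy", policy)
    return ET.tostring(root, encoding="unicode")
```

Calling set_disk_error_policy(SAMPLE_DOMXML) changes only the vda disk, so that both attributes read "report"; the CD device is left as it was.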

I'm only experimenting to see what options are available to manage "temporary" problems at the storage level, which often resolve without manual action within tens of seconds and are sometimes caused by incorrect operations at levels managed by other teams (network, storage, etc.).
In these circumstances, experience has taught me that it is better to "do nothing and wait" than to try to take any action that will fail anyway until the "external" problem has been solved (automatically, thanks to logic outside oVirt's control, or manually).

It would be nice to "mimic" the behavior of vSphere in this sense and I'm investigating possible actions to reach it...

I hope this clarifies a bit the reasoning behind my actions...
Thanks,
Gianluca