[ovirt-users] Re: owner of vm paused/unpaused operation

10 Oct 2019

      On 10/10/19 10:44 AM, Gianluca Cecchi wrote:
...
On Thu, Oct 10, 2019 at 9:56 AM Francesco Romani <fromani@redhat.com> wrote:
The only way Vdsm will not pause the VM is if libvirt+qemu never
    reports any ioerror, which is something I'm not sure is possible
    and that I'd never recommend anyway.
Vdsm always tries hard to be super-careful with respect to possible
    data corruption.
OK.
When storage is inaccessible for a few seconds, it is more a 
matter of blocked I/O than of data corruption.
True, but we can only know ex post that the storage was just 
temporarily unavailable, can't we?
...
If no other host powers on the VM I think there is no risk of data 
corruption itself, or at least no more than when you have a physical 
server and for some reason the I/O operations to its physical disks 
(local or on a SAN) are blocked for some tens of seconds.
IMO, a storage unresponsive for tens of seconds is something which 
should be uncommon and very alarming in every circumstance, especially 
for physical servers.

What I'm trying to say is that yes, there probably are ways to sidestep 
this behaviour, but I think this is the wrong direction and adds 
fragility rather than convenience to the system.
...
The host could even power off the VM itself, instead of leaving 
control to the underlying libvirt+qemu.
I see that by default the qemu-kvm process in my oVirt 4.3.6 is 
spawned for every disk with the options:
...,werror=stop,rerror=stop,...
Only for the ide channel of the CD device I have:
...,werror=report,rerror=report,readonly=on
and the manual page for qemu-kvm says:
           werror=action,rerror=action
               Specify which action to take on write and read errors.
               Valid actions are: "ignore" (ignore the error and try to
               continue), "stop" (pause QEMU), "report" (report the error
               to the guest), "enospc" (pause QEMU only if the host disk
               is full; report the error to the guest otherwise). The
               default setting is werror=enospc and rerror=report.
So I think that if I want to modify this behavior at all, I have to 
change the options so that "report" is used for both write and read 
errors on virtual disks.
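
To make the option semantics quoted above concrete, here is a minimal Python sketch that works out the effective write and read error actions for one drive option string. The function names are hypothetical, the defaults and the list of actions are taken only from the manual text quoted in this message, and the parser is deliberately naive: it splits on plain commas and does not handle escaped commas inside values, nor does it check which actions QEMU accepts for each option separately.

```python
# Defaults and valid actions as listed in the quoted qemu-kvm manual text.
DEFAULTS = {"werror": "enospc", "rerror": "report"}
VALID_ACTIONS = {"ignore", "stop", "report", "enospc"}


def effective_error_policy(drive_opts):
    """Return the effective werror/rerror actions for one option string.

    Options that are not given fall back to the documented defaults.
    """
    policy = dict(DEFAULTS)
    for item in drive_opts.split(","):
        key, sep, value = item.partition("=")
        if sep and key in policy:
            if value not in VALID_ACTIONS:
                raise ValueError("unknown action %r for %s" % (value, key))
            policy[key] = value
    return policy


def always_pauses_on_error(drive_opts):
    """True if any read or write error would unconditionally pause QEMU."""
    policy = effective_error_policy(drive_opts)
    return "stop" in policy.values()


if __name__ == "__main__":
    # The two option fragments reported in this thread.
    print(effective_error_policy("werror=stop,rerror=stop"))
    print(effective_error_policy("werror=report,rerror=report,readonly=on"))
```

With the fragments reported above, the virtual disks end up with "stop" for both directions, while the CD device ends up with "report" for both.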
Yep. I don't remember what Engine allows. Worst case you can use a 
hook, but once again this is making things a bit more fragile.
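
As a rough illustration of what such a hook would have to do, the Python sketch below rewrites a libvirt domain XML string so that regular disks use "report" for both write and read errors. It assumes the policy is carried by the `error_policy` and `rerror_policy` attributes of each disk's `<driver>` element in the domain XML; the function name is hypothetical, and how a Vdsm hook obtains and hands back the XML is deliberately left out. It is a sketch of the idea, not a tested hook, and it carries the fragility mentioned above.

```python
import xml.etree.ElementTree as ET


def set_disk_error_policy(domxml, policy="report"):
    """Return domain XML with the given error policy on all regular disks.

    Only <disk device='disk'> elements are touched; CD-ROM and other
    device types are left as they are.
    """
    if policy not in ("report", "ignore", "stop"):
        raise ValueError("unsupported policy: %r" % policy)
    root = ET.fromstring(domxml)
    for disk in root.findall("./devices/disk"):
        # A missing 'device' attribute is treated as a regular disk.
        if disk.get("device", "disk") != "disk":
            continue
        driver = disk.find("driver")
        if driver is None:
            continue  # nothing to adjust; keep the XML untouched
        driver.set("error_policy", policy)   # write errors
        driver.set("rerror_policy", policy)  # read errors
    return ET.tostring(root, encoding="unicode")
```

Setting both attributes explicitly keeps the read and write behaviour visible in the resulting XML rather than relying on one attribute implying the other.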
...
I'm only experimenting to explore the different options for managing 
"temporary" problems at storage level, which often resolve without 
manual action within tens of seconds and are sometimes due to incorrect 
operations at levels managed by other teams (network, storage, etc.).
I think the best option is to improve the current behaviour: learn why Vdsm 
fails to unpause the VM and improve that.

-- 
Francesco Romani
Senior SW Eng., Virtualization R&D
Red Hat
IRC: fromani github: @fromanirh