On Tue, Oct 8, 2019 at 4:06 PM Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
Hello,
I'm running some tests in which I deliberately introduce storage latency and storage failures, to debug and manage how hosts and VMs react.
Which subsystem/process/daemon is responsible for pausing a VM when storage problems arise on the host where the VM is running?
How is the timeout for putting the VM into the paused state determined?
After clearing the problem, I sometimes see the VM un-paused automatically and sometimes not: how is this managed? Are there any counters, so that if a VM has been paused and the problem is not solved within a certain timeframe, it can only be un-paused manually by the sysadmin?

Thanks in advance,
Gianluca


I have noticed that when the virtual disk is virtio, the VM cannot be un-paused once storage has been unreachable for many seconds, whereas with virtio-scsi and a high virtual disk timeout set in the guest (as vSphere does on VMs when VMware Tools is installed), the VM can be resumed.

The udev rule I have put into a CentOS 7 VM, in /etc/udev/rules.d/99-ovirt.rules, is this one:

# Set timeout of virtio-SCSI disks to 180 seconds like vSphere vmware tools
#
ACTION=="add", SUBSYSTEMS=="scsi", ATTRS{vendor}=="QEMU*", ATTRS{model}=="QEMU HARDDISK*", ENV{DEVTYPE}=="disk", RUN+="/bin/sh -c 'echo 180 > /sys$DEVPATH/device/timeout'"
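
A minimal sketch for checking, from inside the guest, that the rule above took effect. It assumes the virtio-SCSI disks show up as sd* devices and expose the usual device/timeout attribute in sysfs; the function name scsi_disk_timeouts is only illustrative.

```python
from pathlib import Path


def scsi_disk_timeouts(sysfs_root="/sys"):
    """Return {disk name: timeout in seconds} for sd* disks with a SCSI timeout file."""
    timeouts = {}
    for timeout_file in sorted(Path(sysfs_root).glob("block/sd*/device/timeout")):
        disk = timeout_file.parts[-3]  # path is .../block/<disk>/device/timeout
        timeouts[disk] = int(timeout_file.read_text().strip())
    return timeouts


if __name__ == "__main__":
    # With the udev rule applied, each QEMU disk should report 180.
    for disk, seconds in scsi_disk_timeouts().items():
        print(f"{disk}: {seconds}s")
```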

What I have not understood is whether it is possible to prevent vdsm altogether (is it the component responsible?) from suddenly putting the VM into the paused state.
E.g., as an experiment, I have iSCSI-based storage domains and have put this in multipath.conf:

devices {
    device {
        all_devs                yes
# Set timeout of queuing of 5*28 = 140 seconds
# similar to vSphere APD timeout
#        no_path_retry           fail
        no_path_retry           28
        polling_interval        5
    }
}
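
The arithmetic in the comment above can be sketched as follows. This assumes, as the comment does, that the queueing window is roughly no_path_retry multiplied by polling_interval; the helper name queue_seconds is hypothetical, and the special values "fail" and "queue" are handled as no queueing and unbounded queueing respectively.

```python
def queue_seconds(no_path_retry, polling_interval):
    """Approximate how long multipath queues I/O once all paths are down.

    Returns 0 for "fail", None (unbounded) for "queue",
    otherwise retries * polling interval in seconds.
    """
    if no_path_retry == "fail":
        return 0
    if no_path_retry == "queue":
        return None
    return int(no_path_retry) * int(polling_interval)


print(queue_seconds(28, 5))  # 140
```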

Then I create an iptables rule that prevents the host from reaching the storage for 100 seconds, and start a dd task that writes to disk inside the VM.
The effect is that the VM is paused and, after about 100 seconds, it resumes:

VM mydbsrv has recovered from paused back to up. 10/9/19 1:59:02 PM
VM mydbsrv has been paused due to storage I/O problem. 10/9/19 1:57:32 PM
VM mydbsrv has been paused. 10/9/19 1:57:32 PM
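
For reference, the gap between the pause and recovery events above can be computed with a small sketch like this one. It assumes the event timestamps are month/day/year with a 12-hour clock; the helper name paused_seconds is only illustrative.

```python
from datetime import datetime

# Assumed layout of the event timestamps, e.g. "10/9/19 1:57:32 PM"
EVENT_TIME_FORMAT = "%m/%d/%y %I:%M:%S %p"


def paused_seconds(paused_at, resumed_at):
    """Return the number of seconds between the pause and recovery events."""
    start = datetime.strptime(paused_at, EVENT_TIME_FORMAT)
    end = datetime.strptime(resumed_at, EVENT_TIME_FORMAT)
    return (end - start).total_seconds()


print(paused_seconds("10/9/19 1:57:32 PM", "10/9/19 1:59:02 PM"))  # 90.0
```

So the VM stayed paused for 90 seconds according to the events.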

Any hint on how to prevent the VM from being paused?

Thanks,
Gianluca