On 10/8/19 4:06 PM, Gianluca Cecchi wrote:
Hi Gianluca
> Hello,
> I'm running some tests in which I deliberately introduce storage latency
> or failures, to debug and manage how hosts and VMs react.
> Which subsystem/process/daemon is responsible for pausing a VM when
> storage problems arise on the host where the VM is running?
It's Vdsm itself.
ok
> How is the timeout for putting the VM in pause mode determined?
The VM is paused as soon as libvirt, through QEMU, reports an IOError,
to avoid data corruption. When libvirt reports this error depends
largely on the timeouts set in the storage configuration, which is done
at host level, using system tools (i.e. it is not a Vdsm tunable).
For testing, I have set this in the host's multipath.conf:
devices {
    device {
        all_devs yes
        # Set a queueing timeout of 5*28 = 140 seconds,
        # similar to the vSphere APD timeout
        # no_path_retry fail
        no_path_retry 28
        polling_interval 5
    }
}
So it should wait at least 140 seconds before passing the error to the upper layer, correct?
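The 140-second figure comes from multiplying the two settings above. A minimal sketch of that arithmetic follows, assuming (as the comment in the config does) that multipath retries once per polling interval while all paths are down. The function name is hypothetical, and real behaviour can differ, so treat the result as an estimate rather than a guarantee.

```python
def queue_window_seconds(no_path_retry: int, polling_interval: int) -> int:
    """Estimate how long multipath queues I/O once all paths have failed.

    Assumption: one retry per polling interval, so the window is simply
    no_path_retry * polling_interval (both taken from multipath.conf).
    """
    if no_path_retry < 0 or polling_interval <= 0:
        raise ValueError("no_path_retry must be >= 0 and polling_interval > 0")
    return no_path_retry * polling_interval


if __name__ == "__main__":
    # Values from the multipath.conf snippet above.
    print(queue_window_seconds(no_path_retry=28, polling_interval=5))
```

With the commented-out alternative `no_path_retry fail` there is no queueing at all, so this estimate does not apply to it.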
> After clearing the problem, I sometimes see the VM automatically
> un-paused and sometimes not: how is this managed?
I noticed that if I set the disk to virtio-scsi (plain virtio seems to have no configurable timeout and passes the error to the upper layer immediately) and set the VM disk timeout to 180 seconds (through a udev rule), I can block access to the storage for, say, 100 seconds: the host is able to reinstate the paths, and the VM is then always un-paused.
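To check inside the guest that the udev rule really applied the 180-second value, the timeout can be read back from sysfs. This is only a sketch: it assumes the disk appears as a SCSI device (as it does with virtio-scsi), the device name `sda` is a placeholder, and `read_disk_timeout` is a hypothetical helper, not part of any oVirt tool.

```python
from pathlib import Path


def read_disk_timeout(device: str, sysfs_root: str = "/sys") -> int:
    """Return the SCSI command timeout (in seconds) for a block device.

    Reads <sysfs_root>/block/<device>/device/timeout, which exists for
    SCSI disks. sysfs_root is a parameter only to make testing easier.
    """
    timeout_file = Path(sysfs_root) / "block" / device / "device" / "timeout"
    return int(timeout_file.read_text().strip())


if __name__ == "__main__":
    # "sda" is a placeholder; use the guest's actual disk name.
    print(read_disk_timeout("sda"))
```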
But I would like to prevent the VM from pausing at all.
What else can I tweak?
Thanks,
Gianluca