On 10/8/19 4:06 PM, Gianluca Cecchi wrote:
Hi Gianluca
Hello,
I'm doing some tests related to storage latency or problems manually
created to debug and manage reactions of hosts and VMs.
What is the subsystem/process/daemon responsible to pause a VM when
problems arise on storage for the host where the VM is running?
It's Vdsm itself.
How is the timeout to use to put the VM in pause mode determined?
The VM is paused immediately as soon as libvirt, through QEMU, reports
IOError, to avoid data corruption. Now, when libvirt reports this error
depends largely on the timeout set for the storage configuration, which
is done at host level, using system tools (i.e. it is not a Vdsm tunable).
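
To make the libvirt side of this a bit more concrete, here is a minimal
sketch of how one could observe the I/O error events that libvirt emits.
This is not Vdsm's code: the helper names (describe_io_error,
on_io_error, watch) are made up for illustration, and it assumes the
libvirt-python bindings are installed and that a read-only connection to
qemu:///system is allowed to receive events on your host.

```python
# Numeric values mirror libvirt's virDomainEventIOErrorAction enum.
IO_ERROR_NONE = 0     # error was ignored by the hypervisor
IO_ERROR_PAUSE = 1    # guest was paused because of the error
IO_ERROR_REPORT = 2   # error was reported to the guest OS

ACTION_NAMES = {
    IO_ERROR_NONE: "none",
    IO_ERROR_PAUSE: "pause",
    IO_ERROR_REPORT: "report",
}


def describe_io_error(vm_name, dev_alias, action, reason):
    """Format one I/O error event as a single log line (hypothetical helper)."""
    action_name = ACTION_NAMES.get(action, "unknown(%d)" % action)
    return "%s: I/O error on %s, action=%s, reason=%r" % (
        vm_name, dev_alias, action_name, reason)


def on_io_error(conn, dom, src_path, dev_alias, action, reason, opaque):
    """Callback with the signature libvirt uses for IO_ERROR_REASON events."""
    print(describe_io_error(dom.name(), dev_alias, action, reason))


def watch(uri="qemu:///system"):
    """Print I/O error events until interrupted. Needs libvirt-python."""
    import libvirt  # imported here so the helpers above work without it

    # The default event loop must be registered before opening the connection.
    libvirt.virEventRegisterDefaultImpl()
    conn = libvirt.openReadOnly(uri)
    conn.domainEventRegisterAny(
        None,  # None means: all domains on this host
        libvirt.VIR_DOMAIN_EVENT_ID_IO_ERROR_REASON,
        on_io_error,
        None)
    while True:
        libvirt.virEventRunDefaultImpl()
```

Calling watch() on the host while you inject the storage problem should
show when the error actually reaches libvirt, which is the moment that
matters for the pause.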
Sometimes I see after clearing the problems that the VM is
automatically un-paused, sometimes not: how is this managed?
It depends on the error condition that happens. Vdsm tries to recover
automatically when it is safe to do so. When in doubt, Vdsm always plays
it safe wrt user data.
Are there any counters so that if a VM has been paused and the
problems are not solved in a certain timeframe the unpause can be done
only manually by the sysadmin?
AFAIR no, because if Vdsm can't be sure, the only real option is to let
the sysadmin check and decide.
Bests,
--
Francesco Romani
Senior SW Eng., Virtualization R&D
Red Hat
IRC: fromani github: @fromanirh