
On 10/8/19 4:06 PM, Gianluca Cecchi wrote: Hi Gianluca
Hello, I'm doing some tests related to storage latency or manually created problems, to debug and manage the reactions of hosts and VMs. What is the subsystem/process/daemon responsible for pausing a VM when problems arise on storage for the host where the VM is running?
It's Vdsm itself.
How is the timeout used to put the VM in pause mode determined?
The VM is paused as soon as libvirt, through QEMU, reports an IOError, to avoid data corruption. When libvirt reports this error depends largely on the timeout set in the storage configuration, which is done at host level using system tools (i.e. it is not a Vdsm tunable).
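To see from the host whether a VM is paused because of an I/O error, you can ask libvirt for the domain state and reason. This is only a minimal sketch and not what Vdsm does internally: the helper names and the default URI are my own assumptions, the numeric constants are hardcoded copies of libvirt's virDomainState / virDomainPausedReason values, and I'm assuming a read-only libvirt connection is allowed on the host.

```python
# Hardcoded copies of libvirt enum values (assumption: they match your
# libvirt version; libvirt.VIR_DOMAIN_PAUSED and
# libvirt.VIR_DOMAIN_PAUSED_IOERROR are the authoritative names).
VIR_DOMAIN_PAUSED = 3
VIR_DOMAIN_PAUSED_IOERROR = 5


def paused_on_io_error(state, reason):
    """Return True if (state, reason) means 'paused due to an I/O error'."""
    return state == VIR_DOMAIN_PAUSED and reason == VIR_DOMAIN_PAUSED_IOERROR


def check_domain(name, uri="qemu:///system"):
    """Hypothetical helper: query a running host for one VM by name."""
    import libvirt  # imported lazily so the pure check above works without it

    conn = libvirt.openReadOnly(uri)
    try:
        # virDomain.state() returns [state, reason]
        state, reason = conn.lookupByName(name).state()
        return paused_on_io_error(state, reason)
    finally:
        conn.close()
```

Calling check_domain("myvm") on the host (with "myvm" replaced by a real VM name) should return True only while the VM is paused with the I/O error reason.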
Sometimes I see that after clearing the problems the VM is automatically un-paused, sometimes not: how is this managed?
It depends on the error condition that happens. Vdsm tries to recover automatically when it is safe to do so. When in doubt, Vdsm always plays it safe wrt user data.
Are there any counters so that if a VM has been paused and the problems are not solved in a certain timeframe, the unpause can be done only manually by the sysadmin?
AFAIR no, because if Vdsm can't be sure, the only real option is to let the sysadmin check and decide.

Bests,

--
Francesco Romani
Senior SW Eng., Virtualization R&D
Red Hat
IRC: fromani github: @fromanirh