[ovirt-devel] [VDSM][sampling] thread pool status and handling of stuck calls

Thu Jul 10 11:22:22 UTC 2014

----- Original Message -----
> From: "Nir Soffer" <nsoffer at redhat.com>
> To: "Saggi Mizrahi" <smizrahi at redhat.com>
> Cc: "Francesco Romani" <fromani at redhat.com>, devel at ovirt.org, "Federico Simoncelli" <fsimonce at redhat.com>, "Michal
> Skrivanek" <mskrivan at redhat.com>
> Sent: Monday, July 7, 2014 11:04:37 PM
> Subject: Re: [ovirt-devel] [VDSM][sampling] thread pool status and handling of	stuck calls

> > We could implement a Timer like class that uses a single preallocated
> > thread.
> > It would only run short operations and queue things in a threadpool in case
> > of long operations. That way it's unrelated to the threadpool and can be
> > used
> > for any use case where we want to queue an event.
> 
> This is exactly what I was thinking about.
> 
> It would be nice if you can review it:
> http://gerrit.ovirt.org/29607

My review is almost done, but I'd like to bring the discussion to a place friendlier and
more visible than gerrit.

Scheduler (29607): it is nice code and could be a useful building block, but some important
pieces are still out of the picture (the biggest: how to detect blocked calls? How to
react to them?).
On the other hand, I think I got all the bits in my threadpool patchset concept (29189).
I surely recognize that I did it in a very convoluted and unclear manner, but I'm not sure which
will lead to the best outcome: building on 29607 and adding the missing pieces,
or disentangling 29189 and removing all the cruft from there.

I think it could be useful anyway to map the concepts in play in this effort.
At the very least, this can act as a summary/recap and can help us converge on a shared vocabulary
for further discussion, so let me try.

Trying to be semi-formal:

* Sampling:
A snippet of code, usually a method of the Vm class, that updates a set of fields of a Vm object,
describing the state of a running VM, with fresher data. This involves one or more calls to libvirt.

* VM Samplings:
Synonyms: "Samplings for a VM" or "VM sampling set". All the samplings that a VM needs to fully
update its state. It is worth noting that the samplings are
1. periodic
2. run with different periods from each other, because some data changes more frequently than other data
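
To make the two properties above concrete, here is a minimal sketch of a VM sampling set.
The names (Sampling, due_samplings, fast_stat, slow_stat) and the periods are made up for
illustration and are not taken from the VDSM code.

```python
import collections

# Hypothetical record: a sampling has a name, a period in seconds and the
# callable that refreshes the Vm fields. None of this is actual VDSM code.
Sampling = collections.namedtuple('Sampling', 'name period func')


def due_samplings(sampling_set, elapsed):
    """Return the names of the samplings due after `elapsed` whole seconds."""
    return [s.name for s in sampling_set if elapsed % s.period == 0]


# A VM sampling set: each sampling is periodic, with its own period.
vm_samplings = [
    Sampling('fast_stat', 2, lambda: None),
    Sampling('slow_stat', 10, lambda: None),
]
```

With these made-up periods, only fast_stat is due after 4 seconds, while both are due after 10.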

* Safeness/Unsafeness:
In the context of the libvirt calls, and ONLY from the PoV of sampling responsiveness,
it is useful to distinguish between
1. safe: barring yet-to-be-seen catastrophic failures, they don't get stuck or block for an indefinite
amount of time
2. unsafe: they need to enter the QEMU monitor and/or touch storage, and we *know* they can block,
most likely because it has happened in the past.
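
One way this distinction could be used, sketched under assumptions: tag each sampling as safe
or unsafe and skip the unsafe ones while the VM is not responding. The function name and the
sampling names are hypothetical; which real libvirt calls fall into which class is not decided here.

```python
def runnable_samplings(samplings, vm_responsive):
    """Pick the samplings to run for one VM.

    samplings: list of (name, unsafe) pairs; names are made up for the example.
    Unsafe samplings are skipped while the VM is flagged as not responding.
    """
    return [name for name, unsafe in samplings
            if vm_responsive or not unsafe]


# Hypothetical tagging: the second sampling touches storage, so it is unsafe.
samplings = [('safe_stat', False), ('storage_stat', True)]
```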

* Scheduler:
The agent which decides when to invoke a sampling of a VM.
A Scheduler *must* know about wall time and about sampling intervals.
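
A minimal sketch of such an agent, assuming a heap of deadlines and an injectable clock. This is
NOT the code of patch 29607; the class and method names are hypothetical. It only decides what
is due and leaves the execution to someone else.

```python
import heapq
import itertools


class SamplingScheduler(object):
    """Decides when each sampling is due; it never runs a sampling itself."""

    def __init__(self, clock):
        self._clock = clock            # callable returning the current time
        self._queue = []               # heap of (deadline, seq, period, name)
        self._seq = itertools.count()  # tie-breaker for equal deadlines

    def add(self, name, period):
        """Register a periodic sampling; period must be greater than zero."""
        deadline = self._clock() + period
        heapq.heappush(self._queue, (deadline, next(self._seq), period, name))

    def pop_due(self):
        """Return the names due now and reschedule each of them."""
        now = self._clock()
        due = []
        while self._queue and self._queue[0][0] <= now:
            _, _, period, name = heapq.heappop(self._queue)
            due.append(name)
            # Reschedule relative to now, so a late wakeup does not cause
            # a burst of catch-up runs.
            heapq.heappush(
                self._queue, (now + period, next(self._seq), period, name))
        return due
```

Passing time.time (or a monotonic clock) as `clock` gives the wall time knowledge; a fake clock
makes the logic easy to test without sleeping.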

* Worker:
The runnable entity (usually, but not necessarily, a Python thread) which performs a sampling.

In the current solution, we have
1. One thread per VM, which acts both as worker and as scheduler.
2. The scheduler logic is implicit and mixed with the worker code
(see vdsm/virt/sampling.py - AdvancedStatsThread.collect() - sampling.py:~432)
3. The worker/scheduler doesn't handle unsafe calls in any way.
4. Each worker/scheduler carries all the samplings of its VM.
5. As an implicit side effect of 3 and 4 above, if one sampling blocks, all the samplings of that VM
block automatically.
There is some good in this behaviour that we may want to preserve.
6. As a side effect of 1 above, the samplings of different VMs are independent of each other.
If those of one VM block, the others (try to) continue to run.
This is actually a nice behaviour we want to preserve.
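
A simplified sketch of this model, to show where point 5 comes from. It is not the
AdvancedStatsThread code, and the function name and arguments are made up: one loop per VM
runs every sampling serially, so a call that never returns stalls everything queued after it.

```python
import threading


def vm_sampling_loop(samplings, stop, done):
    """One thread per VM: schedules and runs all of its samplings serially.

    samplings: list of (name, func) pairs
    stop:      threading.Event used to end the loop
    done:      list collecting the names of the completed samplings
    """
    while not stop.is_set():
        for name, func in samplings:
            func()           # a blocked call stalls every sampling after it
            done.append(name)
        stop.wait(0.01)      # stand-in for the real sampling interval
```

Since each VM has its own thread running this loop, a stuck sampling only stalls its own VM,
which is the behaviour described in point 6.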

Drawbacks:
a. len(workers) == len(VMs) -> does not scale
b. no detection of blocked unsafe calls
c. no reaction (not even just signaling) to blocked unsafe calls

Niceties:
a. VM samplings are independent of each other
b. VM samplings are related and part of a group

Our ideal solution should have:
1. A fixed and low-ish (~10?) number of threads. Scalable to hundreds/thousands of VMs.
   Must scale roughly like libvirt does (i.e. not be a bottleneck)
2. Explicit scheduler logic, disentangled from the worker
3. Handling of unsafe calls: detect a blocked call and propagate its consequences
   to the VM health. At least mark the VM as not-responding. Better: stop (unsafe only?)
   sampling until the VM returns healthy.
4. Grouping among samplings to allow extension. If an unsafe sampling failed,
   should all the other samplings for different VMs continue to run? What about
   the other unsafe samplings of the same VM?
   Maybe not an immediate need, but nice to have in the near future
5. Do not waste libvirt resources nor just shift the burden onto it.
   Do not issue calls when we know they can fail. Do not deplete the libvirt
   resource pool. Long story short: do not disconnect/reconnect, be nice to libvirt :)
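
A rough sketch of points 1 and 3, under loud assumptions: it uses concurrent.futures from modern
Python, the class and method names are hypothetical, and it is neither patch 29189 nor 29607.
A fixed pool runs the samplings; a call that misses its timeout gets its VM flagged, and further
calls for that VM are refused.

```python
import concurrent.futures
import threading


class SamplingPool(object):
    """Fixed-size pool that flags a VM when one of its samplings gets stuck."""

    def __init__(self, workers=10):
        self._executor = concurrent.futures.ThreadPoolExecutor(
            max_workers=workers)
        self.unresponsive = set()   # ids of the VMs with a blocked call

    def sample(self, vm_id, func, timeout):
        """Run func in the pool; return False if the VM is or becomes stuck."""
        if vm_id in self.unresponsive:
            return False    # do not pile more calls on a VM known to be stuck
        future = self._executor.submit(func)
        try:
            # Simplification: the caller waits here for up to `timeout`.
            future.result(timeout=timeout)
        except concurrent.futures.TimeoutError:
            self.unresponsive.add(vm_id)
            return False
        return True
```

Note what this sketch leaves open: the stuck call still occupies one worker until it returns, the
caller blocks while waiting, and nothing ever clears the flag when the VM returns healthy.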

+++

Since my mail is already too long, I'll just stop here and ask whether my analysis above
is right, especially the description of the ideal solution, and/or whether I missed something big.

Thanks and bests,

-- 
Francesco Romani
RedHat Engineering Virtualization R & D
Phone: 8261328
IRC: fromani