Hi everyone,
Over the last few days I've been working on a concept to reduce the number
of threads in VDSM. Currently, one of the biggest sources of them is the VM statistics
sampling:
we have one thread per VM in charge of it, which obviously scales poorly.
I tried for quite some time to solve this at the libvirt level, but according to my
findings and feelings, that would require a huge patch.
While studying libvirt, I came up with a simple idea which, according to the
initial tests, seems to work quite nicely, and which I'd like to discuss.
The concept is to start with a thread pool
(http://en.wikipedia.org/wiki/Thread_pool)
and extend it with the few additions needed by VDSM:
1. detect and take care of 'stuck' tasks
2. prevent a 'stuck' task from depleting the worker pool
3. avoid starting new threads, and leaking them, because of a 'stuck' task
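To make the three additions above a bit more concrete, here is a minimal sketch in
Python 3. StuckAwarePool, submit(key, func) and check_stuck() are hypothetical names
made up for this illustration; this is NOT the code in lib/threadpool, just one possible
way to get the same properties, assuming every task carries a key (e.g. a VM id) and
that some supervisor calls check_stuck() periodically.

```python
import queue
import threading
import time


class StuckAwarePool:
    """Hypothetical sketch, not VDSM's lib/threadpool.

    1. detects tasks running longer than `timeout` seconds,
    2. replaces a worker blocked on a stuck task, so `size` workers stay usable,
    3. refuses a new task whose key is still in flight, so a stuck task
       cannot make work (and threads) pile up behind it.
    Workers are daemon threads; there is no shutdown logic in this sketch.
    """

    def __init__(self, size, timeout):
        self._timeout = timeout
        self._tasks = queue.Queue()
        self._lock = threading.Lock()
        self._running = {}       # worker thread -> (key, start time)
        self._busy_keys = set()  # keys queued or running
        self._stuck = set()      # workers already detected as stuck and replaced
        for _ in range(size):
            self._spawn_worker()

    def _spawn_worker(self):
        threading.Thread(target=self._work, daemon=True).start()

    def submit(self, key, func):
        """Queue func; return False if a task with the same key is in flight."""
        with self._lock:
            if key in self._busy_keys:
                return False
            self._busy_keys.add(key)
        self._tasks.put((key, func))
        return True

    def _work(self):
        me = threading.current_thread()
        while True:
            key, func = self._tasks.get()
            with self._lock:
                self._running[me] = (key, time.monotonic())
            try:
                func()
            except Exception:
                pass  # a real pool would log the failure
            with self._lock:
                del self._running[me]
                self._busy_keys.discard(key)
                if me in self._stuck:
                    # Replaced while stuck: retire so the pool stays at `size`.
                    self._stuck.discard(me)
                    return

    def check_stuck(self):
        """Replace workers stuck for more than `timeout`; return their keys."""
        now = time.monotonic()
        stuck_keys = []
        with self._lock:
            for worker, (key, started) in self._running.items():
                if now - started > self._timeout and worker not in self._stuck:
                    self._stuck.add(worker)
                    stuck_keys.append(key)
                    self._spawn_worker()  # keep the pool from being depleted
        return stuck_keys
```

Note that a stuck worker is never killed (Python threads cannot be), only replaced; if
the blocked call eventually returns, the old worker retires on its own.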
I failed to find an existing thread pool which had those additions, so I wrote a new
one from scratch:
https://github.com/mojaves/vdsm/tree/master/lib/threadpool
Then I spawned a new subpackage (à la zombiereaper), consolidated into it the existing
thread pool from the storage subsystem, and added some more extensive documentation.
Please note that the storage code has no changes except trivial import fixes.
Lastly, I've also added a small compatibility module for concurrent.futures, a very
nice Python module which provides a convenient interface to asynchronously execute
callables; this module is included in Python since 3.2
(https://docs.python.org/3.3/library/concurrent.futures.html#module-concur...)
and there is a backport for Python 2.x. This could also allow us (as in the virt group)
to consolidate all the long-running async operations using the same interface and code.
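For reference, this is roughly what the concurrent.futures interface looks like, using
the standard-library ThreadPoolExecutor (Python 3; the 2.x backport exposes the same
API). sample_vm and sample_all are hypothetical helpers for illustration, not VDSM code.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def sample_vm(vm_id):
    # Hypothetical stand-in for a per-VM statistics query.
    return {"vm": vm_id, "cpu": 0.0}


def sample_all(vm_ids, executor, timeout):
    """Sample every VM through the shared executor, waiting at most `timeout`."""
    futures = {executor.submit(sample_vm, vm_id): vm_id for vm_id in vm_ids}
    done, not_done = wait(futures, timeout=timeout)
    # Keep only calls that completed without raising.
    results = {futures[f]: f.result() for f in done if f.exception() is None}
    late = sorted(futures[f] for f in not_done)
    return results, late


with ThreadPoolExecutor(max_workers=4) as executor:
    results, late = sample_all(["vm1", "vm2", "vm3"], executor, timeout=1.0)
```

wait() reports the calls that did not finish within the timeout, but plain
concurrent.futures does nothing else about them: the worker stays busy, and leaving the
with block waits for every pending call. That is the gap the additions listed earlier
are meant to fill.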
Please note that I reimplemented a thread pool mostly to be able to experiment with
the concepts listed above, which I could not find implemented in existing packages.
I'm fine with seeing them reimplemented elsewhere, since I now believe they collectively
provide a viable solution for us.
vdsm/virt/sampling.py has been my testbed, and the patch turned out nicely.
This is the bulk of the work:
https://github.com/mojaves/vdsm/commit/2b4c96f9ca3566f0c2f1426beff4400d53...
Here are all the changes; most of them are small adjustments:
https://github.com/mojaves/vdsm/commits/master/vdsm/virt/sampling.py
and here is how it will look:
https://github.com/mojaves/vdsm/blob/master/vdsm/virt/sampling.py
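As a rough illustration of the overall idea, here is a hypothetical sketch of how one
scheduling thread plus a shared pool can replace one sampling thread per VM.
PeriodicSampler, add_vm and dispatch_once are invented names and this is NOT the code in
sampling.py; it assumes a sampling callable that takes a VM id, and it skips a VM whose
previous sample is still running instead of queueing another one behind it.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class PeriodicSampler:
    """Hypothetical sketch: one scheduler thread feeding a shared executor."""

    def __init__(self, executor, interval, sample):
        self._executor = executor
        self._interval = interval
        self._sample = sample      # callable taking a vm_id
        self._vms = set()
        self._pending = {}         # vm_id -> Future of its last sampling run
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)

    def add_vm(self, vm_id):
        with self._lock:
            self._vms.add(vm_id)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def dispatch_once(self):
        """Submit one sampling round (single caller); return the skipped VMs."""
        with self._lock:
            vms = sorted(self._vms)
        skipped = []
        for vm_id in vms:
            prev = self._pending.get(vm_id)
            if prev is not None and not prev.done():
                skipped.append(vm_id)  # previous sample still running
                continue
            self._pending[vm_id] = self._executor.submit(self._sample, vm_id)
        return skipped

    def _loop(self):
        # Event.wait() returns False on timeout and True once stop() is called.
        while not self._stop.wait(self._interval):
            self.dispatch_once()
```

With N VMs this uses one scheduler thread plus the pool's workers, instead of N sampling
threads.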
I'd like the sampling thread mess sorted out in time for 3.6, so please share
your thoughts!
Thanks and best regards,
--
Francesco Romani
RedHat Engineering Virtualization R & D
Phone: 8261328
IRC: fromani