[ovirt-devel] [VDSM][sampling] thread pool status and handling of stuck calls

Nir Soffer nsoffer at redhat.com
Sun Jul 6 15:21:35 UTC 2014


----- Original Message -----
> From: "Saggi Mizrahi" <smizrahi at redhat.com>
> To: "Francesco Romani" <fromani at redhat.com>
> Cc: devel at ovirt.org
> Sent: Sunday, July 6, 2014 2:58:38 PM
> Subject: Re: [ovirt-devel] [VDSM][sampling] thread pool status and handling of	stuck calls
> 
> The more I think about it the more it looks like it's
> purely a libvirt issue. As long as we can make calls
> that get stuck on D state we can't scale under stress.

We can scale, but if all libvirt worker threads are blocked, we won't
get very far :-)

Does libvirt try to handle such blocked threads?

> 
> In any case, it seems like having an interface like:
> 
> libvirtConnectionPool.request(args, callback)
> 
> Would be a better solution than a thread pool.
> 
> It would queue up the request and call the callback
> once it's done.
> 
> pseudo example:
> 
> def collectStats():
>      def callback(resp):
>           doStuff(resp)
>           threading.Timer(4, collectStats)

This starts a new thread for each sample, which is even worse than having a
long-living thread per VM, which is exactly what Francesco is trying to
eliminate.

We have two issues here:
1. Scalability - can be improved regardless of libvirt
2. Handling stuck calls - may require solution on the libvirt side

Did you see this?
https://bugzilla.redhat.com/1112684

We should stop this trend of starting zillions of threads and use more
efficient and maintainable solutions.
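To make this concrete, here is a rough sketch of the kind of design I mean -
one scheduler thread feeding a small fixed pool of workers, instead of a
thread per VM (all names here are made up, not actual VDSM code):

```python
import queue
import threading
import time

class Scheduler:
    """One thread that fires due tasks into a shared work queue."""

    def __init__(self, work_queue):
        self._work_queue = work_queue
        self._pending = []  # list of (due_time, task)
        self._lock = threading.Lock()
        thread = threading.Thread(target=self._run, daemon=True)
        thread.start()

    def schedule(self, delay, task):
        with self._lock:
            self._pending.append((time.monotonic() + delay, task))

    def _run(self):
        while True:
            now = time.monotonic()
            with self._lock:
                due = [task for when, task in self._pending if when <= now]
                self._pending = [(when, task) for when, task in self._pending
                                 if when > now]
            for task in due:
                self._work_queue.put(task)
            time.sleep(0.05)

def worker(work_queue):
    # Workers just run plain callables; no per-VM thread needed.
    while True:
        task = work_queue.get()
        task()

work_queue = queue.Queue()
for _ in range(4):  # small fixed pool, regardless of the number of VMs
    threading.Thread(target=worker, args=(work_queue,), daemon=True).start()

scheduler = Scheduler(work_queue)
```

The total thread count stays constant no matter how many VMs we sample.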

> 
>      lcp.listAllDevices(callback)
> 
> We could have the timer queue in a threadpool since it normally
> runs in the main thread but that is orthogonal to the libvirt
> connection issue.
> 
> As for the thread pool itself. As long as it's more than one class
> it's too complicated.
> 
> Things like:
> * Task Cancelling
> * Re-queuing (periodic operations)
> 
> Shouldn't be part of the thread pool.

+1

> 
> Tasks should just be functions. If we need something
> with a state we could use the __call__() method to make
> an object look like a function.

+1

> I also don't mind doing
> if hasattr(task, "run") or callable(task) to handle it
> using an "interface"

Please don't - there is no reason why related objects cannot share a
common interface like __call__.
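For example, a stateful task that still presents the plain-callable
interface the pool expects (made-up names, not actual VDSM code):

```python
import time

class SampleVmStats:
    """Stateful task that still looks like a plain function to the pool."""

    def __init__(self, vm_id):
        self.vm_id = vm_id
        self.last_run = None

    def __call__(self):
        # The pool invokes task() with no knowledge of the object behind it.
        self.last_run = time.monotonic()
        # ... the actual sampling for self.vm_id would go here ...

task = SampleVmStats("vm-1")
task()  # callable like any function
```

The pool code stays trivial: it only ever sees callables.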

> 
> Re-queuing could be done with the built-in threading.Timer
> as in
> 
> threading.Timer(4.5, threadpool.queue, args=(self,))
> 
> That way each operation is responsible for handling
> the details of rescheduling:
> should we always wait X, or should we wait X - timeItTookToCalculate?
> 
> You could also do:
> threading.Timer(2, threadpool.queue, args=(self,))
> threading.Timer(4, self.handleBeingLate)
> 
> Which would handle not getting queued for a certain amount of time.
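For reference, the interval-minus-runtime variant described above could look
roughly like this (PeriodicTask and pool.queue are made-up names, not actual
VDSM code):

```python
import threading
import time

class PeriodicTask:
    """A task that reschedules itself, waiting interval minus runtime."""

    def __init__(self, pool, interval=4.0):
        self.pool = pool
        self.interval = interval

    def __call__(self):
        start = time.monotonic()
        self.sample()
        elapsed = time.monotonic() - start
        # Wait X - timeItTookToCalculate, never less than zero.
        delay = max(0.0, self.interval - elapsed)
        threading.Timer(delay, self.pool.queue, args=(self,)).start()

    def sample(self):
        pass  # the actual libvirt call would go here
```

Each task owns its own rescheduling policy; the pool knows nothing about
periodicity.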
> 
> Task cancelling can't be done in a generic manner.
> The most I think we could do is have threadpool.stop()
> check hasattr(task, "stop") and call it.

Please don't

> 
> 
> ----- Original Message -----
> > From: "Francesco Romani" <fromani at redhat.com>
> > To: devel at ovirt.org
> > Sent: Friday, July 4, 2014 5:48:59 PM
> > Subject: [ovirt-devel] [VDSM][sampling] thread pool status and handling of
> > 	stuck calls
> > 
> > Hi,
> > 
> > Nir has begun reviewing my draft patches about the thread pool and sampling
> > refactoring (thanks!), and has already suggested quite a few improvements,
> > which I'd like to summarize.
> > 
> > Quick links to the ongoing discussion:
> > http://gerrit.ovirt.org/#/c/29191/8/lib/threadpool/worker.py,cm
> > http://gerrit.ovirt.org/#/c/29190/4/lib/threadpool/README.rst,cm
> > 
> > Quick summary of the discussion on gerrit so far:
> > 1. extract the scheduling logic from the thread pool. Either add a separate
> > scheduler class
> >    or let the sampling tasks reschedule themselves after a successful
> >    completion.
> >    Either way, the concept of 'periodic task', and the added complexity,
> >    isn't needed.
> > 
> > 2. drop all the *queue classes I've added, thus making the package simpler.
> >    They are no longer needed since we remove the concept of periodic task.
> > 
> > 3. have per-task timeouts; move the stuck task detection elsewhere, like in
> >    the worker thread, or maybe better in the aforementioned scheduler.
> >    If the scheduler finds that any task started in the former pass (or even
> >    before!)
> >    has not yet completed, there is no point in keeping this task alive and
> >    it
> >    should be cancelled.
> > 
> > 4. the sampling task (or maybe the scheduler) can be smarter and halt the
> >    sampling in the presence of non-responding calls for a given VM,
> >    provided the VM reports its 'health'/responsiveness.
> > 
> > (Hopefully I haven't forgotten anything big)
> > 
> > In the draft currently published, I reluctantly added the *queue classes,
> > and I agree the periodic task implementation is messy, so I'll be very
> > happy to drop them.
> > 
> > However, a core question still holds: what should we do in the presence
> > of stuck tasks?
> > 
> > I think this topic is worth discussing on a medium friendlier than gerrit,
> > as it is the single most important decision to make in the sampling
> > refactoring.
> > 
> > It all boils down to:
> > should we just keep stuck threads around and wait, or should we cancel
> > stuck tasks?
> > 
> > A. Let's cancel the stuck tasks.
> > If we move toward a libvirt connection pool, and we give each worker
> > thread in the sampling pool a separate libvirt connection, hopefully
> > read-only, then we should be able to cancel a stuck task by killing the
> > worker's libvirt connection. We'll still need a (probably much simpler)
> > watchman/supervisor, but no big deal here.
> > Libvirt allows closing a connection from a different thread.
> > I haven't actually tried to unstick a blocked thread this way, but I have
> > no reason to believe it will not work.
> > 
> > B. Let's keep blocked threads around.
> > The code as it is just leaves a blocked libvirt call, and the worker
> > thread that carried it, frozen.
> > A stuck worker thread can be replaced, up to a cap of frozen threads.
> > In the worst case scenario, we end up with one (blocked!) thread per VM,
> > as it is today, and with no sampling data.
> > 
> > I believe that #A has some drawbacks which we risk overlooking, while at
> > the same time #B has some merits.
> > 
> > Let me explain:
> > The hardest case is a call blocked in the kernel in D state. Libvirt has
> > no more room than VDSM to unblock it, and libvirt itself *has* a pool of
> > resources (threads in this case) which can be depleted by stuck calls.
> > Actually, retrying a failed task may deplete their pool even faster[1].
> > 
> > I'm not happy to just push this problem down the stack, as it looks to me
> > that we gain very little by doing so. VDSM itself surely stays cleaner,
> > but the VDS/hypervisor host as a whole improves just a bit: libvirt scales
> > better, and that gives us some more room.
> > 
> > On the other hand, by not reissuing dangerous calls, I believe we make
> > better use of the host resources in general. Actually, keeping blocked
> > threads around is a side effect of not reattempting blocked calls.
> > Moreover, keeping the blocked thread around has a significant benefit: we
> > can discover at the earliest moment when it is safe again to make the
> > blocked call, because the blocked call itself returns and we can track
> > this event! (and of course drop the now stale result). Otherwise, if we
> > drop the connection, we lose this event and have no option other than
> > trying again and hoping for the best[2].
> > 
> > I know the #B approach is not the cleanest, but I think it has slightly
> > more appeal, especially on the libvirt depletion front.
> > 
> > Thoughts and comments very welcome!
> > 
> > +++
> > 
> > [1] They have extensions to the management API to dynamically adjust
> > their thread pool and/or to cancel tasks, but those are in the RHEL 7.2
> > timeframe.
> > [2] A crazy idea would be to do something like
> > http://en.wikipedia.org/wiki/Exponential_backoff
> > which I'm not sure would be beneficial.
> > 
> > Bests and thanks,
> > 
> > --
> > Francesco Romani
> > RedHat Engineering Virtualization R & D
> > Phone: 8261328
> > IRC: fromani
> > _______________________________________________
> > Devel mailing list
> > Devel at ovirt.org
> > http://lists.ovirt.org/mailman/listinfo/devel
> > 
> 


