[ovirt-devel] [VDSM][sampling] thread pool status and handling of stuck calls

Saggi Mizrahi smizrahi at redhat.com
Mon Jul 7 17:03:35 UTC 2014



----- Original Message -----
> From: "Nir Soffer" <nsoffer at redhat.com>
> To: "Saggi Mizrahi" <smizrahi at redhat.com>
> Cc: "Francesco Romani" <fromani at redhat.com>, devel at ovirt.org, "Federico Simoncelli" <fsimonce at redhat.com>, "Michal
> Skrivanek" <mskrivan at redhat.com>
> Sent: Sunday, July 6, 2014 6:21:35 PM
> Subject: Re: [ovirt-devel] [VDSM][sampling] thread pool status and handling of stuck calls
> 
> ----- Original Message -----
> > From: "Saggi Mizrahi" <smizrahi at redhat.com>
> > To: "Francesco Romani" <fromani at redhat.com>
> > Cc: devel at ovirt.org
> > Sent: Sunday, July 6, 2014 2:58:38 PM
> > Subject: Re: [ovirt-devel] [VDSM][sampling] thread pool status and handling
> > of stuck calls
> > 
> > The more I think about it, the more it looks like it's
> > purely a libvirt issue. As long as we can make calls
> > that get stuck in D state, we can't scale under stress.
> 
> We can scale, but if all libvirt worker threads are blocked, we won't
> get very far :-)
> 
> Does libvirt try to handle such blocked threads?
Nope, and that is why we won't scale: we just get stuck on libvirt.
> 
> > 
> > In any case, it seems like having an interface like:
> > 
> > libvirtConnectionPool.request(args, callback)
> > 
> > Would be a better solution than a thread pool.
> > 
> > It would queue up the request and call the callback
> > once it's done.
> > 
> > pseudo example:
> > 
> > def collectStats():
> >      def callback(resp):
> >           doStuff(resp)
> >           # the timer must be started, not just constructed
> >           threading.Timer(4, collectStats).start()
> 
> This starts a new thread for each sample, which is even worse than having a
> long-lived thread per VM, which Francesco is trying to eliminate.
We could implement a Timer-like class that uses a single preallocated thread.
It would run short operations directly and queue long operations into a
threadpool. That way it's unrelated to the threadpool and can be used for
any use case where we want to queue an event.
> 
> We have two issues here:
> 1. Scalability - can be improved regardless of libvirt
> 2. Handling stuck calls - may require a solution on the libvirt side
> 
> Did you see this?
> https://bugzilla.redhat.com/1112684
> 
> We should stop this trend of starting zillions of threads and use more
> efficient and maintainable solutions.
Agreed.
> 
> > 
> >      lcp.listAllDevices(callback)
> > 
> > We could have the timer queue into a threadpool, since it normally
> > runs in the main thread, but that is orthogonal to the libvirt
> > connection issue.
> > 
> > As for the thread pool itself: as long as it's more than one class,
> > it's too complicated.
> > 
> > Things like:
> > * Task Cancelling
> > * Re-queuing (periodic operations)
> > 
> > Shouldn't be part of the thread pool.
> 
> +1
> 
> > 
> > Tasks should just be functions. If we need something
> > with a state, we could use the __call__() method to make
> > an object look like a function.
> 
> +1
> 
> > I also don't mind doing
> > if hasattr(task, "run") or callable(task) to handle it
> > using an "interface"
> 
> Please don't - there is no reason why related objects cannot
> have a common interface like __call__.
I agree, but some people don't like it, and despite my reputation
I do try to avoid pointless bike-shedding.
> 
> > 
> > Re-queuing could be done with the built-in threading.Timer,
> > as in:
> > 
> > threading.Timer(4.5, threadpool.queue, args=(self,)).start()
> > 
> > That way each operation is responsible for handling
> > the details of rescheduling:
> > should we always wait X, or should we wait X - timeItTookToCalculate?
> > 
> > You could also do:
> > threading.Timer(2, threadpool.queue, args=(self,)).start()
> > threading.Timer(4, self.handleBeingLate).start()
> > 
> > Which would handle not getting queued within a certain amount of time.
> > 
> > Task cancelling can't be done in a generic manner.
> > The most I think we could do is have threadpool.stop()
> > check hasattr(task, "stop") and call it.
> 
> Please don't
I agree that forcing "jobs" to stop should be handled outside
the threadpool. Again, just trying to find common ground.
> 
> > 
> > 
> > ----- Original Message -----
> > > From: "Francesco Romani" <fromani at redhat.com>
> > > To: devel at ovirt.org
> > > Sent: Friday, July 4, 2014 5:48:59 PM
> > > Subject: [ovirt-devel] [VDSM][sampling] thread pool status and handling
> > > of stuck calls
> > > 
> > > Hi,
> > > 
> > > Nir has begun reviewing my draft patches about the thread pool and
> > > sampling refactoring (thanks!), and has already suggested quite a few
> > > improvements, which I'd like to summarize.
> > > 
> > > Quick links to the ongoing discussion:
> > > http://gerrit.ovirt.org/#/c/29191/8/lib/threadpool/worker.py,cm
> > > http://gerrit.ovirt.org/#/c/29190/4/lib/threadpool/README.rst,cm
> > > 
> > > Quick summary of the discussion on gerrit so far:
> > > 1. extract the scheduling logic from the thread pool. Either add a
> > >    separate scheduler class, or let the sampling tasks reschedule
> > >    themselves after a successful completion.
> > >    Either way, the concept of a 'periodic task', and the added
> > >    complexity, isn't needed.
> > > 
> > > 2. drop all the *queue classes I've added, thus making the package
> > >    simpler. They are no longer needed once we remove the concept of a
> > >    periodic task.
> > > 
> > > 3. have a per-task timeout, and move the stuck task detection
> > >    elsewhere, like in the worker thread, or maybe better in the
> > >    aforementioned scheduler (see the sketch after this list).
> > >    If the scheduler finds that any task started in the previous pass
> > >    (or even before!) has not yet completed, there is no point in
> > >    keeping this task alive and it should be cancelled.
> > > 
> > > 4. the sampling task (or maybe the scheduler) can be smarter and halt
> > >    the sampling in the presence of unresponsive calls for a given VM,
> > >    provided the VM reports its 'health'/responsiveness.
> > > 
> > > (Hopefully I haven't forgotten anything big)
> > > 
> > > In the draft currently published, I reluctantly added the *queue
> > > classes, and I agree the periodic task implementation is messy, so
> > > I'll be very happy to drop them.
> > > 
> > > However, a core question still holds: what to do in the presence of a
> > > stuck task?
> > > 
> > > I think it is worth discussing this topic on a medium friendlier than
> > > gerrit, as this is the single most important decision to make in the
> > > sampling refactoring.
> > > 
> > > It all boils down to:
> > > Should we just keep stuck threads around somewhere and wait? Should
> > > we cancel stuck tasks?
> > > 
> > > A. Let's cancel the stuck tasks.
> > > If we move toward a libvirt connection pool, and we give each worker
> > > thread in the sampling pool a separate libvirt connection, hopefully
> > > read-only, then we should be able to cancel a stuck task by killing
> > > the worker's libvirt connection. We'll still need a (probably much
> > > simpler) watchman/supervisor, but no big deal here.
> > > Libvirt allows closing a connection from a different thread.
> > > I haven't actually tried to unstick a blocked thread this way, but I
> > > have no reason to believe it will not work.
> > > 
> > > B. Let's keep blocked threads around.
> > > The code as it is just leaves a blocked libvirt call, and the worker
> > > thread that carried it, frozen.
> > > A stuck worker thread can be replaced, up to a cap of frozen threads.
> > > In this worst-case scenario we end up with one (blocked!) thread per
> > > VM, as it is today, and with no sampling data.
> > > 
> > > I believe that #A has some drawbacks which we risk overlooking, while
> > > at the same time #B has some merits.
> > > 
> > > Let me explain:
> > > The hardest case is a call blocked in the kernel in D state. Libvirt
> > > has no more room than VDSM to unblock it; and libvirt itself *has* a
> > > pool of resources (threads in this case) which can be depleted by
> > > stuck calls. Actually, retrying a failed task may deplete their pool
> > > even faster[1].
> > > 
> > > I'm not happy to just push this problem down the stack, as it looks
> > > to me that we gain very little by doing so. VDSM itself surely stays
> > > cleaner, but the VDS/hypervisor host as a whole improves just a bit:
> > > libvirt scales better, and that gives us some more room.
> > > 
> > > On the other hand, by avoiding reissuing dangerous calls, I believe
> > > we make better use of the host resources in general. Actually,
> > > keeping blocked threads around is a side effect of not reattempting
> > > blocked calls. Moreover, keeping the blocked thread around has a
> > > significant benefit: we can discover at the earliest possible moment
> > > when it is safe to do the blocked call again, because the blocked
> > > call itself returns and we can track this event! (and of course drop
> > > the now-stale result). Otherwise, if we drop the connection, we lose
> > > this event and have no option other than trying again and hoping for
> > > the best[2].
> > > 
> > > I know the #B approach is not the cleanest, but I think it has
> > > slightly more appeal, especially on the libvirt depletion front.
> > > 
> > > Thoughts and comments very welcome!
> > > 
> > > +++
> > > 
> > > [1] They have extensions to the management API to dynamically adjust
> > > their thread pool and/or to cancel tasks, but those are in the
> > > RHEL 7.2 timeframe.
> > > [2] A crazy idea would be to do something like
> > > http://en.wikipedia.org/wiki/Exponential_backoff
> > > though I'm not sure it would be beneficial.
> > > 
> > > Bests and thanks,
> > > 
> > > --
> > > Francesco Romani
> > > RedHat Engineering Virtualization R & D
> > > Phone: 8261328
> > > IRC: fromani
> 


