[ovirt-devel] [VDSM][sampling] thread pool status and handling of stuck calls

Nir Soffer nsoffer at redhat.com
Mon Jul 7 21:04:37 UTC 2014



----- Original Message -----
> From: "Saggi Mizrahi" <smizrahi at redhat.com>
> To: "Nir Soffer" <nsoffer at redhat.com>, "Francesco Romani" <fromani at redhat.com>
> Cc: devel at ovirt.org, "Federico Simoncelli" <fsimonce at redhat.com>, "Michal Skrivanek" <mskrivan at redhat.com>
> Sent: Monday, July 7, 2014 8:03:35 PM
> Subject: Re: [ovirt-devel] [VDSM][sampling] thread pool status and handling of stuck calls
> 
> 
> 
> ----- Original Message -----
> > From: "Nir Soffer" <nsoffer at redhat.com>
> > To: "Saggi Mizrahi" <smizrahi at redhat.com>
> > Cc: "Francesco Romani" <fromani at redhat.com>, devel at ovirt.org, "Federico
> > Simoncelli" <fsimonce at redhat.com>, "Michal
> > Skrivanek" <mskrivan at redhat.com>
> > Sent: Sunday, July 6, 2014 6:21:35 PM
> > Subject: Re: [ovirt-devel] [VDSM][sampling] thread pool status and handling
> > of stuck calls
> > 
> > ----- Original Message -----
> > > From: "Saggi Mizrahi" <smizrahi at redhat.com>
> > > To: "Francesco Romani" <fromani at redhat.com>
> > > Cc: devel at ovirt.org
> > > Sent: Sunday, July 6, 2014 2:58:38 PM
> > > Subject: Re: [ovirt-devel] [VDSM][sampling] thread pool status and
> > > handling of stuck calls
> > > 
> > > The more I think about it, the more it looks like it's
> > > purely a libvirt issue. As long as we can make calls
> > > that get stuck in D state, we can't scale under stress.
> > 
> > We can scale, but if all libvirt worker threads are blocked, we won't
> > get very far :-)
> > 
> > Does libvirt try to handle such blocked threads?
> Nope, that is why we won't scale, as we just get stuck on libvirt.
> > 
> > > 
> > > In any case, seems like having an interface like.
> > > 
> > > libvirtConnectionPool.request(args, callback)
> > > 
> > > Would be a better solution than a thread pool.
> > > 
> > > It would queue up the request and call the callback
> > > once it's done.
> > > 
> > > pseudo example:
> > > 
> > > def collectStats():
> > >      def callback(resp):
> > >           doStuff(resp)
> > >           # reschedule in 4 seconds; a Timer must be started
> > >           threading.Timer(4, collectStats).start()
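Spelled out, a minimal version of such a pool could look like this
sketch (all names and signatures are invented for illustration, error
handling mostly elided, Python 2 as vdsm uses today):

    import logging
    import threading
    import Queue


    class LibvirtConnectionPool(object):
        """Sketch: serialize libvirt calls through a few dispatcher
        threads and hand each response to a callback."""

        def __init__(self, conn, workers=4):
            self._conn = conn
            self._requests = Queue.Queue()
            for _ in range(workers):
                t = threading.Thread(target=self._dispatch)
                t.daemon = True
                t.start()

        def request(self, func, args, callback):
            # Queue up the call; the callback runs on a dispatcher
            # thread once the response is available.
            self._requests.put((func, args, callback))

        def _dispatch(self):
            while True:
                func, args, callback = self._requests.get()
                try:
                    callback(func(self._conn, *args))
                except Exception:
                    logging.exception("request failed")
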
> > 
> > This starts a new thread for each sample, which is even worse than having
> > a long-living thread per VM, which Francesco is trying to eliminate.
> We could implement a Timer-like class that uses a single preallocated thread.
> It would only run short operations and queue things into a threadpool for
> long operations. That way it's unrelated to the threadpool and can be used
> for any use case where we want to queue an event.

This is exactly what I was thinking about.

It would be nice if you can review it:
http://gerrit.ovirt.org/29607
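
The core of it is a single preallocated thread sleeping until the
nearest deadline; schematically something like this (simplified here,
the actual patch differs in the details):

    import heapq
    import itertools
    import threading
    import time


    class Scheduler(object):
        """One preallocated thread; scheduled calls must be short and
        queue any long operation into a thread pool (sketch only)."""

        def __init__(self):
            self._cond = threading.Condition()
            self._calls = []
            self._ids = itertools.count()  # tie-break equal deadlines
            t = threading.Thread(target=self._run)
            t.daemon = True
            t.start()

        def schedule(self, delay, func):
            with self._cond:
                heapq.heappush(self._calls,
                               (time.time() + delay, next(self._ids), func))
                self._cond.notify()

        def _run(self):
            while True:
                with self._cond:
                    now = time.time()
                    due = []
                    while self._calls and self._calls[0][0] <= now:
                        due.append(heapq.heappop(self._calls)[2])
                    if not due:
                        timeout = (self._calls[0][0] - now
                                   if self._calls else None)
                        self._cond.wait(timeout)
                # run outside the lock so a call may reschedule itself
                for func in due:
                    func()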

> > 
> > We have two issues here:
> > 1. Scalability - can be improved regardless of libvirt
> > 2. Handling stuck calls - may require solution on the libvirt side
> > 
> > Did you see this?
> > https://bugzilla.redhat.com/1112684
> > 
> > We should stop this trend of starting zillions of threads and use more
> > efficient and maintainable solutions.
> agreed.
> > 
> > > 
> > >      lcp.listAllDevices(callback)
> > > 
> > > We could have the timer queue into a threadpool, since it normally
> > > runs in the main thread, but that is orthogonal to the libvirt
> > > connection issue.
> > > 
> > > As for the thread pool itself: as long as it's more than one class,
> > > it's too complicated.
> > > 
> > > Things like:
> > > * Task Cancelling
> > > * Re-queuing (periodic operations)
> > > 
> > > Shouldn't be part of the thread pool.
> > 
> > +1
> > 
> > > 
> > > Tasks should just be functions. If we need something
> > > with state, we could use the __call__() method to make
> > > an object look like a function.
> > 
> > +1
> > 
> > > I also don't mind doing
> > > if hasattr(task, "run") or callable(task) to handle it
> > > using an "interface"
> > 
> > Please don't - there is no reason why a related object cannot
> > have a common interface like __call__.
> I agree, but some people don't like it, and despite my reputation
> I do try to avoid pointless bike-shedding.
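
For the archives, the __call__ approach keeps the pool interface down
to "anything callable"; e.g. (updateStats is an invented name):

    class SampleVmStats(object):
        """A task with state that still looks like a plain function."""

        def __init__(self, vm):
            self._vm = vm

        def __call__(self):
            self._vm.updateStats()

    # the pool treats both uniformly:
    #     threadpool.queue(SampleVmStats(vm))
    #     threadpool.queue(somePlainFunction)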
> > 
> > > 
> > > Re-queuing could be done with the built-in threading.Timer,
> > > as in
> > > 
> > > threading.Timer(4.5, threadpool.queue, args=(self,)).start()
> > > 
> > > That way each operation is responsible for handling
> > > the details of rescheduling:
> > > should we always wait X, or wait X - timeItTookToCalculate?
> > > 
> > > You could also do:
> > > threading.Timer(2, threadpool.queue, args=(self,)).start()
> > > threading.Timer(4, self.handleBeingLate).start()
> > > 
> > > Which would handle not getting queued for a certain amount of time.
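
Putting these pieces together, a task owning its own rescheduling
could look like this sketch (threadpool.queue is the hypothetical pool
interface used above):

    import threading
    import time


    class PeriodicTask(object):
        """Sketch: reschedule ourselves, compensating for our own
        runtime, i.e. wait interval - timeItTookToCalculate."""

        def __init__(self, interval, func, threadpool):
            self._interval = interval
            self._func = func
            self._pool = threadpool

        def __call__(self):
            started = time.time()
            self._func()
            elapsed = time.time() - started
            delay = max(0, self._interval - elapsed)
            t = threading.Timer(delay, self._pool.queue, args=(self,))
            t.start()  # a Timer does nothing until started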
> > > 
> > > Task cancelling can't be done in a generic manner.
> > > The most I think we could do is have threadpool.stop()
> > > check hasattr(task, "stop") and call it.
> > 
> > Please don't
> I agree that forcing "jobs" to stop should be handled outside
> the threadpool. Again, just trying to find common ground.
> > 
> > > 
> > > 
> > > ----- Original Message -----
> > > > From: "Francesco Romani" <fromani at redhat.com>
> > > > To: devel at ovirt.org
> > > > Sent: Friday, July 4, 2014 5:48:59 PM
> > > > Subject: [ovirt-devel] [VDSM][sampling] thread pool status and handling
> > > > of stuck calls
> > > > 
> > > > Hi,
> > > > 
> > > > Nir has begun reviewing my draft patches about the thread pool and
> > > > sampling refactoring (thanks!), and has already suggested quite a few
> > > > improvements, which I'd like to summarize.
> > > > 
> > > > Quick links to the ongoing discussion:
> > > > http://gerrit.ovirt.org/#/c/29191/8/lib/threadpool/worker.py,cm
> > > > http://gerrit.ovirt.org/#/c/29190/4/lib/threadpool/README.rst,cm
> > > > 
> > > > Quick summary of the discussion on gerrit so far:
> > > > 1. extract the scheduling logic from the thread pool. Either add a
> > > >    separate scheduler class, or let the sampling tasks reschedule
> > > >    themselves after a successful completion.
> > > >    Either way, the concept of a 'periodic task', and the added
> > > >    complexity, isn't needed.
> > > > 
> > > > 2. drop all the *queue classes I've added, thus making the package
> > > >    simpler. They are no longer needed since we removed the concept
> > > >    of a periodic task.
> > > > 
> > > > 3. have a per-task timeout; move the stuck task detection elsewhere,
> > > >    like in the worker thread, or maybe better in the aforementioned
> > > >    scheduler.
> > > >    If the scheduler finds that a task started in the previous pass
> > > >    (or even before!) has not yet completed, there is no point in
> > > >    keeping this task alive and it should be cancelled.
> > > > 
> > > > 4. the sampling tasks (or maybe the scheduler) can be smarter and
> > > >    halt the sampling in the presence of unresponsive calls for a
> > > >    given VM, granted that the VM reports its 'health'/responsiveness.
> > > > 
> > > > (Hopefully I haven't forgotten anything big)
> > > > 
> > > > In the draft currently published, I reluctantly added the *queue
> > > > classes, and I agree the periodic task implementation is messy,
> > > > so I'll be very happy to drop them.
> > > > 
> > > > However, a core question still holds: what to do in the presence
> > > > of a stuck task?
> > > > 
> > > > I think it is worth discussing this topic on a medium friendlier
> > > > than gerrit, as it is the single most important decision to make
> > > > in the sampling refactoring.
> > > > 
> > > > It all boils down to:
> > > > Should we just keep stuck threads around somewhere and wait?
> > > > Or should we cancel stuck tasks?
> > > > 
> > > > A. Let's cancel the stuck tasks.
> > > > If we move toward a libvirt connection pool, and we give each worker
> > > > thread in the sampling pool a separate libvirt connection, hopefully
> > > > read-only, then we should be able to cancel a stuck task by killing
> > > > the worker's libvirt connection. We'll still need a (probably much
> > > > simpler) watchdog/supervisor, but no big deal here.
> > > > Libvirt allows closing a connection from a different thread.
> > > > I haven't actually tried to unstick a blocked thread this way, but
> > > > I have no reason to believe it will not work.
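
For what it's worth, the watchdog for option A could indeed stay
small; a sketch (the worker fields are invented, and whether close()
really unblocks a call stuck in D state is exactly the untested part):

    import time


    def supervise(workers, timeout):
        # Each sampling worker owns a read-only libvirt connection
        # (w.conn) and records when its current call started
        # (w.call_started) -- both fields invented here.
        while True:
            now = time.time()
            for w in workers:
                if w.busy and now - w.call_started > timeout:
                    # libvirt allows closing from another thread;
                    # the worker reopens a connection afterwards.
                    w.conn.close()
            time.sleep(timeout / 2.0)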
> > > > 
> > > > B. Let's keep blocked threads around.
> > > > The code as it is just leaves the blocked libvirt call, and the
> > > > worker thread that carried it, frozen. A stuck worker thread can be
> > > > replaced, up to a cap of frozen threads. In the worst-case scenario
> > > > we end up with one (blocked!) thread per VM, as it is today, and
> > > > with no sampling data.
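
Mechanically, option B is simple too; a sketch of the replacement
policy (the pool fields are invented):

    class TooManyStuckWorkers(Exception):
        pass


    def replace_stuck_worker(pool, worker, cap):
        # Keep the frozen thread around: when its call finally
        # returns, we learn the VM is reachable again (and drop
        # the stale result).
        pool.stuck_workers.append(worker)
        if len(pool.stuck_workers) >= cap:
            # worst case: one blocked thread per VM, as today
            raise TooManyStuckWorkers()
        pool.start_worker()  # keep sampling the other VMs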
> > > > 
> > > > I believe that #A has some drawbacks which we risk overlooking,
> > > > while at the same time #B has some merits.
> > > > 
> > > > Let me explain:
> > > > The hardest case is a call blocked in the kernel in D state.
> > > > Libvirt has no more room than VDSM to unblock it; and libvirt
> > > > itself *has* a pool of resources (threads in this case) which can
> > > > be depleted by stuck calls. Actually, retrying a failed task may
> > > > deplete their pool even faster[1].
> > > > 
> > > > I'm not happy to just push this problem down the stack, as it looks
> > > > to me that we gain very little by doing so. VDSM itself surely stays
> > > > cleaner, but the VDS/hypervisor host as a whole improves only a bit:
> > > > libvirt scales better, and that gives us some more room.
> > > > 
> > > > On the other hand, by not reissuing dangerous calls, I believe we
> > > > make better use of the host resources in general. Actually, keeping
> > > > blocked threads around is a side effect of not reattempting blocked
> > > > calls. Moreover, keeping the blocked thread around has a significant
> > > > benefit: we can discover at the earliest possible moment when it is
> > > > safe again to make the blocked call, because the blocked call itself
> > > > returns and we can track this event! (and of course drop the
> > > > now-stale result). Otherwise, if we drop the connection, we lose
> > > > this event and have no option other than trying again and hoping
> > > > for the best[2].
> > > > 
> > > > I know the #B approach is not the cleanest, but I think it has
> > > > slightly more appeal, especially on the libvirt depletion front.
> > > > 
> > > > Thoughts and comments very welcome!
> > > > 
> > > > +++
> > > > 
> > > > [1] They have extensions to the management API to dynamically
> > > > adjust their thread pool and/or to cancel tasks, but that is in
> > > > the RHEL 7.2 timeframe.
> > > > [2] A crazy idea would be to do something like
> > > > http://en.wikipedia.org/wiki/Exponential_backoff
> > > > which I'm not sure would be beneficial.
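
For reference, the backoff idea in [2] would amount to something like
this (parameters made up):

    import time


    def call_with_backoff(call, base=1.0, cap=60.0):
        # Double the retry delay after each failure, up to a cap.
        # NB: this retries failures; it does not help with calls
        # stuck in D state, which never return at all.
        delay = base
        while True:
            try:
                return call()
            except Exception:
                time.sleep(delay)
                delay = min(delay * 2, cap)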
> > > > 
> > > > Bests and thanks,
> > > > 
> > > > --
> > > > Francesco Romani
> > > > RedHat Engineering Virtualization R & D
> > > > Phone: 8261328
> > > > IRC: fromani
> > > > _______________________________________________
> > > > Devel mailing list
> > > > Devel at ovirt.org
> > > > http://lists.ovirt.org/mailman/listinfo/devel
> > > > 
> > > _______________________________________________
> > > Devel mailing list
> > > Devel at ovirt.org
> > > http://lists.ovirt.org/mailman/listinfo/devel
> > > 
> > 
> 


