[ovirt-devel] [VDSM][sampling] thread pool status and handling of stuck calls

Sun Jul 13 15:43:28 UTC 2014


----- Original Message -----
> From: "Nir Soffer" <nsoffer at redhat.com>
> To: "Saggi Mizrahi" <smizrahi at redhat.com>
> Cc: "Francesco Romani" <fromani at redhat.com>, devel at ovirt.org, "Michal Skrivanek" <mskrivan at redhat.com>, "Federico
> Simoncelli" <fsimonce at redhat.com>, "Dan Kenigsberg" <danken at redhat.com>
> Sent: Sunday, July 13, 2014 2:34:50 PM
> Subject: Re: [ovirt-devel] [VDSM][sampling] thread pool status and handling of stuck calls
> 
> ----- Original Message -----
> > From: "Saggi Mizrahi" <smizrahi at redhat.com>
> > To: "Nir Soffer" <nsoffer at redhat.com>, "Francesco Romani"
> > <fromani at redhat.com>
> > Cc: devel at ovirt.org, "Michal Skrivanek" <mskrivan at redhat.com>, "Federico
> > Simoncelli" <fsimonce at redhat.com>, "Dan
> > Kenigsberg" <danken at redhat.com>
> > Sent: Sunday, July 13, 2014 1:03:20 PM
> > Subject: Re: [ovirt-devel] [VDSM][sampling] thread pool status and handling
> > of stuck calls
> > 
> > Hi, I will find time to go in depth in both patches
> > but I still think that this is solving the problem in
> > the wrong place.
> > 
> > You want to have the libvirt connection management
> > code to be separate from the sampling.
> 
> It is - we did not suggest changing libvrit connections.
> 
> > You want to have a libvirt connection pull that you can
> > Queue requests and track their process. The connection
> > pool would have, internally, a thread pool with the same
> > amount of threads as libvirt has (read libvirt conf?).
> > The amount of connections is unimportant if I understand
> > libvirt's design correctly.
> 
> If we don't need multiple connections, then what the connection
> pool does? I guess that you don't mean *connection* pool.
> 
> > By giving to the rest of VDSM
> > an Async view of libvirt operations we gain few things:
> > 
> > A) All call to libvirt enjoy these features even if they
> > are not sampling.
> > B) We simplify the sampling code.
> > 
> > So lets assume we have an global libvirtConnectionPool
> > 
> > The api is something like:
> > lcp.<nameOfOperation>(<args>) # returns a promise
> > 
> > for example:
> > promise = lcp.listAllDomains()
> > 
> > if promise.wait(timeout):
> >    try:
> >        promise.cancel()
> >    except InProgress:
> >        # handle
> >    except AlreadyFninished:
> >        pass
> 
> So what you suggest is actually adding a timeout for libvirt calls, without
> heaving such timeout on the libvirt side. If we had this on the libvirt,
> side, we would not have to add anything, right?
yes
> 
> The current code that blocks on libvirt call will block on a condition
> variable, and the real call will block on libvirt in a worker thread.
This is what the callback is for. You would not wait on anything and the
once the call is done the callback will be called.
> 
> If the worker thread finish before the timeout, the calling code will
> get the result, and if it is stuck or finish after the timeout, the calling
> code will get a timeout.
timeout is optional, in the example I gave I just queue a sched event
to put something in the log.
> 
> This is nice, but we are left with the worker thread stuck on libvirt, or
> if it finished after the timeout, we lost the result of the call.
> 
> This design imply 2 threads - the calling thread, and the worker threads,
> so it does not scale and cannot be used for sampling, where we like to have
> only one thread triggering sampling calls, and the worker threads executing
> them.
> 
> For the api calls, where we have a connection/request thread running libvirt
> code, this adds a timeout using small additional cost (few other worker
> threads),
> and a condition variable and lock for each worker thread.
No the point is having a thread in VDSM per available thread in libvirt.
So if for cases where all libvirt threads are stuck, all of our threads
are stuck but the rest of the system can handle it.
Sampling will just log every few minutes just for records sake but other
operations might return an error since it took too long for the operation
to get to libvirt.
> 
> Michal, Francesco: do we want timeout on libvrit calls, leaving threads stuck
> on libvirt, or do we like to wait for libvirt call to return?
> 
> > 
> > Sampling would be done with the scheduler
> > and a separate thread pool would be used
> > to handle the response or lack there of.
> 
> For sampling, we came to the conclusion that we don't want to abort a call
> after
> a timeout, but we want to stop sampling for this vm until the call return.
> So having a timeout is a non-goal.
> 
> > 
> > def sample()
> >    p = lcp.doSampling()
> >    # setCallback() calls the callback immediately
> >    # if p is already done.
> >    p.setCallaback(onDoneSampling, args=(p,))
> >    sched.schedule(timeout, onSamplingTimeout, args=(p,))
> > 
> > def onSamplingTimeout(p)
> >    status = p.getStatus()
> >    if status == InProgress:
> >        # In flight, so it's probably stuck
> >        # maybe one day we would be able to do something
> >        # other than log and weep
> >        log.warning("Sampling of vm X is stuck :(")
> >        sched.schedule(timeout, onDoneSampling=(p,))
> >    elif status Finished:
> >        # Race, finished just in time, will be handled
> >        # in the callback shortly
> >        pass
> >    else:
> >        log.warning("libvirt under heavy load :(")
> > 
> > 
> > def onDoneSampling(p)
> >        threadpool.queue(doParsing(p.response())
> > 
> > 
> > That way the threadpool, sampling, and libvirt connection
> > management are all separate. Also, we can hide how we
> > optimize libvirt connections\threads and have it orthogonal
> > to how the sampling works.
> 
> The current patches do not change libvirt connection management
> this is orthogonal issue. They are only about changing the way
> we do sampling.
As I've been saying, I think the problem is in actually in the
libvirt connection management and not the stats operations. 
> 
> > 
> > We could have more operations free the threadpool thread they
> > are using if they are waiting for libvirt ops.
> 
> - You have a worker thread stuck on libvirt api call.
>   On the libvirt side, libvirt worker thread is blocked on qemu
>   monitor, which is blocked in the kernel in uninterruptible wait.
> - You run code on another thread detecting that the first thread is stuck
>   This is what Frencesco is suggesting - have a supervisor thread that
>   handle stuck threads.
> 
> Now what are you going to do about it?
> 
> > 
> > If we ever get async APIs we could just have the
> > libvirtConnectionManagement layer handle that for us and keep
> > on using the same interface internally.
> 
> I think that what you suggest is orthogonal to the sampling problem that
> Francesco and me are trying to tackle.
> 
> > 
> > ----- Original Message -----
> > > From: "Nir Soffer" <nsoffer at redhat.com>
> > > To: "Francesco Romani" <fromani at redhat.com>
> > > Cc: devel at ovirt.org, "Michal Skrivanek" <mskrivan at redhat.com>, "Federico
> > > Simoncelli" <fsimonce at redhat.com>, "Saggi
> > > Mizrahi" <smizrahi at redhat.com>, "Dan Kenigsberg" <danken at redhat.com>
> > > Sent: Saturday, July 12, 2014 8:07:27 PM
> > > Subject: Re: [ovirt-devel] [VDSM][sampling] thread pool status and
> > > handling
> > > of stuck calls
> > > 
> > > ----- Original Message -----
> > > > From: "Francesco Romani" <fromani at redhat.com>
> > > > To: devel at ovirt.org
> > > > Cc: "Nir Soffer" <nsoffer at redhat.com>, "Michal Skrivanek"
> > > > <mskrivan at redhat.com>, "Federico Simoncelli"
> > > > <fsimonce at redhat.com>, "Saggi Mizrahi" <smizrahi at redhat.com>, "Dan
> > > > Kenigsberg" <danken at redhat.com>
> > > > Sent: Saturday, July 12, 2014 1:59:22 PM
> > > > Subject: Re: [ovirt-devel] [VDSM][sampling] thread pool status and
> > > > handling
> > > > of stuck calls
> > > > 
> > > > (I'm continuing from here but this probably deserves a new thread.
> > > > However.)
> > > > 
> > > > ----- Original Message -----
> > > > > From: "Federico Simoncelli" <fsimonce at redhat.com>
> > > > > To: devel at ovirt.org
> > > > > Cc: "Nir Soffer" <nsoffer at redhat.com>, "Michal Skrivanek"
> > > > > <mskrivan at redhat.com>, "Adam Litke" <alitke at redhat.com>,
> > > > > "Francesco Romani" <fromani at redhat.com>
> > > > > Sent: Wednesday, July 9, 2014 4:57:53 PM
> > > > > Subject: Re: [ovirt-devel] [VDSM][sampling] thread pool status and
> > > > > handling
> > > > > of stuck calls
> > > > 
> > > > > > > So basically the current threading model is the behavior we want?
> > > > > > > 
> > > > > > > If some call get stuck, stop sampling this vm. Continue when the
> > > > > > > call returns.
> > > > > > > 
> > > > > > > Michal? Federico?
> > > > > > 
> > > > > > Yep - but with less threads, and surely with a constant number of
> > > > > > them.
> > > > > > Your schedule library (review in my queue at very high priority) is
> > > > > > indeed
> > > > > > a nice step in this direcation.
> > > > > > 
> > > > > > Waiting for Federico's ack.
> > > > > 
> > > > > That looks good. Now I would like to summarize few things.
> > > > > 
> > > > > We know that when a request gets stuck on a vm also the subsequent
> > > > > ones
> > > > > will
> > > > > get stuck (at least until their timeout is up, except for the first
> > > > > one
> > > > > that
> > > > > could stay there forever).
> > > > > 
> > > > > We want a limited number of threads polling the statistics (trying to
> > > > > match
> > > > > the number of threads that libvirt has).
> > > > > 
> > > > > Given those two assumptions we want a thread pool of workers that are
> > > > > picking
> > > > > up jobs *per-vm*. The jobs should be smart enough to:
> > > > > 
> > > > > - understand what samples they have to take in that cycle (cpu?
> > > > > network?
> > > > > etc.)
> > > > > - resubmit themselves in the queue
> > > > > 
> > > > > Now this will ensure that in the queue there's only one job per-vm
> > > > > and
> > > > > if
> > > > > it
> > > > > gets stuck it is not re-submitted (no other worker will get stuck).
> > > > 
> > > > In the last few days I was thinking really hard and long about our last
> > > > discussions,
> > > > feedback and proposals and how to properly fit all the pieces together.
> > > > Michal and me also had a chat about this topics on Friday, and
> > > > eventually
> > > > I come up with this new draft
> > > > 
> > > > http://gerrit.ovirt.org/#/c/29977
> > > > 
> > > > (yes, that's it, just this) which builds on Nir's Schedule, the
> > > > existing
> > > > Threadpool
> > > > hidden inside vdsm/storage, and which I believe provides a much, much
> > > > better
> > > > ground
> > > > for further development or discussion
> > > > .
> > > > Driving forces behind this new draft:
> > > > - minimize bloat.
> > > > - minimize changes.
> > > > - separate nicely concerns (Scheduler schedules, threadpool executes,
> > > > Sampling
> > > > cares about the actual sampling only).
> > > > - leverage as much as possible existing infrastracture; avoid to
> > > > introduce
> > > > new
> > > > fancy stuff unless absolutely needed.
> > > > 
> > > > And here it is. Almost all the concepts and requirements we discussed
> > > > are
> > > > there.
> > > > The thing which is lacking here is strong isolation about
> > > > VMs/samplings.
> > > > 
> > > > This new concept does nothing to recover stuck worker threads: if the
> > > > pool
> > > > is exausted, everything eventually stops, after a few sampling
> > > > intervals.
> > > > Stuck jobs are detected and the corresponding VMs are marked
> > > > unresponsive
> > > > (leveraging existing infrastructure).
> > > > When (if?) stuck jobs eventually restart working, everything else
> > > > restarts
> > > > as
> > > > well.
> > > > 
> > > > The changes are minimal, and there is still room for refactoring and
> > > > cleanup,
> > > > but I believe the design is nicer and cleaner.
> > > > 
> > > > Further steps:
> > > > * replace existing thread pool with a fancier one which can replace
> > > > stuck threads, or dinamically resize himself, to achieve better
> > > > isolation
> > > > among
> > > > VMs or jobs?
> > > > * Split the new VmStatsCollector class in smaller components?
> > > > * Stale data detection. Planned but not yet there, I just need to get
> > > > how
> > > > to
> > > > properly fit it into the AdvancedStatsFunction windowing sample. Should
> > > > nt be a big deal, however.
> > > > 
> > > > I also have already quite few cleanup patches for the existing
> > > > threadpool
> > > > and
> > > > for
> > > > the sampling code in the queue, some are on gerrit, some are not.
> > > > 
> > > > I think most of them can wait once we agree on the overall design.
> > > > 
> > > > Nir also provided further suggestions (thanks for that!) and possible
> > > > design
> > > > alternatives which I'm now evaluating carefully.
> > > 
> > > I agree with Federico and you - I think this is the way we should
> > > explore.
> > > 
> > > But I don't understand the way you are implementing this using the
> > > scheduler
> > > in http://gerrit.ovirt.org/29977, and it seems that this does not ensure
> > > that
> > > every vm has only one sampling task running at the same time.
> > > 
> > > I started to work on a prototype during the last week and I think that
> > > this
> > > is the right way to implement. Please check this patch:
> > > http://gerrit.ovirt.org/29980
> > > 
> > > This use the current storage thread pool, but I don't think it is good
> > > enough.
> > > I think we should continue with http://gerrit.ovirt.org/29191 so we can
> > > handle
> > > stuck worker thread without decreasing the work force of the thread pool.
> > > 
> > > Nir
> > > 
> > 
>