[ovirt-devel] [VDSM][sampling] thread pool status and handling of stuck calls

Francesco Romani fromani at redhat.com
Tue Jul 15 15:39:47 UTC 2014


I agree with most of Nir's answers; let me just add some information here and there

----- Original Message -----
> From: "Nir Soffer" <nsoffer at redhat.com>
> To: "Saggi Mizrahi" <smizrahi at redhat.com>
> Cc: "Francesco Romani" <fromani at redhat.com>, devel at ovirt.org, "Michal Skrivanek" <mskrivan at redhat.com>, "Federico
> Simoncelli" <fsimonce at redhat.com>, "Dan Kenigsberg" <danken at redhat.com>
> Sent: Sunday, July 13, 2014 1:34:50 PM
> Subject: Re: [ovirt-devel] [VDSM][sampling] thread pool status and handling of stuck calls
> 
> ----- Original Message -----
> > From: "Saggi Mizrahi" <smizrahi at redhat.com>
> > To: "Nir Soffer" <nsoffer at redhat.com>, "Francesco Romani"
> > <fromani at redhat.com>
> > Cc: devel at ovirt.org, "Michal Skrivanek" <mskrivan at redhat.com>, "Federico
> > Simoncelli" <fsimonce at redhat.com>, "Dan
> > Kenigsberg" <danken at redhat.com>
> > Sent: Sunday, July 13, 2014 1:03:20 PM
> > Subject: Re: [ovirt-devel] [VDSM][sampling] thread pool status and handling
> > of stuck calls
> > 
> > Hi, I will find time to go in depth in both patches
> > but I still think that this is solving the problem in
> > the wrong place.
> > 
> > You want to have the libvirt connection management
> > code to be separate from the sampling.
> 
> It is - we did not suggest changing libvirt connections.

Indeed. My biggest reason for that actually boils down to what
Federico suggested in his last email.
By acknowledging stuck calls in the upper layer, I believe
we can manage them better overall.

I think this is more akin to a flow error: it cannot be recovered
in the lower layers, nor can the connection layer provide
reliable error handling, so let's just handle it at the
application level.
 
> > if promise.wait(timeout):
> >    try:
> >        promise.cancel()
> >    except InProgress:
> >        # handle
> >    except AlreadyFinished:
> >        pass
> 
> So what you suggest is actually adding a timeout for libvirt calls, without
> having such a timeout on the libvirt side. If we had this on the libvirt
> side, we would not have to add anything, right?
> 
> The code that currently blocks on a libvirt call will instead block on a
> condition variable, while the real call blocks on libvirt in a worker thread.
> 
> If the worker thread finishes before the timeout, the calling code will
> get the result; if it is stuck or finishes after the timeout, the calling
> code will get a timeout.
> 
> This is nice, but we are left with the worker thread stuck on libvirt or,
> if it finishes after the timeout, we lose the result of the call.
> 
> This design implies two threads per call - the calling thread and the worker
> thread - so it does not scale and cannot be used for sampling, where we would
> like to have only one thread triggering sampling calls and the worker threads
> executing them.
> 
> For the API calls, where we have a connection/request thread running libvirt
> code, this adds a timeout at a small additional cost (a few extra worker
> threads, plus a condition variable and a lock for each worker thread).
> 
> Michal, Francesco: do we want a timeout on libvirt calls, leaving threads stuck
> on libvirt, or do we prefer to wait for the libvirt call to return?

Well, I see little benefit in an additional timeout.

And surely (sorry for the repetition, but IMO it is worth stressing)
we need to be nice with libvirt, so waiting does the least harm: we want to
avoid clogging libvirt by issuing more calls that we know in advance would
fail (since one is still stuck holding the QEMU monitor).

If we patiently wait for the libvirt call to get unstuck, we also know
precisely when it is safe to restart sampling/calling libvirt - right after
the call returns. Otherwise we would have to implement some polling, or
just call and hope (not good).
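
To make the trade-off concrete, here is a minimal sketch of the pattern Nir
describes above, using the stdlib concurrent.futures (available as the
'futures' backport on our Python 2); the names here are mine, not taken
from the patches:

    from concurrent.futures import ThreadPoolExecutor, TimeoutError

    _pool = ThreadPoolExecutor(max_workers=4)

    def call_with_timeout(libvirt_call, timeout, *args):
        # Run the (possibly stuck) libvirt call in a worker thread.
        future = _pool.submit(libvirt_call, *args)
        try:
            # The calling thread blocks here for at most `timeout` seconds.
            return future.result(timeout=timeout)
        except TimeoutError:
            # The caller gets its timeout back, but the worker thread is
            # still blocked inside libvirt, and the result of a call that
            # completes late is simply lost.
            raise

This is exactly why waiting looks preferable to me: the timeout frees the
caller, not the worker, and it tells us nothing about when the QEMU monitor
becomes safe again.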

Having said that, let me mention the timeout machinery
we already have in place: vdsm/virt/vm.py :: NotifyingVirDomain.
Please note that this code is neither sampling- nor stuck-handling-specific,
and it was introduced long ago.

This code doesn't change anything we said or did, because the timeout
will be raised only when
1. a call is already stuck inside the QEMU monitor, and
2. another call tries to enter the same monitor

The above behaviour (and maybe a timeout) doesn't change the picture,
nor the 'be nice with libvirt' requirement (which is probably better stated
as 'do not blindly drop the ball on libvirt, because this will just make
things worse').
It could just help us detect earlier that a VM is stuck.
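
For reference, the idea behind that wrapper looks roughly like this (a
sketch of the concept, not the actual vm.py code): every call is proxied,
and a libvirt operation-timeout error both raises and notifies whoever is
interested that this VM's monitor looks stuck.

    import libvirt

    class TimeoutNotifyingDomain(object):  # illustrative name only
        def __init__(self, dom, on_timeout):
            self._dom = dom                # the real virDomain
            self._on_timeout = on_timeout  # e.g. flags the VM as unresponsive

        def __getattr__(self, name):
            attr = getattr(self._dom, name)
            if not callable(attr):
                return attr

            def wrapper(*args, **kwargs):
                try:
                    return attr(*args, **kwargs)
                except libvirt.libvirtError as e:
                    if e.get_error_code() == libvirt.VIR_ERR_OPERATION_TIMEOUT:
                        # Early detection: the QEMU monitor is held by
                        # another, stuck, call.
                        self._on_timeout()
                    raise
            return wrapper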

Saggi is right when he says that async-ness is one of the core issues here,
but another, maybe even more important one, is the stuck-ness of calls.
And unfortunately,
1. we will know a call is stuck only *after* it happened (timeout, supervisor,
anything else);
2. there is no reliable way to *cancel* a stuck call/task - even dropping the
connection won't help, as Federico reminded us.

In the design we are evolving, we shall
1. for a VM *already* blocked, and somehow detected as such, suspend sampling
   until libvirt unblocks. This state has to be reported to Engine, but that's it.
2. for the *other* VMs, replace the stuck worker threads with fresh ones
   (fancy threadpool approach, sketched below); otherwise everything stops when
   the pool is exhausted, even if some VMs continue to behave correctly.
   - http://gerrit.ovirt.org/#/c/29191
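
The core of point 2 could look like this (a toy sketch to show the mechanism,
not the code under review at the gerrit link above): each worker publishes a
deadline for its current task, and a supervisor replaces any worker that blew
past it. The stuck thread itself is left alone - as said above, it cannot be
cancelled - it is just no longer counted as pool capacity, and it exits
quietly if its call ever returns.

    import threading
    import time
    try:
        import Queue as queue  # Python 2, as used by VDSM today
    except ImportError:
        import queue           # Python 3

    class SelfHealingPool(object):  # illustrative name only
        def __init__(self, workers, task_timeout):
            self._tasks = queue.Queue()
            self._timeout = task_timeout
            self._lock = threading.Lock()
            self._deadlines = {}  # worker thread -> deadline of current task
            for _ in range(workers):
                self._spawn()
            supervisor = threading.Thread(target=self._supervise)
            supervisor.daemon = True
            supervisor.start()

        def submit(self, func, *args):
            self._tasks.put((func, args))

        def _spawn(self):
            worker = threading.Thread(target=self._work)
            worker.daemon = True
            worker.start()

        def _work(self):
            me = threading.current_thread()
            while True:
                func, args = self._tasks.get()
                with self._lock:
                    self._deadlines[me] = time.time() + self._timeout
                try:
                    func(*args)  # may block for a very long time in libvirt
                finally:
                    with self._lock:
                        if self._deadlines.pop(me, None) is None:
                            # The supervisor gave up on us and a replacement
                            # is already running: exit quietly.
                            return

        def _supervise(self):
            while True:
                time.sleep(1)
                now = time.time()
                with self._lock:
                    stuck = [t for t, d in self._deadlines.items() if d < now]
                    for t in stuck:
                        del self._deadlines[t]  # stop tracking the stuck thread
                for _ in stuck:
                    self._spawn()  # keep the pool capacity constant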

Lastly, about the async-ness of the libvirt API, it is maybe worth (re)mentioning
https://www.redhat.com/archives/libvir-list/2014-July/msg00139.html

relevant snippet:
"[...] The libvirt-gobject binding to libvirt provides async APIs to libvirt
APIs. It does this by using threads internally. [...]"

> > Sampling would be done with the scheduler
> > and a separate thread pool would be used
> > to handle the response or lack thereof.
> 
> For sampling, we came to the conclusion that we don't want to abort a call
> after a timeout, but we want to stop sampling for this VM until the call returns.
> So having a timeout is a non-goal.

Indeed.
More precisely, until we know the QEMU monitor for a VM is safe again, there is
no point in trying to use it.
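
That rule is cheap to enforce at the scheduler level. A minimal sketch
(illustrative only, not the actual sampling code): the scheduler marks a VM
busy before dispatching, and resumes scheduling it only once the previous
call has actually returned.

    import threading

    class VmSampler(object):  # illustrative name only
        def __init__(self, pool):
            self._pool = pool        # e.g. the SelfHealingPool sketched above
            self._lock = threading.Lock()
            self._inflight = set()   # VM ids with a libvirt call pending

        def sample(self, vm_id, sample_func):
            with self._lock:
                if vm_id in self._inflight:
                    # The previous call is still stuck in the QEMU monitor:
                    # skip this round instead of piling up doomed calls.
                    return False
                self._inflight.add(vm_id)
            self._pool.submit(self._run, vm_id, sample_func)
            return True

        def _run(self, vm_id, sample_func):
            try:
                sample_func(vm_id)
            finally:
                # Only now is it safe to sample this VM again.
                with self._lock:
                    self._inflight.discard(vm_id)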

> > That way the threadpool, sampling, and libvirt connection
> > management are all separate. Also, we can hide how we
> > optimize libvirt connections/threads and have it orthogonal
> > to how the sampling works.
> 
> The current patches do not change libvirt connection management;
> this is an orthogonal issue. They are only about changing the way
> we do sampling.

Agreed
 
> > If we ever get async APIs we could just have the
> > libvirtConnectionManagement layer handle that for us and keep
> > on using the same interface internally.
> 
> I think that what you suggest is orthogonal to the sampling problem that
> Francesco and I are trying to tackle.

Agreed again.


Bests,

-- 
Francesco Romani
RedHat Engineering Virtualization R & D
Phone: 8261328
IRC: fromani


