
I share most of Nir's answers, let me just add some information here and there.

----- Original Message -----
From: "Nir Soffer" <nsoffer@redhat.com>
To: "Saggi Mizrahi" <smizrahi@redhat.com>
Cc: "Francesco Romani" <fromani@redhat.com>, devel@ovirt.org, "Michal Skrivanek" <mskrivan@redhat.com>, "Federico Simoncelli" <fsimonce@redhat.com>, "Dan Kenigsberg" <danken@redhat.com>
Sent: Sunday, July 13, 2014 1:34:50 PM
Subject: Re: [ovirt-devel] [VDSM][sampling] thread pool status and handling of stuck calls
----- Original Message -----
From: "Saggi Mizrahi" <smizrahi@redhat.com>
To: "Nir Soffer" <nsoffer@redhat.com>, "Francesco Romani" <fromani@redhat.com>
Cc: devel@ovirt.org, "Michal Skrivanek" <mskrivan@redhat.com>, "Federico Simoncelli" <fsimonce@redhat.com>, "Dan Kenigsberg" <danken@redhat.com>
Sent: Sunday, July 13, 2014 1:03:20 PM
Subject: Re: [ovirt-devel] [VDSM][sampling] thread pool status and handling of stuck calls
Hi, I will find time to go in depth into both patches, but I still think that this is solving the problem in the wrong place.
You want to have the libvirt connection management code to be separate from the sampling.
It is - we did not suggest changing libvirt connections.
Indeed. My biggest reason for that actually boils down to what Federico suggested in his last email. By acknowledging stuck calls in the upper layer I believe we can manage them better overall. I think this is more akin to a flow error: it cannot be recovered at the lower layers, nor can the connection layer provide reliable error handling, so let's just handle it at the application level.
    if promise.wait(timeout):
        try:
            promise.cancel()
        except InProgress:
            pass  # handle
        except AlreadyFinished:
            pass
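A minimal sketch of the promise object this snippet assumes could look like the following. Promise, InProgress and AlreadyFinished are names taken from the snippet above, not existing VDSM code; the implementation is an illustration, not a proposal:

```python
import threading

class AlreadyFinished(Exception):
    pass

class InProgress(Exception):
    pass

class Promise:
    """Sketch of a promise for a call running in a worker thread."""

    def __init__(self):
        self._done = threading.Event()
        self._result = None
        self.abandoned = False

    def wait(self, timeout=None):
        # True if the underlying call finished within the timeout.
        return self._done.wait(timeout)

    def set_result(self, value):
        # Called by the worker thread when the libvirt call returns.
        self._result = value
        self._done.set()

    def cancel(self):
        if self._done.is_set():
            raise AlreadyFinished()
        # A stuck libvirt call cannot really be aborted; the best we
        # can do is mark it abandoned so its late result is dropped.
        self.abandoned = True
```

Note that cancel() cannot interrupt the blocked thread; it can only flag the call so the result is discarded when (if) it eventually arrives, which is exactly the limitation discussed below.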
So what you suggest is actually adding a timeout for libvirt calls, without having such a timeout on the libvirt side. If we had this on the libvirt side, we would not have to add anything, right?
The current code that blocks on a libvirt call would instead block on a condition variable, while the real call blocks on libvirt in a worker thread.
If the worker thread finishes before the timeout, the calling code gets the result; if it is stuck or finishes after the timeout, the calling code gets a timeout.
This is nice, but we are left with a worker thread stuck on libvirt, or, if it finishes after the timeout, we lose the result of the call.
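The calling-thread/worker-thread arrangement described above could be sketched roughly like this. call_with_timeout is a hypothetical helper for illustration, not code from the patches:

```python
import threading

def call_with_timeout(func, timeout):
    """Run func in a worker thread and wait up to timeout seconds.

    Returns (True, result) if the call finished in time, otherwise
    (False, None).  On timeout the worker thread is left behind,
    possibly stuck in libvirt, and its eventual result is lost -
    exactly the drawback discussed in the thread.
    """
    result = []
    done = threading.Condition()

    def worker():
        value = func()          # may block indefinitely inside libvirt
        with done:
            result.append(value)
            done.notify()

    t = threading.Thread(target=worker)
    t.daemon = True             # do not keep the process alive for stuck calls
    t.start()

    with done:
        if not result:
            done.wait(timeout)  # calling thread blocks on the condition
        if result:
            return True, result[0]
    return False, None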
This design implies 2 threads - the calling thread and the worker thread - so it does not scale and cannot be used for sampling, where we would like to have only one thread triggering the sampling calls and the worker threads executing them.
For the API calls, where we have a connection/request thread running libvirt code, this adds a timeout at a small additional cost (a few extra worker threads, plus a condition variable and lock for each worker thread).
Michal, Francesco: do we want a timeout on libvirt calls, leaving threads stuck on libvirt, or do we want to wait for the libvirt call to return?
Well, I see little benefit in an additional timeout. And surely (sorry for the repetition, but IMO worth stressing) we need to be nice with libvirt, thus waiting does the least harm, because we want to avoid clogging libvirt by issuing more calls that we know in advance would fail (since one is still stuck holding the QEMU monitor). If we patiently wait for the libvirt call to get unstuck, we also know precisely when it is safe to restart sampling/calling libvirt - right after the call returned. Otherwise we would have to implement some polling, or just call and hope (not good).

Having said that, let me mention the timeout machinery we already have in place: vdsm/virt/vm.py :: NotifyingVirDomain

Please note that this code is not sampling- or stuck-handling-specific and was introduced long ago. This code doesn't change anything we said or did, because the timeout will be raised when
1. a call is already stuck inside the QEMU monitor
2. another call tries to enter the same monitor

The above behaviour (and maybe a timeout) doesn't change the picture, or the 'be nice with libvirt' requirement (well, it is probably more like 'do not blindly drop the ball to libvirt, because this will just make things worse'). It could just help us detect earlier that a VM is stuck.

Saggi is right when he says the async-ness is one of the core issues here, but another one, maybe even more important, is the stuck-ness of calls. And unfortunately,
1. we will know a call is stuck only *after* it happened (timeout, supervisor, anything else)
2. there is no reliable way to *cancel* a stuck call/task; even dropping the connection won't help, as Federico reminded.

In the design we are evolving, we shall
1. for a VM *already* blocked, and somehow detected as such, suspend sampling until libvirt unlocks. This state has to be reported to Engine, but that's it.
2. for the *other* VMs, replace the stuck worker threads with fresh ones (fancy threadpool approach), otherwise everything stops when the pool is exhausted, even if some VMs continue to behave correctly.
- http://gerrit.ovirt.org/#/c/29191

Lastly, about the asyncness of the libvirt API, it is maybe worth (re)mentioning
https://www.redhat.com/archives/libvir-list/2014-July/msg00139.html
relevant snippet: "[...] The libvirt-gobject binding to libvirt provides async APIs to libvirt APIs. It does this by using threads internally. [...]"
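The "replace stuck workers with fresh ones" idea could be sketched along these lines. All names here (SelfHealingPool, supervise, stuck_timeout) are invented for illustration and do not reflect the actual patch on gerrit:

```python
import queue
import threading
import time

class SelfHealingPool:
    """Sketch: a pool whose supervisor replaces workers that have been
    busy longer than stuck_timeout, so one stuck VM cannot exhaust the
    pool and starve the well-behaved VMs."""

    def __init__(self, workers=4, stuck_timeout=30.0):
        self._tasks = queue.Queue()
        self._stuck_timeout = stuck_timeout
        self._lock = threading.Lock()
        self._busy_since = {}   # worker thread -> time it took its task
        for _ in range(workers):
            self._spawn()

    def _spawn(self):
        t = threading.Thread(target=self._run)
        t.daemon = True
        t.start()

    def _run(self):
        me = threading.current_thread()
        while True:
            task = self._tasks.get()
            with self._lock:
                self._busy_since[me] = time.time()
            try:
                task()          # may block forever in a libvirt call
            finally:
                with self._lock:
                    self._busy_since.pop(me, None)

    def schedule(self, task):
        self._tasks.put(task)

    def supervise(self):
        """Call periodically: spawn a replacement for each stuck worker."""
        now = time.time()
        with self._lock:
            stuck = [t for t, since in self._busy_since.items()
                     if now - since > self._stuck_timeout]
            for t in stuck:
                # Stop tracking the stuck worker; it stays blocked in
                # libvirt but no longer counts toward pool capacity.
                self._busy_since.pop(t, None)
        for _ in stuck:
            self._spawn()
```

The stuck threads themselves cannot be killed (as noted above, a stuck call cannot be cancelled); they are simply written off and replaced so the pool keeps serving the other VMs.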
Sampling would be done with the scheduler, and a separate thread pool would be used to handle the response or lack thereof.
For sampling, we came to the conclusion that we don't want to abort a call after a timeout; we want to stop sampling for this VM until the call returns. So having a timeout is a non-goal.
Indeed. More precisely, until we know the QEMU monitor for a VM is safe again, there is no point in trying to use it again.
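The "suspend sampling for this VM until the call returns" behaviour could look something like this per-VM guard. This is a sketch with invented names (VmSampler), not VDSM code:

```python
import threading

class VmSampler:
    """Sketch: skip sampling for a VM while a previous call is still
    in flight, instead of queueing more calls behind a stuck monitor."""

    def __init__(self, sample_func):
        self._sample_func = sample_func       # e.g. a libvirt stats call
        self._in_flight = threading.Lock()

    def sample(self):
        # Non-blocking acquire: if the previous call has not returned,
        # report the VM as unresponsive rather than piling up calls.
        if not self._in_flight.acquire(False):
            return None   # skipped; monitor not known to be safe yet
        try:
            return self._sample_func()
        finally:
            self._in_flight.release()
```

Because the lock is released exactly when the call returns, sampling resumes at the earliest safe moment, with no polling and no extra calls sent to a clogged monitor.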
That way the threadpool, sampling, and libvirt connection management are all separate. Also, we can hide how we optimize libvirt connections/threads and keep it orthogonal to how the sampling works.
The current patches do not change libvirt connection management; that is an orthogonal issue. They only change the way we do sampling.
Agreed
If we ever get async APIs we could just have the libvirtConnectionManagement layer handle that for us and keep on using the same interface internally.
I think that what you suggest is orthogonal to the sampling problem that Francesco and I are trying to tackle.
Agreed again.

Bests,

--
Francesco Romani
RedHat Engineering Virtualization R & D
Phone: 8261328
IRC: fromani