
I share most of Nir's answers, let me just add some information here and there.

----- Original Message -----
From: "Nir Soffer" <nsoffer@redhat.com>
To: "Saggi Mizrahi" <smizrahi@redhat.com>
Cc: "Francesco Romani" <fromani@redhat.com>, devel@ovirt.org, "Michal Skrivanek" <mskrivan@redhat.com>, "Federico Simoncelli" <fsimonce@redhat.com>, "Dan Kenigsberg" <danken@redhat.com>
Sent: Sunday, July 13, 2014 1:34:50 PM
Subject: Re: [ovirt-devel] [VDSM][sampling] thread pool status and handling of stuck calls
----- Original Message -----
From: "Saggi Mizrahi" <smizrahi@redhat.com>
To: "Nir Soffer" <nsoffer@redhat.com>, "Francesco Romani" <fromani@redhat.com>
Cc: devel@ovirt.org, "Michal Skrivanek" <mskrivan@redhat.com>, "Federico Simoncelli" <fsimonce@redhat.com>, "Dan Kenigsberg" <danken@redhat.com>
Sent: Sunday, July 13, 2014 1:03:20 PM
Subject: Re: [ovirt-devel] [VDSM][sampling] thread pool status and handling of stuck calls
Hi, I will find time to go in depth into both patches, but I still think that this is solving the problem in the wrong place.
You want to have the libvirt connection management code to be separate from the sampling.
It is - we did not suggest changing libvirt connections.
Indeed. My biggest reason for that actually boils down to what Federico suggested in his last email. By acknowledging stuck calls in the upper layer I believe we can manage them better overall. I think this is more akin to a flow error: it cannot be recovered at the lower layers, nor can the connection layer provide reliable error handling, so let's just handle it at the application level.
    if promise.wait(timeout):
        try:
            promise.cancel()
        except InProgress:
            pass  # handle
        except AlreadyFinished:
            pass
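A minimal sketch of the promise object this snippet assumes could look like the following. Promise, InProgress and AlreadyFinished are names taken from the snippet above, not existing VDSM code; the implementation is an illustration, not a proposal:

```python
import threading

class AlreadyFinished(Exception):
    pass

class InProgress(Exception):
    pass

class Promise:
    """Sketch of a promise for a call running in a worker thread."""

    def __init__(self):
        self._done = threading.Event()
        self._result = None
        self.abandoned = False

    def wait(self, timeout=None):
        # True if the underlying call finished within the timeout.
        return self._done.wait(timeout)

    def set_result(self, value):
        # Called by the worker thread when the libvirt call returns.
        self._result = value
        self._done.set()

    def cancel(self):
        if self._done.is_set():
            raise AlreadyFinished()
        # A stuck libvirt call cannot really be aborted; the best we
        # can do is mark it abandoned so its late result is dropped.
        self.abandoned = True
```

Note that cancel() cannot interrupt the blocked thread; it can only flag the call so the result is discarded when (if) it eventually arrives, which is exactly the limitation discussed below.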
So what you suggest is actually adding a timeout for libvirt calls, without having such a timeout on the libvirt side. If we had this on the libvirt side, we would not have to add anything, right?
The current code that blocks on a libvirt call would instead block on a condition variable, while the real call blocks on libvirt in a worker thread.
If the worker thread finishes before the timeout, the calling code gets the result; if it is stuck or finishes after the timeout, the calling code gets a timeout.
This is nice, but we are left with a worker thread stuck on libvirt, or, if it finishes after the timeout, we lose the result of the call.
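The calling-thread/worker-thread arrangement described above could be sketched roughly like this. call_with_timeout is a hypothetical helper for illustration, not code from the patches:

```python
import threading

def call_with_timeout(func, timeout):
    """Run func in a worker thread and wait up to timeout seconds.

    Returns (True, result) if the call finished in time, otherwise
    (False, None).  On timeout the worker thread is left behind,
    possibly stuck in libvirt, and its eventual result is lost -
    exactly the drawback discussed in the thread.
    """
    result = []
    done = threading.Condition()

    def worker():
        value = func()          # may block indefinitely inside libvirt
        with done:
            result.append(value)
            done.notify()

    t = threading.Thread(target=worker)
    t.daemon = True             # do not keep the process alive for stuck calls
    t.start()

    with done:
        if not result:
            done.wait(timeout)  # calling thread blocks on the condition
        if result:
            return True, result[0]
    return False, None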
This design implies 2 threads - the calling thread and the worker thread - so it does not scale and cannot be used for sampling, where we would like to have only one thread triggering the sampling calls and the worker threads executing them.
For the API calls, where we have a connection/request thread running libvirt code, this adds a timeout at a small additional cost (a few extra worker threads, plus a condition variable and lock for each worker thread).
Michal, Francesco: do we want a timeout on libvirt calls, leaving threads stuck on libvirt, or do we want to wait for the libvirt call to return?
Well, I see little benefit in an additional timeout. And surely (sorry for the repetition, but IMO worth stressing) we need to be nice with libvirt, thus waiting does the least harm, because we want to avoid clogging libvirt by issuing more calls that we know in advance would fail (since one is still stuck holding the QEMU monitor). If we patiently wait for the libvirt call to get unstuck, we also know precisely when it is safe to restart sampling/calling libvirt - right after the call returned. Otherwise we would have to implement some polling, or just call and hope (not good).

Having said that, let me mention the timeout machinery we already have in place: vdsm/virt/vm.py :: NotifyingVirDomain

Please note that this code is not sampling- or stuck-handling-specific and was introduced long ago. This code doesn't change anything we said or did, because the timeout will be raised when
1. a call is already stuck inside the QEMU monitor
2. another call tries to enter the same monitor

The above behaviour (and maybe a timeout) doesn't change the picture, or the 'be nice with libvirt' requirement (well, it is probably more like 'do not blindly drop the ball to libvirt, because this will just make things worse'). It could just help us detect earlier that a VM is stuck.

Saggi is right when he says the async-ness is one of the core issues here, but another one, maybe even more important, is the stuck-ness of calls. And unfortunately,
1. we will know a call is stuck only *after* it happened (timeout, supervisor, anything else)
2. there is no reliable way to *cancel* a stuck call/task; even dropping the connection won't help, as Federico reminded.

In the design we are evolving, we shall
1. for a VM *already* blocked, and somehow detected as such, suspend sampling until libvirt unlocks. This state has to be reported to Engine, but that's it.
2. for the *other* VMs, replace the stuck worker threads with fresh ones (fancy threadpool approach), otherwise everything stops when the pool is exhausted, even if some VMs continue to behave correctly.
- http://gerrit.ovirt.org/#/c/29191

Lastly, about the asyncness of the libvirt API, it is maybe worth (re)mentioning
https://www.redhat.com/archives/libvir-list/2014-July/msg00139.html
relevant snippet: "[...] The libvirt-gobject binding to libvirt provides async APIs to libvirt APIs. It does this by using threads internally. [...]"
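The "replace stuck workers with fresh ones" idea could be sketched along these lines. All names here (SelfHealingPool, supervise, stuck_timeout) are invented for illustration and do not reflect the actual patch on gerrit:

```python
import queue
import threading
import time

class SelfHealingPool:
    """Sketch: a pool whose supervisor replaces workers that have been
    busy longer than stuck_timeout, so one stuck VM cannot exhaust the
    pool and starve the well-behaved VMs."""

    def __init__(self, workers=4, stuck_timeout=30.0):
        self._tasks = queue.Queue()
        self._stuck_timeout = stuck_timeout
        self._lock = threading.Lock()
        self._busy_since = {}   # worker thread -> time it took its task
        for _ in range(workers):
            self._spawn()

    def _spawn(self):
        t = threading.Thread(target=self._run)
        t.daemon = True
        t.start()

    def _run(self):
        me = threading.current_thread()
        while True:
            task = self._tasks.get()
            with self._lock:
                self._busy_since[me] = time.time()
            try:
                task()          # may block forever in a libvirt call
            finally:
                with self._lock:
                    self._busy_since.pop(me, None)

    def schedule(self, task):
        self._tasks.put(task)

    def supervise(self):
        """Call periodically: spawn a replacement for each stuck worker."""
        now = time.time()
        with self._lock:
            stuck = [t for t, since in self._busy_since.items()
                     if now - since > self._stuck_timeout]
            for t in stuck:
                # Stop tracking the stuck worker; it stays blocked in
                # libvirt but no longer counts toward pool capacity.
                self._busy_since.pop(t, None)
        for _ in stuck:
            self._spawn()
```

The stuck threads themselves cannot be killed (as noted above, a stuck call cannot be cancelled); they are simply written off and replaced so the pool keeps serving the other VMs.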
Sampling would be done with the scheduler, and a separate thread pool would be used to handle the response or lack thereof.
For sampling, we came to the conclusion that we don't want to abort a call after a timeout; we want to stop sampling for this VM until the call returns. So having a timeout is a non-goal.
Indeed. More precisely, until we know the QEMU monitor for a VM is safe again, there is no point in trying to use it again.
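The "suspend sampling for this VM until the call returns" behaviour could look something like this per-VM guard. This is a sketch with invented names (VmSampler), not VDSM code:

```python
import threading

class VmSampler:
    """Sketch: skip sampling for a VM while a previous call is still
    in flight, instead of queueing more calls behind a stuck monitor."""

    def __init__(self, sample_func):
        self._sample_func = sample_func       # e.g. a libvirt stats call
        self._in_flight = threading.Lock()

    def sample(self):
        # Non-blocking acquire: if the previous call has not returned,
        # report the VM as unresponsive rather than piling up calls.
        if not self._in_flight.acquire(False):
            return None   # skipped; monitor not known to be safe yet
        try:
            return self._sample_func()
        finally:
            self._in_flight.release()
```

Because the lock is released exactly when the call returns, sampling resumes at the earliest safe moment, with no polling and no extra calls sent to a clogged monitor.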
That way the threadpool, sampling, and libvirt connection management are all separate. Also, we can hide how we optimize libvirt connections/threads and keep it orthogonal to how the sampling works.
The current patches do not change libvirt connection management; that is an orthogonal issue. They only change the way we do sampling.
Agreed
If we ever get async APIs we could just have the libvirtConnectionManagement layer handle that for us and keep on using the same interface internally.
I think that what you suggest is orthogonal to the sampling problem that Francesco and I are trying to tackle.
Agreed again.

Bests,

--
Francesco Romani
RedHat Engineering Virtualization R & D
Phone: 8261328
IRC: fromani