Re: [ovirt-devel] [VDSM][sampling] thread pool status and handling of stuck calls

15 Jul 2014

      ----- Original Message -----
...
From: "Francesco Romani" <fromani@redhat.com>
To: devel@ovirt.org
Cc: "Nir Soffer" <nsoffer@redhat.com>, "Michal Skrivanek" <mskrivan@redhat.com>, "Federico Simoncelli"
<fsimonce@redhat.com>, "Dan Kenigsberg" <danken@redhat.com>, "Saggi Mizrahi" <smizrahi@redhat.com>
Sent: Tuesday, July 15, 2014 6:50:45 PM
Subject: Re: [ovirt-devel] [VDSM][sampling] thread pool status and handling of stuck calls
----- Original Message -----
...
From: "Saggi Mizrahi" <smizrahi@redhat.com>
To: "Nir Soffer" <nsoffer@redhat.com>
Cc: "Francesco Romani" <fromani@redhat.com>, devel@ovirt.org, "Michal Skrivanek" <mskrivan@redhat.com>, "Federico Simoncelli" <fsimonce@redhat.com>, "Dan Kenigsberg" <danken@redhat.com>
Sent: Sunday, July 13, 2014 5:43:28 PM
Subject: Re: [ovirt-devel] [VDSM][sampling] thread pool status and handling of stuck calls
[...]
...
...
The current patches do not change libvirt connection management;
this is an orthogonal issue. They only change the way
we do sampling.
As I've been saying, I think the problem is actually in the
libvirt connection management and not the stats operations.
Well yes, I think better libvirt connection management is
another way to reach the goal, provided it can detect a stuck call
and signal it to the upper layer.
With that in place, the sampling code is simpler, and there is no
need for a fancy thread pool, even though we may need something like
this in the internals of the connection handling code.
Can you explain how you would solve the sampling issue with better
connection management?

Since libvirt does not have an async API (yet), it seems that this
would just move the thread pool to the connection layer.
...
*Maybe* a supervisor-like approach like my very first proposal could
work, but a very good point made by Nir is how to tell when something is
'stuck', since only tasks really know their timeout.
For sampling it is easy: it is the sampling interval. But how do we convey
this timeout in a generic manner to the connection layer?
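
To make the question concrete, here is a rough sketch of one possible shape,
not what the patches do: the connection layer owns the worker pool, and each
caller passes its own timeout with the call. All names here (Connection,
StuckCall, sample) are hypothetical and are not VDSM or libvirt APIs. It
assumes Python 3 and concurrent.futures.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


class StuckCall(Exception):
    """Raised when a call did not finish within the caller's timeout."""


class Connection:
    """Hypothetical connection wrapper that runs blocking calls on a pool."""

    def __init__(self, workers=4):
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def call(self, func, *args, timeout):
        # The task, not the connection layer, decides what "stuck" means.
        future = self._pool.submit(func, *args)
        try:
            return future.result(timeout=timeout)
        except FutureTimeout:
            # The worker thread stays blocked; we can only report the fact
            # to the upper layer, we cannot cancel the underlying call.
            raise StuckCall(getattr(func, "__name__", repr(func))) from None


def sample(conn, get_stats, interval):
    """For sampling, the natural timeout is the sampling interval."""
    try:
        return conn.call(get_stats, timeout=interval)
    except StuckCall:
        return None  # the caller decides how to mark the VM as unresponsive
```

Note that this only moves the thread pool into the connection layer, as Nir
says: a stuck call still occupies a worker until the blocking call returns.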