Re: [ovirt-devel] [VDSM][sampling] thread pool status and handling of stuck calls

From: "Saggi Mizrahi" <smizrahi@redhat.com> To: "Francesco Romani" <fromani@redhat.com> Cc: devel@ovirt.org Sent: Sunday, July 6, 2014 2:58:38 PM Subject: Re: [ovirt-devel] [VDSM][sampling] thread pool status and handling of stuck calls

The more I think about it, the more it looks like it's purely a libvirt issue. As long as we can make calls that get stuck in D state, we can't scale under stress.

In any case, it seems like having an interface like

    libvirtConnectionPool.request(args, callback)

would be a better solution than a thread pool. It would queue up the request and call the callback once it's done. Pseudo example:

    def collectStats():
        def callback(resp):
            doStuff(resp)
        threading.Timer(4, collectStats).start()
        lcp.listAllDevices(callback)

We could have the timer queue into a thread pool, since it normally runs in the main thread, but that is orthogonal to the libvirt connection issue.

As for the thread pool itself: as long as it's more than one class, it's too complicated. Things like:
* Task Cancelling
* Re-queuing (periodic operations)
shouldn't be part of the thread pool.

Tasks should just be functions. If we need something with state, we could use the __call__() method to make an object look like a function. I also don't mind doing if hasattr(task, "run") or callable(task) to handle it using an "interface".

Re-queuing could be done with the built-in threading.Timer, as in

    threading.Timer(4.5, threadpool.queue, args=(self,))

That way each operation is responsible for handling the details of rescheduling: should we always wait X, or should we wait X - timeItTookToCalculate? You could also do:

    threading.Timer(2, threadpool.queue, args=(self,))
    threading.Timer(4, self.handleBeingLate)

which would handle not getting queued for a certain amount of time.

Task cancelling can't be done in a generic manner. The most I think we could do is have threadpool.stop() check hasattr(task, "stop") and call it.
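To make the request/callback idea a bit more concrete, a rough sketch (the class name and the idea of dispatching on a libvirt method name are made up for illustration, not an existing API):

    import threading
    from collections import deque

    class LibvirtConnectionPool(object):
        # Hypothetical pool: requests are queued and served by a small, fixed
        # set of threads, each owning its own libvirt connection; the callback
        # runs when the reply arrives.
        def __init__(self, connections):
            self._requests = deque()
            self._cond = threading.Condition()
            for conn in connections:
                worker = threading.Thread(target=self._serve, args=(conn,))
                worker.daemon = True
                worker.start()

        def request(self, method, args, callback):
            # method is a libvirt call name, e.g. "listAllDomains"
            with self._cond:
                self._requests.append((method, args, callback))
                self._cond.notify()

        def _serve(self, conn):
            while True:
                with self._cond:
                    while not self._requests:
                        self._cond.wait()
                    method, args, callback = self._requests.popleft()
                # the libvirt call may block; only this worker is affected
                callback(getattr(conn, method)(*args))

The caller never blocks; a stuck call only ties up the one worker (and connection) that issued it.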
From: "Francesco Romani" <fromani@redhat.com> To: devel@ovirt.org Sent: Friday, July 4, 2014 5:48:59 PM Subject: [ovirt-devel] [VDSM][sampling] thread pool status and handling of stuck calls
Hi,
Nir has begun reviewing my draft patches about the thread pool and sampling refactoring (thanks!), and has already suggested quite a few improvements, which I'd like to summarize here.
Quick links to the ongoing discussion: http://gerrit.ovirt.org/#/c/29191/8/lib/threadpool/worker.py,cm http://gerrit.ovirt.org/#/c/29190/4/lib/threadpool/README.rst,cm
Quick summary of the discussion on gerrit so far:
1. Extract the scheduling logic from the thread pool. Either add a separate scheduler class or let the sampling tasks reschedule themselves after a successful completion. Either way, the concept of a 'periodic task', and the added complexity it brings, isn't needed.
2. Drop all the *queue classes I've added, making the package simpler. They are no longer needed once we remove the concept of a periodic task.
3. Have per-task timeouts, and move the stuck-task detection elsewhere, e.g. into the worker thread, or maybe better into the aforementioned scheduler. If the scheduler finds that a task started in the previous pass (or even before!) has not yet completed, there is no point in keeping that task alive and it should be cancelled.
4. The sampling task (or maybe the scheduler) can be smarter and halt the sampling in the presence of non-responding calls for a given VM, provided the VM reports its 'health'/responsiveness.
(Hopefully I haven't forgotten anything big.)
In the draft currently published, I reluctantly added the *queue classes and I agree the periodic task implementation is messy, so I'll be very happy to drop them.
However, a core question still holds: what to do in the presence of a stuck task?
I think it is worth discussing this topic on a medium friendlier than gerrit, as it is the single most important decision to make in the sampling refactoring.
It all boils down to: should we just keep the stuck threads around somewhere and wait? Or should we cancel stuck tasks?
A. Let's cancel the stuck tasks. If we move toward a libvirt connection pool, and we give each worker thread in the sampling pool a separate libvirt connection, hopefully read-only, then we should be able to cancel a stuck task by killing the worker's libvirt connection. We'll still need a (probably much simpler) watchman/supervisor, but that's no big deal. Libvirt allows closing a connection from a different thread. I haven't actually tried to unstick a blocked thread this way, but I have no reason to believe it will not work.
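For illustration only, the watchman could look roughly like this (names and timeout are made up; whether close() really unblocks a call stuck in D state is exactly the untested assumption above):

    import time
    import libvirt

    CALL_TIMEOUT = 30  # seconds, made-up value

    class SamplingWorker(object):
        def __init__(self):
            # one read-only libvirt connection per worker, as suggested above
            self.conn = libvirt.openReadOnly('qemu:///system')
            self.call_started = None   # wall time of the call in flight, if any

        def sample(self, vm_uuid):
            self.call_started = time.time()
            try:
                dom = self.conn.lookupByUUIDString(vm_uuid)
                return dom.memoryStats()          # may block on a stuck QEMU
            finally:
                self.call_started = None

    def watchman(workers):
        # supervisor loop, run in its own thread: close the connection
        # under any call that overstayed its welcome
        while True:
            now = time.time()
            for worker in workers:
                started = worker.call_started
                if started is not None and now - started > CALL_TIMEOUT:
                    worker.conn.close()           # closed from another thread
            time.sleep(1)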
B. Let's keep the blocked threads around. The code as it is just leaves a blocked libvirt call, and the worker thread that carried it, frozen. The stuck worker thread can be replaced, up to a cap of frozen threads. In the worst-case scenario we end up with one (blocked!) thread per VM, as it is today, and with no sampling data.
I believe that #A has some drawbacks which we risk overlooking, and at the same time #B has some merits.
Let me explain. The hardest case is a call blocked in the kernel in D state. Libvirt has no more power than VDSM to unblock it; and libvirt itself *has* a pool of resources (threads, in this case) which can be depleted by stuck calls. Actually, retrying a failed task may deplete that pool even faster [1].
I'm not happy to just push this problem down the stack, as it looks to me like we gain very little by doing so. VDSM itself surely stays cleaner, but the VDS/hypervisor host as a whole improves just a bit: libvirt scales better than VDSM does, and that gives us some more room.
On the other hand, by avoiding reissuing dangerous calls, I believe we make better use of the host resources in general. Actually, keeping the blocked threads around is just a side effect of not reattempting blocked calls. Moreover, keeping a blocked thread around has a significant benefit: we can discover at the earliest possible moment when it is safe again to issue the blocked call, because the blocked call itself eventually returns and we can track that event (and of course drop the now stale result). Otherwise, if we drop the connection, we lose this event and have no option other than trying again and hoping for the best [2].
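To spell out the bookkeeping that option #B implies, a sketch (attribute and threshold names are invented):

    import time

    STUCK_AFTER = 30.0   # seconds, made-up threshold

    def run_sampling(vm, sampling):
        # the worker just runs the (possibly blocking) call; a watchdog can flag
        # the VM as not-responding while call_in_flight_since is set, and the
        # call returning is the earliest "safe again" signal we can get
        started = time.time()
        vm.call_in_flight_since = started
        try:
            result = sampling(vm)
        finally:
            vm.call_in_flight_since = None
        if time.time() - started > STUCK_AFTER:
            return None              # drop the now stale result
        return result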
I know the #B approach is not the cleanest, but I think it has slightly more appeal, especially on the libvirt depletion front.
Thoughts and comments very welcome!
+++
[1] They have extensions to the management API to dynamically adjust their thread pool and/or to cancel tasks, but those are in the RHEL 7.2 timeframe.
[2] A crazy idea would be to do something like http://en.wikipedia.org/wiki/Exponential_backoff, which I'm not sure would be beneficial.
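For reference, the backoff idea in [2] would amount to something like (sketch):

    import random

    def retry_delay(attempt, base=2.0, cap=300.0):
        # exponential backoff with jitter: ~2s, ~4s, ~8s, ... capped at 5 minutes
        return random.uniform(0, min(cap, base * (2 ** attempt)))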
Bests and thanks,
--
Francesco Romani
RedHat Engineering Virtualization R & D
Phone: 8261328 IRC: fromani
_______________________________________________
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel

From: "Nir Soffer" <nsoffer@redhat.com> To: "Saggi Mizrahi" <smizrahi@redhat.com> Cc: "Francesco Romani" <fromani@redhat.com>, devel@ovirt.org, "Federico Simoncelli" <fsimonce@redhat.com>, "Michal Skrivanek" <mskrivan@redhat.com> Sent: Sunday, July 6, 2014 6:21:35 PM Subject: Re: [ovirt-devel] [VDSM][sampling] thread pool status and handling of stuck calls

----- Original Message -----
From: "Saggi Mizrahi" <smizrahi@redhat.com> To: "Francesco Romani" <fromani@redhat.com> Cc: devel@ovirt.org Sent: Sunday, July 6, 2014 2:58:38 PM Subject: Re: [ovirt-devel] [VDSM][sampling] thread pool status and handling of stuck calls
The more I think about it the more it looks like it's purely a libvirt issue. As long as we can make calls that get stuck on D state we can't scale under stress.
We can scale, but if all libvirt worker threads are blocked, we won't reach very far :-) Does libvirt try to handle such blocked threads?
In any case, seems like having an interface like.
libvirtConnectionPool.request(args, callback)
Would be a better solution than a thread pool.
It would queue up the request and call the callback once it's done.
pseudo example:
    def collectStats():
        def callback(resp):
            doStuff(resp)
        threading.Timer(4, collectStats).start()
This starts a new thread for each sample, which is even worse than the long-living thread per VM that Francesco is trying to eliminate. We have two issues here:
1. Scalability - can be improved regardless of libvirt
2. Handling stuck calls - may require a solution on the libvirt side
Did you see this? https://bugzilla.redhat.com/1112684
We should stop this trend of starting zillions of threads and use more efficient and maintainable solutions.
lcp.listAllDevices(callback)
We could have the timer queue in a threadpool since it normally runs in the main thread but that is orthogonal to the libvirt connection issue.
As for the thread pool itself. As long as it's more than one class it's too complicated.
Things like:
* Task Cancelling
* Re-queuing (periodic operations)
Shouldn't be part of the thread pool.
+1
Tasks should just be functions. If we need something with state, we could use the __call__() method to make an object look like a function.
+1
I also don't mind doing if hasattr(task, "run") or callable(task) to handle it using an "interface"
Please don't - there is no reason why related objects cannot have a common interface like __call__.
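For example, a stateful task can still look like a plain function to the pool (a sketch; vm.updateStats() and threadpool.queue() are placeholders from this discussion, not real names):

    import time

    class SampleVmStats(object):
        # stateful task: the pool only ever does task()
        def __init__(self, vm):
            self.vm = vm
            self.last_duration = None

        def __call__(self):
            start = time.time()
            self.vm.updateStats()                 # placeholder sampling call
            self.last_duration = time.time() - start

    # usage: threadpool.queue(SampleVmStats(vm)) -- same path as a bare function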
Re-queuing could be done with the built-in threading.Timer, as in
threading.Timer(4.5, threadpool.queue, args=(self,))
That way each operation is responsible for handling the details of rescheduling: should we always wait X, or should we wait X - timeItTookToCalculate?
You could also do:
    threading.Timer(2, threadpool.queue, args=(self,))
    threading.Timer(4, self.handleBeingLate)
Which would handle not getting queued for a certain amount of time.
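For the "wait X - timeItTookToCalculate" variant, an operation could reschedule itself along these lines (a sketch; threadpool.queue is the hypothetical interface used above):

    import threading
    import time

    class PeriodicOperation(object):
        def __init__(self, interval, threadpool, func):
            self.interval = interval
            self.threadpool = threadpool
            self.func = func                 # the actual sampling work

        def __call__(self):
            start = time.time()
            self.func()
            elapsed = time.time() - start
            # wait X - timeItTookToCalculate, never less than zero
            delay = max(0.0, self.interval - elapsed)
            threading.Timer(delay, self.threadpool.queue, args=(self,)).start()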
Task cancelling can't be done in a generic manner. The most I think we could do is have threadpool.stop() check hasattr(task, "stop") and call it.
Please don't

From: "Nir Soffer" <nsoffer@redhat.com> To: "Saggi Mizrahi" <smizrahi@redhat.com> Cc: "Francesco Romani" <fromani@redhat.com>, devel@ovirt.org, "Federico Simoncelli" <fsimonce@redhat.com>, "Michal Skrivanek" <mskrivan@redhat.com> Sent: Sunday, July 6, 2014 6:21:35 PM Subject: Re: [ovirt-devel] [VDSM][sampling] thread pool status and handling of stuck calls
----- Original Message -----
From: "Saggi Mizrahi" <smizrahi@redhat.com> To: "Francesco Romani" <fromani@redhat.com> Cc: devel@ovirt.org Sent: Sunday, July 6, 2014 2:58:38 PM Subject: Re: [ovirt-devel] [VDSM][sampling] thread pool status and handling of stuck calls
The more I think about it the more it looks like it's purely a libvirt issue. As long as we can make calls that get stuck on D state we can't scale under stress.
We can scale, but if all libvirt worker threads are blocked, we won't reach very far :-)
Does libvirt try to handle such blocked threads?

Nope, that is why we won't scale, as we just get stuck on libvirt.
In any case, seems like having an interface like.
libvirtConnectionPool.request(args, callback)
Would be a better solution than a thread pool.
It would queue up the request and call the callback once it's done.
pseudo example:
    def collectStats():
        def callback(resp):
            doStuff(resp)
        threading.Timer(4, collectStats).start()
This starts a new thread for each sample, which is even worse than the long-living thread per VM that Francesco is trying to eliminate.

We could implement a Timer-like class that uses a single preallocated thread. It would only run short operations and queue things into a thread pool in the case of long operations. That way it's unrelated to the thread pool and can be used for any use case where we want to queue an event.
We have two issues here:
1. Scalability - can be improved regardless of libvirt
2. Handling stuck calls - may require a solution on the libvirt side
Did you see this? https://bugzilla.redhat.com/1112684
We should stop this trend of starting zillions of threads and use more efficient and maintainable solutions.

Agreed.
lcp.listAllDevices(callback)
We could have the timer queue in a threadpool since it normally runs in the main thread but that is orthogonal to the libvirt connection issue.
As for the thread pool itself. As long as it's more than one class it's too complicated.
Things like:
* Task Cancelling
* Re-queuing (periodic operations)
Shouldn't be part of the thread pool.
+1
Tasks should just be functions. If we need something with state, we could use the __call__() method to make an object look like a function.
+1
I also don't mind doing if hasattr(task, "run") or callable(task) to handle it using an "interface"
Please don't - there is no reason why related objects cannot have a common interface like __call__.

I agree, but some people don't like it, and despite my reputation I do try to avoid pointless bike-shedding.
Re-queuing could be done with the built-in threading.Timer, as in
threading.Timer(4.5, threadpool.queue, args=(self,))
That way each operation is responsible for handling the details of rescheduling: should we always wait X, or should we wait X - timeItTookToCalculate?
You could also do:
    threading.Timer(2, threadpool.queue, args=(self,))
    threading.Timer(4, self.handleBeingLate)
Which would handle not getting queued for a certain amount of time.
Task cancelling can't be done in a generic manner. The most I think we could do is have threadpool.stop() check hasattr(task, "stop") and call it.
Please don't I agree that forcing "jobs" to stop should be handled outside
----- Original Message ----- the threadpool. Again, just trying to find a common ground.

From: "Nir Soffer" <nsoffer@redhat.com> To: "Saggi Mizrahi" <smizrahi@redhat.com> Cc: "Francesco Romani" <fromani@redhat.com>, devel@ovirt.org, "Federico Simoncelli" <fsimonce@redhat.com>, "Michal Skrivanek" <mskrivan@redhat.com> Sent: Monday, July 7, 2014 11:04:37 PM Subject: Re: [ovirt-devel] [VDSM][sampling] thread pool status and handling of stuck calls

----- Original Message -----
From: "Saggi Mizrahi" <smizrahi@redhat.com> To: "Nir Soffer" <nsoffer@redhat.com>, "Francesco Romani" <fromani@redhat.com> Cc: devel@ovirt.org, "Federico Simoncelli" <fsimonce@redhat.com>, "Michal Skrivanek" <mskrivan@redhat.com> Sent: Monday, July 7, 2014 8:03:35 PM Subject: Re: [ovirt-devel] [VDSM][sampling] thread pool status and handling of stuck calls
----- Original Message -----
From: "Nir Soffer" <nsoffer@redhat.com> To: "Saggi Mizrahi" <smizrahi@redhat.com> Cc: "Francesco Romani" <fromani@redhat.com>, devel@ovirt.org, "Federico Simoncelli" <fsimonce@redhat.com>, "Michal Skrivanek" <mskrivan@redhat.com> Sent: Sunday, July 6, 2014 6:21:35 PM Subject: Re: [ovirt-devel] [VDSM][sampling] thread pool status and handling of stuck calls
----- Original Message -----
From: "Saggi Mizrahi" <smizrahi@redhat.com> To: "Francesco Romani" <fromani@redhat.com> Cc: devel@ovirt.org Sent: Sunday, July 6, 2014 2:58:38 PM Subject: Re: [ovirt-devel] [VDSM][sampling] thread pool status and handling of stuck calls
The more I think about it the more it looks like it's purely a libvirt issue. As long as we can make calls that get stuck on D state we can't scale under stress.
We can scale, but if all libvirt worker threads are blocked, we won't reach very far :-)
Does libvirt try to handle such blocked threads?

Nope, that is why we won't scale, as we just get stuck on libvirt.
In any case, seems like having an interface like.
libvirtConnectionPool.request(args, callback)
Would be a better solution than a thread pool.
It would queue up the request and call the callback once it's done.
pseudo example:
    def collectStats():
        def callback(resp):
            doStuff(resp)
        threading.Timer(4, collectStats).start()
This starts a new thread for each sample, which is even worse than the long-living thread per VM that Francesco is trying to eliminate.

We could implement a Timer-like class that uses a single preallocated thread. It would only run short operations and queue things into a thread pool in the case of long operations. That way it's unrelated to the thread pool and can be used for any use case where we want to queue an event.
This is exactly what I was thinking about. It would be nice if you can review it: http://gerrit.ovirt.org/29607
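Roughly, the idea is this (a much simplified sketch, not the actual code in 29607):

    import heapq
    import threading
    import time

    class Scheduler(object):
        # a single preallocated thread; scheduled callables must be short,
        # long work should only be queued into a thread pool from here
        def __init__(self):
            self._calls = []                  # heap of (deadline, seq, func)
            self._seq = 0
            self._cond = threading.Condition()
            self._thread = threading.Thread(target=self._run)
            self._thread.daemon = True
            self._thread.start()

        def schedule(self, delay, func):
            with self._cond:
                self._seq += 1
                heapq.heappush(self._calls,
                               (time.time() + delay, self._seq, func))
                self._cond.notify()

        def _run(self):
            while True:
                with self._cond:
                    while not self._calls:
                        self._cond.wait()
                    deadline, _, func = self._calls[0]
                    now = time.time()
                    if now < deadline:
                        self._cond.wait(deadline - now)
                        continue
                    heapq.heappop(self._calls)
                func()                        # must not block

A periodic sampling operation would then just call scheduler.schedule(interval, op) again when it completes, or queue its heavy part into the thread pool.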
We have two issues here:
1. Scalability - can be improved regardless of libvirt
2. Handling stuck calls - may require a solution on the libvirt side
Did you see this? https://bugzilla.redhat.com/1112684
We should stop this trend of starting zillions of threads and use more efficient and maintainable solutions.
Agreed.
lcp.listAllDevices(callback)
We could have the timer queue in a threadpool since it normally runs in the main thread but that is orthogonal to the libvirt connection issue.
As for the thread pool itself. As long as it's more than one class it's too complicated.
Things like:
* Task Cancelling
* Re-queuing (periodic operations)
Shouldn't be part of the thread pool.
+1
Tasks should just be functions. If we need something with state, we could use the __call__() method to make an object look like a function.
+1
I also don't mind doing if hasattr(task, "run") or callable(task) to handle it using an "interface"
Please don't - there is no reason why related objects cannot have a common interface like __call__.
I agree, but some people don't like it, and despite my reputation I do try to avoid pointless bike-shedding.
Re-queuing could be done with the built-in threading.Timer, as in
threading.Timer(4.5, threadpool.queue, args=(self,))
That way each operation is responsible for handling the details of rescheduling: should we always wait X, or should we wait X - timeItTookToCalculate?
You could also do:
    threading.Timer(2, threadpool.queue, args=(self,))
    threading.Timer(4, self.handleBeingLate)
Which would handle not getting queued for a certain amount of time.
Task cancelling can't be done in a generic manner. The most I think we could do is have threadpool.stop() check hasattr(task, "stop") and call it.
Please don't
I agree that forcing "jobs" to stop should be handled outside the threadpool. Again, just trying to find a common ground.

----- Original Message -----
From: "Nir Soffer" <nsoffer@redhat.com> To: "Saggi Mizrahi" <smizrahi@redhat.com> Cc: "Francesco Romani" <fromani@redhat.com>, devel@ovirt.org, "Federico Simoncelli" <fsimonce@redhat.com>, "Michal Skrivanek" <mskrivan@redhat.com> Sent: Monday, July 7, 2014 11:04:37 PM Subject: Re: [ovirt-devel] [VDSM][sampling] thread pool status and handling of stuck calls
We could implement a Timer like class that uses a single preallocated thread. It would only run short operations and queue things in a threadpool in case of long operations. That way it's unrelated to the threadpool and can be used for any use case where we want to queue an event.
This is exactly what I was thinking about.
It would be nice if you can review it: http://gerrit.ovirt.org/29607
My review is almost done, but I'd like to bring the discussion to a place more friendly and visible than gerrit.

Scheduler (29607): it is nice code and could be a useful building block, but some important pieces are still out of the picture (the biggest: how to detect blocked calls? How to react to them?). On the other hand, I think I got all the bits into my threadpool patchset concept (29189), although I surely recognize I did so in a very convoluted and unclear manner. I'm not sure which will lead to the best outcome: to build on 29607 and add the missing pieces, or to disentangle 29189 and remove all the cruft from there.

I think it could be useful anyway to map the concepts in play in this effort. At the very least, this can act as a summary/recap and can help us converge on a shared vocabulary for the further discussion, so let me try. Trying to be semi-formal:

* Sampling: a snippet of code, usually a method of the Vm class, that updates a set of fields of a Vm object, describing the state of a running VM, with fresher data. This involves one or more calls to libvirt.

* VM samplings (synonyms: "samplings for a VM", "VM sampling set"): all the samplings that a VM needs to fully update its state. It is worth noting that the samplings are
  1. periodic
  2. with different periods from each other, because some data change more frequently than others.

* Safeness/unsafeness: in the context of the libvirt calls, and JUST and ONLY from the PoV of sampling responsiveness, it is useful to distinguish between
  1. safe calls: except for yet-to-be-seen catastrophic failures, they don't get stuck or block for an indefinite amount of time;
  2. unsafe calls: they need to enter the QEMU monitor and/or touch storage, and we *know* they can block, likely because it has happened in the past.

* Scheduler: the agent which decides when to invoke a sampling of a VM. A scheduler *must* know about wall time and about sampling intervals.

* Worker: the runnable entity (usually, but not necessarily, a Python thread) which performs a sampling.

In the current solution, we have:
1. One thread per VM, which acts both as worker and as scheduler.
2. The scheduler logic is implicit and mixed with the worker code (see vdsm/virt/sampling.py - AdvancedStatsThread.collect() - sampling.py:~432).
3. The worker/scheduler doesn't handle unsafe calls in any way.
4. Each worker/scheduler carries all the samplings of its VM.
5. As an implicit side effect of 3 and 4 above, if one sampling blocks, all the samplings of that VM block automatically. There is some good in this behaviour we may want to preserve.
6. As a side effect of 1 above, the samplings of different VMs are independent from each other. If one VM blocks, the others (try to) continue to run. This is actually a nice behaviour we want to preserve.

Drawbacks:
a. len(workers) == len(VMs) -> does not scale
b. no detection of unsafe calls blocking
c. no reaction (including just signaling) to unsafe calls blocking

Niceties:
a. the samplings of different VMs are independent from each other
b. the samplings of a VM are related and part of a group

Our ideal solution should have:
1. A fixed and low-ish (~10?) number of threads, scalable to hundreds/thousands of VMs. It must scale roughly like libvirt does (i.e. not be a bottleneck).
2. Explicit scheduler logic, disentangled from the worker.
3. Handling of unsafe calls: detection of a blocked call, and propagation of its consequences to the VM health. At least mark the VM as not-responding; better, stop (unsafe only?) sampling until the VM returns healthy.
4. Grouping among samplings to allow extension. If an unsafe sampling fails, should all the other samplings for different VMs continue to run? What about the other unsafe samplings of the same VM? Maybe not an immediate need, but nice to have in the near future.
5. Do not waste libvirt resources nor just shift the burden onto it. Do not issue calls we already know can fail. Do not deplete the libvirt resource pool. Long story short: do not disconnect/reconnect, be nice to libvirt :)

+++

Since my mail is already too long, I'll just stop here and ask whether my above analysis is right, especially the description of the ideal solution, and/or whether I missed something big.

Thanks and bests,

--
Francesco Romani
RedHat Engineering Virtualization R & D
Phone: 8261328 IRC: fromani
Participants (3):
- Francesco Romani
- Nir Soffer
- Saggi Mizrahi