Callback hanging makes commands execute serially

Hi all,
While investigating a bug about serial execution of the same command when it is run more than once <https://bugzilla.redhat.com/1885286>, I found a problem with our callbacks.
The problem shows up when a command includes a synchronous operation (such as running the ansible runner service) that takes a while to execute. When multiple commands are running and one of them performs a long synchronous operation within a child command, every command on the engine that uses callbacks hangs. The bug gives a good example: the export OVA command runs ansible synchronously, and the pack_ova script takes time to finish. With short synchronous operations this is not really noticeable, but they too execute serially instead of in parallel. If you run export to OVA and then other commands with callbacks, the other commands hang once the export command's callback has started and reached performNextOperation.
Some technical details:
We have one thread in CommandCallbacksPoller [1] that runs, collects the current commands on the engine and processes them. In the above scenario (say, with 2 commands), the first one goes into callback.doPolling in invokeCallbackMethodsImpl [2]. From there it goes to ChildCommandsCallbackBase::doPolling and eventually to childCommandsExecutingEnded [3]. While there are more actions to perform, we call performNextOperation [4], which calls executeNextOperation (in the bug's case, [5]). When the next operation is long and synchronous, it blocks the CommandCallbacksPoller thread; only when it finishes is the thread released and the callbacks continue working.
Any idea how to solve this issue?
Regards, Liran
[1] - https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/bl...
[2] - https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/bl...
[3] - https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/bl...
[4] - https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/bl...
[5] - https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/bl...
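The blocking described above can be illustrated with a small, self-contained sketch. It is not the engine's code: `SinglePollerSketch`, `Callback`, `pollOnce` and `blockFor` are hypothetical names, and the sleep merely stands in for a long synchronous executeNextOperation. It only shows that when one thread polls all callbacks in turn, a callback that blocks delays every callback after it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

final class SinglePollerSketch {
    /** Hypothetical stand-in for a command callback; not the engine's real interface. */
    interface Callback {
        void doPolling();
    }

    /** Simulates a long synchronous operation by blocking the calling thread. */
    static void blockFor(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /**
     * Runs one polling cycle on the calling thread, visiting callbacks in the
     * map's iteration order. Returns, per callback, how many milliseconds had
     * passed since the cycle began when its doPolling() was invoked.
     */
    static Map<String, Long> pollOnce(Map<String, Callback> callbacks) {
        Map<String, Long> waitedMs = new LinkedHashMap<>();
        long cycleStart = System.nanoTime();
        for (Map.Entry<String, Callback> entry : callbacks.entrySet()) {
            long waited = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - cycleStart);
            waitedMs.put(entry.getKey(), waited);
            entry.getValue().doPolling(); // a slow callback holds the only polling thread
        }
        return waitedMs;
    }

    public static void main(String[] args) {
        Map<String, Callback> callbacks = new LinkedHashMap<>();
        callbacks.put("exportOva", () -> blockFor(300)); // long synchronous step
        callbacks.put("otherCommand", () -> { });        // cheap, non-blocking poll
        // otherCommand is reached only after exportOva's blocking step returns.
        System.out.println(pollOnce(callbacks));
    }
}
```

In this sketch the second callback does no work of its own, yet it is not polled until roughly 300 ms into the cycle, which is the serial behaviour reported in the bug, reduced to its simplest form.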

Adding Artur ...
On Mon, Oct 19, 2020 at 9:06 AM Liran Rotenberg <lrotenbe@redhat.com> wrote:
Hi all, While investigating a bug about serial execution of the same command when it is run more than once <https://bugzilla.redhat.com/1885286>, I found a problem with our callbacks.
-- Martin Perina Manager, Software Engineering Red Hat Czech s.r.o.

IMHO (without knowing the ansible-related code well), a sensible solution would be to make CreateOva async. This would require allowing AnsibleExecutor to be used in an async manner, but that seems possible since AnsibleExecutor#runCommand already performs polling. So if the polling is extracted and instead runs in a callback of CreateOvaCommand (similar to the VM jobs handling in MergeCommand), it should work properly. But then again, I'm not very familiar with the ansible stuff, so it might not be feasible.
On Mon, Oct 19, 2020 at 10:27 AM Martin Perina <mperina@redhat.com> wrote:
Adding Artur ...
_______________________________________________
Devel mailing list -- devel@ovirt.org
To unsubscribe send an email to devel-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/devel@ovirt.org/message/GBOXNWVC3ATCCG...
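A minimal sketch of the idea suggested in this message: start the long-running job, then let a callback perform one non-blocking status check per polling cycle instead of looping inside the execute phase. Everything here is an assumption made for illustration: `FakeJob`, `JobCallback` and `completionCycles` are hypothetical names, and the real AnsibleExecutor and CreateOvaCommand APIs are not modelled.

```java
import java.util.Arrays;
import java.util.List;

final class AsyncPollingSketch {
    enum Status { RUNNING, SUCCEEDED }

    /** Hypothetical remote job (think: a playbook run) that succeeds on its N-th status check. */
    static final class FakeJob {
        private int remainingChecks;

        FakeJob(int checksUntilDone) {
            this.remainingChecks = checksUntilDone;
        }

        /** Non-blocking status query. */
        Status status() {
            if (remainingChecks > 0) {
                remainingChecks--;
            }
            return remainingChecks == 0 ? Status.SUCCEEDED : Status.RUNNING;
        }
    }

    /** Callback that performs at most one status check per polling cycle and never waits. */
    static final class JobCallback {
        private final FakeJob job;
        private boolean done;

        JobCallback(FakeJob job) {
            this.job = job;
        }

        void doPolling() {
            if (!done && job.status() == Status.SUCCEEDED) {
                done = true;
            }
        }

        boolean isDone() {
            return done;
        }
    }

    /** Simulates the poller: returns, per callback, the 1-based cycle in which it finished. */
    static int[] completionCycles(List<JobCallback> callbacks) {
        int[] finishedAt = new int[callbacks.size()];
        int remaining = callbacks.size();
        for (int cycle = 1; remaining > 0; cycle++) {
            for (int i = 0; i < callbacks.size(); i++) {
                JobCallback callback = callbacks.get(i);
                if (callback.isDone()) {
                    continue;
                }
                callback.doPolling();
                if (callback.isDone()) {
                    finishedAt[i] = cycle;
                    remaining--;
                }
            }
        }
        return finishedAt;
    }

    public static void main(String[] args) {
        List<JobCallback> callbacks = Arrays.asList(
                new JobCallback(new FakeJob(3)),  // long-running job
                new JobCallback(new FakeJob(1))); // short job
        // The short job completes in cycle 1 even though the long one is still running.
        System.out.println(Arrays.toString(completionCycles(callbacks))); // prints [3, 1]
    }
}
```

Because each doPolling call returns right away, the long job no longer delays the short one. Whether the ansible polling can be split out this way in practice is the open question raised above.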

On Mon, Oct 19, 2020 at 10:44 AM Benny Zlotnik <bzlotnik@redhat.com> wrote:
IMHO (without knowing the ansible-related code well), a sensible solution would be to make CreateOva async. This would require allowing AnsibleExecutor to be used in an async manner, but that seems possible since AnsibleExecutor#runCommand already performs polling. So if the polling is extracted and instead runs in a callback of CreateOvaCommand (similar to the VM jobs handling in MergeCommand), it should work properly. But then again, I'm not very familiar with the ansible stuff, so it might not be feasible.
+1, that's neat! It wasn't possible before, when we invoked the playbook with ProcessBuilder, but now that we use Ansible-runner it sounds like the right approach to me (it minimizes the execution time of the 'execute' phase, so it could also fix [1]).
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1797553

On Mon, Oct 19, 2020 at 2:58 PM Arik Hadas <ahadas@redhat.com> wrote:
+1, that's neat! It wasn't possible before, when we invoked the playbook with ProcessBuilder, but now that we use Ansible-runner it sounds like the right approach to me (it minimizes the execution time of the 'execute' phase, so it could also fix [1]).
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1797553
I will just say that yes, making the ansible execution async sounds like a good approach for solving this specific command. I think doing so is necessary, but it won't solve the callback problem. Any child command that has a synchronous part will hang the other commands on the engine. If that part finishes quickly the hang may be very short; otherwise it can be felt. Even when the user doesn't feel it, it probably reduces the engine's performance when monitoring parallel commands. I don't have such a case (a non-ansible, synchronous child command) in mind right now, but it is possible that we have some, or will have in the future.

I think this is a general problem of mixing sync and async operations, not a problem restricted to CoCo. Async frameworks usually have helpers for executing blocking operations so that they don't block the scheduler. It might be useful for us to have one too, but it should probably be used only when converting to async is not feasible.
On Tue, Oct 20, 2020 at 3:53 PM Liran Rotenberg <lrotenbe@redhat.com> wrote:
I will just say that yes, making the ansible execution async sounds like a good approach for solving this specific command. I think doing so is necessary, but it won't solve the callback problem. Any child command that has a synchronous part will hang the other commands on the engine. If that part finishes quickly the hang may be very short; otherwise it can be felt. Even when the user doesn't feel it, it probably reduces the engine's performance when monitoring parallel commands. I don't have such a case (a non-ansible, synchronous child command) in mind right now, but it is possible that we have some, or will have in the future.
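The kind of helper mentioned in this message could look roughly like the sketch below. It relies only on the standard `CompletableFuture.supplyAsync(Supplier, Executor)` call; `BlockingHelper`, `executeBlocking` and `slowOperation` are hypothetical names introduced for illustration, not existing engine or CoCo APIs. The blocking work runs on a separate pool, so the thread that submits it (the poller, in the thread's terms) gets control back immediately and can check the future on a later cycle.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class BlockingOffloadSketch {
    /** Hypothetical helper that runs blocking work on its own pool, away from the poller thread. */
    static final class BlockingHelper implements AutoCloseable {
        private final ExecutorService workers;

        BlockingHelper(int threads) {
            this.workers = Executors.newFixedThreadPool(threads);
        }

        /** Submits the blocking operation and returns immediately with a future for its result. */
        <T> CompletableFuture<T> executeBlocking(Supplier<T> operation) {
            return CompletableFuture.supplyAsync(operation, workers);
        }

        @Override
        public void close() {
            workers.shutdown(); // already-submitted work is allowed to finish
        }
    }

    /** Stand-in for a long synchronous operation. */
    static String slowOperation(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "done";
    }

    public static void main(String[] args) {
        try (BlockingHelper helper = new BlockingHelper(2)) {
            long start = System.nanoTime();
            CompletableFuture<String> result = helper.executeBlocking(() -> slowOperation(300));
            long submitMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            // The submitting thread is free again almost immediately.
            System.out.println("submit took ~" + submitMs + " ms, finished yet: " + result.isDone());
            // A callback would check result.isDone() on later cycles; join() here only ends the demo.
            System.out.println("result: " + result.join());
        }
    }
}
```

A helper like this trades a blocked poller for extra threads, which fits the caveat above that it is a fallback for cases where converting the operation to async is not feasible.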
participants (4)
- Arik Hadas
- Benny Zlotnik
- Liran Rotenberg
- Martin Perina