Health check endpoint in the engine hanging forever on CheckDBConnection

Hi,

I'm trying out (I think) the latest build of ovirt-engine in OST [1], and the basic suite fails when we do engine reconfiguration and then restart the service [2]. After restarting, we wait on the health check endpoint status here [3]. This, however, ends with a timeout. I tried running manually:

curl -D - http://engine/ovirt-engine/services/health

and that command also hangs forever. In the engine log, the last related entries seem to be:

2020-11-26 16:20:15,761+01 DEBUG [org.ovirt.engine.core.utils.servlet.LocaleFilter] (default task-7) [] Incoming locale 'en-US'. Filter determined locale to be 'en-US'
2020-11-26 16:20:15,761+01 DEBUG [org.ovirt.engine.core.services.HealthStatus] (default task-7) [] Health Status servlet: entry
2020-11-26 16:20:15,761+01 DEBUG [org.ovirt.engine.core.services.HealthStatus] (default task-7) [] Calling CheckDBConnection query

Has anyone else encountered this?

Regards,
Marcin

[1] ovirt-engine-4.4.4.3-0.0.master.20201126133903.gitc2c805a2662.el8.noarch
[2] https://github.com/oVirt/ovirt-system-tests/blob/3e2fc267b376a12eda131fa0e9cda2d94e36e2be/basic-suite-master/test-scenarios/test_001_initialize_engine.py#L88
[3] https://github.com/oVirt/ovirt-system-tests/blob/3e2fc267b376a12eda131fa0e9cda2d94e36e2be/ost_utils/ost_utils/pytest/fixtures/engine.py#L160
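Both the OST wait and the manual curl call block on a servlet that never answers. As a minimal sketch (not the actual OST fixture linked in [3]), assuming Python 3 and only the standard library, a poller can bound each request as well as the overall wait, so a silent server shows up as a failed attempt rather than a hang. `wait_for_health` and its parameters are hypothetical names; for a one-off manual check, curl's `--max-time` option gives the same per-request bound.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_for_health(url, deadline_s=300.0, request_timeout_s=10.0, interval_s=3.0):
    """Poll `url` until it returns HTTP 200 or `deadline_s` seconds have passed.

    Each attempt has its own socket timeout, so a server that accepts the
    connection but never answers costs at most `request_timeout_s` per try
    instead of blocking the caller forever.
    """
    end = time.monotonic() + deadline_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=request_timeout_s) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, http.client.HTTPException, OSError):
            # Connection refused, non-2xx status (HTTPError), or read timeout:
            # treat all of them as "not healthy yet" and retry.
            pass
        remaining = end - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(interval_s, remaining))
```

The loop may overshoot the deadline by up to one request timeout; that keeps the logic simple at the cost of a slightly fuzzy upper bound.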

Hi,

The health status is a stupidly simple call to the database:

https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/services/src/main/java/org/ovirt/engine/core/services/HealthStatus.java
https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/CheckDBConnectionQuery.java#L21
https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/dal/src/main/java/org/ovirt/engine/core/dal/dbbroker/DbConnectionUtil.java#L33
https://github.com/oVirt/ovirt-engine/blob/master/packaging/dbscripts/common_sp.sql#L421

So it should definitely not hang forever unless there is some serious issue in the engine startup or in the PostgreSQL database. Could you please share logs? Especially interesting would be server.log and engine.log from /var/log/ovirt-engine.

Martin
-- Martin Perina Manager, Software Engineering Red Hat Czech s.r.o.
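Martin's point is that the chain servlet → CheckDBConnection query → DAL → stored procedure is trivial, so a hang means something underneath it is blocked. As an illustration only (this is not how HealthStatus.java is written), here is a minimal Python sketch of a health handler that bounds the database check with a timeout, so a blocked check surfaces as an error response instead of a request that never returns. `health_status`, `check_db`, and the status codes and messages are hypothetical.

```python
import concurrent.futures
import time

# One shared worker pool. A hung check keeps occupying its worker thread,
# so the pool is deliberately not used as a context manager: leaving a
# `with` block calls shutdown(wait=True), which would wait for the hang.
_EXECUTOR = concurrent.futures.ThreadPoolExecutor(max_workers=2)


def health_status(check_db, timeout_s=5.0):
    """Return (http_status, body) for a health endpoint backed by `check_db`.

    `check_db` is a hypothetical zero-argument callable standing in for the
    CheckDBConnection query; it returns True when the database answers.
    """
    future = _EXECUTOR.submit(check_db)
    try:
        ok = future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # The check is still running in the background; only the caller is freed.
        return 503, "DB check timed out"
    except Exception as exc:
        return 500, f"DB check failed: {exc}"
    return (200, "DB up") if ok else (500, "DB check returned false")


print(health_status(lambda: True))  # (200, 'DB up')
```

The trade-off is that neither Python nor Java can safely kill the blocked worker thread, so the timeout only frees the caller; repeated hangs can still exhaust the pool.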

Well, after some discussion with Artur and trying some workarounds, the problem magically disappeared on my servers, but there's one OST gating run in CI that suffered from the same problem:

https://jenkins.ovirt.org/blue/organizations/jenkins/ovirt-system-tests_gate/detail/ovirt-system-tests_gate/937/pipeline#step-240-log-1226
https://jenkins.ovirt.org/job/ovirt-system-tests_gate/937/artifact/basic-suit-master.el7.x86_64/test_logs/basic-suite-master/lago-basic-suite-master-engine/_var_log/ovirt-engine/server.log/*view*/
https://jenkins.ovirt.org/job/ovirt-system-tests_gate/937/artifact/basic-suit-master.el7.x86_64/test_logs/basic-suite-master/lago-basic-suite-master-engine/_var_log/ovirt-engine/engine.log/*view*/

I can see something I was recently touching: the removal of reactive streams from vdsm-jsonrpc-java. Checking...

Artur
-- Artur Socha Senior Software Engineer, RHV Red Hat

In this run, ovirt-engine-* is built from git c2c805a2662; however, vdsm-jsonrpc-client is at version 1.6.0, which contains a breaking change: it removes reactive streams support in favor of java.util.concurrent.Flow [1]. That change is handled by engine commit ff3aa4da956 [2].

[1] https://gerrit.ovirt.org/#/c/vdsm-jsonrpc-java/+/109916/
[2] https://gerrit.ovirt.org/#/c/ovirt-engine/+/112347/

Artur
-- Artur Socha Senior Software Engineer, RHV Red Hat

It would be good to notify the devel list when there are breaking changes across multiple components; development environments usually do not update everything. If there is an actual incompatibility, as in this case, it would also be good not only to require the new version but also to properly conflict with the old one. Currently, official composes fail too, because old engines require vdsm-jsonrpc-java >= 1.5.4, which pulls in 1.6.0 just fine.

Thanks,
michal
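Michal's point about version constraints can be illustrated with a simplified comparison. The sketch below assumes plain dotted numeric versions (RPM's real comparison also handles epochs, release strings, and alphanumeric segments), and `satisfies`, `install_allowed`, and `FIRST_FIXED_ENGINE` are hypothetical. It shows why a lower bound alone (`>= 1.5.4`) admits 1.6.0, and how a conflict declared by the new library against old engines would reject the incompatible pair instead.

```python
import operator

_OPS = {">=": operator.ge, ">": operator.gt, "<=": operator.le,
        "<": operator.lt, "=": operator.eq}


def parse_version(v):
    """Turn a dotted numeric version like '1.5.4' into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))


def satisfies(version, constraints):
    """True if `version` meets every (operator, bound) pair in `constraints`."""
    v = parse_version(version)
    return all(_OPS[op](v, parse_version(bound)) for op, bound in constraints)


def install_allowed(engine_version, lib_version, engine_requires, lib_conflicts):
    """Accept the pair only if the engine's requirement on the library holds
    and none of the library's conflict entries matches the engine version."""
    if not satisfies(lib_version, engine_requires):
        return False
    return not any(satisfies(engine_version, [c]) for c in lib_conflicts)


# Hypothetical placeholder: NOT the real first engine version containing ff3aa4da956.
FIRST_FIXED_ENGINE = "4.4.5"

# Lower bound only: nothing stops the old engine from getting 1.6.0.
print(install_allowed("4.4.4.3", "1.6.0", [(">=", "1.5.4")], []))  # True
# With the 1.6.0 library conflicting with older engines, the pair is refused.
print(install_allowed("4.4.4.3", "1.6.0", [(">=", "1.5.4")],
                      [("<", FIRST_FIXED_ENGINE)]))  # False
```

Putting the conflict on the library side matters because already-released engine packages cannot be changed retroactively to add an upper bound.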
_______________________________________________
Devel mailing list -- devel@ovirt.org
To unsubscribe send an email to devel-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/devel@ovirt.org/message/DTY2FJHJ3YBI4Z...
participants (4)
- Artur Socha
- Marcin Sobczyk
- Martin Perina
- Michal Skrivanek