WFLYCTL0348: Timeout after [300] seconds waiting for service container stability

Hi all,

Can someone please help debug this failure?

It happens to me when deploying hosted-engine --ansible using ovirt-system-tests. So far happened to me once when tried manually and once in jenkins [1] (/var/log/ovirt-engine) [2] (server.log):

2018-01-01 08:19:14,375-05 INFO [org.jboss.as.clustering.infinispan] (ServerService Thread Pool -- 57) WFLYCLINF0002: Started timeout-base cache from ovirt-engine container
2018-01-01 08:23:49,407-05 ERROR [org.jboss.as.controller.management-operation] (Controller Boot Thread) WFLYCTL0348: Timeout after [300] seconds waiting for service container stability. Operation will roll back. Step that first updated the service container was 'add' at address '[ ("core-service" => "management"), ("management-interface" => "native-interface") ]'

On my own system, HE-HA-agent gave up, stopped the machine, started it, and then the engine did start well. In jenkins, we do not give it enough time for this. Now pushed a patch [3] to increase the timeout to 20 minutes (from 10), but this is just to see if it works then. For now, at least.

Thanks,

[1] http://jenkins.ovirt.org/job/ovirt-system-tests_master_check-patch-el7-x86_6...
[2] http://jenkins.ovirt.org/job/ovirt-system-tests_master_check-patch-el7-x86_6...
[3] https://gerrit.ovirt.org/85856

-- Didi
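To see how long the server was silent before a WFLYCTL0348 error, the timestamps in server.log can be compared. The Python sketch below is not from the thread: the names parse_ts and find_stability_timeouts are hypothetical, and it assumes every relevant line starts with a timestamp in the format shown in the excerpt (date, time, comma, milliseconds). Lines without such a prefix, for example stack-trace continuations, are skipped.

```python
import re
from datetime import datetime

# Timestamp prefix as seen in the excerpt, e.g. "2018-01-01 08:19:14,375-05".
# The trailing UTC offset is ignored because only differences are computed.
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})")


def parse_ts(line):
    """Return the datetime at the start of a log line, or None if absent."""
    m = TS.match(line)
    if m is None:
        return None
    base = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
    return base.replace(microsecond=int(m.group(2)) * 1000)


def find_stability_timeouts(lines):
    """Yield (previous_ts, timeout_ts, gap_seconds) for each WFLYCTL0348 line.

    previous_ts is the timestamp of the last timestamped line before the
    timeout entry, so gap_seconds is how long nothing was logged.
    """
    prev = None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue  # continuation line without a timestamp
        if "WFLYCTL0348" in line and prev is not None:
            yield prev, ts, (ts - prev).total_seconds()
        prev = ts
```

Passing an open server.log file object (or any iterable of lines) to find_stability_timeouts yields one tuple per timeout entry. For the two log lines quoted above, the gap between the last INFO entry and the timeout error is roughly 275 seconds.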

Please ignore for now, I think I found the reason. Sorry for the noise.

On Mon, Jan 1, 2018 at 3:47 PM, Yedidyah Bar David <didi@redhat.com> wrote:
> Hi all,
>
> Can someone please help debug this failure?
>
> It happens to me when deploying hosted-engine --ansible using ovirt-system-tests. So far happened to me once when tried manually and once in jenkins [1] (/var/log/ovirt-engine) [2] (server.log):
>
> 2018-01-01 08:19:14,375-05 INFO [org.jboss.as.clustering.infinispan] (ServerService Thread Pool -- 57) WFLYCLINF0002: Started timeout-base cache from ovirt-engine container
> 2018-01-01 08:23:49,407-05 ERROR [org.jboss.as.controller.management-operation] (Controller Boot Thread) WFLYCTL0348: Timeout after [300] seconds waiting for service container stability. Operation will roll back. Step that first updated the service container was 'add' at address '[ ("core-service" => "management"), ("management-interface" => "native-interface") ]'
>
> On my own system, HE-HA-agent gave up, stopped the machine, started it, and then the engine did start well. In jenkins, we do not give it enough time for this. Now pushed a patch [3] to increase the timeout to 20 minutes (from 10), but this is just to see if it works then. For now, at least.
>
> Thanks,
>
> [1] http://jenkins.ovirt.org/job/ovirt-system-tests_master_check-patch-el7-x86_64/3223/artifact/exported-artifacts/he-basic-ansible-suite-master__logs/test_logs/he-basic-ansible-suite-master/post-002_bootstrap.py/lago-he-basic-ansible-suite-master-engine/_var_log/ovirt-engine/
> [2] http://jenkins.ovirt.org/job/ovirt-system-tests_master_check-patch-el7-x86_64/3223/artifact/exported-artifacts/he-basic-ansible-suite-master__logs/test_logs/he-basic-ansible-suite-master/post-002_bootstrap.py/lago-he-basic-ansible-suite-master-engine/_var_log/ovirt-engine/server.log
> [3] https://gerrit.ovirt.org/85856
>
> -- Didi
-- Didi

No, not fixed yet. Can someone please have a look? Thanks.
-- Didi

Tracked by https://bugzilla.redhat.com/show_bug.cgi?id=1528292
--
Martin Perina
Associate Manager, Software Engineering
Red Hat Czech s.r.o.
participants (2): Martin Perina, Yedidyah Bar David