WFLYCTL0348: Timeout after [300] seconds waiting for service container stability

1 Jan 2018

Hi all,

Can someone please help debug this failure?

It happens when I deploy with hosted-engine --ansible using
ovirt-system-tests. So far it has happened to me twice: once when I tried
manually, and once in jenkins ([1] is /var/log/ovirt-engine, [2] is server.log):

2018-01-01 08:19:14,375-05 INFO  [org.jboss.as.clustering.infinispan]
(ServerService Thread Pool -- 57) WFLYCLINF0002: Started timeout-base
cache from ovirt-engine container
2018-01-01 08:23:49,407-05 ERROR
[org.jboss.as.controller.management-operation] (Controller Boot
Thread) WFLYCTL0348: Timeout after [300] seconds waiting for service
container stability. Operation will roll back. Step that first updated
the service container was 'add' at address '[
    ("core-service" => "management"),
    ("management-interface" => "native-interface")
]'
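
For anyone who wants to check their own server.log for the same failure, here is a minimal sketch in Python. It assumes the entry format shown in the excerpt above (a leading timestamp such as "2018-01-01 08:19:14,375-05", with any line that lacks one treated as a continuation of the previous entry). The function names and the SAMPLE text are hypothetical and only for illustration; the UTC offset is ignored because only differences between entries are computed.

```python
import re
from datetime import datetime

# Start of an entry, e.g. "2018-01-01 08:19:14,375-05 INFO ...".
# The UTC offset ("-05") is matched but not used.
ENTRY_START = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})[+-]\d{2}\s"
)
TIMEOUT = re.compile(r"WFLYCTL0348: Timeout after \[(\d+)\] seconds")


def split_entries(text):
    """Group lines into [timestamp, message] entries.

    Lines without a leading timestamp are appended to the previous entry,
    which also copes with the line wrapping seen in mail archives.
    """
    entries = []
    for line in text.splitlines():
        match = ENTRY_START.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
            entries.append([stamp, line[match.end():].strip()])
        elif entries and line.strip():
            entries[-1][1] += " " + line.strip()
    return entries


def find_stability_timeouts(text):
    """Return one dict per WFLYCTL0348 entry found in the log text."""
    entries = split_entries(text)
    results = []
    for index, (stamp, message) in enumerate(entries):
        match = TIMEOUT.search(message)
        if not match:
            continue
        gap = None
        if index > 0:
            # Seconds of silence between the previous entry and the error.
            gap = (stamp - entries[index - 1][0]).total_seconds()
        results.append({
            "time": stamp,
            "limit_seconds": int(match.group(1)),
            "seconds_since_previous_entry": gap,
        })
    return results


# Shortened copy of the excerpt above, used only to exercise the functions.
SAMPLE = """\
2018-01-01 08:19:14,375-05 INFO  [org.jboss.as.clustering.infinispan]
(ServerService Thread Pool -- 57) WFLYCLINF0002: Started timeout-base
cache from ovirt-engine container
2018-01-01 08:23:49,407-05 ERROR
[org.jboss.as.controller.management-operation] (Controller Boot
Thread) WFLYCTL0348: Timeout after [300] seconds waiting for service
container stability. Operation will roll back.
"""

if __name__ == "__main__":
    for hit in find_stability_timeouts(SAMPLE):
        print(hit)
```

The gap to the previous entry is reported because a long silence before the error, as in the excerpt, suggests the boot was stuck waiting rather than busy logging.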

On my own system, HE-HA-agent gave up, stopped the machine, and started it
again, and then the engine did start well. In jenkins, we do not give it
enough time for this. I have now pushed a patch [3] to increase the timeout
from 10 to 20 minutes, but this is only to see whether the engine then comes
up. It is a stopgap, at least for now.

Thanks,

[1]
http://jenkins.ovirt.org/job/ovirt-system-tests_master_check-patch-el7-x86_6...

[2]
http://jenkins.ovirt.org/job/ovirt-system-tests_master_check-patch-el7-x86_6...

[3] https://gerrit.ovirt.org/85856
-- 
Didi

Yedidyah Bar David

Martin Perina

participants (2)