On Fri, Dec 22, 2017 at 12:06 PM, Martin Perina <mperina@redhat.com> wrote:



The only problem I ran into was the first start of the engine on the first upgraded host (that should be step 7 in the link above), where I got an error on the engine (not able to access the web admin portal); I see this in server.log:

2017-12-22 00:40:17,674+01 INFO  [org.quartz.core.QuartzScheduler] (ServerService Thread Pool -- 63) Scheduler DefaultQuartzScheduler_$_NON_CLUSTERED started.
2017-12-22 00:40:17,682+01 INFO  [org.jboss.as.clustering.infinispan] (ServerService Thread Pool -- 63) WFLYCLINF0002: Started timeout-base cache from ovirt-engine container
2017-12-22 00:44:28,611+01 ERROR [org.jboss.as.controller.management-operation] (Controller Boot Thread) WFLYCTL0348: Timeout after [300] seconds waiting for service container stability. Operation will roll back. Step that first updated the service container was 'add' at address '[
    ("core-service" => "management"),
    ("management-interface" => "native-interface")
]'

Adding Vaclav, maybe something in WildFly? Martin, any hint on the engine side?

Yeah, I've already seen this error a few times; it usually happens when access to storage is really slow or the host itself is overloaded, so WildFly is not able to start up properly before the default 300-second interval is over.

If this is going to happen often, we will have to raise that timeout for all installations.
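
In the meantime, for anyone who hits this, the timeout can be raised per installation by overriding WildFly's jboss.as.management.blocking.timeout system property (the value is in seconds). A minimal sketch, assuming the usual engine.conf.d override mechanism and a hypothetical file name:

# /etc/ovirt-engine/engine.conf.d/99-jboss-timeout.conf (hypothetical name)
# Entries in ENGINE_PROPERTIES are passed to the engine JVM as -D options;
# this doubles the service container stability timeout from 300 to 600 seconds.
ENGINE_PROPERTIES="${ENGINE_PROPERTIES} jboss.as.management.blocking.timeout=600"

Then restart the engine so the new property takes effect:

# systemctl restart ovirt-engine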


Ok, thanks for the confirmation of what I suspected.
Actually, this particular environment is based on a single NUC where I have ESXi 6.0U2, and the 3 hosts are actually VMs of this vSphere environment, so the hosted engine is an L2 VM.
And the replica 3 (with arbiter) volume across the hosts ultimately sits on a single physical SSD disk underneath...
All in all, it is already a great success that this kind of infrastructure was able to upgrade from 4.1 to 4.2... I use it basically for functional testing, and btw on vSphere there is also another CentOS 7 VM running ;-)
 
Thanks,
Gianluca