On Fri, Dec 22, 2017 at 12:06 PM, Martin Perina <mperina@redhat.com> wrote:

On Fri, Dec 22, 2017 at 11:59 AM, Sandro Bonazzola <sbonazzo@redhat.com> wrote:

2017-12-22 11:44 GMT+01:00 Gianluca Cecchi <gianluca.cecchi@gmail.com>:
On Thu, Dec 21, 2017 at 2:35 PM, Sandro Bonazzola <sbonazzo@redhat.com> wrote:
Hi,
now that oVirt 4.2.0 has been released, we're starting to see some reports of issues that, so far, are related to less common deployments.
We'd also like to get some feedback from those who upgraded to this amazing release without any issue, and to put that positive feedback under our developers' (digital) Christmas tree as a gift for the effort put into this release.
Looking forward to your positive reports!

Not having positive feedback? Let us know too!
Over the next few weeks we are making an effort to promptly assist anyone who hits trouble during or after the upgrade. Let us know on this users@ovirt.org mailing list (preferred) or on IRC, using the irc.oftc.net server and the #ovirt channel.

We are also closely monitoring bugzilla.redhat.com for new bugs on the oVirt project, so you can report issues there as well.

Thanks,
--
SANDRO BONAZZOLA

Hi Sandro,
nice to see final 4.2!

I successfully updated a test/lab nested HCI cluster from oVirt 4.1.7 + Gluster 3.10 to oVirt 4.2 + Gluster 3.12 (automatically picked by the upgrade).
3 hosts with CentOS 7.4.

Thanks for the report, Gianluca!

I basically followed the procedure here:
https://ovirt.org/documentation/how-to/hosted-engine/#upgrade-hosted-engine

Steps 5 and 6 were replaced by a reboot of the upgraded host.

My running C6 VM had no downtime during the upgrade of the three hosts.

The only problem I noticed was at the first start of the engine on the first upgraded host (that should be step 7 in the link above), where the engine hit an error and I was not able to access the web admin portal. I see this in server.log
(full log attached here: https://drive.google.com/file/d/1UQAllZfjueVGkXDsBs09S7THGDFn9YPa/view?usp=sharing )

2017-12-22 00:40:17,674+01 INFO [org.quartz.core.QuartzScheduler] (ServerService Thread Pool -- 63) Scheduler DefaultQuartzScheduler_$_NON_CLUSTERED started.
2017-12-22 00:40:17,682+01 INFO [org.jboss.as.clustering.infinispan] (ServerService Thread Pool -- 63) WFLYCLINF0002: Started timeout-base cache from ovirt-engine container
2017-12-22 00:44:28,611+01 ERROR [org.jboss.as.controller.management-operation] (Controller Boot Thread) WFLYCTL0348: Timeout after [300] seconds waiting for service container stability. Operation will roll back. Step that first updated the service container was 'add' at address '[
("core-service" => "management"),
("management-interface" => "native-interface")
]'

Adding Vaclav, maybe something in WildFly? Martin, any hint on the engine side?

Yeah, I've already seen this error a few times. It usually happens when access to storage is really slow or the host itself is overloaded, so WildFly is not able to start up properly before the default 300-second interval expires.

If this is going to happen often, we will have to raise that timeout for all installations.
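
For anyone who wants to check whether their engine hit this timeout, here is a minimal sketch that scans server.log lines for the WFLYCTL0348 message quoted earlier in this thread and extracts the timeout value. The function name find_stability_timeouts and the regular expression are illustrative assumptions, not part of any oVirt or WildFly tooling, and the message wording may differ between WildFly versions.

```python
import re

# Pattern for the WildFly boot-timeout message as it appears in the
# excerpt from this thread; the exact wording may differ between
# WildFly versions, so treat this regex as an assumption.
TIMEOUT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\S*\s+ERROR\s+.*"
    r"WFLYCTL0348: Timeout after \[(?P<seconds>\d+)\] seconds"
)


def find_stability_timeouts(lines):
    """Return (timestamp, timeout_seconds) for each WFLYCTL0348 line."""
    hits = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            hits.append((match.group("ts"), int(match.group("seconds"))))
    return hits


# Two lines copied from the server.log excerpt in this thread.
sample = [
    "2017-12-22 00:40:17,682+01 INFO [org.jboss.as.clustering.infinispan] "
    "(ServerService Thread Pool -- 63) WFLYCLINF0002: Started timeout-base "
    "cache from ovirt-engine container",
    "2017-12-22 00:44:28,611+01 ERROR [org.jboss.as.controller.management-operation] "
    "(Controller Boot Thread) WFLYCTL0348: Timeout after [300] seconds waiting "
    "for service container stability. Operation will roll back.",
]

print(find_stability_timeouts(sample))
```

On a real engine you would pass the opened server.log file instead of the sample list, since a file object also yields one line at a time.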

This is under investigation here:

https://bugzilla.redhat.com/show_bug.cgi?id=1528292

We have seen it more than once in fresh installs with the new Ansible flow but, up to now, never on upgrades.

2017-12-22 00:44:28,722+01 INFO [org.wildfly.extension.undertow] (ServerService Thread Pool -- 65) WFLYUT0022: Unregistered web context: '/ovirt-engine/apidoc' from server 'default-server'

Then I restarted the hosted engine VM on the same first upgraded host; this time it started correctly and the web admin portal was OK. The corresponding lines in server.log became:

2017-12-22 00:48:17,536+01 INFO [org.quartz.core.QuartzScheduler] (ServerService Thread Pool -- 63) Scheduler DefaultQuartzScheduler_$_NON_CLUSTERED started.
2017-12-22 00:48:17,545+01 INFO [org.jboss.as.clustering.infinispan] (ServerService Thread Pool -- 63) WFLYCLINF0002: Started timeout-base cache from ovirt-engine container
2017-12-22 00:48:23,248+01 INFO [org.jboss.as.server] (ServerService Thread Pool -- 25) WFLYSRV0010: Deployed "ovirt-web-ui.war" (runtime-name : "ovirt-web-ui.war")
2017-12-22 00:48:23,248+01 INFO [org.jboss.as.server] (ServerService Thread Pool -- 25) WFLYSRV0010: Deployed "restapi.war" (runtime-name : "restapi.war")
2017-12-22 00:48:23,248+01 INFO [org.jboss.as.server] (ServerService Thread Pool -- 25) WFLYSRV0010: Deployed "engine.ear" (runtime-name : "engine.ear")
2017-12-22 00:48:23,248+01 INFO [org.jboss.as.server] (ServerService Thread Pool -- 25) WFLYSRV0010: Deployed "apidoc.war" (runtime-name : "apidoc.war")
2017-12-22 00:48:24,175+01 INFO [org.jboss.as.server] (Controller Boot Thread) WFLYSRV0212: Resuming server
2017-12-22 00:48:24,219+01 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0060: Http management interface listening on http://127.0.0.1:8706/management
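
To put numbers on the two boots, here is a minimal sketch that computes the time elapsed between two server.log lines from their leading timestamps. The helper names parse_ts and elapsed_seconds are hypothetical, the sample lines are abbreviated copies of the ones in this thread, and the sketch assumes both lines carry the same UTC offset (the "+01" suffix is ignored).

```python
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
TS_LENGTH = len("2017-12-22 00:40:17,674")  # date, time and milliseconds


def parse_ts(line):
    """Parse the leading timestamp of a server.log line (offset ignored)."""
    return datetime.strptime(line[:TS_LENGTH], TS_FORMAT)


def elapsed_seconds(first_line, second_line):
    """Seconds between the timestamps of two log lines."""
    return (parse_ts(second_line) - parse_ts(first_line)).total_seconds()


# Failed boot: last INFO line before the error, then the WFLYCTL0348 error.
failed = elapsed_seconds(
    "2017-12-22 00:40:17,682+01 INFO [org.jboss.as.clustering.infinispan]",
    "2017-12-22 00:44:28,611+01 ERROR [org.jboss.as.controller.management-operation]",
)
# Successful boot: same INFO message, then the management interface line.
ok = elapsed_seconds(
    "2017-12-22 00:48:17,545+01 INFO [org.jboss.as.clustering.infinispan]",
    "2017-12-22 00:48:24,219+01 INFO [org.jboss.as]",
)
print(f"failed boot gap: {failed:.3f} s, successful boot gap: {ok:.3f} s")
```

With the lines from this thread, the gap is about 251 seconds in the failed boot against about 7 seconds in the successful one.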

I also updated the cluster and DC levels to 4.2.

Gianluca

_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users

--
SANDRO BONAZZOLA
ASSOCIATE MANAGER, SOFTWARE ENGINEERING, EMEA ENG VIRTUALIZATION R&D
Red Hat EMEA

--
Martin Perina
Associate Manager, Software Engineering
Red Hat Czech s.r.o.
