Well, I finally upgraded everything to 4.2. Unfortunately I broke a server during the upgrade and needed much more time than expected (the host was running Hosted Engine and Gluster). I won't go further into those problems: I interrupted the upgrade process, tried to fix it, and ended up with a kernel panic.

I recommend using tmux or screen for the upgrade.

My experience:

I used this tutorial for the upgrade process: https://ovirt.org/documentation/how-to/hosted-engine/#upgrade-hosted-engine

You can use it for both the Hosted Engine VM and the hosts. Be patient, don't interrupt the yum update process, and follow the instructions.
If you have a locale other than US UTF-8, change it to en_US.UTF-8 before starting the upgrade. Where? /etc/locale.conf on CentOS. If you are using e.g. Puppet, deactivate it during the upgrade so it cannot change the locale back (if something in your Puppet manifests manages it).
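A minimal sketch of that locale check on CentOS 7 (the puppet disable message is just an example):

```shell
# Check the current locale before the upgrade:
cat /etc/locale.conf

# Switch to US UTF-8 (writes /etc/locale.conf on CentOS 7):
localectl set-locale LANG=en_US.UTF-8

# If Puppet manages this file, disable the agent for the duration
# of the upgrade so it cannot revert the change:
puppet agent --disable "oVirt 4.2 upgrade in progress"
```

Remember to re-enable the agent (`puppet agent --enable`) once the upgrade is done.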

Problems:

I am still having problems with glusterfs (Peer Rejected) and it is unstable, but that probably happened because I copied the gluster UUID from the broken server and added the freshly installed server with the same IP back into the cluster.
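For reference, the commonly documented recovery for a "Peer Rejected" state is to reset the rejected node's gluster state while keeping its UUID, then re-probe a healthy peer. A sketch (the hostname is a placeholder; adapt before use, and only run this on the rejected node):

```shell
# On the REJECTED peer only: stop glusterd and clear its state,
# keeping the node's own UUID file (glusterd.info):
systemctl stop glusterd
find /var/lib/glusterd -mindepth 1 -maxdepth 1 ! -name glusterd.info -exec rm -rf {} +
systemctl start glusterd

# Re-probe a healthy peer (hypothetical hostname), then restart
# glusterd once more so volume definitions are fetched, and verify:
gluster peer probe healthy-node.example.com
systemctl restart glusterd
gluster peer status
```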


IMHO:
Please check that you have the engine backup done. Save it somewhere else: NFS, rsync it to another server....
When running engine-setup after the yum update on the ovirt-engine VM: don't do the in-place upgrade of PostgreSQL. It's really nice to have, but if you can avoid the risk, why take it?
Keep the PostgreSQL backup.
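A sketch of taking that backup with oVirt's engine-backup tool and copying it off-host (the destination host and paths are examples only):

```shell
# On the engine VM: full backup of the engine DB and configuration.
engine-backup --mode=backup --scope=all \
  --file=/var/tmp/engine-backup-$(date +%F).tar.gz \
  --log=/var/tmp/engine-backup-$(date +%F).log

# Copy it to another machine (hypothetical host/path):
rsync -av /var/tmp/engine-backup-*.tar.gz backup-host:/srv/backups/ovirt/
```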

Previous Version: 4.1.x
Updated to: 4.2

Setup:

CentOS 7.4.1708
4 servers, 3 with Gluster for the engine.

If you have questions....

Best Regards,

Gabriel


Gabriel Stein
------------------------------
Gabriel Ferraz Stein
Tel.: +49 (0)  170 2881531

2017-12-22 12:19 GMT+01:00 Gianluca Cecchi <gianluca.cecchi@gmail.com>:
On Fri, Dec 22, 2017 at 12:06 PM, Martin Perina <mperina@redhat.com> wrote:



The only problem I registered was the first start of the engine on the first upgraded host (that should be step 7 in the link above), where I got an error on the engine (not able to access the web admin portal); I see this in server.log:

2017-12-22 00:40:17,674+01 INFO  [org.quartz.core.QuartzScheduler] (ServerService Thread Pool -- 63) Scheduler DefaultQuartzScheduler_$_NON_CLUSTERED started.
2017-12-22 00:40:17,682+01 INFO  [org.jboss.as.clustering.infinispan] (ServerService Thread Pool -- 63) WFLYCLINF0002: Started timeout-base cache from ovirt-engine container
2017-12-22 00:44:28,611+01 ERROR [org.jboss.as.controller.management-operation] (Controller Boot Thread) WFLYCTL0348: Timeout after [300] seconds waiting for service container stability. Operation will roll back. Step that first updated the service container was 'add' at address '[
    ("core-service" => "management"),
    ("management-interface" => "native-interface")
]'

Adding Vaclav, maybe something in Wildfly? Martin, any hint on engine side?

Yeah, I've already seen such errors a few times; it usually happens when access to storage is really slow or the host itself is overloaded, so WildFly is not able to start up properly before the default 300-second interval is over.

If this happens often, we will have to raise that timeout for all installations.
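For anyone hitting this repeatedly: WildFly's service-container stability timeout can be raised via the system property `jboss.as.management.blocking.timeout`. A hedged sketch of passing it to the oVirt engine through an engine.conf.d snippet (the filename and the 600-second value are assumptions; verify against your engine version before relying on it):

```shell
# Raise the WildFly stability timeout from the default 300s to 600s.
cat > /etc/ovirt-engine/engine.conf.d/99-wildfly-timeout.conf <<'EOF'
ENGINE_PROPERTIES="${ENGINE_PROPERTIES} jboss.as.management.blocking.timeout=600"
EOF
systemctl restart ovirt-engine
```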


Ok, thanks for confirming what I suspected.
Actually this particular environment is based on a single NUC where I have ESXi 6.0U2, and the 3 hosts are actually VMs of this vSphere environment, so the hosted engine is an L2 VM.
And the replica 3 (with arbiter) of the hosts ultimately sits on a single physical SSD disk below...
All in all it is already a great success that this kind of infrastructure was able to update from 4.1 to 4.2... I use it basically for functional testing, and btw on vSphere there is also another CentOS 7 VM running ;-)
Thanks,
Gianluca

_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users