Update gluster HCI from 4.1.3 to 4.1.7

Hello, I have an environment with 3 hosts and gluster HCI on 4.1.3. I'm following this link to take it to 4.1.7: https://www.ovirt.org/documentation/how-to/hosted-engine/#upgrade-hosted-engine

The hosts and engine were at 7.3 prior to beginning the update. All went OK for the engine, which is now on 7.4 (not rebooted yet). Points 4., 5. and 6. for the first updated host were replaced by simply rebooting it.

I'm at point:

7. Exit the global maintenance mode: in a few minutes the engine VM should migrate to the freshly upgraded host because it will get a higher score

One note: exiting from global maintenance doesn't imply that the host previously put into maintenance exits from it, correct? So in my workflow, before point 7., I have selected the host and activated it.

Currently the situation is this:
- engine running on ovirt02
- update happened on ovirt03

Then, after exiting from global maintenance, I don't see the engine VM migrating to it. And in fact (see below) the score of ovirt02 is the same (3400) as the one of ovirt03, so it seems it is correct that the engine remains there...? Which kind of messages should I see inside the logs of the engine/hosts?

[root@ovirt01 ~]# rpm -q vdsm
vdsm-4.19.20-1.el7.centos.x86_64

[root@ovirt02 ~]# rpm -q vdsm
vdsm-4.19.20-1.el7.centos.x86_64
[root@ovirt02 ~]#

[root@ovirt03 ~]# rpm -q vdsm
vdsm-4.19.37-1.el7.centos.x86_64

From host ovirt01:

[root@ovirt01 ~]# hosted-engine --vm-status

--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt01.localdomain.local
Host ID                            : 1
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 3352
stopped                            : False
Local maintenance                  : False
crc32                              : 256f2128
local_conf_timestamp               : 12251210
Host timestamp                     : 12251178
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=12251178 (Tue Nov 28 10:11:20 2017)
    host-id=1
    score=3352
    vm_conf_refresh_time=12251210 (Tue Nov 28 10:11:52 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineDown
    stopped=False

--== Host 2 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : 192.168.150.103
Host ID                            : 2
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 9b8c8a6c
local_conf_timestamp               : 12219386
Host timestamp                     : 12219357
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=12219357 (Tue Nov 28 10:11:23 2017)
    host-id=2
    score=3400
    vm_conf_refresh_time=12219386 (Tue Nov 28 10:11:52 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineUp
    stopped=False

--== Host 3 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt03.localdomain.local
Host ID                            : 3
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 9f6399ef
local_conf_timestamp               : 2136
Host timestamp                     : 2136
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=2136 (Tue Nov 28 10:11:56 2017)
    host-id=3
    score=3400
    vm_conf_refresh_time=2136 (Tue Nov 28 10:11:56 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineDown
    stopped=False
[root@ovirt01 ~]#

Can I manually migrate the engine VM to ovirt03?
On ovirt03:

[root@ovirt03 ~]# gluster volume info engine

Volume Name: engine
Type: Replicate
Volume ID: 6e2bd1d7-9c8e-4c54-9d85-f36e1b871771
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: ovirt01.localdomain.local:/gluster/brick1/engine
Brick2: ovirt02.localdomain.local:/gluster/brick1/engine
Brick3: ovirt03.localdomain.local:/gluster/brick1/engine (arbiter)
Options Reconfigured:
performance.strict-o-direct: on
nfs.disable: on
user.cifs: off
network.ping-timeout: 30
cluster.shd-max-threads: 6
cluster.shd-wait-qlength: 10000
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
performance.low-prio-threads: 32
features.shard-block-size: 512MB
features.shard: on
storage.owner-gid: 36
storage.owner-uid: 36
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: off
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
performance.readdir-ahead: on
transport.address-family: inet
[root@ovirt03 ~]#

[root@ovirt03 ~]# gluster volume heal engine info
Brick ovirt01.localdomain.local:/gluster/brick1/engine
Status: Connected
Number of entries: 0

Brick ovirt02.localdomain.local:/gluster/brick1/engine
Status: Connected
Number of entries: 0

Brick ovirt03.localdomain.local:/gluster/brick1/engine
Status: Connected
Number of entries: 0

[root@ovirt03 ~]#

Thanks,
Gianluca
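On the question above about which log messages to expect: the HA daemons on each host record their state transitions and score calculations in their own logs, so something like the sketch below can be used to follow them. This assumes the default ovirt-hosted-engine-ha and ovirt-engine log locations; the exact message wording varies between versions, so it greps for the state names visible in the --vm-status output (EngineUp, EngineDown) rather than for full sentences.

# On each HCI host: agent and broker logs of the hosted-engine HA services
# (default location assumed: /var/log/ovirt-hosted-engine-ha/)
grep -iE 'score|EngineUp|EngineDown|maintenance' /var/log/ovirt-hosted-engine-ha/agent.log | tail -n 50
tail -n 50 /var/log/ovirt-hosted-engine-ha/broker.log

# On the engine VM: engine.log records the corresponding host/VM events
# (the hosted engine VM is typically named HostedEngine)
grep -i 'HostedEngine' /var/log/ovirt-engine/engine.log | tail -n 50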

Hello, I have an environment with 3 hosts and gluster HCI on 4.1.3. I'm following this link to take it to 4.1.7: https://www.ovirt.org/documentation/how-to/hosted-engine/#upgrade-hosted-engine

The hosts and engine were at 7.3 prior to beginning the update. All went OK for the engine, which is now on 7.4 (not rebooted yet). Points 4., 5. and 6. for the first updated host were replaced by simply rebooting it.

I'm at point:

7. Exit the global maintenance mode: in a few minutes the engine VM should migrate to the freshly upgraded host because it will get a higher score

One note: exiting from global maintenance doesn't imply that the host previously put into maintenance exits from it, correct?

[kasturi] - You are right. Global maintenance's main use is to allow the administrator to start / stop / modify the engine VM without any worry of interference from the HA agents.

So in my workflow, before point 7., I have selected the host and activated it. Currently the situation is this:
- engine running on ovirt02
- update happened on ovirt03

[kasturi] - Looks fine.

Then, after exiting from global maintenance, I don't see the engine VM migrating to it.

[kasturi] - Which is expected.

And in fact (see below) the score of ovirt02 is the same (3400) as the one of ovirt03, so it seems it is correct that the engine remains there...?

[kasturi] - Yes, it remains there. For a properly active host which has everything configured correctly the score will be 3400. When global maintenance is enabled on the cluster, the active score becomes 'Global Maintenance Enabled' (it can be viewed from the General tab of the host), and once you exit it the score on all the hosts goes back to 3400 and the hosted engine keeps running on the same host where it was running before enabling global maintenance.

Which kind of messages should I see inside the logs of the engine/hosts?
[snip: quoted vdsm versions and hosted-engine --vm-status output, shown in full above]

Can I manually migrate the engine VM to ovirt03?

[kasturi] - Yes, definitely. You should be able to migrate; hosted-engine --vm-status looks fine.
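One way to trigger that, apart from selecting the HostedEngine VM in the web admin UI and clicking Migrate, is to put the source host into local hosted-engine maintenance so that the HA agents move the VM off it. A minimal sketch (not from the original thread), using the hostnames discussed here:

# On ovirt02, where the engine VM is currently running: local maintenance
# drops this host's score, so the HA agents migrate the engine VM away
# (it should end up on the host with the best remaining score)
hosted-engine --set-maintenance --mode=local

# From any host, watch until another host reports the engine VM as up
hosted-engine --vm-status

# Once the VM runs on ovirt03, take ovirt02 out of local maintenance
hosted-engine --set-maintenance --mode=none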
[snip: quoted gluster volume info and heal info output, shown in full above]

[kasturi] - By the way, any reason why the engine volume is configured to be an arbiter volume? We always recommend the engine volume to be a replicated volume, to maintain high availability of the Hosted Engine VM.

Thanks,
Gianluca
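As a related sanity check when upgrading hosts one at a time in an HCI setup, it can help to confirm that all gluster volumes (not only engine) are fully healed before taking down the next host. A small sketch, assuming the gluster CLI is available on the host being checked:

# Show pending heal entries for every volume; each brick should report
# 'Number of entries: 0' before the next host is put into maintenance
for vol in $(gluster volume list); do
    echo "== $vol =="
    gluster volume heal "$vol" info
done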

On Tue, Nov 28, 2017 at 6:36 PM, Kasturi Narra <knarra@redhat.com> wrote:
Hello, I have an environment with 3 hosts and gluster HCI on 4.1.3. I'm following this link to take it to 4.1.7: https://www.ovirt.org/documentation/how-to/hosted-engine/#upgrade-hosted-engine
[snip]
7. Exit the global maintenance mode: in a few minutes the engine VM should migrate to the freshly upgraded host because it will get a higher score
One note: exiting from global maintenance doesn't imply that the host previously put into maintenance exits from it, correct?
[kasturi] - You are right. Global maintenance's main use is to allow the administrator to start / stop / modify the engine VM without any worry of interference from the HA agents.
So probably one item has to be added between 6. and 7.: exit the hosted-engine host from maintenance (Select Host -> Management -> Activate).
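For reference, the maintenance handling around points 6./7. maps to the commands below (a sketch; the exact status text printed by --vm-status differs between versions):

# Entered earlier in the procedure, before upgrading the engine
hosted-engine --set-maintenance --mode=global

# While global maintenance is on, --vm-status flags it, and the host's
# General tab in the web admin shows 'Global Maintenance Enabled'
hosted-engine --vm-status

# Point 7: leave global maintenance; healthy hosts go back to score 3400
hosted-engine --set-maintenance --mode=none

# The extra step suggested above (re-activating the host that was put into
# host maintenance) is done from the web admin: Select Host -> Management -> Activate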
Then, after exiting from global maintenance, I don't see the engine VM migrating to it.
[kasturi] - Which is expected.
Reading the documents, I thought it should have migrated to the "higher" version host... Perhaps this applies only when there is a cluster version upgrade in the datacenter, such as 3.6 -> 4.0 or 4.0 -> 4.1, and it is not true in general?
Can I manually migrate the engine VM to ovirt03?
[kasturi] - Yes, definitely. You should be able to migrate; hosted-engine --vm-status looks fine.
Yes, it worked as expected
On ovirt03:
[root@ovirt03 ~]# gluster volume info engine
Volume Name: engine
Type: Replicate
Volume ID: 6e2bd1d7-9c8e-4c54-9d85-f36e1b871771
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: ovirt01.localdomain.local:/gluster/brick1/engine
Brick2: ovirt02.localdomain.local:/gluster/brick1/engine
Brick3: ovirt03.localdomain.local:/gluster/brick1/engine (arbiter)
Options Reconfigured:
performance.strict-o-direct: on
nfs.disable: on
user.cifs: off
network.ping-timeout: 30
cluster.shd-max-threads: 6
cluster.shd-wait-qlength: 10000
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
performance.low-prio-threads: 32
features.shard-block-size: 512MB
features.shard: on
storage.owner-gid: 36
storage.owner-uid: 36
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: off
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
performance.readdir-ahead: on
transport.address-family: inet
[root@ovirt03 ~]#
[snip]
[kasturi] - By the way, any reason why the engine volume is configured to be an arbiter volume? We always recommend the engine volume to be a replicated volume, to maintain high availability of the Hosted Engine VM.
I can mix, in general, volumes with arbiter and fully replicated volumes in the same infrastructure, correct?

Actually this particular system is based on a single NUC6i5SYH with 32 GB of RAM and 2 SSD disks, where I have ESXi 6.0U2 installed. The 3 oVirt HCI hosts are 3 vSphere VMs, so the engine VM is an L2 guest. Moreover, in oVirt I have another CentOS 6 VM (L2 guest) configured, and there is also another CentOS 7 vSphere VM running side by side. Without the arbiter it would have been too cruel... ;-)

I hadn't touched it for 4 months and found it rock solid and active, so I decided to verify the update from 4.1.3 to 4.1.7, and all went OK now. Remaining points I'm going to investigate more are:

- Editing the running options of the engine VM. Right now, in my particular nested environment, I'm forced to manually start the engine with
  hosted-engine --vm-start --vm-conf=/root/alternate_engine_vm.conf
  where I have emulatedMachine=pc-i440fx-rhel7.2.0, because with 7.3 and 7.4 it doesn't start, as described in this thread: http://lists.ovirt.org/pipermail/users/2017-July/083149.html
  It still seems I cannot set it from the web admin GUI in 4.1.7, and the same happens for other engine VM parameters. I don't know if there will be any improvement in managing this in 4.2.
- Migrating from FUSE to libgfapi.
- Migrating the gluster volumes from the ovirtmgmt network to another defined logical network. I tried this (after updating to gluster 3.10) for an export domain, with some problems in 4.1.3. It seems more critical for the data and engine storage domains. See also this thread with my attempts at that time: http://lists.ovirt.org/pipermail/users/2017-July/083077.html

Thanks for your time,
Gianluca
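On the emulatedMachine workaround mentioned above, the usual pattern looks roughly like the sketch below. This is only an illustration: the source path of the locally extracted vm.conf is an assumption based on a default 4.1 hosted-engine setup, and /root/alternate_engine_vm.conf is simply the file name used in this thread.

# Start from the vm.conf the HA agent extracted locally
# (path assumed; it may differ depending on version/setup)
cp /var/run/ovirt-hosted-engine-ha/vm.conf /root/alternate_engine_vm.conf

# Edit the machine type line, e.g. emulatedMachine=pc-i440fx-rhel7.2.0
vi /root/alternate_engine_vm.conf

# Start the engine VM with the alternate configuration, as in the thread
hosted-engine --vm-start --vm-conf=/root/alternate_engine_vm.conf

Note that this only affects the manually started instance; as described above, making the setting persistent from the web admin GUI was not yet possible in 4.1.7.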