On Mon, Dec 11, 2017 at 9:47 AM, Roberto Nunin <robnunin@gmail.com> wrote:

Hello all,

Over the weekend I re-tried to deploy my 4.2_rc lab. Everything was fine, apart from the fact that hosts 2 and 3 weren't imported. I had to add them to the cluster manually, with the NEW function. After this, the Gluster volumes were added fine to the environment. Next, the engine deploy on nodes 2 and 3 ended with OK status.

Migrating the HE from host 1 to host 2 was fine, and the same from host 2 to host 3. After these two attempts, there was no way to migrate the HE back to any host. I tried setting maintenance mode to global and rebooting the HE, and now I'm in the same condition reported below: no longer able to boot the HE.

Here's hosted-engine --vm-status:

!! Cluster is in GLOBAL MAINTENANCE mode !!

--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : aps-te61-mng.example.com
Host ID                            : 1
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 7dfc420b
local_conf_timestamp               : 181953
Host timestamp                     : 181952
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=181952 (Mon Dec 11 09:21:46 2017)
        host-id=1
        score=3400
        vm_conf_refresh_time=181953 (Mon Dec 11 09:21:47 2017)
        conf_on_shared_storage=True
        maintenance=False
        state=GlobalMaintenance
        stopped=False

--== Host 2 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : aps-te64-mng.example.com
Host ID                            : 2
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 67c7dd1d
local_conf_timestamp               : 181946
Host timestamp                     : 181946
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=181946 (Mon Dec 11 09:21:49 2017)
        host-id=2
        score=3400
        vm_conf_refresh_time=181946 (Mon Dec 11 09:21:49 2017)
        conf_on_shared_storage=True
        maintenance=False
        state=GlobalMaintenance
        stopped=False

--== Host 3 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : aps-te68-mng.example.com
Host ID                            : 3
Engine status                      : {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "Up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 4daea041
local_conf_timestamp               : 181078
Host timestamp                     : 181078
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=181078 (Mon Dec 11 09:21:53 2017)
        host-id=3
        score=3400
        vm_conf_refresh_time=181078 (Mon Dec 11 09:21:53 2017)
        conf_on_shared_storage=True
        maintenance=False
        state=GlobalMaintenance
        stopped=False

!! Cluster is in GLOBAL MAINTENANCE mode !!

(It is in global maintenance to avoid messages being sent to the admin mailbox.)
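The interesting fields are buried in a lot of repetition; to eyeball them quickly, something like the following works. This is only a sketch against the plain-text report shown above; the --json flag should give the same data in machine-readable form, assuming it is available in this release:

    # condense the report to the fields that matter for this problem
    hosted-engine --vm-status | grep -E 'Hostname|Engine status|Score|state='

    # the same report as JSON, for scripting (flag assumed present in 4.2)
    hosted-engine --vm-status --json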
As soon as you exit global maintenance mode, one of the hosts should take care of automatically restarting the engine VM within a couple of minutes. If you want to manually start the engine VM on a specific host while in maintenance mode, you can use:

hosted-engine --vm-start

on that specific host.

The engine image is available on all three hosts, and gluster is working fine:

Volume Name: engine
Type: Replicate
Volume ID: 95355a0b-1f45-4329-95c7-604682e812d0
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: aps-te61-mng.example.com:/gluster_bricks/engine/engine
Brick2: aps-te64-mng.example.com:/gluster_bricks/engine/engine
Brick3: aps-te68-mng.example.com:/gluster_bricks/engine/engine
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.low-prio-threads: 32
network.remote-dio: off
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
storage.owner-uid: 36
storage.owner-gid: 36
network.ping-timeout: 30
performance.strict-o-direct: on
cluster.granular-entry-heal: enable
features.shard-block-size: 64MB

The engine qemu image seems to be OK:

[root@aps-te68-mng 9998de26-8a5c-4495-b450-c8dfc1e016da]# l
total 2660796
-rw-rw----. 1 vdsm kvm 53687091200 Dec  8 17:22 35ac0f88-e97d-4710-a385-127c751a3190
-rw-rw----. 1 vdsm kvm     1048576 Dec 11 09:04 35ac0f88-e97d-4710-a385-127c751a3190.lease
-rw-r--r--. 1 vdsm kvm         285 Dec  8 11:19 35ac0f88-e97d-4710-a385-127c751a3190.meta
[root@aps-te68-mng 9998de26-8a5c-4495-b450-c8dfc1e016da]# qemu-img info 35ac0f88-e97d-4710-a385-127c751a3190
image: 35ac0f88-e97d-4710-a385-127c751a3190
file format: raw
virtual size: 50G (53687091200 bytes)
disk size: 2.5G
[root@aps-te68-mng 9998de26-8a5c-4495-b450-c8dfc1e016da]#

Attached are the agent and broker logs from the last host where HE startup was attempted (only the last two hours). Any hints on how to investigate further?

Thanks in advance.

Environment: 3x HPE ProLiant BL680c G7, OS on mirrored volume 1, gluster on mirrored volume 2, 1 TB on each server. Multiple network adapters (6), only one configured. I used the latest ovirt-node-ng iso image, ovirt-node-ng-installer-ovirt-4.2-pre-2017120512.iso, and ovirt-hosted-engine-setup-2.2.1-0.0.master.20171206123553.git94f4c9e.el7.centos.noarch, to work around the HE static IP address not being masked correctly.

--
Roberto

2017-12-07 12:44 GMT+01:00 Roberto Nunin <robnunin@gmail.com>:

Hi,

After successfully deploying a fresh 4.2_rc with oVirt Node, I'm facing a blocking problem: hosted-engine won't boot. Reaching the console via the VNC hook, I can see that it is at the initial boot screen, but for any OS release available, I receive:

[inline screenshot not preserved in the archive]

then

[inline screenshot not preserved in the archive]

Googling around, I'm not able to find suggestions. Any hints?

Thanks
Roberto
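For reference, the recovery and triage steps discussed in this thread boil down to a short shell session. A sketch, run on one of the HA hosts; the log paths are the ovirt-hosted-engine-ha defaults, adjust if your install differs:

    # leave global maintenance so the HA agents may restart the engine VM on their own
    hosted-engine --set-maintenance --mode=none

    # ...or, while still in global maintenance, start it by hand on a chosen host
    hosted-engine --vm-start

    # watch the agents converge on a decision
    watch -n 10 'hosted-engine --vm-status'

    # rule out a pending self-heal on the engine volume before blaming the image
    gluster volume heal engine info

    # follow the HA agent and broker logs while the VM is starting
    tail -f /var/log/ovirt-hosted-engine-ha/agent.log \
            /var/log/ovirt-hosted-engine-ha/broker.log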
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users