
Hi,

According to the hosted-engine FAQ, the engine VM should be up and running again about 5 minutes after its host is forcibly powered off. However, after updating oVirt from 3.6.4 to 3.6.5, the engine VM no longer restarts automatically, even after 10+ minutes (I have already made sure that global maintenance mode is set to none). I initially thought it was a time sync issue, so I installed and enabled ntp on the hosts and the engine. However, the issue persists.

###Versions:

[root@host01 ~]# rpm -qa | grep ovirt
libgovirt-0.3.3-1.el7_2.1.x86_64
ovirt-vmconsole-1.0.0-1.el7.centos.noarch
ovirt-vmconsole-host-1.0.0-1.el7.centos.noarch
ovirt-hosted-engine-ha-1.3.5.3-1.el7.centos.noarch
ovirt-host-deploy-1.4.1-1.el7.centos.noarch
ovirt-engine-sdk-python-3.6.5.0-1.el7.centos.noarch
ovirt-hosted-engine-setup-1.3.5.0-1.el7.centos.noarch
ovirt-release36-007-1.noarch
ovirt-setup-lib-1.0.1-1.el7.centos.noarch

[root@host01 ~]# rpm -qa | grep vdsm
vdsm-infra-4.17.26-0.el7.centos.noarch
vdsm-jsonrpc-4.17.26-0.el7.centos.noarch
vdsm-gluster-4.17.26-0.el7.centos.noarch
vdsm-python-4.17.26-0.el7.centos.noarch
vdsm-yajsonrpc-4.17.26-0.el7.centos.noarch
vdsm-4.17.26-0.el7.centos.noarch
vdsm-cli-4.17.26-0.el7.centos.noarch
vdsm-xmlrpc-4.17.26-0.el7.centos.noarch
vdsm-hook-vmfex-dev-4.17.26-0.el7.centos.noarch

###Log files:

https://app.box.com/s/fkurmwagogwkv5smkwwq7i4ztmwf9q9r

###After host02 was killed:

[root@host03 wees]# hosted-engine --vm-status

--== Host 1 status ==--

Status up-to-date                  : True
Hostname                           : host01.ovirt.forest.go.th
Host ID                            : 1
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 396766e0
Host timestamp                     : 4391

--== Host 2 status ==--

Status up-to-date                  : True
Hostname                           : host02.ovirt.forest.go.th
Host ID                            : 2
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 0
stopped                            : True
Local maintenance                  : False
crc32                              : 3a345b65
Host timestamp                     : 1458

--== Host 3 status ==--

Status up-to-date                  : True
Hostname                           : host03.ovirt.forest.go.th
Host ID                            : 3
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 4c34b0ed
Host timestamp                     : 11958

###Some time after host02 was killed:

[root@host03 wees]# hosted-engine --vm-status

--== Host 1 status ==--

Status up-to-date                  : False
Hostname                           : host01.ovirt.forest.go.th
Host ID                            : 1
Engine status                      : unknown stale-data
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 72e4e418
Host timestamp                     : 4415

--== Host 2 status ==--

Status up-to-date                  : False
Hostname                           : host02.ovirt.forest.go.th
Host ID                            : 2
Engine status                      : unknown stale-data
Score                              : 0
stopped                            : True
Local maintenance                  : False
crc32                              : 3a345b65
Host timestamp                     : 1458

--== Host 3 status ==--

Status up-to-date                  : False
Hostname                           : host03.ovirt.forest.go.th
Host ID                            : 3
Engine status                      : unknown stale-data
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 4c34b0ed
Host timestamp                     : 11958

###After host02 was completely up again:

[root@host03 wees]# hosted-engine --vm-status

--== Host 1 status ==--

Status up-to-date                  : True
Hostname                           : host01.ovirt.forest.go.th
Host ID                            : 1
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 0
stopped                            : False
Local maintenance                  : False
crc32                              : f5728fca
Host timestamp                     : 5555

--== Host 2 status ==--

Status up-to-date                  : True
Hostname                           : host02.ovirt.forest.go.th
Host ID                            : 2
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : e5284763
Host timestamp                     : 715

--== Host 3 status ==--

Status up-to-date                  : True
Hostname                           : host03.ovirt.forest.go.th
Host ID                            : 3
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : bc10c7fc
Host timestamp                     : 13119

-- Wee