hosted engine vm not present

Hello,

I successfully migrated from HE 4.3.10 to HE 4.4.9, but I think I made a mistake:

The HostedEngine vm was running on the host (haboob) where I deployed the upgrade path. Everything was ok except that I deployed it on the wrong host (haboob). So I live migrated the HostedEngine to the preexisting centos host (kilimanjaro) and erased haboob. Then I installed a new host (fuego) to replace haboob. The HostedEngine is able to migrate between kilimanjaro and fuego, but now the vm does not seem to be seen by any host when doing:

[root@fuego ~]# hosted-engine --vm-status

--== Host fuego (id: 1) status ==--

Host ID : 1
Host timestamp : 3252
Score : 3350
Engine status : {"vm": "down", "health": "bad", "detail": "unknown", "reason": "vm not running on this host"}
Hostname : fuego
Local maintenance : False
stopped : False
crc32 : 14527b72
conf_on_shared_storage : True
local_conf_timestamp : 3257
Status up-to-date : True
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=3252 (Sun Nov 28 18:27:29 2021)
    host-id=1
    score=3350
    vm_conf_refresh_time=3257 (Sun Nov 28 18:27:34 2021)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineDown
    stopped=False

--== Host kilimanjaro.v100.abes.fr (id: 3) status ==--

Host ID : 3
Host timestamp : 65261186
Score : 0
Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down_unexpected", "detail": "unknown"}
Hostname : kilimanjaro.v100.abes.fr
Local maintenance : True
stopped : False
crc32 : c381cf1e
conf_on_shared_storage : True
local_conf_timestamp : 65261189
Status up-to-date : True
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=65261186 (Sun Nov 28 19:27:23 2021)
    host-id=3
    score=0
    vm_conf_refresh_time=65261189 (Sun Nov 28 19:27:26 2021)
    conf_on_shared_storage=True
    maintenance=True
    state=LocalMaintenance
    stopped=False

When doing hosted-engine --console, it returns:

[root@fuego ~]# hosted-engine --console
Command VM.getStats with args {'vmID': '74d2966c-2efa-41f0-a5c3-dd383f690a92'} failed:
(code=1, message=Virtual machine does not exist: {'vmId': '74d2966c-2efa-41f0-a5c3-dd383f690a92'})
The engine VM is not on this host

It is as if the vmID were that of the old 4.3.10 HostedEngine, which doesn't exist anymore.

How can I make the new HostedEngine vmID the one known by HA and the hosts?

*I'm afraid to lose the HostedEngine vm when stopping it!*

Thank you for your precious help.

--
Nathanaël Blanchet
Supervision réseau
SIRE
227 avenue Professeur-Jean-Louis-Viala
34193 MONTPELLIER CEDEX 5
Tél. 33 (0)4 67 54 84 55
Fax 33 (0)4 67 54 84 14
blanchet@abes.fr

On Sun, Nov 28, 2021 at 8:46 PM Nathanaël Blanchet <blanchet@abes.fr> wrote:
Hello
I successfully migrated from HE 4.3.10 to HE 4.4.9, but I think I made a mistake:
The HostedEngine vm was running on the host (haboob) where I deployed the upgrade path.
Everything was ok except that I deployed it on the wrong host (haboob). So I live migrated the HostedEngine to the preexisting centos host (kilimanjaro) and erased haboob. Then I installed a new host (fuego) to replace haboob. The HostedEngine is able to migrate between kilimanjaro and fuego, but now the vm does not seem to be seen by any host when doing:
[root@fuego ~]# hosted-engine --vm-status
--== Host fuego (id: 1) status ==--
Host ID : 1
Host timestamp : 3252
Score : 3350
Engine status : {"vm": "down", "health": "bad", "detail": "unknown", "reason": "vm not running on this host"}
Hostname : fuego
Local maintenance : False
stopped : False
crc32 : 14527b72
conf_on_shared_storage : True
local_conf_timestamp : 3257
Status up-to-date : True
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=3252 (Sun Nov 28 18:27:29 2021)
    host-id=1
    score=3350
    vm_conf_refresh_time=3257 (Sun Nov 28 18:27:34 2021)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineDown
    stopped=False
--== Host kilimanjaro.v100.abes.fr (id: 3) status ==--
Host ID : 3
Host timestamp : 65261186
Score : 0
Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down_unexpected", "detail": "unknown"}
Hostname : kilimanjaro.v100.abes.fr
Local maintenance : True
stopped : False
crc32 : c381cf1e
conf_on_shared_storage : True
local_conf_timestamp : 65261189
Status up-to-date : True
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=65261186 (Sun Nov 28 19:27:23 2021)
    host-id=3
    score=0
    vm_conf_refresh_time=65261189 (Sun Nov 28 19:27:26 2021)
    conf_on_shared_storage=True
    maintenance=True
    state=LocalMaintenance
    stopped=False
When doing hosted-engine --console, it returns:
[root@fuego ~]# hosted-engine --console
Command VM.getStats with args {'vmID': '74d2966c-2efa-41f0-a5c3-dd383f690a92'} failed:
(code=1, message=Virtual machine does not exist: {'vmId': '74d2966c-2efa-41f0-a5c3-dd383f690a92'})
The engine VM is not on this host
It is like the vmID was the old 4.3.10 HostedEngine that doesn't exist anymore.
How can I make the new HostedEngine vmID be the good one known by HA and hosts?
I'm afraid to lose the HostedEngine vm when stopping it!
I am not sure I fully understood your flow, but whatever it was, why should you be afraid of losing the old vm? "Losing" it is an integral part of the process, even if not through your current state.

Part of the upgrade is taking an engine-backup, right? I suppose you took one. If you changed anything in the old engine since then, and can take another backup, perhaps do that. Then try again "from scratch" (exact details may vary, perhaps share more if needed).

Re the old engine and its VM: It's very important to make sure that only one engine will manage your system, and after taking the backup, it should be the new engine, after it's up. So even if you somehow entered a state where the old engine vm is alive, better disable/stop the engine there.

Good luck and best regards,
--
Didi
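
Before retrying, it may help to confirm the suspicion raised in the question, namely that the HA configuration on the hosts still refers to the old vmID. The following is only a minimal sketch: it assumes the host keeps a plain `vmid=<uuid>` line in /etc/ovirt-hosted-engine/hosted-engine.conf, and the helper names `he_conf_vmid` and `he_vmid_matches` are hypothetical, not part of any oVirt tool. It only reads the file and changes nothing.

```shell
#!/bin/sh
# Hypothetical helper: print the vmid recorded in a hosted-engine.conf-style
# file. Assumption: the file holds key=value lines, one of them "vmid=<uuid>".
he_conf_vmid() {
    conf_file="${1:-/etc/ovirt-hosted-engine/hosted-engine.conf}"
    sed -n 's/^vmid=//p' "$conf_file"
}

# Hypothetical helper: succeed only when the configured vmid equals the
# UUID given as $1. An optional $2 overrides the configuration file path.
he_vmid_matches() {
    expected="$1"
    actual="$(he_conf_vmid "${2:-/etc/ovirt-hosted-engine/hosted-engine.conf}")"
    [ -n "$actual" ] && [ "$actual" = "$expected" ]
}
```

On the host that currently runs the engine VM, the UUID to compare against could be read with `virsh -r domuuid HostedEngine` (assuming the libvirt domain is named HostedEngine), then passed as the first argument to `he_vmid_matches`. A mismatch would be consistent with the "Virtual machine does not exist" error above, but it does not by itself say how to repair the setup.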
participants (2)
- Nathanaël Blanchet
- Yedidyah Bar David