On Sun, Nov 28, 2021 at 8:46 PM Nathanaël Blanchet <blanchet(a)abes.fr> wrote:
Hello
I successfully migrated from HE 4.3.10 to HE 4.4.9, but I think I made a mistake:
the HostedEngine VM was running on the host (haboob) where I deployed the upgrade.
Everything was OK, except that I had deployed it on the wrong host (haboob). So I live
migrated the HostedEngine to the preexisting CentOS host (kilimanjaro) and erased haboob.
Then I reinstalled a new host (fuego) to replace haboob. The HostedEngine is able to
migrate between kilimanjaro and fuego, but now the VM does not seem to be seen by any
host when running:
[root@fuego ~]# hosted-engine --vm-status
--== Host fuego (id: 1) status ==--
Host ID : 1
Host timestamp : 3252
Score : 3350
Engine status : {"vm": "down", "health": "bad", "detail": "unknown", "reason": "vm not running on this host"}
Hostname : fuego
Local maintenance : False
stopped : False
crc32 : 14527b72
conf_on_shared_storage : True
local_conf_timestamp : 3257
Status up-to-date : True
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=3252 (Sun Nov 28 18:27:29 2021)
host-id=1
score=3350
vm_conf_refresh_time=3257 (Sun Nov 28 18:27:34 2021)
conf_on_shared_storage=True
maintenance=False
state=EngineDown
stopped=False
--== Host kilimanjaro.v100.abes.fr (id: 3) status ==--
Host ID : 3
Host timestamp : 65261186
Score : 0
Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down_unexpected", "detail": "unknown"}
Hostname : kilimanjaro.v100.abes.fr
Local maintenance : True
stopped : False
crc32 : c381cf1e
conf_on_shared_storage : True
local_conf_timestamp : 65261189
Status up-to-date : True
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=65261186 (Sun Nov 28 19:27:23 2021)
host-id=3
score=0
vm_conf_refresh_time=65261189 (Sun Nov 28 19:27:26 2021)
conf_on_shared_storage=True
maintenance=True
state=LocalMaintenance
stopped=False
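To check the symptom described above (no host reporting the engine VM as running) without eyeballing the whole dump, here is a minimal Python sketch. The helpers `engine_status_by_host` and `hosts_running_engine` are hypothetical, not part of the hosted-engine tooling; they assume the plain-text layout quoted above, with one `--== Host ... ==--` header per host and the "Engine status" value printed as a single-line JSON object. Treating `"vm": "up"` as "running" is also an assumption about the HA agent's reporting.

```python
import json
import re

# Matches per-host headers such as "--== Host fuego (id: 1) status ==--".
HOST_HEADER = re.compile(r"--== Host (?P<name>\S+) \(id: (?P<id>\d+)\) status ==--")


def engine_status_by_host(text: str) -> dict:
    """Map each host name to its parsed "Engine status" dictionary.

    Hypothetical helper; assumes the plain-text `hosted-engine --vm-status`
    layout shown in this thread.
    """
    result = {}
    current = None
    for line in text.splitlines():
        header = HOST_HEADER.search(line)
        if header:
            current = header.group("name")
            continue
        # Split only on the first colon; the JSON value contains more colons.
        key, sep, value = line.partition(":")
        if current and sep and key.strip() == "Engine status":
            result[current] = json.loads(value.strip())
    return result


def hosts_running_engine(statuses: dict) -> list:
    """Return hosts whose agent reports the engine VM as up (assumed value "up")."""
    return [host for host, st in statuses.items() if st.get("vm") == "up"]


sample = """--== Host fuego (id: 1) status ==--
Engine status : {"vm": "down", "health": "bad", "detail": "unknown", "reason": "vm not running on this host"}
--== Host kilimanjaro.v100.abes.fr (id: 3) status ==--
Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down_unexpected", "detail": "unknown"}
"""

statuses = engine_status_by_host(sample)
print(hosts_running_engine(statuses))  # → []
```

With the two status blocks from this thread, neither host reports the VM as up, which matches the confusion described here: the HA agents are tracking a VM that no host is actually running.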
Running hosted-engine --console returns:
[root@fuego ~]# hosted-engine --console
Command VM.getStats with args {'vmID': '74d2966c-2efa-41f0-a5c3-dd383f690a92'} failed:
(code=1, message=Virtual machine does not exist: {'vmId': '74d2966c-2efa-41f0-a5c3-dd383f690a92'})
The engine VM is not on this host
It looks as if this vmID still belongs to the old 4.3.10 HostedEngine, which no longer exists.
How can I make the new HostedEngine's vmID the one known to the HA agents and the hosts?
I'm afraid of losing the HostedEngine VM if I stop it!
I am not sure I fully understood your flow, but whatever it was, why should you
be afraid of losing the old VM? "Losing" it is an integral part of the process,
even if not in the way that led to your current state.
Part of the upgrade is taking an engine-backup, right? I suppose you took one.
If you changed anything in the old engine since then, and can still take another
backup, perhaps do that. Then try again "from scratch" (exact details may
vary; share more details if needed).
Regarding the old engine and its VM: it's very important to make sure that only one
engine manages your system, and once the backup is taken, that engine should be the
new one, after it's up. So even if you somehow end up in a state where the
old engine VM is alive, it is better to disable/stop the engine there.
Good luck and best regards,
--
Didi