Hi,
I deployed an oVirt (4.3.10) cluster with HostedEngine and GlusterFS volumes (engine,
vmstore, data). The GlusterFS cluster runs on node1/node2/node3, and the engine VM can
run on any of those 3 nodes.
Then I added a 4th node to the cluster.
However, when I work in the Engine Web Portal, it frequently reports a 503 error. I then
checked `hosted-engine --vm-status`; see below:
```
[root@vhost1 ~]# hosted-engine --vm-status
--== Host vhost1.yhmk.lan (id: 1) status ==--
conf_on_shared_storage : True
Status up-to-date : True
Hostname : vhost1.alatest.lan
Host ID : 1
Engine status : {"reason": "bad vm status", "health": "bad", "vm": "down_unexpected", "detail": "Down"}
Score : 0
stopped : False
Local maintenance : False
crc32 : 1f25baff
local_conf_timestamp : 1253650
Host timestamp : 1253649
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=1253649 (Thu Apr 8 08:05:48 2021)
host-id=1
score=0
vm_conf_refresh_time=1253650 (Thu Apr 8 08:05:48 2021)
conf_on_shared_storage=True
maintenance=False
state=EngineUnexpectedlyDown
stopped=False
timeout=Thu Jan 15 20:23:29 1970
--== Host vhost2.yhmk.lan (id: 2) status ==--
conf_on_shared_storage : True
Status up-to-date : True
Hostname : vhost2.alatest.lan
Host ID : 2
Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down_unexpected", "detail": "unknown"}
Score : 3400
stopped : False
Local maintenance : False
crc32 : 539fc30c
local_conf_timestamp : 1253343
Host timestamp : 1253343
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=1253343 (Thu Apr 8 08:05:46 2021)
host-id=2
score=3400
vm_conf_refresh_time=1253343 (Thu Apr 8 08:05:46 2021)
conf_on_shared_storage=True
maintenance=False
state=EngineDown
stopped=False
--== Host vhost3.yhmk.lan (id: 3) status ==--
conf_on_shared_storage : True
Status up-to-date : True
Hostname : vhost3.alatest.lan
Host ID : 3
Engine status : {"reason": "bad vm status", "health": "bad", "vm": "up", "detail": "Powering up"}
Score : 3400
stopped : False
Local maintenance : False
crc32 : 4072e0b8
local_conf_timestamp : 1252345
Host timestamp : 1252345
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=1252345 (Thu Apr 8 08:05:42 2021)
host-id=3
score=3400
vm_conf_refresh_time=1252345 (Thu Apr 8 08:05:42 2021)
conf_on_shared_storage=True
maintenance=False
state=EngineStarting
stopped=False
```
After waiting a moment, I can access the web portal again. When I check the host status,
one or more hosts are always flagged as `unavailable as HA score`, but the flag disappears later.
I also found that the engine VM is sometimes migrated to another node while this problem
occurs.
So it seems the HostedEngine is not stable and this problem keeps occurring. Could you
please help me with this? Thanks!