We lost the backend storage that hosts our self-hosted engine (HE) tonight.
We've recovered it, and there was no data corruption on the volume
containing the HE disk. However, when we try to start the HE, it doesn't
give an error, but it also doesn't come up.
The VM isn't pingable, and the liveliness check always fails.
[root@ovirttest1 ~]# hosted-engine --vm-status | grep -A20 ovirttest1
Hostname : ovirttest1.wolfram.com
Host ID : 1
Engine status : {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "up"}
Score : 3400
stopped : False
Local maintenance : False
crc32 : 2c2f3ec9
local_conf_timestamp : 18980042
Host timestamp : 18980039
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=18980039 (Fri Nov 10 01:17:59 2017)
    host-id=1
    score=3400
    vm_conf_refresh_time=18980042 (Fri Nov 10 01:18:03 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=GlobalMaintenance
    stopped=False
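
In case it helps anyone compare this status across hosts, here is a minimal sketch in Python that pulls the "Engine status" field out of saved `hosted-engine --vm-status` text. It is not part of oVirt. The function names `parse_engine_status` and `vm_up_but_engine_unhealthy` are made up for illustration, and it assumes the field is a flat JSON object like the one shown above.

```python
import json
import re


def parse_engine_status(vm_status_text):
    """Return the 'Engine status' JSON object as a dict, or None if absent."""
    match = re.search(r"Engine status\s*:\s*(\{.*?\})", vm_status_text, re.DOTALL)
    if match is None:
        return None
    # Collapse line wraps (e.g. added by a mail client) before decoding.
    return json.loads(re.sub(r"\s+", " ", match.group(1)))


def vm_up_but_engine_unhealthy(status):
    """True when the VM is reported up but the engine health is not 'good'."""
    return status.get("vm") == "up" and status.get("health") != "good"


# Sample line copied from the status output in this post.
sample = (
    'Engine status : {"reason": "failed liveliness check", '
    '"health": "bad", "vm": "up", "detail": "up"}'
)
status = parse_engine_status(sample)
print(status["reason"], vm_up_but_engine_unhealthy(status))
```

For the output above this flags the combination we are seeing: the host reports the VM as up, while the engine health is reported as bad.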
The environment is in Global Maintenance so that we can restrict the HE to
starting on one specific host and eliminate as many variables as possible.
I've attached the agent and broker logs.