Hi Benny,
Who should I reach out to for help with a corrupted Gluster-based
hosted engine?
--== Host 1 status ==--
conf_on_shared_storage : True
Status up-to-date : True
Hostname : ovirtnode1.abcxyzdomains.net
Host ID : 1
Engine status : {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "Up"}
Score : 3400
stopped : False
Local maintenance : False
crc32 : 92254a68
local_conf_timestamp : 115910
Host timestamp : 115910
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=115910 (Mon Jun 18 09:43:20 2018)
host-id=1
score=3400
vm_conf_refresh_time=115910 (Mon Jun 18 09:43:20 2018)
conf_on_shared_storage=True
maintenance=False
state=GlobalMaintenance
stopped=False
When I VNC into my HE, all I get is:
Probing EDD (edd=off to disable)... ok
So that's why it's failing the liveliness check. I cannot get
the screen on the HE to change, short of Ctrl-Alt-Del, which
reboots the HE.
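To spell out what the Engine status field above is saying, here is a minimal Python sketch that parses that JSON value and reports the mismatch between "vm" and "health". The helper name summarize_engine_status is hypothetical (it is not part of any oVirt tool), and the string is just the value pasted above, joined onto one line.

```python
import json

# Engine status value copied from the Host 1 output above (joined onto one line).
RAW_STATUS = (
    '{"reason": "failed liveliness check", "health": "bad", '
    '"vm": "up", "detail": "Up"}'
)


def summarize_engine_status(raw_status):
    """Hypothetical helper: describe an Engine status JSON string in one line."""
    status = json.loads(raw_status)
    vm_state = status.get("vm", "unknown")
    health = status.get("health", "unknown")
    reason = status.get("reason", "no reason given")

    # The case seen here: the VM process is running, but the engine inside
    # it does not answer the liveliness check.
    if vm_state == "up" and health == "bad":
        return "VM is up but engine is unhealthy: " + reason
    return "vm=" + vm_state + ", health=" + health


if __name__ == "__main__":
    print(summarize_engine_status(RAW_STATUS))
```

In other words, the hosts see the VM process running, but whatever is inside it never gets far enough for the engine to answer, which matches the stuck boot screen.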
I do have backups for the HE that are/were run on a nightly basis.
If the cluster is left alone, the HE VM bounces from machine
to machine trying to boot. This is why the cluster is in
global maintenance mode.
One of the nodes was down for a period of time and was then brought
back. Sometime during the night, which is when the automated backup
kicks in, the HE started bouncing around. I got nearly 1000 emails.
This seems to be the same error (but may not be the same cause) as
listed here:
https://bugzilla.redhat.com/show_bug.cgi?id=1569827
Thanks,
Hanson