Hello, and thank you all for your help.
I'm running Oracle's rebranded oVirt 4.3.10. Everything was fine until I patched
my self-hosted engine. I went through the normal process: backup, global maintenance
mode, update the oVirt packages, run engine-setup, etc. Everything completed normally
without issues. I rebooted the self-hosted engine VM, and now it constantly fails
liveliness checks and the HA agent reboots it every five minutes or so. I put it back in
global maintenance so the HA agent would not reboot it. The VM is up and works correctly;
I can do everything normally.
From what I can tell, the HA agent liveliness check is just an HTTP GET to the web portal,
and I can see that request succeeding. What is the liveliness check actually doing? All
services on the VM are up and running without issue. Where can I look to figure this
out?
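To make the question concrete, here is a minimal sketch of what I assume the check amounts to: fetch a health page from the engine over plain HTTP and look for a marker string in the body. The path `/ovirt-engine/services/health`, the `DB Up!` marker, and the function names are assumptions on my part for illustration, not taken from the HA agent's source, so the real check may well differ.

```python
# Sketch only: my guess at the liveliness probe, not the HA agent's actual code.
import urllib.request

# Assumption: the engine exposes a health page at this path and
# includes this marker in the body when the database is reachable.
HEALTH_PATH = "/ovirt-engine/services/health"
DB_UP_MARKER = "DB Up!"


def body_looks_alive(body):
    """Return True if the health page body contains the assumed marker."""
    return DB_UP_MARKER in body


def check_liveliness(engine_fqdn, timeout=10.0):
    """GET the assumed health page and report whether it looks healthy."""
    url = "http://" + engine_fqdn + HEALTH_PATH
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except OSError as exc:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        print("request failed:", exc)
        return False
    return body_looks_alive(body)
```

If the real check is along these lines, then calling `check_liveliness()` with the engine FQDN from the host that runs the VM would fail for anything that breaks that one request (name resolution, a redirect, a changed response body), even while the portal itself works in a browser.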
Here is the output of hosted-engine --vm-status:
[root@itdlolv101 ~]# hosted-engine --vm-status
!! Cluster is in GLOBAL MAINTENANCE mode !!
--== Host itdlolv100.ci.seattle.wa.us (id: 1) status ==--
conf_on_shared_storage : True
Status up-to-date : True
Hostname : itdlolv100.ci.seattle.wa.us
Host ID : 1
Engine status : {"reason": "vm not running on this
host", "health": "bad", "vm": "down",
"detail": "unknown"}
Score : 3400
stopped : False
Local maintenance : False
crc32 : 855e161f
local_conf_timestamp : 55128
Host timestamp : 55128
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=55128 (Wed Jun 8 12:52:20 2022)
host-id=1
score=3400
vm_conf_refresh_time=55128 (Wed Jun 8 12:52:20 2022)
conf_on_shared_storage=True
maintenance=False
state=GlobalMaintenance
stopped=False
--== Host itdlolv101.ci.seattle.wa.us (id: 2) status ==--
conf_on_shared_storage : True
Status up-to-date : True
Hostname : itdlolv101.ci.seattle.wa.us
Host ID : 2
Engine status : {"reason": "failed liveliness
check", "health": "bad", "vm": "up",
"detail": "Up"}
Score : 3400
stopped : False
Local maintenance : False
crc32 : cc1c2261
local_conf_timestamp : 45453
Host timestamp : 45453
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=45453 (Wed Jun 8 12:55:15 2022)
host-id=2
score=3400
vm_conf_refresh_time=45453 (Wed Jun 8 12:55:15 2022)
conf_on_shared_storage=True
maintenance=False
state=GlobalMaintenance
stopped=False
!! Cluster is in GLOBAL MAINTENANCE mode !!
[root@itdlolv101 ~]#