[ovirt-users] Engine HA-Issues

Sven Achtelik Sven.Achtelik at eps.aero
Fri Jul 14 16:58:04 UTC 2017


Hi All,

After running solid for several months, my ovirt-engine VM started rebooting on several hosts. I've looked at the output of hosted-engine --vm-status, and it shows that the engine is up on one host but not reachable. At the same time I can access the GUI and everything is working fine. After some time the engine shuts down and all hosts try to start it until one of them wins; at least that is what it looks like. Any clues on where to look to find the issue with the liveliness check?

--------------------------------------------------------------------------------------------------------

--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt-node01
Host ID                            : 1
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 3eb33843
local_conf_timestamp               : 17128
Host timestamp                     : 17113
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=17113 (Fri Jul 14 11:50:23 2017)
        host-id=1
        score=3400
        vm_conf_refresh_time=17128 (Fri Jul 14 11:50:38 2017)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineDown
        stopped=False


--== Host 2 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt-node02.mgmt.lan
Host ID                            : 2
Engine status                      : {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 2a8c86cc
local_conf_timestamp               : 523182
Host timestamp                     : 523167
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=523167 (Fri Jul 14 11:50:25 2017)
        host-id=2
        score=3400
        vm_conf_refresh_time=523182 (Fri Jul 14 11:50:40 2017)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineStarting
        stopped=False


--== Host 3 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt-node03.mgmt.lan
Host ID                            : 3
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : f8490d79
local_conf_timestamp               : 527698
Host timestamp                     : 527683
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=527683 (Fri Jul 14 11:50:33 2017)
        host-id=3
        score=3400
        vm_conf_refresh_time=527698 (Fri Jul 14 11:50:47 2017)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineDown
        stopped=False

----------------------------------------------------------------------------------------------
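To make the odd state easier to spot across all three hosts, here is a minimal sketch of how the output above could be filtered. It is plain Python and not part of any oVirt tooling; the function names parse_vm_status and up_but_unhealthy are made up for illustration. It assumes the output keeps the layout shown above, and that a healthy engine reports "health": "good", which does not appear in my output.

```python
import json
import re

# Trimmed copy of the status output from this post (only the fields used here).
SAMPLE = """\
--== Host 1 status ==--

Hostname                           : ovirt-node01
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}

--== Host 2 status ==--

Hostname                           : ovirt-node02.mgmt.lan
Engine status                      : {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "up"}
"""

HOST_HEADER = re.compile(r"^--== Host \d+ status ==--$")


def parse_vm_status(text):
    """Return one dict per host with its hostname and decoded engine status."""
    hosts = []
    for raw in text.splitlines():
        line = raw.strip()
        if HOST_HEADER.match(line):
            hosts.append({"hostname": None, "engine": {}})
            continue
        if not hosts:
            continue  # ignore anything before the first host header
        # Split at the first colon only; the JSON value contains more colons.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Hostname":
            hosts[-1]["hostname"] = value
        elif key == "Engine status":
            hosts[-1]["engine"] = json.loads(value)
    return hosts


def up_but_unhealthy(hosts):
    """Hostnames where the VM is up but health is not "good" (assumed healthy value)."""
    return [
        h["hostname"]
        for h in hosts
        if h["engine"].get("vm") == "up" and h["engine"].get("health") != "good"
    ]


if __name__ == "__main__":
    for name in up_but_unhealthy(parse_vm_status(SAMPLE)):
        print(name)
```

In practice the text would come from running hosted-engine --vm-status on one of the hosts instead of the embedded sample. Against the output above it singles out ovirt-node02.mgmt.lan, the only host where the VM is up but the liveliness check fails.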
Thank you,
Sven