Hi,

Can you please check the following? Any of these could be the reason why the HE VM restarts every minute.

Check the error or the engine health state. If it is related to the liveliness check, then it is most likely a problem connecting to the engine.
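
To see quickly which host reports the failure, the "Engine status" lines of hosted-engine --vm-status can be parsed. The following is only a rough sketch in Python: it assumes the output format shown in the quoted message below, and the function names and the SAMPLE text are made up for illustration.

```python
import json

# Two hosts' worth of lines, copied from the quoted `hosted-engine --vm-status` output.
SAMPLE = """\
Hostname                           : ovirt-node01
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Hostname                           : ovirt-node02.mgmt.lan
Engine status                      : {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "up"}
"""


def parse_vm_status(text):
    """Map each hostname to its parsed "Engine status" JSON object."""
    hosts = {}
    hostname = None
    for line in text.splitlines():
        # Split on the first colon only, so the JSON value stays intact.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Hostname":
            hostname = value
        elif key == "Engine status" and hostname is not None:
            hosts[hostname] = json.loads(value)
    return hosts


def failed_liveliness(hosts):
    """Return the hosts whose engine status reports a failed liveliness check."""
    return [
        name
        for name, status in hosts.items()
        if status.get("reason") == "failed liveliness check"
    ]


hosts = parse_vm_status(SAMPLE)
for name in failed_liveliness(hosts):
    print(name, "-> VM state:", hosts[name].get("vm"))
```

On a real host, the text would come from running hosted-engine --vm-status and passing its output to parse_vm_status instead of SAMPLE.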

- Check whether the engine FQDN is reachable from all hosts.

- Run curl -v http://<engine-fqdn>/ovirt-engine/services/health. Does this return OK?

- Access the HE console and check if ovirt-engine is running.

- Check /var/log/ovirt-engine/server.log and /var/log/ovirt-engine/engine.log for errors when ovirt-engine starts.
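
If it is easier to repeat the first two checks from every host, they can be scripted. This is a minimal sketch in Python, not an official tool: it only resolves the engine FQDN and fetches the same health URL as the curl command above. The function names and the 5-second timeout are my own assumptions, and the FQDN is whatever your engine uses.

```python
import socket
import urllib.error
import urllib.request


def health_url(fqdn):
    """Build the health page URL used in the curl check."""
    return f"http://{fqdn}/ovirt-engine/services/health"


def resolve(fqdn):
    """Return the addresses the FQDN resolves to, or an empty list on failure."""
    try:
        infos = socket.getaddrinfo(fqdn, 80, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def fetch_health(fqdn, timeout=5):
    """Return (HTTP status, body or reason); status is None if the connection failed."""
    try:
        with urllib.request.urlopen(health_url(fqdn), timeout=timeout) as resp:
            body = resp.read().decode("utf-8", "replace").strip()
            return resp.status, body
    except urllib.error.HTTPError as exc:
        # The engine answered, but with an error status.
        return exc.code, str(exc.reason)
    except (urllib.error.URLError, OSError) as exc:
        # DNS failure, refused connection, timeout and similar.
        return None, str(exc)


def report(fqdn):
    """Print the result of both checks for one engine FQDN."""
    print("resolves to:", resolve(fqdn) or "nothing (DNS problem?)")
    status, detail = fetch_health(fqdn)
    print("health page:", status, detail)
```

Calling report("<engine-fqdn>") on each host, with the real FQDN filled in, shows whether name resolution or the HTTP request is the step that fails.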


Thanks

kasturi



On Fri, Jul 14, 2017 at 10:28 PM, Sven Achtelik <Sven.Achtelik@eps.aero> wrote:

Hi All,

 

After running solidly for several months, my ovirt-engine started rebooting on several hosts. I've looked at the output of hosted-engine --vm-status, and it shows that the engine is up on one host but not reachable. At the same time I can access the GUI and everything is working fine. After some time the engine shuts down and all hosts try to start the engine until one of them wins; at least that is how it looks. Any clues on where to look to find the issue with the liveliness check?

 

--------------------------------------------------------------------------------------------------------

 

--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt-node01
Host ID                            : 1
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 3eb33843
local_conf_timestamp               : 17128
Host timestamp                     : 17113
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=17113 (Fri Jul 14 11:50:23 2017)
        host-id=1
        score=3400
        vm_conf_refresh_time=17128 (Fri Jul 14 11:50:38 2017)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineDown
        stopped=False

 

 

--== Host 2 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt-node02.mgmt.lan
Host ID                            : 2
Engine status                      : {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 2a8c86cc
local_conf_timestamp               : 523182
Host timestamp                     : 523167
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=523167 (Fri Jul 14 11:50:25 2017)
        host-id=2
        score=3400
        vm_conf_refresh_time=523182 (Fri Jul 14 11:50:40 2017)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineStarting
        stopped=False

 

 

--== Host 3 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt-node03.mgmt.lan
Host ID                            : 3
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : f8490d79
local_conf_timestamp               : 527698
Host timestamp                     : 527683
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=527683 (Fri Jul 14 11:50:33 2017)
        host-id=3
        score=3400
        vm_conf_refresh_time=527698 (Fri Jul 14 11:50:47 2017)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineDown
        stopped=False

 

----------------------------------------------------------------------------------------------

Thank you,

Sven

