<div dir="ltr"><div>I added a hook to rhevm and then restarted the engine service, which triggered a hosted-engine VM shutdown (likely because of the failed liveliness check).</div><div><br></div><div>Once the hosted-engine VM shut down, it did not restart on the other host.</div>

<div><br></div>On both hosts configured for hosted-engine, I&#39;m seeing ha-agent logs in which each host thinks the other host has a better score, even though both report a score of 2400. Is there supposed to be a tie-breaking mechanism here? I do notice that the log mentions the best REMOTE host, so perhaps I&#39;m interpreting this message incorrectly.<div>

<br></div><div>ha-agent logs:<br><div><br></div><div>Host 001:</div><div><br></div><div><div>MainThread::INFO::2014-07-21 11:51:57,396::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1405957917.4 type=state_transition detail=EngineDown-EngineDown hostname=&#39;rhev001.miovision.corp&#39;</div>

<div>MainThread::INFO::2014-07-21 11:51:57,397::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineDown-EngineDown) sent? ignored</div><div>MainThread::INFO::2014-07-21 11:51:57,924::hosted_engine::323::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineDown (score: 2400)</div>

<div>MainThread::INFO::2014-07-21 11:51:57,924::hosted_engine::328::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Best remote host rhev002.miovision.corp (id: 2, score: 2400)</div><div>MainThread::INFO::2014-07-21 11:52:07,961::states::454::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine down, local host does not have best score</div>

<div>MainThread::INFO::2014-07-21 11:52:07,975::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1405957927.98 type=state_transition detail=EngineDown-EngineDown hostname=&#39;rhev001.miovision.corp&#39;</div>

</div><div><br></div><div>Host 002:</div><div><br></div><div><div>MainThread::INFO::2014-07-21 11:51:47,405::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1405957907.41 type=state_transition detail=EngineDown-EngineDown hostname=&#39;rhev002.miovision.corp&#39;</div>

<div>MainThread::INFO::2014-07-21 11:51:47,406::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineDown-EngineDown) sent? ignored</div><div>MainThread::INFO::2014-07-21 11:51:47,834::hosted_engine::323::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineDown (score: 2400)</div>

<div>MainThread::INFO::2014-07-21 11:51:47,835::hosted_engine::328::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Best remote host rhev001.miovision.corp (id: 1, score: 2400)</div><div>MainThread::INFO::2014-07-21 11:51:57,870::states::454::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine down, local host does not have best score</div>

<div>MainThread::INFO::2014-07-21 11:51:57,883::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1405957917.88 type=state_transition detail=EngineDown-EngineDown hostname=&#39;rhev002.miovision.corp&#39;</div>

</div><div><br></div><div>This went on for about 20 minutes, roughly an hour ago, so I ran hosted-engine --vm-start on one of the hosts. The manager VM ran for a few minutes with the engine UI accessible before shutting itself down again.</div>
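<div><br></div><div>To double-check that the two excerpts above really show a tie, here is a minimal Python sketch that pulls the local score and the best remote score out of the start_monitoring lines. It is only a log-reading helper, not the agent&#39;s own selection logic; the names summarize_scores, LOCAL_RE and REMOTE_RE are hypothetical, and the patterns assume the message format stays exactly as quoted above.</div>

```python
import re

# Patterns for the two start_monitoring messages quoted in the logs above.
# Assumption: the message wording matches those excerpts exactly.
LOCAL_RE = re.compile(r"Current state (\w+) \(score: (\d+)\)")
REMOTE_RE = re.compile(r"Best remote host (\S+) \(id: (\d+), score: (\d+)\)")


def summarize_scores(log_lines):
    """Return the last local and best-remote scores found in agent log lines."""
    summary = {
        "local_state": None,
        "local_score": None,
        "remote_host": None,
        "remote_score": None,
    }
    for line in log_lines:
        local = LOCAL_RE.search(line)
        if local:
            summary["local_state"] = local.group(1)
            summary["local_score"] = int(local.group(2))
        remote = REMOTE_RE.search(line)
        if remote:
            summary["remote_host"] = remote.group(1)
            summary["remote_score"] = int(remote.group(3))
    # A tie means a local score was found and it equals the remote score.
    summary["tied"] = (
        summary["local_score"] is not None
        and summary["local_score"] == summary["remote_score"]
    )
    return summary


# Two lines taken from the Host 001 excerpt (logger prefix shortened).
host_001_lines = [
    "HostedEngine::(start_monitoring) Current state EngineDown (score: 2400)",
    "HostedEngine::(start_monitoring) Best remote host rhev002.miovision.corp (id: 2, score: 2400)",
]

print(summarize_scores(host_001_lines))
```

<div>Running the same helper over each host&#39;s lines gives one summary per host, so the two can be compared side by side.</div>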

<div><br></div><div>I then put host 002 into local maintenance mode, and host 001 automatically started the hosted-engine VM. The logging still references host 002 as the &#39;best remote host&#39; even though its calculated score is now 0:</div>

<div><br></div><div><div>MainThread::INFO::2014-07-21 12:03:24,011::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1405958604.01 type=state_transition detail=EngineUp-EngineUp hostname=&#39;rhev001.miovision.corp&#39;</div>

<div>MainThread::INFO::2014-07-21 12:03:24,013::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineUp-EngineUp) sent? ignored</div><div>MainThread::INFO::2014-07-21 12:03:24,515::hosted_engine::323::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineUp (score: 2400)</div>

<div>MainThread::INFO::2014-07-21 12:03:24,516::hosted_engine::328::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Best remote host rhev002.miovision.corp (id: 2, score: 0)</div><div>MainThread::INFO::2014-07-21 12:03:34,567::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1405958614.57 type=state_transition detail=EngineUp-EngineUp hostname=&#39;rhev001.miovision.corp&#39;</div>

</div><div><br></div><div>Once the hosted-engine VM had been up for about 5 minutes, I took host 002 out of local maintenance mode, and the VM has not shut down since.</div><div><br></div><div>Is this expected behaviour? Is this the normal recovery process when two hosts that both host hosted-engine are started at the same time? I would have expected that once the hosted-engine VM was detected as bad (the liveliness check failing after I restarted the engine service) and the VM was shut down, it would spin back up on the next available host.</div>

<div><br></div><div>Thanks,<br>Steve</div></div></div>