[ovirt-users] RHEV 3.4 trial hosted-engine either host wants to take ownership

Martin Sivak msivak at redhat.com
Tue Jul 22 07:54:50 UTC 2014


Hi Steve,

we had a bug (or two..) in the score comparison logic:

https://bugzilla.redhat.com/show_bug.cgi?id=1093366

Which was fixed by:

http://gerrit.ovirt.org/29580
http://gerrit.ovirt.org/29787

and

http://gerrit.ovirt.org/30025

Unfortunately, those fixes did not make it into the current 3.4 releases, but they will be available in the upcoming 3.5 and in any subsequent 3.4 release.

One of the patches (29580) just changes two words in the code, so you can apply it manually if you want.
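
To illustrate the symptom in the quoted logs (both hosts at score 2400, each reporting "local host does not have best score"), here is a minimal Python sketch. It is not the actual ovirt-hosted-engine-ha code and not the content of the patches above; the function names and both comparisons are assumptions made purely for illustration.

```python
# Illustrative only: hypothetical names, not the ovirt-hosted-engine-ha API.

def waits_on_tie(local_score, best_remote_score):
    """Assumed flawed check: an equal remote score counts as 'better'."""
    return best_remote_score >= local_score


def waits_strict(local_score, best_remote_score):
    """Assumed alternative: only a strictly higher remote score wins."""
    return best_remote_score > local_score


def hosts_that_start(scores, waits):
    """Return the hosts that would try to start the engine VM."""
    starters = []
    for host, local in scores.items():
        # Best score among all the other hosts, as seen from this host.
        best_remote = max(s for h, s in scores.items() if h != host)
        if not waits(local, best_remote):
            starters.append(host)
    return starters


# Scores taken from the quoted ha-agent logs.
tied = {"rhev001": 2400, "rhev002": 2400}
# Host 002 in local maintenance reports a score of 0.
maintenance = {"rhev001": 2400, "rhev002": 0}

print(hosts_that_start(tied, waits_on_tie))         # []
print(hosts_that_start(tied, waits_strict))         # ['rhev001', 'rhev002']
print(hosts_that_start(maintenance, waits_on_tie))  # ['rhev001']
```

With the tie-inclusive check, neither host ever starts the VM while the scores are equal, and dropping one host's score to 0 breaks the deadlock, which matches what was observed. With the strict check both hosts become eligible; how the two are then arbitrated is outside this sketch.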

Regards

--
Martin Sivák
msivak at redhat.com
Red Hat Czech
RHEV-M SLA / Brno, CZ

----- Original Message -----
> I added a hook to rhevm, and then restarted the engine service which
> triggered a hosted-engine VM shutdown (likely because of the failed
> liveliness check).
> 
> Once the hosted-engine VM shut down, it did not restart on the other host.
> 
> On both hosts configured for hosted-engine I'm seeing logs from ha-agent
> where each host thinks the other host has a better score. Is there supposed
> to be a mechanism for a tie breaker here? I do notice that the log mentions
> best REMOTE host, so perhaps I'm interpreting this message incorrectly.
> 
> ha-agent logs:
> 
> Host 001:
> 
> MainThread::INFO::2014-07-21
> 11:51:57,396::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
> Trying: notify time=1405957917.4 type=state_transition
> detail=EngineDown-EngineDown hostname='rhev001.miovision.corp'
> MainThread::INFO::2014-07-21
> 11:51:57,397::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
> Success, was notification of state_transition (EngineDown-EngineDown) sent?
> ignored
> MainThread::INFO::2014-07-21
> 11:51:57,924::hosted_engine::323::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Current state EngineDown (score: 2400)
> MainThread::INFO::2014-07-21
> 11:51:57,924::hosted_engine::328::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Best remote host rhev002.miovision.corp (id: 2, score: 2400)
> MainThread::INFO::2014-07-21
> 11:52:07,961::states::454::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
> Engine down, local host does not have best score
> MainThread::INFO::2014-07-21
> 11:52:07,975::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
> Trying: notify time=1405957927.98 type=state_transition
> detail=EngineDown-EngineDown hostname='rhev001.miovision.corp'
> 
> Host 002:
> 
> MainThread::INFO::2014-07-21
> 11:51:47,405::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
> Trying: notify time=1405957907.41 type=state_transition
> detail=EngineDown-EngineDown hostname='rhev002.miovision.corp'
> MainThread::INFO::2014-07-21
> 11:51:47,406::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
> Success, was notification of state_transition (EngineDown-EngineDown) sent?
> ignored
> MainThread::INFO::2014-07-21
> 11:51:47,834::hosted_engine::323::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Current state EngineDown (score: 2400)
> MainThread::INFO::2014-07-21
> 11:51:47,835::hosted_engine::328::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Best remote host rhev001.miovision.corp (id: 1, score: 2400)
> MainThread::INFO::2014-07-21
> 11:51:57,870::states::454::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
> Engine down, local host does not have best score
> MainThread::INFO::2014-07-21
> 11:51:57,883::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
> Trying: notify time=1405957917.88 type=state_transition
> detail=EngineDown-EngineDown hostname='rhev002.miovision.corp'
> 
> This went on for 20 minutes about an hour ago, and I decided to run
> hosted-engine --vm-start on one of the hosts. The manager VM ran for a few
> minutes with the engine UI accessible before shutting itself down again.
> 
> I then put host 002 into local maintenance mode, and host 001 auto started
> the hosted-engine VM. The logging still references host 002 as the 'best
> remote host' even though the calculated score is now 0:
> 
> MainThread::INFO::2014-07-21
> 12:03:24,011::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
> Trying: notify time=1405958604.01 type=state_transition
> detail=EngineUp-EngineUp hostname='rhev001.miovision.corp'
> MainThread::INFO::2014-07-21
> 12:03:24,013::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
> Success, was notification of state_transition (EngineUp-EngineUp) sent?
> ignored
> MainThread::INFO::2014-07-21
> 12:03:24,515::hosted_engine::323::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Current state EngineUp (score: 2400)
> MainThread::INFO::2014-07-21
> 12:03:24,516::hosted_engine::328::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Best remote host rhev002.miovision.corp (id: 2, score: 0)
> MainThread::INFO::2014-07-21
> 12:03:34,567::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
> Trying: notify time=1405958614.57 type=state_transition
> detail=EngineUp-EngineUp hostname='rhev001.miovision.corp'
> 
> Once the hosted-engine VM had been up for about 5 minutes, I took host 002
> out of local maintenance mode, and the VM has not shut down since.
> 
> Is this expected behaviour? Is this the normal recovery process when two
> hosts both hosting hosted-engine are started at the same time? I would have
> expected that, once the hosted-engine VM was detected as bad (liveliness
> check from when I restarted the engine service) and the VM was shut down,
> it would spin back up on the next available host.
> 
> Thanks,
> Steve


