[ovirt-users] Self-hosted engine won't start
Daniel Helgenberger
daniel.helgenberger at m-box.de
Tue Aug 19 15:00:23 UTC 2014
On Tue, 2014-08-19 at 07:42 +1000, John Gardeniers wrote:
> Hi Daniel,
>
> As per my original post, each host believed the *other* is a better
> candidate, with the result that neither would start the engine. As you
> may have read by now, the bug has been confirmed and a fix has been
> proposed.
Indeed!
I ran into this bug as well, and I also applied Jiri's fix.
However, for some reason one of my hosts showed a score of 2000; it seems
this is why it was working for me.
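
To illustrate the tie visible in the agent logs quoted further down, here
is a minimal Python sketch. It is not the ovirt-hosted-engine-ha code and
not Jiri's fix. It only parses the two log messages quoted in this thread
("Current state ... (score: N)" and "Best remote host ... (id: N, score:
N)") and applies a hypothetical decision rule. The names parse_scores and
should_start_engine, and the "lowest host id wins a tie" rule, are
assumptions made for illustration.

```python
import re

# Patterns match the two agent log messages quoted in this thread.
LOCAL_RE = re.compile(r"Current state (\w+) \(score: (\d+)\)")
REMOTE_RE = re.compile(r"Best remote host (\S+) \(id: (\d+), score: (\d+)\)")


def parse_scores(log_text):
    """Return (local_score, remote_id, remote_score), or None if not found."""
    local = LOCAL_RE.search(log_text)
    remote = REMOTE_RE.search(log_text)
    if not local or not remote:
        return None
    return int(local.group(2)), int(remote.group(2)), int(remote.group(3))


def should_start_engine(local_id, local_score, remote_id, remote_score):
    """Hypothetical rule: higher score wins; the lowest host id breaks a tie."""
    if local_score != remote_score:
        return local_score > remote_score
    return local_id < remote_id


# Message bodies taken from the ovirt1 log excerpt in this thread.
OVIRT1_LOG = """Current state EngineDown (score: 2400)
Best remote host 192.168.19.21 (id: 2, score: 2400)"""

local_score, remote_id, remote_score = parse_scores(OVIRT1_LOG)
print("scores tied:", local_score == remote_score)
print("host 1 starts engine:",
      should_start_engine(1, local_score, remote_id, remote_score))
```

With both hosts at 2400 the scores are tied, so a rule that only compares
scores never picks a winner. With one host at 2000 and the other at 2400
there is no tie, which would be consistent with the engine starting in my
setup.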
> Your claim that HA is working is incorrect. A system that requires
> manual intervention when something goes wrong is not HA.
>
> regards,
> John
>
>
> On 18/08/14 19:18, Daniel Helgenberger wrote:
>
> > Hello John,
> >
> >
> > On Wed, 2014-07-23 at 19:47 -0400, Jason Brooks wrote:
> > > ----- Original Message -----
> > > > From: "John Gardeniers" <jgardeniers at objectmastery.com>
> > > > To: "users" <users at ovirt.org>
> > > > Sent: Wednesday, July 23, 2014 4:29:45 PM
> > > > Subject: [ovirt-users] Self-hosted engine won't start
> > > >
> > > > Hi All,
> > > >
> > > > I have created a lab with 2 hypervisors and a self-hosted engine. Today
> > > > I followed the upgrade instructions as described in
> > > > http://www.ovirt.org/Hosted_Engine_Howto and rebooted the engine. I
> > > > didn't really do an upgrade but simply wanted to test what would happen
> > > > when the engine was rebooted.
> > > >
> > > > When the engine didn't restart I re-ran hosted-engine
> > > > --set-maintenance=none and restarted the vdsm, ovirt-ha-agent and
> > > > ovirt-ha-broker services on both nodes. 15 minutes later it still hadn't
> > > > restarted, so I then tried rebooting both hypervisors. After an hour
> > > > there was still no sign of the engine starting. The agent logs don't
> > > > help me much. The following bits are repeated over and over.
> > > >
> > > > ovirt1 (192.168.19.20):
> > > >
> > > > MainThread::INFO::2014-07-24
> > > > 09:18:40,272::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
> > > > Trying: notify time=1406157520.27 type=state_transition
> > > > detail=EngineDown-EngineDown hostname='ovirt1.om.net'
> > > > MainThread::INFO::2014-07-24
> > > > 09:18:40,272::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
> > > > Success, was notification of state_transition (EngineDown-EngineDown)
> > > > sent? ignored
> > > > MainThread::INFO::2014-07-24
> > > > 09:18:40,594::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> > > > Current state EngineDown (score: 2400)
> > > > MainThread::INFO::2014-07-24
> > > > 09:18:40,594::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> > > > Best remote host 192.168.19.21 (id: 2, score: 2400)
> > > >
> > > > ovirt2 (192.168.19.21):
> > > >
> > > > MainThread::INFO::2014-07-24
> > > > 09:18:04,005::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
> > > > Trying: notify time=1406157484.01 type=state_transition
> > > > detail=EngineDown-EngineDown hostname='ovirt2.om.net'
> > > > MainThread::INFO::2014-07-24
> > > > 09:18:04,006::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
> > > > Success, was notification of state_transition (EngineDown-EngineDown)
> > > > sent? ignored
> > > > MainThread::INFO::2014-07-24
> > > > 09:18:04,324::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> > > > Current state EngineDown (score: 2400)
> > > > MainThread::INFO::2014-07-24
> > > > 09:18:04,324::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> > > > Best remote host 192.168.19.20 (id: 1, score: 2400)
> > > >
> > > > From the above information I decided to simply shut down one hypervisor
> > > > and see what happens. The engine did start back up again a few minutes
> > > > later.
> > > I've seen this behavior, too.
> > >
> > > Jason
> > >
> > > > The interesting part is that each hypervisor seems to think the other is
> > > > a better host.
> > Where do you get this from? From the line:
> > 'Best remote host 192.168.19.20 (id: 1, score: 2400)' ?
> >
> > I assume this is not the case; the HA broker is just looking for the
> > best remote candidate.
> >
> > But I also have trouble with this behavior, especially when I had the
> > cluster in global maintenance.
> > I resolve this by starting the hosted engine manually while in global
> > maintenance, waiting for {"health": "good", "vm": "up", "detail":
> > "up"}, and disabling global maintenance afterwards.
> >
> > I found that the HA feature is indeed working; I tested it by manually
> > stopping the engine service (service hosted-engine stop). IIRC this
> > should trigger a failover and a reboot of the engine.
> >
> >
> > > > The two machines are identical, so there's no reason I
> > > > can see for this odd behaviour. In a lab environment this is little more
> > > > than an annoying inconvenience. In a production environment it would be
> > > > completely unacceptable.
> > > >
> > > > May I suggest that this issue be looked into and some means found to
> > > > eliminate this kind of mutual exclusion? e.g. After a few minutes of
> > > > such an issue one hypervisor could be randomly given a slightly higher
> > > > weighting, which should result in it being chosen to start the engine.
> > > >
> > > > regards,
> > > > John
> > > > _______________________________________________
> > > > Users mailing list
> > > > Users at ovirt.org
> > > > http://lists.ovirt.org/mailman/listinfo/users
> > > >
> >
> > Cheers,
> > Daniel
--
Daniel Helgenberger
m box bewegtbild GmbH
P: +49/30/2408781-22
F: +49/30/2408781-10
ACKERSTR. 19
D-10115 BERLIN
www.m-box.de www.monkeymen.tv
Managing Directors: Martin Retschitzegger / Michaela Göllner
Commercial Register: Amtsgericht Charlottenburg / HRB 112767