[Users] Hosted Engine recovery failure of all HA - nodes

Andrew Lau andrew at andrewklau.com
Wed Apr 9 03:03:38 UTC 2014


On Wed, Apr 9, 2014 at 2:09 AM, Daniel Helgenberger
<daniel.helgenberger at m-box.de> wrote:
> Hello,
>
> I have an oVirt 3.4 hosted engine lab setup which I am evaluating for
> production use.
>
> I "simulated" an ungraceful shutdown of all HA nodes (power cut) while
> the engine was running. After powering up, the system did not seem to
> recover by itself.
> I had to restart the ovirt-hosted-ha service (which was in a locked
> state) and then manually run 'hosted-engine --vm-start'.

I've seen this happen too. I think the issue is that after N attempts
the ovirt-ha-agent process kills itself if it believes it can't access
the storage, or if it fails in some other way. The ovirt-ha-broker
service, however, keeps running and continues to calculate the score.
It would be nice if it could proactively restart the ha-agent every now
and then.
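
As a stopgap, the periodic restart could be done from outside. Below is
a minimal, untested sketch of such a watchdog. It assumes the init
script follows the usual convention that 'service ovirt-ha-agent status'
exits 0 only while the agent is running; the function name and the
injectable `run` parameter are my own and not part of oVirt.

```python
import subprocess

AGENT = "ovirt-ha-agent"  # service name as mentioned above


def ensure_agent_running(run=subprocess.call):
    """Restart the HA agent if its init script reports it is not running.

    `run` takes an argv list and returns the exit code. It is injectable
    so the logic can be tried without touching a real host.
    Returns True if a restart was issued, False otherwise.
    """
    # Assumption: `service <name> status` exits 0 only when the service runs.
    if run(["service", AGENT, "status"]) == 0:
        return False
    run(["service", AGENT, "restart"])
    return True
```

Calling ensure_agent_running() every few minutes (from cron, for
example) would bring back an agent that gave up. It does not address why
the agent exited, and if the storage really is unreachable the agent
will presumably just quit again.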

>
> What is the supposed procedure after a shutdown (graceful / ungraceful)
> of Hosted-Engine HA nodes? Should the engine recover by itself? Should
> the running VM's be restarted automatically?

I don't think any other VMs get restarted automatically. The engine is
what ensures that a VM hasn't already been restarted on another host,
so nothing restarts them while the engine is down. This is where power
management etc. comes into play.

If all the nodes come up at the same time, in my testing it took 10
minutes for the ha-agents to settle and finally decide which host
should bring up the engine. After that (untested), any VMs which you've
marked as HA should technically be brought back up automatically by the
engine. That would be 15-20 minutes to recover, which feels a little
slow, although fairly automatic.
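
To put a number on the recovery time in your own lab, a small polling
helper can time how long it takes until the engine answers again. This
is only a sketch: wait_for and its parameters are hypothetical names,
the 1200 second default is simply the 20 minute upper end of my estimate
above, and the check itself (how you decide the engine is "up") is left
to the caller.

```python
import time


def wait_for(check, timeout=1200, interval=30,
             sleep=time.sleep, clock=time.monotonic):
    """Poll check() until it returns True or `timeout` seconds pass.

    Returns the number of seconds waited on success, None on timeout.
    `sleep` and `clock` are injectable so the loop can be tested quickly.
    """
    start = clock()
    while True:
        if check():
            return clock() - start
        if clock() - start >= timeout:
            return None
        sleep(interval)
```

The check could, for instance, wrap 'hosted-engine --vm-status' or an
HTTP request to the engine; I haven't settled on which is more reliable,
so treat that part as an open question.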

>
> Thanks,
> Daniel