[ovirt-users] Power failure recovery

Artyom Lukianov alukiano at redhat.com
Wed Jun 7 13:56:38 UTC 2017


In engine-config, I can see two variables that are connected to the
restart of HA VMs:
MaxNumOfTriesToRunFailedAutoStartVm: "Number of attempts to restart highly
available VM that went down unexpectedly" (Value Type: Integer)
RetryToRunAutoStartVmIntervalInSeconds: "How often to try to restart highly
available VM that went down unexpectedly (in seconds)" (Value Type: Integer)
Their default values are:
# engine-config -g MaxNumOfTriesToRunFailedAutoStartVm
MaxNumOfTriesToRunFailedAutoStartVm: 10 version: general
# engine-config -g RetryToRunAutoStartVmIntervalInSeconds
RetryToRunAutoStartVmIntervalInSeconds: 30 version: general

So check engine.log. If you do not see the engine trying to restart the HA
VMs ten times, it is definitely a bug. Otherwise, you can adjust these
parameters to adapt them to your case.
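
As a rough sketch of the arithmetic, assuming the engine spaces its attempts
evenly by the retry interval (not verified here): the defaults give about
10 x 30 = 300 seconds of retrying, which is shorter than the roughly 10
minutes the master domain took to come up in your case. The script below
estimates how many tries would cover a longer window and only prints the
commands instead of running them. The 1200-second target and the
needed_tries helper are made up for illustration; engine-config -s followed
by a restart of the ovirt-engine service is the usual way to apply a change,
but please check that against your version.

```shell
#!/bin/sh
# Sketch only: estimate how many restart attempts cover a recovery window,
# ASSUMING attempts are spaced evenly by the retry interval.

# needed_tries WINDOW_SECONDS INTERVAL_SECONDS
# Prints ceil(WINDOW_SECONDS / INTERVAL_SECONDS). Hypothetical helper.
needed_tries() {
    window=$1
    interval=$2
    echo $(( (window + interval - 1) / interval ))
}

# Defaults quoted above: 10 tries at a 30 second interval.
default_window=$(( 10 * 30 ))
echo "default retry window: ${default_window}s"

# Hypothetical target: keep retrying for 20 minutes at the default interval.
tries=$(needed_tries 1200 30)
echo "tries needed to cover 1200s at a 30s interval: ${tries}"

# Print (do not run) the commands that would apply the change on the engine.
echo "engine-config -s MaxNumOfTriesToRunFailedAutoStartVm=${tries}"
echo "systemctl restart ovirt-engine"
```

Raising RetryToRunAutoStartVmIntervalInSeconds instead would stretch the same
window with fewer attempts; which of the two is better depends on how long
your storage usually needs to heal.
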
Best Regards

On Wed, Jun 7, 2017 at 12:52 PM, Chris Boot <bootc at bootc.net> wrote:

> Hi all,
>
> We've got a three-node "hyper-converged" oVirt 4.1.2 + GlusterFS cluster
> on brand new hardware. It's not quite in production yet but, as these
> things always go, we already have some important VMs on it.
>
> Last night the servers (which aren't yet on UPS) suffered a brief power
> failure. They all booted up cleanly and the hosted engine started up ~10
> minutes afterwards (presumably once the engine GlusterFS volume was
> sufficiently healed and the HA stack realised). So far so good.
>
> As soon as the HostedEngine started up it tried to start all our Highly
> Available VMs. Unfortunately our master storage domain was as yet
> inactive as GlusterFS was presumably still trying to get it healed.
> About 10 minutes later the master domain was activated and
> "reconstructed" and an SPM was selected, but oVirt had tried and failed
> to start all the HA VMs already and didn't bother trying again.
>
> All the VMs started just fine this morning when we realised what had
> happened and logged in to oVirt to start them.
>
> Is this known and/or expected behaviour? Can we do anything to delay
> starting HA VMs until the storage domains are there? Can we get oVirt to
> keep trying to start HA VMs when they fail to start?
>
> Is there a bug for this already or should I be raising one?
>
> Thanks,
> Chris
>
> --
> Chris Boot
> bootc at bootc.net