<div dir="ltr">In engine-config, I can see two variables connected to restarting HA VMs:<div><span style="font-family:monospace"><span style="color:rgb(0,0,0)">MaxNumOfTriesToRunFailedAutoStartVm: "Number of attempts to restart highly available VM that went down unexpectedly" (Value Type: Integer)</span><br>RetryToRunAutoStartVmIntervalInSeconds: "How often to try to restart highly available VM that went down unexpectedly (in seconds)" (Value Type: Integer)</span></div><div>Their default values are:<br><div><span style="font-family:monospace"><span style="color:rgb(0,0,0)"># engine-config -g MaxNumOfTriesToRunFailedAutoStartVm</span><br>MaxNumOfTriesToRunFailedAutoStartVm: 10 version: general<br># engine-config -g RetryToRunAutoStartVmIntervalInSeconds<br>RetryToRunAutoStartVmIntervalInSeconds: 30 version: general</span></div><div><span style="font-family:monospace"><br></span></div><div><font face="arial, helvetica, sans-serif">So check the</font><span style="font-family:monospace"> engine.log</span><font face="arial, helvetica, sans-serif">: if you do not see the engine trying to restart the HA VMs ten times, it is definitely a bug; otherwise, you can simply tune these parameters to fit your case.</font><br><div><div><div>Best Regards</div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Jun 7, 2017 at 12:52 PM, Chris Boot <span dir="ltr"><<a href="mailto:bootc@bootc.net" target="_blank">bootc@bootc.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi all,<br>
<br>
We've got a three-node "hyper-converged" oVirt 4.1.2 + GlusterFS cluster<br>
on brand new hardware. It's not quite in production yet but, as these<br>
things always go, we already have some important VMs on it.<br>
<br>
Last night the servers (which aren't yet on UPS) suffered a brief power<br>
failure. They all booted up cleanly and the hosted engine started up ~10<br>
minutes afterwards (presumably once the engine GlusterFS volume was<br>
sufficiently healed and the HA stack realised). So far so good.<br>
<br>
As soon as the HostedEngine started up, it tried to start all our Highly<br>
Available VMs. Unfortunately our master storage domain was as yet<br>
inactive as GlusterFS was presumably still trying to get it healed.<br>
About 10 minutes later the master domain was activated and<br>
"reconstructed" and an SPM was selected, but oVirt had tried and failed<br>
to start all the HA VMs already and didn't bother trying again.<br>
<br>
All the VMs started just fine this morning when we realised what<br>
happened and logged in to oVirt to start them.<br>
<br>
Is this known and/or expected behaviour? Can we do anything to delay<br>
starting HA VMs until the storage domains are there? Can we get oVirt to<br>
keep trying to start HA VMs when they fail to start?<br>
<br>
Is there a bug for this already or should I be raising one?<br>
<br>
Thanks,<br>
Chris<br>
<span class="gmail-m_4641259026813353560HOEnZb"><font color="#888888"><br>
--<br>
Chris Boot<br>
<a href="mailto:bootc@bootc.net" target="_blank">bootc@bootc.net</a><br>
_______________________________________________<br>
Users mailing list<br>
<a href="mailto:Users@ovirt.org" target="_blank">Users@ovirt.org</a><br>
<a href="http://lists.ovirt.org/mailman/listinfo/users" rel="noreferrer" target="_blank">http://lists.ovirt.org/mailman/listinfo/users</a><br>
</font></span></blockquote></div><br></div></div></div></div></div></div>
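One point worth spelling out: with the defaults above, the engine retries 10 times at 30-second intervals, so it gives up after roughly five minutes, while the storage in this case took about ten minutes to come back. A sketch of how the retry window could be widened with engine-config (the values here are only examples, not recommendations):

```shell
# Example only: widen the autostart retry window so it covers a ~10 minute
# storage outage (40 tries x 30 s is roughly 20 minutes of retries).
engine-config -s MaxNumOfTriesToRunFailedAutoStartVm=40
engine-config -s RetryToRunAutoStartVmIntervalInSeconds=30

# engine-config changes only take effect after restarting the engine
systemctl restart ovirt-engine

# Verify the new values
engine-config -g MaxNumOfTriesToRunFailedAutoStartVm
engine-config -g RetryToRunAutoStartVmIntervalInSeconds
```

These are configuration commands run on the engine host; restarting ovirt-engine briefly interrupts the administration portal but does not affect running VMs.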