<p dir="ltr">Hi,</p>
<p dir="ltr">On Apr 9, 2014 5:43 PM, "Martin Sivak" <<a href="mailto:msivak@redhat.com">msivak@redhat.com</a>> wrote:<br>
><br>
> Hi,<br>
><br>
> > I noticed this happens too, I think the issue is after N attempts the<br>
> > ovirt-ha-agent process will kill itself if it believes it can't access<br>
> > the storage or it fails in some other way.<br>
><br>
> If the agent can't access storage or VDSM it waits for 60 seconds and tries again. After three (iirc) failed attempts it shuts down.</p>
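<p dir="ltr">To make the retry behaviour described above concrete, here is a minimal Python sketch of a fixed-delay loop: wait 60 seconds between attempts and give up after three failures. It is not the ovirt-ha-agent code. The function name, the <code>check</code> callable (a stand-in for "storage and VDSM are reachable") and the injectable <code>sleep</code> are assumptions for illustration, and the attempt count is the one quoted "iirc".</p>

```python
import time
from typing import Callable

# Values taken from the thread; the attempt count was quoted from memory ("iirc").
RETRY_DELAY_S = 60
MAX_ATTEMPTS = 3


def connect_with_retries(check: Callable[[], bool],
                         sleep: Callable[[float], None] = time.sleep,
                         delay: float = RETRY_DELAY_S,
                         max_attempts: int = MAX_ATTEMPTS) -> bool:
    """Return True as soon as check() succeeds, False after max_attempts failures.

    `check` is a hypothetical stand-in for "storage and VDSM are reachable";
    `sleep` is injectable so the loop can be exercised without waiting.
    """
    for attempt in range(1, max_attempts + 1):
        if check():
            return True
        if attempt != max_attempts:
            sleep(delay)  # wait before trying again
    return False  # at this point the agent would shut itself down
```

<p dir="ltr">With a check that always fails, the loop sleeps twice (after the first and second attempts) and returns False, which is the point where the agent exits and stops computing its score.</p>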
<p dir="ltr">Is there any reason it shuts down? Couldn't it just sleep for x minutes instead, with the sleep time scaling exponentially after each failure?<br>
><br>
> > The ovirt-ha-broker service<br>
> > however still remains and continues to calculate the score.<br>
><br>
> The broker acts only as a data link, the score is computed by the agent. The broker is used to propagate it to storage (and to collect data).</p>
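<p dir="ltr">The suggestion above, to keep sleeping with an exponentially growing delay instead of shutting down, could look roughly like this. It is only a sketch: the base of 60 seconds (matching the current retry delay), the factor of 2 and the one-hour cap are assumed values, not oVirt settings, and <code>check</code> is again a hypothetical stand-in for the agent's storage/VDSM test.</p>

```python
import time
from typing import Callable, Iterator, Optional


def backoff_delays(base: float = 60.0, factor: float = 2.0,
                   cap: float = 3600.0) -> Iterator[float]:
    """Yield base, base*factor, base*factor**2, ... clamped to at most cap."""
    delay = min(base, cap)
    while True:
        yield delay
        delay = min(delay * factor, cap)


def wait_for_storage(check: Callable[[], bool],
                     sleep: Callable[[float], None] = time.sleep,
                     delays: Optional[Iterator[float]] = None) -> None:
    """Retry check() instead of giving up, sleeping longer after each failure.

    With the default (infinite) delay schedule this only returns once check()
    succeeds. `check` is a hypothetical stand-in for the storage/VDSM test.
    """
    if delays is None:
        delays = backoff_delays()
    for delay in delays:
        if check():
            return
        sleep(delay)
```

<p dir="ltr">The cap keeps a host from backing off so far that it takes hours to notice storage has come back, which matters when every node is recovering from the same outage.</p>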
<p dir="ltr">Thanks for clarifying. I remember seeing references to the score in the broker log and incorrectly assumed the broker computed it.<br>
><br>
> > It would be<br>
> > nice, I guess, if it could proactively restart the ha-agent every now<br>
> > and then.<br>
><br>
> We actually have a bug that is related to this: <a href="https://bugzilla.redhat.com/show_bug.cgi?id=1030441">https://bugzilla.redhat.com/show_bug.cgi?id=1030441</a><br>
><br>
> Greg, are you still working on it?<br>
><br>
> > > What is the supposed procedure after a shutdown (graceful / ungraceful)<br>
> > > of Hosted-Engine HA nodes? Should the engine recover by itself? Should<br>
> > > the running VMs be restarted automatically?<br>
><br>
> If the agent-broker pair recovers and sanlock is not preventing taking the lock (which was not released properly) then the engine VM should be started automatically.<br>
><br>
> > If all the nodes come up at the same time, in my testing, it took 10<br>
> > minutes for the ha-agents to settle and finally decide which host<br>
> > to bring up the engine on.<br>
><br>
> We set a 10-minute mandatory downtime for a host when a VM start is not successful. That might be because sanlock still thinks somebody is running the VM. The /var/log/ovirt-hosted-engine-ha/agent.log would help here.<br>
><br>
> Regards<br>
> --<br>
> Martin Sivák<br>
> <a href="mailto:msivak@redhat.com">msivak@redhat.com</a><br>
> Red Hat Czech<br>
> RHEV-M SLA / Brno, CZ<br>
><br>
> ----- Original Message -----<br>
> > On Wed, Apr 9, 2014 at 2:09 AM, Daniel Helgenberger<br>
> > <<a href="mailto:daniel.helgenberger@m-box.de">daniel.helgenberger@m-box.de</a>> wrote:<br>
> > > Hello,<br>
> > ><br>
> > > I have an oVirt 3.4 hosted engine lab setup which I am evaluating for<br>
> > > production use.<br>
> > ><br>
> > > I "simulated" an ungraceful shutdown of all HA nodes (powercut) while<br>
> > > the engine was running. After powering up, the system did not recover<br>
> > > itself (it seemed).<br>
> > > I had to restart the ovirt-hosted-ha service (which was in a locked<br>
> > > state) and then manually run 'hosted-engine --vm-start'.<br>
> ><br>
> > I noticed this happens too, I think the issue is after N attempts the<br>
> > ovirt-ha-agent process will kill itself if it believes it can't access<br>
> > the storage or it fails in some other way. The ovirt-ha-broker service<br>
> > however still remains and continues to calculate the score. It would be<br>
> > nice, I guess, if it could proactively restart the ha-agent every now<br>
> > and then.<br>
> ><br>
> > ><br>
> > > What is the supposed procedure after a shutdown (graceful / ungraceful)<br>
> > > of Hosted-Engine HA nodes? Should the engine recover by itself? Should<br>
> > > the running VMs be restarted automatically?<br>
> ><br>
> > I don't think any other VMs get restarted automatically; this is<br>
> > because the engine is used to ensure that the VM hasn't been restarted<br>
> > on another host. This is where power management etc. comes into play.<br>
> ><br>
> > If all the nodes come up at the same time, in my testing, it took 10<br>
> > minutes for the ha-agents to settle and finally decide which host<br>
> > to bring up the engine on. Then, technically (untested), any VMs that<br>
> > you've marked as HA should be brought back up automatically by the<br>
> > engine. That would mean 15-20 minutes to recover, which feels a little<br>
> > slow, although it is fairly automatic.<br>
> ><br>
> > ><br>
> > > Thanks,<br>
> > > Daniel<br>
> > ><br>
> > > _______________________________________________<br>
> > > Users mailing list<br>
> > > <a href="mailto:Users@ovirt.org">Users@ovirt.org</a><br>
> > > <a href="http://lists.ovirt.org/mailman/listinfo/users">http://lists.ovirt.org/mailman/listinfo/users</a><br>
> > ><br>
</p>
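<p dir="ltr">The 10-minute mandatory downtime after an unsuccessful VM start, mentioned in the thread, may explain why hosts booting together took about 10 minutes to settle. The idea can be sketched as a per-host penalty window. This is a hypothetical illustration, not the ovirt-hosted-engine-ha implementation: the class and method names, the host names and the injectable clock are assumptions.</p>

```python
import time
from typing import Callable, Dict

# 10-minute mandatory downtime after an unsuccessful VM start, per the thread.
PENALTY_S = 10 * 60


class StartPenalty:
    """Hypothetical tracker: a host whose engine VM start failed sits out PENALTY_S seconds."""

    def __init__(self, clock: Callable[[], float] = time.monotonic,
                 penalty: float = PENALTY_S) -> None:
        self._clock = clock
        self._penalty = penalty
        self._failed_at: Dict[str, float] = {}

    def record_failure(self, host: str) -> None:
        """Remember when the start attempt on `host` failed."""
        self._failed_at[host] = self._clock()

    def may_start(self, host: str) -> bool:
        """True if `host` never failed or its penalty window has elapsed."""
        failed_at = self._failed_at.get(host)
        return failed_at is None or self._clock() - failed_at >= self._penalty
```

<p dir="ltr">If every host's first start attempt fails because sanlock still considers the old lock held, all of them sit out the same window, so none retries until roughly 10 minutes have passed.</p>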