[Engine-devel] Autorecovery feature plan for review

Ayal Baron abaron at redhat.com
Wed Feb 15 16:28:11 UTC 2012



----- Original Message -----
> Hi,
> 
> A short summary from the call today, please correct me if I forgot or
> misunderstood something.
> 
> Ayal argued against reactivating the failed host/storagedomain from
> a periodically executed job; he would prefer if the engine could
> [try to] correct the problem right on discovery.
> Livnat's point was that this is hard to implement and it is OK if we
> move it to Nonoperational state and periodically check it again.
> 
> There was some argument over whether to call the current behavior a
> bug or a missing behavior; I believe this is not very important.
> 
> I did not fully understand the last few sentences from Livnat, did we
> manage to agree on a change to the plan?

A couple of points that we agreed upon:
1. No need for a new mechanism; just initiate this from the monitoring context.
   Preferably, if not difficult, evaluate the monitoring data first: if the host should remain non-operational, don't bother running initVdsOnUp.
2. Configuration of when to call initVdsOnUp is orthogonal to the auto-init behaviour. If introduced, it should be on by default, the user should be able to switch it on or off only for the host as a whole (no finer granularity), and it should be configurable only via the API.
   When disabled, initVdsOnUp would be called only when the admin activates the host/storage, and any error would keep it inactive (I still don't understand why this is needed at all, but whatever).
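
To make the two points concrete, here is a minimal sketch of how a monitoring cycle could drive this. It is an illustration only, not engine code: the `Host` class, the `auto_init` flag, the `report_ok` argument and the `init_vds_on_up` stand-in are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    UP = "Up"
    NON_OPERATIONAL = "NonOperational"


@dataclass
class Host:
    name: str
    status: Status = Status.NON_OPERATIONAL
    auto_init: bool = True  # hypothetical per-host flag, on by default


def init_vds_on_up(host):
    """Stand-in for the engine's initVdsOnUp; returns True on success."""
    return True


def on_monitoring_cycle(host, report_ok, init=init_vds_on_up):
    """Handle one monitoring report; report_ok means the data looks healthy."""
    if host.status is Status.UP:
        if not report_ok:
            host.status = Status.NON_OPERATIONAL
        return host.status
    if not report_ok:
        return host.status  # still broken: don't bother running init
    if not host.auto_init:
        return host.status  # auto-init disabled: wait for the admin
    if init(host):
        host.status = Status.UP
    return host.status


def activate(host, init=init_vds_on_up):
    """Admin-triggered activation; any init error keeps the host inactive."""
    if init(host):
        host.status = Status.UP
    return host.status
```

Note that monitoring keeps running while the host is non-operational; only the init step is skipped when the report is bad or the flag is off.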

Note that going forward, what I envision is the engine pushing down the entire host configuration once; from that point on the host would try to keep this configuration up and running.  Once this happens there will be no need for initVdsOnUp at all.
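
A rough sketch of that idea, assuming the pushed-down configuration and the host's current state can both be modelled as flat dictionaries (the keys and the `host_tick` loop step are hypothetical, not an existing vdsm interface):

```python
def reconcile(desired, actual):
    """Return the settings the host must (re)apply to match the desired config."""
    return {key: value for key, value in desired.items()
            if actual.get(key) != value}


def host_tick(desired, actual):
    """One iteration of a hypothetical host-side loop that repairs drift."""
    drift = reconcile(desired, actual)
    actual.update(drift)  # stand-in for really reconnecting storage, networks, etc.
    return drift
```

The engine would then only observe the reported state; the host itself closes the gap between desired and actual configuration.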


> 
> Anyway, I agree with Ayal that it would be very nice if the engine
> could fix the issues right on discovery, but I also agree that this
> feature would take a bigger effort. It would be nice to know what
> effort it would take to get the monitoring to do this safely. Could we
> still call it monitoring then?
> 
> Laszlo
> 
> ----- Original Message -----
> > From: "Ayal Baron" <abaron at redhat.com>
> > To: "Laszlo Hornyak" <lhornyak at redhat.com>
> > Cc: engine-devel at ovirt.org, "Yaniv Kaul" <ykaul at redhat.com>
> > Sent: Wednesday, February 15, 2012 12:46:05 PM
> > Subject: Re: [Engine-devel] Autorecovery feature plan for review
> > 
> > 
> > 
> > ----- Original Message -----
> > > Hi Ayal,
> > > 
> > > ----- Original Message -----
> > > > From: "Ayal Baron" <abaron at redhat.com>
> > > > To: "Yaniv Kaul" <ykaul at redhat.com>
> > > > Cc: engine-devel at ovirt.org
> > > > Sent: Wednesday, February 15, 2012 12:19:48 PM
> > > > Subject: Re: [Engine-devel] Autorecovery feature plan for
> > > > review
> > > > 
> > > > 
> > > > > 
> > > > > > I still fail to understand why you 'punish' existing objects
> > > > > > and do not give them the new feature enabled by default.
> > > > 
> > > > This is not a feature, it's a bug!
> > > 
> > > Whatever we call it, it is a change in behavior. We agreed that
> > > it
> > > will be enabled for all existing objects by default.
> > > 
> > > http://globalnerdy.com/wordpress/wp-content/uploads/2007/12/bug_vs_feature.gif
> > > 
> > > > This should not be treated as a feature and this should not be
> > > > configurable!
> > > 
> > > I can imagine some situations when I would not like the
> > > autorecovery
> > > to happen, but if everyone agrees not to make it configurable, I
> > > will just remove it from my patchset.
> > 
> > It's not autorecovery, you're not recovering anything.  You're
> > reflecting the fact that the resource is back to normal (not due to
> > anything that the engine did).
> > This is why it is a bug today.
> > This is why it should not be configurable.
> > 
> > > 
> > > > Today an object moves to non-operational due to state reported
> > > > by
> > > > vdsm.  The object should immediately return to up the moment
> > > > vdsm
> > > > reports the object as ok (this means that you don't stop
> > > > monitoring
> > > > just because there is an error).
> > > > That's it. no db field and no nothing...
> > > > This pertains to storage domains, network, host status,
> > > > whatever.
> > > > 
> > > > > Y.
> > > > > 
> > > > > > > b. In an environment to be clean installed - we have 0
> > > > > > > existing entities - after clean install all new entities in
> > > > > > > the system will be created with auto recoverable set to true.
> > > > > > Will this be considered a bad behavior?
> > > > > >
> > > > > >
> > > > > > _______________________________________________
> > > > > > Engine-devel mailing list
> > > > > > Engine-devel at ovirt.org
> > > > > > http://lists.ovirt.org/mailman/listinfo/engine-devel
> > > > > 
> > > 
> > 
> 


