Re: [Engine-devel] Autorecovery feature plan for review

15 Feb 2012


On 15/02/12 18:28, Ayal Baron wrote:
...
----- Original Message -----
...
Hi,
A short summary from the call today, please correct me if I forgot or
misunderstood something.
Ayal argued that the failed host/storagedomain should be reactivated
by a periodically executed job; he would prefer if the engine could
[try to] correct the problem right on discovery.
Livnat's point was that this is hard to implement and it is OK if we
move it to Nonoperational state and periodically check it again.
There was some debate over whether to call the current behavior a bug
or a missing behavior; I believe this is not very important.
I did not fully understand the last few sentences from Livnat; did we
manage to agree on a change in the plan?
A couple of points that we agreed upon:
1. no need for a new mechanism, just initiate this from the monitoring context.
   Preferably, if not difficult, evaluate the monitoring data; if the host should remain in non-op, then don't bother running initVdsOnUp.
2. configuration of when to call initVdsOnUp is orthogonal to auto-init behaviour. If introduced, it should be on by default, the user should be able to turn it on or off for the host in general (no lower granularity), and it can only be configured via the API.
When disabled, initVdsOnUp would be called only when the admin activates the host/storage, and any error would keep it inactive (I still don't understand why this is at all needed, but whatever).
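
A minimal sketch of the two points above, written in Python for brevity and not taken from the engine code. Every name in it (Host, Status, auto_init, monitor_host, activate_host) is hypothetical; only initVdsOnUp is named in this thread, and here it is passed in as a plain callable that returns True on success.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Status(Enum):
    UP = "up"
    NON_OPERATIONAL = "non_operational"


@dataclass
class Host:
    name: str
    status: Status = Status.UP
    auto_init: bool = True  # hypothetical per-host switch, on by default (point 2)


def monitor_host(host: Host, report_ok: bool,
                 init_vds_on_up: Callable[[Host], bool]) -> Status:
    """One monitoring cycle; report_ok stands in for the evaluated monitoring data."""
    if host.status is Status.UP:
        if not report_ok:
            host.status = Status.NON_OPERATIONAL
        return host.status
    # Host is non-operational from here on.
    if not report_ok:
        return host.status  # point 1: would stay non-op, so skip initialization
    if not host.auto_init:
        return host.status  # switch is off: wait for the admin to activate
    if init_vds_on_up(host):
        host.status = Status.UP
    return host.status


def activate_host(host: Host,
                  init_vds_on_up: Callable[[Host], bool]) -> Status:
    """Admin activation: always runs initialization; an error keeps the host inactive."""
    if init_vds_on_up(host):
        host.status = Status.UP
    return host.status
```

In this sketch the initialization is attempted from the monitoring cycle itself, so no separate periodic job is involved.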
Also a note from Moran on the call was to check if we can unify the
non-operational and Error statuses of the host.
It was mentioned on the call that the reason for having ERROR state is
for recovery (time out of the error state) but since we are about to
recover from non-operational status as well there is no reason to have
two different statuses.
...
Note that going forward what I envision is engine pushing down the entire host configuration once and from that point on the host would try to keep this configuration up and running.  Once this happens there will be no need for initVdsOnUp at all.
...
Anyway, I agree with Ayal that it would be very nice if the engine
could fix the issues right on discovery, but I also agree that this
feature would take a bigger effort. It would be nice to know what
effort it would take to get the monitoring to do this safely. Could we
still call it monitoring then?
Basically the monitoring flow moves the host to non-operational; what
Ayal suggests is that it will also trigger the recovery flow
(initialization flow).

I think that modeling it to be triggered from the monitoring flow will
block monitoring of the host during the initialization flow, which can
save us from races going forward.
Let's see if we can design the solution to be triggered by the monitoring.
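
One way to picture the "monitoring is blocked during initialization" idea is a per-host lock that a monitoring cycle holds for its whole duration, including any initialization it triggers. The Python sketch below is only an illustration under that assumption; HostMonitor, cycle and init_flow are hypothetical names, not engine code.

```python
import threading
from typing import Callable


class HostMonitor:
    """Hypothetical per-host monitor: at most one cycle runs at a time."""

    def __init__(self, init_flow: Callable[[], None]) -> None:
        self._lock = threading.Lock()
        self._init_flow = init_flow
        self.skipped = 0  # cycles dropped because another one was in progress

    def cycle(self, needs_init: bool) -> bool:
        """Run one monitoring cycle; return False if it was skipped."""
        # Non-blocking acquire: a cycle that arrives while a previous one
        # (possibly running the initialization flow) holds the lock is dropped.
        if not self._lock.acquire(blocking=False):
            self.skipped += 1
            return False
        try:
            if needs_init:
                self._init_flow()
            return True
        finally:
            self._lock.release()
```

Because the initialization runs while the lock is held, a second cycle for the same host cannot observe or change its state halfway through.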
...
...
Laszlo
----- Original Message -----
...
From: "Ayal Baron" <abaron@redhat.com>
To: "Laszlo Hornyak" <lhornyak@redhat.com>
Cc: engine-devel@ovirt.org, "Yaniv Kaul" <ykaul@redhat.com>
Sent: Wednesday, February 15, 2012 12:46:05 PM
Subject: Re: [Engine-devel] Autorecovery feature plan for review
----- Original Message -----
...
Hi Ayal,
----- Original Message -----
...
From: "Ayal Baron" <abaron@redhat.com>
To: "Yaniv Kaul" <ykaul@redhat.com>
Cc: engine-devel@ovirt.org
Sent: Wednesday, February 15, 2012 12:19:48 PM
Subject: Re: [Engine-devel] Autorecovery feature plan for
review
...
I still fail to understand why you 'punish' existing objects and do
not give them the new feature enabled by default.
This is not a feature, it's a bug!
Whatever we call it, it is a change in behavior. We agreed that it
will be enabled for all existing objects by default.
http://globalnerdy.com/wordpress/wp-content/uploads/2007/12/bug_vs_feature.g...
...
This should not be treated as a feature and this should not be
configurable!
I can imagine some situations when I would not like the autorecovery
to happen, but if everyone agrees not to make it configurable, I
will just remove it from my patchset.
It's not autorecovery, you're not recovering anything.  You're
reflecting the fact that the resource is back to normal (not due to
anything that the engine did).
This is why it is a bug today.
This is why it should not be configurable.
...
...
Today an object moves to non-operational due to state reported by
vdsm.  The object should immediately return to up the moment vdsm
reports the object as ok (this means that you don't stop monitoring
just because there is an error).
That's it. no db field and no nothing...
This pertains to storage domains, network, host status, whatever.
...
Y.
> b. In an environment to be clean installed - we have 0 existing
> entities - after clean install all new entities in the system will
> be created with auto recoverable set to true.
> Will this be considered a bad behavior?
>
>
> _______________________________________________
> Engine-devel mailing list
> Engine-devel@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/engine-devel

Re: [Engine-devel] Autorecovery feature plan for review

Livnat Peer