[Engine-devel] [Design for 3.2 RFE] Improving proxy selection algorithm for Power Management operations

Simon Grinberg simon at redhat.com
Mon Nov 12 09:40:07 UTC 2012



----- Original Message -----
> From: "Simon Grinberg" <simon at redhat.com>
> To: "Itamar Heim" <iheim at redhat.com>
> Cc: "Eli Mesika" <emesika at redhat.com>, "engine-devel" <engine-devel at ovirt.org>
> Sent: Sunday, November 11, 2012 11:22:29 PM
> Subject: Re: [Engine-devel] [Design for 3.2 RFE] Improving proxy selection algorithm for Power Management operations
> 
> 
> 
> ----- Original Message -----
> > From: "Itamar Heim" <iheim at redhat.com>
> > To: "Simon Grinberg" <simon at redhat.com>
> > Cc: "Eli Mesika" <emesika at redhat.com>, "engine-devel"
> > <engine-devel at ovirt.org>
> > Sent: Sunday, November 11, 2012 10:52:53 PM
> > Subject: Re: [Engine-devel] [Design for 3.2 RFE] Improving proxy
> > selection algorithm for Power Management operations
> > 
> > On 11/11/2012 05:45 PM, Simon Grinberg wrote:
> > > 3. The directly selected hosts come to accommodate three use
> > > cases:
> > >     -3.1- Switch failure - if the fence network for hosts in a
> > >     DC/Cluster has to be split between two switches, then you will
> > >     prefer to use hosts that are known to be on the other switch.
> > >     -3.2- Legacy clusters merged into larger clusters due to a
> > >     move to oVirt; the infrastructure may still match the legacy
> > >     connectivity - lots of firewall rules or direct connections
> > >     that limit access to fencing devices to specific hosts.
> > >     -3.3- Clustered applications within the VMs: you only want
> > >     your peers to be allowed to fence you. This is limited to VMs
> > >     running on a specific host group (affinity management that we
> > >     don't have yet, but we can lock VMs to specific hosts).
> > 
> > that's VMs asking to fence (stop) other VMs, not hosts. why are you
> > mixing it with host fencing?
> 
> What happens if the host on which the peer VM runs is down?
> You need to fence the host. I was thinking about preventing a race
> where the VM asks to fence its peer while the engine fences the
> host. In that case the fencing of the peer VM may be reported as
> failed (no way to send a stop to the VM) while the host status is
> still unknown, or worse, may succeed after the host has rebooted,
> killing the VM again after it restarted.
> 
> To prevent that, you request to fence the host instead of fencing the
> VM. But you are right that it does not matter which host does the
> fencing; I was thinking of the old style infra.
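
To make the race concrete, here is a rough Java sketch of the routing I had in mind (all names and types are hypothetical, this is not engine code): when a clustered application asks to stop a peer VM whose host is already unreachable, the request is turned into a host fence rather than a guest-level stop, so it cannot be reported as failed and then take effect again after the host reboots.

    import java.util.Map;

    // Rough sketch only: route a "stop my peer VM" request to host fencing
    // when the peer's host is already unreachable, to avoid racing with the
    // engine's own host fencing flow. Names are illustrative.
    public class PeerFenceRouter {

        enum HostStatus { UP, NON_RESPONSIVE }

        // Assumed lookup of the status of the host each VM runs on; in a
        // real implementation this would come from the backend, not a map.
        private final Map<String, HostStatus> hostStatusByVm;

        PeerFenceRouter(Map<String, HostStatus> hostStatusByVm) {
            this.hostStatusByVm = hostStatusByVm;
        }

        public String routeFenceRequest(String peerVmId) {
            HostStatus status =
                    hostStatusByVm.getOrDefault(peerVmId, HostStatus.UP);
            if (status == HostStatus.UP) {
                return "stop VM " + peerVmId;   // normal guest-level stop
            }
            // Host status is unknown/non-responsive: a guest-level stop may
            // be reported as failed now and then take effect after the host
            // reboots and restarts the VM. Fence the host once instead.
            return "fence host of VM " + peerVmId;
        }
    }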
> 
> 
> > 
> > >
> > >     Note that the above was not meant to accommodate any random
> > >     server, just hosts in the setup, hosts that already run VDSM.
> > >     Meaning that maybe instead of the FQDN we can just use the
> > >     hostname, so the UUID will be registered in the tables.
> > >     I don't see why it's so complex: if a provided host is removed
> > >     from the system, you either get a canDoAction to remove it
> > >     from the configuration as well, or a warning that this will
> > >     remove the host from the fencing configuration. The only risk
> > >     is if all of them are removed; then you need to set the
> > >     exclamation mark again (power management is not configured
> > >     for this host).
> > 
> > because this was a text field, and I don't like code having to know
> > to check some obscure field and parse it for dependencies.
> > Relations between entities are supposed to go via DB referential
> > integrity if possible (we had some locking issues with these).
> > I prefer the implementation to start with the simpler use case, not
> > covering these complexities.
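
Agreed on the referential integrity point. Just to illustrate the kind of check I meant (hypothetical names, not a patch): if the proxy hosts were stored as a real relation rather than a free-text field, the remove-host flow could query that relation in its canDoAction and warn, instead of parsing strings:

    import java.util.List;

    // Illustrative sketch of a canDoAction-style validation when removing a
    // host that other hosts reference as a fence proxy. The DAO is assumed
    // to be backed by a proper relation (host id -> proxy host id) with DB
    // referential integrity, not by a parsed text field.
    public class RemoveHostValidator {

        interface FenceProxyDao {
            // Hosts whose power management config names 'hostId' as a proxy.
            List<String> findHostsUsingProxy(String hostId);
        }

        private final FenceProxyDao fenceProxyDao;

        RemoveHostValidator(FenceProxyDao fenceProxyDao) {
            this.fenceProxyDao = fenceProxyDao;
        }

        // Returns a warning to show the user, or null when removal is safe.
        public String canRemove(String hostId) {
            List<String> dependents =
                    fenceProxyDao.findHostsUsingProxy(hostId);
            if (dependents.isEmpty()) {
                return null;
            }
            return "Removing this host also removes it from the fence proxy "
                    + "configuration of: " + String.join(", ", dependents);
        }
    }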
> > 
> > 
> > > - 5. Thinking about it more, though the chain is more generic and
> > > flexible, I would like to return to my original suggestion of
> > > having just a primary and a secondary proxy:
> > >       Primary Proxy 1 => Drop down -> Any cluster host / Any DC
> > >       host / RHEV Manager / Named host out of the list of all the
> > >       hosts
> > >       Secondary Proxy 2 => Drop down -> Any cluster host / Any DC
> > >       host / RHEV Manager / Named host out of the list of all the
> > >       hosts
> > >       I think this is simpler as far as the user is concerned, and
> > >       it's simpler for us to implement: two fields with a single
> > >       value in each.
> > >       And I don't believe we really need more; even in the simple
> > >       case of cluster-only hosts, for clusters larger than 4 hosts,
> > >       by the time you get to the secondary it may be too late.
> > >       The secondary is more critical for the 'Named host' option
> > >       or small clusters.
> > 
> > this is a bit simpler. but as for specifying a specific host:
> > - now you are asking to check two fields (proxy1, proxy2)
> > - probably also to alert if all these hosts moved to maintenance, or
> >   when moving them to another cluster, etc.
> > - it doesn't cover the use case of splitting between switches, sub
> >   clusters, etc., as you are limited to two hosts, which may have
> >   been moved to maintenance / shut down for power saving, etc.
> >   (since you are using a static host assignment, rather than an
> >   implied group of hosts (cluster, dc, engine))
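
Right, the implied groups are the stronger part. A minimal sketch of what resolving such a chain could look like (hypothetical types, only to show the selection order - nothing here is existing engine code): each entry in the chain ("cluster", "dc", "engine") expands to its current candidate hosts, and hosts in maintenance are skipped, so selection keeps working when individual hosts go down:

    import java.util.List;
    import java.util.Optional;
    import java.util.function.Function;

    // Sketch of resolving an implied-group preference chain such as
    // ["cluster", "dc", "engine"] into a concrete proxy host. All types are
    // made up for illustration; the point is only the selection order and
    // the skipping of unusable hosts.
    public class ProxySelector {

        record Host(String name, boolean usable) {}

        public Optional<Host> selectProxy(List<String> chain,
                Function<String, List<Host>> expandGroup) {
            for (String source : chain) {
                // Expand the group to its current members and take the first
                // host that can act as a proxy, skipping hosts that are in
                // maintenance or otherwise unusable.
                Optional<Host> candidate = expandGroup.apply(source).stream()
                        .filter(Host::usable)
                        .findFirst();
                if (candidate.isPresent()) {
                    return candidate;
                }
            }
            return Optional.empty(); // no usable proxy anywhere in the chain
        }
    }

With a static primary/secondary host pair, the same loop degenerates to at most two candidates, which is exactly the limitation you point out.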
> 
> Are you proposing to allow defining host groups? :) I'll be happy if
> you do; we really need that for some cases of the affinity feature,
> especially those involving multi-site.
> 
> Hosts group == "A set of named hosts within the same cluster" 

Reading it again, I actually like this better than using a specific host. It may be worthwhile to wait and make sure that when we implement this for SLA, we design the host grouping generically enough to be reused by the fencing mechanism.
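
For reference, what I have in mind by a generic host grouping is something as small as a named set of hosts scoped to a cluster (again just a sketch with hypothetical names), which both the SLA/affinity work and the fence proxy selection could reference:

    import java.util.Set;

    // Sketch only: a reusable "hosts group" entity - a named set of host ids
    // scoped to a single cluster - that affinity/SLA and fence proxy
    // selection could both reference instead of free-text host lists.
    public record HostGroup(String name, String clusterId, Set<String> hostIds) {

        public boolean containsHost(String hostId) {
            return hostIds.contains(hostId);
        }
    }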


> 
> 
> > 
> > 
> >


