Re: [Engine-devel] [Design for 3.2 RFE] Improving proxy selection algorithm for Power Management operations

12 Nov 2012


      ----- Original Message -----
From: "Simon Grinberg" <simon@redhat.com>
To: "Itamar Heim" <iheim@redhat.com>
Cc: "Eli Mesika" <emesika@redhat.com>, "engine-devel" <engine-devel@ovirt.org>
Sent: Sunday, November 11, 2012 11:22:29 PM
Subject: Re: [Engine-devel] [Design for 3.2 RFE] Improving proxy selection algorithm for Power Management operations
----- Original Message -----
From: "Itamar Heim" <iheim@redhat.com>
To: "Simon Grinberg" <simon@redhat.com>
Cc: "Eli Mesika" <emesika@redhat.com>, "engine-devel" <engine-devel@ovirt.org>
Sent: Sunday, November 11, 2012 10:52:53 PM
Subject: Re: [Engine-devel] [Design for 3.2 RFE] Improving proxy selection algorithm for Power Management operations
On 11/11/2012 05:45 PM, Simon Grinberg wrote:
...
3. The directly selected hosts come to accommodate three use
cases:
    -3.1- Switch failure - if the fence network for hosts in a
    DC/Cluster has to be split between two switches, then you will
    prefer to use hosts that are for sure on the other switch.
    -3.2- Legacy clusters merged into larger clusters due to a
    move to oVirt: the infrastructure may still fit the legacy
    connectivity - lots of firewall rules or direct connections
    that limit access to fencing devices to specific hosts.
    -3.3- Clustered applications within the VMs: you only want
    your peers to be allowed to fence you. This is limited to VMs
    running on a specific host group (affinity management that we
    don't have yet, but we can lock VMs to specific hosts).
that's VMs asking to fence (stop) other VMs, not hosts. why are you
mixing it with host fencing?
What happens if the host on which the peer VM runs is down?
You need to fence the host. I was thinking about preventing a race
where the VM asks to fence its peer while the engine fences the
host. In this case the fence of the peer VM may be reported as
failed (no option to send stop to the VM) while the host status is
still unknown, or worse, it may succeed after the host rebooted,
killing the VM again after it restarted.
To prevent that you request to fence the host instead of fencing the
VM. But you are right that it does not matter which host will do
the fencing; I was thinking of the old-style infra.
...
...
Note that the above was not meant to accommodate any random
    server, just hosts in the setup, hosts that already run VDSM.
    Meaning that maybe instead of the FQDN we can just use the
    hostname - so the UUID will be registered in the tables.
    I don't see why it's so complex: if a host provided is removed
    from the system you either get a canDoAction to remove it from
    the configuration as well (or a warning that this will remove
    the host from the fencing configuration). Your only risk is if
    all of them are removed; then you need to set the exclamation
    mark again (power management is not configured for this host).
because this was a text field, and i don't like code having to know
to check some obscure field and parse it for dependencies.
relations between entities are supposed to be via db referential
integrity if possible (we had some locking issues with these).
i prefer implementation will start with the more simple use case,
not covering these complexities.
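
For illustration only, a minimal sketch of the point above: keeping the host-to-proxy relation in a table with foreign keys, so the database itself refuses to drop a host that another host still names as its fence proxy, instead of code parsing a free-text field. The table and column names (host, fence_proxy, host_id, proxy_id) and the helper functions are hypothetical, not the engine schema, and SQLite from Python is used here only to keep the example self-contained.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical host/proxy schema."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE host (
            id   TEXT PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        -- One row per (fenced host, named proxy host) pair.
        CREATE TABLE fence_proxy (
            host_id  TEXT NOT NULL REFERENCES host(id) ON DELETE CASCADE,
            proxy_id TEXT NOT NULL REFERENCES host(id) ON DELETE RESTRICT,
            PRIMARY KEY (host_id, proxy_id)
        );
        """
    )
    return conn


def remove_host(conn, host_id):
    """Try to delete a host; return False if it is still used as a proxy."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("DELETE FROM host WHERE id = ?", (host_id,))
        return True
    except sqlite3.IntegrityError:
        # The RESTRICT rule fired: some host still lists this one as proxy.
        return False
```

A False result from remove_host is where a canDoAction-style check or a warning to the user would hook in; removing the fenced host itself simply cascades and clears its proxy rows.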
...
- 5. Thinking about it more, though the chain is more generic and
flexible, I would like to return to my original suggestion of
having just a primary and a secondary proxy:
      Primary Proxy 1 => Drop down -> Any cluster host / Any DC
      host / RHEV Manager / Named host out of the list of all the
      hosts
      Secondary Proxy 2 => Drop down -> Any cluster host / Any DC
      host / RHEV Manager / Named host out of the list of all the
      hosts
      I think it is simpler as far as a user is concerned, and it's
      simpler for us to implement two fields with a single value in
      each.
      And I don't believe we really need more: even in the simple
      case of cluster-only hosts, for clusters larger than 4 hosts,
      by the time you get to the secondary it may be too late.
      Secondary is more critical for the 'Named host' option or
      small clusters.
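
For illustration only, a rough sketch of how the two-field primary/secondary proposal above could resolve a proxy. All names here (ProxySource, ProxyOption, Host, select_proxy, the "engine" marker) are hypothetical and do not come from the engine code; the sketch also assumes that a proxy must be up and must not be the host being fenced.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Sequence


class ProxySource(Enum):
    """The four drop-down choices from the proposal."""
    CLUSTER = "any cluster host"
    DC = "any DC host"
    ENGINE = "manager"
    NAMED = "named host"


@dataclass(frozen=True)
class Host:
    name: str
    cluster: str
    dc: str
    up: bool = True


@dataclass(frozen=True)
class ProxyOption:
    source: ProxySource
    named_host: Optional[str] = None  # only meaningful with ProxySource.NAMED


ENGINE = "engine"  # hypothetical marker: the manager performs the fence itself


def candidates(option: ProxyOption, target: Host, hosts: Sequence[Host]) -> List[str]:
    """Return the proxy names that one drop-down value resolves to."""
    if option.source is ProxySource.ENGINE:
        return [ENGINE]
    # Assumption: a proxy must be up and cannot be the host being fenced.
    usable = [h for h in hosts if h.up and h.name != target.name]
    if option.source is ProxySource.CLUSTER:
        return [h.name for h in usable if h.cluster == target.cluster]
    if option.source is ProxySource.DC:
        return [h.name for h in usable if h.dc == target.dc]
    return [h.name for h in usable if h.name == option.named_host]


def select_proxy(target: Host, hosts: Sequence[Host],
                 primary: ProxyOption, secondary: ProxyOption) -> Optional[str]:
    """Try the primary option first, then the secondary; None if both fail."""
    for option in (primary, secondary):
        found = candidates(option, target, hosts)
        if found:
            return found[0]
    return None
```

With a named primary that is down and "any DC host" as secondary, select_proxy falls through to a DC host; with both options exhausted it returns None, which is the situation where the host would have to be flagged as having no usable power management proxy.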
this is a bit simpler. but as for specifying a specific host:
- now you are asking to check two fields (proxy1, proxy2)
- probably to also alert if all these hosts moved to maint, or when
   moving them to another cluster, etc.
- it doesn't cover the use case of splitting between switches, sub
clusters, etc. as you are limited to two hosts, which may have been
moved to maint/shutdown for power saving, etc. (since you are using
a static host assignment, rather than an implied group of hosts
(cluster, dc, engine))
Are you offering to allow defining hosts-groups? :). I'll be happy if
you do, we really need that for some cases of the affinity feature.
Especially those involving multi-site.
Hosts group == "A set of named hosts within the same cluster"
Reading again, I actually like it better than using a specific host.
It may be worthwhile to wait, making sure that when we implement this
for SLA we design the host grouping generically enough to be used by
the fencing mechanism.
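
For illustration only, a small sketch of the host-group idea as defined above ("a set of named hosts within the same cluster"), generic enough that both fencing and affinity could consume it. The HostGroup class and its methods are hypothetical; nothing like this exists in the thread beyond the one-line definition.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class HostGroup:
    """A named set of hosts that must all belong to one cluster."""
    name: str
    cluster: str
    members: Set[str] = field(default_factory=set)

    def add(self, host_name: str, host_cluster: str) -> None:
        # Enforce the "within the same cluster" part of the definition.
        if host_cluster != self.cluster:
            raise ValueError(
                f"{host_name} is in cluster {host_cluster}, "
                f"group {self.name} is bound to {self.cluster}"
            )
        self.members.add(host_name)

    def remove(self, host_name: str) -> bool:
        """Drop a host; return False when the group has become empty."""
        self.members.discard(host_name)
        return bool(self.members)

    def available(self, up_hosts: Iterable[str]) -> List[str]:
        """Members that are currently up, in a stable order."""
        return sorted(self.members & set(up_hosts))
```

A fencing configuration could then point at a group (for example "hosts on the other switch") rather than at one or two fixed hosts, and an empty result from available() or a False from remove() is the point at which to alert that no proxy is left.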
...
...