
----- Original Message -----
From: "Simon Grinberg" <simon@redhat.com> To: "Itamar Heim" <iheim@redhat.com> Cc: "Eli Mesika" <emesika@redhat.com>, "engine-devel" <engine-devel@ovirt.org> Sent: Sunday, November 11, 2012 11:22:29 PM Subject: Re: [Engine-devel] [Design for 3.2 RFE] Improving proxy selection algorithm for Power Management operations
----- Original Message -----
From: "Itamar Heim" <iheim@redhat.com> To: "Simon Grinberg" <simon@redhat.com> Cc: "Eli Mesika" <emesika@redhat.com>, "engine-devel" <engine-devel@ovirt.org> Sent: Sunday, November 11, 2012 10:52:53 PM Subject: Re: [Engine-devel] [Design for 3.2 RFE] Improving proxy selection algorithm for Power Management operations
On 11/11/2012 05:45 PM, Simon Grinberg wrote:
3. The directly selected hosts come to accommodate the following use cases:
-3.1- Switch failure - if the fence network for hosts in a DC/Cluster has to be split between two switches, then you will prefer to use hosts that are for sure on the other switch.
-3.2- Legacy clusters merged into larger clusters due to a move to oVirt - the infrastructure may still fit the legacy connectivity: lots of firewall rules or direct connections that limit access to fencing devices to specific hosts.
-3.3- Clustered applications within the VMs - you only want your peers to be allowed to fence you. This is limited to VMs running on a specific host group (affinity management that we don't have yet, but we can lock VMs to specific hosts).
that's VMs asking to fence (stop) other VMs, not hosts. why are you mixing it with host fencing?
What happens if the host on which the peer VM runs is down? You need to fence the host. I was thinking about preventing a race where the VM asks to fence its peer while the engine fences the host. In this case the fence of the peer VM may be reported as failed (no option to send stop to the VM) while the host status is still unknown, or worse, it may succeed after the host rebooted, killing the VM again after it restarted.
To prevent that you request to fence the host instead of fencing the VM. But you are right that it does not matter which host will do the fencing, I was thinking of the old-style infra.
Note that the above was not meant to accommodate any random server, just hosts in the setup, hosts that already run VDSM. Meaning that maybe instead of the FQDN we can just use hostname - so the UUID will be registered in the tables. I don't see why it's so complex: if a provided host is removed from the system, you either get a canDoAction to remove it from the configuration as well, or a warning that this will remove the host from the fencing configuration. Your only risk is if all of them are removed; then you need to set the exclamation mark again (power management is not configured for this host).
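A minimal sketch of the removal behaviour described above, assuming the "warning" variant rather than a blocking canDoAction. The `FencingConfig` class and `remove_host` function are hypothetical names used for illustration only, not engine code. The sketch stores host ids rather than free text, drops a removed host from every fencing configuration that names it, and reports which hosts are left with no proxy at all (the "exclamation mark" case).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class FencingConfig:
    # Hypothetical model: fenced host id -> ids of hosts allowed to act as its proxies.
    proxies: Dict[str, Set[str]] = field(default_factory=dict)


def remove_host(config: FencingConfig, host_id: str) -> Tuple[List[str], List[str]]:
    """Remove host_id from the setup.

    Returns (warnings, unconfigured): one warning per fencing configuration
    that referenced the removed host, and the hosts whose proxy list became
    empty, i.e. power management is no longer configured for them.
    """
    warnings: List[str] = []
    unconfigured: List[str] = []
    # The removed host no longer needs a fencing configuration of its own.
    config.proxies.pop(host_id, None)
    for fenced, proxy_ids in config.proxies.items():
        if host_id in proxy_ids:
            proxy_ids.discard(host_id)
            warnings.append(
                f"{host_id} removed from the fencing configuration of {fenced}"
            )
            if not proxy_ids:
                unconfigured.append(fenced)
    return warnings, unconfigured
```

With ids as keys the same relation could equally be held in a table with a foreign key, so the database rather than parsing code would track the dependency.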
because this was a text field, and i don't like code having to know to check some obscure field and parse it for dependencies. relations between entities are supposed to be via db referential integrity if possible (we had some locking issues with these). i prefer the implementation start with the simpler use case, not covering these complexities.
5. Thinking about it more: though the chain is more generic and flexible, I would like to return to my original suggestion of having just a primary and a secondary proxy:
Primary Proxy 1 => Drop down -> Any cluster host / Any DC host / RHEV Manager / Named host out of the list of all the hosts
Secondary Proxy 2 => Drop down -> Any cluster host / Any DC host / RHEV Manager / Named host out of the list of all the hosts
I think it is simpler as far as a user is concerned, and it's simpler for us to implement: two fields, a single value in each. And I don't believe we really need more. Even in the simple case of cluster-only hosts, for clusters larger than 4 hosts, by the time you get to the secondary it may be too late. Secondary is more critical for the 'Named host' option or small clusters.
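The two-field proposal can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `ProxyKind`, `ProxySetting`, `Host`, `candidates` and `select_proxy` are hypothetical, a host is considered usable simply when it is up and is not the host being fenced, and the first match is taken without any load or locality preference.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class ProxyKind(Enum):
    # The four drop-down values from the proposal.
    ANY_CLUSTER_HOST = "cluster"
    ANY_DC_HOST = "dc"
    MANAGER = "manager"
    NAMED_HOST = "named"


@dataclass(frozen=True)
class Host:
    name: str
    cluster: str
    dc: str
    up: bool = True


@dataclass(frozen=True)
class ProxySetting:
    kind: ProxyKind
    host_name: Optional[str] = None  # only meaningful with NAMED_HOST


MANAGER = "<manager>"  # hypothetical marker: the manager itself does the fencing


def candidates(setting: ProxySetting, target: Host, hosts: List[Host]) -> List[str]:
    """Resolve one drop-down value to usable proxies, never the target itself."""
    if setting.kind is ProxyKind.MANAGER:
        return [MANAGER]
    usable = [h for h in hosts if h.up and h.name != target.name]
    if setting.kind is ProxyKind.ANY_CLUSTER_HOST:
        return [h.name for h in usable if h.cluster == target.cluster]
    if setting.kind is ProxyKind.ANY_DC_HOST:
        return [h.name for h in usable if h.dc == target.dc]
    return [h.name for h in usable if h.name == setting.host_name]


def select_proxy(primary: ProxySetting, secondary: ProxySetting,
                 target: Host, hosts: List[Host]) -> Optional[str]:
    """Try the primary field first, then the secondary; None if neither resolves."""
    for setting in (primary, secondary):
        found = candidates(setting, target, hosts)
        if found:
            return found[0]
    return None
```

For example, with a named primary host that is down and "Any DC host" as the secondary, `select_proxy` falls through to the first live host in the same DC; if neither field resolves, it returns `None` and the caller would have to report that no proxy is available.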
this is a bit simpler. but as for specifying a specific host:
- now you are asking to check two fields (proxy1, proxy2)
- probably to also alert if all these hosts moved to maint, or when moving them to another cluster, etc.
- it doesn't cover the use case of splitting between switches, sub clusters, etc., as you are limited to two hosts, which may have been moved to maint/shutdown for power saving, etc. (since you are using a static host assignment, rather than an implied group of hosts (cluster, dc, engine))
Are you offering to allow defining hosts-groups? :). I'll be happy if you do, we really need that for some cases of the affinity feature. Especially those involving multi-site.
Hosts group == "A set of named hosts within the same cluster"
Reading again, I actually like it better than using a specific host. It may be worthwhile to wait, while making sure that when we implement this for SLA we design the host grouping generic enough to be used by the fencing mechanism.
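As a rough illustration of the "set of named hosts within the same cluster" definition, a host group could be modelled as below. `HostGroup` and `make_host_group` are hypothetical names, and the host-to-cluster mapping is an assumed input; nothing here reflects an existing engine entity.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable


@dataclass(frozen=True)
class HostGroup:
    name: str
    cluster: str
    members: FrozenSet[str]


def make_host_group(name: str, member_names: Iterable[str],
                    host_to_cluster: Dict[str, str]) -> HostGroup:
    """Build a group, rejecting unknown hosts and hosts from different clusters."""
    members = frozenset(member_names)
    if not members:
        raise ValueError("a host group needs at least one host")
    unknown = members - set(host_to_cluster)
    if unknown:
        raise ValueError(f"unknown hosts: {sorted(unknown)}")
    clusters = {host_to_cluster[m] for m in members}
    if len(clusters) != 1:
        raise ValueError(f"hosts span several clusters: {sorted(clusters)}")
    return HostGroup(name, clusters.pop(), members)
```

A group defined this way does not care whether the consumer is affinity (SLA) or fencing, which is the kind of generic design suggested above.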