Re: [ovirt-devel] [ovirt-users] oVirt HA.

On 29/04/15 21:53, Dan Yasny wrote:
There is always room for improvement, but think about it: ever since SolidICE, there has been a demand to minimize the amount of hardware used in a minimalistic setup, thus the hosted engine project. And now that we have it, all of a sudden, we need to provide a way to make multiple engines work in active/passive mode? If that capability is provided, I'm sure a new demand will arise, asking for active/active engines, infinitely scalable, and so on.
of course you want active/active clusters for an enterprise product, sooner rather than later
The question really is, where the line is drawn. The engine downtime can be a few minutes, it's not that critical in setups of hundreds of hosts. oVirt's raison d'etre is to make VMs run, everything else is just plumbing around that.
I disagree:
ovirt is a provider of critical infrastructure (vms and their management) for modern it business.
imagine a large organisation just using ovirt for their virtualization, with lots of different departments which at will can spawn their own vms, maybe even from different countries with different time zones (just like red hat ;) ).
of course, if just the engine service is down for some reason and you can just restart it with an outage of some seconds, or maybe a minute - fine.
but everything above a minute could become critical for large orgs relying on the ability to spawn vms at any given time.
or imagine critical HA vms running on ovirt: you can't migrate them, when the engine is not running. you might not even want a downtime of a single second for them, that's why you implemented things like live migration in the first place.
the bottom line is: if you manage critical infrastructure, the tools to manage this infrastructure have to be as reliable as the infrastructure itself.
--
Regards
Sven Kieske
System Administrator
Mittwald CM Service GmbH & Co. KG
Königsberger Straße 6
32339 Espelkamp
T: +49-5772-293-100
F: +49-5772-293-333
https://www.mittwald.de
Managing Director: Robert Meyer
Tax No.: 331/5721/1033, VAT ID No.: DE814773217, HRA 6640, AG Bad Oeynhausen
General Partner: Robert Meyer Verwaltungs GmbH, HRB 13260, AG Bad Oeynhausen

On Thu, Apr 30, 2015 at 8:12 AM, Sven Kieske <s.kieske@mittwald.de> wrote:
but everything above a minute could become critical for large orgs relying on the ability to spawn vms at any given time.
or imagine critical HA vms running on ovirt: you can't migrate them, when the engine is not running. you might not even want a downtime of a single second for them, that's why you implemented things like live migration in the first place.
the bottom line is: if you manage critical infrastructure, the tools to manage this infrastructure have to be as reliable as the infrastructure itself.
If there is any interest, I can revamp a testbed similar to the one I built about a year ago with CentOS 6.5 and oVirt 3.3.3.
For a summary of my configuration, see here:
http://lists.ovirt.org/pipermail/users/2014-March/022176.html
At that time I configured the cluster with Pacemaker/cman, and the resource split on the two-node cluster was something like this:

    Last updated: Wed Mar 5 18:07:51 2014
    Last change: Wed Mar 5 18:07:51 2014 via crm_resource on ovirteng01.localdomain.local
    Stack: cman
    Current DC: ovirteng01.localdomain.local - partition with quorum
    Version: 1.1.10-14.el6_5.2-368c726
    2 Nodes configured
    14 Resources configured

    Online: [ ovirteng01.localdomain.local ovirteng02.localdomain.local ]

     Master/Slave Set: ms_OvirtData [OvirtData]
         Masters: [ ovirteng01.localdomain.local ]
         Slaves: [ ovirteng02.localdomain.local ]
     Resource Group: ovirt
         ip_OvirtData (ocf::heartbeat:IPaddr2): Started ovirteng01.localdomain.local
         lvm_ovirt (ocf::heartbeat:LVM): Started ovirteng01.localdomain.local
         fs_OvirtData (ocf::heartbeat:Filesystem): Started ovirteng01.localdomain.local
         pgsql_OvirtData (lsb:postgresql): Started ovirteng01.localdomain.local
         ovirt-engine (lsb:ovirt-engine): Started ovirteng01.localdomain.local
         ovirt-websocket-proxy (lsb:ovirt-websocket-proxy): Started ovirteng01.localdomain.local
         httpd (ocf::heartbeat:apache): Started ovirteng01.localdomain.local
     Clone Set: p_lsb_nfs-clone [p_lsb_nfs]
         Started: [ ovirteng01.localdomain.local ovirteng02.localdomain.local ]
     Clone Set: p_exportfs_root-clone [p_exportfs_root]
         Started: [ ovirteng01.localdomain.local ovirteng02.localdomain.local ]

There were some customizations I had to make to the ovirt-engine service init script and to set up HA for PostgreSQL.
I already have a task to dig into the cluster changes for CentOS 7.1, so I can try to see how it adapts to oVirt 3.5 too.
Gianluca
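As an illustration of how the status listing above could be checked from a script, here is a minimal Python sketch. It assumes the status text has already been captured as a string; the function names `started_resources` and `group_is_colocated` are hypothetical and not part of Pacemaker or oVirt. It only parses lines of the form "name (agent): Started node", as they appear in the "ovirt" resource group above, and checks that they all sit on the same node.

```python
import re

# Sample input: the "ovirt" resource group lines from the status listing above.
STATUS = """\
 Resource Group: ovirt
     ip_OvirtData (ocf::heartbeat:IPaddr2): Started ovirteng01.localdomain.local
     lvm_ovirt (ocf::heartbeat:LVM): Started ovirteng01.localdomain.local
     fs_OvirtData (ocf::heartbeat:Filesystem): Started ovirteng01.localdomain.local
     pgsql_OvirtData (lsb:postgresql): Started ovirteng01.localdomain.local
     ovirt-engine (lsb:ovirt-engine): Started ovirteng01.localdomain.local
     ovirt-websocket-proxy (lsb:ovirt-websocket-proxy): Started ovirteng01.localdomain.local
     httpd (ocf::heartbeat:apache): Started ovirteng01.localdomain.local
"""

# Matches lines of the form "<name> (<agent>): Started <node>".
LINE_RE = re.compile(r"^\s*(\S+)\s+\((\S+)\):\s+Started\s+(\S+)\s*$")


def started_resources(status_text):
    """Return {resource_name: node} for every 'Started' resource line."""
    placement = {}
    for line in status_text.splitlines():
        match = LINE_RE.match(line)
        if match:
            name, _agent, node = match.groups()
            placement[name] = node
    return placement


def group_is_colocated(placement):
    """True if at least one resource was found and all run on one node."""
    return len(set(placement.values())) == 1


if __name__ == "__main__":
    placement = started_resources(STATUS)
    for name, node in placement.items():
        print(f"{name} -> {node}")
    print("colocated:", group_is_colocated(placement))
```

Header lines such as "Resource Group: ovirt" and the Master/Slave and Clone Set lines are deliberately ignored, since they do not follow the "name (agent): Started node" shape.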

----- Original Message -----
From: "Sven Kieske" <s.kieske@mittwald.de>
To: devel@ovirt.org
Cc: users@ovirt.org
Sent: Thursday, April 30, 2015 2:12:55 AM
Subject: Re: [ovirt-users] oVirt HA.
On 29/04/15 21:53, Dan Yasny wrote:
There is always room for improvement, but think about it: ever since SolidICE, there has been a demand to minimize the amount of hardware used in a minimalistic setup, thus the hosted engine project. And now that we have it, all of a sudden, we need to provide a way to make multiple engines work in active/passive mode? If that capability is provided, I'm sure a new demand will arise, asking for active/active engines, infinitely scalable, and so on.
of course you want active/active clusters for an enterprise product, sooner rather than later
No doubt there, however, that's not *just* HA any longer :)
The question really is, where the line is drawn. The engine downtime can be a few minutes, it's not that critical in setups of hundreds of hosts. oVirt's raison d'etre is to make VMs run, everything else is just plumbing around that.
I disagree:
ovirt is a provider of critical infrastructure (vms and their management) for modern it business.
imagine a large organisation just using ovirt for their virtualization, with lots of different departments which at will can spawn their own vms, maybe even from different countries with different time zones (just like red hat ;) ).
of course, if just the engine service is down for some reason and you can just restart it with an outage of some seconds, or maybe a minute - fine.
but everything above a minute could become critical for large orgs relying on the ability to spawn vms at any given time.
I think you're getting away from the point here. If the hosted engine's HA isn't fast enough, you can cluster the engine in other ways that were available long before hosted engine came to be.
or imagine critical HA vms running on ovirt: you can't migrate them, when the engine is not running. you might not even want a downtime of a single second for them, that's why you implemented things like live migration in the first place.
the bottom line is: if you manage critical infrastructure, the tools to manage this infrastructure have to be as reliable as the infrastructure itself.
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users

On 30/04/15 15:14, Dan Yasny wrote:
No doubt there, however, that's not *just* HA any longer :)
sorry for nitpicking, and for quoting wikipedia, but:
"There are three principles of high availability engineering. They are
1. Elimination of single points of failure. This means adding redundancy to the system so that failure of a component does not mean failure of the entire system.[..]"
e.g. active/active clusters ;)
I promise this will be my last mail to this thread :-)
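To put rough numbers on the redundancy principle quoted above, here is a small Python sketch. It rests on assumptions that do not come from this thread: engine instances fail independently, one surviving instance is enough to keep the service up, and the 99% single-instance availability is purely illustrative. It also ignores failover time and shared components such as the database, so treat it as back-of-the-envelope arithmetic, not a model of oVirt.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def combined_availability(single, replicas):
    """Availability of `replicas` independent instances when any one suffices.

    The service is down only if every instance is down at the same time,
    so the combined unavailability is (1 - single) ** replicas.
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return 1.0 - (1.0 - single) ** replicas


def downtime_minutes_per_year(availability):
    """Expected minutes of downtime per (non-leap) year."""
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    # Illustrative figure only: 99% availability for a single engine.
    for replicas in (1, 2, 3):
        a = combined_availability(0.99, replicas)
        print(f"{replicas} instance(s): {a:.6f} availability, "
              f"{downtime_minutes_per_year(a):.1f} min/year down")
```

Under these assumptions each added instance multiplies the expected downtime by the single-instance unavailability, which is the arithmetic behind removing single points of failure.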
participants (3)
- Dan Yasny
- Gianluca Cecchi
- Sven Kieske