Some thoughts on enhancing High Availability in oVirt

Tue Feb 14 16:40:20 UTC 2012

On 02/14/2012 06:31 PM, Adam Litke wrote:
> On Thu, Feb 09, 2012 at 11:45:09AM -0500, Perry Myers wrote:
>> warning: tl;dr
>>
>> Right now, HA in oVirt is limited to VM-level granularity.  Each VM
>> provides a heartbeat through vdsm back to the oVirt Engine.  If that
>> heartbeat is lost, the VM is terminated and (if the user has configured
>> it) the VM is relaunched.  If the host running that VM has lost its
>> heartbeat, the host is fenced (via a remote power operation) and all HA
>> VMs are restarted on an alternate host.
>>
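The heartbeat/fencing behavior Perry describes can be summarized as a small decision function. This is a minimal Python sketch, not vdsm or oVirt Engine code. The Host/VM classes, plan_ha_actions(), the 30-second timeout, and "pick the first healthy host" placement are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    last_heartbeat: float  # timestamp (seconds) of the last heartbeat seen


@dataclass
class VM:
    name: str
    host: str
    last_heartbeat: float
    highly_available: bool = False


def plan_ha_actions(now, hosts, vms, timeout=30.0):
    """Map stale heartbeats to (action, target, destination) tuples."""
    stale = {h.name for h in hosts if now - h.last_heartbeat > timeout}
    healthy = [h.name for h in hosts if h.name not in stale]
    # A host that lost its heartbeat is fenced (remote power operation).
    actions = [("fence", name, None) for name in sorted(stale)]
    for vm in vms:
        if vm.host in stale:
            # The whole host is fenced, so HA VMs move to an alternate host.
            # Picking the first healthy host stands in for real scheduling.
            if vm.highly_available and healthy:
                actions.append(("restart", vm.name, healthy[0]))
        elif now - vm.last_heartbeat > timeout:
            # Only the VM stopped reporting: terminate it, relaunch if HA.
            # destination None means "let the scheduler choose".
            actions.append(("terminate", vm.name, None))
            if vm.highly_available:
                actions.append(("restart", vm.name, None))
    return actions


if __name__ == "__main__":
    hosts = [Host("host-a", 100.0), Host("host-b", 60.0)]
    vms = [
        VM("web", "host-a", 50.0, highly_available=True),
        VM("db", "host-b", 60.0, highly_available=True),
        VM("scratch", "host-b", 60.0),
    ]
    for action in plan_ha_actions(now=100.0, hosts=hosts, vms=vms):
        print(action)
```

Here host-b is fenced and its HA VM "db" is restarted on host-a, while "web" is terminated and relaunched because only its own heartbeat went stale; the non-HA "scratch" VM is left down.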
> Has anyone considered how live snapshots and live block copy will intersect HA
> to provide a better end-user experience?  For example, will we be able to handle
> a storage connection failure without power-cycling VMs by migrating storage to a
> failover storage domain and/or live-migrating the VM to a host with functioning
> storage connections?

I think migrating a VM that is paused (due to EIO) is something KVM is
wary of doing: there might be data in flight (already in the host) en
route to the storage.
I'm not entirely sure how you would migrate the storage when it has failed.
Y.

>
>> Also, the policies for controlling if/when a VM should be restarted are
>> somewhat limited and hardcoded.
>>
>> So there are two things that we can improve here:
>>
>> 1. Provide introspection into VMs so that we can monitor the health of
>>     individual services and not just the VM
>>
>> 2. Provide a more configurable way of expressing policy for when a VM
>>     (and its services) should trigger remediation by the HA subsystem
>>
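To make the two goals concrete, per-service health checks (item 1) feeding a policy expressed as data rather than hardcoded (item 2) might look roughly like this. It is a sketch under assumptions: ServicePolicy, load_policies(), remediation(), and the field names max_consecutive_failures and action are hypothetical, not an existing oVirt configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServicePolicy:
    service: str
    max_consecutive_failures: int  # failed checks tolerated before acting
    action: str                    # e.g. "restart-service" or "restart-vm"


def load_policies(config):
    """Build policies from a plain dict, e.g. one decoded from JSON."""
    return {
        name: ServicePolicy(
            name,
            spec.get("max_consecutive_failures", 3),
            spec.get("action", "restart-service"),
        )
        for name, spec in config.items()
    }


def remediation(policy, recent_checks):
    """Return the configured action if the trailing run of failed checks
    exceeds the tolerated count, else None.

    recent_checks is oldest-first; True means the check passed.
    """
    failures = 0
    for healthy in reversed(recent_checks):
        if healthy:
            break
        failures += 1
    return policy.action if failures > policy.max_consecutive_failures else None


if __name__ == "__main__":
    policies = load_policies({
        "httpd": {"max_consecutive_failures": 2, "action": "restart-service"},
        "postgresql": {"max_consecutive_failures": 0, "action": "restart-vm"},
    })
    print(remediation(policies["httpd"], [True, False, False]))   # None
    print(remediation(policies["httpd"], [False, False, False]))  # restart-service
    print(remediation(policies["postgresql"], [True, False]))     # restart-vm
```

Keeping the thresholds and actions in data means the same engine can treat a critical database (act on the first failure, restart the whole VM) differently from a web server that is allowed a couple of transient failures.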
>> We can tackle these two things in isolation, or we can try to combine
>> and solve them at the same time.
>>
>> Some possible paths (not the only ones) might be:
>>
> I also want to mention Memory Overcommitment Manager (MOM).  It hasn't been
> included in vdsm yet, but the patches will be hitting Gerrit within the next
> couple of days.  MOM will contribute a single-host policy which is useful for
> making decisions about the condition of a host and applying remediation
> policies: ballooning, KSM, cgroups, VM ejection (migrating to another host).
> It is lightweight and will integrate seamlessly with vdsm from an oVirt
> Engine perspective.
>
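A rough illustration of the kind of single-host rule evaluation Adam describes, escalating from KSM to ballooning to VM ejection as free memory drops. This is not MOM's actual policy format or API; HostStats, RULES, plan_host_actions() and the thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HostStats:
    mem_total_mb: int
    mem_free_mb: int


# Hypothetical (free-memory fraction, action) rules, mildest first.
RULES = [
    (0.20, "enable-ksm"),        # merge identical guest pages
    (0.10, "inflate-balloons"),  # reclaim memory from guests
    (0.05, "eject-vm"),          # live-migrate a VM to another host
]


def plan_host_actions(stats, rules=RULES):
    """Return every action whose free-memory threshold has been crossed."""
    free = stats.mem_free_mb / stats.mem_total_mb
    return [action for limit, action in rules if free < limit]


if __name__ == "__main__":
    print(plan_host_actions(HostStats(mem_total_mb=16384, mem_free_mb=1229)))
```

With roughly 7.5% of memory free, the example host crosses the KSM and ballooning thresholds but not the ejection one; migrating a VM away is kept as the last resort because it is the most expensive remediation.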
>> * Leverage Pacemaker Cloud (http://pacemaker-cloud.org/)
>>
>> Pacemaker Cloud works by providing a generic (read: virt mgmt system
>> agnostic) way of managing HA for virtual machines and their services.
>> At a high level, the concept is that you define one or more virtual
>> machines to be in an application group, and pcmk-cloud spawns a process
>> to monitor that application group using either Matahari/QMF or direct
>> SSH access.
>>
>> pcmk-cloud is not meant to be a user-facing component, so integration
>> work would need to be done here to have oVirt consume the pcmk-cloud
>> REST API for specifying what the application groups (sets of VMs) are
>> and exposing that through the oVirt web UI.
>>
>> pcmk-cloud at a high level has the following functions:
>>    + monitoring of services through Matahari/QMF/SSH
>>    + monitoring of VMs through Matahari/QMF/SSH/Deltacloud
>>    + control of services through Matahari/QMF/SSH
>>    + control of VMs through Deltacloud or the native provider (in this
>>      case oVirt Engine REST API)
>>    + policy engine/model (per application group) to make decisions about
>>      when to control services/VMs based on the monitoring input
>>
>> Integration decisions:
>>    + Should pcmk-cloud use the existing transports for monitoring/control
>>      (QMF/SSH), or should we leverage a new transport via
>>      vdsm/ovirt-guest-agent?
>>    + pcmk-cloud could act as the core policy engine to determine VM
>>      placement in the oVirt datacenter/clusters or it could be used
>>      solely for the monitoring/remediation aspect
>>
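One way to keep the first integration decision open is to have the per-group policy engine talk to an abstract transport, so QMF/SSH or a vdsm/ovirt-guest-agent path can be swapped in later. The sketch below assumes a simple escalation policy (restart a failed service a few times, then restart its VM); Transport, ApplicationGroup, LoggingTransport and max_service_restarts are hypothetical names, not the pcmk-cloud REST API.

```python
from typing import Protocol


class Transport(Protocol):
    """Whatever carries control actions: QMF, SSH, or vdsm/ovirt-guest-agent."""

    def restart_service(self, vm: str, service: str) -> None: ...

    def restart_vm(self, vm: str) -> None: ...


class ApplicationGroup:
    """Hypothetical per-group policy: restart a failed service in place up to
    max_service_restarts times, then escalate to restarting its VM."""

    def __init__(self, name, vms, transport: Transport, max_service_restarts=3):
        self.name = name
        self.vms = set(vms)
        self.transport = transport
        self.max_service_restarts = max_service_restarts
        self.restarts = {}  # (vm, service) -> restarts since last escalation

    def on_service_failed(self, vm, service):
        if vm not in self.vms:
            raise ValueError(f"{vm} is not in group {self.name}")
        key = (vm, service)
        count = self.restarts.get(key, 0)
        if count < self.max_service_restarts:
            self.restarts[key] = count + 1
            self.transport.restart_service(vm, service)
        else:
            # Escalate, then reset the counter for this service.
            self.restarts[key] = 0
            self.transport.restart_vm(vm)


class LoggingTransport:
    """Stand-in transport that records calls instead of contacting guests."""

    def __init__(self):
        self.calls = []

    def restart_service(self, vm, service):
        self.calls.append(("restart_service", vm, service))

    def restart_vm(self, vm):
        self.calls.append(("restart_vm", vm))


if __name__ == "__main__":
    t = LoggingTransport()
    group = ApplicationGroup("webapp", ["web1", "db1"], t, max_service_restarts=2)
    for _ in range(3):
        group.on_service_failed("db1", "postgresql")
    print(t.calls)
```

Because ApplicationGroup only depends on the two Transport methods, the same policy code would work whether pcmk-cloud keeps its own QMF/SSH path or is fed through oVirt Engine.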
>>
>> * Leverage guest monitoring agents w/ ovirt-guest-agent
>>
>> This would mean taking the Services Agent from Matahari (which is just a C
>> library) and using it from ovirt-guest-agent (oga).  oga would set up
>> recurring monitoring of services using this library and use its
>> existing communication path (vdsm -> oVirt Engine) to report service
>> events.  In turn, oVirt Engine would need to interpret these events and
>> then issue service control actions back to oga.
>>
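The oga side of this approach could be a recurring check loop that only sends events upstream when a service changes state, which keeps the vdsm -> oVirt Engine channel quiet. This is a Python sketch only: the real agent would call the Matahari Services Agent C library, whose API is not shown here, and monitor_services(), the check/report callables and the event dict shape are hypothetical.

```python
import time


def monitor_services(services, check, report, interval=10.0, rounds=None,
                     sleep=time.sleep):
    """Poll each service every `interval` seconds and report the first
    observation plus every later state change (edge-triggered).

    check(name) -> bool stands in for a Services Agent status query;
    report(event) stands in for sending an event up through vdsm.
    rounds=None polls forever; a number limits it (handy for testing).
    """
    last = {}
    done = 0
    while rounds is None or done < rounds:
        for svc in services:
            up = check(svc)
            if last.get(svc) != up:
                report({"service": svc, "state": "up" if up else "down"})
                last[svc] = up
        done += 1
        if rounds is None or done < rounds:
            sleep(interval)


if __name__ == "__main__":
    # Scripted check results: httpd fails on the third poll.
    scripted = {"httpd": iter([True, True, False]), "sshd": iter([True] * 3)}
    events = []
    monitor_services(["httpd", "sshd"], lambda s: next(scripted[s]),
                     events.append, rounds=3, sleep=lambda _: None)
    print(events)
```

Reporting only transitions means the engine sees three events here (both services up, then httpd down) rather than six identical status messages; the engine would then decide which control action to send back to oga.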
>> Conceptually, this is very similar to using pcmk-cloud in the case where
>> pcmk-cloud consumes information obtained from oga/vdsm via oVirt
>> Engine instead of communicating directly with guests via QMF/SSH.  In
>> fact, taking this route would probably end up duplicating some effort,
>> because you'd effectively need the pcmk-cloud concept of the Cloud
>> Application Policy Engine (formerly called DPE/Deployable Policy Engine)
>> built directly into oVirt Engine anyhow.
>>
>> So part of looking at this is determining how much reuse/integration of
>> existing components makes sense vs. just re-implementing similar concepts.
>>
>> I've cc'd folks from the HA community/pcmk-cloud and hopefully we can
>> have a bit of a discussion to determine the best path forward here.
>>
>> Perry
>> _______________________________________________
>> Arch mailing list
>> Arch at ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/arch
>>