Some thoughts on enhancing High Availability in oVirt

Perry Myers pmyers at redhat.com
Wed Feb 15 01:41:29 UTC 2012


> As long as you expect the VM to enforce reliability on the raw
> storage devices then you are going to have problems with restarting
> HA VMs. If you switch your thinking to making the storage operations
> HA, then all you need is a response cache.
> 
> A restarted VM replays the operation, and the cached response is
> retransmitted (or the operation is benignly re-applied). Without
> defining the operations so that they can be benignly re-applied or
> adding a response cache you will always be able to come up with some
> order of failure that won't work. There is no cost-effective way to
> guarantee that you snapshot the VM only when there is no in-flight
> storage activity.
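The response cache described in the quoted text can be sketched roughly as follows. This is a hypothetical illustration in Python, not oVirt or VDSM code. `ReplayCache`, `execute`, and the `"op-42"` identifier are names invented here. The sketch assumes that each storage operation carries a unique ID, and that a restarted VM reuses that ID when it replays the operation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Hashable


@dataclass
class ReplayCache:
    """Hypothetical response cache keyed by an operation ID (assumed to be
    chosen by the client and reused when the operation is replayed)."""
    responses: Dict[Hashable, object] = field(default_factory=dict)

    def execute(self, op_id: Hashable, operation: Callable[[], object]) -> object:
        # Already applied: retransmit the cached response instead of
        # applying the operation a second time.
        if op_id in self.responses:
            return self.responses[op_id]
        result = operation()
        # Remember the response so that a later replay is harmless.
        self.responses[op_id] = result
        return result


# Usage: a non-idempotent "append" that must not be applied twice.
log = []
cache = ReplayCache()

def append_record():
    log.append("record")
    return len(log)

first = cache.execute("op-42", append_record)   # applied once, returns 1
replay = cache.execute("op-42", append_record)  # replay gets the cached 1
```

This in-memory version only illustrates the lookup. Surviving a real crash would also require the cache to be durable and to be updated atomically with the operation itself, which this sketch does not attempt.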

How is this any different from a bare-metal host crashing while writes
are in flight to either a local disk or an FC disk?  When something crashes
(be it physical or virtual), you're always going to lose some data that
was in flight but not yet committed to disk (the network has the same
issue).  It's up to individual applications to be resilient to this.

I think this issue is somewhat orthogonal to simply providing reduced
MTTR by restarting failed services or VMs.


