Some thoughts on enhancing High Availability in oVirt

Tue Feb 21 19:18:42 UTC 2012

>> - How does CAPE make the decision that it is 'safe' to restart the
>> resource?
> 
> it terminates it via deltacloud.  This may not be sufficient, as in your
> use case.  We could add additional support here to do extra fencing
> operations to match the underlying IAAS platform.

I don't see why this would be necessary.

1. CAPE makes a call to deltacloud (stop VM A)
2. deltacloud in turn uses the oVirt REST API to issue a stop VM A command
3. oVirt Engine tries to stop the VM via vdsm (fencing the VM itself)
4. if the VM cannot be terminated because the host is inaccessible,
   oVirt Engine at that point falls back to host fencing
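
As a rough illustration of steps 3 and 4, here is a minimal Python sketch
of the escalation. The names (stop_vm, vm_stop, fence_host,
HostUnreachable, StopResult) are made up for this example and are not
oVirt Engine or vdsm APIs; the two callables stand in for whatever the
engine really does.

```python
from enum import Enum


class StopResult(Enum):
    TERMINATED = "terminated"  # the VM is known to be down
    FAILED = "failed"          # termination could not be confirmed


class HostUnreachable(Exception):
    """Hypothetical error: the host running the VM cannot be contacted."""


def stop_vm(vm_stop, fence_host):
    """Stop a VM, escalating to host fencing if the host is unreachable.

    vm_stop    -- hypothetical callable; asks the host to stop the VM and
                  raises HostUnreachable if the host cannot be contacted
    fence_host -- hypothetical callable; returns True only if the host
                  was confirmed powered down
    """
    try:
        vm_stop()                      # step 3: fence the VM itself
        return StopResult.TERMINATED
    except HostUnreachable:
        pass                           # step 4: fall through to host fencing
    except Exception:
        return StopResult.FAILED       # unexpected error, nothing confirmed

    if fence_host():
        return StopResult.TERMINATED
    return StopResult.FAILED
```

Note that there is no code path that returns TERMINATED without either
the VM stop or the host fence having succeeded.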

So the IAAS platform is responsible for ensuring that a 'stop VM'
command via deltacloud results in either:

1. success with assurance that the VM has been terminated by some means
   (be it VM fencing or host fencing)
2. failure, which could mean that an unexpected error occurred or that
   host fencing ultimately failed to power down the host

What should never happen in oVirt Engine is:

3. success, but oVirt Engine is not sure if the VM is terminated or not

As that could lead to some nice data corruption :)
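
From the caller's side, that contract is what makes the restart decision
simple. A minimal sketch, assuming hypothetical request_stop and
request_start callables that wrap the deltacloud calls (neither is a
real deltacloud or pcmk-cloud function):

```python
def recover_vm(request_stop, request_start):
    """Restart a VM only after the platform confirms it is terminated.

    request_stop  -- hypothetical callable; returns True only for
                     outcome 1 (the VM is guaranteed to be terminated)
    request_start -- hypothetical callable; starts the VM again
    """
    if not request_stop():
        # Outcome 2: the VM may still be running and writing to its
        # disks, so starting a second copy is not safe.
        return "not restarted"
    request_start()
    return "restarted"
```

If outcome 3 were possible, request_stop() could return True while the
VM is still alive, and this logic would start a second instance against
the same storage.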

In this sense, pcmk-cloud using deltacloud to talk to oVirt Engine for
VM lifecycle control is no different from pcmk-cloud using deltacloud
to talk to EC2 or any other cloud provider.