I finally got around to running some tests on our environment. Are you seeing the same behavior, where the VM ends up in a paused state and can't be migrated when one host drops?

In your case, you should be able to get full HA; quorum is just a protection against split brain. If you set the migration policy to migrate all VMs, then in theory the VMs from the crashed node should, I assume, move to the other node once it sees that the node is offline.

I was wondering whether this happens because the VM reads directly from the gluster storage server and there doesn't seem to be any failover. Would an NFS solution with keepalived across the two servers fix this, since the connection would be tied to an IP address rather than to a single gluster node? I'm not very familiar with how libgfapi works.

Could anyone else chime in?


I finally settled on the following configuration:

- Migration policy is "don't migrate"
- cluster.server-quorum-type is none
- cluster.quorum-type is none
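
For anyone wanting to reproduce the two gluster settings above, here is a minimal sketch that only builds the corresponding `gluster volume set` command lines and prints them. The volume name `vmstore` and the helper function are hypothetical, just for illustration; the migration policy is an oVirt cluster setting and is not covered here.

```python
# Hypothetical volume name; replace it with the name of your own volume.
VOLUME = "vmstore"

# The two quorum options from the list above, both set to "none".
QUORUM_OPTIONS = {
    "cluster.server-quorum-type": "none",
    "cluster.quorum-type": "none",
}


def build_volume_set_commands(volume, options):
    """Return one `gluster volume set` argument list per option.

    Nothing is executed here; the commands are only assembled so they
    can be reviewed before being run on a gluster node.
    """
    return [
        ["gluster", "volume", "set", volume, key, value]
        for key, value in options.items()
    ]


if __name__ == "__main__":
    for cmd in build_volume_set_commands(VOLUME, QUORUM_OPTIONS):
        print(" ".join(cmd))
```

Keep in mind that with both quorum types set to none, the split-brain protection mentioned earlier in the thread is switched off.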

When a host is down, a manual migration lets me use the other one.
Later, I'll add another host to get real HA.
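
To illustrate why an extra host helps, here is a rough sketch of a strict-majority quorum rule. This is a simplification I'm assuming for the sake of the example, not gluster's exact logic (its quorum options have more detail), and the function names are made up.

```python
def has_majority(total_nodes, nodes_up):
    """True if strictly more than half of the nodes are still up."""
    return 2 * nodes_up > total_nodes


def tolerated_failures(total_nodes):
    """How many nodes can fail while a strict majority remains."""
    return (total_nodes - 1) // 2


if __name__ == "__main__":
    for total in (2, 3):
        print(
            f"{total} nodes: survives one failure = "
            f"{has_majority(total, total - 1)}, "
            f"tolerated failures = {tolerated_failures(total)}"
        )
```

Under this rule, two nodes cannot lose either one and keep a majority, while three nodes can lose one.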

