Hi Chris,
I have an oVirt + Dell Compellent setup similar to yours (a previous
model, not the SC8000), and I have sometimes faced issues like yours.
From my experience I can advise you to:
A) Check the links between the SAN and the servers: all paths, all
configuration, cabling. Everything should be set up correctly (all
redundant paths green, server mappings etc.) BEFORE installing oVirt. We
had a running KVM environment before "upgrading" it to oVirt 3.5.1.
B) Also check that fencing works both manually and automatically
(connections to iDRAC etc.). This is a kind of prerequisite for HA to
work.
C) I also noticed that when something goes wrong on one of the shared
storages, it brings down the whole cluster (the VMs keep running, but it
causes a lot of headaches). First of all, note that oVirt tries to
stabilize the situation by itself for ~15 minutes or more. It is slow at
re-fencing etc. Sometimes it enters a loop and you have to locate the
problematic storage. You want to check that multipath is working
correctly on every server.
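
For the multipath check in C), here is a rough sketch of how one could
scan the output of "multipath -ll" for paths that are not healthy. It is
only an illustration: the function name unhealthy_paths and the sample
text (including the WWID and device names) are made up, and it assumes
the usual path-line layout "H:C:T:L dev major:minor dm_state checker
online", which may differ between multipath-tools versions.

```python
import re

# Assumed layout of a path line in `multipath -ll` output, e.g.
#   |- 3:0:0:1 sdb 8:16 active ready running
PATH_LINE = re.compile(
    r"(?P<hctl>\d+:\d+:\d+:\d+)\s+(?P<dev>\S+)\s+\d+:\d+\s+"
    r"(?P<dm>\S+)\s+(?P<checker>\S+)\s+(?P<online>\S+)"
)


def unhealthy_paths(multipath_ll_output):
    """Return (device, state) for each path not 'active ready running'."""
    bad = []
    for line in multipath_ll_output.splitlines():
        m = PATH_LINE.search(line)
        if m is None:
            continue  # map header, size or policy line, not a path
        state = (m["dm"], m["checker"], m["online"])
        if state != ("active", "ready", "running"):
            bad.append((m["dev"], " ".join(state)))
    return bad


# Made-up sample for illustration only; not taken from a real host.
SAMPLE = """\
mpatha (36000d3100000650000000000000000e1) dm-2 COMPELNT,Compellent Vol
size=500G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  |- 3:0:0:1 sdb 8:16 active ready running
  `- 4:0:0:1 sdc 8:32 failed faulty running
"""

if __name__ == "__main__":
    for dev, state in unhealthy_paths(SAMPLE):
        print(dev, state)
```

On a real host you would feed it the text of "multipath -ll" (run as
root) instead of SAMPLE, and repeat that on every server in the cluster.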
If you are having problems with just two nodes, I guess something is not
quite right at the configuration level. I have 2 clusters, 12 hosts and
lots of shared storage working, and usually when something goes wrong it
is because of a human error (like when I deleted the LUN on the SAN
before destroying the storage in the oVirt interface).
On the other hand, I have the overall impression that the system is not
forgiving at all and that it is far from being rock solid.
Cheers
AG