Great, thanks for the clarification.

---- On Thu, 25 Oct 2018 13:07:58 +0100 Simone Tiraboschi <stirabos(a)redhat.com> wrote ----

On Thu, Oct 25, 2018 at 1:31 PM Alan G <alan+ovirt(a)griff.me.uk> wrote:

Hi, I have a 4.1 cluster with FC block storage and
hosted engine. Last night a host went unreachable due to a driver/firmware issue with the
NIC card. The Engine spotted this, the host was fenced and everything behaved as expected.
However, it got me thinking - if the affected host had been the one running the Engine,
what would have happened? I'm assuming the Engine would have failed the liveness check on
the other hosted engine hosts and they would have attempted to start the Engine. But as the
"failed" host still had access to the storage (I believe the HBA was still
working), they would not have been able to get a lock on the storage. In that case I'm
in a catch-22: the Engine cannot fence the failed host because its network is isolated,
but the Engine cannot be restarted elsewhere until the failed host is fenced. At this
point it requires human intervention to fence the failed host. Is my understanding correct
on this? If so, is there any way to mitigate this risk?

ovirt-ha-agent implements a specific test for this kind of failure: it continuously tries
to ping a specific IPv4 address (usually the network gateway) to check network
connectivity on each involved host.
On failed pings each host penalises itself by a certain number of points. The HA score of
each host is written into the hosted-engine metadata volume on the shared storage, so each
host can also see the scores of the other hosts. In your case this would work, since all
the hosts can still access the storage via FC. Once the difference between the score of
the host running the engine VM and that of the best candidate host is large enough, the
engine VM is migrated to the best host (or, if migration is not possible, as in your case,
shut down and restarted there).

If you want, you can easily try to reproduce this scenario.

Thanks,
Alan
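
For readers following the thread, the score comparison described in the reply can be sketched roughly as below. This is only an illustration of the idea, not ovirt-ha-agent code: the names `HostState`, `ha_score` and `decide`, and the numbers `BASE_SCORE`, `GATEWAY_PENALTY` and `MIGRATION_THRESHOLD`, are all hypothetical, and the dictionary of scores merely stands in for the metadata volume on shared storage.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical numbers for illustration only; they are NOT taken from
# ovirt-ha-agent.
BASE_SCORE = 100
GATEWAY_PENALTY = 50
MIGRATION_THRESHOLD = 30


@dataclass
class HostState:
    name: str
    gateway_reachable: bool       # result of the host's own ping test
    runs_engine_vm: bool = False


def ha_score(host: HostState) -> int:
    """Each host computes its own score and penalises itself on failed pings."""
    score = BASE_SCORE
    if not host.gateway_reachable:
        score -= GATEWAY_PENALTY
    return max(score, 0)


def decide(hosts: List[HostState]) -> str:
    """Compare the engine host's score with the best other candidate."""
    # Stand-in for the scores every host publishes on the shared storage.
    scores = {h.name: ha_score(h) for h in hosts}
    current = next(h for h in hosts if h.runs_engine_vm)
    candidates = [h for h in hosts if not h.runs_engine_vm]
    if not candidates:
        return f"keep engine VM on {current.name}"
    best = max(candidates, key=lambda h: scores[h.name])
    # Assumption: "large enough" is modelled as a simple fixed threshold.
    if scores[best.name] - scores[current.name] >= MIGRATION_THRESHOLD:
        return f"move engine VM from {current.name} to {best.name}"
    return f"keep engine VM on {current.name}"


if __name__ == "__main__":
    cluster = [
        HostState("host1", gateway_reachable=False, runs_engine_vm=True),
        HostState("host2", gateway_reachable=True),
        HostState("host3", gateway_reachable=True),
    ]
    print(decide(cluster))
```

In this sketch, the host with the broken NIC lowers its own score without needing the network, and the other hosts only need the shared storage to notice the gap, which is why the scenario in the question need not end in a deadlock.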