On Thu, Oct 25, 2018 at 1:31 PM Alan G <alan+ovirt(a)griff.me.uk> wrote:
Hi,
I have a 4.1 cluster with FC block storage and hosted engine.
Last night a host went unreachable due to a driver/firmware issue with the
NIC card. The Engine spotted this, the host was fenced and everything
behaved as expected.
However, it got me thinking - if the affected host had been the one
running the Engine, what would have happened? I'm assuming the Engine would
have failed the liveness check on the other hosted engine hosts and they would
attempt to start the Engine. But as the "failed" host still had access to
the storage (I believe the HBA was still working), they would not be
able to get a lock on the storage. In which case I'm in a catch-22: the
Engine cannot fence the failed host because its network is isolated, but
the Engine cannot be restarted elsewhere until the failed host is fenced.
At this point it requires human intervention to fence the failed host.
Is my understanding correct on this? If so, is there any way to mitigate
this risk?
ovirt-ha-agent implements a specific test for this kind of failure: it
continuously pings a specific IPv4 address (usually the network gateway)
to check network connectivity on each involved host.
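As a rough illustration of that kind of connectivity test (a sketch only, not
the actual ovirt-ha-agent code), the check could look like the following. The
function names, the number of attempts and the success threshold are all
assumptions made for this example, and the ping flags are the Linux iputils
ones.

```python
import subprocess
from typing import Callable


def ping_once(address: str, timeout_s: int = 1) -> bool:
    """Send one ICMP echo request with the system ping (Linux iputils flags)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), address],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def gateway_reachable(
    address: str,
    attempts: int = 5,       # hypothetical value, not taken from oVirt
    min_success: int = 1,    # hypothetical value, not taken from oVirt
    probe: Callable[[str], bool] = ping_once,
) -> bool:
    """Return True if at least min_success of the probes get a reply."""
    successes = sum(1 for _ in range(attempts) if probe(address))
    return successes >= min_success
```

The probe is passed in as a parameter so the logic can be exercised without
real network access, for example gateway_reachable("192.0.2.1", probe=lambda a: False).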
On failed pings, each host penalises its own HA score by a certain number of
points. The HA score of each host is written into the hosted-engine metadata
volume on the shared storage, so each host can also see the scores of the
other hosts; in your case this would still work, since all the hosts can
still access the storage via FC.
Once the difference between the score of the host running the engine VM and
that of the best candidate host is large enough, a migration to the best host
(or, if migration is not possible, as in your case, a shutdown and restart
there) will be triggered.
If you want, you can easily try to reproduce this scenario.
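To make the score comparison concrete, here is a minimal sketch of the
decision described above. It is not the agent's real implementation: the
constants BASE_SCORE, GATEWAY_PENALTY and MIGRATION_THRESHOLD are placeholder
numbers chosen for illustration, and HostState, score and pick_target are
hypothetical names. The dictionary stands in for the scores that every host
can read from the shared metadata volume.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Placeholder numbers for illustration only; not confirmed oVirt values.
BASE_SCORE = 100
GATEWAY_PENALTY = 50
MIGRATION_THRESHOLD = 25


@dataclass
class HostState:
    gateway_ok: bool           # result of the host's own ping test
    runs_engine: bool = False  # True on the host running the engine VM


def score(host: HostState) -> int:
    """A host lowers its own score when it cannot reach the gateway."""
    return BASE_SCORE - (0 if host.gateway_ok else GATEWAY_PENALTY)


def pick_target(hosts: Dict[str, HostState]) -> Optional[str]:
    """Return the host the engine VM should move to, or None to stay put."""
    current = next((name for name, h in hosts.items() if h.runs_engine), None)
    if current is None:
        return None
    candidates = {n: score(h) for n, h in hosts.items() if n != current}
    if not candidates:
        return None
    best = max(candidates, key=candidates.get)
    # Move only when the best candidate is ahead by a large enough margin.
    if candidates[best] - score(hosts[current]) >= MIGRATION_THRESHOLD:
        return best
    return None


hosts = {
    "host1": HostState(gateway_ok=False, runs_engine=True),  # isolated NIC
    "host2": HostState(gateway_ok=True),
}
print(pick_target(hosts))  # host2
```

With the network-isolated host still running the engine VM, its lowered score
makes host2 the preferred place to restart it; if every host can reach the
gateway, pick_target returns None and nothing moves.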
Thanks,
Alan