Hi,
I've tested "Bug 1090511 [RFE] Improve fencing robustness by retrying failed
attempts".
Spoiler alert: the tested feature works, but fencing itself did not succeed because of bug
https://bugzilla.redhat.com/1124141
---
How to set up the test environment:
- 3 hosts are required, at least two of them with power management (PM) enabled.
- The 2 hosts with PM enabled (A, B) should be in one cluster and the remaining one (C) in
another cluster. The reason is that the fencing proxy is searched for first in the same
cluster, and hosts outside that cluster are considered only if no host is available there;
this separation ensures that the intended (non-working) fencing proxy is selected first.
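To make the search order concrete, here is a small illustrative bash sketch; it is not the engine's actual (Java) implementation. The function name order_proxies and the "host:cluster" input format are hypothetical, and host status, PM settings and other eligibility checks are deliberately ignored.

```shell
#!/bin/bash
# Illustrative only: prints candidate fencing proxies in the order described
# above, hosts from the target's cluster first, then hosts from other clusters.
# Arguments: target host, target cluster, then "host:cluster" entries.
order_proxies() {
  local target=$1 cluster=$2
  shift 2
  local entry host
  local same=() other=()
  for entry in "$@"; do
    host=${entry%%:*}
    # The host being fenced is never used as its own proxy.
    [ "$host" = "$target" ] && continue
    if [ "${entry#*:}" = "$cluster" ]; then
      same+=("$host")
    else
      other+=("$host")
    fi
  done
  printf '%s\n' "${same[@]}" "${other[@]}"
}
```

With the setup above, `order_proxies A cluster1 A:cluster1 C:cluster2 B:cluster1` lists B before C, which is why B (the deliberately broken proxy) is tried first.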
Notation:
host A ~ defective host to be fenced.
host B ~ first selected fencing proxy, which will fail to fence host A.
host C ~ second selected fencing proxy, which should succeed in fencing host A.
A and B are in the same cluster.
Process:
1. On host B, alter iptables so that it cannot contact host A and therefore cannot fence it.
SSH is blocked to prevent soft fencing and IPMI is blocked to prevent 'hard' fencing:
iptables -A OUTPUT -p udp -d 10.34.63.198 --dport 623 -j DROP
iptables -A OUTPUT -p tcp -d 10.34.63.178 --dport 22 -j DROP
2. On host A, remove the rule that allows connections to vdsm and restart vdsm, so that all
existing connections to it have to be reopened. This makes the engine think that the host
is down/overloaded.
Rule to drop:
ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:54321
followed by
systemctl restart vdsmd
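Both steps can be wrapped in a small helper script so they are repeatable and easy to undo. This is a minimal sketch, assuming bash and iptables with -C (check) support; the script layout and the block-b/unblock-b/break-a/restore-a subcommands are hypothetical. It also assumes that 10.34.63.198 is host A's IPMI (BMC) address, that 10.34.63.178 is the address host A's sshd listens on, and that the vdsm rule lives in the INPUT chain with exactly the spec shown.

```shell
#!/bin/bash
# Hypothetical helper for steps 1 and 2; run as root.
#   on host B: ./fence-test.sh block-b   (undo: unblock-b)
#   on host A: ./fence-test.sh break-a   (undo: restore-a)
set -euo pipefail

# Assumed: .198 = host A's IPMI/BMC address, .178 = host A's SSH address.
IPMI_RULE=(-p udp -d 10.34.63.198 --dport 623 -j DROP)
SSH_RULE=(-p tcp -d 10.34.63.178 --dport 22 -j DROP)
# Assumed: the INPUT rule that `iptables -L -n` shows as
# "ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:54321".
VDSM_RULE=(-p tcp --dport 54321 -j ACCEPT)

# Append a rule to a chain unless an identical rule is already present.
ensure_rule() {
  local chain=$1
  shift
  iptables -C "$chain" "$@" 2>/dev/null || iptables -A "$chain" "$@"
}

# Delete every copy of a rule from a chain; -C fails once none is left.
remove_rule() {
  local chain=$1
  shift
  while iptables -C "$chain" "$@" 2>/dev/null; do
    iptables -D "$chain" "$@"
  done
}

case "${1:-}" in
  block-b)
    ensure_rule OUTPUT "${IPMI_RULE[@]}"
    ensure_rule OUTPUT "${SSH_RULE[@]}"
    ;;
  unblock-b)
    remove_rule OUTPUT "${IPMI_RULE[@]}"
    remove_rule OUTPUT "${SSH_RULE[@]}"
    ;;
  break-a)
    remove_rule INPUT "${VDSM_RULE[@]}"
    # Restarting closes vdsm's existing connections, so they must reconnect.
    systemctl restart vdsmd
    ;;
  restore-a)
    # Insert at the top so the rule precedes any final REJECT rule.
    iptables -C INPUT "${VDSM_RULE[@]}" 2>/dev/null || iptables -I INPUT "${VDSM_RULE[@]}"
    ;;
esac
```

The -C checks make each subcommand safe to run twice; note that these rules are runtime-only and are not persisted across an iptables service restart.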
Result: After vdsmd was restarted, the engine recognised host A as unresponsive and tried to
fence it. The first attempt to fence host A was performed by host B and failed as expected;
the second attempt was performed by host C and, from the code's perspective, succeeded. Error
message [1] was displayed correctly. However, fencing was not successful due to bug
https://bugzilla.redhat.com/1124141, which causes java.lang.StackOverflowError. The code for
bug 1090511 should be OK, but it will only work once the mentioned bug is fixed.
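The retry behaviour observed above can be pictured with the following bash sketch. It is only an illustration under stated assumptions, not the engine's code: fence_with_retry is a hypothetical name, try_fence is a hypothetical stub standing in for "ask this proxy to fence the host" (here it reproduces this test, where B fails and C succeeds), and the log text mirrors message [1] with the host name in place of <ID>.

```shell
#!/bin/bash
# Illustrative sketch: try each proxy in order until one fences the target.
# Arguments: target host, then proxies in selection order.
fence_with_retry() {
  local target=$1
  shift
  local proxy
  for proxy in "$@"; do
    if try_fence "$proxy" "$target"; then
      echo "Fenced $target via $proxy"
      return 0
    fi
    echo "Fencing operation failed with proxy host $proxy, trying another proxy..."
  done
  echo "Fencing $target failed: no proxy left" >&2
  return 1
}

# Hypothetical stub matching this test: B cannot reach A, any other proxy can.
try_fence() { [ "$1" != "B" ]; }
```

Calling `fence_with_retry A B C` prints the failure message for B and then reports success via C, matching the sequence seen in the test.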
M.
[1] Fencing operation failed with proxy host <ID>, trying another proxy...