Hi,
I'd say your scenario is exactly what the storage leases are for.
I can tell you that it used to work in 4.3, but I haven't tested or needed
the feature in quite a while :)
Maybe open a Bugzilla bug and attach the relevant logs to get the
developers' attention.
Greetings
Klaas
On 8/5/21 4:42 PM, Gianluca Cecchi wrote:
Hello,
suppose I have the latest 4.4.7 environment, installed with an external
engine and two hosts, each one at a different site.
For storage I have one FC storage domain.
I'm trying to simulate a sort of "site failure scenario" to see what kind
of HA I should expect.
The 2 hosts have power mgmt configured through fence_ipmilan.
I have 2 VMs: one configured as HA with a lease on storage (Resume
Behavior: kill) and one not marked as HA.
Initially host1 is SPM and it is the host that runs the two VMs.
Fencing of host1 from host2 initially works fine. I can also test it
from the command line:
# fence_ipmilan -a 10.10.193.152 -P -l my_fence_user -A password -L operator -S /usr/local/bin/pwd.sh -o status
Status: ON
On host2 I then block access to host1's iDRAC:
firewall-cmd --direct --add-rule ipv4 filter OUTPUT 0 -d 10.10.193.152 -p udp --dport 623 -j DROP
firewall-cmd --direct --add-rule ipv4 filter OUTPUT 1 -j ACCEPT
so that:
# fence_ipmilan -a 10.10.193.152 -P -l my_fence_user -A password -L operator -S /usr/local/bin/pwd.sh -o status
2021-08-05 15:06:07,254 ERROR: Failed: Unable to obtain correct plug status or plug is not available
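To undo the simulated iDRAC outage afterwards, the same direct rules can be removed by passing identical arguments to --remove-rule. Below is a minimal sketch. Note that ipmi_block_cmds is a hypothetical helper and is not part of oVirt or firewalld. It only prints the add or remove commands for a given BMC address, so you can review them before running them by hand on host2. The rules above were added without --permanent, so they are runtime-only. As far as I know, a "firewall-cmd --reload" would also discard them.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of oVirt/firewalld): print the firewall-cmd
# direct rules used to simulate an unreachable iDRAC, either to add them or
# to remove them again. It only prints; nothing is executed.
ipmi_block_cmds() {
  local bmc_ip="$1"   # BMC/iDRAC address to cut off, e.g. 10.10.193.152
  local action="$2"   # "add" or "remove"
  case "$action" in
    add|remove) ;;
    *) echo "usage: ipmi_block_cmds <bmc_ip> add|remove" >&2; return 1 ;;
  esac
  # Drop IPMI-over-LAN traffic (UDP 623) towards that BMC only.
  echo "firewall-cmd --direct --${action}-rule ipv4 filter OUTPUT 0 -d ${bmc_ip} -p udp --dport 623 -j DROP"
  # Accept all other outgoing traffic (same rule as in the original commands).
  echo "firewall-cmd --direct --${action}-rule ipv4 filter OUTPUT 1 -j ACCEPT"
}

# Print the commands that revert the simulated outage for host1's iDRAC.
ipmi_block_cmds 10.10.193.152 remove
```

Because the add and remove commands are generated from the same arguments, the removal matches the rules that were added exactly. firewalld requires this match to delete a direct rule.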
On host1 I then trigger a kernel panic:
# date ; echo 1 > /proc/sys/kernel/sysrq ; echo c > /proc/sysrq-trigger
Thu Aug 5 15:06:24 CEST 2021
Host1 correctly completes its crash dump (kdump integration is
enabled) and reboots, but I stop it at the GRUB prompt, so that from
host2's point of view host1 is unreachable and its power status cannot
be determined.
At this point I expected the VM lease functionality to kick in and
host2 to restart the HA VM, since it can see that the lease is not held
by the other host and can therefore acquire the lock itself...
Instead, it goes into a loop of power fencing attempts: I waited about
25 minutes and saw nothing but continuous retries.
Meanwhile, after 2 minutes host2 correctly becomes SPM and the VMs are
marked as unknown.
At some point, after the failed attempts to power fence host1, I see
the event:
Failed to power fence host host1. Please check the host status and
it's power management settings, and then manually reboot it and click
"Confirm Host Has Been Rebooted"
If I select the host and choose "Confirm Host Has Been Rebooted", the
two VMs are marked as down and the HA one is correctly started on host2.
But this requires manual intervention on my part.
Is the behavior above expected, or should the VM lease have allowed
host2 to bypass the fencing failure and start the HA VM that holds the
lease? Otherwise I don't understand the point of having the lease at
all...
Thanks,
Gianluca
_______________________________________________
Users mailing list -- users(a)ovirt.org
To unsubscribe send an email to users-leave(a)ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/FK...