Konstantin, thank you very much for the explanation. It was very
enlightening.
I believe I left some details out of my previous message.
I'm using Hosted Engine, all VMs have HA enabled and Power Management is
disabled on all hosts. No IPMI configured (at least I didn't configure
anything about iLO/IPMI in oVirt).
There was a loss of communication with the Storage for approximately 3
minutes and this caused all Hosts to reboot.
On Wed, 30 Nov 2022 at 08:50, Volenbovskyi, Konstantin <
Konstantin.Volenbovskyi(a)haufe.com> wrote:
Hi,
I would say that what you observed was ‘fencing’: not SSH soft fencing, but
an actual reboot via IPMI.
https://www.ovirt.org/develop/developer-guide/engine/automatic-fencing.html
You can disable Power Management for the hosts.
Before doing that you need to understand the following:
- What is the impact on VMs when this happens?
- The working assumption is that your VMs work just fine, but you need to
  think about other cases where VMs lose their storage and/or network.
  To me it seems that the affected storage domain was not a VM storage
  domain, so the VMs’ disks were just fine.
  Maybe it was the hosted_storage domain in your case…
- Are any of those VMs High-availability VMs? Once you disable Power
  Management, those VMs will no longer be restarted automatically on
  different hosts.
You need to understand that the idea of fencing is to recover the host
automatically, possibly to restart its VMs, and to make sure that there
are no duplicated VMs.
Take all the cases where fencing is used as 100%. A subset of those, X%,
are cases where you would consider the behavior suboptimal.
The drawback of disabling fencing is that you might get suboptimal
behavior in the other Y% of cases (100% minus X%).
BR,
Konstantin
*From: *Murilo Morais <murilo(a)evocorp.com.br>
*Date: *Wednesday, 30 November 2022 at 12:13
*To: *users <users(a)ovirt.org>
*Subject: *[ovirt-users] Forced restart when losing communication with
the Storages
Good morning everyone!
Is there a way to disable the forced reboot of the machines? This morning
there was an event in our infrastructure where the hosts lost communication
with the Storage, and this caused all the hosts to restart abruptly.
Is this the expected behavior of oVirt? Is there any way to disable
it?