Re: Info about soft fencing mechanism

On Jun 13, 2019 16:14, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
Hello, I would like to know in better detail how soft fencing works in 4.3. In particular, with "soft fencing" we "only" mean vdsmd restart attempt, correct? Who is responsible for issuing the command? Manager or host itself?
The manager should take the decision, but the actual command should be done by another host.
Because in case of Manager, if the host has already lost connection, how could the manager be able to do it?
Soft fencing is used when ssh is available. In all other cases it doesn't work.
Thanks in advance for clarifications and eventually documentation pointers
oVirt DOCs need a lot of updates, but I never found a way to add or edit a page. Best Regards, Strahil Nikolov

On Fri, Jun 14, 2019 at 3:02 PM Strahil <hunter86_bg@yahoo.com> wrote:
On Jun 13, 2019 16:14, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
Hello, I would like to know in better detail how soft fencing works in 4.3. In particular, with "soft fencing" we "only" mean vdsmd restart attempt, correct?
Yes, it just restarts the vdsmd service over an SSH connection. In the past we had several cases where VDSM was non-responsive but the VMs were running fine, which is why we added this as the first step in the non-responding treatment flow. We try to connect to the host using SSH, restart VDSM, and wait to see if the host starts communicating again. If there is an error during the SSH connection or the service restart, we immediately continue to the next phase of the treatment.
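To make the soft fencing step concrete, here is a minimal Python sketch of "connect over SSH, restart vdsmd, report success or failure". It is an illustration only, not the engine's code: the function names, the root user, the timeouts, and the exact restart command (systemctl restart vdsmd) are all assumptions.

```python
import subprocess
from typing import Callable, List


def build_soft_fence_command(host: str, user: str = "root",
                             connect_timeout_s: int = 30) -> List[str]:
    """Build the ssh command line (hypothetical helper, assumed defaults)."""
    return [
        "ssh",
        "-o", "BatchMode=yes",  # never prompt for a password
        "-o", f"ConnectTimeout={connect_timeout_s}",
        f"{user}@{host}",
        "systemctl", "restart", "vdsmd",  # assumed restart command
    ]


def soft_fence(host: str, run: Callable = subprocess.run) -> bool:
    """Return True if the restart command ran and exited with status 0.

    Any SSH or restart error returns False, so the caller can move on
    to the next phase of the treatment immediately.
    """
    try:
        result = run(build_soft_fence_command(host),
                     capture_output=True, timeout=120)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```

A True result only means the restart command succeeded; the caller still has to wait and check whether the host starts communicating again.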
Who is responsible for issuing the command? Manager or host itself?
The manager should take the decision, but the actual command should be done by another host.
The manager. This flow is started from host monitoring if there is a network error or connection timeout ...
Because in case of Manager, if the host has already lost connection, how could the manager be able to do it?
Soft fencing is used when ssh is available. In all other cases it doesn't work.
So if the engine cannot communicate with the host, we don't know the reason, so there are several steps in the non-responding treatment:

1. SSH Soft Fencing
2. Kdump detection (if it's configured for the host and we detect that the host is dumping, we can restart HA VMs on a different host)
3. Power Management restart
   - according to the cluster fencing policy we can skip restarting the host if, for example, the host is renewing its storage lease or the gluster cluster is healing
   - this part is executed on a different host in the same cluster/data center

If you want to know more about fencing in oVirt, please take a look at the links below:

Host fencing in oVirt - Fixing the unknown and allowing VMs to be highly available
https://www.youtube.com/watch?v=V1JQtmdleaM

Integrating kdump into oVirt
https://www.youtube.com/watch?v=RAGV_za_Qvw

Automatic fencing in oVirt
https://www.ovirt.org/develop/developer-guide/engine/automatic-fencing.html

Fence-kdump integration in oVirt
https://www.ovirt.org/develop/release-management/features/infra/fence-kdump....

And of course feel free to ask questions.

Martin
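The order of the three treatment steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the engine's implementation: the Probes fields, the Outcome names, and the NOT_RECOVERED fallback are hypothetical, and each probe is a plain callable standing in for the real check.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Outcome(Enum):
    RECOVERED_BY_SOFT_FENCING = "recovered by SSH soft fencing"
    KDUMP_DETECTED = "host is dumping; HA VMs can be restarted elsewhere"
    RESTART_SKIPPED_BY_POLICY = "restart skipped by cluster fencing policy"
    RESTARTED_BY_POWER_MANAGEMENT = "restarted via power management"
    NOT_RECOVERED = "all steps failed"  # assumed fallback, not from the thread


@dataclass
class Probes:
    """Hypothetical stand-ins for the checks the engine performs."""
    soft_fence: Callable[[], bool]            # step 1: SSH restart of vdsmd
    kdump_configured: bool                    # step 2 only applies if True
    host_is_dumping: Callable[[], bool]       # step 2: kdump detection
    policy_skips_restart: Callable[[], bool]  # e.g. storage lease renewal
    power_restart: Callable[[], bool]         # step 3: run via another host


def treat_non_responding(p: Probes) -> Outcome:
    # Step 1: try the cheapest recovery first.
    if p.soft_fence():
        return Outcome.RECOVERED_BY_SOFT_FENCING
    # Step 2: a dumping host is known to be down, so HA VMs can move.
    if p.kdump_configured and p.host_is_dumping():
        return Outcome.KDUMP_DETECTED
    # Step 3: power management restart, unless the fencing policy forbids it.
    if p.policy_skips_restart():
        return Outcome.RESTART_SKIPPED_BY_POLICY
    if p.power_restart():
        return Outcome.RESTARTED_BY_POWER_MANAGEMENT
    return Outcome.NOT_RECOVERED
```

Each step runs only if the previous one did not resolve the situation, which matches the "immediately continue to the next phase" behaviour described for soft fencing.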
Thanks in advance for clarifications and eventually documentation pointers
oVirt DOCs need a lot of updates, but I never found a way to add or edit a page.
Best Regards, Strahil Nikolov
-- Martin Perina Manager, Software Engineering Red Hat Czech s.r.o.