Re: [Engine-devel] SSH Soft Fencing

30 Jun 2013

      On Thu, Jun 27, 2013 at 08:48:39AM -0400, Eli Mesika wrote:
...
----- Original Message -----
...
From: "Martin Perina" <mperina@redhat.com>
To: engine-devel@ovirt.org
Cc: "Yair Zaslavsky" <yzaslavs@redhat.com>, "Barak Azulay" <bazulay@redhat.com>, "Eli Mesika" <emesika@redhat.com>
Sent: Thursday, June 27, 2013 1:51:06 PM
Subject: SSH Soft Fencing
Hi,
SSH Soft Fencing is a new feature for 3.3. It tries to restart VDSM
over an SSH connection on non-responsive hosts prior to real fencing.
More info can be found at
http://www.ovirt.org/Automatic_Fencing#Automatic_Fencing_in_oVirt_3.3
In the current SSH Soft Fencing implementation, restarting VDSM over SSH
is part of the standard fencing implementation in
VdsNotRespondingTreatmentCommand. But this command is executed only
if a host has a valid PM configuration. If a host doesn't have a valid
PM configuration, the execution of the command is disabled and the host
state is changed to Non Responsive.
So my questions are:
1) Should SSH Soft Fencing be executed on hosts without valid PM
   configuration?
I think that the answer should be yes. The vdsm restart will solve most of the problems
Would you enumerate the problems that would be solved by a vdsm restart
(on the list, and on the feature page, too)?
I am aware of two issues, both of which are vdsm bugs:
- If libvirtd crashes, vdsm is not restarted unless there are
  running VMs.
- Vdsm had several bugs in its soft prepareForShutdown process, getting
  itself stuck there in case of various background storage processes.

I think that solving these two issues would be safer and cleaner than
introducing an `ssh host service vdsmd restart` flow.

The first issue is only a matter of untangling some vdsm internal
ugliness: whenever a libvirtconnection is produced, it should be wrapped
so that it catches libvirt crashes. Currently, only VM-related
libvirtconnections get this treatment.
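
As a rough illustration of the wrapping idea (not vdsm's actual code), a
proxy can guard every call made through a connection object. Everything
named here is hypothetical: `ConnectionLost` stands in for whatever error
signals a dead libvirtd, `FakeConnection` is a toy connection, and
`on_crash` is whatever recovery action vdsm would choose, such as
restarting itself.

```python
import functools


class ConnectionLost(Exception):
    """Hypothetical stand-in for the error seen when the daemon dies."""


def wrap_connection(conn, on_crash):
    """Return a proxy around conn.

    Any method call that raises ConnectionLost triggers on_crash()
    before the error is re-raised to the caller.
    """

    class _Proxy:
        def __getattr__(self, name):
            attr = getattr(conn, name)
            if not callable(attr):
                # Plain attributes are passed through untouched.
                return attr

            @functools.wraps(attr)
            def guarded(*args, **kwargs):
                try:
                    return attr(*args, **kwargs)
                except ConnectionLost:
                    on_crash()
                    raise

            return guarded

    return _Proxy()


class FakeConnection:
    """Toy connection used only to exercise the wrapper."""

    def __init__(self):
        self.alive = True

    def list_domains(self):
        if not self.alive:
            raise ConnectionLost("daemon went away")
        return ["vm1", "vm2"]
```

If every connection handed out goes through such a wrapper, a crash is
noticed regardless of whether the caller is VM-related.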

The second issue can be avoided by vdsm resorting to kill-9-ing itself.
After all, this is what `service vdsmd restart` ends up doing after a
VERY short timeout (2-3 seconds, iirc).
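
A minimal sketch of that self-kill idea, assuming a POSIX host: a
watchdog timer sends SIGKILL to the current process if the soft cleanup
does not return in time. The function names and the injectable `kill`
parameter are illustrative assumptions, not vdsm's API.

```python
import os
import signal
import threading


def _kill_self():
    # SIGKILL cannot be caught or ignored, so the process ends at once.
    os.kill(os.getpid(), signal.SIGKILL)


def shutdown_with_deadline(cleanup, timeout, kill=_kill_self):
    """Run cleanup(); if it is still running after `timeout` seconds,
    call kill() from a watchdog thread.

    `kill` is injectable so the behaviour can be exercised without
    actually terminating the process.
    """
    watchdog = threading.Timer(timeout, kill)
    watchdog.daemon = True
    watchdog.start()
    try:
        cleanup()
    finally:
        # Cleanup returned (or raised) in time: disarm the watchdog.
        watchdog.cancel()
```

With the default `kill`, a stuck cleanup means the process simply dies at
the deadline, so whatever supervises the daemon can start it afresh.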

I suppose that there are other reasons for a remote restart, but in
general, I think that it's better to have Vdsm "do the right thing" than
to expect Engine to control that remotely.

Regards,
Dan.
...
, so why not use it whether a PM agent is defined or not?
...
2) Should the VDSM restart over SSH be reimplemented as a standalone
   command so that it is usable in other parts of the engine as well?
   If 1) is true, I think it will have to be done anyway.
+1
...
Martin Perina
_______________________________________________
Engine-devel mailing list
Engine-devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/engine-devel

Dan Kenigsberg