
On Thu, Jun 27, 2013 at 08:48:39AM -0400, Eli Mesika wrote:
----- Original Message -----
From: "Martin Perina" <mperina@redhat.com>
To: engine-devel@ovirt.org
Cc: "Yair Zaslavsky" <yzaslavs@redhat.com>, "Barak Azulay" <bazulay@redhat.com>, "Eli Mesika" <emesika@redhat.com>
Sent: Thursday, June 27, 2013 1:51:06 PM
Subject: SSH Soft Fencing
Hi,
SSH Soft Fencing is a new feature for 3.3: it tries to restart VDSM over an SSH connection on non-responsive hosts before resorting to real fencing. More info can be found at
http://www.ovirt.org/Automatic_Fencing#Automatic_Fencing_in_oVirt_3.3
In the current SSH Soft Fencing implementation, restarting VDSM over SSH is part of the standard fencing implementation in VdsNotRespondingTreatmentCommand. However, this command is executed only if the host has a valid PM (power management) configuration. If the host doesn't have a valid PM configuration, execution of the command is disabled and the host state is changed to Non Responsive.
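The gating described above can be summarized in a small sketch. All names here (`Host`, `treat_non_responding`, `ssh_restart_vdsm`, `pm_fence`, and the `soft_fence_without_pm` flag) are hypothetical and do not come from the engine code; the flag only contrasts the current behaviour (`False`) with the change asked about in question 1 (`True`).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Host:
    # Hypothetical stand-in for the engine's host entity.
    name: str
    has_valid_pm: bool
    status: str = "Connecting"


def treat_non_responding(
    host: Host,
    ssh_restart_vdsm: Callable[[Host], bool],
    pm_fence: Callable[[Host], bool],
    soft_fence_without_pm: bool = False,
) -> str:
    """Sketch of a non-responding-host treatment; returns the action taken.

    soft_fence_without_pm=False models the current behaviour (nothing is
    attempted without a valid PM configuration); True models trying the
    SSH restart of VDSM on every non-responding host.
    """
    if not host.has_valid_pm and not soft_fence_without_pm:
        host.status = "NonResponsive"
        return "none"

    # SSH Soft Fencing: try restarting VDSM before any real fencing.
    if ssh_restart_vdsm(host):
        return "ssh-restart"

    # Real (PM-based) fencing is only possible with a valid PM configuration.
    if host.has_valid_pm and pm_fence(host):
        return "pm-fence"

    host.status = "NonResponsive"
    return "none"
```

With `soft_fence_without_pm=True`, a host without PM whose SSH restart succeeds is recovered instead of being marked Non Responsive straight away; hosts with PM keep real fencing as the fallback.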
So my questions are:
1) Should SSH Soft Fencing be executed on hosts without a valid PM configuration?
I think that the answer should be yes. The vdsm restart will solve most problems, so why not use it whether or not a PM agent is defined?
Would you enumerate the problems that would be solved by a vdsm restart (on the list, but on the feature page, too)? I am aware of two issues, both of which are vdsm bugs:
- If libvirtd crashes, vdsm is not restarted unless there are running VMs.
- Vdsm had several bugs in its soft prepareForShutdown process, getting itself stuck there in the case of various background storage processes.
I think that solving these two issues would be safer and cleaner than introducing a `ssh host service vdsmd restart` flow.
The first issue is only a matter of untangling some vdsm internal ugliness: whenever a libvirtconnection is produced, it should be wrapped so that it catches libvirt crashes, unlike now, where only VM-related libvirtconnections undergo this treatment.
The second issue can be avoided by having vdsm resort to kill -9-ing itself. After all, this is what `service vdsmd restart` ends up doing after a VERY short timeout (2-3 seconds, iirc).
I suppose that there are other reasons for a remote restart, but in general, I think that it's better to have Vdsm "do the right thing" than to expect Engine to control it remotely.
Regards,
Dan.
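The two vdsm-side fixes Dan proposes can be sketched in Python, the language vdsm is written in. This is not vdsm code: `CrashAwareConnection`, `is_crash`, `on_crash`, `shutdown_with_deadline` and the 3-second default are hypothetical. The exception type is passed in rather than hard-wired (for a real libvirt connection it would be `libvirt.libvirtError`, with `is_crash` deciding which errors mean the daemon is gone), and the kill function is injectable so the sketch can be exercised without actually sending SIGKILL. It assumes something outside vdsm (such as the service's respawn mechanism) starts it again after the kill.

```python
import os
import signal
import threading
from typing import Any, Callable, Type


class CrashAwareConnection:
    """Proxy that wraps every method call of an arbitrary connection object.

    If a call raises `error_type` and `is_crash(exc)` is true, `on_crash`
    is invoked before the exception is re-raised to the caller.
    """

    def __init__(self, conn: Any, error_type: Type[BaseException],
                 is_crash: Callable[[BaseException], bool],
                 on_crash: Callable[[], None]) -> None:
        self._conn = conn
        self._error_type = error_type
        self._is_crash = is_crash
        self._on_crash = on_crash

    def __getattr__(self, name: str) -> Any:
        # Only called for attributes not found on the proxy itself.
        attr = getattr(self._conn, name)
        if not callable(attr):
            return attr

        def wrapper(*args: Any, **kwargs: Any) -> Any:
            try:
                return attr(*args, **kwargs)
            except self._error_type as exc:
                if self._is_crash(exc):
                    self._on_crash()
                raise

        return wrapper


def _sigkill_self() -> None:
    # Hypothetical default: terminate this process unconditionally.
    os.kill(os.getpid(), signal.SIGKILL)


def shutdown_with_deadline(graceful: Callable[[], None], timeout: float = 3.0,
                           kill: Callable[[], None] = _sigkill_self) -> bool:
    """Run `graceful` (e.g. a prepareForShutdown step) in a daemon thread.

    If it has not finished (returned or raised) within `timeout` seconds,
    call `kill`. Returns True if it finished in time, False otherwise.
    """
    worker = threading.Thread(target=graceful, daemon=True)
    worker.start()
    worker.join(timeout)
    if worker.is_alive():
        kill()
        return False
    return True
```

Wrapping at the single place where connections are produced is what would give every caller the same crash handling, rather than only the VM-related ones.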
2) Should the VDSM restart over SSH be reimplemented as a standalone command so that it can also be used in other parts of the engine? If the answer to 1) is yes, I think this will have to be done anyway.
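If the restart is factored out as a standalone command, its core could be as small as the sketch below. For illustration only, it shells out to the OpenSSH `ssh` client via Python's `subprocess` (the engine itself is Java and would use its own SSH client); `build_restart_command`, `restart_vdsm_over_ssh`, the `root` user and both timeouts are assumptions, and the runner is injectable so callers and tests can substitute it.

```python
import subprocess
from typing import Callable, List, Optional, Sequence


def build_restart_command(host: str, user: str = "root",
                          connect_timeout: int = 10) -> List[str]:
    # BatchMode=yes makes ssh fail instead of prompting for a password,
    # which an unattended fencing flow could never answer.
    return ["ssh", "-o", "BatchMode=yes",
            "-o", f"ConnectTimeout={connect_timeout}",
            f"{user}@{host}", "service", "vdsmd", "restart"]


def restart_vdsm_over_ssh(host: str,
                          run: Optional[Callable[[Sequence[str]], int]] = None,
                          timeout: float = 60.0) -> bool:
    """Return True if the remote `service vdsmd restart` exited with status 0."""
    if run is None:
        def run(cmd: Sequence[str]) -> int:
            # Raises subprocess.TimeoutExpired if the whole call takes too long.
            return subprocess.run(cmd, timeout=timeout).returncode
    try:
        return run(build_restart_command(host)) == 0
    except (OSError, subprocess.TimeoutExpired):
        # ssh binary missing or the remote command hung: treat as failure.
        return False
```

Keeping the SSH details behind one function is what would let both the fencing flow and other parts of the engine reuse it.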
+1
Martin Perina
_______________________________________________
Engine-devel mailing list
Engine-devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/engine-devel