On Thu, Jun 27, 2013 at 08:48:39AM -0400, Eli Mesika wrote:
----- Original Message -----
> From: "Martin Perina" <mperina(a)redhat.com>
> To: engine-devel(a)ovirt.org
> Cc: "Yair Zaslavsky" <yzaslavs(a)redhat.com>, "Barak Azulay" <bazulay(a)redhat.com>, "Eli Mesika" <emesika(a)redhat.com>
> Sent: Thursday, June 27, 2013 1:51:06 PM
> Subject: SSH Soft Fencing
>
> Hi,
>
> SSH Soft Fencing is a new feature for 3.3. It tries to restart VDSM
> over an SSH connection on non-responsive hosts prior to real fencing.
> More info can be found at
>
> http://www.ovirt.org/Automatic_Fencing#Automatic_Fencing_in_oVirt_3.3
>
> In the current SSH Soft Fencing implementation, the command that
> restarts VDSM over SSH is part of the standard fencing implementation in
> VdsNotRespondingTreatmentCommand. But this command is executed only
> if a host has a valid PM configuration. If the host doesn't have a valid
> PM configuration, the execution of the command is disabled and the host
> state is changed to Non Responsive.
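The flow described above can be sketched roughly as follows. This is only an illustration in Python, not Engine code: the function name, the step labels and the `soft_fence_without_pm` switch are all hypothetical, and each later step is assumed to run only if the host is still unreachable.

```python
def treat_non_responding_host(has_valid_pm, soft_fence_without_pm=False):
    """Return the steps attempted, in order, for a non-responsive host.

    All names here are hypothetical labels, not Engine identifiers.
    """
    if has_valid_pm:
        # Current behaviour: soft fencing is tried before real fencing.
        return ["ssh_restart_vdsm", "fence_via_pm"]
    if soft_fence_without_pm:
        # Behaviour asked about in question 1 (an assumption, not implemented).
        return ["ssh_restart_vdsm", "set_non_responsive"]
    # Current behaviour without PM: the command is disabled.
    return ["set_non_responsive"]
```

With the switch left off, a host without PM goes straight to Non Responsive, which is the case question 1 is about.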
>
> So my questions are:
>
> 1) Should SSH Soft Fencing be executed on hosts without valid PM
> configuration?
I think that the answer should be yes. The vdsm restart will solve most of the problems
Would you enumerate the problems that would be solved by a vdsm restart
(on list, but on the feature page, too)?
I am aware of two issues, both are vdsm bugs:
- If libvirtd crashes, vdsm is not restarted unless there are
running VMs
- Vdsm had several bugs in its soft prepareForShutdown process, getting
itself stuck there in case of various background storage processes.
I think that solving these two issues would be safer and cleaner than
introducing an `ssh host service vdsmd restart` flow.
The first issue is only a matter of untangling some vdsm internal
ugliness: whenever a libvirtconnection is produced, it should be wrapped
so that it catches libvirt crashes. Unlike now, where only VM-related
libvirtconnections undergo this treatment.
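A minimal sketch of that wrapping idea, assuming nothing about the real vdsm code: `ConnectionBroken`, `wrap_connection`, `on_crash` and `FakeConnection` are hypothetical stand-ins, and the real libvirt error type is deliberately not used here.

```python
import functools


class ConnectionBroken(Exception):
    """Hypothetical stand-in for the error raised when libvirtd is gone."""


def wrap_connection(conn, on_crash):
    """Return a proxy that reports ConnectionBroken from any method call."""

    class _Proxy:
        def __getattr__(self, name):
            attr = getattr(conn, name)
            if not callable(attr):
                return attr

            @functools.wraps(attr)
            def guarded(*args, **kwargs):
                try:
                    return attr(*args, **kwargs)
                except ConnectionBroken:
                    # Same treatment for every caller, VM-related or not.
                    on_crash()
                    raise

            return guarded

    return _Proxy()


class FakeConnection:
    """Toy connection used only to exercise the wrapper."""

    def __init__(self):
        self.alive = True

    def getHostname(self):
        if not self.alive:
            raise ConnectionBroken("libvirtd went away")
        return "host1"
```

If every connection were produced through such a wrapper, `on_crash` could be the place where vdsm decides to restart itself, whether or not any VM is running.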
The second issue can be avoided by vdsm resorting to kill-9-ing itself.
After all, this is what `service vdsmd restart` ends up doing after a
VERY short timeout (2-3 seconds, iirc).
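As a rough sketch of "kill-9-ing itself" after a deadline (not vdsm's actual prepareForShutdown; `shutdown_with_deadline`, `kill_self` and the timeout value are assumptions made for illustration):

```python
import os
import signal
import threading


def kill_self():
    """Last resort: send SIGKILL to our own process, as `kill -9` would."""
    os.kill(os.getpid(), signal.SIGKILL)


def shutdown_with_deadline(soft_shutdown, timeout, hard_kill=kill_self):
    """Run soft_shutdown(); call hard_kill() if it has not returned in time.

    Returns True if the soft path finished, False if the deadline fired.
    """
    done = threading.Event()

    def worker():
        soft_shutdown()
        done.set()

    # Daemon thread, so a stuck soft shutdown cannot keep the process alive.
    threading.Thread(target=worker, daemon=True).start()
    if done.wait(timeout):
        return True
    hard_kill()
    return False
```

The `hard_kill` parameter exists only so the sketch can be exercised without actually killing the interpreter; in a real daemon the default would be used.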
I suppose that there are other reasons for a remote restart, but in
general, I think that it's better to have Vdsm "do the right thing" than
to expect Engine to control that remotely.
Regards,
Dan.
, so why not use it whether a PM agent is defined or not?
>
> 2) Should the VDSM restart over SSH be reimplemented
> as a standalone command, so that it is also usable in other parts of the engine?
> If 1) is true, I think it will have to be done anyway.
+1
>
>
> Martin Perina
>