[Engine-devel] SSH Soft Fencing

Barak Azulay bazulay at redhat.com
Sun Jun 30 16:26:47 UTC 2013



----- Original Message -----
> From: "Dan Kenigsberg" <danken at redhat.com>
> To: "Eli Mesika" <emesika at redhat.com>
> Cc: engine-devel at ovirt.org
> Sent: Sunday, June 30, 2013 5:40:49 PM
> Subject: Re: [Engine-devel] SSH Soft Fencing
> 
> On Thu, Jun 27, 2013 at 08:48:39AM -0400, Eli Mesika wrote:
> > 
> > 
> > ----- Original Message -----
> > > From: "Martin Perina" <mperina at redhat.com>
> > > To: engine-devel at ovirt.org
> > > Cc: "Yair Zaslavsky" <yzaslavs at redhat.com>, "Barak Azulay"
> > > <bazulay at redhat.com>, "Eli Mesika" <emesika at redhat.com>
> > > Sent: Thursday, June 27, 2013 1:51:06 PM
> > > Subject: SSH Soft Fencing
> > > 
> > > Hi,
> > > 
> > > SSH Soft Fencing is a new feature for 3.3 and it tries to restart VDSM
> > > using SSH connection on non responsive hosts prior to real fencing.
> > > More info can be found at
> > > 
> > > http://www.ovirt.org/Automatic_Fencing#Automatic_Fencing_in_oVirt_3.3
> > > 
> > > In the current SSH Soft Fencing implementation, restarting VDSM over SSH
> > > is part of the standard fencing implementation in
> > > VdsNotRespondingTreatmentCommand. But this command is executed only
> > > if a host has a valid PM configuration. If the host doesn't have a valid
> > > PM configuration, execution of the command is disabled and the host
> > > state is changed to Non Responsive.
> > > 
> > > So my questions are:
> > > 
> > > 1) Should SSH Soft Fencing be executed on hosts without valid PM
> > >    configuration?
> > 
> > I think that the answer should be yes. The vdsm restart will solve most of
> > the problems
> 
> Would you enumerate the problems that would be solved by a vdsm restart
> (on list, but on the feature page, too)?
> I am aware of two issues, both are vdsm bugs:
> - If libvirtd crashes, vdsm is not restarted unless there are
>   running VMs
> - Vdsm had several bugs in its soft prepareForShutdown process, getting
>   itself stuck there in case of various background storage processes.
> 
> I think that solving these two issues would be safer and cleaner than
> introducing `ssh host service vdsmd restart` flow.
> 
> The first issue is only a matter of untangling some vdsm internal
> ugliness: whenever a libvirtconnection is produced, it should be wrapped
> so that it catches libvirt crashes. Unlike now, where only VM-related
> libvirtconnections undergo this treatment.
> 
> The second issue can be avoided by vdsm resorting to kill-9-ing itself.
> After all, this is what `service vdsmd restart` ends up doing after a
> VERY short timeout (2-3 seconds, iirc).
> 
> I suppose that there are other reasons for a remote restart, but in
> general, I think that it's better to have Vdsm "do the right thing" than
> to expect Engine to control that remotely.


Theoretically you are absolutely right, but this is much more challenging when the platform you are using keeps changing and might introduce unfamiliar behaviors or bugs.
You have enumerated several issues that we encountered in the past and that were fixed by us or in other components:
- libvirt related
- prepareForShutdown
- ... I even remember some from SuperVDSM

All of the above were eventually handled brutally by the engine: the host was fenced entirely and all running VMs were killed (and the services they provided went down).

This is about trying to handle an unexpected situation in a more delicate manner that in most cases will avoid killing the VMs, in a scenario where the host is going to be fenced anyway.

Now the question Martin raised is whether this functionality should also be applied when a host has no physical Power-Management device.
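
To make the question concrete, here is a rough sketch of the flow being discussed, in which the SSH restart is attempted first and does not depend on the PM configuration. It is only an illustration, not the engine code: the Host class, the callback names and the returned strings are all made up for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Host:
    """Hypothetical stand-in for an engine host record."""
    name: str
    has_valid_pm: bool  # True if a power-management agent is configured


def handle_non_responsive(
    host: Host,
    ssh_restart_vdsm: Callable[[Host], bool],  # assumed: True if the SSH command succeeded
    host_responds: Callable[[Host], bool],     # assumed: True if VDSM answers again
    hard_fence: Callable[[Host], None],        # assumed: power-cycles the host via PM
) -> str:
    """Try soft fencing first; fall back to hard fencing only when PM exists."""
    # Step 1: soft fencing, attempted whether or not a PM agent is defined.
    if ssh_restart_vdsm(host) and host_responds(host):
        return "recovered_by_soft_fencing"

    # Step 2: real fencing is only possible with a valid PM configuration.
    if host.has_valid_pm:
        hard_fence(host)
        return "hard_fenced"

    # No PM agent and the restart did not help: leave the host Non Responsive.
    return "non_responsive"
```

With this ordering, a host without a PM agent still gets one recovery attempt before it is marked Non Responsive, and a host with a PM agent is fenced only if the restart did not bring VDSM back.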

I hope this provides the info you referred to.


Thanks
Barak Azulay 


> 
> Regards,
> Dan.
> > , so why not use it whether a PM agent is defined or not?
> > 
> > > 
> > > 2) Should VDSM restart using SSH command be reimplemented
> > >    as standalone command to be usable also in other parts of engine?
> > >    If 1) is true, I think it will have to be done anyway.
> > 
> > +1
> > 
> > > 
> > > 
> > > Martin Perina
> > > 
> > _______________________________________________
> > Engine-devel mailing list
> > Engine-devel at ovirt.org
> > http://lists.ovirt.org/mailman/listinfo/engine-devel


