
On 06/30/2013 07:26 PM, Barak Azulay wrote:
----- Original Message -----
From: "Dan Kenigsberg" <danken@redhat.com>
To: "Eli Mesika" <emesika@redhat.com>
Cc: engine-devel@ovirt.org
Sent: Sunday, June 30, 2013 5:40:49 PM
Subject: Re: [Engine-devel] SSH Soft Fencing
On Thu, Jun 27, 2013 at 08:48:39AM -0400, Eli Mesika wrote:
----- Original Message -----
From: "Martin Perina" <mperina@redhat.com>
To: engine-devel@ovirt.org
Cc: "Yair Zaslavsky" <yzaslavs@redhat.com>, "Barak Azulay" <bazulay@redhat.com>, "Eli Mesika" <emesika@redhat.com>
Sent: Thursday, June 27, 2013 1:51:06 PM
Subject: SSH Soft Fencing
Hi,
SSH Soft Fencing is a new feature for 3.3. It tries to restart VDSM over an SSH connection on non-responsive hosts prior to real fencing. More info can be found at
http://www.ovirt.org/Automatic_Fencing#Automatic_Fencing_in_oVirt_3.3
In the current SSH Soft Fencing implementation, restarting VDSM over SSH is part of the standard fencing implementation in VdsNotRespondingTreatmentCommand. But this command is executed only if a host has a valid PM configuration. If the host doesn't have a valid PM configuration, execution of the command is disabled and the host state is changed to Non Responsive.
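The current behaviour described above can be sketched roughly as follows. This is only an illustrative Python model, not the engine's Java code: `Host`, `handle_non_responsive_host`, and the step names it returns are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Host:
    name: str
    has_valid_pm: bool  # True if a valid Power Management configuration exists


def handle_non_responsive_host(host: Host) -> List[str]:
    """Return the steps taken for a non-responsive host (current behaviour)."""
    if not host.has_valid_pm:
        # Without valid PM the whole treatment command is disabled,
        # so SSH Soft Fencing never gets a chance to run.
        return ["set status Non Responsive"]
    # With valid PM, the VDSM restart over SSH runs before real fencing.
    return ["ssh soft fencing", "real fencing if still non-responsive"]
```

In this model, question 1 below amounts to asking whether the "ssh soft fencing" step should also appear in the branch for hosts without valid PM.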
So my questions are:
1) Should SSH Soft Fencing be executed on hosts without valid PM configuration?
I think that the answer should be yes. The vdsm restart will solve most problems
Would you enumerate the problems that would be solved by a vdsm restart (on the list, but on the feature page, too)? I am aware of two issues, both of which are vdsm bugs:
- If libvirtd crashes, vdsm is not restarted unless there are running VMs.
- Vdsm had several bugs in its soft prepareForShutdown process, getting itself stuck there in case of various background storage processes.
I think that solving these two issues would be safer and cleaner than introducing an `ssh host service vdsmd restart` flow.
The first issue is only a matter of untangling some vdsm internal ugliness: whenever a libvirtconnection is produced, it should be wrapped so that it catches libvirt crashes. Currently, only VM-related libvirtconnections undergo this treatment.
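One way to picture the wrapping described here, as a hedged sketch: every call on the connection goes through a proxy that notices a broken connection and triggers a recovery hook. `ConnectionLost`, `GuardedConnection`, and `on_crash` are hypothetical names used for illustration. This is not VDSM's actual libvirtconnection code, and a real implementation would catch the libvirt bindings' own error type instead of the stand-in exception.

```python
class ConnectionLost(Exception):
    """Stand-in for the error raised when the libvirt daemon goes away."""


class GuardedConnection:
    """Proxy that reports a lost connection on any method call."""

    def __init__(self, conn, on_crash):
        self._conn = conn
        self._on_crash = on_crash  # e.g. schedule a vdsm restart

    def __getattr__(self, name):
        # Only reached for names not found on the proxy itself.
        attr = getattr(self._conn, name)
        if not callable(attr):
            return attr

        def guarded(*args, **kwargs):
            try:
                return attr(*args, **kwargs)
            except ConnectionLost:
                # Same treatment for every connection, not only VM-related ones.
                self._on_crash()
                raise

        return guarded
```

If every connection were produced through such a wrapper, a libvirtd crash would be noticed regardless of whether any VMs are running.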
The second issue can be avoided by vdsm resorting to kill-9-ing itself. After all, this is what `service vdsmd restart` ends up doing after a VERY short timeout (2-3 seconds, iirc).
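A minimal sketch of that self-kill idea, assuming a Unix host: arm a timer when shutdown begins and send SIGKILL to the current process if shutdown has not finished in time. `arm_shutdown_watchdog` and its `kill` parameter are hypothetical and exist only for illustration; the timeout value is left to the caller.

```python
import os
import signal
import threading


def arm_shutdown_watchdog(timeout_s, kill=None):
    """Start a timer that force-kills this process if shutdown hangs.

    Returns the timer; call .cancel() on it once shutdown has completed.
    """
    if kill is None:
        # Default action: SIGKILL ourselves, which cannot be caught or blocked.
        def kill():
            os.kill(os.getpid(), signal.SIGKILL)

    timer = threading.Timer(timeout_s, kill)
    timer.daemon = True  # do not keep the process alive just for the timer
    timer.start()
    return timer
```

The `kill` argument is injectable so the behaviour can be exercised without actually terminating the process.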
I suppose that there is other reasoning for a remote restart, but in general, I think that it's better to have Vdsm "do the right thing" than to expect Engine to control that remotely.
Theoretically you are absolutely right, but this is much more challenging when the platform you are using keeps changing and might introduce unfamiliar behaviors or bugs. You have enumerated several issues that we have encountered in the past and that were fixed by us or by different components:
- libvirt related
- prepareForShutdown
- ... I even remember some from SuperVDSM
All of the above were eventually handled brutally by the engine: the host was fenced entirely and all running VMs were killed (and the services they provided went down).
This is about trying to handle an unexpected situation in a somewhat more delicate manner that in most cases will avoid killing the VMs, in a scenario where the host is going to be fenced anyway.
+1 We cannot anticipate our own bugs ;)
Now the question Martin raised is whether this functionality should also be applied when a host has no physical Power-Management device.
Hope this provides the info you referred to.
Thanks,
Barak Azulay
Regards, Dan.
, so why not use it whether a PM agent is defined or not?
2) Should the VDSM restart over SSH be reimplemented as a standalone command so that it is also usable in other parts of the engine? If 1) is true, I think it will have to be done anyway.
+1
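To illustrate what a standalone, reusable restart step from question 2 might look like, here is a hedged Python sketch. It is not the engine's Java implementation: `build_restart_command` and `restart_vdsm_over_ssh` are hypothetical names, and the `root` login is an assumption. Only the `service vdsmd restart` command itself comes from this thread.

```python
import subprocess
from typing import List


def build_restart_command(host: str, user: str = "root") -> List[str]:
    """Build the argv for `ssh host service vdsmd restart`.

    The default user is an assumption made for this sketch.
    """
    return ["ssh", f"{user}@{host}", "service", "vdsmd", "restart"]


def restart_vdsm_over_ssh(host: str, run=subprocess.call) -> bool:
    """Return True if the remote restart command exited with status 0."""
    return run(build_restart_command(host)) == 0
```

Keeping the step separate from the fencing flow means it could be called both for hosts with a PM agent and for hosts without one.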
Martin Perina
_______________________________________________ Engine-devel mailing list Engine-devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/engine-devel