[vdsm] Review Request: Add an option to create a watchdog device.

Ryan Harper ryanh at us.ibm.com
Mon Nov 26 15:50:34 UTC 2012


* Doron Fediuck <dfediuck at redhat.com> [2012-11-26 09:20]:
> ----- Original Message -----
> > From: "Ryan Harper" <ryanh at us.ibm.com>
> > To: "Doron Fediuck" <dfediuck at redhat.com>
> > Cc: "Sheldon" <shaohef at linux.vnet.ibm.com>, arch at ovirt.org, "Zheng Sheng ZS Zhou" <zhshzhou at cn.ibm.com>, "Itamar
> > Heim" <iheim at redhat.com>, agl at linux.vnet.ibm.com, "Shu Ming" <shuming at linux.vnet.ibm.com>, "Mark Wu"
> > <wudxw at linux.vnet.ibm.com>, ryanh at us.ibm.com, snmishra at us.ibm.com, danken at redhat.com
> > Sent: Monday, November 26, 2012 4:01:48 PM
> > Subject: Re: [vdsm] Review Request: Add an option to create a watchdog device.
> > 
> > * Doron Fediuck <dfediuck at redhat.com> [2012-11-22 03:56]:
> > > 
> > > ----- Original Message -----
> > > 
> > > > From: "Sheldon" <shaohef at linux.vnet.ibm.com>
> > > > To: "Doron Fediuck" <dfediuck at redhat.com>
> > > > Cc: arch at ovirt.org, "Zheng Sheng ZS Zhou" <zhshzhou at cn.ibm.com>,
> > > > "Itamar Heim" <iheim at redhat.com>, agl at linux.vnet.ibm.com, "Shu
> > > > Ming"
> > > > <shuming at linux.vnet.ibm.com>, "Mark Wu"
> > > > <wudxw at linux.vnet.ibm.com>,
> > > > ryanh at us.ibm.com, snmishra at us.ibm.com, danken at redhat.com
> > > > Sent: Thursday, November 22, 2012 11:00:18 AM
> > > > Subject: Re: [vdsm] Review Request: Add an option to create a
> > > > watchdog device.
> > > 
> > > > On 11/21/2012 04:00 PM, Doron Fediuck wrote:
> > > 
> > > > > > Currently, we do not have any plans to implement the engine side
> > > > > > of the feature.
> > > > > 
> > > > 
> > > > > > But I will add a watchdog feature page to describe how the engine
> > > > > > enables this feature. It would be great if any engine developer
> > > > > > would like to take the engine part. I will be glad to provide help
> > > > > > if needed.
> > > > > 
> > > > 
> > > > > Hi Sheldon,
> > > > 
> > > > > Any news on the engine side?
> > > > 
> > > > > Currently the vdsm side is merged, while the engine side is still
> > > > > missing.
> > > > 
> > > > > The wiki page also lacks the engine side. Can you please handle
> > > > > it?
> > > > 
> > > 
> > > > Hi Doron,
> > > 
> > > > I have updated the wiki page.
> > > > http://wiki.ovirt.org/wiki/Add_an_option_to_create_a_watchdog_device
> > > > And for vdsm side, I should also add a new patch to report the
> > > > watchdog event.
> > > 
> > > > I can add a flag to the VM's status, so the engine can poll the
> > > > VM's status to check for the event, then notify the user and let
> > > > the user take some action, such as restarting the guest or dumping
> > > > it for analysis.
> > > > Perhaps an event report channel would be better, but I have not
> > > > found one in vdsm, and adding an event registration mechanism to
> > > > vdsm is a big piece of work.
> > > 
> > > > what's your suggestion?
> > > 
> > > > --
> > > > Sheldon Feng <shaohef at linux.vnet.ibm.com>
> > > > IBM Linux Technology Center
> > > 
> > > Hi Sheldon,
> > > AFAIK, the watchdog fires automatically, so there is no real need for
> > > user interaction when an event happens. So I'd expect the user to set
> > > the relevant action before starting the VM. Once the watchdog is
> > > triggered, it will perform whatever action the user has set, and
> > > notify the user.
> > > 
> > > So I'd expect the user to have a list of actions for the watchdog
> > > device in the engine UI, with a default of none. The user should be
> > > able to choose which action to set when starting or editing the VM
> > > (for next run).
> > 
> > I'd like to suggest we pick something other than none by default,
> > since we've gone through the trouble of configuring and enabling a
> > watchdog. I think it's worth discussing what a better default behavior
> > should be, given access to a watchdog.
> > 
> > I'd suggest that a simple reboot mode would be most useful.
> > 
> 
> Hi Ryan, good point.
> The reason I asked for none is exactly that someone thought of it
> when writing the device actions, i.e. otherwise a no-op makes no sense,
> but as we all know a no-op sometimes proves to be a much needed option,
> if not the default one.
> In this context, a watchdog has quite an explosive potential for a VM.
> So for the sake of all users I'd rather ask them to specify exactly
> what should be done. Otherwise: primum non nocere. I'm sure one day
> someone will appreciate it.

While I understand what you're saying, I think it's worth actually walking
through all of the actions and selecting the best here.  VDSM has a role
to play here in how *best* to configure a VM.  I think that a watchdog
can elevate the usefulness of a VM by ensuring that it stays running
without user intervention.

As you say, having an unexpected reboot when it's not wanted can cause
an issue, so we have at least two areas to discuss:

1) watchdog fidelity: does it do what it's supposed to do at the right
time, without malfunctioning?  This requires testing and use to validate.
Leaving the watchdog off by default will certainly reduce the amount of
testing time it gets.

2) watchdog configuration: what's the most reasonable and helpful
configuration?  This includes the action as well as any variables
associated with that specific action.  I think the best course here is
to propose an initial configuration and start getting some test time
under that configuration for validation.

If we're unwilling to enable an action by default, I'd like to have a
discussion around why that's the case.  The initial objection to
always-on with action=reboot seems to be concern about the watchdog
misfiring when it shouldn't.  Are there other concerns?
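
To make the default-action question concrete, here is a minimal sketch,
assuming the watchdog ends up as a libvirt <watchdog> device element in
the domain XML.  The function name, the DEFAULT_ACTION constant and the
i6300esb model default are illustrative assumptions, not vdsm's actual
code.  Note that what this thread calls "reboot" is named "reset" in
libvirt.

```python
import xml.etree.ElementTree as ET

# Actions accepted by libvirt's <watchdog> device element.
LIBVIRT_WATCHDOG_ACTIONS = (
    "reset", "shutdown", "poweroff", "pause", "none", "dump", "inject-nmi",
)

# Hypothetical default: this is the value being debated in the thread
# ("none" vs. a reboot-style action, i.e. libvirt's "reset").
DEFAULT_ACTION = "none"


def watchdog_xml(model="i6300esb", action=DEFAULT_ACTION):
    """Return a libvirt-style <watchdog> element as a string.

    Illustrative helper only; raises ValueError for an unknown action.
    """
    if action not in LIBVIRT_WATCHDOG_ACTIONS:
        raise ValueError("unsupported watchdog action: %r" % (action,))
    elem = ET.Element("watchdog", {"model": model, "action": action})
    return ET.tostring(elem, encoding="unicode")


if __name__ == "__main__":
    # The "do no harm" default versus the always-on reboot proposal.
    print(watchdog_xml())
    print(watchdog_xml(action="reset"))
```

Changing the default then comes down to a one-line change to
DEFAULT_ACTION, which is why I think the choice deserves the discussion.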

Another thought here is to consider the target guest OS type.  It may
be the case that specific actions/configurations make sense for one OS,
but not for another [1].

There was an engine-devel thread about libosinfo integration [2], which
could be a source for this kind of per-OS information.


1. http://rwmj.wordpress.com/2010/03/03/what-is-a-watchdog/#comment-4959
2. http://lists.ovirt.org/pipermail/engine-devel/2012-September/002544.html


-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ryanh at us.ibm.com



