[ovirt-users] problems with power management using idrac7 on r620
Eli Mesika
emesika at redhat.com
Wed Jun 17 09:19:21 UTC 2015
----- Original Message -----
> From: "Jason Keltz" <jason.keltz at gmail.com>
> To: "Marek marx Grac" <mgrac at redhat.com>
> Cc: "Eli Mesika" <emesika at redhat.com>, "users" <users at ovirt.org>
> Sent: Wednesday, June 17, 2015 12:02:48 PM
> Subject: Re: problems with power management using idrac7 on r620
>
> Hi Marek.
>
> Actually it's the idrac that I believe has the memory leak. Dell wants to
> know how often ovirt is querying the idrac for status and whether the delay
> is configurable.
Well, oVirt does not query the status automatically by default.
There is a feature that enables that:
http://www.ovirt.org/Features/PMHealthCheck
Basically, this feature depends on 2 configuration values:
PMHealthCheckEnabled, which should be true if the feature is enabled
PMHealthCheckIntervalInSec, which defaults to 3600 sec, so in that case the status is checked once an hour
So, first please check whether this is enabled in your environment:
engine-config -g PMHealthCheckEnabled
engine-config -g PMHealthCheckIntervalInSec
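If you do want the periodic health check, or want the engine to query the iDRAC less often, something like the following sketch should do it (the 7200 sec interval is only an illustrative value, not a recommendation):

engine-config -s PMHealthCheckEnabled=true
engine-config -s PMHealthCheckIntervalInSec=7200
systemctl restart ovirt-engine   # or: service ovirt-engine restart

The engine only picks up engine-config changes after a restart.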
The other scenario in which status is used is when a host becomes non-responsive.
In case a host becomes non-responsive:
After a grace period that depends on the host load and on whether it is the SPM, a soft-fence attempt (vdsmd service restart) is issued.
If the soft-fence attempt fails, we do a real fencing (if power management is configured correctly on the host and a proxy host is found):
We send a STOP command.
We send by default 18 status commands, one every 10 sec, until we get an 'off' status from the agent.
We send a START command.
We send by default 18 status commands, one every 10 sec, until we get an 'on' status from the agent.
These depend on the following configuration variables:
FenceStopStatusRetries - default 18
FenceStopStatusDelayBetweenRetriesInSec - default 10
FenceStartStatusRetries - default 18
FenceStartStatusDelayBetweenRetriesInSec - default 10
These can be changed using the engine-config tool (an engine restart is required for the changes to take effect).
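For example, to give the iDRAC more time between status polls during a fence operation, something like this should work (20 sec is only an illustrative value):

engine-config -s FenceStopStatusDelayBetweenRetriesInSec=20
engine-config -s FenceStartStatusDelayBetweenRetriesInSec=20
systemctl restart ovirt-engine   # or: service ovirt-engine restart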
>
> Jason.
> On Jun 17, 2015 2:42 AM, "Marek "marx" Grac" <mgrac at redhat.com> wrote:
>
> >
> >
> > On 06/16/2015 09:37 AM, Eli Mesika wrote:
> >
> >> CCing Marek Grac
> >>
> >> ----- Original Message -----
> >>
> >>> From: "Jason Keltz" <jason.keltz at gmail.com>
> >>> To: "users" <users at ovirt.org>
> >>> Cc: "Eli Mesika" <emesika at redhat.com>
> >>> Sent: Monday, June 15, 2015 11:08:35 PM
> >>> Subject: problems with power management using idrac7 on r620
> >>>
> >>> Hi.
> >>>
> >>> I've been having problems with power management using iDRAC 7 EXPRESS on
> >>> a Dell R620. This uses a shared LOM as opposed to Enterprise that has a
> >>> dedicated one. Every now and then, idrac simply stops responding to
> >>> ping, so it can't respond to status commands from the proxy. If I send
> >>> a reboot with "ipmitool mc reset cold" command, the idrac reboots and
> >>> comes back, but after the problem has occurred, even after a reboot, it
> >>> responds to ping, but drops 80+% of packets. The only way I can "solve"
> >>> the problem is to physically restart the server. This isn't just
> >>> happening on one R620 - it's happening on all of my ovirt hosts. I
> >>> highly suspect it has to do with a memory leak, and being monitored by
> >>> engine causes the problem. I had applied a recent firmware upgrade
> >>> that was supposed to "solve" this kind of problem, but it doesn't. In
> >>> order to provide Dell with more details, can someone tell me how often
> >>> each host is being queried for status? I can't seem to find that info.
> >>> The idrac on my file server doesn't seem to exhibit the same problem,
> >>> and I suspect that is because it isn't being queried.
> >>>
> >> Hi,
> >
> > The fence agent for IPMI is based on ipmitool, so if ping/ipmitool is not
> > working there is not much to do about it. I don't know enough about the
> > oVirt engine, but there is no real place where the fence agent can leak
> > memory because it does not run as a daemon.
> >
> > m,
> >
>