----- Original Message -----
From: "Jason Keltz" <jason.keltz(a)gmail.com>
To: "Marek marx Grac" <mgrac(a)redhat.com>
Cc: "Eli Mesika" <emesika(a)redhat.com>, "users"
<users(a)ovirt.org>
Sent: Wednesday, June 17, 2015 12:02:48 PM
Subject: Re: problems with power management using idrac7 on r620
Hi Marek.
Actually, it's the idrac that I believe has the memory leak. Dell wants to
know how often ovirt is querying the idrac for status and whether the delay
is configurable.
Well, oVirt does not query the status automatically by default.
There is a feature that enables that:
http://www.ovirt.org/Features/PMHealthCheck
Basically, this feature depends on two configuration values:
PMHealthCheckEnabled, which should be true if the feature is enabled
PMHealthCheckIntervalInSec, which defaults to 3600 seconds, so in that case the status is checked
once an hour
So, first please check whether this is enabled in your environment:
engine-config -g PMHealthCheckEnabled
engine-config -g PMHealthCheckIntervalInSec
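To turn those two values into a number you can give Dell, here is a minimal Python sketch. It is not oVirt code: the function name is made up for illustration, and it assumes one status query per host per interval, which is a simplification.

```python
SECONDS_PER_DAY = 86400


def health_check_queries_per_day(enabled: bool, interval_sec: int = 3600) -> float:
    """Estimate daily health-check status queries sent to one host's iDRAC.

    Hypothetical helper, not part of oVirt. `enabled` mirrors
    PMHealthCheckEnabled and `interval_sec` mirrors
    PMHealthCheckIntervalInSec. Assumes one query per interval.
    """
    if not enabled:
        return 0.0
    return SECONDS_PER_DAY / interval_sec


if __name__ == "__main__":
    # With the default interval of 3600 seconds this prints 24.0
    print(health_check_queries_per_day(True))
    # With the feature disabled this prints 0.0
    print(health_check_queries_per_day(False))
```

So with the defaults, the health check alone would account for roughly one status query per hour per host, and none at all if the feature is disabled.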
The other scenario in which status is used is when a host becomes non-responsive.
In that case:
After a grace period that depends on the host load and on whether it is the SPM, a soft-fence
attempt (vdsmd service restart) is issued.
If the soft-fence attempt fails, we do a real fencing (if power management is
configured correctly on the host and a proxy host is found):
We send a STOP command.
We send, by default, up to 18 status commands, one every 10 seconds, until we get an 'off'
status from the agent.
We send a START command.
We send, by default, up to 18 status commands, one every 10 seconds, until we get an 'on'
status from the agent.
Those depend on the following configuration variables:
FenceStopStatusRetries - default 18
FenceStopStatusDelayBetweenRetriesInSec - default 10
FenceStartStatusRetries - default 18
FenceStartStatusDelayBetweenRetriesInSec - default 10
These can be changed using the engine-config tool (requires a restart to take effect)
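As a rough worked example of the fencing numbers, the following Python sketch computes the worst-case count of status commands and the polling time for one full fence (STOP then START). It is only an illustration, not oVirt code: the class and function names are invented, and it assumes every retry is used and each retry costs one full delay.

```python
from dataclasses import dataclass


@dataclass
class FencePolling:
    """Polling settings for one phase (stop or start). Hypothetical type."""
    retries: int = 18    # FenceStopStatusRetries / FenceStartStatusRetries
    delay_sec: int = 10  # Fence{Stop,Start}StatusDelayBetweenRetriesInSec

    def max_wait_sec(self) -> int:
        # Upper bound: every retry is used and each one waits the full delay.
        return self.retries * self.delay_sec


def worst_case_fence(stop: FencePolling, start: FencePolling) -> dict:
    """Worst-case status traffic for one STOP + START fencing sequence."""
    return {
        "status_commands": stop.retries + start.retries,
        "polling_seconds": stop.max_wait_sec() + start.max_wait_sec(),
    }


if __name__ == "__main__":
    # With the defaults: 36 status commands over at most 360 seconds.
    print(worst_case_fence(FencePolling(), FencePolling()))
```

In practice the polling stops as soon as the agent reports the expected 'off' or 'on' status, so the real number of status commands per fence is usually lower than this bound.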
Jason.
On Jun 17, 2015 2:42 AM, "Marek "marx" Grac" <mgrac(a)redhat.com>
wrote:
>
>
> On 06/16/2015 09:37 AM, Eli Mesika wrote:
>
>> CCing Marek Grac
>>
>> ----- Original Message -----
>>
>>> From: "Jason Keltz" <jason.keltz(a)gmail.com>
>>> To: "users" <users(a)ovirt.org>
>>> Cc: "Eli Mesika" <emesika(a)redhat.com>
>>> Sent: Monday, June 15, 2015 11:08:35 PM
>>> Subject: problems with power management using idrac7 on r620
>>>
>>> Hi.
>>>
>>> I've been having problems with power management using iDRAC 7 EXPRESS on
>>> a Dell R620. This uses a shared LOM as opposed to Enterprise that has a
>>> dedicated one. Every now and then, idrac simply stops responding to
>>> ping, so it can't respond to status commands from the proxy. If I send
>>> a reboot with "ipmitool mc reset cold" command, the idrac reboots and
>>> comes back, but after the problem has occurred, even after a reboot, it
>>> responds to ping, but drops 80+% of packets. The only way I can "solve"
>>> the problem is to physically restart the server. This isn't just
>>> happening on one R620 - it's happening on all of my ovirt hosts. I
>>> highly suspect it has to do with a memory leak, and being monitored by
>>> engine causes the problem. I had applied a recent firmware upgrade
>>> that was supposed to "solve" this kind of problem, but it doesn't. In
>>> order to provide Dell with more details, can someone tell me how often
>>> each host is being queried for status? I can't seem to find that info.
>>> The idrac on my file server doesn't seem to exhibit the same problem,
>>> and I suspect that is because it isn't being queried.
>>>
>> Hi,
>
> The fence agent for IPMI is based on ipmitool, so if ping/ipmitool is not
> working there is not much to do about it. I don't know enough about the oVirt
> engine, but there is no real place where the fence agent can leak memory, because
> it does not run as a daemon.
>
> m,
>