
Hi Eli,

Thank you! I checked, and the health check is not enabled, so the problem causing the iDRAC to go away is not status monitoring from oVirt after all. Hmm... makes me wonder whether actually enabling it would prevent the problem from happening.

Jas

On June 17, 2015 5:19:28 AM Eli Mesika <emesika@redhat.com> wrote:
----- Original Message -----
From: "Jason Keltz" <jason.keltz@gmail.com> To: "Marek marx Grac" <mgrac@redhat.com> Cc: "Eli Mesika" <emesika@redhat.com>, "users" <users@ovirt.org> Sent: Wednesday, June 17, 2015 12:02:48 PM Subject: Re: problems with power management using idrac7 on r620
Hi Marek.
Actually, it's the iDRAC that I believe has the memory leak. Dell wants to know how often oVirt is querying the iDRAC for status and whether the delay is configurable.
Well, oVirt does not query the status automatically by default. There is a feature that enables that: http://www.ovirt.org/Features/PMHealthCheck. Basically, this feature depends on two configuration values:
PMHealthCheckEnabled, which should be true if the feature is enabled
PMHealthCheckIntervalInSec, which defaults to 3600 seconds, so in that case the status is checked once an hour
So, first, please check whether this is enabled in your environment:
engine-config -g PMHealthCheckEnabled
engine-config -g PMHealthCheckIntervalInSec
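If you later decide to enable it, something along these lines should work (just a sketch: I'm assuming the usual engine-config -s syntax and that the engine is restarted afterwards so the change is picked up):

engine-config -s PMHealthCheckEnabled=true
engine-config -s PMHealthCheckIntervalInSec=3600
service ovirt-engine restart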
The other scenario in which the status command is used is when a host becomes non-responsive. In that case:
After a grace period that depends on the host load and on whether it is the SPM, a soft-fence attempt (a vdsmd service restart) is issued.
If the soft-fence attempt fails, we do a real fencing (if power management is configured correctly on the host and a proxy host is found):
We send a STOP command.
We send, by default, 18 status commands, one every 10 seconds, until we get an 'off' status from the agent.
We send a START command.
We send, by default, 18 status commands, one every 10 seconds, until we get an 'on' status from the agent.
These depend on the following configuration variables:
FenceStopStatusRetries - default 18
FenceStopStatusDelayBetweenRetriesInSec - default 10
FenceStartStatusRetries - default 18
FenceStartStatusDelayBetweenRetriesInSec - default 10
These can be changed using the engine-config tool (requires an engine restart to take effect).
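For example, to double the number of status retries after STOP and START (the values here are only illustrative, and I'm again assuming an engine restart afterwards):

engine-config -s FenceStopStatusRetries=36
engine-config -s FenceStartStatusRetries=36
service ovirt-engine restart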
Jason.

On Jun 17, 2015 2:42 AM, "Marek "marx" Grac" <mgrac@redhat.com> wrote:
On 06/16/2015 09:37 AM, Eli Mesika wrote:
CCing Marek Grac
----- Original Message -----
From: "Jason Keltz" <jason.keltz@gmail.com> To: "users" <users@ovirt.org> Cc: "Eli Mesika" <emesika@redhat.com> Sent: Monday, June 15, 2015 11:08:35 PM Subject: problems with power management using idrac7 on r620
Hi.
I've been having problems with power management using iDRAC 7 Express on a Dell R620. This uses a shared LOM, as opposed to the Enterprise version, which has a dedicated one. Every now and then, the iDRAC simply stops responding to ping, so it can't respond to status commands from the proxy. If I send a reboot with an "ipmitool mc reset cold" command, the iDRAC reboots and comes back, but once the problem has occurred, even after a reboot, it responds to ping yet drops 80+% of packets. The only way I can "solve" the problem is to physically restart the server.

This isn't just happening on one R620 - it's happening on all of my oVirt hosts. I highly suspect it has to do with a memory leak, and that being monitored by the engine triggers the problem. I had applied a recent firmware upgrade that was supposed to "solve" this kind of problem, but it doesn't. In order to provide Dell with more details, can someone tell me how often each host is being queried for status? I can't seem to find that info. The iDRAC on my file server doesn't seem to exhibit the same problem, and I suspect that is because it isn't being queried.
Hi,
The fence agent for IPMI is based on ipmitool, so if ping/ipmitool is not working, there is not much we can do about it. I don't know enough about the oVirt engine, but there is no real place where the fence agent could leak memory, because it does not run as a daemon.
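For reference, a status check from the agent boils down to roughly the following ipmitool call (the address and credentials are placeholders, and the exact options depend on how power management is configured for the host):

ipmitool -I lanplus -H <idrac-address> -U <user> -P <password> chassis power status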
m,
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users