[ovirt-users] problems with power management using idrac7 on r620

Jason Keltz jas at cse.yorku.ca
Wed Jun 17 09:31:27 UTC 2015


Hi Eli,
Thank you!
I checked, and the health check is not enabled, so the problem causing the
iDRAC to go away is not status monitoring from oVirt after all. Hmm...
It makes me wonder whether actually enabling it would prevent the problem
from happening.

Jas



On June 17, 2015 5:19:28 AM Eli Mesika <emesika at redhat.com> wrote:

>
>
> ----- Original Message -----
> > From: "Jason Keltz" <jason.keltz at gmail.com>
> > To: "Marek marx Grac" <mgrac at redhat.com>
> > Cc: "Eli Mesika" <emesika at redhat.com>, "users" <users at ovirt.org>
> > Sent: Wednesday, June 17, 2015 12:02:48 PM
> > Subject: Re: problems with power management using idrac7 on r620
> >
> > Hi Marek.
> >
> > Actually, it's the iDRAC that I believe has the memory leak. Dell wants to
> > know how often oVirt is querying the iDRAC for status and whether the delay
> > is configurable.
>
> Well, oVirt does not query the status automatically by default.
> There is a feature that enables that:
> http://www.ovirt.org/Features/PMHealthCheck
> Basically, this feature depends on 2 configuration values:
>
> PMHealthCheckEnabled, which should be true if the feature is enabled
> PMHealthCheckIntervalInSec, which defaults to 3600 sec, so when the feature
> is enabled the status is checked once an hour
>
> So, first please check if this is enabled in your environment
>
> engine-config -g PMHealthCheckEnabled
>
> engine-config -g PMHealthCheckIntervalInSec
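To put the health-check interval in terms of load on the iDRAC, here is a minimal Python sketch. It assumes that each health check sends exactly one status query per host (an assumption, not something stated in this thread); the function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def health_check_queries_per_day(enabled: bool, interval_sec: int = 3600) -> int:
    """Estimate periodic status queries per host per day from PMHealthCheck.

    Assumes one status query per check. Returns 0 when the feature is
    disabled, because oVirt does not poll the agent by default.
    """
    if not enabled:
        return 0
    if interval_sec <= 0:
        raise ValueError("interval_sec must be positive")
    return SECONDS_PER_DAY // interval_sec


print(health_check_queries_per_day(True))   # 24 with the default 3600 s interval
print(health_check_queries_per_day(False))  # 0 when PMHealthCheckEnabled is false
```

Under that assumption, the default setting works out to about one query per host per hour, which is a very light polling rate.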
>
> The other scenario in which status is used is when a host becomes
> non-responsive. In that case:
>
> 1. After a grace period that depends on the host load and on whether the
>    host is the SPM, a soft-fence attempt (vdsmd service restart) is issued.
> 2. If the soft-fence attempt fails, we do a real fencing (if power
>    management is configured correctly on the host and a proxy host is found):
>    - We send a STOP command.
>    - We send, by default, up to 18 status commands, one every 10 sec, until
>      we get 'off' status from the agent.
>    - We send a START command.
>    - We send, by default, up to 18 status commands, one every 10 sec, until
>      we get 'on' status from the agent.
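The hard-fencing steps above can be sketched as a polling loop. This is a simplified model under stated assumptions, not the oVirt engine's code: `agent` is a hypothetical callable standing in for the fence agent, `FakeAgent` is a hypothetical test double, and soft fencing and proxy-host selection are left out.

```python
import time
from typing import Callable


def wait_for_status(agent: Callable[[str], str], wanted: str,
                    retries: int = 18, delay_sec: float = 10.0,
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll agent("status") until it reports `wanted`; return queries sent."""
    for attempt in range(1, retries + 1):
        if agent("status") == wanted:
            return attempt
        if attempt < retries:
            sleep(delay_sec)  # wait between status queries
    raise TimeoutError(f"no '{wanted}' status after {retries} queries")


def hard_fence(agent: Callable[[str], str],
               sleep: Callable[[float], None] = time.sleep) -> int:
    """STOP, poll for 'off', START, poll for 'on'; return total status queries."""
    agent("stop")
    queries = wait_for_status(agent, "off", sleep=sleep)
    agent("start")
    queries += wait_for_status(agent, "on", sleep=sleep)
    return queries


class FakeAgent:
    """Hypothetical stand-in for a fence agent; state flips after N status polls."""

    def __init__(self, polls_until_change: int = 3) -> None:
        self.state = "on"
        self.target = "on"
        self.polls_until_change = polls_until_change
        self.remaining = 0

    def __call__(self, command: str) -> str:
        if command in ("stop", "start"):
            self.target = "off" if command == "stop" else "on"
            self.remaining = self.polls_until_change
            return "ok"
        # Any other command is treated as a status query.
        if self.state != self.target:
            self.remaining -= 1
            if self.remaining <= 0:
                self.state = self.target
        return self.state


print(hard_fence(FakeAgent(3), sleep=lambda _s: None))  # 6
```

Injecting `sleep` lets the loop run instantly in tests. The only case where the iDRAC receives a burst of status queries is during an actual fencing cycle.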
>
> These depend on the following configuration variables:
>
> FenceStopStatusRetries - default 18
> FenceStopStatusDelayBetweenRetriesInSec - default 10
> FenceStartStatusRetries - default 18
> FenceStartStatusDelayBetweenRetriesInSec - default 10
>
> They can be changed using the engine-config tool (an engine restart is
> required for changes to take effect).
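The fencing variables above bound how many status queries one fencing cycle can send. Below is a minimal Python sketch of that arithmetic. It uses the defaults quoted in this thread; the class name is hypothetical, and the time bound is rough because it ignores command latency.

```python
from dataclasses import dataclass


@dataclass
class FenceStatusConfig:
    """Defaults as quoted in this thread; replace with `engine-config -g` values."""
    stop_retries: int = 18      # FenceStopStatusRetries
    stop_delay_sec: int = 10    # FenceStopStatusDelayBetweenRetriesInSec
    start_retries: int = 18     # FenceStartStatusRetries
    start_delay_sec: int = 10   # FenceStartStatusDelayBetweenRetriesInSec

    def max_status_queries(self) -> int:
        # Upper bound on status commands sent in one STOP/START fencing cycle.
        return self.stop_retries + self.start_retries

    def max_polling_seconds(self) -> int:
        # Rough upper bound: retries * delay per phase (ignores command latency).
        return (self.stop_retries * self.stop_delay_sec
                + self.start_retries * self.start_delay_sec)


cfg = FenceStatusConfig()
print(cfg.max_status_queries())   # 36
print(cfg.max_polling_seconds())  # 360
```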
>
>
>
> >
> > Jason.
> > On Jun 17, 2015 2:42 AM, "Marek "marx" Grac" <mgrac at redhat.com> wrote:
> >
> > >
> > >
> > > On 06/16/2015 09:37 AM, Eli Mesika wrote:
> > >
> > >> CCing Marek Grac
> > >>
> > >> ----- Original Message -----
> > >>
> > >>> From: "Jason Keltz" <jason.keltz at gmail.com>
> > >>> To: "users" <users at ovirt.org>
> > >>> Cc: "Eli Mesika" <emesika at redhat.com>
> > >>> Sent: Monday, June 15, 2015 11:08:35 PM
> > >>> Subject: problems with power management using idrac7 on r620
> > >>>
> > >>> Hi.
> > >>>
> > >>> I've been having problems with power management using iDRAC 7 Express on
> > >>> a Dell R620. This uses a shared LOM, as opposed to Enterprise, which has a
> > >>> dedicated one. Every now and then, the iDRAC simply stops responding to
> > >>> ping, so it can't respond to status commands from the proxy. If I send
> > >>> a reboot with the "ipmitool mc reset cold" command, the iDRAC reboots and
> > >>> comes back, but once the problem has occurred, even after a reboot, it
> > >>> responds to ping but drops 80+% of packets. The only way I can "solve"
> > >>> the problem is to physically restart the server. This isn't just
> > >>> happening on one R620; it's happening on all of my oVirt hosts. I
> > >>> strongly suspect it has to do with a memory leak, and that being monitored
> > >>> by the engine triggers the problem. I had applied a recent firmware upgrade
> > >>> that was supposed to "solve" this kind of problem, but it doesn't. In
> > >>> order to provide Dell with more details, can someone tell me how often
> > >>> each host is being queried for status? I can't seem to find that info.
> > >>> The iDRAC on my file server doesn't seem to exhibit the same problem,
> > >>> and I suspect that is because it isn't being queried.
> > >>>
> > >> Hi,
> > >
> > > The fence agent for IPMI is based on ipmitool, so if ping/ipmitool is not
> > > working, there is not much to be done about it. I don't know enough about
> > > the oVirt engine, but there is no real place where the fence agent could
> > > leak memory, because it does not run as a daemon.
> > >
> > > m,
> > >
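Marek's point is that the IPMI fence agent is built on ipmitool and does not run as a daemon. The following Python sketch shows what a single one-shot status query looks like. It assumes `ipmitool` is installed and IPMI over LAN is enabled on the iDRAC. The function names are hypothetical, and this is not the fence agent's actual code.

```python
import subprocess
from typing import List


def ipmitool_status_cmd(host: str, user: str, password: str) -> List[str]:
    """Build an ipmitool command that asks a BMC (e.g. an iDRAC) for power status."""
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-P", password,
            "chassis", "power", "status"]


def parse_power_status(output: str) -> str:
    """Map ipmitool's 'Chassis Power is on/off' line to 'on', 'off' or 'unknown'."""
    text = output.strip().lower()
    if text.endswith("is on"):
        return "on"
    if text.endswith("is off"):
        return "off"
    return "unknown"


def query_power_status(host: str, user: str, password: str,
                       timeout_sec: float = 20.0) -> str:
    """Run ipmitool once; an unreachable or packet-dropping BMC yields 'unknown'."""
    try:
        result = subprocess.run(ipmitool_status_cmd(host, user, password),
                                capture_output=True, text=True, timeout=timeout_sec)
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return "unknown"
    if result.returncode != 0:
        return "unknown"
    return parse_power_status(result.stdout)


print(parse_power_status("Chassis Power is on"))  # on
```

Each query runs as a separate short-lived process, so nothing accumulates on the proxy side between calls. Passing the password with `-P` makes it visible in the process list, so a real setup would handle credentials more carefully.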
> >
> _______________________________________________
> Users mailing list
> Users at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>