[ovirt-users] oVirt 3.5 : "Power Management Health Check" - feature pages

Arthur Berezin aberezin at redhat.com
Mon May 5 06:23:38 EDT 2014


Gilad, this is an integral part of HA. For HA to work properly we rely on the PM mechanism, and this feature makes sure PM is available, i.e. by running the existing "Check PM" periodically.
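
To make "running the existing Check PM periodically" concrete, here is a minimal sketch in Python. It is not the engine's implementation (the engine is Java and, per the thread, schedules with Quartz). The names run_pm_check_pass, failed_hosts, run_periodically and CHECK_INTERVAL_SECONDS are hypothetical, and the one-hour interval is taken from the "once in an hour" remark quoted below.

```python
from typing import Callable, Dict, Iterable, List

# Assumed interval: the thread mentions the check running about once an hour.
CHECK_INTERVAL_SECONDS = 3600


def run_pm_check_pass(hosts: Iterable[str],
                      check_pm: Callable[[str], bool]) -> Dict[str, bool]:
    """Run the PM check once for every host and return host -> healthy."""
    results = {}
    for host in hosts:
        try:
            results[host] = bool(check_pm(host))
        except Exception:
            # A check that raises is treated as a failed check.
            results[host] = False
    return results


def failed_hosts(results: Dict[str, bool]) -> List[str]:
    """Return the hosts whose PM check failed, sorted by name."""
    return sorted(host for host, ok in results.items() if not ok)


def run_periodically(hosts: List[str],
                     check_pm: Callable[[str], bool],
                     on_failure: Callable[[str], None],
                     sleep: Callable[[float], None],
                     passes: int) -> None:
    """Repeat the check `passes` times, sleeping between passes.

    `sleep` is injected (e.g. time.sleep) so the loop can be exercised
    without actually waiting an hour.
    """
    for i in range(passes):
        for host in failed_hosts(run_pm_check_pass(hosts, check_pm)):
            on_failure(host)
        if i < passes - 1:
            sleep(CHECK_INTERVAL_SECONDS)
```

In a real deployment check_pm would wrap the existing "Check PM" call and on_failure would raise an alert; both are placeholders here.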

We need to look at integrating a monitoring tool and its benefits separately.

Arthur 

----- Original Message -----

> From: "Gilad Chaplik" <gchaplik at redhat.com>
> To: "Yair Zaslavsky" <yzaslavs at redhat.com>
> Cc: "Arthur Berezin" <aberezin at redhat.com>, "users" <users at ovirt.org>
> Sent: Monday, May 5, 2014 12:06:28 PM
> Subject: Re: [ovirt-users] oVirt 3.5 : "Power Management Health Check" -
> feature pages

> ----- Original Message -----
> > From: "Yair Zaslavsky" <yzaslavs at redhat.com>
> > To: "Gilad Chaplik" <gchaplik at redhat.com>
> > Cc: "Arthur Berezin" <aberezin at redhat.com>, "users" <users at ovirt.org>
> > Sent: Monday, May 5, 2014 12:00:10 PM
> > Subject: Re: [ovirt-users] oVirt 3.5 : "Power Management Health Check" -
> > feature pages
> >
> >
> >
> > ----- Original Message -----
> > > From: "Gilad Chaplik" <gchaplik at redhat.com>
> > > To: "Arthur Berezin" <aberezin at redhat.com>
> > > Cc: "users" <users at ovirt.org>, "Yair Zaslavsky" <yzaslavs at redhat.com>
> > > Sent: Monday, May 5, 2014 11:52:25 AM
> > > Subject: Re: [ovirt-users] oVirt 3.5 : "Power Management Health Check" -
> > > feature pages
> > >
> > > ----- Original Message -----
> > > > From: "Arthur Berezin" <aberezin at redhat.com>
> > > > To: "Gilad Chaplik" <gchaplik at redhat.com>
> > > > Cc: "users" <users at ovirt.org>, "Yair Zaslavsky" <yzaslavs at redhat.com>
> > > > Sent: Monday, May 5, 2014 11:30:24 AM
> > > > Subject: Re: [ovirt-users] oVirt 3.5 : "Power Management Health Check"
> > > > -
> > > > feature pages
> > > >
> > > > ----- Original Message -----
> > > >
> > > > > From: "Yair Zaslavsky" <yzaslavs at redhat.com>
> > > > > To: "Gilad Chaplik" <gchaplik at redhat.com>
> > > > > Cc: "Arthur Berezin" <aberezin at redhat.com>, "users" <users at ovirt.org>
> > > > > Sent: Monday, May 5, 2014 11:10:10 AM
> > > > > Subject: Re: [ovirt-users] oVirt 3.5 : "Power Management Health
> > > > > Check"
> > > > > -
> > > > > feature pages
> > > >
> > > > > ----- Original Message -----
> > > > > > From: "Gilad Chaplik" <gchaplik at redhat.com>
> > > > > > To: "Yair Zaslavsky" <yzaslavs at redhat.com>
> > > > > > Cc: "Arthur Berezin" <aberezin at redhat.com>, "users"
> > > > > > <users at ovirt.org>
> > > > > > Sent: Monday, May 5, 2014 10:57:01 AM
> > > > > > Subject: Re: [ovirt-users] oVirt 3.5 : "Power Management Health
> > > > > > Check"
> > > > > > -
> > > > > > feature pages
> > > > > >
> > > > > > ----- Original Message -----
> > > > > > > From: "Yair Zaslavsky" <yzaslavs at redhat.com>
> > > > > > > To: "Arthur Berezin" <aberezin at redhat.com>
> > > > > > > Cc: "Gilad Chaplik" <gchaplik at redhat.com>, "users"
> > > > > > > <users at ovirt.org>
> > > > > > > Sent: Monday, May 5, 2014 6:39:02 AM
> > > > > > > Subject: Re: [ovirt-users] oVirt 3.5 : "Power Management Health
> > > > > > > Check"
> > > > > > > -
> > > > > > > feature pages
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > ----- Original Message -----
> > > > > > > > From: "Arthur Berezin" <aberezin at redhat.com>
> > > > > > > > To: "Gilad Chaplik" <gchaplik at redhat.com>
> > > > > > > > Cc: "users" <users at ovirt.org>
> > > > > > > > Sent: Sunday, May 4, 2014 5:35:59 PM
> > > > > > > > Subject: Re: [ovirt-users] oVirt 3.5 : "Power Management Health
> > > > > > > > Check"
> > > > > > > > -
> > > > > > > > feature pages
> > > > > > > >
> > > > > > > > > In this case the engine periodically checks the health of the
> > > > > > > > > hosts' power management, as HA relies on it.
> > > > > > > >
> > > > > > > > Arthur
> > > > > > > >
> > > > > > > > ----- Original Message -----
> > > > > > > >
> > > > > > > > > From: "Gilad Chaplik" <gchaplik at redhat.com>
> > > > > > > > > To: "Eli Mesika" <emesika at redhat.com>
> > > > > > > > > Cc: "users" <users at ovirt.org>, "Arthur Berezin"
> > > > > > > > > <aberezin at redhat.com>
> > > > > > > > > Sent: Sunday, May 4, 2014 5:26:45 PM
> > > > > > > > > Subject: Re: [ovirt-users] oVirt 3.5 : "Power Management
> > > > > > > > > Health
> > > > > > > > > Check"
> > > > > > > > > -
> > > > > > > > > feature pages
> > > > > > > >
> > > > > > > > > Hi Eli,
> > > > > > > >
> > > > > > > > > Here is my comment :)
> > > > > > > > > > Why does the engine need to send the status health check?
> > > > > > > > > > Isn't there a 3rd party that does it, which we could
> > > > > > > > > > integrate with?
> > > > > > > > > > If one is found, it probably has fewer (known) bugs and more
> > > > > > > > > > features, and it is already written, tested and documented,
> > > > > > > > > > allows further integration, and probably deals with scale.
> > > > > > > >
> > > > > > > > > btw, fixed some typos in your pages :-)
> > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Gilad.
> > > > > > >
> > > > > > > > Hi, which 3rd party, for example, are you referring to?
> > > > > > > > The PM code already exists in the engine,
> > > > > > > > and you're also using Quartz for scheduling.
> > > > > > >
> > > > > >
> > > > > > Yair,
> > > > > >
> > > > > > > You are raising some good points, but IMO the entire host
> > > > > > > monitoring (incl. getVdsStats, etc.) should be externalized.
> > > > > > > There are 2 major issues that we still don't cover:
> > > > > > > - No HA for monitoring: who checks the hosts when the engine is
> > > > > > > down?
> > > > > > > - No scale: the engine is a bottleneck in network and compute.
> > > > > > > Although the above is a huge architectural change, we need to
> > > > > > > start somewhere, and this feature sounds like a candidate for
> > > > > > > introducing it.
> > > > > >
> > > > > > About the examples:
> > > > > > http://sixrevisions.com/tools/10-free-server-network-monitoring-tools-that-kick-ass/
> > > > > > > If my suggestion is taken, the main goal of the feature is to
> > > > > > > select the most appropriate one.
> > > > > >
> > > > > > Thanks,
> > > > > > Gilad.
> > > >
> > > > > > Well, Nagios is being considered, or is already used, by the Gluster
> > > > > > guys.
> > > > > > However, it will still require (AFAIK) coding some Nagios plugin to
> > > > > > perform the health check.
> > > > > > In addition, you will have to somehow report the state change to the
> > > > > > engine.
> > > > > > IMHO, this is a bit of an overkill (look also at how often the check
> > > > > > is run: once an hour, so it can't be compared to getVmStats).
> > > > +1
> > > > These monitoring tools bring a lot of value, and there are some initial
> > > > integrations that we might want to look into[1][2].
> > > > But it's an overkill for this RFE - run "PM Check" periodically, in
> > > > addition
> > > > to initial PM check at host setup stage.
> > > >
> > > > [1] https://github.com/monitoring-ui-plugin/development
> > > > [2]
> > > > http://exchange.nagios.org/directory/Plugins/Operating-Systems/*-Virtual-Environments/Others/check_rhev3/details
> > >
> > >
> > > > -1 on overkill.
> > > > As I mentioned, proper monitoring is a huge feature; it should be
> > > > introduced gradually, and IMO this is a good starting point.
> > > > We can look at it as an overkill _or_ as a springboard that will reduce
> > > > the learning curve and future integration issues.
> >
> > > IMHO this will also increase deployment complexity, and require our
> > > customers to have another component installed.
> > > The chances of bugs here (as you previously mentioned) are IMHO equal to
> > > the chances of bugs occurring due to developing a custom Nagios plugin.

> I appreciate your involvement and rapid responses :-)
> I get your point, but the bugs for this specific feature will be nothing
> compared to the bugs we'll get once we fully integrate an external monitoring
> process (in terms of priority and severity). This is what I mean by
> 'learning curve and future integration issues'.

> >
> > >
> > > Gilad.
> > >
> > > >
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > > ----- Original Message -----
> > > > > > > > > > From: "Eli Mesika" <emesika at redhat.com>
> > > > > > > > > > To: "users" <users at ovirt.org>
> > > > > > > > > > Cc: "Arthur Berezin" <aberezin at redhat.com>
> > > > > > > > > > Sent: Sunday, May 4, 2014 12:18:47 PM
> > > > > > > > > > Subject: [ovirt-users] oVirt 3.5 : "Power Management Health
> > > > > > > > > > Check"
> > > > > > > > > > -
> > > > > > > > > > feature pages
> > > > > > > > > >
> > > > > > > > > > Hi
> > > > > > > > > >
> > > > > > > > > > The following wiki pages were added to the "Power
> > > > > > > > > > Management
> > > > > > > > > > Health
> > > > > > > > > > Check"
> > > > > > > > > > feature planned for oVirt 3.5
> > > > > > > > > >
> > > > > > > > > > http://www.ovirt.org/Features/PMHealthCheck
> > > > > > > > > > http://www.ovirt.org/Features/Design/DetailedPMHealthCheck
> > > > > > > > > >
> > > > > > > > > > > Your comments/questions are most welcome.
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > > Eli Mesika
> > > > > > > > > > _______________________________________________
> > > > > > > > > > Users mailing list
> > > > > > > > > > Users at ovirt.org
> > > > > > > > > > http://lists.ovirt.org/mailman/listinfo/users
> > > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > >
> >