On 04/17 23:55, Nadav Goldin wrote:
>
> I think that will change a lot on a per-project basis; if we can get that
> info per job, with grafana we can then aggregate and create secondary
> stats (like builds per hour as you say).
> So I'd say just collect the 'bare' data, like job built event, job ended,
> duration and such.

Agreed, will need to improve that. Right now it 'pulls' every X seconds via
the CLI instead of Jenkins sending the events, so it is limited to what the
CLI can provide and not that efficient. I plan to install [1] and do the
opposite (Jenkins will send a POST request with the data on each build event,
and it would then be sent to graphite).
Amarchuk already had some ideas on integrating collectd with jenkins; imo that
will work well for 'master' related stats and will be more difficult for
others like job started, etc., but it's worth looking at.
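To make that flow concrete, the receiving end could be as simple as the
sketch below - the plugin's payload fields, the port and the metric names
here are just guesses to illustrate the idea, not what [1] actually sends:

# Sketch of a receiver for the push model described above: Jenkins POSTs a
# JSON build event and we relay a counter to graphite's plaintext listener.
# The payload fields ('name', 'phase'), the port and the metric naming are
# guesses, not the plugin's documented format.
import json
import socket
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

GRAPHITE_HOST = 'graphite.phx.ovirt.org'  # assumed graphite host
GRAPHITE_PORT = 2003                      # graphite plaintext protocol port


def send_metric(path, value):
    # One line per metric: "<path> <value> <timestamp>\n"
    line = '%s %s %d\n' % (path, value, int(time.time()))
    with socket.create_connection((GRAPHITE_HOST, GRAPHITE_PORT)) as sock:
        sock.sendall(line.encode('ascii'))


class BuildEventHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get('Content-Length', 0))
        event = json.loads(self.rfile.read(length) or b'{}')
        job = event.get('name', 'unknown').replace('.', '_')
        phase = event.get('phase', 'unknown').lower()  # e.g. started/finished
        # One counter increment per build event, e.g. jenkins.jobs.foo.started
        send_metric('jenkins.jobs.%s.%s' % (job, phase), 1)
        self.send_response(200)
        self.end_headers()


if __name__ == '__main__':
    HTTPServer(('0.0.0.0', 8000), BuildEventHandler).serve_forever()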

> Have you checked the current ds fabric checks?
> There are already a bunch of fabric tasks that monitor jenkins; if we
> install nagiosgraph (see ds for details) to send the nagios performance
> data into graphite, we can use them as-is to also start alarms and such.
>
Icinga2 has integrated graphite support, so after the upgrade we will get
all of our alarm data sent to graphite 'out-of-the-box'.
+1!

>
> dcaro@akhos$ fab -l | grep nagi
>     do.jenkins.nagios.check_build_load    Checks if the bui...
>     do.jenkins.nagios.check_executors     Checks if the exe...
>     do.jenkins.nagios.check_queue         Check if the buil...
>     do.provision.nagios_check             Show a summary of...
>
> Though those will not give you the bare data (they were designed with
> nagios in mind, not graphite, so they are just checks; the stats were
> added later).
>
> There's also a bunch of helper functions to create nagios checks too.
>

Cool, I wasn't aware of those fabric checks.
I think for simple metrics (loads and such) we could use that (i.e. query
Jenkins from fabric), but for more complicated queries we'd need to query
graphite itself. With this [2] I could create scripts that query graphite
and trigger Icinga alerts, such as: calculate the 'expected' slave load for
the next hour (in graphite) and then:
Icinga queries graphite -> triggers another Icinga alert -> triggers a custom
script (such as a fab task to create slaves).
I'd be careful with the reactions for now, but yes, that's great.
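A rough sketch of what such a check script could look like (the graphite
target and the thresholds are made up, only to show the shape of it):

#!/usr/bin/env python
# Sketch of a check script: query graphite's render API and exit with
# nagios/icinga style codes so it can be plugged in as a check command.
# The target expression and the thresholds are made up for illustration.
import json
import sys
import urllib.parse
import urllib.request

GRAPHITE = 'http://graphite.phx.ovirt.org'
TARGET = 'movingAverage(jenkins.queue.el7.waiting, 10)'  # hypothetical metric
WARN, CRIT = 5, 10

url = '%s/render?target=%s&from=-1hours&format=json' % (
    GRAPHITE, urllib.parse.quote(TARGET))
series = json.load(urllib.request.urlopen(url))
if not series:
    print('UNKNOWN: no data returned for %s' % TARGET)
    sys.exit(3)

# Latest non-null datapoint of the first returned series.
points = [value for value, _ in series[0]['datapoints'] if value is not None]
latest = points[-1] if points else 0

if latest >= CRIT:
    print('CRITICAL: %s = %s' % (TARGET, latest))
    sys.exit(2)
if latest >= WARN:
    print('WARNING: %s = %s' % (TARGET, latest))
    sys.exit(1)
print('OK: %s = %s' % (TARGET, latest))
sys.exit(0)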

For now I added two more metrics: top 10 jobs in the past X time, and the avg
number of builds running / builds waiting in the queue in the past X time.
Some metrics might 'glitch' from time to time as there is not a lot of data
yet, and it mainly counts integer values while graphite is oriented towards
floats, so the data has to be smoothed (usually with movingAverage()).
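For example, the kind of smoothing/aggregation I mean, as render targets
(the metric paths are placeholders, not the real series names):

# The kind of smoothing/aggregation meant above, expressed as graphite render
# targets (metric paths are placeholders for whatever naming scheme is used).
from urllib.parse import urlencode

targets = [
    # Smooth the raw integer queue counter over a 10-point window.
    'movingAverage(jenkins.queue.el7.waiting, 10)',
    # Builds per hour, derived from the raw build-started events.
    'summarize(jenkins.jobs.*.started, "1hour", "sum")',
]

url = 'http://graphite.phx.ovirt.org/render?' + urlencode(
    [('target', t) for t in targets] + [('from', '-24hours'), ('format', 'png')])
print(url)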

[1] https://wiki.jenkins-ci.org/display/JENKINS/Statistics+Notification+Plugin
[2] https://github.com/klen/graphite-beacon

On Fri, Apr 15, 2016 at 9:39 AM, David Caro <dcaro(a)redhat.com> wrote:
> On 04/15 01:24, Nadav Goldin wrote:
> > Hi,
> > I've created an experimental dashboard for Jenkins at our Grafana
> > instance:
> > http://graphite.phx.ovirt.org/dashboard/db/jenkins-monitoring
> > (if you don't have an account, you can enrol with github/google)
>
> Nice! \o/
>
> >
> > currently it collects the following metrics:
> > 1) How many jobs in the Build Queue are waiting per slaves' label:
> >
> > for instance: if there are 4 builds of a job that is restricted to 'el7'
> > and 2 builds of another job which is restricted to 'el7' in the build
> > queue, we will see 6 for 'el7' in the first graph.
> > 'No label' sums jobs which are waiting but are unrestricted.
> >
> > 2) How many slaves are idle per label.
> > note that the slave's labels are contained in the job's labels, but not
> > vice versa, as we allow regex expressions such as (fc21 || fc22). right
> > now it treats them as simple strings.
> >
> > 3) Total number of online/offline/idle slaves
> >
> > besides the normal monitoring, it can help us:
> > 1) minimize the difference between 'idle' slaves per label and jobs
> > waiting in the build queue per label.
> > this might be caused by unnecessary restrictions on the label, or maybe
> > by the 'Throttle Concurrent Builds' plugin.
> > 2) decide how many VMs and which OS to install on the new hosts.
> > 3) in the future, once we have the 'slave pools' implemented, we could
> > implement auto-scaling based on thresholds or some other function.
> >
> >
> > 'experimental' - as it still needs to be tested for stability (it is based
> > on python-jenkins and graphite-send) and also more metrics can be added
> > (maybe avg running time per job? builds per hour?) - will be happy to hear.
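(Just to make the per-label idea concrete, something along these lines with
python-jenkins and graphitesend - a simplified sketch with placeholder names,
not the actual script:)

# Simplified sketch of the per-label collection idea (not the actual script):
# count idle slaves per label with python-jenkins and push the counts with
# graphitesend. Server addresses, credentials and metric names are placeholders.
import collections

import graphitesend
import jenkins

server = jenkins.Jenkins('http://jenkins.ovirt.org',
                         username='monitor', password='api-token')
g = graphitesend.init(graphite_server='graphite.phx.ovirt.org',
                      prefix='jenkins')

idle_per_label = collections.Counter()
for node in server.get_nodes():
    # Skip the master and anything offline; we only care about idle slaves.
    if node['name'] == 'master' or node['offline']:
        continue
    info = server.get_node_info(node['name'])
    if not info.get('idle'):
        continue
    # A slave usually carries several labels; count it once per label.
    for label in info.get('assignedLabels', []):
        idle_per_label[label['name']] += 1

for label, count in idle_per_label.items():
    g.send('slaves.idle.%s' % label, count)
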
>
> I think that will change a lot on a per-project basis; if we can get that
> info per job, with grafana we can then aggregate and create secondary
> stats (like builds per hour as you say).
> So I'd say just collect the 'bare' data, like job built event, job ended,
> duration and such.
>
> >
> > I plan later to pack it all into independent fabric tasks (i.e. fab
> > do.jenkins.slaves.show)
>
> Have you checked the current ds fabric checks?
> There are already a bunch of fabric tasks that monitor jenkins; if we
> install nagiosgraph (see ds for details) to send the nagios performance
> data into graphite, we can use them as-is to also start alarms and such.
>
> dcaro@akhos$ fab -l | grep nagi
>     do.jenkins.nagios.check_build_load    Checks if the bui...
>     do.jenkins.nagios.check_executors     Checks if the exe...
>     do.jenkins.nagios.check_queue         Check if the buil...
>     do.provision.nagios_check             Show a summary of...
>
> Though those will not give you the bare data (they were designed with
> nagios in mind, not graphite, so they are just checks; the stats were
> added later).
>
> There's also a bunch of helper functions to create nagios checks too.
>
>
> >
> >
> > Nadav
>
> > _______________________________________________
> > Infra mailing list
> > Infra(a)ovirt.org
> > http://lists.ovirt.org/mailman/listinfo/infra
>
>
> --
> David Caro
>
> Red Hat S.L.
> Continuous Integration Engineer - EMEA ENG Virtualization R&D
>
> Tel.: +420 532 294 605
> Email: dcaro(a)redhat.com
> IRC: dcaro|dcaroest@{freenode|oftc|redhat}
> Web: www.redhat.com
> RHT Global #: 82-62605
>
_______________________________________________
Infra mailing list
Infra(a)ovirt.org
http://lists.ovirt.org/mailman/listinfo/infra
--
David Caro
Red Hat S.L.
Continuous Integration Engineer - EMEA ENG Virtualization R&D
Tel.: +420 532 294 605
Email: dcaro(a)redhat.com
IRC: dcaro|dcaroest@{freenode|oftc|redhat}
Web: www.redhat.com
RHT Global #: 82-62605