On 04/15 01:24, Nadav Goldin wrote:
Hi,
I've created an experimental dashboard for Jenkins at our Grafana instance:
http://graphite.phx.ovirt.org/dashboard/db/jenkins-monitoring
(if you don't have an account, you can enrol with github/google)
Nice! \o/

currently it collects the following metrics:

1) How many jobs in the Build Queue are waiting per slave label.

   For instance: if the build queue holds 4 builds of a job that is
   restricted to 'el7' and 2 builds of another job which is restricted to
   'el7', we will see 6 for 'el7' in the first graph. 'No label' sums jobs
   which are waiting but are unrestricted.

2) How many slaves are idle per label.

   Note that the slaves' labels are contained in the jobs' labels, but not
   vice versa, as we allow regex expressions such as (fc21 || fc22). Right
   now it treats them as simple strings.

3) Total number of online/offline/idle slaves.
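For reference, a minimal sketch of how metric 2 (idle slaves per label) could be collected with python-jenkins. The counting is split into a pure function so it can be exercised without a live Jenkins; the 'idle', 'offline' and 'assignedLabels' field names follow the Jenkins computer API JSON, and the server URL and credentials in the comment are placeholders:

```python
from collections import Counter


def idle_slaves_per_label(node_infos):
    """Count idle, online slaves per label.

    `node_infos` are dicts shaped like the Jenkins computer API JSON
    (fields: 'idle', 'offline', 'assignedLabels').
    """
    counts = Counter()
    for info in node_infos:
        if info.get('idle') and not info.get('offline'):
            for label in info.get('assignedLabels', []):
                counts[label['name']] += 1
    return counts


# Against a live server it would be fed roughly like this
# (placeholder URL and credentials):
#   import jenkins  # python-jenkins
#   server = jenkins.Jenkins('http://jenkins.example.org', 'user', 'token')
#   infos = [server.get_node_info(n['name']) for n in server.get_nodes()]
#   print(idle_slaves_per_label(infos))
```

Note that this counts labels as plain strings, exactly the simplification described above for regex label expressions.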
besides the normal monitoring, it can help us:
1) minimize the difference between 'idle' slaves per label and jobs waiting
   in the build queue per label. This gap might be caused by unnecessary
   restrictions on the label, or maybe by the 'Throttle Concurrent Builds'
   plugin.
2) decide how many VMs and which OS to install on the new hosts.
3) in the future, once we have the 'slave pools' implemented, we could
   implement auto-scaling based on thresholds or some other function.
'experimental' - as it still needs to be tested for stability (it is based
on python-jenkins and graphite-send), and more metrics can be added (maybe
avg running time per job? builds per hour?) - I will be happy to hear
suggestions.
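For context, graphite-send speaks Graphite's standard plaintext protocol (one "path value timestamp" line per metric, usually to TCP port 2003). A minimal sketch of that protocol, with the formatting kept as a pure function so it is testable offline; the host name and metric path are made up:

```python
import socket
import time


def graphite_line(path, value, timestamp=None):
    """Format one metric in Graphite's plaintext protocol:
    'path value timestamp\\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return '%s %s %d\n' % (path, value, timestamp)


def send_metric(path, value, host='graphite.example.org', port=2003):
    """Push one metric over a short-lived TCP connection (fine for low
    volume; graphite-send wraps essentially this)."""
    line = graphite_line(path, value)
    sock = socket.create_connection((host, port), timeout=5)
    try:
        sock.sendall(line.encode('ascii'))
    finally:
        sock.close()


# e.g. send_metric('jenkins.build_queue.el7', 6)
```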
I think that will change a lot on a per-project basis. If we can get that
info per job, then with grafana we can aggregate and create secondary stats
(like builds per hour, as you say). So I'd say to just collect the 'bare'
data: job built events, job ended events, durations and such.
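One possible shape for that 'bare' data, sketched under assumptions: the metric path scheme (jenkins.jobs.<job>.finished / .duration) is invented here, not an existing convention. Emitting a `1` on every finished build means builds-per-hour can later be derived in graphite with summarize(series, "1hour", "sum"):

```python
import time


def build_finished_metrics(job_name, duration_seconds, timestamp=None):
    """Return the 'bare' per-job datapoints for one finished build:
    an event tick plus the build duration, as (path, value, timestamp)
    tuples ready for the Graphite plaintext protocol."""
    if timestamp is None:
        timestamp = int(time.time())
    # Dots and spaces would split the metric path, so normalize the job name.
    safe_name = job_name.replace('.', '_').replace(' ', '_')
    return [
        ('jenkins.jobs.%s.finished' % safe_name, 1, timestamp),
        ('jenkins.jobs.%s.duration' % safe_name, duration_seconds, timestamp),
    ]
```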

I plan later to pack it all into independent fabric tasks (i.e. fab
do.jenkins.slaves.show).
Have you checked the current ds fabric checks? There are already a bunch of
fabric tasks that monitor jenkins; if we install nagiosgraph (see ds for
details) to send the nagios performance data into graphite, we can use them
as is to also start alarms and such.
dcaro@akhos$ fab -l | grep nagi
    do.jenkins.nagios.check_build_load   Checks if the bui...
    do.jenkins.nagios.check_executors    Checks if the exe...
    do.jenkins.nagios.check_queue        Check if the buil...
    do.provision.nagios_check            Show a summary of...
Though those will not give you the bare data (they were designed with
nagios in mind, not graphite, so they are just checks; the stats were added
later). There's also a bunch of helper functions to create nagios checks.
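For context, the Nagios plugin convention those checks follow is simple: exit code 0/1/2 for OK/WARNING/CRITICAL, and performance data after a '|' in the output, which is what nagiosgraph parses into graphite. A generic sketch (the metric name and thresholds are made up, and the real ds helpers may well differ):

```python
def nagios_check(label, value, warn, crit):
    """Return (exit_code, output) per the Nagios plugin convention.

    The perfdata after '|' ('label=value;warn;crit') is what nagiosgraph
    can feed into graphite.
    """
    if value >= crit:
        status, code = 'CRITICAL', 2
    elif value >= warn:
        status, code = 'WARNING', 1
    else:
        status, code = 'OK', 0
    output = '%s %s: %d | %s=%d;%d;%d' % (
        status, label, value, label, value, warn, crit)
    return code, output


# e.g. nagios_check('build_queue', 25, warn=20, crit=50)
#      -> (1, 'WARNING build_queue: 25 | build_queue=25;20;50')
```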
Nadav
_______________________________________________
Infra mailing list
Infra(a)ovirt.org
http://lists.ovirt.org/mailman/listinfo/infra
--
David Caro
Red Hat S.L.
Continuous Integration Engineer - EMEA ENG Virtualization R&D
Tel.: +420 532 294 605
Email: dcaro(a)redhat.com
IRC: dcaro|dcaroest@{freenode|oftc|redhat}
Web: www.redhat.com
RHT Global #: 82-62605