<div dir="ltr"><div class="gmail_extra"><blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex" class="gmail_quote">I think that will change a lot on a per-project basis. If we can get that info per<br>
job, then with grafana we can aggregate and create secondary stats (like builds<br>
per hour, as you say).<br>
So I'd say just to collect the 'bare' data, like job built event, job ended,<br>
duration and such.</blockquote><div class="gmail_quote">Agreed. That needs improving: right now it polls Jenkins every X seconds via the CLI,<br>instead of Jenkins sending the events, so it is limited to what the CLI can<br>provide and is not very efficient. I plan to install [1] and do the opposite<br></div><div class="gmail_quote">(Jenkins will send a POST request with the data on each build<br></div><div class="gmail_quote">event, and that data would then be sent to graphite).<br><br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div id=":173" class="">Have you checked the current ds fabric checks?<br>
There are already a bunch of fabric tasks that monitor <span class="">jenkins</span>, if we install<br>
the nagiosgraph (see ds for details) to send the nagios performance data into<br>
graphite, we can use them as is to also start alarms and such<br></div></blockquote>Icinga2 has integrated graphite support, so after the upgrade we will<br></div><div class="gmail_quote">get all of our alarm data sent to graphite 'out-of-the-box'.<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div id=":173" class="">
<br>
dcaro@akhos$ fab -l | grep nagi<br>
do.<span class="">jenkins</span>.nagios.check_build_load Checks if the bui...<br>
do.<span class="">jenkins</span>.nagios.check_executors Checks if the exe...<br>
do.<span class="">jenkins</span>.nagios.check_queue Check if the buil...<br>
do.provision.nagios_check Show a summary of...<br>
<br>
Though those will not give you the bare data (were designed with nagios in<br>
mind, not graphite so they are just checks, the stats were added later)<br>
<br>
There's also a bunch of helper functions to create nagios checks too.</div></blockquote></div><br>Cool, I wasn't aware of those fabric checks.<br>I think for simple metrics (loads and such) we could use that (i.e. query Jenkins from fabric),<br></div><div class="gmail_extra">but for more complicated queries we'd need to query graphite itself.<br></div><div class="gmail_extra">With this [2] I could create scripts that query graphite and trigger Icinga alerts,<br></div><div class="gmail_extra">such as: calculate the 'expected' slave load for the next hour (in graphite)<br></div><div class="gmail_extra">and then:<br></div><div class="gmail_extra">Icinga queries graphite -> triggers another Icinga alert -> triggers a custom script (such as<br></div><div class="gmail_extra">a fab task to create slaves).<br><br>For now, I added two more metrics: the top 10 jobs in the past X time, and<br></div><div class="gmail_extra">the avg number of builds running / builds waiting in the queue in the past X time.<br>Some metrics might 'glitch' from time to time, as there is not a lot of data yet<br>and it mainly counts integer values while graphite is oriented towards<br>floats, so the data has to be smoothed (usually with movingAverage()).<br></div><div class="gmail_extra"><br></div><div class="gmail_extra"><br><br></div>[1] <a href="https://wiki.jenkins-ci.org/display/JENKINS/Statistics+Notification+Plugin">https://wiki.jenkins-ci.org/display/JENKINS/Statistics+Notification+Plugin</a><br>[2] <a href="https://github.com/klen/graphite-beacon">https://github.com/klen/graphite-beacon</a></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Apr 15, 2016 at 9:39 AM, David Caro <span dir="ltr"><<a href="mailto:dcaro@redhat.com" target="_blank">dcaro@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On 04/15 01:24, Nadav Goldin wrote:<br>
> Hi,<br>
> I've created an experimental dashboard for Jenkins at our Grafana instance:<br>
> <a href="http://graphite.phx.ovirt.org/dashboard/db/jenkins-monitoring" rel="noreferrer" target="_blank">http://graphite.phx.ovirt.org/dashboard/db/jenkins-monitoring</a><br>
> (if you don't have an account, you can enrol with github/google)<br>
<br>
</span>Nice! \o/<br>
<div><div class="h5"><br>
><br>
> currently it collects the following metrics:<br>
> 1) How many jobs in the Build Queue are waiting per slaves' label:<br>
><br>
> for instance: if there are 4 builds of a job that is restricted to 'el7'<br>
> and 2 builds of another job<br>
> which is restricted to 'el7' in the build queue we will see 6 for 'el7' in<br>
> the first graph.<br>
> 'No label' sums jobs which are waiting but are unrestricted.<br>
><br>
> 2) How many slaves are idle per label.<br>
> note that the slave's labels are contained in the job's labels, but not<br>
> vice versa, as we allow label expressions such as (fc21 || fc22). right now<br>
> it treats them as simple strings.<br>
><br>
> 3) Total number of online/offline/idle slaves<br>
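<br>
A minimal sketch of metric 1 above (builds waiting in the queue per label), assuming the<br>
queue has already been fetched and reduced to (job name, label) pairs. That input shape,<br>
the function name and the sample job names are hypothetical and not taken from the actual<br>
collector:<br>
<pre>
```python
from collections import Counter

# Bucket name used on the dashboard for unrestricted jobs.
NO_LABEL = "No label"


def count_waiting_per_label(queued_builds):
    """Count queued builds per slave label.

    queued_builds: iterable of (job_name, label) pairs, where label is the
    job's label string, or None when the job is unrestricted. Labels are
    treated as plain strings, with no label-expression parsing.
    """
    counts = Counter()
    for _job_name, label in queued_builds:
        counts[label if label else NO_LABEL] += 1
    return dict(counts)


# 4 builds of one 'el7' job, 2 builds of another 'el7' job, 1 unrestricted job.
queue = [("job_a", "el7")] * 4 + [("job_b", "el7")] * 2 + [("job_c", None)]
print(count_waiting_per_label(queue))
```
</pre>
With the sample queue this yields 6 for 'el7' and 1 for 'No label', matching the example given for the first graph.<br>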
><br>
> besides the normal monitoring, it can help us:<br>
> 1) minimize the difference between 'idle' slaves per label and jobs waiting<br>
> in the build queue per label.<br>
> this might be caused by unnecessary restrictions on the label, or maybe by<br>
> the<br>
> 'Throttle Concurrent Builds' plugin.<br>
> 2) decide how many VMs and which OS to install on the new hosts.<br>
> 3) in the future, once we have the 'slave pools' implemented, we could<br>
> implement<br>
> auto-scaling based on thresholds or some other function.<br>
><br>
><br>
> 'experimental' - as it still needs to be tested for stability (it is based<br>
> on python-jenkins and graphite-send), and more metrics can be added (maybe<br>
> avg running time per job? builds per hour?) - I will be happy to hear suggestions.<br>
<br>
</div></div>I think that will change a lot on a per-project basis. If we can get that info per<br>
job, then with grafana we can aggregate and create secondary stats (like builds<br>
per hour, as you say).<br>
So I'd say just to collect the 'bare' data, like job built event, job ended,<br>
duration and such.<br>
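<br>
A rough sketch of what collecting the 'bare' data could look like: turning one finished-build<br>
event into lines for graphite's plaintext protocol ("path value timestamp"). The metric path<br>
layout, the function name and the sample job below are assumptions for illustration, not an<br>
existing naming scheme:<br>
<pre>
```python
def build_event_to_graphite_lines(job_name, duration_seconds, finished_at,
                                  prefix="jenkins.jobs"):
    """Format one finished build as graphite plaintext lines.

    finished_at is a Unix timestamp in seconds. The 'prefix' default and the
    '.finished' / '.duration' suffixes are hypothetical names.
    """
    # Dots separate path components in graphite, so keep them out of the job name.
    safe_name = job_name.replace(".", "_").replace(" ", "_")
    base = "{}.{}".format(prefix, safe_name)
    timestamp = int(finished_at)
    return [
        "{}.finished 1 {}".format(base, timestamp),
        "{}.duration {} {}".format(base, duration_seconds, timestamp),
    ]


for line in build_event_to_graphite_lines("some job_master", 312.5, 1460700000):
    print(line)
```
</pre>
Secondary stats such as builds per hour could then be derived in grafana from the '.finished' series instead of being computed by the collector.<br>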
<span class=""><br>
><br>
> I plan later to pack it all into independent fabric tasks(i.e. fab<br>
> do.jenkins.slaves.show)<br>
<br>
</span>Have you checked the current ds fabric checks?<br>
There are already a bunch of fabric tasks that monitor jenkins, if we install<br>
the nagiosgraph (see ds for details) to send the nagios performance data into<br>
graphite, we can use them as is to also start alarms and such.<br>
<br>
dcaro@akhos$ fab -l | grep nagi<br>
do.jenkins.nagios.check_build_load Checks if the bui...<br>
do.jenkins.nagios.check_executors Checks if the exe...<br>
do.jenkins.nagios.check_queue Check if the buil...<br>
do.provision.nagios_check Show a summary of...<br>
<br>
Though those will not give you the bare data (were designed with nagios in<br>
mind, not graphite so they are just checks, the stats were added later)<br>
<br>
There's also a bunch of helper functions to create nagios checks too.<br>
<div class="HOEnZb"><div class="h5"><br>
<br>
><br>
><br>
> Nadav<br>
<br>
<br>
<br>
</div></div><span class="HOEnZb"><font color="#888888">--<br>
David Caro<br>
<br>
Red Hat S.L.<br>
Continuous Integration Engineer - EMEA ENG Virtualization R&D<br>
<br>
Tel.: <a href="tel:%2B420%20532%20294%20605" value="+420532294605">+420 532 294 605</a><br>
Email: <a href="mailto:dcaro@redhat.com">dcaro@redhat.com</a><br>
IRC: dcaro|dcaroest@{freenode|oftc|redhat}<br>
Web: <a href="http://www.redhat.com" rel="noreferrer" target="_blank">www.redhat.com</a><br>
RHT Global #: 82-62605<br>
</font></span></blockquote></div><br></div>