Hi,

I've created an experimental dashboard for Jenkins on our Grafana instance:
http://graphite.phx.ovirt.org/dashboard/db/jenkins-monitoring

(if you don't have an account, you can enroll with GitHub/Google)

Currently it collects the following metrics:

1) How many builds in the build queue are waiting, per slave label:

For instance, if the build queue holds 4 builds of a job restricted to 'el7' and 2 builds of
another job also restricted to 'el7', the first graph will show 6 for 'el7'.
'No label' counts builds that are waiting but are not restricted to any label.
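The per-label count is essentially a histogram over the queue. Below is a minimal Python sketch. It assumes each queued build has already been resolved to the label its job is restricted to (None when unrestricted). How that label is fetched from Jenkins is left out, and the function name is hypothetical:

```python
from collections import Counter
from typing import Iterable, Optional


def queued_per_label(queue_labels: Iterable[Optional[str]]) -> dict:
    """Count queued builds per restriction label.

    Each element is the label a queued build is restricted to, or None
    (or an empty string) for an unrestricted build, which is reported
    under 'No label'.
    """
    counts = Counter(label if label else "No label" for label in queue_labels)
    return dict(counts)


# 4 builds of one el7 job, 2 builds of another el7 job, 1 unrestricted build
queue = ["el7"] * 4 + ["el7"] * 2 + [None]
print(queued_per_label(queue))  # {'el7': 6, 'No label': 1}
```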

2) How many slaves are idle per label.

Note that every slave label also appears among the job labels, but not vice versa, because
jobs can use label expressions such as (fc21 || fc22). For now, such expressions are treated
as plain strings.
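A job may be restricted to an expression like (fc21 || fc22) while each slave carries only plain labels such as fc21, so a plain string comparison never matches them. The sketch below counts idle slaves per label and adds a deliberately naive matcher that understands only `||` and outer parentheses. Real Jenkins label expressions also support operators such as `&&` and `!`, which this does not handle. The node dict shape and both function names are hypothetical:

```python
from collections import Counter


def idle_per_label(nodes):
    """Count idle, online slaves per label.

    `nodes` is a list of dicts with hypothetical keys 'labels'
    (list of str), 'idle' (bool) and 'offline' (bool).
    """
    counts = Counter()
    for node in nodes:
        if node["idle"] and not node["offline"]:
            counts.update(node["labels"])
    return dict(counts)


def matches_or_expression(expression, slave_labels):
    """Naively match an OR-only expression such as '(fc21 || fc22)'.

    Only '||' and surrounding parentheses are understood; '&&', '!'
    and nested expressions are NOT handled.
    """
    alternatives = expression.strip().strip("()").split("||")
    return any(alt.strip() in slave_labels for alt in alternatives)


nodes = [
    {"labels": ["fc21"], "idle": True, "offline": False},
    {"labels": ["fc22"], "idle": True, "offline": False},
    {"labels": ["el7"], "idle": False, "offline": False},
]
print(idle_per_label(nodes))  # {'fc21': 1, 'fc22': 1}
expr = "(fc21 || fc22 )"
print(sum(matches_or_expression(expr, n["labels"])
          for n in nodes if n["idle"] and not n["offline"]))  # 2
```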

3) Total number of online/offline/idle slaves
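The totals need only each slave's online/offline/idle flags. The minimal sketch below assumes hypothetical node records shaped as in the example. Since the dashboard reads from Graphite, it also formats the values using Graphite's plaintext protocol: one `<metric path> <value> <unix timestamp>` line per metric, sent over TCP (port 2003 by default). It uses a raw socket instead of graphite-send, and the `jenkins.slaves.*` metric paths are made up for illustration:

```python
import socket
import time


def slave_totals(nodes):
    """Total online/offline/idle slaves.

    `nodes` is a list of dicts with hypothetical keys 'idle' and
    'offline' (both bool). Only online slaves are counted as idle.
    """
    offline = sum(1 for n in nodes if n["offline"])
    idle = sum(1 for n in nodes if n["idle"] and not n["offline"])
    return {"online": len(nodes) - offline, "offline": offline, "idle": idle}


def to_plaintext(totals, prefix="jenkins.slaves", timestamp=None):
    """Format metrics as Graphite plaintext lines: '<path> <value> <timestamp>'."""
    ts = int(timestamp if timestamp is not None else time.time())
    return "".join(f"{prefix}.{name} {value} {ts}\n" for name, value in totals.items())


def send_to_graphite(payload, host, port=2003):
    """Send plaintext lines to a Graphite (carbon) listener over TCP."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload.encode("ascii"))


nodes = [
    {"idle": True, "offline": False},
    {"idle": False, "offline": False},
    {"idle": True, "offline": True},
]
totals = slave_totals(nodes)
print(totals)  # {'online': 2, 'offline': 1, 'idle': 1}
print(to_plaintext(totals, timestamp=1450000000), end="")
```

Counting idle only among online slaves keeps the three numbers consistent: an offline slave has no running builds, but it is not capacity we can use.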

Besides normal monitoring, it can help us:

1) Minimize the difference between idle slaves per label and builds waiting in the build queue per label.
Such a gap might be caused by unnecessary label restrictions, or maybe by the
'Throttle Concurrent Builds' plugin.
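One way to read the two per-label graphs together is to compare, for each label, the waiting builds with the idle slaves. Builds waiting on a label that still has idle slaves suggest a restriction or throttling problem rather than missing capacity. Builds waiting with no idle slaves suggest a real shortage. The minimal sketch below uses a hypothetical function name and takes both per-label counts as plain dicts. The hints are rough heuristics, not a diagnosis:

```python
def label_gaps(waiting, idle):
    """Compare waiting builds and idle slaves for every label.

    `waiting` and `idle` map label -> count. Returns
    label -> (waiting, idle, hint), where hint is a heuristic.
    """
    report = {}
    for label in sorted(set(waiting) | set(idle)):
        w, i = waiting.get(label, 0), idle.get(label, 0)
        if w and i:
            hint = "waiting despite idle slaves: check restrictions/throttling"
        elif w:
            hint = "no idle slaves: possible capacity shortage"
        elif i:
            hint = "idle capacity"
        else:
            hint = "no activity"
        report[label] = (w, i, hint)
    return report


for label, row in label_gaps({"el7": 6, "fc21": 2}, {"el7": 1, "fc22": 3}).items():
    print(label, row)
```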

2) Decide how many VMs, and with which OS, to install on the new hosts.

3) In the future, once we have 'slave pools' implemented, drive auto-scaling based on
thresholds or some other function.
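For the threshold option, the per-pool decision could be as simple as the hypothetical function below. It assumes each pool receives the waiting-build and idle-slave counts for its label. The thresholds (`scale_up_at`, `scale_down_at`) and size bounds are made-up tuning parameters. Nothing here talks to Jenkins or to a VM provider:

```python
def scaling_decision(waiting, idle, size, scale_up_at=3, scale_down_at=2,
                     min_size=1, max_size=10):
    """Return the change in pool size (+n, -n or 0) for one label.

    Grow by the backlog when builds are waiting and no slave is idle;
    shrink by the idle count when slaves sit idle with an empty queue.
    The result never takes the pool outside [min_size, max_size].
    """
    if waiting >= scale_up_at and idle == 0:
        return max(0, min(waiting, max_size - size))
    if waiting == 0 and idle >= scale_down_at:
        return -max(0, min(idle, size - min_size))
    return 0


print(scaling_decision(waiting=5, idle=0, size=4))  # 5
print(scaling_decision(waiting=0, idle=4, size=5))  # -4
```

Keeping the up and down thresholds separate, with a band where nothing happens, avoids flapping when the queue hovers around a single cut-off.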

It is 'experimental' because it still needs to be tested for stability (it is based on python-jenkins
and graphite-send), and more metrics can be added (maybe average running time per job?
builds per hour?). I'd be happy to hear suggestions.
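As a sketch of those two candidate metrics, both can be derived from Jenkins' per-build JSON. That JSON carries `timestamp` (start time, ms since the epoch), `duration` (ms) and `building`. With python-jenkins, the dict comes from `get_build_info()`. The function names are hypothetical, and bucketing by start hour is one choice among several:

```python
from collections import Counter


def average_duration_seconds(builds):
    """Mean duration of finished builds in seconds, or None if there are none.

    Each build is a dict with Jenkins' 'duration' field (milliseconds);
    builds still running ('building' is true) are skipped.
    """
    durations = [b["duration"] for b in builds if not b.get("building", False)]
    if not durations:
        return None
    return sum(durations) / len(durations) / 1000.0


def builds_per_hour(builds):
    """Count builds per hour, keyed by the hour's start in epoch seconds.

    Uses Jenkins' 'timestamp' field (build start, ms since the epoch).
    """
    hours = Counter(b["timestamp"] // 1000 // 3600 * 3600 for b in builds)
    return dict(sorted(hours.items()))


builds = [
    {"timestamp": 3_600_000, "duration": 60_000},
    {"timestamp": 3_601_000, "duration": 120_000},
    {"timestamp": 7_200_000, "duration": 0, "building": True},
]
print(average_duration_seconds(builds))  # 90.0
print(builds_per_hour(builds))  # {3600: 2, 7200: 1}
```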

Later, I plan to pack it all into independent Fabric tasks (e.g. fab do.jenkins.slaves.show).

Nadav