Our STDCI V2 system makes extensive use of so-called 'loader' PODs. Except for a few very special cases, every job we have allocates and uses such a POD at least once (a rough way to count the live ones is sketched after the list below). Those PODs are used for, among other things:
  1. Loading Jenkins pipeline groovy code
  2. Loading the scripts from our `jenkins` repo
  3. Running the STDCI V2 logic to parse the YAML files and figure out what to run on which resources
  4. Rendering the STDCI V2 graphical report
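
To get a sense of scale, here is a rough way to count how many of these PODs are live at any given moment (this assumes 'loader' actually appears in the POD names, which is a guess on my part):

oc get pods --all-namespaces | grep -c loader
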
The PODs are configured to require 500MiB of RAM and to run on the zone:ci, type:vm hosts. This means they end up running on one of the following VMs (a command to list them is sketched below the table):

name                      memory

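To list these VMs together with their memory capacity, something like the following should do, assuming the node labels are literally zone=ci and type=vm (my reading of the selector above):

oc get nodes -l zone=ci,type=vm -o custom-columns=NAME:.metadata.name,MEMORY:.status.capacity.memory
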
So if we do the simple calculation of how many such PODs can fit on a 16GiB VM, we get roughly 32 per VM (16384MiB / 500MiB), for the theoretical total of 96 across the VMs above. But running a query like the following on one of those hosts reveals that we share those hosts with many other containers:

oc get --all-namespaces pods --field-selector=spec.nodeName=shift-n04.phx.ovirt.org,status.phase==Running

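The POD list alone doesn't tell us how much of the node's memory is already spoken for. The node description includes an 'Allocated resources' summary with total memory requests vs. what is allocatable, so something like this (the grep context size may need tweaking) gives a quick picture:

oc describe node shift-n04.phx.ovirt.org | grep -A 8 'Allocated resources'
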
I suspect allocation of the loader containers is starting to become a bottleneck. I think we may have to either increase the amount of RAM the VMs have, or make the loader containers require less RAM. But we need to be able to measure some things better to make a decision (if we lack ongoing metrics, the commands sketched after this list could at least give a snapshot). Do we have ongoing metrics for:
  • What the RAM utilization on the relevant VMs looks like
  • How much RAM is actually used inside the loader containers
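
If we don't have such metrics yet, and assuming the cluster metrics stack (Heapster/metrics-server) is up, we can at least grab point-in-time numbers with something like the commands below (exact flags may vary with the oc version; kubectl top behaves the same way, and the 'loader' grep again assumes the POD names contain that string):

# Current memory usage on one of the relevant VMs
oc adm top node shift-n04.phx.ovirt.org

# Current memory usage of the running loader PODs
oc adm top pod --all-namespaces | grep loader
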
WDYT?

--
Barak Korren
RHV DevOps team, RHCE, RHCi
Red Hat EMEA
redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted