Our STDCI V2 system makes extensive use of so-called 'loader' PODs. Except for a few very special cases, every job we have allocates and uses such a POD at least once (a rough way to count the live ones is sketched after the list below). Those PODs are used for, among other things:
  1. Loading Jenkins pipeline groovy code
  2. Loading the scripts from our `jenkins` repo
  3. Running the STDCI V2 logic to parse the YAML files and figure out what to run on which resources
  4. Rendering the STDCI V2 graphical report
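
To get a sense of scale, here is a rough way to count how many of these PODs are live at any given moment (this assumes 'loader' actually appears in the POD names, which is a guess on my part):

oc get pods --all-namespaces | grep -c loader
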
The PODs are configured to require 500MiB of RAM and to run on the zone:ci, type:vm hosts. This means they end up running on one of the following VMs (a command to list them is sketched below the table):

name                      memory

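To list these VMs together with their memory capacity, something like the following should do, assuming the node labels are literally zone=ci and type=vm (my reading of the selector above):

oc get nodes -l zone=ci,type=vm -o custom-columns=NAME:.metadata.name,MEMORY:.status.capacity.memory
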
So if we do the simple calculation of how many such PODs can fit on a 16GiB VM, we get roughly 32 per VM (16384MiB / 500MiB), for the theoretical total of 96 across the VMs above. But running a query like the following on one of those hosts reveals that we share those hosts with many other containers:

oc get --all-namespaces pods --field-selector=spec.nodeName=shift-n04.phx.ovirt.org,status.phase==Running

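The POD list alone doesn't tell us how much of the node's memory is already spoken for. The node description includes an 'Allocated resources' summary with total memory requests vs. what is allocatable, so something like this (the grep context size may need tweaking) gives a quick picture:

oc describe node shift-n04.phx.ovirt.org | grep -A 8 'Allocated resources'
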
I suspect allocation of the loader containers is starting to become a bottleneck. I think we may have to either increase the amount of RAM the VMs have, or make the loader containers require less RAM. But we need to be able to measure some things better to make a decision (if we lack ongoing metrics, the commands sketched after this list could at least give a snapshot). Do we have ongoing metrics for:
  • What the RAM utilization on the relevant VMs looks like
  • How much RAM is actually used inside the loader containers
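
If we don't have such metrics yet, and assuming the cluster metrics stack (Heapster/metrics-server) is up, we can at least grab point-in-time numbers with something like the commands below (exact flags may vary with the oc version; kubectl top behaves the same way, and the 'loader' grep again assumes the POD names contain that string):

# Current memory usage on one of the relevant VMs
oc adm top node shift-n04.phx.ovirt.org

# Current memory usage of the running loader PODs
oc adm top pod --all-namespaces | grep loader
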
WDYT?

--
Barak Korren
RHV DevOps team, RHCE, RHCi
Red Hat EMEA
redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted