I haven't seen any comments on this thread, so we are going to move forward
with the change.
On Mon, 2 Sep 2019 at 09:03, Barak Korren <bkorren(a)redhat.com> wrote:
Adding Evgeny and Shirly, who are AFAIK the owners of the metrics
suite.
On Sun, 1 Sep 2019 at 17:07, Barak Korren <bkorren(a)redhat.com> wrote:
> If you have been using or monitoring any OST suites recently, you may have
> noticed we've been suffering from long delays in allocating CI hardware
> resources for running OST suites. I'd like to briefly discuss the reasons
> behind this, what we are planning to do to resolve it, and the implications
> of those actions for owners of big suites.
>
> As you might know, we moved a while ago from running each OST suite
> on its own dedicated server to running them inside containers managed by
> OpenShift. That allowed us to run multiple OST suites on the same
> bare-metal host, which in turn increased our overall capacity by 50% while
> still allowing us to free up hardware for accommodating the kubevirt
> project on our CI hardware.
>
> Our infrastructure is currently built in a way where we use the exact
> same POD specification (and therefore resource settings) for all suites.
> Making it more flexible at this point would require significant code
> changes we are not likely to make. What this means is that we need to make
> sure our PODs have enough resources to run the most demanding suites. It
> also means we waste some resources when running less demanding ones.
>
> Given the set of OST suites we have ATM, we sized our PODs to allocate
> 32 GiB of RAM. Given the servers we have, this means we can run 15 suites
> in parallel. This was sufficient for a while, but given increasing
> demand, and the expectation for it to increase further once we introduce
> the patch gating features we've been working on, we must find a way to
> significantly increase our suite running capacity.
>
> We have measured the amount of RAM required by each suite and came to the
> conclusion that for the vast majority of suites, we could settle for PODs
> that allocate only 14 GiB of RAM. If we make that change, we would be able
> to run a total of 40 suites at a time, almost tripling our current capacity.
>
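[Editorial sketch] The capacity figures above can be illustrated with a little arithmetic. The message does not state how many servers there are or how much RAM each can give to PODs, so the five hosts with 120 GiB allocatable each below are purely hypothetical values, chosen only because they reproduce the 15 and 40 figures quoted above. The point is that capacity is floored per host, so shrinking the POD size can gain more than the raw RAM ratio suggests.

```python
def suites_per_host(allocatable_gib: int, pod_gib: int) -> int:
    """Number of whole PODs of size pod_gib that fit on one host."""
    return allocatable_gib // pod_gib


def cluster_capacity(hosts_gib: list[int], pod_gib: int) -> int:
    """Total PODs that can run in parallel; PODs cannot span hosts."""
    return sum(suites_per_host(h, pod_gib) for h in hosts_gib)


# ASSUMPTION: hypothetical hardware, not taken from the original message.
HYPOTHETICAL_HOSTS_GIB = [120] * 5

current = cluster_capacity(HYPOTHETICAL_HOSTS_GIB, 32)   # 32 GiB PODs
proposed = cluster_capacity(HYPOTHETICAL_HOSTS_GIB, 14)  # 14 GiB PODs

print(f"32 GiB PODs: {current} suites in parallel")
print(f"14 GiB PODs: {proposed} suites in parallel")
print(f"capacity ratio: {proposed / current:.2f}x")
```

With these assumed hosts, each server fits 3 PODs at 32 GiB and 8 PODs at 14 GiB, which is where a ratio of roughly 2.7x would come from.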
> The downside of making this change is that our STDCI V2 infrastructure
> will no longer be able to run suites that require more than 14 GiB of RAM.
> This effectively means it would no longer be possible to run these suites
> from OST's check-patch job or from the OST manual job.
>
> The list of suites that would be affected follows. The suite
> owners, as documented in the CI configuration, have been added as "to"
> recipients of this message:
>
> - hc-basic-suite-4.3
> - hc-basic-suite-master
> - metrics-suite-4.3
>
> Since we're aware people would still like to be able to work with the
> bigger suites, we will leverage the nightly suite invocation jobs to enable
> them to be run in the CI infra. We will support the following use cases:
>
> - *Periodically running the suite on the latest oVirt packages* - this
>   will be done by the nightly job, as it is done today
> - *Running the suite to test changes to the suite's code* - while
>   currently this is done automatically by check-patch, in the future it
>   would have to be done by manually triggering the nightly job and
>   setting the REFSPEC parameter to point to the examined patch
> - *Triggering the suite manually* - this would be done by triggering
>   the suite-specific nightly job (as opposed to the general OST manual job)
>
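[Editorial sketch] For the second use case above, the REFSPEC parameter has to point at the patch under test. Assuming the job accepts a standard Gerrit change ref (the message does not spell out the expected format), such a ref is built from the change number and a patch set number. The helper name `gerrit_refspec` and the patch set number 1 are hypothetical and used only for illustration.

```python
def gerrit_refspec(change: int, patchset: int) -> str:
    """Build a Gerrit change ref: refs/changes/<last two digits>/<change>/<patchset>."""
    if change <= 0 or patchset <= 0:
        raise ValueError("change and patchset must be positive integers")
    # Gerrit shards change refs by the last two digits of the change number,
    # zero-padded to two characters.
    return f"refs/changes/{change % 100:02d}/{change}/{patchset}"


# ASSUMPTION: patch set 1 is an arbitrary example value.
print(gerrit_refspec(102757, 1))
```

The resulting string is what would be pasted into the REFSPEC field when triggering the nightly job by hand.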
> The patches listed below implement the changes outlined above:
>
> - 102757 <https://gerrit.ovirt.org/102757>: nightly-system-tests: big
>   suits -> big containers
> - 102771 <https://gerrit.ovirt.org/102771>: stdci: Drop `big` suits
>   from check-patch
>
> We know that making the changes we presented will make things a little
> less convenient for users and maintainers of the big suites, but we believe
> the benefits of having vastly increased execution capacity for all other
> suites outweigh those shortcomings.
>
> We would like to hear all relevant comments and questions from the suite
> owners and other interested parties, especially if you think we should not
> carry out the changes we propose.
> Please take the time to respond on this thread, or on the linked patches.
>
> Thanks,
>
> --
> Barak Korren
> RHV DevOps team , RHCE, RHCi
> Red Hat EMEA
>
> redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted
>
--
Barak Korren
RHV DevOps team , RHCE, RHCi
Red Hat EMEA
redhat.com | TRIED. TESTED. TRUSTED. |
redhat.com/trusted