Eyal Edri updated OVIRT-2593:
-----------------------------
Issue Type: Improvement (was: By-EMAIL)
How does stdci prevent regressions and proactively monitor the cluster?
-----------------------------------------------------------------------
Key: OVIRT-2593
URL: https://ovirt-jira.atlassian.net/browse/OVIRT-2593
Project: oVirt - virtualization made easy
Issue Type: Improvement
Reporter: Roman Mohr
Assignee: infra
We want to go one step further with KubeVirt and, sooner or later, merge
automatically only when the tests are green.
Therefore, we want to make sure that this CI system is the right system for
us and that it can be properly scaled, developed and operated.
Apart from requirements like automatic test re-runs and merge pools, the
stability and QoS of the CI system are important to us.
Some examples:
* Sometimes jobs fail with a system error shown in the logs. Is that
monitored and worked on?
* Sometimes errors like "out-of-disk-space" show up. Is disk utilization,
for example, handled proactively?
* We had one issue where the Docker installation was broken on a build
slot, so every job on it failed almost immediately. As a consequence, all
subsequent builds were scheduled there too. Is something like that
monitored?
* We repeatedly have issues connecting to Jenkins. It is extremely slow
(not just Blue-Ocean-slow, really slow). Are such things monitored, are
alarms raised, and are countermeasures taken?
* This has not happened for a while, but bare-metal machines without KVM
nesting were repeatedly added to the cluster. Are there measures in place
that prevent such regressions, where the same issue happens multiple
times?
* How is the flexibility of the project ensured? Is it also tested and
maintained in a sane fashion that allows it to evolve properly over time?
Automated tests? Offline testing of changes? And so on.