[JIRA] (OVIRT-2593) How does stdci prevent regressions and
proactively monitor the cluster?
by Roman Mohr (oVirt JIRA)
Roman Mohr created OVIRT-2593:
---------------------------------
Summary: How does stdci prevent regressions and proactively monitor the cluster?
Key: OVIRT-2593
URL: https://ovirt-jira.atlassian.net/browse/OVIRT-2593
Project: oVirt - virtualization made easy
Issue Type: By-EMAIL
Reporter: Roman Mohr
Assignee: infra
We want to go one step further with KubeVirt and sooner or later only merge
when the tests are green (automatically).
Therefore we want to ensure that this CI system is the right system for us
and can be properly scaled, developed and operated.
Apart from requirements like automatic test re-runs and merge pools, the
stability and QoS of the CI system are of interest to us.
Some examples:
* Sometimes jobs break with a system error shown in the logs (is that
monitored and worked on?)
* Sometimes things like "out-of-disk-space" show up. Is e.g. disk
utilization proactively handled?
* We had one issue where the docker installation was broken in a
build-slot and all jobs there failed fast. As a consequence, all following
builds were scheduled there too. Is something like that monitored?
* We repeatedly have issues connecting to Jenkins. It is extremely slow
(not just Blue-Ocean-slow, really slow). Are such things monitored, alarms
raised and countermeasures taken?
* This has not happened for a while, but bare-metal machines without
KVM nesting were repeatedly added to the cluster. Are there measures in
place which prevent such regressions, where the same issue happens multiple
times?
* How is the flexibility of the project ensured? Is it also tested and
maintained in a sane fashion to allow proper evolution in time? Automated
tests? Offline-testing of changes? And so on ...
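To make two of the examples above concrete (disk utilization and missing KVM
nesting), here is a minimal sketch of a per-slot health check. It is not how
stdci works today. The 90% threshold and the function names are assumptions
made up for illustration; the sysfs paths are the standard locations of the
`nested` parameter of the kvm_intel and kvm_amd kernel modules.

```python
import shutil
from pathlib import Path

# Hypothetical threshold; a real deployment would tune this per slot.
MAX_DISK_USED_FRACTION = 0.90


def disk_used_fraction(path="/"):
    """Return the used fraction (0.0 to 1.0) of the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def nested_kvm_enabled(param_files=None):
    """Return True if any readable kvm module parameter file reports nesting."""
    if param_files is None:
        param_files = [
            Path("/sys/module/kvm_intel/parameters/nested"),
            Path("/sys/module/kvm_amd/parameters/nested"),
        ]
    for param_file in param_files:
        try:
            value = param_file.read_text().strip()
        except OSError:
            # Module not loaded or file not readable: try the next one.
            continue
        if value in ("Y", "1"):
            return True
    return False


def slot_problems(disk_fraction, nested_ok):
    """Collect human-readable problems that should take a slot out of rotation."""
    problems = []
    if disk_fraction >= MAX_DISK_USED_FRACTION:
        problems.append(f"disk {disk_fraction:.0%} full")
    if not nested_ok:
        problems.append("nested KVM not enabled")
    return problems


if __name__ == "__main__":
    found = slot_problems(disk_used_fraction("/"), nested_kvm_enabled())
    print(found or "slot looks healthy")
```

Run before a slot accepts jobs, a check like this would let the scheduler skip
a broken slot instead of sending every following build to it.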
--
This message was sent by Atlassian Jira
(v1001.0.0-SNAPSHOT#100095)
6 years, 4 months
[JIRA] (OVIRT-2592) Allow caching directories in the datacenter
by Roman Mohr (oVirt JIRA)
Roman Mohr created OVIRT-2592:
---------------------------------
Summary: Allow caching directories in the datacenter
Key: OVIRT-2592
URL: https://ovirt-jira.atlassian.net/browse/OVIRT-2592
Project: oVirt - virtualization made easy
Issue Type: By-EMAIL
Reporter: Roman Mohr
Assignee: infra
What?
Sometimes jobs create huge artifacts or need to download a lot in order to
start building. For instance, Maven, glide or Bazel download a lot before
they can build.
These tools also include their own checks to ensure that the content of a
cached folder is in-sync.
It is helpful if people can specify directories to cache. Normally it makes
sense to have options to allow a cache per build branch. When a PR is
merged into this branch, the cache from that branch is used. There are very
rare cases where such a cache needs to be cleared. A UI button to do
so is helpful. Just to highlight the difference to e.g. the container
whitelist: this is about caching in the cluster, not in a build-slot.
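As a rough illustration of a cache per build branch, the sketch below derives
a cluster-wide cache key from project, branch and cached directory. The
function name and the key layout are assumptions for illustration only, not an
existing stdci interface.

```python
import hashlib


def cache_key(project, branch, directory):
    """Build a storage key for one cached directory of one branch.

    Hypothetical layout: <project>/<branch>/<hash of directory path>.
    """
    # Hash the directory path so the key contains no path separators from it.
    digest = hashlib.sha256(directory.encode("utf-8")).hexdigest()[:12]
    # Branch names such as "release/0.9" would otherwise add a key level.
    # Note: this simple replacement maps "release/0.9" and "release_0.9"
    # to the same key; a real implementation would need proper escaping.
    safe_branch = branch.replace("/", "_")
    return f"{project}/{safe_branch}/{digest}"


if __name__ == "__main__":
    print(cache_key("kubevirt", "master", "~/.m2/repository"))
```

Clearing the cache for one branch then amounts to deleting everything below
the `<project>/<branch>/` prefix, which is what a UI button could trigger.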
6 years, 4 months
[JIRA] (OVIRT-2591) Add a distributed docker-cache
by Roman Mohr (oVirt JIRA)
Roman Mohr created OVIRT-2591:
---------------------------------
Summary: Add a distributed docker-cache
Key: OVIRT-2591
URL: https://ovirt-jira.atlassian.net/browse/OVIRT-2591
Project: oVirt - virtualization made easy
Issue Type: By-EMAIL
Reporter: Roman Mohr
Assignee: infra
What?
If CI builds get heavy and things are running inside containers, I expect
that the CI system proactively tries to optimize when it can. Since the CI
system provides the docker installation, I would expect that under some
conditions, it automatically puts heavy docker builds in a distributed
cache in the cluster. Examples of how this can be achieved are listed in [1]
and [2].
Why?
Dockerfiles have the advantage that we can isolate our build steps in a
Dockerfile. This gives reproducibility, but also means that e.g. curl
downloads or RPM installs are not visible to the CI system. Therefore it
is beneficial for the CI system and the user (more speed and less
utilization) to put docker images with their build chain into a
distributed cache and pre-fetch the cache into the docker cache of the
build slot. Pre-fetching based on e.g. the GitHub project probably makes
sense.
[1] https://runnable.com/blog/distributing-docker-cache-across-hosts
[2] https://blog.codeship.com/building-a-remote-caching-system/
6 years, 4 months
[JIRA] (OVIRT-2590) Cache Docker images in the datacenter
by Roman Mohr (oVirt JIRA)
Roman Mohr created OVIRT-2590:
---------------------------------
Summary: Cache Docker images in the datacenter
Key: OVIRT-2590
URL: https://ovirt-jira.atlassian.net/browse/OVIRT-2590
Project: oVirt - virtualization made easy
Issue Type: By-EMAIL
Reporter: Roman Mohr
Assignee: infra
What?
As a user, I expect that I don't have to care about caching to speed up
builds for the good of the CI system itself.
Right now there exists a whitelist for docker images, which are not
removed from the build slot after the build. Instead of that I expect a
clean build environment and that in general all images which I regularly
use are cached in the cluster via e.g. a pull-through cache [1].
Why?
1) Caching in a build slot is not very effective. CI runs do a great many
almost identical things within a small time window (e.g. days). If caching
happens in the build-slot and many slots are present, then the cache
utilization will be very low.
2) Whitelisting docker images specifically for a slot in which the registry
runs is very error-prone, and since the images are not cached across the
cluster it is also unclear what the actual benefit for the user is.
Especially when thinking about scaling a CI system, that seems to leak
internal optimizations to the user. Fast builds are twice as important for
the CI system as they are for the users (faster builds and lower
utilization by default are always better than asking people to optimize on
their side).
[1] https://docs.docker.com/registry/recipes/mirror/
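To illustrate point 1), here is a small sketch that counts how often an image
has to be fetched from the upstream registry when builds are spread
round-robin over many slots. The scheduling model, the function name and the
image name are simplifying assumptions; per-slot caches are assumed to keep
images between builds, as the whitelist does today.

```python
def count_upstream_pulls(builds, num_slots, shared):
    """Count upstream pulls for a sequence of builds.

    builds: image names, one per build, in scheduling order.
    shared: True for one cluster-wide cache, False for one cache per slot.
    Builds are assigned to slots round-robin (an assumption of this sketch).
    """
    caches = [set()] if shared else [set() for _ in range(num_slots)]
    pulls = 0
    for index, image in enumerate(builds):
        cache = caches[0] if shared else caches[index % num_slots]
        if image not in cache:
            # Cache miss: the image has to come from the upstream registry.
            pulls += 1
            cache.add(image)
    return pulls


if __name__ == "__main__":
    builds = ["example/builder:latest"] * 20  # hypothetical image name
    print(count_upstream_pulls(builds, num_slots=10, shared=False))  # 10
    print(count_upstream_pulls(builds, num_slots=10, shared=True))   # 1
```

With 20 identical builds on 10 slots, the per-slot caches miss once on every
slot, while a shared cache in the cluster misses only once in total.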
6 years, 4 months
Fwd: ** PROBLEM Service Alert: ovirt-mirrorchecker/mirror.slu.cz/ovirt
mirror site last sync is CRITICAL **
by Emil Natan
Hello,
mirror.slu.cz has not synced with our mirror master for a few days now.
Could you please investigate?
Thank you very much in advance.
---------- Forwarded message ---------
From: <nagios(a)monitoring.ovirt.org>
Date: Sun, Nov 25, 2018 at 7:34 AM
Subject: ** PROBLEM Service Alert: ovirt-mirrorchecker/mirror.slu.cz/ovirt
mirror site last sync is CRITICAL **
To: <ena(a)redhat.com>
***** Nagios *****
Notification Type: PROBLEM
Service: mirror.slu.cz/ovirt mirror site last sync
Host: ovirt-mirrorchecker
Address: web-ovirt-mirrorchecker.apps.ovirt.org
State: CRITICAL
Date/Time: Sun Nov 25 05:33:54 UTC 2018
Additional Info:
CRITICAL - 367439 seconds since last sync, which are 102.0664 hours.
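For reference, the figures in the alert can be reproduced with the short
sketch below. The conversion is plain arithmetic; the 48-hour critical
threshold and the function names are assumptions, since the alert does not
state the actual limit configured in the mirror checker.

```python
def sync_age_hours(seconds_since_sync):
    """Convert the age of the last sync from seconds to hours."""
    return seconds_since_sync / 3600


def sync_state(seconds_since_sync, critical_after_hours=48.0):
    """Classify a mirror by sync age; the default threshold is hypothetical."""
    if sync_age_hours(seconds_since_sync) >= critical_after_hours:
        return "CRITICAL"
    return "OK"


if __name__ == "__main__":
    age = sync_age_hours(367439)
    print(f"{sync_state(367439)} - {age:.4f} hours since last sync")
    # prints: CRITICAL - 102.0664 hours since last sync
```

A sync age of 102 hours is a little over four days, which matches the "few
days" mentioned in the forwarding note.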
--
Emil Natan
RHV/CNV DevOps
6 years, 4 months