November 2018 - Infra - Ovirt List Archives

[JIRA] (OVIRT-2593) How does stdci prevent regressions and proactively monitor the cluster?

by Roman Mohr (oVirt JIRA)

Roman Mohr created OVIRT-2593:
---------------------------------

Summary: How does stdci prevent regressions and proactively monitor the cluster?
Key: OVIRT-2593
URL: https://ovirt-jira.atlassian.net/browse/OVIRT-2593
Project: oVirt - virtualization made easy
Issue Type: By-EMAIL
Reporter: Roman Mohr
Assignee: infra

We want to go one step further with KubeVirt and, sooner or later, merge automatically and only when the tests are green. Therefore we want to ensure that this CI system is the right system for us and can be properly scaled, developed and operated. Apart from requirements like automatic test re-runs and merge pools, the stability and QoS of the CI system are of interest to us. Some examples:

* Sometimes jobs break with a system error shown in the logs. Is that monitored and worked on?
* Sometimes things like "out-of-disk-space" show up. Is disk utilization, for example, handled proactively?
* We had one issue where the docker installation was broken in a build slot and all jobs on it failed quickly. As a consequence, all following builds were scheduled there too. Is something like that monitored?
* We repeatedly have issues connecting to Jenkins. It is extremely slow (not just Blue-Ocean-slow, really slow). Are such things monitored, are alarms raised, and are countermeasures taken?
* It has not happened for a while, but bare-metal machines without KVM nesting were repeatedly added to the cluster. Are there measures in place that prevent regressions where the same issue happens multiple times?
* How is the flexibility of the project ensured? Is it also tested and maintained in a sane fashion that allows proper evolution over time? Automated tests? Offline testing of changes? And so on ...

--
This message was sent by Atlassian Jira (v1001.0.0-SNAPSHOT#100095)
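Several of the checks asked about in this issue can be automated per build slot. The sketch below is a minimal illustration under stated assumptions, not stdci's actual monitoring: it assumes a Linux slot, reads the standard `kvm_intel`/`kvm_amd` `nested` sysfs parameters, and uses a hypothetical 90% disk threshold plus a hypothetical "last five builds all failed in under a minute" rule to spot a broken slot that keeps attracting jobs because it finishes them so fast.

```python
import shutil
from pathlib import Path
from typing import List, Sequence, Tuple

# Hypothetical policy values; tune per cluster.
DISK_USAGE_LIMIT = 0.90    # flag a slot whose disk is more than 90% full
FAST_FAIL_WINDOW = 5       # number of most recent builds to inspect
FAST_FAIL_SECONDS = 60.0   # a failure shorter than this counts as "fast"

# Standard sysfs locations of the "nested" parameter of the KVM modules.
NESTED_PARAMS = (
    Path("/sys/module/kvm_intel/parameters/nested"),
    Path("/sys/module/kvm_amd/parameters/nested"),
)


def disk_usage_ratio(path: str = "/") -> float:
    """Fraction of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def nested_kvm_enabled(params: Sequence[Path] = NESTED_PARAMS) -> bool:
    """True if any loaded KVM module reports nested virtualization as enabled."""
    for param in params:
        try:
            value = param.read_text().strip()
        except OSError:
            continue  # module not loaded on this machine
        if value in ("Y", "y", "1"):  # kvm_intel prints Y/N, kvm_amd often 1/0
            return True
    return False


def is_black_hole(recent_builds: Sequence[Tuple[bool, float]]) -> bool:
    """True if the last FAST_FAIL_WINDOW builds all failed within FAST_FAIL_SECONDS.

    `recent_builds` holds (succeeded, duration_seconds) pairs, oldest first.
    Such a slot frees up quickly and therefore keeps receiving new jobs.
    """
    last = list(recent_builds)[-FAST_FAIL_WINDOW:]
    return len(last) == FAST_FAIL_WINDOW and all(
        not ok and duration < FAST_FAIL_SECONDS for ok, duration in last
    )


def slot_problems(path: str = "/",
                  params: Sequence[Path] = NESTED_PARAMS,
                  recent_builds: Sequence[Tuple[bool, float]] = ()) -> List[str]:
    """Reasons why a slot should be taken out of rotation (empty list = healthy)."""
    problems = []
    ratio = disk_usage_ratio(path)
    if ratio > DISK_USAGE_LIMIT:
        problems.append(f"disk {ratio:.0%} full")
    if not nested_kvm_enabled(params):
        problems.append("nested KVM is not enabled")
    if is_black_hole(recent_builds):
        problems.append(f"last {FAST_FAIL_WINDOW} builds failed fast")
    return problems
```

A scheduler could call `slot_problems()` before handing a job to a slot and drain the slot (and raise an alarm) whenever the list is non-empty; running the same checks when a new bare-metal host joins would also catch the missing-KVM-nesting regression before any job lands there.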

[JIRA] (OVIRT-2592) Allow caching directories in the datacenter

by Roman Mohr (oVirt JIRA)

Roman Mohr created OVIRT-2592:
---------------------------------

Summary: Allow caching directories in the datacenter
Key: OVIRT-2592
URL: https://ovirt-jira.atlassian.net/browse/OVIRT-2592
Project: oVirt - virtualization made easy
Issue Type: By-EMAIL
Reporter: Roman Mohr
Assignee: infra

What?

Sometimes jobs create huge artifacts or need to download a lot in order to start building. For instance, Maven, glide or bazel download a lot before they can build. These tools also include their own checks to ensure that the content of a cached folder is in sync. It would be helpful if people could specify directories to cache.

Normally it makes sense to have an option for a cache per build branch: when a PR is merged into this branch, the cache from that branch is used. In very rare cases such a cache needs to be cleared; a UI button to do so would be helpful.

Just to highlight the difference to, e.g., the container whitelist: this is about caching in the cluster, not in a build slot.

--
This message was sent by Atlassian Jira (v1001.0.0-SNAPSHOT#100095)
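A per-branch directory cache of this kind could be modeled roughly as follows. This is a hypothetical in-memory sketch, not an existing stdci feature: `Build`, `BranchCache`, the `(project, branch, directory)` key and the rule that only post-merge builds write the cache are all assumptions, chosen so that an unmerged PR cannot poison the cache other builds of the same branch restore. Relying on the tools' own in-sync checks (Maven, glide, bazel) is what makes restoring a slightly stale directory safe.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical cache key: (project, branch, cached directory).
CacheKey = Tuple[str, str, str]


@dataclass(frozen=True)
class Build:
    project: str        # e.g. "kubevirt/kubevirt"
    target_branch: str  # branch a PR targets, or the branch a merge landed on
    is_merge: bool      # True for post-merge builds, False for PR checks


class BranchCache:
    """Hypothetical cluster-side store for job-declared cache directories."""

    def __init__(self) -> None:
        self._store: Dict[CacheKey, bytes] = {}

    def restore(self, build: Build, directory: str) -> Optional[bytes]:
        # Every build targeting a branch starts from that branch's cache.
        return self._store.get((build.project, build.target_branch, directory))

    def save(self, build: Build, directory: str, archive: bytes) -> bool:
        # Only merged code updates the branch cache; PR builds are read-only.
        if not build.is_merge:
            return False
        self._store[(build.project, build.target_branch, directory)] = archive
        return True

    def clear(self, project: str, branch: str) -> int:
        """What a "clear cache" UI button could call; returns entries removed."""
        doomed = [key for key in self._store if key[:2] == (project, branch)]
        for key in doomed:
            del self._store[key]
        return len(doomed)
```

In a real cluster the dictionary would be replaced by shared storage, and `archive` would be, for example, a tarball of the declared directory.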

[JIRA] (OVIRT-2591) Add a distributed docker-cache

by Roman Mohr (oVirt JIRA)

Roman Mohr created OVIRT-2591:
---------------------------------

Summary: Add a distributed docker-cache
Key: OVIRT-2591
URL: https://ovirt-jira.atlassian.net/browse/OVIRT-2591
Project: oVirt - virtualization made easy
Issue Type: By-EMAIL
Reporter: Roman Mohr
Assignee: infra

What?

If CI builds get heavy and things are running inside containers, I expect the CI system to proactively optimize where it can. Since the CI system provides the docker installation, I would expect it, under some conditions, to automatically put heavy docker builds into a distributed cache in the cluster. Examples of how this can be achieved are listed in [1] and [2].

Why?

Dockerfiles have the advantage that we can isolate our build steps in them. This gives reproducibility, but it also means that, e.g., curl downloads or RPM installs are not visible to the CI system. Therefore it is beneficial for both the CI system and the user (more speed and less utilization) to put docker images with their build chain into a distributed cache and pre-fetch that cache into the docker cache of the build slot. Pre-fetching based on, e.g., the GitHub project probably makes sense.

[1] https://runnable.com/blog/distributing-docker-cache-across-hosts
[2] https://blog.codeship.com/building-a-remote-caching-system/

--
This message was sent by Atlassian Jira (v1001.0.0-SNAPSHOT#100095)
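One possible mechanism (not necessarily the one the linked articles use) is to pull a previously pushed image of the same build before building, pass it to `docker build --cache-from`, and push the result back afterwards. The helpers below only assemble the command lines, which could then be run with `subprocess.run`; the registry host `cache.ci.example:5000` and the per-project image naming scheme are hypothetical.

```python
import re
from typing import List

# Hypothetical in-cluster registry holding cached build images.
CACHE_REGISTRY = "cache.ci.example:5000"


def cache_image(github_project: str, context_dir: str) -> str:
    """Derive a cache image reference from a GitHub project and build context."""
    slug = re.sub(r"[^a-z0-9]+", "-", f"{github_project}/{context_dir}".lower())
    return f"{CACHE_REGISTRY}/build-cache/{slug.strip('-')}:latest"


def prefetch_commands(github_project: str, context_dirs: List[str]) -> List[List[str]]:
    """`docker pull` commands that warm a slot before the project's job starts.

    A failed pull only means the cache is cold; the build still works.
    """
    return [["docker", "pull", cache_image(github_project, d)] for d in context_dirs]


def build_command(github_project: str, context_dir: str, tag: str) -> List[str]:
    """`docker build` that may reuse layers from the pre-fetched cache image."""
    return ["docker", "build",
            "--cache-from", cache_image(github_project, context_dir),
            "-t", tag, context_dir]


def publish_commands(github_project: str, context_dir: str, tag: str) -> List[List[str]]:
    """Push the fresh image back so that other slots can reuse its layers."""
    ref = cache_image(github_project, context_dir)
    return [["docker", "tag", tag, ref], ["docker", "push", ref]]
```

With the classic builder a pulled image can serve as a cache source directly; with BuildKit the cached image must have been built with inline cache metadata (`--build-arg BUILDKIT_INLINE_CACHE=1`) for `--cache-from` to find reusable layers.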

[JIRA] (OVIRT-2590) Cache Docker images in the datacenter

by Roman Mohr (oVirt JIRA)

Roman Mohr created OVIRT-2590:
---------------------------------

Summary: Cache Docker images in the datacenter
Key: OVIRT-2590
URL: https://ovirt-jira.atlassian.net/browse/OVIRT-2590
Project: oVirt - virtualization made easy
Issue Type: By-EMAIL
Reporter: Roman Mohr
Assignee: infra

What?

As a user, I expect that I don't have to care about caching to speed up builds for the good of the CI system itself. Right now there is a whitelist of docker images that are not removed from the build slot after the build. Instead, I expect a clean build environment, and that in general all images I regularly use are cached in the cluster via, e.g., a pull-through cache [1].

Why?

1) Caching in a build slot is not very effective. CI runs do a lot of almost identical things within a short time window (e.g. days). If caching happens in the build slot and many slots are present, cache utilization will be very low.
2) Whitelisting docker images separately for the slot in which the registry runs is very error-prone, and since the images are not cached across the cluster, the benefit for the user is not transparent at all. Especially when thinking about scaling a CI system, this seems to leak internal optimizations to the user. Fast builds are twice as important for the CI system as they are for the users (by default, faster builds and lower utilization are always better than asking people to optimize on their side).

[1] https://docs.docker.com/registry/recipes/mirror/

--
This message was sent by Atlassian Jira (v1001.0.0-SNAPSHOT#100095)
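The pull-through cache recipe in [1] boils down to two pieces: a `registry:2` container whose `proxy.remoteurl` (here set through the `REGISTRY_PROXY_REMOTEURL` environment variable) points at Docker Hub, and a `registry-mirrors` entry in each build slot's `/etc/docker/daemon.json`. The helpers below generate both as a sketch; the mirror address `https://registry-mirror.ci.example` is hypothetical and assumed to be served over TLS (for example behind a proxy), since a bare `registry:2` container speaks plain HTTP.

```python
import json
from typing import List

# Upstream that the mirror proxies (Docker Hub's registry endpoint).
UPSTREAM = "https://registry-1.docker.io"
# Hypothetical address of the pull-through cache inside the CI cluster.
MIRROR_URL = "https://registry-mirror.ci.example"


def mirror_run_command(host_port: int = 5000) -> List[str]:
    """`docker run` argv for a registry:2 container acting as a pull-through cache.

    REGISTRY_PROXY_REMOTEURL overrides the `proxy.remoteurl` config option.
    """
    return ["docker", "run", "-d", "--restart=always", "--name", "registry-mirror",
            "-p", f"{host_port}:5000",
            "-e", f"REGISTRY_PROXY_REMOTEURL={UPSTREAM}",
            "registry:2"]


def slot_daemon_json(mirrors: List[str]) -> str:
    """Contents of /etc/docker/daemon.json that route Docker Hub pulls via mirrors."""
    return json.dumps({"registry-mirrors": mirrors}, indent=2) + "\n"
```

The Docker daemon on each slot has to be restarted to pick up `daemon.json`. Note that `registry-mirrors` only applies to images pulled from Docker Hub; images from other registries would need their own caching.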

[CQ]: 95383,2 (ovirt-lldp-labeler) failed "ovirt-4.2" system tests

by oVirt Jenkins

Change 95383,2 (ovirt-lldp-labeler) is probably the reason behind recent system test failures in the "ovirt-4.2" change queue and needs to be fixed. This change has been removed from the testing queue. Artifacts built from this change will not be released until it is fixed. For further details about the change see: https://gerrit.ovirt.org/#/c/95383/2 For failed test results see: http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/3535/

[CQ]: 95680,1 (ovirt-engine-extension-logger-log4j) failed "ovirt-master" system tests

by oVirt Jenkins

Change 95680,1 (ovirt-engine-extension-logger-log4j) is probably the reason behind recent system test failures in the "ovirt-master" change queue and needs to be fixed. This change has been removed from the testing queue. Artifacts built from this change will not be released until it is fixed. For further details about the change see: https://gerrit.ovirt.org/#/c/95680/1 For failed test results see: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/11638/

[JIRA] (OVIRT-2586) Jenkins terribly slow and unresponsive

by Evgheni Dereveanchin (oVirt JIRA)

[ https://ovirt-jira.atlassian.net/browse/OVIRT-2586?page=com.atlassian.jir... ]

Evgheni Dereveanchin reassigned OVIRT-2586:
-------------------------------------------
Assignee: Evgheni Dereveanchin (was: infra)

> Jenkins terribly slow and unresponsive
> --------------------------------------
>
> Key: OVIRT-2586
> URL: https://ovirt-jira.atlassian.net/browse/OVIRT-2586
> Project: oVirt - virtualization made easy
> Issue Type: By-EMAIL
> Reporter: sbonazzo
> Assignee: Evgheni Dereveanchin
> Priority: Highest
>
> Hi,
> jenkins is terribly slow and becoming worse every day.
> I tried to gain some speed by adding 4 cores to the VM through engine-phx.
> It's a bit better but the real issue doesn't seem related to CPU power.
> Can anybody investigate?
> --
> SANDRO BONAZZOLA
> MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
> Red Hat EMEA <https://www.redhat.com/>
> sbonazzo(a)redhat.com
> <https://red.ht/sig>

--
This message was sent by Atlassian Jira (v1001.0.0-SNAPSHOT#100095)

[CQ]: 95722,1 (cockpit-ovirt) failed "ovirt-4.2" system tests

by oVirt Jenkins

Change 95722,1 (cockpit-ovirt) is probably the reason behind recent system test failures in the "ovirt-4.2" change queue and needs to be fixed. This change has been removed from the testing queue. Artifacts built from this change will not be released until it is fixed. For further details about the change see: https://gerrit.ovirt.org/#/c/95722/1 For failed test results see: http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/3534/

Fwd: ** PROBLEM Service Alert: ovirt-mirrorchecker/mirror.slu.cz/ovirt mirror site last sync is CRITICAL **

by Emil Natan

Hello,

mirror.slu.cz has not synced with our mirror master for a few days now. Can you please investigate?

Thank you very much in advance.

---------- Forwarded message ---------
From: <nagios(a)monitoring.ovirt.org>
Date: Sun, Nov 25, 2018 at 7:34 AM
Subject: ** PROBLEM Service Alert: ovirt-mirrorchecker/mirror.slu.cz/ovirt mirror site last sync is CRITICAL **
To: <ena(a)redhat.com>

***** Nagios *****

Notification Type: PROBLEM

Service: mirror.slu.cz/ovirt mirror site last sync
Host: ovirt-mirrorchecker
Address: web-ovirt-mirrorchecker.apps.ovirt.org
State: CRITICAL

Date/Time: Sun Nov 25 05:33:54 UTC 2018

Additional Info:

CRITICAL - 367439 seconds since last sync, which are 102.0664 hours.

--
Emil Natan
RHV/CNV DevOps
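The numbers in the alert are consistent: 367439 s / 3600 s per hour = 102.0664 h (rounded to four decimals). A last-sync check producing a message of this shape can be sketched as follows; the WARNING and CRITICAL thresholds are hypothetical, since the alert does not state the values ovirt-mirrorchecker actually uses, and how the last-sync timestamp is obtained from the mirror is left out.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple

# Hypothetical thresholds; the real mirrorchecker values are not given in the alert.
WARNING_SECONDS = 24 * 3600
CRITICAL_SECONDS = 48 * 3600


def check_last_sync(last_sync: datetime,
                    now: Optional[datetime] = None) -> Tuple[str, str]:
    """Return (state, message) for a mirror whose last sync was at `last_sync`.

    Both datetimes must be timezone-aware.
    """
    now = now or datetime.now(timezone.utc)
    seconds = int((now - last_sync).total_seconds())
    hours = seconds / 3600
    if seconds >= CRITICAL_SECONDS:
        state = "CRITICAL"
    elif seconds >= WARNING_SECONDS:
        state = "WARNING"
    else:
        state = "OK"
    return state, f"{state} - {seconds} seconds since last sync, which are {hours:.4f} hours."
```

Feeding it a last-sync time 367439 seconds before the alert's timestamp reproduces the "Additional Info" line above.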

[JIRA] (OVIRT-2589) CI random failures on VDSM check-patch

by Edward Haas (oVirt JIRA)

Edward Haas created OVIRT-2589:
----------------------------------

Summary: CI random failures on VDSM check-patch
Key: OVIRT-2589
URL: https://ovirt-jira.atlassian.net/browse/OVIRT-2589
Project: oVirt - virtualization made easy
Issue Type: By-EMAIL
Reporter: Edward Haas
Assignee: infra

Hello,

It seems we are experiencing random failures on the CI VDSM check-patch. Could you please have a look?

https://jenkins.ovirt.org/job/vdsm_standard-check-patch/108/
https://jenkins.ovirt.org/job/vdsm_master_check-patch-el7-x86_64/26203/

Some have errors about improper imports, as if the slave is not "clean".

Thanks,
Edy.

--
This message was sent by Atlassian Jira (v1001.0.0-SNAPSHOT#100095)
