[JIRA] (OVIRT-2593) How does stdci prevent regressions and proactively monitor the cluster?

[ https://ovirt-jira.atlassian.net/browse/OVIRT-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=38662#comment-38662 ]

Eyal Edri commented on OVIRT-2593:
----------------------------------

[~ederevea] I believe we have solved some of the issues here and some are in progress. For example:

We've identified the source of the UI slowness: it's a memory leak in the SSE plugin that Blue Ocean uses. [~dbelenky@redhat.com], please add a link to the ticket that refers to that.

We've also applied JVM improvements to the master and limited the session timeout (it was unlimited until now).

Also, we're working on splitting the KubeVirt Jenkins out so that it is independent and not shared with oVirt. This is tracked on another ticket; [~bkorren@redhat.com] can add links.

We are also planning to add monitoring, hopefully soon. [~ederevea], please add a link to the card on it.

As for the flexibility of the project, we're doing our best with the very limited resources we have and the number of developers available to contribute. Having said that, we have staging systems, and we try to add tests for any new code that we introduce, including testing on staging.

How does stdci prevent regressions and proactively monitor the cluster?
-----------------------------------------------------------------------

                 Key: OVIRT-2593
                 URL: https://ovirt-jira.atlassian.net/browse/OVIRT-2593
             Project: oVirt - virtualization made easy
          Issue Type: Improvement
            Reporter: Roman Mohr
            Assignee: infra

We want to go one step further with KubeVirt and, sooner or later, only merge (automatically) when the tests are green. Therefore we want to ensure that this CI system is the right system for us and can be properly scaled, developed and operated. Apart from requirements like automatically re-running tests and merge pools, the stability and QoS of the CI system are of interest to us. Some examples:

* Sometimes jobs break with a system error shown in the logs (is that monitored and worked on?)
* Sometimes things like "out-of-disk-space" show up. Is e.g. disk utilization proactively handled?
* We had one issue where the docker installation was broken in a build-slot and all jobs there failed fast. As a consequence all following builds were scheduled there too. Is something like that monitored?
* We repeatedly have issues connecting to Jenkins. It is extremely slow (not just Blue-Ocean-slow, really slow). Are such things monitored, alarms raised and countermeasures taken?
* It has not happened for a while, but bare-metal machines without kvm-nesting were repeatedly added to the cluster. Are there measures in place which prevent such regressions, where the same issues happen multiple times?
* How is the flexibility of the project ensured? Is it also tested and maintained in a sane fashion to allow proper evolution over time? Automated tests? Offline-testing of changes? And so on ...
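
To make the broken build-slot case above concrete, here is a minimal sketch of the kind of check that could flag it. It is not how stdci or Jenkins does this today; the function name, the record layout (node name, duration in seconds, success flag) and the thresholds are all assumptions made for illustration. The idea is simply that a slot whose most recent builds all failed within a few seconds is probably broken and is pulling in every following job.

```python
from collections import defaultdict, deque


def find_black_hole_nodes(builds, window=5, max_seconds=30.0):
    """Return sorted names of nodes whose last `window` builds all failed fast.

    `builds` is an iterable of (node_name, duration_seconds, succeeded)
    tuples in chronological order. This layout is hypothetical; real data
    would have to be pulled from the CI system's build history.
    """
    # Keep only the most recent `window` results per node.
    recent = defaultdict(lambda: deque(maxlen=window))
    for node, duration, succeeded in builds:
        failed_fast = (not succeeded) and duration < max_seconds
        recent[node].append(failed_fast)
    # A node is suspicious only if it has a full window of fast failures.
    return sorted(
        node
        for node, flags in recent.items()
        if len(flags) == window and all(flags)
    )


# Hypothetical history: slot-a fails within seconds every time,
# slot-b is healthy, slot-c recovered on its latest build.
history = (
    [("slot-a", 4.0, False)] * 5
    + [("slot-b", 600.0, True)] * 5
    + [("slot-c", 3.0, False)] * 4
    + [("slot-c", 500.0, True)]
)
print(find_black_hole_nodes(history))
```

Requiring a full window of consecutive fast failures keeps a single flaky job from taking a slot offline; what to do with a flagged slot (alert, or mark it offline) is left open here.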
-- This message was sent by Atlassian Jira (v1001.0.0-SNAPSHOT#100096)