[JIRA] (OVIRT-2498) Failing KubeVirt CI

Monday, 17 September 2018

    [ https://ovirt-jira.atlassian.net/browse/OVIRT-2498?page=com.atlassian.jir... ]

Petr Kotas commented on OVIRT-2498:
-----------------------------------

[~bkorren(a)redhat.com] I would like to see the machine resource usage, specifically CPU
and RAM, to understand how the tests behave live. I am not sure whether this is available
in Blue Ocean.
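
As an illustration of the kind of live resource view meant here, the following is a minimal sketch, not something that exists in the CI today. It assumes a Linux slave where /proc/meminfo and /proc/loadavg are readable and where the kernel reports MemAvailable; the function names are hypothetical.

```python
def parse_meminfo(text):
    """Map /proc/meminfo field names to their numeric values (KiB for sized fields)."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def sample_line(meminfo_text, loadavg_text):
    """Format one log line from the raw contents of /proc/meminfo and /proc/loadavg."""
    mem = parse_meminfo(meminfo_text)
    # The first field of /proc/loadavg is the 1-minute load average.
    load1 = float(loadavg_text.split()[0])
    # Assumption: MemAvailable is present (it is missing on very old kernels).
    used_kib = mem["MemTotal"] - mem["MemAvailable"]
    return "load1=%.2f mem_used_kib=%d mem_total_kib=%d" % (
        load1, used_kib, mem["MemTotal"])
```

Printing `sample_line(open("/proc/meminfo").read(), open("/proc/loadavg").read())` every few seconds from a background process during a test run would give a rough timeline to compare against the moments the timeouts occur.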

WRT Docker, as I already pointed out in the logs, the issue occurs well before our setup
even kicks in.
[Here|https://jenkins.ovirt.org/blue/organizations/jenkins/kubevirt_kubevi...]
is the direct link for that.
It seems that the Jenkins project_setup.sh fails somehow. Again, this is not our code; it
is part of the standard CI, located
[here|https://gerrit.ovirt.org/gitweb?p=jenkins.git;a=blob;f=jobs/confs/sh...].
It seems that the project setup was doing its job and then randomly failed due to
a networking issue. I have no idea why.

Also, I do not think the issue is due to the proxy, as the failures are totally random and
hit random tests.
So I am guessing something more hidden is failing.

...
 Failing KubeVirt CI
 -------------------

                 Key: OVIRT-2498
                 URL: https://ovirt-jira.atlassian.net/browse/OVIRT-2498
             Project: oVirt - virtualization made easy
          Issue Type: By-EMAIL
            Reporter: Petr Kotas
            Assignee: infra

 Hi,
 I am working on fixing the issues in the KubeVirt e2e test suites. This
 task is directly related to the unstable CI, which fails due to unknown errors.
 The progress is reported in the CNV trello:
 https://trello.com/c/HNXcMEQu/161-epic-improve-ci
 I am creating this issue because KubeVirt experiences random timeouts on
 random tests most of the time when the test suites run.
 From the outside, the issue shows up as timeouts in different parts of the tests.
 Sometimes the tests fail in the setup phase, again due to a random timeout.
 The example in the link below timed out on a network connection to
 localhost.
 [check-patch.k8s-1.11.0-dev.el7.x86_64]
 requests.exceptions.ReadTimeout:
 UnixHTTPConnectionPool(host='localhost', port=None): Read timed out.
 (read timeout=60)
 An example of a failing test suite is here
 https://jenkins.ovirt.org/job/kubevirt_kubevirt_standard-check-pr/1916/co...
 The list of errors related to the failing CI can be found in my notes

https://docs.google.com/document/d/1_ll1DOMHgCRHn_Df9i4uvtRFyMK-bDCHEeGfJ...
 I am not sure whether KubeVirt has already shared the resource requirements, so
 I provide a short summary:
 *Resources for KubeVirt e2e tests:*
    - at least 12GB of RAM - we start 3 nodes (3 docker images), each requiring
    4GB of RAM
    - exposed /dev/kvm to enable native virtualization
    - cached images, since these are used to build the test cluster:
       - kubevirtci/os-3.10.0-crio:latest
       - kubevirtci/os-3.10.0-multus:latest
       - kubevirtci/os-3.10.0:latest
       - kubevirtci/k8s-1.10.4:latest
       - kubevirtci/k8s-multus-1.11.1:latest
       - kubevirtci/k8s-1.11.0:latest
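
The first two requirements above could be checked on a slave before the tests start. The following is a minimal sketch under stated assumptions, not part of KubeVirt or the standard CI: it assumes a Linux host, reads the total RAM from /proc/meminfo-style text, and treats 12GB as 12 GiB; the function names and the `exists` parameter are hypothetical. It does not check the cached images.

```python
import os

# 12 GB from the list above, expressed in KiB (the unit used by /proc/meminfo).
REQUIRED_RAM_KIB = 12 * 1024 * 1024


def mem_total_kib(meminfo_text):
    """Return the MemTotal value (in KiB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


def preflight(meminfo_text, kvm_path="/dev/kvm", exists=os.path.exists):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    total = mem_total_kib(meminfo_text)
    if total < REQUIRED_RAM_KIB:
        problems.append("only %d KiB of RAM, need %d" % (total, REQUIRED_RAM_KIB))
    # /dev/kvm must be exposed to enable native virtualization.
    if not exists(kvm_path):
        problems.append("%s is not exposed" % kvm_path)
    return problems
```

Calling `preflight(open("/proc/meminfo").read())` at the start of a job and failing fast on a non-empty result would separate under-provisioned slaves from the random timeouts described above.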
 How can we overcome this? Can we work together to define suitable
 requirements for running the tests so that they pass each time?
 Kind regards,
 Petr Kotas 

--
This message was sent by Atlassian Jira
(v1001.0.0-SNAPSHOT#100092)
