PLEASE READ: Stop merging to ovirt-engine master until further notice
by Dafna Ron
Hello all,
I asked last week that merging be stopped, but the request seems to have been
ignored, so I am writing again.
*Please stop merging to ovirt-engine master until further notice.*
The project has been failing for the last two weeks on several different
regressions.
Every additional merge risks further regressions and makes it harder to
stabilize the ovirt-engine master branch.
To make it clear:
1. All changes merged to the master branch since Sept. 4th exist
only in the ovirt-engine master branch (i.e., they are not included in your
project package).
2. Any code merged to master in the past two weeks was most likely not
tested by CQ, since we exit on the first test failure (if your code affects
anything that runs after the tests that reported issues, it was not tested).
3. You are putting other projects at risk of regressions, since they are
tested with two-week-old ovirt-engine packages.
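To illustrate point 2, here is a minimal sketch of an exit-on-first-failure run. It is not the CQ implementation; the function and the test names are hypothetical and only show why code exercised by later tests goes unverified while an earlier regression is still present.

```python
from typing import Callable, List, Tuple


def run_fail_fast(
    tests: List[Tuple[str, Callable[[], bool]]]
) -> Tuple[List[str], List[str]]:
    """Run tests in order and stop at the first failure.

    Returns (executed, skipped) lists of test names.
    """
    executed: List[str] = []
    for index, (name, test) in enumerate(tests):
        executed.append(name)
        if not test():
            # Everything after the first failing test is never run.
            skipped = [later_name for later_name, _ in tests[index + 1:]]
            return executed, skipped
    return executed, []


# Hypothetical suite: these names are illustrative, not real OST tests.
suite = [
    ("bootstrap", lambda: True),
    ("add_hosts", lambda: False),  # an existing regression fails here
    ("storage", lambda: True),     # never runs
    ("networking", lambda: True),  # never runs
]

executed, skipped = run_fail_fast(suite)
print("executed:", executed)
print("skipped:", skipped)
```

A patch that only touches what the skipped tests cover would be merged without any of that coverage having run.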
Thank you.
Dafna
6 years, 3 months
[JIRA] (OVIRT-2498) Failing KubeVirt CI
by Petr Kotas (oVirt JIRA)
[ https://ovirt-jira.atlassian.net/browse/OVIRT-2498?page=com.atlassian.jir... ]
Petr Kotas commented on OVIRT-2498:
-----------------------------------
I have created the issue.
Failures in the stdci account for only part of our failures. Most of
the failures are due to unknown timeouts.
To investigate these, I would like to see the live load on the test machine,
if that can be done.
Thanks in advance.
Best,
Petr
On Mon, Sep 17, 2018 at 4:18 PM Barak Korren (oVirt JIRA) <
> Failing KubeVirt CI
> -------------------
>
> Key: OVIRT-2498
> URL: https://ovirt-jira.atlassian.net/browse/OVIRT-2498
> Project: oVirt - virtualization made easy
> Issue Type: By-EMAIL
> Reporter: Petr Kotas
> Assignee: infra
>
> Hi,
> I am working on fixing the issues in the KubeVirt e2e test suites. This
> task is directly related to the unstable CI, which fails with unknown errors.
> The progress is reported in the CNV trello:
> https://trello.com/c/HNXcMEQu/161-epic-improve-ci
> I am creating this issue because KubeVirt experiences random timeouts on
> random tests most of the time when the test suites run.
> From the outside, the issue shows up as timeouts in different parts of the tests.
> Sometimes the tests fail in the setup phase, again due to a random timeout.
> The example in the link below timed out on a network connection to
> localhost.
> [check-patch.k8s-1.11.0-dev.el7.x86_64]
> requests.exceptions.ReadTimeout:
> UnixHTTPConnectionPool(host='localhost', port=None): Read timed out.
> (read timeout=60)
> An example of a failing test suite is here:
> https://jenkins.ovirt.org/job/kubevirt_kubevirt_standard-check-pr/1916/co...
> The list of errors related to the failing CI can be found in my notes:
> https://docs.google.com/document/d/1_ll1DOMHgCRHn_Df9i4uvtRFyMK-bDCHEeGfJ...
> I am not sure whether KubeVirt has already shared the resource requirements, so
> here is a short summary:
> *Resources for KubeVirt e2e tests:*
> - at least 12GB of RAM - we start 3 nodes (3 docker images), each requiring
> 4GB of RAM
> - exposed /dev/kvm to enable native virtualization
> - cached images, since these are used to build the test cluster:
> - kubevirtci/os-3.10.0-crio:latest
> - kubevirtci/os-3.10.0-multus:latest
> - kubevirtci/os-3.10.0:latest
> - kubevirtci/k8s-1.10.4:latest
> - kubevirtci/k8s-multus-1.11.1:latest
> - kubevirtci/k8s-1.11.0:latest
> How can we overcome this? Can we work together to define suitable
> requirements for running the tests so that they pass each time?
> Kind regards,
> Petr Kotas
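The resource requirements quoted above could be turned into a pre-flight check on a CI host. The following is only a sketch under stated assumptions: the function names are hypothetical, 12GB is interpreted as 12 GiB (a host sold as "12GB" may report slightly less in MemTotal), and the set of cached images has to be supplied by the caller because the sketch does not query docker itself.

```python
import os

# 12GB from the requirements, read here as 12 GiB expressed in kB (assumption).
REQUIRED_RAM_KB = 12 * 1024 * 1024

# Image names copied from the requirements list.
REQUIRED_IMAGES = [
    "kubevirtci/os-3.10.0-crio:latest",
    "kubevirtci/os-3.10.0-multus:latest",
    "kubevirtci/os-3.10.0:latest",
    "kubevirtci/k8s-1.10.4:latest",
    "kubevirtci/k8s-multus-1.11.1:latest",
    "kubevirtci/k8s-1.11.0:latest",
]


def mem_total_kb(meminfo_text: str) -> int:
    """Extract MemTotal (in kB) from the content of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


def missing_images(cached: set) -> list:
    """Return the required images that are not in the caller-supplied cache set."""
    return [image for image in REQUIRED_IMAGES if image not in cached]


def preflight(meminfo_text: str, kvm_path: str = "/dev/kvm") -> list:
    """Return a list of human-readable problems; empty means the host looks suitable."""
    problems = []
    if mem_total_kb(meminfo_text) < REQUIRED_RAM_KB:
        problems.append("less than 12GB of RAM")
    if not os.path.exists(kvm_path):
        problems.append(f"{kvm_path} is not exposed")
    return problems
```

On a Linux host, `preflight(open("/proc/meminfo").read())` would return the list of unmet requirements.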
--
This message was sent by Atlassian Jira
(v1001.0.0-SNAPSHOT#100092)
6 years, 3 months
[JIRA] (OVIRT-2503) Testing automatic Jira ticket logging for CQ
monitoring
by Dafna Ron (oVirt JIRA)
Dafna Ron created OVIRT-2503:
--------------------------------
Summary: Testing automatic Jira ticket logging for CQ monitoring
Key: OVIRT-2503
URL: https://ovirt-jira.atlassian.net/browse/OVIRT-2503
Project: oVirt - virtualization made easy
Issue Type: Bug
Reporter: Dafna Ron
Assignee: infra
As part of improving monitoring and shifting partial responsibility for CQ monitoring to the developers, I would like to run some experiments on automatically creating Jira tickets from CQ alerts.
Since we have the ost monitoring project, we will start by running some tests on that project so that we do not flood our Jira with alerts.
For now I would like to start with Jira tickets opened to the following specifications:
1. Subject: [ CQ ] [$patch number] [ oVirt $VER ($project) ] [ TEST NAME ]
2. Initially, email sent to: dron(a)redhat.com
3. Jira description: [cq message]
4. All Jira tickets must have label=ost_failure and infra-owner as default.
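The specification above can be sketched as a small helper that assembles the ticket fields. This is an illustration only: the function name and the dictionary keys are hypothetical and are not the Jira REST schema, and treating "infra-owner" as a default owner field is an assumption about what item 4 means.

```python
def build_ticket(patch: str, version: str, project: str,
                 test_name: str, cq_message: str) -> dict:
    """Assemble ticket fields following the subject format in item 1."""
    subject = f"[ CQ ] [{patch}] [ oVirt {version} ({project}) ] [ {test_name} ]"
    return {
        "summary": subject,
        "description": cq_message,       # item 3: the CQ message as description
        "labels": ["ost_failure"],       # item 4: mandatory label
        "default_owner": "infra-owner",  # item 4: assumed to be an owner field
    }


# Illustrative values only; the patch number and test name are made up.
ticket = build_ticket("12345", "master", "ovirt-engine",
                      "add_hosts", "CQ reported a failure")
print(ticket["summary"])
```

Keeping the subject format in one function makes it easier to change later if the experiment shows that a different layout filters better.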
I would like to change the project type to allow:
1. Easy closing of Jira tickets (one or two clicks if we can)
2. Viewing Jira tickets as service tickets (rather than bugs)
There is a plugin called Zapier that makes it easy to create a Jira ticket from an email and also allows adding rules to the ticket, which may make this easier for us.
Can you also install it and link it to the ost Jira? I have an email account that we can use for that:
cq.ovirt(a)gmail.com
https://zapier.com/apps/jira/integrations
[~bkorren(a)redhat.com]
Once I have done some tests of my own on this, I would like to try collaborating with one of the projects (maybe networking or one of Sandro's teams), where CQ failures would automatically open a ticket for their team, and they could handle the monitoring and escalate issues to us if needed.
Any advice on configurations we should be thinking of for that?
6 years, 3 months
[JIRA] (OVIRT-2498) Failing KubeVirt CI
by Barak Korren (oVirt JIRA)
[ https://ovirt-jira.atlassian.net/browse/OVIRT-2498?page=com.atlassian.jir... ]
Barak Korren commented on OVIRT-2498:
-------------------------------------
[~pkotas] OK, I see it's failing in the docker-cleanup script... hmm... we'll need to debug that...
Can you please open a specific ticket for that and include logs and any other specific information that can help us figure out why it may be failing there... (What containers might be on the machine that it's failing to remove...)
But that is not what is causing all the failures, right? We already fixed a couple of issues with that script...
6 years, 3 months
[JIRA] (OVIRT-2498) Failing KubeVirt CI
by Petr Kotas (oVirt JIRA)
[ https://ovirt-jira.atlassian.net/browse/OVIRT-2498?page=com.atlassian.jir... ]
Petr Kotas commented on OVIRT-2498:
-----------------------------------
[~bkorren(a)redhat.com] I would like to see the machine's resources: CPU and RAM usage, so I can understand how the tests behave live. I am not sure whether this is possible in Blue Ocean.
WRT docker, as I already pointed out in the logs, the issue occurs well before our setup even kicks in.
[Here|https://jenkins.ovirt.org/blue/organizations/jenkins/kubevirt_kubevi...] is the direct link for that.
It seems that the jenkins project_setup.sh fails somehow. Again, this is not our code; it is part of standard CI, located [here|https://gerrit.ovirt.org/gitweb?p=jenkins.git;a=blob;f=jobs/confs/sh...].
It seems that the project setup was doing its job and then randomly failed due to a networking issue. I have no idea why.
Also, I do not think the issue is due to the proxy, as the failures are totally random and occur on random tests.
So I am guessing that something more hidden is failing.
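One way to get the live CPU and RAM picture requested above, if shell access to the test machine is available, is a small sampler run alongside the job. This is a sketch only, not part of standard CI: the function names are hypothetical, and it assumes a Linux host whose /proc/meminfo exposes a MemAvailable field.

```python
import time


def parse_loadavg(text: str) -> float:
    """Return the 1-minute load average from the content of /proc/loadavg."""
    return float(text.split()[0])


def parse_mem_available_kb(text: str) -> int:
    """Return MemAvailable (in kB) from the content of /proc/meminfo."""
    for line in text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1])
    raise ValueError("MemAvailable not found")


def sample(interval_s: float = 5.0, count: int = 12) -> None:
    """Print a timestamped load and free-memory reading every interval_s seconds."""
    for _ in range(count):
        with open("/proc/loadavg") as f:
            load = parse_loadavg(f.read())
        with open("/proc/meminfo") as f:
            available = parse_mem_available_kb(f.read())
        stamp = time.strftime("%H:%M:%S")
        print(f"{stamp} load1={load:.2f} mem_available_kb={available}")
        time.sleep(interval_s)
```

Calling `sample()` while a test suite runs would produce timestamps that can be lined up against the moments the random timeouts appear in the Jenkins log.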
6 years, 3 months