[JIRA] (OVIRT-919) Fwd: CI slaves extremely slow - overloaded slaves?

Barak Korren (oVirt JIRA) jira at ovirt-jira.atlassian.net
Thu Dec 8 07:11:02 UTC 2016


     [ https://ovirt-jira.atlassian.net/browse/OVIRT-919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Barak Korren updated OVIRT-919:
-------------------------------
    Description: 
From: Nir Soffer <nsoffer at redhat.com>
Date: 7 December 2016 at 21:33
Subject: CI slaves extremely slow - overloaded slaves?
To: infra <infra at ovirt.org>, Eyal Edri <eedri at redhat.com>, Dan
Kenigsberg <danken at redhat.com>


Hi all,

In recent weeks we have seen more and more test failures due to timeouts in the CI.

For example:

17:19:49 ======================================================================
17:19:49 FAIL: test_scale (storage_filesd_test.GetAllVolumesTests)
17:19:49 ----------------------------------------------------------------------
17:19:49 Traceback (most recent call last):
17:19:49   File "/home/jenkins/workspace/vdsm_master_check-patch-fc24-x86_64/vdsm/tests/storage_filesd_test.py", line 165, in test_scale
17:19:49     self.assertTrue(elapsed < 1.0, "Elapsed time: %f seconds" % elapsed)
17:19:49 AssertionError: Elapsed time: 1.105877 seconds
17:19:49 -------------------- >> begin captured stdout << ---------------------
17:19:49 1.105877 seconds

This test runs in 0.048 seconds on my laptop:

$ ./run_tests_local.sh storage_filesd_test.py -s
nose.config: INFO: Ignoring files matching ['^\\.', '^_', '^setup\\.py$']
storage_filesd_test.GetAllVolumesTests
    test_no_templates                                           OK
    test_no_volumes                                             OK
    test_scale                                                  0.047932 seconds
OK
    test_with_template                                          OK

----------------------------------------------------------------------
Ran 4 tests in 0.189s

It seems that we are overloading the CI slaves. We should not use nested KVM
for the CI; such VMs are much slower than regular VMs, and we probably run
too many VMs per CPU.
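
One way to check the "too many VMs per CPU" suspicion from inside a slave is
to look at steal time, the share of CPU time the hypervisor gave to other
guests. The sketch below (Python 3) parses the aggregate "cpu" line of
/proc/stat; the function name steal_fraction is made up for illustration, and
it assumes a Linux guest whose hypervisor actually reports steal time.

```python
def steal_fraction(cpu_line):
    """Return the fraction of CPU time stolen by the hypervisor.

    cpu_line is the aggregate "cpu" line from /proc/stat, whose first
    eight counters are: user nice system idle iowait irq softirq steal.
    The trailing guest counters are skipped because they are already
    included in user and nice.
    """
    fields = cpu_line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    values = [int(v) for v in fields[1:9]]
    if len(values) < 8:
        raise ValueError("kernel does not report a steal counter")
    total = sum(values)
    # The steal counter is the eighth value (index 7).
    return values[7] / total if total else 0.0
```

The counters are cumulative since boot, so a single reading gives an average
over the whole uptime. Reading the line twice, a few seconds apart, and
subtracting the counters would show the current steal share instead.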

We can disable such tests in the CI, but we do want to know when there is
a regression in this code. Before it was fixed, the same test took 9 seconds
on my laptop. We need fast machines in the CI for this.
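
A possible middle ground between disabling the test and requiring fast
machines is sketched below (Python 3). It is not the vdsm test code: the
helper names and the TIMEOUT_FACTOR environment variable are hypothetical.
The idea is to take the best of several runs, which filters out one-off
scheduling delays, and to let a slow environment relax the limit explicitly.

```python
import os
import time


def best_elapsed(func, repeats=5):
    """Return the shortest wall-clock time of `repeats` calls to func."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best


def time_limit(base_seconds, environ=os.environ):
    """Scale the limit by TIMEOUT_FACTOR (hypothetical variable, default 1)."""
    return base_seconds * float(environ.get("TIMEOUT_FACTOR", "1"))


def check_scale(func, base_seconds=1.0, environ=os.environ):
    """Fail if even the fastest run of func exceeds the scaled limit."""
    elapsed = best_elapsed(func)
    limit = time_limit(base_seconds, environ)
    assert elapsed < limit, (
        "Elapsed time: %f seconds (limit %f)" % (elapsed, limit))
    return elapsed
```

With a factor of 3 on the slaves, a regression from 0.048 to 9 seconds would
still fail, while a 1.1 second run on a loaded slave would pass. The cost is
that small regressions go unnoticed on slow machines.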

Nir
_______________________________________________
Infra mailing list
Infra at ovirt.org
http://lists.ovirt.org/mailman/listinfo/infra


-- 
Barak Korren
bkorren at redhat.com
RHCE, RHCi, RHV-DevOps Team
https://ifireball.wordpress.com/

  was:
More discussion and tracking needed, moving to Jira (please put any
further discussion there)

---------- Forwarded message ----------
From: Nir Soffer <nsoffer at redhat.com>
Date: 7 December 2016 at 21:33
Subject: CI slaves extremely slow - overloaded slaves?
To: infra <infra at ovirt.org>, Eyal Edri <eedri at redhat.com>, Dan
Kenigsberg <danken at redhat.com>


Hi all,

In recent weeks we have seen more and more test failures due to timeouts in the CI.

For example:

17:19:49 ======================================================================
17:19:49 FAIL: test_scale (storage_filesd_test.GetAllVolumesTests)
17:19:49 ----------------------------------------------------------------------
17:19:49 Traceback (most recent call last):
17:19:49   File "/home/jenkins/workspace/vdsm_master_check-patch-fc24-x86_64/vdsm/tests/storage_filesd_test.py", line 165, in test_scale
17:19:49     self.assertTrue(elapsed < 1.0, "Elapsed time: %f seconds" % elapsed)
17:19:49 AssertionError: Elapsed time: 1.105877 seconds
17:19:49 -------------------- >> begin captured stdout << ---------------------
17:19:49 1.105877 seconds

This test runs in 0.048 seconds on my laptop:

$ ./run_tests_local.sh storage_filesd_test.py -s
nose.config: INFO: Ignoring files matching ['^\\.', '^_', '^setup\\.py$']
storage_filesd_test.GetAllVolumesTests
    test_no_templates                                           OK
    test_no_volumes                                             OK
    test_scale                                                  0.047932 seconds
OK
    test_with_template                                          OK

----------------------------------------------------------------------
Ran 4 tests in 0.189s

It seems that we are overloading the CI slaves. We should not use nested KVM
for the CI; such VMs are much slower than regular VMs, and we probably run
too many VMs per CPU.

We can disable such tests in the CI, but we do want to know when there is
a regression in this code. Before it was fixed, the same test took 9 seconds
on my laptop. We need fast machines in the CI for this.

Nir
_______________________________________________
Infra mailing list
Infra at ovirt.org
http://lists.ovirt.org/mailman/listinfo/infra


-- 
Barak Korren
bkorren at redhat.com
RHCE, RHCi, RHV-DevOps Team
https://ifireball.wordpress.com/


> Fwd: CI slaves extremely slow - overloaded slaves?
> --------------------------------------------------
>
>                 Key: OVIRT-919
>                 URL: https://ovirt-jira.atlassian.net/browse/OVIRT-919
>             Project: oVirt - virtualization made easy
>          Issue Type: By-EMAIL
>            Reporter: Barak Korren
>            Assignee: infra



--
This message was sent by Atlassian JIRA
(v1000.620.0#100023)


