CI slaves extremely slow - overloaded slaves?

Nir Soffer nsoffer at redhat.com
Wed Dec 7 19:33:08 UTC 2016


Hi all,

In recent weeks we have seen more and more test failures due to timeouts in the CI.

For example:

17:19:49 ======================================================================
17:19:49 FAIL: test_scale (storage_filesd_test.GetAllVolumesTests)
17:19:49 ----------------------------------------------------------------------
17:19:49 Traceback (most recent call last):
17:19:49   File "/home/jenkins/workspace/vdsm_master_check-patch-fc24-x86_64/vdsm/tests/storage_filesd_test.py", line 165, in test_scale
17:19:49     self.assertTrue(elapsed < 1.0, "Elapsed time: %f seconds" % elapsed)
17:19:49 AssertionError: Elapsed time: 1.105877 seconds
17:19:49 -------------------- >> begin captured stdout << ---------------------
17:19:49 1.105877 seconds

This test runs in 0.048 seconds on my laptop:

$ ./run_tests_local.sh storage_filesd_test.py -s
nose.config: INFO: Ignoring files matching ['^\\.', '^_', '^setup\\.py$']
storage_filesd_test.GetAllVolumesTests
    test_no_templates                                           OK
    test_no_volumes                                             OK
    test_scale                                                  0.047932 seconds
OK
    test_with_template                                          OK

----------------------------------------------------------------------
Ran 4 tests in 0.189s

It seems that we are overloading the CI slaves. We should not use nested KVM
for the CI: such VMs are much slower than regular VMs, and we probably run
too many VMs per CPU.

We can disable such tests in the CI, but we do want to know when there is
a regression in this code. Before it was fixed, the same test took 9 seconds
on my laptop. We need fast machines in the CI for this.
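
Until the slaves are faster, one possible middle ground between disabling the
test and keeping the strict 1.0 second limit is to scale the limit on slow
machines. The sketch below is only an illustration under assumptions: the
TEST_TIMEOUT_FACTOR environment variable, the scaled_limit helper and the
ScaleTest class are hypothetical names, not part of vdsm or Jenkins, and the
workload is a stand-in for the real test body.

```python
import os
import time
import unittest

# Hypothetical environment variable; neither vdsm nor Jenkins defines it.
TIMEOUT_FACTOR_VAR = "TEST_TIMEOUT_FACTOR"


def scaled_limit(seconds, environ=os.environ):
    """Return the time limit multiplied by the factor found in environ.

    When the variable is not set, the factor is 1.0 and the limit is
    returned unchanged.
    """
    factor = float(environ.get(TIMEOUT_FACTOR_VAR, "1.0"))
    return seconds * factor


class ScaleTest(unittest.TestCase):

    def test_scale(self):
        start = time.monotonic()
        sum(range(1000))  # stand-in for the real workload
        elapsed = time.monotonic() - start
        # Same assertion style as the failing test, with a scaled limit.
        limit = scaled_limit(1.0)
        self.assertTrue(
            elapsed < limit,
            "Elapsed time: %f seconds (limit: %f)" % (elapsed, limit))
```

With a factor of 5, for example, the limit becomes 5 seconds: the 1.1 second
run above would pass, while a 9 second run would still fail. This does not
replace faster machines, it only reduces false failures in the meantime.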

Nir


