[
https://ovirt-jira.atlassian.net/browse/OVIRT-919?page=com.atlassian.jira...
]
Barak Korren commented on OVIRT-919:
------------------------------------
{quote}
... snip ...
17:19:49 1.105877 seconds
This test runs in 0.048 seconds on my laptop:
... snip ...
Ran 4 tests in 0.189s
{quote}
[~nsoffer(a)redhat.com] Let's make sure we're comparing apples to apples first: did you
run this using "{{mock_runner.sh}}" on your laptop? Perhaps try it in a VM on
your laptop as well?
{quote}
It seems that we are overloading the CI slaves. We should not use nested KVM
for the CI; such VMs are much slower than regular VMs, and we probably run
too many VMs per CPU.
{quote}
Not sure where nested KVM comes into play here, "{{check_patch.sh}}" runs on
plain VMs.
(As an off-topic side note: are you sure nested VMs should be slower CPU-wise? AFAIK
those are just VMs with nested I/O routing... it may be interesting to check this with
Lago...)
WRT "overloaded slaves" - that can hardly happen: no slave ever runs more than
one job at a time.
The hypervisors themselves may be loaded, we'll try to gather some more data here:
[~ngoldin(a)redhat.com] Do we have hypervisor load graphs in Graphite?
[~ederevea] What is our VM/core ratio? Are we running oVirt DWH? Maybe it can help us
here...
{quote}
We can disable such tests in the CI, but we do want to know when there is
a regression in this code. Before it was fixed, the same test took 9 seconds
on my laptop. We need fast machines in the CI for this.
{quote}
As far as machines go, our upstream hardware is quite beefy these days, so let's not jump
to any conclusions. More data is needed: what is this test actually doing? Is it pure
CPU, or is there some I/O involved here as well?
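One quick way to separate the two is to compare wall-clock time with process CPU time
for the same run: if wall time is much larger than CPU time, the test spent most of its
run waiting on I/O or the scheduler rather than computing. A minimal sketch (the
workload here is an illustrative stand-in, not the actual vdsm test):

```python
import time

def profile_run(func):
    """Run func once and return (wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()   # wall-clock time
    cpu_start = time.process_time()    # CPU time of this process only
    func()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu

# Stand-in CPU-bound workload for illustration.
wall, cpu = profile_run(lambda: sum(i * i for i in range(10**6)))
print("wall=%.3fs cpu=%.3fs" % (wall, cpu))
```

A CPU-bound test should show cpu close to wall; a large gap on a CI slave would point
at contention or I/O wait rather than raw CPU speed.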
[~rgolan(a)redhat.com] Are there any oVirt features that can help us here? Higher priority
VMs? Realtime? Better scheduling?
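Independent of the infra answer, the test itself could tolerate slower slaves without
losing its regression-catching value by scaling its limit through an environment
variable. A hedged sketch ({{TIMEOUT_SCALE}} is a made-up knob, not an existing vdsm or
Jenkins setting):

```python
import os

# Hypothetical knob: a CI job would export e.g. TIMEOUT_SCALE=3,
# while developer laptops leave it unset and keep the strict limit.
TIMEOUT_SCALE = float(os.environ.get("TIMEOUT_SCALE", "1"))

def within_limit(elapsed, base_limit):
    """True if the elapsed time fits the (possibly scaled) limit.

    With a 3x scale, the 9-second regression case would still fail,
    while the 1.1-second blip seen on a loaded slave would pass.
    """
    return elapsed < base_limit * TIMEOUT_SCALE
```

With the default scale of 1 this behaves exactly like the current assertion.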
Fwd: CI slaves extremely slow - overloaded slaves?
--------------------------------------------------
Key: OVIRT-919
URL:
https://ovirt-jira.atlassian.net/browse/OVIRT-919
Project: oVirt - virtualization made easy
Issue Type: By-EMAIL
Components: Jenkins
Reporter: Barak Korren
Assignee: infra
Labels: hypervisors, performance, slaves, standard-ci
From: Nir Soffer <nsoffer(a)redhat.com>
Date: 7 December 2016 at 21:33
Subject: CI slaves extremely slow - overloaded slaves?
To: infra <infra(a)ovirt.org>, Eyal Edri <eedri(a)redhat.com>, Dan
Kenigsberg <danken(a)redhat.com>
Hi all,
In recent weeks we have seen more and more test failures due to timeouts in the CI.
For example:
17:19:49 ======================================================================
17:19:49 FAIL: test_scale (storage_filesd_test.GetAllVolumesTests)
17:19:49 ----------------------------------------------------------------------
17:19:49 Traceback (most recent call last):
17:19:49 File
"/home/jenkins/workspace/vdsm_master_check-patch-fc24-x86_64/vdsm/tests/storage_filesd_test.py",
line 165, in test_scale
17:19:49 self.assertTrue(elapsed < 1.0, "Elapsed time: %f seconds"
% elapsed)
17:19:49 AssertionError: Elapsed time: 1.105877 seconds
17:19:49 -------------------- >> begin captured stdout <<
---------------------
17:19:49 1.105877 seconds
This test runs in 0.048 seconds on my laptop:
$ ./run_tests_local.sh storage_filesd_test.py -s
nose.config: INFO: Ignoring files matching ['^\\.', '^_',
'^setup\\.py$']
storage_filesd_test.GetAllVolumesTests
test_no_templates OK
test_no_volumes OK
test_scale 0.047932 seconds
OK
test_with_template OK
----------------------------------------------------------------------
Ran 4 tests in 0.189s
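For context, the assertion that fails above is a plain wall-clock threshold; the
pattern looks roughly like this (an illustrative stand-in, not vdsm's actual
storage_filesd_test code):

```python
import time
import unittest

class GetAllVolumesScaleTest(unittest.TestCase):
    """Illustrative timing test in the style of test_scale."""

    def _workload(self):
        # Stand-in for the real volume enumeration being timed.
        return sum(i for i in range(10**5))

    def test_scale(self):
        start = time.monotonic()
        self._workload()
        elapsed = time.monotonic() - start
        # Anything slower than 1 second is treated as a regression --
        # which also makes the test sensitive to a loaded slave.
        self.assertTrue(elapsed < 1.0, "Elapsed time: %f seconds" % elapsed)
```

Because the limit is wall-clock, any scheduling delay on the slave counts against
the test, even when the code under test is unchanged.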
It seems that we are overloading the CI slaves. We should not use nested KVM
for the CI; such VMs are much slower than regular VMs, and we probably run
too many VMs per CPU.
We can disable such tests in the CI, but we do want to know when there is
a regression in this code. Before it was fixed, the same test took 9 seconds
on my laptop. We need fast machines in the CI for this.
Nir
--
This message was sent by Atlassian JIRA
(v1000.620.0#100023)