[
https://ovirt-jira.atlassian.net/browse/OVIRT-919?page=com.atlassian.jira...
]
Barak Korren commented on OVIRT-919:
------------------------------------
{quote}
... snip ...
17:19:49 1.105877 seconds
This test runs in 0.048 seconds on my laptop:
... snip ...
Ran 4 tests in 0.189s
{quote}
[~nsoffer(a)redhat.com] Let's make sure we're comparing apples to apples first: did you
run this using "{{mock_runner.sh}}" on your laptop? Perhaps try it in a VM on
your laptop as well?
{quote}
It seems that we are overloading the CI slaves. We should not use nested KVM
for the CI; such VMs are much slower than regular VMs, and we probably run
too many VMs per CPU.
{quote}
Not sure where nested KVM comes into play here, "{{check_patch.sh}}" runs on
plain VMs.
(As an off-topic side note: are you sure nested VMs should be slower CPU-wise? AFAIK
those are just VMs with nested I/O routing... it may be interesting to check this with
Lago...)
WRT "overloaded slaves" - that can hardly happen: no slave ever runs more than
one job at a time.
The hypervisors themselves may be loaded, we'll try to gather some more data here:
[~ngoldin(a)redhat.com] Do we have hypervisor load graphs in Graphite?
[~ederevea] What is our VM/core ratio? Are we running oVirt DWH? Maybe it can help us
here...
{quote}
We can disable such tests in the CI, but we do want to know when there is
a regression in this code. Before it was fixed, the same test took 9 seconds
on my laptop. We need fast machines in the CI for this.
{quote}
As far as machines go, our upstream hardware is quite beefy these days, so let's not jump
to any conclusions. More data is needed: what is this test actually doing? Is it pure
CPU, or is there some I/O involved here as well?
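One quick way to separate the two is to compare wall-clock time with process CPU time
for the same run: if wall time is much larger than CPU time, the test spent most of its
run waiting on I/O or the scheduler rather than computing. A minimal sketch (the
workload here is an illustrative stand-in, not the actual vdsm test):

```python
import time

def profile_run(func):
    """Run func once and return (wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()   # wall-clock time
    cpu_start = time.process_time()    # CPU time of this process only
    func()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu

# Stand-in CPU-bound workload for illustration.
wall, cpu = profile_run(lambda: sum(i * i for i in range(10**6)))
print("wall=%.3fs cpu=%.3fs" % (wall, cpu))
```

A CPU-bound test should show cpu close to wall; a large gap on a CI slave would point
at contention or I/O wait rather than raw CPU speed.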
[~rgolan(a)redhat.com] Are there any oVirt features that can help us here? Higher priority
VMs? Realtime? Better scheduling?
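Independent of the infra answer, the test itself could tolerate slower slaves without
losing its regression-catching value by scaling its limit through an environment
variable. A hedged sketch ({{TIMEOUT_SCALE}} is a made-up knob, not an existing vdsm or
Jenkins setting):

```python
import os

# Hypothetical knob: a CI job would export e.g. TIMEOUT_SCALE=3,
# while developer laptops leave it unset and keep the strict limit.
TIMEOUT_SCALE = float(os.environ.get("TIMEOUT_SCALE", "1"))

def within_limit(elapsed, base_limit):
    """True if the elapsed time fits the (possibly scaled) limit.

    With a 3x scale, the 9-second regression case would still fail,
    while the 1.1-second blip seen on a loaded slave would pass.
    """
    return elapsed < base_limit * TIMEOUT_SCALE
```

With the default scale of 1 this behaves exactly like the current assertion.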
Fwd: CI slaves extremely slow - overloaded slaves?
--------------------------------------------------
Key: OVIRT-919
URL:
https://ovirt-jira.atlassian.net/browse/OVIRT-919
Project: oVirt - virtualization made easy
Issue Type: By-EMAIL
Components: Jenkins
Reporter: Barak Korren
Assignee: infra
Labels: hypervisors, performance, slaves, standard-ci
From: Nir Soffer <nsoffer(a)redhat.com>
Date: 7 December 2016 at 21:33
Subject: CI slaves extremely slow - overloaded slaves?
To: infra <infra(a)ovirt.org>, Eyal Edri <eedri(a)redhat.com>, Dan
Kenigsberg <danken(a)redhat.com>
Hi all,
In recent weeks we have seen more and more test failures due to timeouts in the CI.
For example:
17:19:49 ======================================================================
17:19:49 FAIL: test_scale (storage_filesd_test.GetAllVolumesTests)
17:19:49 ----------------------------------------------------------------------
17:19:49 Traceback (most recent call last):
17:19:49 File
"/home/jenkins/workspace/vdsm_master_check-patch-fc24-x86_64/vdsm/tests/storage_filesd_test.py",
line 165, in test_scale
17:19:49 self.assertTrue(elapsed < 1.0, "Elapsed time: %f seconds"
% elapsed)
17:19:49 AssertionError: Elapsed time: 1.105877 seconds
17:19:49 -------------------- >> begin captured stdout <<
---------------------
17:19:49 1.105877 seconds
This test runs in 0.048 seconds on my laptop:
$ ./run_tests_local.sh storage_filesd_test.py -s
nose.config: INFO: Ignoring files matching ['^\\.', '^_',
'^setup\\.py$']
storage_filesd_test.GetAllVolumesTests
test_no_templates OK
test_no_volumes OK
test_scale 0.047932 seconds
OK
test_with_template OK
----------------------------------------------------------------------
Ran 4 tests in 0.189s
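For context, the assertion that fails above is a plain wall-clock threshold; the
pattern looks roughly like this (an illustrative stand-in, not vdsm's actual
storage_filesd_test code):

```python
import time
import unittest

class GetAllVolumesScaleTest(unittest.TestCase):
    """Illustrative timing test in the style of test_scale."""

    def _workload(self):
        # Stand-in for the real volume enumeration being timed.
        return sum(i for i in range(10**5))

    def test_scale(self):
        start = time.monotonic()
        self._workload()
        elapsed = time.monotonic() - start
        # Anything slower than 1 second is treated as a regression --
        # which also makes the test sensitive to a loaded slave.
        self.assertTrue(elapsed < 1.0, "Elapsed time: %f seconds" % elapsed)
```

Because the limit is wall-clock, any scheduling delay on the slave counts against
the test, even when the code under test is unchanged.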
It seems that we are overloading the CI slaves. We should not use nested KVM
for the CI; such VMs are much slower than regular VMs, and we probably run
too many VMs per CPU.
We can disable such tests in the CI, but we do want to know when there is
a regression in this code. Before it was fixed, the same test took 9 seconds
on my laptop. We need fast machines in the CI for this.
Nir
--
This message was sent by Atlassian JIRA
(v1000.620.0#100023)