[JIRA] (OVIRT-919) Fwd: CI slaves extremely slow - overloaded slaves?

Barak Korren (oVirt JIRA) jira at ovirt-jira.atlassian.net
Thu Dec 8 07:25:02 UTC 2016


    [ https://ovirt-jira.atlassian.net/browse/OVIRT-919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=24101#comment-24101 ] 

Barak Korren commented on OVIRT-919:
------------------------------------

{quote}
  ... snip ...
17:19:49 1.105877 seconds

This test runs in 0.048 seconds on my laptop:
  ... snip ...
Ran 4 tests in 0.189s
{quote}

[~nsoffer at redhat.com] Let's make sure we're comparing apples to apples first: did you run this using "{{mock_runner.sh}}" on your laptop? Perhaps try this in a VM on your laptop?

{quote}
It seems that we are overloading the CI slaves. We should not use nested kvm
for the CI, such vms are much slower than regular vms, and we probably run
too many vms per cpu.
{quote}

I'm not sure where nested KVM comes into play here; "{{check_patch.sh}}" runs on plain VMs.
(As an OT side-note: Are you sure nested VMs should be slower CPU-wise? AFAIK those are just VMs with nested I/O routing... It may be interesting to check this with Lago...)

WRT "overloaded slaves" - that can hardly happen, no slave ever runs more than one job at a time.
The hypervisors themselves may be loaded, so we'll try to gather some more data here:
[~ngoldin at redhat.com] Do we have hypervisor load graphs in Graphite?
[~ederevea] What is our VM/core ratio? Are we running oVirt DWH? Maybe it can help us here...
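
Until we have hypervisor-side graphs, one way to get a hint from inside a slave is to look at CPU steal time, i.e. the time the guest wanted to run but the hypervisor gave the physical CPU to something else. The following is only a rough sketch, not something we run today: it assumes a Linux guest and Python 3, and the helper names ({{read_cpu_line}}, {{steal_fraction}}) are made up for illustration. It relies on the aggregate "cpu" line of /proc/stat, whose first eight counters are user, nice, system, idle, iowait, irq, softirq and steal.

```python
def read_cpu_line(path="/proc/stat"):
    """Return the aggregate 'cpu' line (the first line) of /proc/stat."""
    with open(path) as f:
        return f.readline()


def steal_fraction(before, after):
    """Fraction of CPU time stolen between two samples of the 'cpu' line.

    Each argument is a raw line such as:
        "cpu  100 0 50 800 10 0 0 40 0 0"
    """
    def parse(line):
        fields = line.split()
        if not fields or fields[0] != "cpu":
            raise ValueError("not an aggregate cpu line: %r" % line)
        return [int(v) for v in fields[1:]]

    deltas = [b - a for a, b in zip(parse(before), parse(after))]
    # Only the first eight counters are summed: guest and guest_nice
    # (fields 9 and 10) are already accounted for in user and nice.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    # Index 7 is the steal counter.
    return deltas[7] / total
```

To use it, take one sample with {{read_cpu_line()}}, wait a few seconds while a job is running, take a second sample and pass both to {{steal_fraction()}}. A value that stays near zero would suggest the hypervisor is not starving the slave; a consistently high value would point at the VM/core ratio.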

{quote}
We can disable such tests in the CI, but we do want to know when there is
a regression in this code. Before it was fixed, the same test took 9 seconds
on my laptop. We need fast machines in the CI for this.
{quote}

As far as machines go, our upstream hardware is quite beefy these days, so let's not jump to any conclusions. More data is needed: what is this test actually doing? Is this pure CPU, or is there some I/O involved here as well?
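
A cheap way to answer the CPU vs. I/O question is to measure both wall-clock time and process CPU time around the code under test. The snippet below is a minimal sketch under assumptions, not the actual test code: it assumes Python 3 (for {{time.process_time}}), and {{profile_call}} is a hypothetical helper name. If CPU time is close to wall time the code is CPU-bound; if wall time is much larger, the process was waiting on I/O, sleeping, or not being scheduled.

```python
import time


def profile_call(func, *args, **kwargs):
    """Call func and return (result, wall_seconds, cpu_seconds).

    wall_seconds is elapsed real time; cpu_seconds is user + system
    CPU time consumed by this process only (child processes are not
    included).
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = func(*args, **kwargs)
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    return result, wall_seconds, cpu_seconds
```

Wrapping the timed section of the test with something like this, and printing both numbers on the slave and on a laptop, would show whether the extra second in CI is spent computing or waiting.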

[~rgolan at redhat.com] Are there any oVirt features that can help us here? Higher priority VMs? Realtime? Better scheduling?

> Fwd: CI slaves extremely slow - overloaded slaves?
> --------------------------------------------------
>
>                 Key: OVIRT-919
>                 URL: https://ovirt-jira.atlassian.net/browse/OVIRT-919
>             Project: oVirt - virtualization made easy
>          Issue Type: By-EMAIL
>          Components: Jenkins
>            Reporter: Barak Korren
>            Assignee: infra
>              Labels: hypervisors, performance, slaves, standard-ci
>
> From: Nir Soffer <nsoffer at redhat.com>
> Date: 7 December 2016 at 21:33
> Subject: CI slaves extremely slow - overloaded slaves?
> To: infra <infra at ovirt.org>, Eyal Edri <eedri at redhat.com>, Dan
> Kenigsberg <danken at redhat.com>
> Hi all,
> In the last weeks we see more and more test failures due to timeouts in the CI.
> For example:
> 17:19:49 ======================================================================
> 17:19:49 FAIL: test_scale (storage_filesd_test.GetAllVolumesTests)
> 17:19:49 ----------------------------------------------------------------------
> 17:19:49 Traceback (most recent call last):
> 17:19:49   File
> "/home/jenkins/workspace/vdsm_master_check-patch-fc24-x86_64/vdsm/tests/storage_filesd_test.py",
> line 165, in test_scale
> 17:19:49     self.assertTrue(elapsed < 1.0, "Elapsed time: %f seconds"
> % elapsed)
> 17:19:49 AssertionError: Elapsed time: 1.105877 seconds
> 17:19:49 -------------------- >> begin captured stdout << ---------------------
> 17:19:49 1.105877 seconds
> This test runs in 0.048 seconds on my laptop:
> $ ./run_tests_local.sh storage_filesd_test.py -s
> nose.config: INFO: Ignoring files matching ['^\\.', '^_', '^setup\\.py$']
> storage_filesd_test.GetAllVolumesTests
>     test_no_templates                                           OK
>     test_no_volumes                                             OK
>     test_scale                                                  0.047932 seconds
> OK
>     test_with_template                                          OK
> ----------------------------------------------------------------------
> Ran 4 tests in 0.189s
> It seems that we are overloading the CI slaves. We should not use nested kvm
> for the CI, such vms are much slower than regular vms, and we probably run
> too many vms per cpu.
> We can disable such tests in the CI, but we do want to know when there is
> a regression in this code. Before it was fixed, the same test took 9 seconds
> on my laptop. We need fast machines in the CI for this.
> Nir





