
[ https://ovirt-jira.atlassian.net/browse/OVIRT-919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=24101#comment-24101 ]

Barak Korren commented on OVIRT-919:
------------------------------------

{quote}
... snip ...
17:19:49 1.105877 seconds

This test runs in 0.048 seconds on my laptop:
... snip ...
Ran 4 tests in 0.189s
{quote}

[~nsoffer@redhat.com] Let's make sure we're comparing apples to apples first: did you run this using "{{mock_runner.sh}}" on your laptop? Perhaps try this in a VM on your laptop?

{quote}
It seems that we are overloading the CI slaves. We should not use nested kvm for the CI, such vms are much slower than regular vms, and we probably run too many vms per cpu.
{quote}

Not sure where nested kvm comes into play here; "{{check_patch.sh}}" runs on plain VMs. (As an OT side-note: are you sure nested VMs should be slower CPU-wise? AFAIK those are just VMs with nested I/O routing... it may be interesting to check this with Lago...)

WRT "overloaded slaves": that can hardly happen, since no slave ever runs more than one job at a time. The hypervisors themselves may be loaded; we'll try to gather some more data here:

[~ngoldin@redhat.com] Do we have hypervisor load graphs in Graphite?
[~ederevea] What is our VM/core ratio? Are we running oVirt DWH? Maybe it can help us here...

{quote}
We can disable such tests in the CI, but we do want to know when there is a regression in this code. Before it was fixed, the same test took 9 seconds on my laptop. We need fast machines in the CI for this.
{quote}

As far as machines go, our upstream hardware is quite beefy these days, so let's not jump to any conclusions. More data is needed: what is this test actually doing? Is this pure CPU, or is there some I/O involved here as well?

[~rgolan@redhat.com] Are there any oVirt features that can help us here? Higher priority VMs? Realtime? Better scheduling?
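
On the "pure CPU or I/O" question, one rough way to gather data is to record both wall-clock time and process CPU time around the code under test. This is only a sketch, assuming Python 3 (for {{time.perf_counter}} and {{time.process_time}}); {{measure}} and {{cpu_bound_work}} are hypothetical names and are not part of vdsm. If wall time is much larger than CPU time, the process was mostly waiting (on I/O, or on a contended hypervisor) rather than computing.

```python
import time


def measure(func, *args, **kwargs):
    """Run func once and return (wall_seconds, cpu_seconds).

    wall_seconds is elapsed real time; cpu_seconds is user+system CPU
    time consumed by this process only (time spent sleeping or blocked
    is not counted).
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


def cpu_bound_work(n=200000):
    # Hypothetical stand-in for the code under test.
    return sum(i * i for i in range(n))


wall, cpu = measure(cpu_bound_work)
print("wall: %f seconds, cpu: %f seconds" % (wall, cpu))
```

Running this on both a laptop and a CI slave and comparing the two numbers would show whether the slowdown is in computation itself or in time spent waiting.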

Fwd: CI slaves extremely slow - overloaded slaves?
--------------------------------------------------

Key: OVIRT-919
URL: https://ovirt-jira.atlassian.net/browse/OVIRT-919
Project: oVirt - virtualization made easy
Issue Type: By-EMAIL
Components: Jenkins
Reporter: Barak Korren
Assignee: infra
Labels: hypervisors, performance, slaves, standard-ci

From: Nir Soffer <nsoffer@redhat.com>
Date: 7 December 2016 at 21:33
Subject: CI slaves extremely slow - overloaded slaves?
To: infra <infra@ovirt.org>, Eyal Edri <eedri@redhat.com>, Dan Kenigsberg <danken@redhat.com>

Hi all,

In recent weeks we have seen more and more test failures due to timeouts in the CI.

For example:

17:19:49 ======================================================================
17:19:49 FAIL: test_scale (storage_filesd_test.GetAllVolumesTests)
17:19:49 ----------------------------------------------------------------------
17:19:49 Traceback (most recent call last):
17:19:49   File "/home/jenkins/workspace/vdsm_master_check-patch-fc24-x86_64/vdsm/tests/storage_filesd_test.py", line 165, in test_scale
17:19:49     self.assertTrue(elapsed < 1.0, "Elapsed time: %f seconds" % elapsed)
17:19:49 AssertionError: Elapsed time: 1.105877 seconds
17:19:49 -------------------- >> begin captured stdout << ---------------------
17:19:49 1.105877 seconds

This test runs in 0.048 seconds on my laptop:

$ ./run_tests_local.sh storage_filesd_test.py -s
nose.config: INFO: Ignoring files matching ['^\\.', '^_', '^setup\\.py$']
storage_filesd_test.GetAllVolumesTests
    test_no_templates    OK
    test_no_volumes      OK
    test_scale           0.047932 seconds
OK
    test_with_template   OK
----------------------------------------------------------------------
Ran 4 tests in 0.189s

It seems that we are overloading the CI slaves. We should not use nested kvm for the CI, such vms are much slower than regular vms, and we probably run too many vms per cpu.

We can disable such tests in the CI, but we do want to know when there is a regression in this code. Before it was fixed, the same test took 9 seconds on my laptop. We need fast machines in the CI for this.

Nir
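
As an aside on the {{elapsed < 1.0}} assertion in the traceback: a single wall-clock measurement is sensitive to transient load on the machine. One possible way to keep a regression guard while reducing that sensitivity is to take the best of several runs. This is a minimal sketch only, assuming Python 3; {{best_elapsed}} and the placeholder workload are hypothetical and do not reflect how the vdsm test is written. It does not address slow or overloaded hardware, only one-off spikes.

```python
import time


def best_elapsed(func, repeat=5):
    """Return the shortest wall-clock time over `repeat` calls of func.

    The minimum is less affected by momentary scheduling noise than a
    single measurement, while a real regression slows every run.
    """
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best


def workload():
    # Hypothetical placeholder for the operation being timed.
    sorted(range(10000), reverse=True)


elapsed = best_elapsed(workload)
print("%f seconds" % elapsed)
assert elapsed < 1.0, "Elapsed time: %f seconds" % elapsed
```

The trade-off is that the test takes `repeat` times longer to run, so this only suits operations that are normally fast.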
-- This message was sent by Atlassian JIRA (v1000.620.0#100023)