
----- Original Message -----
From: "Michal Skrivanek" <mskrivan@redhat.com> To: "Nir Soffer" <nsoffer@redhat.com>, "Francesco Romani" <fromani@redhat.com> Cc: "Yaniv Kaul" <ykaul@redhat.com>, "infra" <infra@ovirt.org>, "devel" <devel@ovirt.org> Sent: Monday, November 30, 2015 9:52:59 AM Subject: Re: [ovirt-devel] [vdsm] strange network test failure on FC23
On 29 Nov 2015, at 17:34, Nir Soffer <nsoffer@redhat.com> wrote:
On Sun, Nov 29, 2015 at 6:01 PM, Yaniv Kaul <ykaul@redhat.com> wrote:
On Sun, Nov 29, 2015 at 5:37 PM, Nir Soffer <nsoffer@redhat.com> wrote:
On Sun, Nov 29, 2015 at 10:37 AM, Yaniv Kaul <ykaul@redhat.com> wrote:
On Fri, Nov 27, 2015 at 6:55 PM, Francesco Romani <fromani@redhat.com> wrote:
Using taskset, the ip command now takes a little longer to complete.
I fail to find the original reference for this. Why does it take longer? Is it purely the additional taskset executable invocation? On a busy system we do have these issues all the time, with lvm, etc., so I don't think it's significant.
Yep, that's only the overhead of the taskset executable.
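For what it's worth, a quick way to measure that overhead from Python; this is a hypothetical illustration, not from the thread, and the command, CPU range and run count are arbitrary:

    # Rough, hypothetical measurement of the extra cost of the taskset wrapper.
    import subprocess
    import time

    def time_cmd(argv, runs=50):
        start = time.time()
        for _ in range(runs):
            subprocess.call(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return (time.time() - start) / runs

    plain = time_cmd(["ip", "link", "show"])
    # "0-3" is just an example range; adjust to the host's CPU count.
    wrapped = time_cmd(["taskset", "--cpu-list", "0-3", "ip", "link", "show"])
    print("plain:    %.4fs" % plain)
    print("wrapped:  %.4fs" % wrapped)
    print("overhead per call: %.4fs" % (wrapped - plain))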
Since we always use the same set of CPUs, I assume using a mask (for 0 & 1, just use 0x3, as the man page suggests) might be a tiny fraction faster to execute taskset with, instead of needing to translate the numeric CPU list.
Creating the string "0-<last cpu index>" is one line in vdsm. The code handling this in taskset is written in C, so the parsing time is practically zero. Even if it were non-zero, this code runs once when we run a child process, so the cost is insignificant.
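A minimal sketch of what that one line amounts to; online_cpu_count() and wrap_with_taskset() are made-up names for illustration, not actual vdsm code:

    # Build a "0-<last cpu index>" cpu-list string and prepend taskset to the
    # child command's argv so the child can again run on all online CPUs.
    import os

    def online_cpu_count():
        return os.sysconf("SC_NPROCESSORS_ONLN")

    def wrap_with_taskset(argv):
        cpu_range = "0-%d" % (online_cpu_count() - 1)   # e.g. "0-3" on a 4-CPU host
        return ["taskset", "--cpu-list", cpu_range] + argv

    print(wrap_with_taskset(["ip", "link", "show"]))
    # ['taskset', '--cpu-list', '0-3', 'ip', 'link', 'show']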
I think it's easier to just have it as a mask in a config item somewhere, without needing to create it or parse it anywhere. For us and for the user.
We have this option in /etc/vdsm/vdsm.conf:
# Comma separated whitelist of CPU cores on which VDSM is allowed to
# run. The default is "", meaning VDSM can be scheduled by the OS to
# run on any core. Valid examples: "1", "0,1", "0,2,3"
# cpu_affinity = 1
I think this is the easiest option for users.
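For illustration, a rough sketch of what applying that option could look like; parse_affinity() and apply_affinity() are hypothetical names, and the actual vdsm implementation may differ:

    # Parse a comma-separated core whitelist such as "1" or "0,2,3" and pin
    # the current process to those cores.
    import os

    def parse_affinity(value):
        value = value.strip()
        if not value:
            return None           # empty means: let the OS schedule freely
        return {int(core) for core in value.split(",")}

    def apply_affinity(value, pid=0):
        cores = parse_affinity(value)
        if cores:
            os.sched_setaffinity(pid, cores)   # pid=0 means the current process

    apply_affinity("1")           # pin the current process to CPU #1, as in the default above
    print(os.sched_getaffinity(0))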
+1
+1, modulo the changes we need to fix https://bugzilla.redhat.com/show_bug.cgi?id=1286462 (patch is coming)
I assume those are cores, which on a multi-socket system will probably be in the first socket only. There's a good chance that the FC and/or network cards will also bind their interrupts to core 0 & core 1 (check /proc/interrupts) on the same socket. From my poor laptop (1 socket, 4 cores):
Yes, especially core 0 (since 0 is the usual default). This was the rationale behind the choice of CPU #1 in the first place.
It seems that our default (CPU1) is fine.
I think it's safe enough. The numbers above (and I checked the same on ppc with a similar pattern) are for a reasonably empty system. We can get a different picture when vdsm is busy. In general I think it's indeed best to use the second online CPU for vdsm and all CPUs for child processes.
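To make the "check /proc/interrupts" suggestion above concrete, here is a rough sketch that sums per-CPU interrupt counts; the column handling is a simplifying assumption, not a robust parser:

    # Sum the per-CPU interrupt counts in /proc/interrupts to see which
    # cores service most interrupts on this host.
    def interrupts_per_cpu():
        with open("/proc/interrupts") as f:
            header = f.readline().split()          # "CPU0 CPU1 ..."
            totals = [0] * len(header)
            for line in f:
                fields = line.split()
                for i, field in enumerate(fields[1:1 + len(header)]):
                    if field.isdigit():
                        totals[i] += int(field)
        return dict(zip(header, totals))

    for cpu, total in sorted(interrupts_per_cpu().items()):
        print("%s: %d" % (cpu, total))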
Agreed - except for cases like bz1286462 - but let's discuss this on gerrit/bz
Regarding exposing this to users in the UI - I think that's way too low level. vdsm.conf is good enough.
Agreed. This is one thing that "just works".

Bests,
--
Francesco Romani
RedHat Engineering Virtualization R & D
Phone: 8261328
IRC: fromani