[ovirt-devel] [vdsm] strange network test failure on FC23

Francesco Romani fromani at redhat.com
Mon Nov 30 09:19:24 UTC 2015


----- Original Message -----
> From: "Michal Skrivanek" <mskrivan at redhat.com>
> To: "Nir Soffer" <nsoffer at redhat.com>, "Francesco Romani" <fromani at redhat.com>
> Cc: "Yaniv Kaul" <ykaul at redhat.com>, "infra" <infra at ovirt.org>, "devel" <devel at ovirt.org>
> Sent: Monday, November 30, 2015 9:52:59 AM
> Subject: Re: [ovirt-devel] [vdsm] strange network test failure on FC23
> 
> 
> > On 29 Nov 2015, at 17:34, Nir Soffer <nsoffer at redhat.com> wrote:
> > 
> > On Sun, Nov 29, 2015 at 6:01 PM, Yaniv Kaul <ykaul at redhat.com> wrote:
> > > On Sun, Nov 29, 2015 at 5:37 PM, Nir Soffer <nsoffer at redhat.com> wrote:
> > >>
> > >> On Sun, Nov 29, 2015 at 10:37 AM, Yaniv Kaul <ykaul at redhat.com> wrote:
> > >> >
> > >> > On Fri, Nov 27, 2015 at 6:55 PM, Francesco Romani <fromani at redhat.com>
> > >> > wrote:
> > >> >>
> > >> >> Using taskset, the ip command now takes a little longer to complete.
> 
> I fail to find the original reference for this.
> Why does it take longer? Is it purely the additional taskset executable
> invocation? On a busy system we have these issues all the time, with lvm,
> etc., so I don't think it's significant.

Yep, it's only the overhead of the taskset executable itself.
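If anyone wants to put a number on it, something along these lines gives a
rough estimate (a standalone sketch, not vdsm code; the "0-3" cpu list
assumes a 4-cpu box):

    import subprocess
    import time

    def timed_run(argv, repeats=100):
        # Average wall-clock time of running argv `repeats` times.
        start = time.time()
        for _ in range(repeats):
            subprocess.call(argv, stdout=subprocess.DEVNULL)
        return (time.time() - start) / repeats

    plain = timed_run(["ip", "link", "show"])
    pinned = timed_run(["taskset", "--cpu-list", "0-3", "ip", "link", "show"])
    print("taskset overhead per call: %.2f ms" % ((pinned - plain) * 1000))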
 
> > >> > Since we always use the same set of CPUs, I assume using a mask
> > >> > (for 0 & 1, just use 0x3, as the man page suggests) might be a
> > >> > tiny fraction faster to execute taskset with, avoiding the need
> > >> > to translate the numeric CPU list.
> > >>
> > >> Creating the string "0-<last cpu index>" is one line in vdsm. The
> > >> code handling this in taskset is written in C, so the parsing time
> > >> is practically zero. Even if it were non-zero, this code runs once
> > >> per child process, so the cost is insignificant.
> > >
> > >
> > > I think it's easier to just have it as a mask in a config item
> > > somewhere, with no need to create or parse it anywhere.
> > > For us and for the user.
> > 
> > We have this option in /etc/vdsm/vdsm.conf:
> > 
> >     # Comma separated whitelist of CPU cores on which VDSM is allowed to
> >     # run. The default is "", meaning VDSM can be scheduled by the OS to
> >     # run on any core. Valid examples: "1", "0,1", "0,2,3"
> >     # cpu_affinity = 1
> > 
> > I think this is the easiest option for users.
> 
> +1

+1, modulo the changes we need to fix https://bugzilla.redhat.com/show_bug.cgi?id=1286462
(patch is coming)
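For illustration, the plumbing can be as simple as this (a hypothetical
sketch - wrap_with_taskset is an illustrative name, not vdsm's actual API):

    import os
    import subprocess

    def wrap_with_taskset(argv, cpu_list):
        # cpu_list uses taskset's --cpu-list syntax, e.g. the
        # comma-separated value from vdsm.conf ("1", "0,2,3").
        # An empty value means no pinning at all.
        if not cpu_list:
            return argv
        return ["taskset", "--cpu-list", cpu_list] + argv

    # Child processes get all online cpus back, via the
    # "0-<last cpu index>" string mentioned above:
    all_cpus = "0-%d" % (os.sysconf("SC_NPROCESSORS_ONLN") - 1)
    subprocess.call(wrap_with_taskset(["ip", "link", "show"], all_cpus))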
 
> > > I assume those are cores, which probably in a multi-socket will be in the
> > > first socket only.
> > > There's a good chance that the FC and or network/cards will also bind
> > > their
> > > interrupts to core0 & core 1 (check /proc/interrupts) on the same socket.
> > > From my poor laptop (1s, 4c):

Yes, especially core0 (since 0 is the usual default). This was the rationale
behind choosing cpu #1 in the first place.
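To eyeball the interrupt spread Yaniv suggested checking, a quick sketch
over /proc/interrupts is enough (again illustrative code, not vdsm's):

    def interrupts_per_cpu():
        # Sum each cpu's column in /proc/interrupts; rows such as
        # ERR/MIS carry a single counter and are skipped.
        with open("/proc/interrupts") as f:
            cpus = f.readline().split()          # header: CPU0 CPU1 ...
            totals = [0] * len(cpus)
            for line in f:
                counters = line.split()[1:len(cpus) + 1]
                if len(counters) == len(cpus) and all(c.isdigit() for c in counters):
                    for i, c in enumerate(counters):
                        totals[i] += int(c)
        return dict(zip(cpus, totals))

    print(interrupts_per_cpu())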

> > It seems that our default (CPU1) is fine.
> 
> I think it’s safe enough.
> The numbers above (and I checked the same on ppc, with a similar pattern)
> are for a reasonably empty system. We can get a different picture when vdsm
> is busy. In general I think it's indeed best to use the second online CPU
> for vdsm and all CPUs for child processes.

Agreed - except for cases like bz1286462 - but let's discuss this on gerrit/bz

> Regarding exposing this to users in the UI - I think that's way too low
> level. vdsm.conf is good enough.

Agreed. This is one thing that "just works".

Bests,

-- 
Francesco Romani
Red Hat Engineering Virtualization R & D
Phone: 8261328
IRC: fromani


