[ovirt-devel] [vdsm] strange network test failure on FC23
Nir Soffer
nsoffer at redhat.com
Sun Nov 29 16:34:32 UTC 2015
On Sun, Nov 29, 2015 at 6:01 PM, Yaniv Kaul <ykaul at redhat.com> wrote:
> On Sun, Nov 29, 2015 at 5:37 PM, Nir Soffer <nsoffer at redhat.com> wrote:
>>
>> On Sun, Nov 29, 2015 at 10:37 AM, Yaniv Kaul <ykaul at redhat.com> wrote:
>> >
>> > On Fri, Nov 27, 2015 at 6:55 PM, Francesco Romani <fromani at redhat.com>
>> > wrote:
>> >>
>> >> Using taskset, the ip command now takes a little longer to complete.
>> >
>> >
>> > Since we always use the same set of CPUs, I assume using a mask (for 0 & 1,
>> > just use 0x3, as the man page suggests) might be a tiny fraction faster to
>> > execute taskset with, instead of needing to translate the numeric CPU
>> > list.
>>
>> Creating the string "0-<last cpu index>" is one line in vdsm. The code
>> handling this in taskset is written in C, so the parsing time is
>> practically zero. Even if it were non-zero, this code runs once per
>> child process we spawn, so the cost is insignificant.
>
>
> I think it's easier to just have it as a mask in a config item somewhere,
> without the need to create it or parse it anywhere.
> For us and for the user.
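Creating and parsing this is really not an issue. To illustrate, a minimal
sketch of the kind of code involved (illustrative names, not the actual
vdsm code):

    import os

    def all_cpus():
        # Build a cpu list covering every online cpu,
        # e.g. "0-7" on an 8 cpu machine.
        last = os.sysconf('SC_NPROCESSORS_ONLN') - 1
        return "0" if last == 0 else "0-%d" % last

    def wrap_with_taskset(cmd):
        # taskset accepts a numeric cpu list via --cpu-list (-c),
        # so no hex mask needs to be computed at all.
        return ["taskset", "--cpu-list", all_cpus()] + cmd

    # wrap_with_taskset(["ip", "link", "show"]) ->
    # ["taskset", "--cpu-list", "0-7", "ip", "link", "show"]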
We have this option in /etc/vdsm/vdsm.conf:
# Comma separated whitelist of CPU cores on which VDSM is allowed to
# run. The default is "", meaning VDSM can be scheduled by the OS to
# run on any core. Valid examples: "1", "0,1", "0,2,3"
# cpu_affinity = 1
I think this is the easiest option for users.
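Applying such a whitelist is equally trivial. A minimal sketch, assuming
Python 3.3+ for os.sched_setaffinity (illustration only, not necessarily
how vdsm applies it):

    import os

    def apply_cpu_affinity(value):
        # value comes from vdsm.conf, e.g. "1" or "0,2,3".
        # An empty value means vdsm may run on any core.
        if not value:
            return
        cpus = {int(cpu) for cpu in value.split(",")}
        os.sched_setaffinity(0, cpus)  # 0 == the calling process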
>> > However, the real concern is making sure CPUs 0 & 1 are not really too
>> > busy with other work (including interrupt handling, etc.)
>>
>> This code is used when we run a child process, to allow the child
>> process to run on all cpus (in this case, cpu 0 and cpu 1). So I think
>> there is no concern here.
>>
>> Vdsm itself runs by default on cpu 1, which should be less busy
>> than cpu 0.
>
>
> I assume those are cores, which on a multi-socket machine will probably be
> in the first socket only.
> There's a good chance that the FC and/or network cards will also bind their
> interrupts to core 0 & core 1 (check /proc/interrupts) on the same socket.
> From my poor laptop (1 socket, 4 cores):
>
>  42:    1487104    9329    4042    3598   IR-PCI-MSI 512000-edge    0000:00:1f.2
> (my SATA controller)
>
>  43:   14664923      34      18      13   IR-PCI-MSI 327680-edge    xhci_hcd
> (my docking station connector)
>
>  45:    6754579    4437    2501    2419   IR-PCI-MSI 32768-edge     i915
> (GPU)
>
>  47:     187409   11627    1235    1259   IR-PCI-MSI 2097152-edge   iwlwifi
> (NIC, wifi)
Interesting; here is an example from an 8-core machine running my vms:
[nsoffer at jumbo ~]$ cat /proc/interrupts
            CPU0        CPU1        CPU2        CPU3       CPU4       CPU5       CPU6       CPU7
  0:          31           0           0           0          0          0          0          0   IR-IO-APIC-edge       timer
  1:           2           0           0           1          0          0          0          0   IR-IO-APIC-edge       i8042
  8:           0           0           0           0          0          0          0          1   IR-IO-APIC-edge       rtc0
  9:           0           0           0           0          0          0          0          0   IR-IO-APIC-fasteoi    acpi
 12:           3           0           0           0          0          0          1          0   IR-IO-APIC-edge       i8042
 16:           4           4           9           0          9          1          1          3   IR-IO-APIC 16-fasteoi ehci_hcd:usb3
 23:          13           1           5           0         12          1          1          0   IR-IO-APIC 23-fasteoi ehci_hcd:usb4
 24:           0           0           0           0          0          0          0          0   DMAR_MSI-edge         dmar0
 25:           0           0           0           0          0          0          0          0   DMAR_MSI-edge         dmar1
 26:        3670         354         215     9062370        491        124        169         54   IR-PCI-MSI-edge       0000:00:1f.2
 27:           0           0           0           0          0          0          0          0   IR-PCI-MSI-edge       xhci_hcd
 28:   166285414           0           3           0          4          0          0          0   IR-PCI-MSI-edge       em1
 29:          18           0           0           0          4          3          0          0   IR-PCI-MSI-edge       mei_me
 30:           1         151          17           0          3        169         26         94   IR-PCI-MSI-edge       snd_hda_intel
NMI:        2508        2296        2317        2356        867        918        912        903   Non-maskable interrupts
LOC:   302996116   312923350   312295375   312089303   86282447   94046427   90847792   91761277   Local timer interrupts
SPU:           0           0           0           0          0          0          0          0   Spurious interrupts
PMI:        2508        2296        2317        2356        867        918        912        903   Performance monitoring interrupts
IWI:           1           0           0           5          0          0          0          0   IRQ work interrupts
RTR:           0           0           0           0          0          0          0          0   APIC ICR read retries
RES:    34480637    12953645    13139863    14309885    8881861   10110753    9709070    9703933   Rescheduling interrupts
CAL:     7387779     7682087     7283716     7135792    2771105    1785528    1887493    1843734   Function call interrupts
TLB:       11121       16458       17923       15216       8534       8173       8639       7837   TLB shootdowns
TRM:           0           0           0           0          0          0          0          0   Thermal event interrupts
THR:           0           0           0           0          0          0          0          0   Threshold APIC interrupts
MCE:           0           0           0           0          0          0          0          0   Machine check exceptions
MCP:        7789        7789        7789        7789       7789       7789       7789       7789   Machine check polls
HYP:           0           0           0           0          0          0          0          0   Hypervisor callback interrupts
ERR:           0
MIS:           0
It seems that our default (CPU1) is fine.
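If anyone wants to repeat this check on other hosts, here is a minimal
sketch that totals the per-cpu counts of the numeric IRQ rows
(illustration only, not part of vdsm):

    # Sum per-cpu interrupt counts from /proc/interrupts.
    with open("/proc/interrupts") as f:
        cpus = f.readline().split()  # header: CPU0 CPU1 ...
        totals = [0] * len(cpus)
        for line in f:
            fields = line.split()
            # Only numeric IRQ rows have one count per cpu.
            if not fields[0].rstrip(":").isdigit():
                continue
            for i, count in enumerate(fields[1:len(cpus) + 1]):
                totals[i] += int(count)

    for cpu, total in zip(cpus, totals):
        print(cpu, total)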
Francesco, what do you think?
Nir