[ovirt-devel] [vdsm] strange network test failure on FC23
Nir Soffer
nsoffer at redhat.com
Sun Nov 29 16:34:32 UTC 2015
On Sun, Nov 29, 2015 at 6:01 PM, Yaniv Kaul <ykaul at redhat.com> wrote:
> On Sun, Nov 29, 2015 at 5:37 PM, Nir Soffer <nsoffer at redhat.com> wrote:
>>
>> On Sun, Nov 29, 2015 at 10:37 AM, Yaniv Kaul <ykaul at redhat.com> wrote:
>> >
>> > On Fri, Nov 27, 2015 at 6:55 PM, Francesco Romani <fromani at redhat.com>
>> > wrote:
>> >>
>> >> Using taskset, the ip command now takes a little longer to complete.
>> >
>> >
>> > Since we always use the same set of CPUs, I assume using a mask
>> > (for 0 & 1, just use 0x3, as the man page suggests) might be a tiny
>> > fraction faster to execute taskset with, instead of needing to
>> > translate the numeric CPU list.
>>
>> Creating the string "0-<last cpu index>" is one line in vdsm. The
>> code handling it in taskset is written in C, so the parsing time is
>> practically zero. Even if it were non-zero, this code runs once per
>> child process, so the cost is insignificant.
>
>
> I think it's easier just to have it as a mask in a config item
> somewhere, without the need to create it or parse it anywhere.
> For us and for the user.
We have this option in /etc/vdsm/vdsm.conf:
# Comma separated whitelist of CPU cores on which VDSM is allowed to
# run. The default is "", meaning VDSM can be scheduled by the OS to
# run on any core. Valid examples: "1", "0,1", "0,2,3"
# cpu_affinity = 1
I think this is the easiest option for users.
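
To make the comparison concrete, here is a minimal sketch of building
the "0-<last cpu index>" argument and running a child under it. This is
not the actual vdsm code; the helper name and the use of os.sysconf are
illustrative assumptions:

    import os
    import subprocess

    def run_on_all_cpus(args):
        """Run a child process with affinity widened to all online CPUs.

        taskset also accepts a hex mask (e.g. "taskset 0x3 ..." for
        CPUs 0 and 1), but the "-c" list form is easier to read in
        logs and in a config file.
        """
        last_cpu = os.sysconf('SC_NPROCESSORS_ONLN') - 1
        cmd = ['taskset', '-c', '0-%d' % last_cpu] + args
        return subprocess.check_output(cmd)

    # Example: the ip command mentioned earlier in this thread.
    # run_on_all_cpus(['ip', 'link', 'show'])

Either way, the cost of building or parsing the argument is negligible
next to the fork/exec itself.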
>> > However, the real concern is making sure CPUs 0 & 1 are not really
>> > too busy with stuff (including interrupt handling, etc.)
>>
>> This code is used when we run a child process, to allow the child
>> process to run on all cpus (in this case, cpu 0 and cpu 1). So I
>> think there is no concern here.
>>
>> Vdsm itself runs by default on cpu 1, which should be less busy
>> than cpu 0.
>
>
> I assume those are cores, which in a multi-socket machine will
> probably be on the first socket only.
> There's a good chance that the FC and/or network cards will also bind
> their interrupts to core 0 & core 1 (check /proc/interrupts) on the
> same socket.
> From my poor laptop (1s, 4c):
>  42:    1487104       9329       4042       3598  IR-PCI-MSI 512000-edge   0000:00:1f.2
> (my SATA controller)
>
>  43:   14664923         34         18         13  IR-PCI-MSI 327680-edge   xhci_hcd
> (my dock station connector)
>
>  45:    6754579       4437       2501       2419  IR-PCI-MSI 32768-edge    i915
> (GPU)
>
>  47:     187409      11627       1235       1259  IR-PCI-MSI 2097152-edge  iwlwifi
> (NIC, wifi)
Interesting; here is an example from an 8-core machine running my VMs:
[nsoffer at jumbo ~]$ cat /proc/interrupts
            CPU0        CPU1        CPU2        CPU3        CPU4        CPU5        CPU6        CPU7
  0:          31           0           0           0           0           0           0           0  IR-IO-APIC-edge       timer
  1:           2           0           0           1           0           0           0           0  IR-IO-APIC-edge       i8042
  8:           0           0           0           0           0           0           0           1  IR-IO-APIC-edge       rtc0
  9:           0           0           0           0           0           0           0           0  IR-IO-APIC-fasteoi    acpi
 12:           3           0           0           0           0           0           1           0  IR-IO-APIC-edge       i8042
 16:           4           4           9           0           9           1           1           3  IR-IO-APIC 16-fasteoi ehci_hcd:usb3
 23:          13           1           5           0          12           1           1           0  IR-IO-APIC 23-fasteoi ehci_hcd:usb4
 24:           0           0           0           0           0           0           0           0  DMAR_MSI-edge         dmar0
 25:           0           0           0           0           0           0           0           0  DMAR_MSI-edge         dmar1
 26:        3670         354         215     9062370         491         124         169          54  IR-PCI-MSI-edge       0000:00:1f.2
 27:           0           0           0           0           0           0           0           0  IR-PCI-MSI-edge       xhci_hcd
 28:   166285414           0           3           0           4           0           0           0  IR-PCI-MSI-edge       em1
 29:          18           0           0           0           4           3           0           0  IR-PCI-MSI-edge       mei_me
 30:           1         151          17           0           3         169          26          94  IR-PCI-MSI-edge       snd_hda_intel
NMI:        2508        2296        2317        2356         867         918         912         903  Non-maskable interrupts
LOC:   302996116   312923350   312295375   312089303    86282447    94046427    90847792    91761277  Local timer interrupts
SPU:           0           0           0           0           0           0           0           0  Spurious interrupts
PMI:        2508        2296        2317        2356         867         918         912         903  Performance monitoring interrupts
IWI:           1           0           0           5           0           0           0           0  IRQ work interrupts
RTR:           0           0           0           0           0           0           0           0  APIC ICR read retries
RES:    34480637    12953645    13139863    14309885     8881861    10110753     9709070     9703933  Rescheduling interrupts
CAL:     7387779     7682087     7283716     7135792     2771105     1785528     1887493     1843734  Function call interrupts
TLB:       11121       16458       17923       15216        8534        8173        8639        7837  TLB shootdowns
TRM:           0           0           0           0           0           0           0           0  Thermal event interrupts
THR:           0           0           0           0           0           0           0           0  Threshold APIC interrupts
MCE:           0           0           0           0           0           0           0           0  Machine check exceptions
MCP:        7789        7789        7789        7789        7789        7789        7789        7789  Machine check polls
HYP:           0           0           0           0           0           0           0           0  Hypervisor callback interrupts
ERR:           0
MIS:           0
It seems that our default (CPU1) is fine.
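
If anyone wants to check this on their own hosts, here is a quick
sketch that sums the counter columns of /proc/interrupts per CPU. It
assumes the usual layout (a header line naming the CPUs, then one
counter per CPU per interrupt line) and is not vdsm code:

    def interrupts_per_cpu(path='/proc/interrupts'):
        with open(path) as f:
            cpus = f.readline().split()      # ['CPU0', 'CPU1', ...]
            totals = [0] * len(cpus)
            for line in f:
                fields = line.split()[1:]    # drop the "NN:" label
                counts = []
                for field in fields:
                    if not field.isdigit():
                        break                # reached the description
                    counts.append(int(field))
                # Skip lines (e.g. ERR, MIS) with fewer columns.
                if len(counts) == len(cpus):
                    totals = [t + c for t, c in zip(totals, counts)]
        return dict(zip(cpus, totals))

    # for cpu, total in sorted(interrupts_per_cpu().items()):
    #     print('%s: %d' % (cpu, total))

On the host above, CPU0 carries all the em1 network interrupts, while
CPU1 sees relatively few device interrupts.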
Francesco, what do you think?
Nir