On 29 Nov 2015, at 17:34, Nir Soffer <nsoffer(a)redhat.com> wrote:
On Sun, Nov 29, 2015 at 6:01 PM, Yaniv Kaul <ykaul(a)redhat.com> wrote:
> On Sun, Nov 29, 2015 at 5:37 PM, Nir Soffer <nsoffer(a)redhat.com> wrote:
>>
>> On Sun, Nov 29, 2015 at 10:37 AM, Yaniv Kaul <ykaul(a)redhat.com> wrote:
>> >
>> > On Fri, Nov 27, 2015 at 6:55 PM, Francesco Romani <fromani(a)redhat.com> wrote:
>> >>
>> >> Using taskset, the ip command now takes a little longer to complete.
I fail to find the original reference for this.
Why does it take longer? Is it purely the additional taskset executable invocation? On a
busy system we have these issues all the time, with lvm etc., so I don't think it's
significant.
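For what it's worth, the extra cost of the taskset exec itself can be estimated with a
rough sketch like the one below (not vdsm code; numbers will vary by host):

    import subprocess
    import timeit

    # Compare spawning a trivial command directly vs. through the taskset wrapper.
    plain = timeit.timeit(lambda: subprocess.check_call(["true"]), number=100)
    wrapped = timeit.timeit(
        lambda: subprocess.check_call(["taskset", "-c", "0", "true"]), number=100)
    print("plain: %.3fs, with taskset: %.3fs" % (plain, wrapped))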
>> >
>> >
>> > Since we always use the same set of CPUs, I assume using a mask (for 0 & 1,
>> > just use 0x3, as the man page suggests) might be a tiny fraction faster to
>> > execute taskset with, instead of needing to translate the numeric CPU list.
>>
>> Creating the string "0-<last cpu index>" is one line in vdsm. The code
>> handling this in taskset is written in C, so the parsing time is practically
>> zero. Even if it were non-zero, this code runs once when we spawn a child
>> process, so the cost is insignificant.
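For illustration, that one-liner amounts to something like this (a sketch, not the
actual vdsm code):

    import os

    # "0-<last cpu index>", e.g. "0-7" on an 8-CPU host; taskset parses the
    # range in C, so the string form costs practically nothing.
    cpu_range = "0-%d" % (os.sysconf("SC_NPROCESSORS_ONLN") - 1)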
>
>
> I think it's easier to just have it as a mask in a config item somewhere,
> without the need to create it or parse it anywhere.
> For us and for the user.
We have this option in /etc/vdsm/vdsm.conf:
# Comma separated whitelist of CPU cores on which VDSM is allowed to
# run. The default is "", meaning VDSM can be scheduled by the OS to
# run on any core. Valid examples: "1", "0,1", "0,2,3"
# cpu_affinity = 1
I think this is the easiest option for users.
+1
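For reference, the comma-separated list in vdsm.conf and the mask form mentioned above
are trivially convertible; a small sketch (the helper name is hypothetical):

    def cpu_list_to_mask(cpu_list):
        # e.g. "0,1" -> "0x3", "1" -> "0x2"
        mask = 0
        for cpu in cpu_list.split(","):
            mask |= 1 << int(cpu)
        return hex(mask)

    assert cpu_list_to_mask("0,1") == "0x3"
    assert cpu_list_to_mask("1") == "0x2"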
>> > However, the real concern is making sure CPUs 0 & 1 are not really too busy
>> > with stuff (including interrupt handling, etc.)
>>
>> This code is used when we run a child process, to allow the child process
>> to run on all CPUs (in this case, cpu 0 and cpu 1). So I think there is no
>> concern here.
>>
>> Vdsm itself is running by default on cpu 1, which should be less busy
>> than cpu 0.
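Roughly, the intended pattern is: pin the vdsm process itself, and give children the
full CPU set again. A sketch under those assumptions (not vdsm's actual code):

    import os
    import subprocess

    # Parent (vdsm) restricted to cpu 1 only.
    os.sched_setaffinity(0, {1})

    # Children get all online CPUs back via the taskset wrapper.
    all_cpus = "0-%d" % (os.sysconf("SC_NPROCESSORS_ONLN") - 1)
    subprocess.check_call(["taskset", "--cpu-list", all_cpus, "true"])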
>
>
> I assume those are cores, which in a multi-socket system will probably be in
> the first socket only.
> There's a good chance that the FC and/or network cards will also bind their
> interrupts to core 0 & core 1 (check /proc/interrupts) on the same socket.
> From my poor laptop (1s, 4c):
> 42: 1487104 9329 4042 3598 IR-PCI-MSI 512000-edge 0000:00:1f.2
> (my SATA controller)
>
> 43: 14664923 34 18 13 IR-PCI-MSI 327680-edge xhci_hcd
> (my dock station connector)
>
> 45: 6754579 4437 2501 2419 IR-PCI-MSI 32768-edge i915
> (GPU)
>
> 47: 187409 11627 1235 1259 IR-PCI-MSI 2097152-edge iwlwifi
> (NIC, wifi)
Interesting, here is an example from an 8-core machine running my VMs:
[nsoffer@jumbo ~]$ cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
  0:         31          0          0          0          0          0          0          0   IR-IO-APIC-edge        timer
  1:          2          0          0          1          0          0          0          0   IR-IO-APIC-edge        i8042
  8:          0          0          0          0          0          0          0          1   IR-IO-APIC-edge        rtc0
  9:          0          0          0          0          0          0          0          0   IR-IO-APIC-fasteoi     acpi
 12:          3          0          0          0          0          0          1          0   IR-IO-APIC-edge        i8042
 16:          4          4          9          0          9          1          1          3   IR-IO-APIC 16-fasteoi  ehci_hcd:usb3
 23:         13          1          5          0         12          1          1          0   IR-IO-APIC 23-fasteoi  ehci_hcd:usb4
 24:          0          0          0          0          0          0          0          0   DMAR_MSI-edge          dmar0
 25:          0          0          0          0          0          0          0          0   DMAR_MSI-edge          dmar1
 26:       3670        354        215    9062370        491        124        169         54   IR-PCI-MSI-edge        0000:00:1f.2
 27:          0          0          0          0          0          0          0          0   IR-PCI-MSI-edge        xhci_hcd
 28:  166285414          0          3          0          4          0          0          0   IR-PCI-MSI-edge        em1
 29:         18          0          0          0          4          3          0          0   IR-PCI-MSI-edge        mei_me
 30:          1        151         17          0          3        169         26         94   IR-PCI-MSI-edge        snd_hda_intel
NMI:       2508       2296       2317       2356        867        918        912        903   Non-maskable interrupts
LOC:  302996116  312923350  312295375  312089303   86282447   94046427   90847792   91761277   Local timer interrupts
SPU:          0          0          0          0          0          0          0          0   Spurious interrupts
PMI:       2508       2296       2317       2356        867        918        912        903   Performance monitoring interrupts
IWI:          1          0          0          5          0          0          0          0   IRQ work interrupts
RTR:          0          0          0          0          0          0          0          0   APIC ICR read retries
RES:   34480637   12953645   13139863   14309885    8881861   10110753    9709070    9703933   Rescheduling interrupts
CAL:    7387779    7682087    7283716    7135792    2771105    1785528    1887493    1843734   Function call interrupts
TLB:      11121      16458      17923      15216       8534       8173       8639       7837   TLB shootdowns
TRM:          0          0          0          0          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0          0          0          0          0   Threshold APIC interrupts
MCE:          0          0          0          0          0          0          0          0   Machine check exceptions
MCP:       7789       7789       7789       7789       7789       7789       7789       7789   Machine check polls
HYP:          0          0          0          0          0          0          0          0   Hypervisor callback interrupts
ERR:          0
MIS:          0
It seems that our default (CPU1) is fine.
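A quick way to eyeball per-CPU interrupt load, e.g. to check that the cpu_affinity core
is not an interrupt hotspot (a sketch, not part of vdsm):

    def per_cpu_interrupt_totals(path="/proc/interrupts"):
        with open(path) as f:
            cpus = f.readline().split()          # header: CPU0 CPU1 ...
            totals = [0] * len(cpus)
            for line in f:
                fields = line.split()[1:1 + len(cpus)]
                for i, value in enumerate(fields):
                    if value.isdigit():
                        totals[i] += int(value)
        return dict(zip(cpus, totals))

    print(per_cpu_interrupt_totals())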
I think it’s safe enough.
The numbers above (and I checked the same on ppc, with a similar pattern) are for a reasonably
empty system. We can get a different picture when vdsm is busy. In general I think it's
indeed best to use the second online CPU for vdsm and all CPUs for child processes.
Regarding exposing this to users in the UI - I think that's way too low level. vdsm.conf is
good enough.
Thanks,
michal
Francesco, what do you think?
Nir