[ovirt-devel] CPU sockets, threads, cores and NUMA

Martin Polednik mpolednik at redhat.com
Fri Dec 11 13:20:43 UTC 2015


On 11/12/15 12:31 +0100, Michal Skrivanek wrote:
>
>> On 10 Dec 2015, at 16:36, Yaniv Kaul <ykaul at redhat.com> wrote:
>>
>> On Thu, Dec 10, 2015 at 5:07 PM, Martin Polednik <mpolednik at redhat.com> wrote:
>> Hello developers,
>>
>> tl;dr version:
>> * deprecate report_host_threads_as_cores
>> * remove cpuSockets, use sum(numaNodes.keys())
>> * report threadsPerCore for ppc64le / report total number of threads
>>  for ppc64le
>> * work on our naming issues
>>
>> I've been going over our capabilities reporting code in VDSM due to
>> specific threading requirements on the ppc64le platform and noticed a
>> few issues. Before trying to fix something that "works", I'm sending
>> this mail to start a discussion regarding the current and future state
>> of the code.
>>
>> The first thing is the terminology. What we consider CPU sockets,
>> cores and threads are in fact NUMA cells, the sum of cores present in
>> NUMA nodes, and the same sum for threads. I'd like to see the code
>> moving in a direction that is correct in this sense.
>>
>> Note that I think users are more familiar with sockets-cores-threads than NUMA cells, terminology-wise.
>
>we do report numa separately today, and we should keep doing that. I consider it another level of detail/complexity which many users do not care about.
>So we should keep both

The issue is not removing one or the other, but rather that what we
report as CPU sockets/cores/threads are actually NUMA
sockets/cores/threads. As long as 1 CPU == 1 NUMA cell we're fine, but
a POWER8 CPU is 1 chip (socket) = 4 cores = 4 NUMA cells, so we end up
reporting 4 as the number of sockets for a single CPU.
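
To make the mismatch concrete, here is a minimal Python sketch. It is
not VDSM code: the topology dicts, CPU ids and function names are made
up for illustration, and the POWER8-like layout only mirrors the
"1 socket = 4 NUMA cells" case described above.

```python
# Hypothetical topology: logical CPU id -> (socket id, NUMA cell id).
# These dicts are illustrative assumptions, not data reported by VDSM.
power8_like = {0: (0, 0), 8: (0, 1), 16: (0, 2), 24: (0, 3)}
x86_like = {0: (0, 0), 1: (0, 0), 2: (1, 1), 3: (1, 1)}


def count_sockets(cpus):
    """Count distinct physical sockets."""
    return len({socket for socket, _cell in cpus.values()})


def count_numa_cells(cpus):
    """Count distinct NUMA cells."""
    return len({cell for _socket, cell in cpus.values()})


for name, topo in (("x86-like", x86_like), ("POWER8-like", power8_like)):
    print(name, "sockets:", count_sockets(topo),
          "NUMA cells:", count_numa_cells(topo))
```

On the x86-like layout both counts agree (2 and 2), so using the NUMA
cell count as the socket count goes unnoticed. On the POWER8-like layout
the counts are 1 socket and 4 NUMA cells, which is where the reported
value goes wrong.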

>>
>>
>> More important are the actual calculations. I believe we should draw
>> an uncrossable line between cores and threads and not interfere with
>> it at least on VDSM's side. That would mean deprecating
>> report_host_threads_as_cores option. The actual algorithm used at
>> present does calculate the numa cores and numa threads correctly given
>> that there are no offline CPUs - most likely fine enough. We don't
>> have to report the actual number of sockets though, as it is reported
>> in numa* keys.
>>
>> There is a reason for report_host_threads_as_cores option. I don't remember it right now, but it had to do with some limitation of some OS or license or something.
>> I don't think we should deprecate it.
>
>the idea was to remove that option from VDSM conf (as it's cumbersome to use), and rather report all relevant information so engine can decide later on whether to count it this way or another
>Today it's used as a simple "core multiplier": if your workload is running "good enough" in parallel on 2 threads within one core, we just consider it as an additional available "cpu". For some workloads where this assumption is not working well, and also for licensing or any other reason, you can disable it and see "half" of the cpus on x86 despite having HT enabled in BIOS.
>
>On PPC this is more tricky, as Martin says below (threads are not able to run multiple VMs simultaneously), so we need to push that decision from vdsm up the chain.
>
>>
>> It does fail to provide us with information that can be used in
>> ppc64le environment, where for POWER8 we want to run the host without
>> SMT while VMs would have multiple CPUs assigned. There are various
>> configurations of so-called subcores in POWER8, where each CPU core
>> can contain 1, 2 or 4 subcores. This configuration must be taken into
>> consideration because, given e.g. 160 threads overall, it is possible to run
>> either 20 VMs in smt8 mode, 40 VMs in smt4 mode or 80 VMs in smt2
>> mode. We have to report either the total number of threads OR just the
>> threadsPerCore setting, so the users know how many "CPUs" should be
>> assigned to machines for optimal performance.
>
>x per y sounds best to me
>but I think it's even more complicated. If we consider offline CPUs (we don't do that today), the default picture on POWER8 currently looks like 20 cores in 4 NUMA cells, 8 threads per core. SMT is disabled altogether, so CPUs 1-7, 9-15, ... are offline. So should we report them or not? On x86 I would not do that, as they are administratively disabled and can't be used; on ppc, however, since RHEL 7.2 they are dynamically enabled on demand (if the guest topology uses threads as well), so they should be reported as available (or "sort-of-available" :)

If we report threads per core, the offline CPUs can be calculated from
the available online CPUs and given threads per core value.
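
As a rough Python sketch of that calculation (not VDSM code; the
function names are hypothetical, and it assumes that the threads of a
core have contiguous CPU ids starting at a multiple of threadsPerCore,
which matches the "CPUs 1-7, 9-15, ... are offline" layout mentioned
above):

```python
def offline_cpus(online, threads_per_core):
    """Derive offline CPU ids from the online ids and threads per core.

    Assumption: core N owns CPU ids N * threads_per_core up to
    (N + 1) * threads_per_core - 1.
    """
    online = set(online)
    cores = {cpu // threads_per_core for cpu in online}
    all_cpus = {core * threads_per_core + thread
                for core in cores
                for thread in range(threads_per_core)}
    return sorted(all_cpus - online)


def vms_per_host(total_threads, smt_mode):
    """How many VMs fit if each VM takes smt_mode threads (one subcore)."""
    return total_threads // smt_mode


# Two cores with SMT disabled on the host: only CPUs 0 and 8 are online.
print(offline_cpus([0, 8], 8))
# The 160-thread example from the original mail.
print([vms_per_host(160, mode) for mode in (8, 4, 2)])  # [20, 40, 80]
```

This only works if engine also knows the online CPU ids, so the
threads-per-core value has to be reported together with them.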

>still, I think we should go with simple "sockets, cores/socket, threads/core" numbers,
>the rest needs to be computed or chosen from, based on additional detailed report of NUMA topology and online/offline CPU status
>perhaps with different behavior/capabilities on x86 and on power
>
>>
>> YAY... do we have a comparison with what libvirt knows / looks at (or do they ignore it altogether?)
>> Y.
>>
>>
>> As always, I welcome any opinions regarding the proposed ideas. Also
>> note that all of the changes can be done via deprecation to be fully
>> backwards compatible - except for the ppc part.
>>
>> Regards,
>> mpolednik
>> _______________________________________________
>> Devel mailing list
>> Devel at ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/devel
>>
>


