[ovirt-devel] CPU sockets, threads, cores and NUMA
Doron Fediuck
dfediuck at redhat.com
Wed Dec 30 14:30:07 UTC 2015
Top posting.
Martin, I believe you have several important points.
Since I was responsible for adding NUMA and threading support to the
engine (and yes, the vdsm conf has been deprecated
for several versions now), I can tell you that what we had up until now was
sufficient for our needs.
Since we have PPC and other technologies, we should give this a proper
review. For example, with Cluster-on-Die you
can have 2 NUMA nodes on a single socket (we're about to order such
hardware for our lab), and AMD allows similar concepts in
some of their processors these days.
So what I suggest is that we review it together in order to provide a
coherent representation
that works for both architectures. I'd like Roy Golan to work on this, as
it has direct implications for NUMA, CPU pinning,
scheduling and other SLA-related features.
I'll arrange a meeting and once we have all the information we'll publish
it in a wiki page.
Thanks for bringing this up,
Doron
On Fri, Dec 11, 2015 at 3:20 PM, Martin Polednik <mpolednik at redhat.com>
wrote:
> On 11/12/15 12:31 +0100, Michal Skrivanek wrote:
>
>>
>> On 10 Dec 2015, at 16:36, Yaniv Kaul <ykaul at redhat.com> wrote:
>>>
>>> On Thu, Dec 10, 2015 at 5:07 PM, Martin Polednik <mpolednik at redhat.com>
>>> wrote:
>>> Hello developers,
>>>
>>> tl;dr version:
>>> * deprecate report_host_threads_as_cores
>>> * remove cpuSockets, use sum(numaNodes.keys())
>>> * report threadsPerCore for ppc64le / report total number of threads
>>> for ppc64le
>>> * work on our naming issues
>>>
>>> I've been going over our capabilities reporting code in VDSM due to
>>> specific threading requirements on the ppc64le platform and noticed a few
>>> issues. Before trying to fix something that "works", I'm sending this
>>> mail to start a discussion regarding the current and future state of the
>>> code.
>>>
>>> The first thing is the terminology. What we consider CPU sockets, cores and
>>> threads are in fact NUMA cells, the sum of cores present in the NUMA
>>> nodes, and likewise the sum of threads. I'd like to see the code moving in a
>>> direction that is correct in this sense.
>>>
>>> Note that I think users are more familiar with sockets-cores-threads
>>> than NUMA cells, terminology-wise.
>>>
>>
>> we do report numa separately today, and we should keep doing that. I
>> consider it another level of detail/complexity which many users do not care
>> about.
>> So we should keep both
>>
>
> The issue is not removing one or the other, but rather that what we
> report as CPU sockets/cores/threads are actually NUMA
> sockets/cores/threads. As long as 1 CPU == 1 NUMA cell we're fine, but
> the POWER8 CPUs are 1 chip (socket) = 4 cores = 4 NUMA cells, so we
> end up reporting 4 as the number of sockets per CPU.
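
To illustrate the distinction drawn above, here is a minimal Python sketch that counts physical sockets and NUMA cells independently. The `Cpu` record, its field names and the sample layouts are hypothetical and are not taken from VDSM; the first layout only mirrors the "1 chip = 4 cores = 4 NUMA cells" case described in the mail.

```python
from collections import namedtuple

# Hypothetical per-CPU record; field names are illustrative, not VDSM's.
Cpu = namedtuple("Cpu", ["cpu_id", "socket_id", "core_id", "numa_cell"])


def count_topology(cpus):
    """Count sockets, cores and NUMA cells as separate quantities."""
    sockets = {c.socket_id for c in cpus}
    cores = {(c.socket_id, c.core_id) for c in cpus}
    cells = {c.numa_cell for c in cpus}
    return {
        "sockets": len(sockets),
        "cores": len(cores),
        "numa_cells": len(cells),
    }


# Assumed layout: one chip, 4 cores, each core in its own NUMA cell.
power8_like = [Cpu(i, 0, i, i) for i in range(4)]

# Assumed layout: two sockets, 4 cores each, one NUMA cell per socket.
x86_like = [Cpu(i, i // 4, i % 4, i // 4) for i in range(8)]

print(count_topology(power8_like))
print(count_topology(x86_like))
```

In the first layout, using the NUMA cell count as the socket count would give 4 instead of 1; in the second layout the two numbers happen to coincide, which is why the shortcut goes unnoticed when 1 CPU == 1 NUMA cell.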
>
>
>
>>>
>>> More important are the actual calculations. I believe we should draw
>>> an uncrossable line between cores and threads and not interfere with
>>> it at least on VDSM's side. That would mean deprecating
>>> report_host_threads_as_cores option. The actual algorithm used at
>>> present does calculate the numa cores and numa threads correctly given
>>> that there are no offline CPUs - most likely fine enough. We don't
>>> have to report the actual number of sockets though, as it is reported
>>> in numa* keys.
>>>
>>> There is a reason for report_host_threads_as_cores option. I don't
>>> remember it right now, but it had to do with some limitation of some OS or
>>> license or something.
>>> I don't think we should deprecate it.
>>>
>>
>> the idea was to remove that option from the VDSM conf (as it’s cumbersome to
>> use), and rather report all relevant information so the engine can decide later
>> on whether to count it one way or the other.
>> Today it’s used as a simple “core multiplier”: if your workload runs
>> “good enough” in parallel on 2 threads within one core, we just consider them
>> additional available “cpus”. For workloads where this assumption does
>> not hold, and also for licensing or any other reason, you can
>> disable it and see “half” of the cpus on x86 despite having HT enabled in
>> BIOS.
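
The "core multiplier" behaviour described above can be sketched in a few lines of Python. The function and parameter names are hypothetical; this only illustrates the counting rule, not how VDSM or the engine implements it.

```python
def effective_cpus(cores, threads_per_core, count_threads_as_cores):
    """Return how many "CPUs" would be considered available.

    count_threads_as_cores is a hypothetical stand-in for the
    report_host_threads_as_cores option discussed in this thread.
    """
    if count_threads_as_cores:
        # Every hardware thread is treated as an additional CPU.
        return cores * threads_per_core
    # Threads are ignored; only physical cores count.
    return cores


# Assumed x86 host: 8 cores with HT enabled (2 threads per core).
print(effective_cpus(8, 2, count_threads_as_cores=True))
print(effective_cpus(8, 2, count_threads_as_cores=False))
```

If the host reports cores and threads per core separately, this decision can be made higher up the chain rather than in the host configuration, which is the direction suggested above.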
>>
>> On PPC this is more tricky, as Martin says below (threads are not able
>> to run multiple VMs simultaneously), so we need to push that decision from
>> vdsm up the chain.
>>
>>
>>> It does fail to provide us with information that can be used in a
>>> ppc64le environment, where for POWER8 we want to run the host without
>>> SMT while VMs would have multiple CPUs assigned. There are various
>>> configurations of so-called subcores in POWER8, where each CPU core
>>> can contain 1, 2 or 4 subcores. This configuration must be taken into
>>> consideration: given e.g. 160 threads overall, it is possible to run
>>> either 20 VMs in smt8 mode, 40 VMs in smt4 mode or 80 VMs in smt2
>>> mode. We have to report either the total number of threads OR just the
>>> threadsPerCore setting, so the users know how many "CPUs" should be
>>> assigned to machines for optimal performance.
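
The arithmetic in the paragraph above can be reproduced with a short Python sketch. It assumes 8 hardware threads per POWER8 core and one VM per subcore; the function name and the returned dictionary are illustrative only.

```python
def vm_slots(total_threads, threads_per_core=8):
    """Map each SMT mode to the number of VMs that fit, one VM per subcore.

    Assumes total_threads is a multiple of threads_per_core and that a
    core can be split into 1, 2 or 4 subcores, as described in the mail.
    """
    cores = total_threads // threads_per_core
    slots = {}
    for subcores in (1, 2, 4):
        smt = threads_per_core // subcores  # threads available to each VM
        slots["smt%d" % smt] = cores * subcores
    return slots


print(vm_slots(160))
```

With 160 threads this gives 20 cores, hence 20, 40 and 80 slots for smt8, smt4 and smt2 respectively, matching the figures quoted above.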
>>>
>>
>> x per y sounds best to me
>> but I think it’s even more complicated: if we consider offline CPUs (we
>> don’t do that today) then the default picture on POWER8 currently looks
>> like 20 cores in 4 numa cells, 8 threads per core. SMT is disabled
>> altogether, so CPUs 1-7,9-15,… are offline. So should we report them or
>> not? On x86 I would not do that as they are administratively disabled and
>> can’t be used, however on ppc since RHEL 7.2 they are dynamically enabled
>> on demand (if the guest topology uses threads as well), so they should be
>> reported as available (or “sort-of-available” :)
>>
>
> If we report threads per core, the offline CPUs can be calculated from
> the available online CPUs and the given threads-per-core value.
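
A minimal Python sketch of that calculation follows. It assumes CPU ids are numbered contiguously within a core, so that a core with 8 threads owns ids 0-7, the next one 8-15, and so on; the function name is hypothetical.

```python
def offline_cpus(online_cpus, threads_per_core):
    """Derive offline CPU ids from the online ones and threads per core.

    Assumes contiguous numbering of threads within each core and that
    every core has at least one online thread.
    """
    online = set(online_cpus)
    # First CPU id of every core that has an online thread.
    core_bases = {cpu - cpu % threads_per_core for cpu in online}
    all_cpus = {
        base + thread
        for base in core_bases
        for thread in range(threads_per_core)
    }
    return sorted(all_cpus - online)


# Assumed POWER8 default from the thread: 20 cores, SMT off, CPUs 0, 8, 16, ... online.
online = list(range(0, 160, 8))
offline = offline_cpus(online, 8)
print(len(offline), offline[:14])
```

A core whose threads are all offline would not be discovered this way, so a complete report would still need the per-CPU online/offline status mentioned elsewhere in the thread.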
>
>> still, I think we should go with simple “sockets, cores/socket,
>> threads/core” numbers,
>> the rest needs to be computed or chosen from, based on additional
>> detailed report of NUMA topology and online/offline CPU status
>> perhaps with different behavior/capabilities on x86 and on power
>>
>>
>>> YAY... do we have a comparison with what libvirt knows / looks at (or do they
>>> ignore it altogether?)
>>> Y.
>>>
>>>
>>> As always, I welcome any opinions regarding the proposed ideas. Also
>>> note that all of the changes can be done via deprecation to be fully
>>> backwards compatible - except for the ppc part.
>>>
>>> Regards,
>>> mpolednik
>>> _______________________________________________
>>> Devel mailing list
>>> Devel at ovirt.org
>>> http://lists.ovirt.org/mailman/listinfo/devel
>>>
>>
>