Top posting.

Martin, I believe you have several important points.
Since I was responsible for adding NUMA and the threading support in the engine (and yes, the vdsm conf has been deprecated
for several versions now), I can tell you that what we had up until now was sufficient for our needs.

Since we have PPC and other technologies, we should give this a proper review. For example, with Cluster-on-die you
can have 2 NUMA nodes on a single socket (we're about to order it to our lab) and AMD allows similar concepts in
some of their processors these days. 

So what I suggest is that we review it together in order to provide a coherent representation
which works for both architectures. I'd like Roy Golan to work on this, as it has direct implications on NUMA, CPU pinning,
scheduling and other SLA related features. 

I'll arrange a meeting and once we have all the information we'll publish it in a wiki page.

Thanks for bringing this up,
Doron


On Fri, Dec 11, 2015 at 3:20 PM, Martin Polednik <mpolednik@redhat.com> wrote:
On 11/12/15 12:31 +0100, Michal Skrivanek wrote:

On 10 Dec 2015, at 16:36, Yaniv Kaul <ykaul@redhat.com> wrote:

On Thu, Dec 10, 2015 at 5:07 PM, Martin Polednik <mpolednik@redhat.com> wrote:
Hello developers,

tl;dr version:
* deprecate report_host_threads_as_cores
* remove cpuSockets, use len(numaNodes.keys())
* report threadsPerCore for ppc64le / report total number of threads
 for ppc64le
* work on our naming issues

I've been going over our capabilities reporting code in VDSM due to
specific threading requirements on the ppc64le platform and noticed a few
issues. Before trying to fix something that "works", I'm sending this
mail to start a discussion regarding current and future state of the
code.

First thing is the terminology. What we consider CPU sockets, cores and
threads are in fact NUMA cells, the sum of cores present in the NUMA
nodes, and the sum of threads present in the NUMA nodes. I'd like to
see the code moving in a direction that is correct in this sense.

Note that I think users are more familiar with sockets-cores-threads than NUMA cells, terminology-wise.

We do report NUMA separately today, and we should keep doing that. I consider it another level of detail/complexity which many users do not care about.
So we should keep both.

The issue is not removing one or the other, but rather that what we
report as CPU sockets/cores/threads are actually NUMA
sockets/cores/threads. As long as 1 CPU == 1 NUMA cell we're fine, but
the POWER8 CPUs are 1 chip (socket) = 4 cores = 4 NUMA cells, so we end
up reporting 4 as the number of sockets per CPU.
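
To illustrate the mismatch, here is a minimal sketch. It assumes a
hypothetical per-CPU record with "socket" and "numa_cell" fields; this
is not the format VDSM or libvirt actually uses, only a way to show why
counting NUMA cells is not the same as counting sockets.

```python
def count_sockets_and_cells(cpus):
    """Return (number of distinct sockets, number of distinct NUMA cells).

    `cpus` is a list of dicts with hypothetical "socket" and "numa_cell"
    keys, used here for illustration only.
    """
    sockets = {cpu["socket"] for cpu in cpus}
    cells = {cpu["numa_cell"] for cpu in cpus}
    return len(sockets), len(cells)


# POWER8-like layout from the text: 1 chip (socket) = 4 cores = 4 NUMA cells.
power8_like = [{"socket": 0, "numa_cell": cell} for cell in range(4)]
sockets, cells = count_sockets_and_cells(power8_like)
```

With this layout the two counts differ (one socket, four cells), which
is exactly the case where reporting the cell count as the socket count
goes wrong.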




More important are the actual calculations. I believe we should draw
an uncrossable line between cores and threads and not interfere with
it at least on VDSM's side. That would mean deprecating
report_host_threads_as_cores option. The actual algorithm used at
present does calculate the numa cores and numa threads correctly given
that there are no offline CPUs - most likely fine enough. We don't
have to report the actual number of sockets though, as it is reported
in numa* keys.

There is a reason for report_host_threads_as_cores option. I don't remember it right now, but it had to do with some limitation of some OS or license or something.
I don't think we should deprecate it.

The idea was to remove that option from the VDSM conf (as it’s cumbersome to use), and rather report all relevant information so the engine can decide later on whether to count it this way or another.
Today it’s used as a simple “core multiplier”: if your workload is running “good enough” in parallel on 2 threads within one core, we just consider it as an additional available “cpu”. For some workloads where this assumption does not work well, and also for licensing or any other reason, you can disable it and see “half” of the CPUs on x86 despite having HT enabled in BIOS.
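
As a rough sketch of the "core multiplier" idea, assuming the decision
is made by the consumer of the report rather than by VDSM (the function
and parameter names below are hypothetical, not existing engine or VDSM
code):

```python
def schedulable_cpus(cores, threads_per_core, count_threads_as_cores):
    """Return how many "cpus" a host offers under the chosen policy.

    Hypothetical helper: when threads are counted as cores, every
    hardware thread is treated as an additional available cpu.
    """
    if count_threads_as_cores:
        return cores * threads_per_core
    return cores


# x86 host with 8 cores and HT enabled (2 threads per core).
with_threads = schedulable_cpus(8, 2, count_threads_as_cores=True)
without_threads = schedulable_cpus(8, 2, count_threads_as_cores=False)
```

Disabling the multiplier leaves half of the cpus visible on such a
host, matching the behavior described above.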

On PPC this is more tricky, as Martin says below (threads are not able to run multiple VMs simultaneously), so we need to push that decision from vdsm up the chain.


It does fail to provide us with information that can be used in
ppc64le environment, where for POWER8 we want to run the host without
SMT while VMs would have multiple CPUs assigned. There are various
configurations of so-called subcores in POWER8, where each CPU core
can contain 1, 2 or 4 subcores. This configuration must be taken into
consideration because, given e.g. 160 threads overall, it is possible to run
either 20 VMs in smt8 mode, 40 VMs in smt4 mode or 80 VMs in smt2
mode. We have to report either the total number of threads OR just the
threadsPerCore setting, so the users know how many "CPUs" should be
assigned to machines for optimal performance.
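
The arithmetic above can be sketched as follows, assuming each VM is
given exactly one (sub)core's worth of threads; the function name is
made up for illustration.

```python
def vms_per_smt_mode(total_threads, smt_modes=(8, 4, 2)):
    """Map each SMT mode to the number of VMs that fit on the host.

    Assumes every VM gets exactly `smt` threads, i.e. one (sub)core.
    """
    return {smt: total_threads // smt for smt in smt_modes}


capacity = vms_per_smt_mode(160)
# capacity == {8: 20, 4: 40, 2: 80}
```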

x per y sounds best to me
but I think it’s even more complicated. If we consider offline CPUs (we don’t do that today), then the default picture on POWER8 currently looks like 20 cores in 4 NUMA cells, 8 threads per core. SMT is disabled altogether, so CPUs 1-7, 9-15, … are offline. So should we report them or not? On x86 I would not do that, as they are administratively disabled and can’t be used; however, on ppc since RHEL 7.2 they are dynamically enabled on demand (if the guest topology uses threads as well), so they should be reported as available (or “sort-of-available” :)

If we report threads per core, the offline CPUs can be calculated from
the available online CPUs and the given threads-per-core value.
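
A minimal sketch of that calculation, under two assumptions that are
mine and not guaranteed by any platform: CPU ids are numbered
contiguously per core (core N owns ids N * threads_per_core up to
(N + 1) * threads_per_core - 1), and every core has at least one online
thread.

```python
def offline_cpus(online_cpus, threads_per_core):
    """Derive offline CPU ids from online ids and threads per core.

    Assumes contiguous per-core numbering and at least one online
    thread in every core; both are illustrative assumptions.
    """
    online = set(online_cpus)
    cores = {cpu // threads_per_core for cpu in online}
    all_cpus = {
        core * threads_per_core + thread
        for core in cores
        for thread in range(threads_per_core)
    }
    return sorted(all_cpus - online)


# POWER8 default from the thread: 20 cores, 8 threads per core, SMT off,
# so only CPUs 0, 8, 16, ... are online.
offline = offline_cpus(range(0, 160, 8), threads_per_core=8)
```

For that host the result starts with 1-7 and 9-15, the same CPUs listed
as offline earlier in the thread.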

Still, I think we should go with simple “sockets, cores/socket, threads/core” numbers;
the rest needs to be computed or chosen from, based on an additional detailed report of NUMA topology and online/offline CPU status,
perhaps with different behavior/capabilities on x86 and on POWER.


YAY... do we have a comparison with what libvirt knows / looks at (or do they ignore it altogether)?
Y.


As always, I welcome any opinions regarding the proposed ideas. Also
note that all of the changes can be done via deprecation to be fully
backwards compatible - except for the ppc part.

Regards,
mpolednik
_______________________________________________
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel