<div dir="ltr">Top posting.<div><br></div><div>Martin, I believe you have several important points.</div><div>Since I was responsible for adding NUMA and the threading support in the engine (and yes, the vdsm conf is deprecated</div><div>for several versions now), I can tell you that what we had up until now was sufficient to our needs.</div><div><br></div><div>Since we have PPC and other technologies, we should give this a proper review. For example, with Cluster-on-die you</div><div>can have 2 NUMA nodes on a single socket (we're about to order it to our lab) and AMD allows similar concepts in</div><div>some of their processors these days. </div><div><br></div><div>So what I suggest it that we'll review it together is order to provide a coherent representation</div><div>which works for both architectures. I'd like Roy Golan to work on this, as it has direct implications on NUMA, CPU pinning,</div><div>scheduling and other SLA related features. </div><div><br></div><div>I'll arrange a meeting and once we have all the information we'll publish it in a wiki page.</div><div><br></div><div>Thanks for bringing this up,</div><div>Doron</div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Dec 11, 2015 at 3:20 PM, Martin Polednik <span dir="ltr"><<a href="mailto:mpolednik@redhat.com" target="_blank">mpolednik@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On 11/12/15 12:31 +0100, Michal Skrivanek wrote:<br>
>>
>>> On 10 Dec 2015, at 16:36, Yaniv Kaul <ykaul@redhat.com> wrote:
>>>
>>> On Thu, Dec 10, 2015 at 5:07 PM, Martin Polednik <mpolednik@redhat.com> wrote:
>>> Hello developers,
>>>
>>> tl;dr version:
>>> * deprecate report_host_threads_as_cores
>>> * remove cpuSockets, use len(numaNodes.keys())
>>> * report threadsPerCore for ppc64le / report the total number of threads
>>>   for ppc64le
>>> * work on our naming issues
>>>
>>> I've been going over our capabilities reporting code in VDSM due to
>>> specific threading requirements on the ppc64le platform and noticed a
>>> few issues. Before trying to fix something that "works", I'm sending
>>> this mail to start a discussion about the current and future state of
>>> the code.
>>>
>>> The first thing is terminology. What we report as CPU sockets, cores
>>> and threads are in fact NUMA cells, the sum of cores present in NUMA
>>> nodes, and the same for threads. I'd like to see the code moving in a
>>> direction that is correct in this sense.
>>>
>>> Note that I think users are more familiar with sockets-cores-threads
>>> than with NUMA cells, terminology-wise.
>>
>> We do report NUMA separately today, and we should keep doing that. I
>> consider it another level of detail/complexity which many users do not
>> care about.
>> So we should keep both.
>
> The issue is not removing one or the other, but rather that what we
> report as CPU sockets/cores/threads is actually the NUMA
> sockets/cores/threads. As long as 1 CPU == 1 NUMA cell we're fine, but
> the POWER8 CPUs are 1 chip (socket) = 4 cores = 4 NUMA cells, which
> makes us report 4 as the number of sockets per CPU.
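To make the socket-vs-NUMA-cell discrepancy concrete, here is a small
standalone sketch (not VDSM code; the XML is a made-up, heavily trimmed
capabilities snippet mimicking a POWER8-like host where one physical socket
shows up as four NUMA cells):

import xml.etree.ElementTree as ET

# Made-up, trimmed "virsh capabilities"-style output: four NUMA cells,
# one CPU listed per cell, all of them on the same physical socket.
CAPS_XML = """
<capabilities>
  <host>
    <topology>
      <cells num='4'>
        <cell id='0'><cpus num='1'><cpu id='0' socket_id='0' core_id='0' siblings='0'/></cpus></cell>
        <cell id='1'><cpus num='1'><cpu id='8' socket_id='0' core_id='8' siblings='8'/></cpus></cell>
        <cell id='2'><cpus num='1'><cpu id='16' socket_id='0' core_id='16' siblings='16'/></cpus></cell>
        <cell id='3'><cpus num='1'><cpu id='24' socket_id='0' core_id='24' siblings='24'/></cpus></cell>
      </cells>
    </topology>
  </host>
</capabilities>
"""

root = ET.fromstring(CAPS_XML)
cells = root.findall('./host/topology/cells/cell')
sockets = {cpu.get('socket_id') for cpu in root.iter('cpu')
           if cpu.get('socket_id') is not None}

print(len(cells))    # 4 -- NUMA cells, what currently ends up reported as "sockets"
print(len(sockets))  # 1 -- distinct physical sockets (socket_id)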
>
>>>
>>> More important are the actual calculations. I believe we should draw
>>> an uncrossable line between cores and threads and not interfere with
>>> it, at least on VDSM's side. That would mean deprecating the
>>> report_host_threads_as_cores option. The algorithm used at present
>>> does calculate the NUMA cores and NUMA threads correctly, provided
>>> that there are no offline CPUs - which is most likely fine. We don't
>>> have to report the actual number of sockets though, as it is reported
>>> in the numa* keys.
>>>
>>> There is a reason for the report_host_threads_as_cores option. I don't
>>> remember it right now, but it had to do with a limitation of some OS,
>>> or licensing, or something similar.
>>> I don't think we should deprecate it.
>>
>> The idea was to remove that option from the VDSM conf (as it's cumbersome
>> to use) and instead report all the relevant information, so the engine can
>> decide later whether to count it one way or the other.
>> Today it's used as a simple "core multiplier": if your workload runs "well
>> enough" in parallel on 2 threads within one core, we just consider each
>> thread an additional available "CPU". For workloads where this assumption
>> does not hold, and also for licensing or any other reason, you can disable
>> it and see "half" of the CPUs on x86 despite having HT enabled in the BIOS.
>>
>> On PPC this is trickier, as Martin says below (threads are not able to run
>> multiple VMs simultaneously), so we need to push that decision from VDSM
>> up the chain.
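A rough sketch of the "core multiplier" behaviour described above (the
function name and numbers are made up for illustration; this is not the
actual VDSM/engine code):

def effective_cpus(sockets, cores_per_socket, threads_per_core,
                   count_threads_as_cores):
    # Base count: physical cores only.
    cpus = sockets * cores_per_socket
    if count_threads_as_cores:
        # Each hardware thread is presented as an additional "CPU".
        cpus *= threads_per_core
    return cpus

print(effective_cpus(2, 10, 2, True))   # 40 "CPUs" with HT counted as cores
print(effective_cpus(2, 10, 2, False))  # 20 -- "half" of the CPUs, as described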
>>>
>>> It does fail to provide us with information that can be used in the
>>> ppc64le environment, where for POWER8 we want to run the host without
>>> SMT while VMs would have multiple CPUs assigned. There are various
>>> configurations of so-called subcores in POWER8, where each CPU core
>>> can contain 1, 2 or 4 subcores. This configuration must be taken into
>>> consideration: given e.g. 160 threads overall, it is possible to run
>>> either 20 VMs in smt8 mode, 40 VMs in smt4 mode or 80 VMs in smt2
>>> mode. We have to report either the total number of threads OR just the
>>> threadsPerCore setting, so that users know how many "CPUs" should be
>>> assigned to machines for optimal performance.
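The subcore arithmetic from the paragraph above, written out (numbers taken
from the example in the mail: 160 hardware threads in total):

total_threads = 160
for smt in (8, 4, 2):  # smt8 / smt4 / smt2 modes
    # Each VM gets smt threads (one subcore), so the number of VMs that
    # fit is total_threads / smt.
    print("smt%d -> %d VMs" % (smt, total_threads // smt))
# smt8 -> 20 VMs, smt4 -> 40 VMs, smt2 -> 80 VMs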
>>
>> "x per y" sounds best to me.
>> But I think it's even more complicated if we consider offline CPUs (we
>> don't do that today). The default picture on POWER8 currently looks like
>> 20 cores in 4 NUMA cells, 8 threads per core. SMT is disabled altogether,
>> so CPUs 1-7, 9-15, ... are offline. Should we report them or not? On x86
>> I would not, as they are administratively disabled and can't be used.
>> However, on ppc since RHEL 7.2 they are dynamically enabled on demand (if
>> the guest topology uses threads as well), so they should be reported as
>> available (or "sort-of-available" :)).
>
> If we report threads per core, the offline CPUs can be calculated from
> the available online CPUs and the given threads-per-core value.
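A minimal sketch of that derivation (the inputs are assumed example values,
not real VDSM fields), for the case Michal describes where SMT is off and
only the first thread of each core is online:

online_cpus = 20                                  # e.g. CPUs 0, 8, 16, ...
threads_per_core = 8
total_threads = online_cpus * threads_per_core    # 160
offline_cpus = total_threads - online_cpus        # 140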
>>
>> Still, I think we should go with simple "sockets, cores/socket,
>> threads/core" numbers; the rest needs to be computed or chosen from,
>> based on an additional detailed report of the NUMA topology and
>> online/offline CPU status, perhaps with different behavior/capabilities
>> on x86 and on POWER.
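One possible shape for the "simple numbers plus detailed topology"
reporting suggested above -- purely illustrative, the key names are not
meant to be the actual VDSM capabilities schema:

caps = {
    'cpuSockets': 1,
    'coresPerSocket': 20,           # proposed simple number
    'threadsPerCore': 8,            # proposed simple number
    'onlineCpus': [0, 8, 16, 24],   # trimmed example
    'numaNodes': {                  # detailed topology, reported separately
        '0': {'cpus': [0, 8], 'totalMemory': 65536},
        '1': {'cpus': [16, 24], 'totalMemory': 65536},
    },
}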
>>>
>>> YAY... do we have a comparison of what libvirt knows / looks at (or
>>> does it ignore this altogether)?
>>> Y.
>>>
>>> As always, I welcome any opinions on the proposed ideas. Also note
>>> that all of the changes can be made via deprecation to be fully
>>> backwards compatible - except for the ppc part.
>>>
>>> Regards,
>>> mpolednik
_______________________________________________
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel