On 10 Dec 2015, at 16:36, Yaniv Kaul <ykaul(a)redhat.com> wrote:

On Thu, Dec 10, 2015 at 5:07 PM, Martin Polednik <mpolednik(a)redhat.com> wrote:
Hello developers,

tl;dr version:
* deprecate report_host_threads_as_cores
* remove cpuSockets, use len(numaNodes.keys())
* report threadsPerCore for ppc64le / report total number of threads
for ppc64le
* work on our naming issues

I've been going over our capabilities reporting code in VDSM due to
specific threading requirements on the ppc64le platform and noticed a few
issues. Before trying to fix something that "works", I'm sending this
mail to start a discussion regarding the current and future state of the
code.

First thing is the terminology. What we consider CPU sockets, cores and
threads are in fact NUMA cells, the sum of cores present in NUMA
nodes and the same for threads. I'd like to see the code moving in a
direction that is correct in this sense.

Note that I think users are more familiar with sockets-cores-threads than
NUMA cells, terminology-wise.
We do report NUMA separately today, and we should keep doing that. I
consider it another level of detail/complexity which many users do not
care about.
So we should keep both.

More important are the actual calculations. I believe we should draw
an uncrossable line between cores and threads and not interfere with
it at least on VDSM's side. That would mean deprecating the
report_host_threads_as_cores option. The actual algorithm used at
present does calculate the numa cores and numa threads correctly given
that there are no offline CPUs - most likely fine enough. We don't
have to report the actual number of sockets though, as it is reported
in numa* keys.

There is a reason for the report_host_threads_as_cores option. I don't
remember it right now, but it had to do with some limitation of some OS
or license or something.
I don't think we should deprecate it.
The idea was to remove that option from the VDSM conf (as it’s
cumbersome to use), and rather report all relevant information so the
engine can decide later on whether to count it this way or another.
Today it’s used as a simple “core multiplier”: if your workload is
running “good enough” in parallel on 2 threads within one core, we just
consider it an additional available “cpu”. For some workloads where this
assumption is not working well, and also for licensing or any other
reason, you can disable it and see “half” of the CPUs on x86 despite
having HT enabled in BIOS.
On PPC this is more tricky, as Martin says below (threads are not able
to run multiple VMs simultaneously), so we need to push that decision
from vdsm up the chain.
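To make the "push the decision up the chain" idea concrete, here is a minimal sketch. It assumes vdsm reports only the raw core count and threads per core, and the consumer applies its own counting policy. The function name and the count_threads_as_cores parameter are hypothetical, not existing vdsm or engine API.

```python
def usable_cpus(cores, threads_per_core, count_threads_as_cores):
    """Return how many "CPUs" a consumer would count for a host.

    Hypothetical helper: the host reports raw numbers, the caller
    decides whether hardware threads count as extra CPUs.
    """
    if cores < 0 or threads_per_core < 1:
        raise ValueError("cores must be >= 0 and threads_per_core >= 1")
    if count_threads_as_cores:
        # The "core multiplier" behaviour: every thread is a CPU.
        return cores * threads_per_core
    # Conservative behaviour: only physical cores are counted.
    return cores


# x86 host with 8 cores and HT enabled (2 threads per core)
print(usable_cpus(8, 2, count_threads_as_cores=True))   # 16
print(usable_cpus(8, 2, count_threads_as_cores=False))  # 8
```

With this split the policy lives in one place on the consumer side and the host report stays the same regardless of how it is counted.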

It does fail to provide us with information that can be used in a
ppc64le environment, where for POWER8 we want to run the host without
SMT while VMs would have multiple CPUs assigned. There are various
configurations of so-called subcores in POWER8, where each CPU core
can contain 1, 2 or 4 subcores. This configuration must be taken into
consideration as given e.g. 160 threads overall, it is possible to run
either 20 VMs in smt8 mode, 40 VMs in smt4 mode or 80 VMs in smt2
mode. We have to report either the total number of threads OR just the
threadsPerCore setting, so the users know how many "CPUs" should be
assigned to machines for optimal performance.
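The arithmetic in the paragraph above can be checked with a short sketch. It assumes the total thread count divides evenly by the SMT mode. The function name vm_slots is hypothetical and only mirrors the 160-thread example.

```python
def vm_slots(total_threads, smt_mode):
    """Number of (sub)cores available to guests for a given SMT mode.

    Hypothetical helper: smt_mode is the number of hardware threads
    each guest-visible core occupies (8, 4 or 2 on POWER8).
    """
    if smt_mode < 1 or total_threads % smt_mode != 0:
        raise ValueError("total_threads must be a multiple of smt_mode")
    return total_threads // smt_mode


for mode in (8, 4, 2):
    print("smt%d: %d" % (mode, vm_slots(160, mode)))
# smt8: 20
# smt4: 40
# smt2: 80
```

Either reported value is enough to recover the other: total threads divided by threadsPerCore gives the core count, and vice versa.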
x per y sounds best to me,
but I think it’s even more complicated. If we consider offline CPUs
(we don’t do that today), then the default picture on POWER8
currently looks like 20 cores in 4 NUMA cells, 8 threads per core. SMT
is disabled altogether, so CPUs 1-7, 9-15, … are offline. So
should we report them or not? On x86 I would not do that, as they are
administratively disabled and can’t be used. However, on ppc
since RHEL 7.2 they are dynamically enabled on demand (if the guest
topology uses threads as well), so they should be reported as available
(or “sort-of-available” :)

Still, I think we should go with simple “sockets, cores/socket,
threads/core” numbers;
the rest needs to be computed or chosen from, based on an additional
detailed report of NUMA topology and online/offline CPU status,
perhaps with different behavior/capabilities on x86 and on power.
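A rough sketch of the POWER8 default picture described above: 20 cores, 8 threads per core, SMT off. It assumes the threads of one core have consecutive CPU ids, which matches "CPUs 1-7, 9-15, … are offline". The function and the dictionary keys are hypothetical, not what vdsm reports today.

```python
def describe_topology(cores, threads_per_core, smt_enabled):
    """Split CPU ids into online and offline sets for a simple layout.

    Hypothetical model: CPU ids are numbered linearly and the first
    thread of every core is always online.
    """
    total = cores * threads_per_core
    if smt_enabled:
        online = list(range(total))
    else:
        # Only the primary thread of each core stays online.
        online = list(range(0, total, threads_per_core))
    offline = sorted(set(range(total)) - set(online))
    return {
        "cores": cores,
        "threadsPerCore": threads_per_core,
        "online": online,
        "offline": offline,
    }


topo = describe_topology(20, 8, smt_enabled=False)
print(topo["online"][:3])    # [0, 8, 16]
print(len(topo["offline"]))  # 140
```

Reporting both lists next to the simple numbers would let the consumer decide whether offline threads count as available, which may differ between x86 and power.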

YAY... do we have a comparison of what libvirt knows / looks at (or do
they ignore it altogether?)
Y.

As always, I welcome any opinions regarding the proposed ideas. Also
note that all of the changes can be done via deprecation to be fully
backwards compatible - except for the ppc part.

Regards,
mpolednik
_______________________________________________
Devel mailing list
Devel(a)ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel