cpu, core and thread mappings

Hello,

I was talking with a VMware expert about VM performance in relation to how the virtual CPUs assigned to a VM map onto the real hardware of the hypervisor underneath.

One of the topics was NUMA usage and its overhead in the case of a "too" big VM, in terms of both number of vCPUs and amount of memory.

For example, suppose a host has 2 Intel sockets with 6 cores each and HT enabled, and 96 GB of RAM (distributed 48+48 between the 2 processors).

Suppose I configure a VM with 16 vCPUs (2:4:2): would the mapping be respected at the physical level, or is it only a sort of "hint" for the hypervisor? Can I say that it would perform better if I configure it with 12 vCPUs and a 1:6:2 mapping, because it can stay entirely inside one CPU?

And what if I define a VM with 52 GB of RAM? Can I say that it would generally perform better if I try to fit it all in the memory slots attached to one CPU (e.g. not more than 48 GB in my example)?

Are there any documents that go more deeply into these sorts of considerations?

Also, if one sizes things so that the biggest VM can stay entirely inside one CPU and its memory, does it make sense to say that a cluster of 4 nodes, each with 1 socket and 48 GB of memory, will perform better in this scenario than a cluster of 2 nodes, each with 2 sockets and 96 GB of RAM?

Hope I have clarified my questions/doubts.

Thanks in advance for any insight,
Gianluca

On 6 Sep 2017, at 12:47, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:

> Are there any documents going more deeply in these sort of considerations?

This is exactly what I have been searching for lately, too. Please let me know if you find anything (or blog posts, forums, books, etc). Thank you!

Sent from my mobile phone

On Wed, Sep 6, 2017 at 2:47 PM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:

> Hello,
> I was talking with a VMware expert about VM performance in relation to how the virtual CPUs assigned to a VM map onto the real hardware of the hypervisor underneath.
>
> One of the topics was NUMA usage and its overhead in the case of a "too" big VM, in terms of both number of vCPUs and amount of memory.
> For example, suppose a host has 2 Intel sockets with 6 cores each and HT enabled, and 96 GB of RAM (distributed 48+48 between the 2 processors).
> Suppose I configure a VM with 16 vCPUs (2:4:2): would the mapping be respected at the physical level, or is it only a sort of "hint" for the hypervisor?
> Can I say that it would perform better if I configure it with 12 vCPUs and a 1:6:2 mapping, because it can stay entirely inside one CPU?

Hard to say without knowing the workload. With 12 vCPUs you are losing 4 vCPUs. Perhaps those could be used for something else (the OS), while the rest are used by the application, for example?
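For reference, the arithmetic behind the 16 vs 12 vCPU comparison can be written down as a small sketch. It assumes the triples in the question read as sockets:cores:threads, that each host socket is one NUMA node, and that every hyperthread counts as a full logical CPU (which overstates real capacity). The `Topology` class and `host_sockets_needed` function are made up for this illustration; the sketch only checks whether a VM could fit in one socket, not whether the hypervisor actually places it there.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topology:
    """A sockets:cores:threads triple, for a host or for a VM (illustrative)."""
    sockets: int
    cores_per_socket: int
    threads_per_core: int

    @property
    def cpus_per_socket(self) -> int:
        # Logical CPUs in one socket, counting each hyperthread as a CPU.
        return self.cores_per_socket * self.threads_per_core

    @property
    def total_cpus(self) -> int:
        return self.sockets * self.cpus_per_socket


def host_sockets_needed(vm: Topology, host: Topology) -> int:
    """Fewest host sockets whose logical CPUs can back every vCPU one-to-one."""
    return -(-vm.total_cpus // host.cpus_per_socket)  # ceiling division


# Host from the question: 2 sockets, 6 cores each, HT enabled.
host = Topology(sockets=2, cores_per_socket=6, threads_per_core=2)

for vm in (Topology(2, 4, 2), Topology(1, 6, 2)):
    needed = host_sockets_needed(vm, host)
    verdict = "fits in one socket" if needed == 1 else f"spans {needed} sockets"
    print(f"{vm.sockets}:{vm.cores_per_socket}:{vm.threads_per_core} "
          f"({vm.total_cpus} vCPUs) {verdict}")
```

Whether staying inside one socket is actually worth giving up 4 vCPUs is the workload-dependent part that this arithmetic cannot answer.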
> And what if I define a VM with 52 GB of RAM? Can I say that it would generally perform better if I try to fit it all in the memory slots attached to one CPU (e.g. not more than 48 GB in my example)?

Hard to say without knowing the workload: will it need all the memory? Will it be accessing all of it, in random order? If you've maxed out one node, you just need more memory from the other node.
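To put a rough number on the 52 GB case, here is a minimal sketch. It assumes the VM's memory is filled from one node first, that the full 48 GB of that node is available to the VM (no host or hypervisor overhead), and that the guest touches all of its memory uniformly at random. None of these is guaranteed in practice, and the helper names are invented for the example.

```python
def remote_memory_gb(vm_gb: float, node_gb: float) -> float:
    """Memory that cannot be served by the local node and spills to another."""
    return max(0.0, vm_gb - node_gb)


def remote_fraction(vm_gb: float, node_gb: float) -> float:
    """Share of the VM's memory that is remote.

    Under the uniform-random-access assumption this is also the share of
    memory accesses that would cross to the other node.
    """
    return remote_memory_gb(vm_gb, node_gb) / vm_gb


vm_gb, node_gb = 52, 48  # figures from the question
print(f"remote memory: {remote_memory_gb(vm_gb, node_gb):.0f} GB")
print(f"remote share:  {remote_fraction(vm_gb, node_gb):.1%}")
```

If the workload rarely touches the spilled pages, the effective remote share is lower than this estimate, which is why the answer depends on the access pattern.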
> Are there any documents that go more deeply into these sorts of considerations?

It is so workload-dependent that there will not be a one-size-fits-all answer.

> Also, if one sizes things so that the biggest VM can stay entirely inside one CPU and its memory, does it make sense to say that a cluster of 4 nodes, each with 1 socket and 48 GB of memory, will perform better in this scenario than a cluster of 2 nodes, each with 2 sockets and 96 GB of RAM?
You could have used affinity. See [1] for some details on Redis. Note that I/O (specifically network) is just as important, and its impact is much more profound.

Y.

[1] https://redis.io/topics/benchmarks
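As one possible reading of "affinity" above, the sketch below pins the calling process to a subset of CPUs using Python's standard `os.sched_setaffinity`, which is only available on Linux. This is process-level pinning inside a single OS, shown as an analogy; it is not the oVirt mechanism for pinning a VM's vCPUs. The `pin_to_cpus` helper is hypothetical, and using "the first half of the allowed CPUs" as a stand-in for one socket is an assumption: the real CPU-to-socket mapping should be read from the system (for example with `lscpu`).

```python
import os


def pin_to_cpus(cpus):
    """Restrict the calling process to `cpus`; return the previous affinity set."""
    if not hasattr(os, "sched_setaffinity"):
        raise NotImplementedError(
            "os.sched_setaffinity is not available on this platform")
    previous = os.sched_getaffinity(0)  # pid 0 means the calling process
    os.sched_setaffinity(0, cpus)
    return previous


if hasattr(os, "sched_setaffinity"):
    allowed = sorted(os.sched_getaffinity(0))
    # Assumption for illustration only: treat the first half of the allowed
    # CPUs as if they belonged to one socket.
    first_half = set(allowed[: max(1, len(allowed) // 2)])
    previous = pin_to_cpus(first_half)
    print("pinned to:", sorted(os.sched_getaffinity(0)))
    pin_to_cpus(previous)  # restore the original affinity
```

Pinning keeps the scheduler from moving work across sockets, but it does not by itself control where memory is allocated.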
> Hope I have clarified my questions/doubts.
>
> Thanks in advance for any insight,
> Gianluca

_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
participants (3)
- Andrew Nesbit
- Gianluca Cecchi
- Yaniv Kaul