Hello,
I was talking with a VMware expert about VM performance with respect to the number of virtual CPUs assigned to a VM and how they map onto the real hardware of the underlying hypervisor.

One of the topics was NUMA usage and its overhead when a VM is "too" big, in terms of both number of vCPUs and amount of memory.
E.g.:
suppose the host has 2 Intel sockets, each with 6 cores and HT enabled, and 96 GB of RAM (split 48+48 between the 2 processors)
suppose I configure a VM with 16 vCPUs (2:4:2): would the mapping be respected at the physical level, or is it only a sort of "hint" for the hypervisor?
Can I say that it would perform better if I configured it with 12 vCPUs and a 1:6:2 mapping, because it could then stay entirely within one CPU?

And what if I define a VM with 52 GB of RAM? Can I say that in general it would perform better if I keep it within the memory attached to a single CPU (e.g. no more than 48 GB in my example)?
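
To make the arithmetic of my example explicit, here is a minimal Python sketch. It is not how any hypervisor actually decides placement. It only checks the sizes, assuming one NUMA node per socket, RAM split evenly between sockets, hyperthreads counted as schedulable CPUs, and nothing reserved for the host itself. The names (Host, vcpus_fit_one_node, memory_fits_one_node) are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Host:
    sockets: int
    cores_per_socket: int
    threads_per_core: int
    ram_gb: int

    @property
    def threads_per_node(self) -> int:
        # Assumption: one NUMA node per socket, hyperthreads count as CPUs.
        return self.cores_per_socket * self.threads_per_core

    @property
    def ram_per_node_gb(self) -> float:
        # Assumption: RAM is split evenly between the sockets.
        return self.ram_gb / self.sockets


def vcpus_fit_one_node(host: Host, vcpus: int) -> bool:
    """True if the vCPU count does not exceed the threads of one node."""
    return vcpus <= host.threads_per_node


def memory_fits_one_node(host: Host, vm_ram_gb: float) -> bool:
    """True if the VM memory does not exceed the RAM local to one node."""
    return vm_ram_gb <= host.ram_per_node_gb


# The host from my example: 2 sockets, 6 cores each, HT enabled, 96 GB.
host = Host(sockets=2, cores_per_socket=6, threads_per_core=2, ram_gb=96)

print(vcpus_fit_one_node(host, 16))    # 2:4:2 VM
print(vcpus_fit_one_node(host, 12))    # 1:6:2 VM
print(memory_fits_one_node(host, 52))  # 52 GB VM
```

Under these assumptions the 16 vCPU and 52 GB VMs would each have to span both nodes, while the 12 vCPU one could in principle stay on one; whether the hypervisor really keeps it there is exactly what I am asking.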

Are there any documents that go deeper into these sorts of considerations?

Also, if one sizes things so that the biggest VM can stay entirely within one CPU and its local memory, does it make sense to say that a cluster of 4 nodes, each with 1 socket and 48 GB of memory, will perform better in this scenario than a cluster of 2 nodes, each with 2 sockets and 96 GB of RAM?

I hope I have made my questions/doubts clear.


Thanks in advance for any insight,
Gianluca