[Engine-devel] NUMA support action items

Gilad Chaplik gchaplik at redhat.com
Thu Apr 3 14:32:38 UTC 2014


----- Original Message -----
> From: "Chegu Vinod" <chegu_vinod at hp.com>
> To: "Gilad Chaplik" <gchaplik at redhat.com>
> Cc: "Xiao-Lei Shi (Bruce, HP Servers-PSC-CQ)" <xiao-lei.shi at hp.com>, "Einav Cohen" <ecohen at redhat.com>, "Shang-Chun
> Liang (David Liang, HPservers-Core-OE-PSC)" <shangchun.liang at hp.com>, "Chuan Liao (Jason Liao,
> HPservers-Core-OE-PSC)" <chuan.liao at hp.com>, msivak at redhat.com, "Da-huai Tang (Gary, MCXS-CQ)"
> <da-huai.tang at hp.com>, "Malini Rao" <mrao at redhat.com>, "Eldan Hildesheim" <ehildesh at redhat.com>, "Doron Fediuck"
> <dfediuck at redhat.com>, sherold at redhat.com, "Alexander Wels" <awels at redhat.com>, "engine-devel"
> <engine-devel at ovirt.org>
> Sent: Thursday, April 3, 2014 5:21:49 PM
> Subject: Re: NUMA support action items
> 
> On 4/3/2014 7:11 AM, Gilad Chaplik wrote:
> > ----- Original Message -----
> >> From: "Chegu Vinod" <chegu_vinod at hp.com>
> >> To: "Xiao-Lei Shi (Bruce, HP Servers-PSC-CQ)" <xiao-lei.shi at hp.com>
> >> Cc: "Einav Cohen" <ecohen at redhat.com>, "Shang-Chun Liang (David Liang,
> >> HPservers-Core-OE-PSC)"
> >> <shangchun.liang at hp.com>, "Chuan Liao (Jason Liao, HPservers-Core-OE-PSC)"
> >> <chuan.liao at hp.com>, msivak at redhat.com,
> >> "Da-huai Tang (Gary, MCXS-CQ)" <da-huai.tang at hp.com>, "Malini Rao"
> >> <mrao at redhat.com>, "Eldan Hildesheim"
> >> <ehildesh at redhat.com>, "Doron Fediuck" <dfediuck at redhat.com>,
> >> sherold at redhat.com, "Alexander Wels"
> >> <awels at redhat.com>, "Gilad Chaplik" <gchaplik at redhat.com>
> >> Sent: Thursday, April 3, 2014 3:28:03 PM
> >> Subject: RE: NUMA support action items
> >>
> >> Hi Bruce,
> >>
> >> The virtual NUMA layout in the guest is a very simple one (not multi-level
> >> etc). It is generated by qemu+seabios... and there is no relationship with
> >> the host NUMA node distances etc.  Let us not worry about gathering
> >> Virtual
> >> NUMA node distances for now.
> >>
> >> Vinod
> >>
> > CC'ing devel list as well.
> >
> > Having said that, I don't see a reason not to prepare an infrastructure
> > for that (if it comes for free) for future versions (the guest agent will
> > collect vNUMA data at some point in time).
> 
> If you think having this virtual NUMA topology (along with the virtual
> NUMA node *distance* info.) really helps some future use cases, then please
> go ahead...
> 
> Vinod
> 
> 

I really don't know, but IMO, as a user who is handed some machine (transparent to whether it's a VM or bare metal), it would be very nice to be able to see the NUMA stats from outside the machine.

Thanks, 
Gilad.
> 
> >
> > Thanks,
> > Gilad.
> >
> >> -----Original Message-----
> >> From: Shi, Xiao-Lei (Bruce, HP Servers-PSC-CQ)
> >> Sent: Thursday, April 03, 2014 12:41 AM
> >> To: Vinod, Chegu
> >> Cc: Einav Cohen; Liang, Shang-Chun (David Liang, HPservers-Core-OE-PSC);
> >> Liao, Chuan (Jason Liao, HPservers-Core-OE-PSC); msivak at redhat.com; Tang,
> >> Da-huai (Gary, MCXS-CQ); Malini Rao; Eldan Hildesheim; Doron Fediuck;
> >> sherold at redhat.com; Alexander Wels; Gilad Chaplik
> >> Subject: RE: NUMA support action items
> >>
> >> Hi Vinod,
> >>
> >> Is it meaningful for us to collect the distance information of the VM
> >> NUMA nodes (maybe in the future, not now)?
> >> In my understanding, the VM NUMA topology is a simulation of a NUMA
> >> topology; since the vCPUs are just threads, I don't know how the VM NUMA
> >> node distances are calculated in the VM. Is there any relationship
> >> between the vNode distances and the host node distances?
> >>
> >> Thanks & Best Regards
> >> Shi, Xiao-Lei (Bruce)
> >>
> >> Hewlett-Packard Co., Ltd.
> >> HP Servers Core Platform Software China
> >> Telephone +86 23 65683093  Mobile +86 18696583447
> >> Email xiao-lei.shi at hp.com
> >>
> >>
> >> -----Original Message-----
> >> From: Vinod, Chegu
> >> Sent: Thursday, April 03, 2014 7:18 AM
> >> To: Gilad Chaplik
> >> Cc: Einav Cohen; Liang, Shang-Chun (David Liang, HPservers-Core-OE-PSC);
> >> Liao, Chuan (Jason Liao, HPservers-Core-OE-PSC); msivak at redhat.com; Shi,
> >> Xiao-Lei (Bruce, HP Servers-PSC-CQ); Tang, Da-huai (Gary, MCXS-CQ); Malini
> >> Rao; Eldan Hildesheim; Doron Fediuck; sherold at redhat.com; Alexander Wels
> >> Subject: RE: NUMA support action items
> >>
> >> Not sure what the correct way to do this is....but here is a suggestion.
> >>
> >> Let the diagram shown for a given host server be very generic... i.e.
> >> show the N sockets/nodes numbered from 0 through N-1.  Show the amount of
> >> memory and the list of CPUs in each of those sockets/nodes.
> >> Draw a generic interconnect fabric [box] in between, which all the
> >> sockets connect to....
> >>
> >> Ideally... under that host diagram we could show the NUMA node distances
> >> in text format (as you know this is derived from "numactl -H" and then
> >> conveyed from VDSM -> oVirt engine etc.).
> >> That distance info will tell the user what the distance between a pair of
> >> sockets/nodes is (and they can then do what they wish after that :)).
> >>
> >> Vinod
> >>
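As an aside (my addition, not part of the original thread): a minimal sketch of how that "numactl -H" distance block could be turned into a matrix on the host side, assuming Python and the usual numactl output layout. This is not VDSM's actual code, just an illustration of the data being conveyed.

# Sketch only: parse the "node distances:" section of `numactl --hardware`
# (same data as "numactl -H") into {src_node: {dst_node: distance}}.
import re
import subprocess

def read_node_distances():
    out = subprocess.check_output(["numactl", "--hardware"]).decode()
    lines = out.splitlines()
    start = next(i for i, line in enumerate(lines)
                 if line.startswith("node distances:"))
    # The line right after the header lists the node ids: "node   0   1 ..."
    node_ids = [int(n) for n in lines[start + 1].split()[1:]]
    distances = {}
    for row in lines[start + 2:start + 2 + len(node_ids)]:
        src, *vals = re.split(r"[:\s]+", row.strip())
        distances[int(src)] = dict(zip(node_ids, map(int, vals)))
    return distances

# On the 4-socket example further down this would yield
# {0: {0: 10, 1: 21, 2: 21, 3: 21}, 1: {...}, ...}
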
> >> -----Original Message-----
> >> From: Gilad Chaplik [mailto:gchaplik at redhat.com]
> >> Sent: Wednesday, April 02, 2014 4:09 PM
> >> To: Vinod, Chegu
> >> Cc: Einav Cohen; Liang, Shang-Chun (David Liang, HPservers-Core-OE-PSC);
> >> Liao, Chuan (Jason Liao, HPservers-Core-OE-PSC); msivak at redhat.com; Shi,
> >> Xiao-Lei (Bruce, HP Servers-PSC-CQ); Tang, Da-huai (Gary, MCXS-CQ); Malini
> >> Rao; Eldan Hildesheim; Doron Fediuck; sherold at redhat.com; Alexander Wels
> >> Subject: Re: NUMA support action items
> >>
> >> Thank you, Vinod, for the very elaborate explanation.
> >> GUI-wise, do you want to show those numbers? Maybe for the first phase it
> >> is enough to show them via the API?
> >>
> >> A thought: according to your example there could be up to 2 distinct
> >> distances, so maybe the 'closer' nodes can be in the same column or
> >> something; I mean to try and illustrate it graphically rather than with
> >> numbers (we have enough of those :)).
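
Purely as a sketch of that grouping idea (my addition, assuming Python and a {src: {dst: distance}} matrix shape; this only illustrates the layout logic, it is not an actual GUI design):

# Sketch only: group nodes so that nodes connected at the smallest
# (non-local) distance land in the same group; each group could then be
# rendered as one column in the UI.
def buddy_groups(distances):
    nodes = sorted(distances)
    close = min(distances[a][b] for a in nodes for b in nodes if a != b)
    groups, seen = [], set()
    for node in nodes:
        if node in seen:
            continue
        group = {node} | {peer for peer in nodes
                          if peer != node and distances[node][peer] == close}
        seen |= group
        groups.append(sorted(group))
    return groups

# With the 8-socket example below this yields [[0, 1], [2, 3], [4, 5], [6, 7]];
# with the single-level 4-socket example it yields one group, [[0, 1, 2, 3]],
# since all nodes are equally close.
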
> >>
> >> Thanks,
> >> Gilad.
> >>
> >> ----- Original Message -----
> >>> From: "Chegu Vinod" <chegu_vinod at hp.com>
> >>> To: "Einav Cohen" <ecohen at redhat.com>
> >>> Cc: "Gilad Chaplik" <gchaplik at redhat.com>, "Shang-Chun Liang (David
> >>> Liang,
> >>> HPservers-Core-OE-PSC)"
> >>> <shangchun.liang at hp.com>, "Chuan Liao (Jason Liao,
> >>> HPservers-Core-OE-PSC)" <chuan.liao at hp.com>, msivak at redhat.com, "Xiao-Lei
> >>> Shi (Bruce, HP Servers-PSC-CQ)" <xiao-lei.shi at hp.com>, "Da-huai Tang
> >>> (Gary, MCXS-CQ)"
> >>> <da-huai.tang at hp.com>, "Malini Rao" <mrao at redhat.com>, "Eldan Hildesheim"
> >>> <ehildesh at redhat.com>, "Doron Fediuck"
> >>> <dfediuck at redhat.com>, sherold at redhat.com, "Alexander Wels"
> >>> <awels at redhat.com>
> >>> Sent: Saturday, March 29, 2014 8:15:56 AM
> >>> Subject: Re: NUMA support action items
> >>>
> >>> On 3/27/2014 10:42 AM, Einav Cohen wrote:
> >>>> Hi Vinod, thank you very much for that extra information.
> >>>>
> >>>> unfortunately, we are not familiar with what the levels of NUMA are
> >>>> (local socket/node, buddy socket/node, remote socket/node) and/or what
> >>>> "distance" is - I assume that these are definitions related to the
> >>>> physical layout of the sockets/cores/nodes/RAM and/or to their physical
> >>>> proximity to each other, but we will need more detailed explanations if
> >>>> this needs to be incorporated into the UX design.
> >>>>
> >>>> Will you be able to explain it to us / refer us to some material on
> >>>> that?
> >>> Sorry for the delay in response (I was in a conference).
> >>>
> >>> Not sure if the following high-level explanation would help (I will look
> >>> for some references in the meantime... or perhaps you can ask someone
> >>> like Joe Mario in Shak's performance group to explain it to you).
> >>>
> >>> In the smaller NUMA servers each socket is directly connected (i.e. a
> >>> single "hop" away) to every other socket in the server. This is typical
> >>> of all 2-socket Intel servers and the vast majority of 4-socket Intel
> >>> servers.
> >>>
> >>> In some larger NUMA servers a socket could either be directly connected
> >>> (a single "hop" away) to another socket, or it may have to go through an
> >>> interconnect fabric (like a crossbar fabric agent chip, etc.) to get to
> >>> another socket in the system (i.e. several "hops" away).  The sockets
> >>> that are directly connected (i.e. a single "hop" away) are the buddy
> >>> sockets... and those that aren't are the remote sockets.  Some describe
> >>> this type of server as having a multi-level NUMA topology...
> >>>
> >>> The way to decipher all of this is by looking at the NUMA node
> >>> distance table (I had included a sample of that in the slides that I sent
> >>> earlier).
> >>>
> >>> For example, in the 4-socket server where all sockets are just one hop
> >>> away, the node distances are as follows:
> >>>
> >>> node distances:
> >>> node   0   1   2   3
> >>>     0:  10  21  21  21
> >>>     1:  21  10  21  21
> >>>     2:  21  21  10  21
> >>>     3:  21  21  21  10
> >>>
> >>> Going from node 0 to nodes 1-3 (or, for that matter, between any pair of
> >>> nodes) the node distance is the same, i.e. 2.1x the local latency.
> >>>
> >>> In another example, on a different (larger, 8-socket) server, the node
> >>> distances looked something like this:
> >>>
> >>> node distances:
> >>> node   0   1   2   3   4   5   6   7
> >>>     0:  10  16  30  30  30  30  30  30
> >>>     1:  16  10  30  30  30  30  30  30
> >>>     2:  30  30  10  16  30  30  30  30
> >>>     3:  30  30  16  10  30  30  30  30
> >>>     4:  30  30  30  30  10  16  30  30
> >>>     5:  30  30  30  30  16  10  30  30
> >>>     6:  30  30  30  30  30  30  10  16
> >>>     7:  30  30  30  30  30  30  16  10
> >>>
> >>> Going from node 0 to node 1 (its buddy), which is just one hop away, has
> >>> a node distance of 16, i.e. 1.6x... but going from node 0 to nodes 2-7
> >>> means going through the interconnect fabric and is expensive, i.e. 3x.
> >>> Nodes 2-7 are the remote nodes for node 0.
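
To make the buddy/remote distinction concrete (my addition, not from the original thread): given such a distance matrix, each peer of a node can be bucketed by comparing its distance against the largest value in the row. A rough Python sketch, using the 8-socket values above; the classification rule is only an illustration, not how any engine component actually does it:

# Sketch only: classify each peer of a node as "buddy" (directly connected)
# or "remote" (across the interconnect fabric), using nothing but the
# distance matrix.
DISTANCES = {
    0: {0: 10, 1: 16, 2: 30, 3: 30, 4: 30, 5: 30, 6: 30, 7: 30},
    1: {0: 16, 1: 10, 2: 30, 3: 30, 4: 30, 5: 30, 6: 30, 7: 30},
    # ... rows 2-7 follow the same pattern ...
}

def classify_peers(distances, node):
    row = distances[node]
    remote_cutoff = max(row.values())       # the multi-hop distance (30 here)
    buddies, remotes = [], []
    for peer, dist in row.items():
        if peer == node:
            continue                        # skip the local node itself (10)
        (buddies if dist < remote_cutoff else remotes).append(peer)
    return buddies, remotes

print(classify_peers(DISTANCES, 0))         # -> ([1], [2, 3, 4, 5, 6, 7])
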
> >>>
> >>> HTH
> >>> Vinod
> >>>
> >>>> Many thanks in advance.
> >>>>
> >>>> ----
> >>>> Regards,
> >>>> Einav
> >>>>
> >>>>
> >>>> ----- Original Message -----
> >>>>> From: "Chegu Vinod" <chegu_vinod at hp.com>
> >>>>> To: "Gilad Chaplik" <gchaplik at redhat.com>, "Shang-Chun Liang (David
> >>>>> Liang, HPservers-Core-OE-PSC)"
> >>>>> <shangchun.liang at hp.com>, "Chuan Liao (Jason Liao,
> >>>>> HPservers-Core-OE-PSC)"
> >>>>> <chuan.liao at hp.com>, msivak at redhat.com, "Xiao-Lei Shi (Bruce, HP
> >>>>> Servers-PSC-CQ)" <xiao-lei.shi at hp.com>, "Da-huai Tang (Gary,
> >>>>> MCXS-CQ)"
> >>>>> <da-huai.tang at hp.com>, "Malini Rao" <mrao at redhat.com>, "Eldan
> >>>>> Hildesheim"
> >>>>> <ehildesh at redhat.com>
> >>>>> Cc: "Doron Fediuck" <dfediuck at redhat.com>, "Einav Cohen"
> >>>>> <ecohen at redhat.com>, sherold at redhat.com, "Alexander Wels"
> >>>>> <awels at redhat.com>
> >>>>> Sent: Thursday, March 27, 2014 12:00:51 AM
> >>>>> Subject: RE: NUMA support action items
> >>>>>
> >>>>> Thanks for sharing the UX info.
> >>>>>
> >>>>> There is one thing that I forgot to mention in today's morning
> >>>>> meeting...
> >>>>>
> >>>>> There are hosts that will have one level of NUMA (i.e. local
> >>>>> socket/node and then remote socket/node). Most <= 4 socket hosts belong
> >>>>> to this category. (I consider these the sweet-spot servers.)
> >>>>>
> >>>>> When it comes to larger hosts with 8 sockets and more... there can be
> >>>>> some hosts with multiple levels of NUMA (i.e. local socket/node, buddy
> >>>>> socket/node, and then remote socket/node).
> >>>>>
> >>>>> Please see attached.... (the 8-socket prototype system is an HP
> >>>>> platform... and it's actually only showing half of the system... the
> >>>>> actual system is 16 sockets but has a similar NUMA topology). The NUMA
> >>>>> node distances of a given host will provide information about the
> >>>>> number of levels of NUMA...
> >>>>>
> >>>>> Something to keep in mind when you folks choose to display the host
> >>>>> NUMA topology in the UX.
> >>>>>
> >>>>> Thanks
> >>>>> Vinod
> >>>>>
> >>>>>
> >>>>> -----Original Message-----
> >>>>> From: Gilad Chaplik [mailto:gchaplik at redhat.com]
> >>>>> Sent: Wednesday, March 26, 2014 9:26 AM
> >>>>> To: Liang, Shang-Chun (David Liang, HPservers-Core-OE-PSC); Liao,
> >>>>> Chuan (Jason Liao, HPservers-Core-OE-PSC); msivak at redhat.com; Shi,
> >>>>> Xiao-Lei (Bruce, HP Servers-PSC-CQ); Vinod, Chegu; Tang, Da-huai
> >>>>> (Gary, MCXS-CQ); Malini Rao; Eldan Hildesheim
> >>>>> Cc: Doron Fediuck; Einav Cohen; sherold at redhat.com; Alexander Wels
> >>>>> Subject: NUMA support action items
> >>>>>
> >>>>> Hi All,
> >>>>>
> >>>>> First of all I'd like to thank Malini and Eldan for their great work
> >>>>> (I'm sure we'll have a cool UI thanks to them), and Vinod for his great
> >>>>> insights.
> >>>>>
> >>>>> Keep on with the great work :-)
> >>>>>
> >>>>> Action items (as I see it) for the next couple of weeks (owner in
> >>>>> parentheses):
> >>>>>
> >>>>> 0) Resolve community design comments, and finish design phase
> >>>>> including sketches (All).
> >>>>> 1) Finish UX design and sketches (Malini and Eldan, all to assist).
> >>>>> * focus on VM dialog (biggest gap as I see it).
> >>>>> * 'default host' topology view, where we don't pin a host.
> >>>>> * NUMA in cluster level.
> >>>>> 2) Engine Core API: merge the BE patch [1], and prepare patches for the
> >>>>> other APIs (commands (VdcActionType), queries (VdcQueryType), including
> >>>>> parameter classes).
> >>>>> Note that the actual implementation can use mock-ups of fake NUMA
> >>>>> entities, in order to start GUI/RESTful development in parallel (HP
> >>>>> development team).
> >>>>> 3) Test the VDSM API (vdsClient) including very basic benchmarks, and
> >>>>> publish a report (HP development team).
> >>>>> 4) VDSM - engine core integration (HP development team, Martin and
> >>>>> Gilad to assist).
> >>>>> 5) DB scripts and stored procedures - after the maintainer (Eli M) acks
> >>>>> the design (HP development team, Gilad to assist).
> >>>>> 6) RESTful API implementation - after the maintainer (Juan H) acks the
> >>>>> design (HP development team, Gilad to assist).
> >>>>> 7) GUI programmatic design and start of implementation - in order to
> >>>>> start it ASAP, the engine's API should be available ASAP; see action
> >>>>> item #2 (Gilad, assistance from Einav's UX team).
> >>>>> 8) MOM and KSM integration, continue current thread and reach
> >>>>> conclusions (HP development team, Martin to assist).
> >>>>>
> >>>>> You are more than welcome to comment :-) nothing's carved in stone.
> >>>>> If I forgot someone, please reply to all and CC them.
> >>>>>
> >>>>> Thanks,
> >>>>> Gilad.
> >>>>>
> >>>>> [1] http://gerrit.ovirt.org/#/c/23702/
> >>>>>
> >>>> .
> >>>>
> >>>
> > .
> >
> 
> 


