[Engine-devel] NUMA support action items
Gilad Chaplik
gchaplik at redhat.com
Thu Apr 3 14:11:36 UTC 2014
----- Original Message -----
> From: "Chegu Vinod" <chegu_vinod at hp.com>
> To: "Xiao-Lei Shi (Bruce, HP Servers-PSC-CQ)" <xiao-lei.shi at hp.com>
> Cc: "Einav Cohen" <ecohen at redhat.com>, "Shang-Chun Liang (David Liang, HPservers-Core-OE-PSC)"
> <shangchun.liang at hp.com>, "Chuan Liao (Jason Liao, HPservers-Core-OE-PSC)" <chuan.liao at hp.com>, msivak at redhat.com,
> "Da-huai Tang (Gary, MCXS-CQ)" <da-huai.tang at hp.com>, "Malini Rao" <mrao at redhat.com>, "Eldan Hildesheim"
> <ehildesh at redhat.com>, "Doron Fediuck" <dfediuck at redhat.com>, sherold at redhat.com, "Alexander Wels"
> <awels at redhat.com>, "Gilad Chaplik" <gchaplik at redhat.com>
> Sent: Thursday, April 3, 2014 3:28:03 PM
> Subject: RE: NUMA support action items
>
> Hi Bruce,
>
> The virtual NUMA layout in the guest is a very simple one (not multi-level
> etc). It is generated by qemu+seabios... and there is no relationship with
> the host NUMA node distances etc. Let us not worry about gathering Virtual
> NUMA node distances for now.
>
> Vinod
>
CC'ing devel list as well.
Having said that, I don't see a reason not to prepare the infrastructure for that (if it's free) for future versions (the guest agent will collect vNUMA data at some point).
Thanks,
Gilad.
>
> -----Original Message-----
> From: Shi, Xiao-Lei (Bruce, HP Servers-PSC-CQ)
> Sent: Thursday, April 03, 2014 12:41 AM
> To: Vinod, Chegu
> Cc: Einav Cohen; Liang, Shang-Chun (David Liang, HPservers-Core-OE-PSC);
> Liao, Chuan (Jason Liao, HPservers-Core-OE-PSC); msivak at redhat.com; Tang,
> Da-huai (Gary, MCXS-CQ); Malini Rao; Eldan Hildesheim; Doron Fediuck;
> sherold at redhat.com; Alexander Wels; Gilad Chaplik
> Subject: RE: NUMA support action items
>
> Hi Vinod,
>
> Is it meaningful for us to collect the distance information of vm numa node
> (maybe in future, not now)?
> In my understanding, vm numa topology is a simulation of numa topology, since
> the vcpus are just threads, I don't know how the vm numa node distances are
> calculated in vm. Is there any relationship between the vNode distances and
> host node distances?
>
> Thanks & Best Regards
> Shi, Xiao-Lei (Bruce)
>
> Hewlett-Packard Co., Ltd.
> HP Servers Core Platform Software China Telephone +86 23 65683093 Mobile +86
> 18696583447 Email xiao-lei.shi at hp.com
>
>
> -----Original Message-----
> From: Vinod, Chegu
> Sent: Thursday, April 03, 2014 7:18 AM
> To: Gilad Chaplik
> Cc: Einav Cohen; Liang, Shang-Chun (David Liang, HPservers-Core-OE-PSC);
> Liao, Chuan (Jason Liao, HPservers-Core-OE-PSC); msivak at redhat.com; Shi,
> Xiao-Lei (Bruce, HP Servers-PSC-CQ); Tang, Da-huai (Gary, MCXS-CQ); Malini
> Rao; Eldan Hildesheim; Doron Fediuck; sherold at redhat.com; Alexander Wels
> Subject: RE: NUMA support action items
>
> Not sure what the correct way to do this is....but here is a suggestion.
>
> Let a given host server diagram shown be very generic...i.e. show the N
> sockets/nodes numbered from 0 thru N-1. Show the amount of memory and the
> list of CPUs in each of those sockets/nodes.
> Draw a generic Interconnect fabric [box] in between which all the sockets
> connect to....
>
> Ideally ... Under that host diagram we could show the NUMA node distances in
> text format (as you know this is derived from the "numactl -H" and then
> conveyed from VDSM -> oVirt engine etc).
> That distance info. will tell the user what the distance between a pair of
> sockets/nodes is (and they can then do what they wish after that :)).
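
To illustrate how the distance text could be turned into data, here is a minimal sketch that parses the "node distances:" section of "numactl -H" style output. The function name parse_node_distances and the SAMPLE text are assumptions made for illustration (the sample reuses the 4-socket table quoted later in this thread); this is not the VDSM or engine implementation.

```python
def parse_node_distances(numactl_output):
    """Return {from_node: {to_node: distance}} from the 'node distances:' section."""
    lines = numactl_output.splitlines()
    # Locate the header line that opens the distance section.
    start = next(i for i, line in enumerate(lines)
                 if line.strip() == "node distances:")
    # The following line names the columns, e.g. "node 0 1 2 3".
    columns = [int(tok) for tok in lines[start + 1].split()[1:]]
    table = {}
    for line in lines[start + 2:]:
        if ":" not in line:
            break  # end of the distance rows
        label, _, values = line.partition(":")
        row = [int(tok) for tok in values.split()]
        table[int(label)] = dict(zip(columns, row))
    return table


# Hypothetical sample input, copied from the 4-socket example in this thread.
SAMPLE = """\
node distances:
node 0 1 2 3
0: 10 21 21 21
1: 21 10 21 21
2: 21 21 10 21
3: 21 21 21 10
"""

distances = parse_node_distances(SAMPLE)
print(distances[0][1])  # distance from node 0 to node 1
```

Keeping the result keyed by node id (rather than by list position) avoids assuming that node numbers are contiguous.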
>
> Vinod
>
> -----Original Message-----
> From: Gilad Chaplik [mailto:gchaplik at redhat.com]
> Sent: Wednesday, April 02, 2014 4:09 PM
> To: Vinod, Chegu
> Cc: Einav Cohen; Liang, Shang-Chun (David Liang, HPservers-Core-OE-PSC);
> Liao, Chuan (Jason Liao, HPservers-Core-OE-PSC); msivak at redhat.com; Shi,
> Xiao-Lei (Bruce, HP Servers-PSC-CQ); Tang, Da-huai (Gary, MCXS-CQ); Malini
> Rao; Eldan Hildesheim; Doron Fediuck; sherold at redhat.com; Alexander Wels
> Subject: Re: NUMA support action items
>
> Thank you Vinod for the very detailed explanation.
> GUI-wise, do you want to show those numbers? Maybe for the first phase it is
> enough to show them via the API?
>
> A thought: according to your example there could be up to 2 distances, so
> maybe the 'closer' nodes can be in the same column or something; I mean to try
> and illustrate it graphically rather than with numbers (we have enough of those
> :)).
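
One way the "closer nodes in the same column" idea could be derived from the distance table is sketched below. It groups nodes that are linked by the smallest non-local distance in the matrix. The function name group_close_nodes and the grouping rule are assumptions for illustration, not an agreed design; the matrix is the 8-socket example quoted further down in this thread.

```python
def group_close_nodes(distances):
    """Group nodes linked by the smallest non-local distance in the matrix."""
    n = len(distances)
    if n < 2:
        return [[i] for i in range(n)]
    # Smallest distance between two different nodes.
    closest = min(distances[i][j] for i in range(n) for j in range(n) if i != j)
    groups = []
    seen = set()
    for start in range(n):
        if start in seen:
            continue
        # Flood-fill over "closest distance" links to collect one column.
        group, stack = [], [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            group.append(node)
            for other in range(n):
                if other not in seen and distances[node][other] == closest:
                    seen.add(other)
                    stack.append(other)
        groups.append(sorted(group))
    return groups


# 8-socket example from this thread (rows and columns are nodes 0..7).
EIGHT_SOCKET = [
    [10, 16, 30, 30, 30, 30, 30, 30],
    [16, 10, 30, 30, 30, 30, 30, 30],
    [30, 30, 10, 16, 30, 30, 30, 30],
    [30, 30, 16, 10, 30, 30, 30, 30],
    [30, 30, 30, 30, 10, 16, 30, 30],
    [30, 30, 30, 30, 16, 10, 30, 30],
    [30, 30, 30, 30, 30, 30, 10, 16],
    [30, 30, 30, 30, 30, 30, 16, 10],
]

columns = group_close_nodes(EIGHT_SOCKET)
print(columns)
```

With this rule a host where every pair of nodes is equally distant ends up as a single group, which matches the intuition that no node is "closer" than another there.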
>
> Thanks,
> Gilad.
>
> ----- Original Message -----
> > From: "Chegu Vinod" <chegu_vinod at hp.com>
> > To: "Einav Cohen" <ecohen at redhat.com>
> > Cc: "Gilad Chaplik" <gchaplik at redhat.com>, "Shang-Chun Liang (David Liang,
> > HPservers-Core-OE-PSC)"
> > <shangchun.liang at hp.com>, "Chuan Liao (Jason Liao,
> > HPservers-Core-OE-PSC)" <chuan.liao at hp.com>, msivak at redhat.com, "Xiao-Lei
> > Shi (Bruce, HP Servers-PSC-CQ)" <xiao-lei.shi at hp.com>, "Da-huai Tang
> > (Gary, MCXS-CQ)"
> > <da-huai.tang at hp.com>, "Malini Rao" <mrao at redhat.com>, "Eldan Hildesheim"
> > <ehildesh at redhat.com>, "Doron Fediuck"
> > <dfediuck at redhat.com>, sherold at redhat.com, "Alexander Wels"
> > <awels at redhat.com>
> > Sent: Saturday, March 29, 2014 8:15:56 AM
> > Subject: Re: NUMA support action items
> >
> > On 3/27/2014 10:42 AM, Einav Cohen wrote:
> > > Hi Vinod, thank you very much for that extra information.
> > >
> > > Unfortunately, we are not familiar with what the levels of NUMA are
> > > (local socket/node, buddy socket/node, remote socket/node) or what
> > > "distance" is. I assume that these are definitions related to the
> > > physical layout of the sockets/cores/nodes/RAM and/or to their physical
> > > proximity to each other, but we will need more detailed explanations if
> > > this needs to be incorporated into the UX design.
> > >
> > > Will you be able to explain it to us / refer us to some material on
> > > that?
> >
> > Sorry for the delay in response (I was in a conference).
> >
> > Not sure if the following high-level explanation would help (I will look
> > for some references in the meantime, or perhaps you can ask someone
> > like Joe Mario in Shak's performance group to explain it to you).
> >
> > In the smaller NUMA servers each socket is directly connected (i.e.
> > single "hop" away) to any other socket in the server.. This is typical
> > of all 2 socket Intel servers and a vast majority of 4 socket Intel
> > servers.
> >
> > In some larger NUMA servers a socket could either be directly
> > connected (single "hop" away) to another socket, or may have to go
> > through an interconnect fabric (like a crossbar fabric agent chip,
> > etc.) to get to another socket in the system (i.e. several "hops"
> > away). The sockets that are directly connected (i.e. single "hop"
> > away) are the buddy sockets, and those that aren't are the remote
> > sockets. Some describe this type of server as having a multi-level NUMA
> > topology...
> >
> > The way to decipher all of this is by looking at the NUMA node
> > distance table (I had included a sample of that in the slides that I sent
> > earlier).
> >
> > E.g., in the example 4-socket server, where all sockets are just
> > one hop away, the node distances are as follows:
> >
> > node distances:
> > node   0   1   2   3
> >   0:  10  21  21  21
> >   1:  21  10  21  21
> >   2:  21  21  10  21
> >   3:  21  21  21  10
> >
> > Going from node 0 to nodes 1-3 (or for that matter any pair of nodes)
> > the node distance is the same, i.e. 2.1x the local latency.
> >
> > In another example, from a different (larger, 8-socket) server, the node
> > distances looked something like this:
> >
> > node distances:
> > node   0   1   2   3   4   5   6   7
> >   0:  10  16  30  30  30  30  30  30
> >   1:  16  10  30  30  30  30  30  30
> >   2:  30  30  10  16  30  30  30  30
> >   3:  30  30  16  10  30  30  30  30
> >   4:  30  30  30  30  10  16  30  30
> >   5:  30  30  30  30  16  10  30  30
> >   6:  30  30  30  30  30  30  10  16
> >   7:  30  30  30  30  30  30  16  10
> >
> > Going from node 0 to node 1 (buddy), which is just one hop away, had a
> > node distance of 1.6x, but going from node 0 to nodes 2-7 meant
> > going through the interconnect fabric and it was expensive, i.e. 3x.
> > Nodes 2-7 are the remote nodes for node 0.
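
The reading of the table above can be sketched in a few lines. This is only an illustration under stated assumptions: the names classify_from and relative_latency are hypothetical, the value 10 is taken as the local distance because that is what both tables show on the diagonal, and dividing by it follows the "1.6x / 3x" interpretation used in this message (the numbers are reported relative costs, not measured latencies).

```python
LOCAL_DISTANCE = 10  # value both example tables report for a node to itself


def classify_from(node, distances):
    """Split nodes into local / buddy / remote as seen from `node`."""
    row = distances[node]
    # Distinct distances to other nodes, smallest first.
    off_node = sorted({d for i, d in enumerate(row) if i != node})
    multi_level = len(off_node) > 1
    result = {"local": [node], "buddy": [], "remote": []}
    for other, d in enumerate(row):
        if other == node:
            continue
        # With a single off-node distance every other node is just "remote".
        kind = "buddy" if multi_level and d == off_node[0] else "remote"
        result[kind].append(other)
    return result


def relative_latency(distance):
    """Express a distance as a multiple of the local distance."""
    return distance / LOCAL_DISTANCE


# Row for node 0 of the 8-socket table, plus the full 4-socket table.
EIGHT_SOCKET_ROW0 = [[10, 16, 30, 30, 30, 30, 30, 30]]
FOUR_SOCKET = [
    [10, 21, 21, 21],
    [21, 10, 21, 21],
    [21, 21, 10, 21],
    [21, 21, 21, 10],
]

print(classify_from(0, EIGHT_SOCKET_ROW0))
print(classify_from(0, FOUR_SOCKET))
print(relative_latency(16), relative_latency(30))
```

The number of distinct off-node distances in a row is also a simple indicator of how many NUMA levels the host has: one for the 4-socket example, two for the 8-socket one.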
> >
> > HTH
> > Vinod
> >
> > > Many thanks in advance.
> > >
> > > ----
> > > Regards,
> > > Einav
> > >
> > >
> > > ----- Original Message -----
> > >> From: "Chegu Vinod" <chegu_vinod at hp.com>
> > >> To: "Gilad Chaplik" <gchaplik at redhat.com>, "Shang-Chun Liang (David
> > >> Liang, HPservers-Core-OE-PSC)"
> > >> <shangchun.liang at hp.com>, "Chuan Liao (Jason Liao,
> > >> HPservers-Core-OE-PSC)"
> > >> <chuan.liao at hp.com>, msivak at redhat.com, "Xiao-Lei Shi (Bruce, HP
> > >> Servers-PSC-CQ)" <xiao-lei.shi at hp.com>, "Da-huai Tang (Gary,
> > >> MCXS-CQ)"
> > >> <da-huai.tang at hp.com>, "Malini Rao" <mrao at redhat.com>, "Eldan
> > >> Hildesheim"
> > >> <ehildesh at redhat.com>
> > >> Cc: "Doron Fediuck" <dfediuck at redhat.com>, "Einav Cohen"
> > >> <ecohen at redhat.com>, sherold at redhat.com, "Alexander Wels"
> > >> <awels at redhat.com>
> > >> Sent: Thursday, March 27, 2014 12:00:51 AM
> > >> Subject: RE: NUMA support action items
> > >>
> > >> Thanks for sharing the UX info.
> > >>
> > >> There is one thing that I forgot to mention in today's morning
> > >> meeting...
> > >>
> > >> There are hosts that will have one level of NUMA (i.e. local
> > >> socket/node
> > >> and then remote socket/node). Most <= 4 socket hosts belong to
> > >> this category. (I consider this as the sweet spot servers)
> > >>
> > >> When it comes to larger hosts with 8 sockets and more...there can
> > >> be some hosts with multiple levels of NUMA (i.e. local
> > >> socket/node, buddy socket/node, and then remote socket/node).
> > >>
> > >> Pl. see attached.... (the 8 socket prototype system is a HP
> > >> platform...and it's actually only showing half of the system...the
> > >> actual system is 16 sockets but has a similar NUMA topology). The
> > >> NUMA node distances of a given host will provide information about the
> > >> # of levels of NUMA ...
> > >>
> > >> Something to keep in mind when you folks choose to display the
> > >> host NUMA topology in the UX.
> > >>
> > >> Thanks
> > >> Vinod
> > >>
> > >>
> > >> -----Original Message-----
> > >> From: Gilad Chaplik [mailto:gchaplik at redhat.com]
> > >> Sent: Wednesday, March 26, 2014 9:26 AM
> > >> To: Liang, Shang-Chun (David Liang, HPservers-Core-OE-PSC); Liao,
> > >> Chuan (Jason Liao, HPservers-Core-OE-PSC); msivak at redhat.com; Shi,
> > >> Xiao-Lei (Bruce, HP Servers-PSC-CQ); Vinod, Chegu; Tang, Da-huai
> > >> (Gary, MCXS-CQ); Malini Rao; Eldan Hildesheim
> > >> Cc: Doron Fediuck; Einav Cohen; sherold at redhat.com; Alexander Wels
> > >> Subject: NUMA support action items
> > >>
> > >> Hi All,
> > >>
> > >> First of all I'd like to thank Malini and Eldan for their great
> > >> work, I'm sure we'll have a cool UI thanks to them, and Vinod for great
> > >> insights.
> > >>
> > >> Keep on with the great work :-)
> > >>
> > >> Action items (as I see it) for the next couple of weeks (the owner is in
> > >> parentheses):
> > >>
> > >> 0) Resolve community design comments, and finish design phase
> > >> including sketches (All).
> > >> 1) Finish UX design and sketches (Malini and Eldan, all to assist).
> > >> * focus on VM dialog (biggest gap as I see it).
> > >> * 'default host' topology view, where we don't pin a host.
> > >> * NUMA in cluster level.
> > >> 2) Engine Core API, merge BE patch [1], and prepare patches for
> > >> other APIs (commands (VdcActionType), queries (VdcQueryType),
> > >> including parameter classes).
> > >> note that the actual implementation can be mock-ups of fake NUMA
> > >> entities, in order to start GUI/RESTful development in parallel (HP
> > >> development team).
> > >> 3) Test VDSM API (vdsClient) including very basic benchmarks and
> > >> publish a report (HP development team).
> > >> 4) VDSM - engine core integration (HP development team, Martin and
> > >> Gilad to assist).
> > >> 5) DB scripts and stored procedures - post maintainer (Eli M) acking the
> > >> design (HP development team, Gilad to assist).
> > >> 6) RESTful API impl - post maintainer (Juan H) acking the design
> > >> (HP development team, Gilad to assist).
> > >> 7) GUI programmatic design and starting implementation - in order
> > >> to start it ASAP, the engine's API should be available ASAP see
> > >> action item #2 (Gilad, assistance from Einav's UX team).
> > >> 8) MOM and KSM integration, continue current thread and reach
> > >> conclusions (HP development team, Martin to assist).
> > >>
> > >> You are more than welcome to comment :-) nothing's carved in stone.
> > >> If I forgot someone, please reply to all and CC him.
> > >>
> > >> Thanks,
> > >> Gilad.
> > >>
> > >> [1] http://gerrit.ovirt.org/#/c/23702/
> > >>
> > > .
> > >
> >
> >
>