numa_intro - Tru64

numa_intro(3)

NAME

       numa_intro - Introduction to NUMA support

DESCRIPTION

       NUMA,  or  Non-Uniform Memory Access, refers to a hardware
       architectural feature in modern  multiprocessor  platforms
       that  attempts to address the increasing disparity between
       requirements for processor speed  and  bandwidth  and  the
       bandwidth  capabilities  of  memory systems, including the
       interconnect between processors and memory.  NUMA  systems
       address  this  problem  by grouping resources--processors,
       I/O buses, and memory--into building blocks  that  balance
       an  appropriate  number of processors and I/O buses with a
       local memory system that delivers the necessary bandwidth.
       The local building blocks are combined into a larger system
       by means of a system-level interconnect with a
       platform-specific topology.

       The  local  processor  and  I/O components on a particular
       building block can access their own  "local"  memory  with
       the  lowest  possible  latency  for  a  particular  system
       design. The local building block can in  turn  access  the
       resources (processors, I/O, and memory) of remote building
       blocks  at  the  cost  of  increased  access  latency  and
       decreased  global  access bandwidth. The term "Non-Uniform
       Memory Access" refers to the difference in latency between
       "local"  and  "remote" memory accesses that can occur on a
       NUMA platform.

       Overall system throughput and individual application
       performance are optimized on a NUMA platform by maximizing the
       ratio of local resource accesses to remote accesses. This
       is achieved by recognizing and preserving the "affinity"
       that processes have for the various resources on the system
       building blocks. For this reason, the building blocks
       are called "Resource Affinity Domains" or RADs.

       RADs are supported only on a class of platforms known as
       Cache Coherent NUMA, or CC NUMA, where all memory is
       accessible and cache coherent with respect to all processors
       and I/O buses. The Tru64 UNIX operating system
       includes enhancements to optimize system throughput and
       application performance on CC NUMA platforms for legacy
       applications as well as those that use NUMA-aware APIs.
       System enhancements to support NUMA are discussed in the
       following subsections. Along with system performance
       monitoring and tuning facilities, these enhancements allow
       the operating system to make a "best effort" to optimize
       the performance of any given collection of applications or
       application components on a CC NUMA platform.

   NUMA Enhancements to Basic UNIX Algorithms and Default Behaviors
       For NUMA, modifications to basic UNIX algorithms (scheduling,
       memory allocation, and so forth) and to default behaviors
       maximize local accesses transparently to applications. These
       modifications, which include the following, directly benefit
       legacy and non-NUMA-aware applications that were designed for
       uniprocessors or Uniform Memory Access Symmetric
       Multiprocessors but run on CC NUMA platforms:

       Topology-aware placement of data

              The operating system attempts to allocate memory
              for application (and kernel) data on the RAD closest
              to where the data will be accessed; or, for data
              that is globally accessed, the operating system may
              allocate memory across the available RADs. When
              there is insufficient free memory on optimal RADs,
              the memory allocations for data may "overflow" onto
              nearby RADs.

       Replication of read-only code and data

              The operating system will attempt to make a local
              copy of read-only text, such as shared library and
              program code. Kernel code and kernel read-only data
              are replicated on all RADs at boot time. If
              insufficient free local memory is available, the
              operating system may choose to utilize a remote copy
              rather than wait for free local memory.

       Memory affinity-aware scheduling

              The operating system scheduler takes "cache
              affinity" into account when choosing a processor to
              run a process thread on multiprocessor platforms.
              Cache affinity assumes that a process thread builds a
              "memory footprint" in a particular processor's
              cache. On CC NUMA platforms, the scheduler also
              takes into account the fact that processes will
              have memory allocated on particular RADs, and will
              attempt to keep processes running on processors
              that are in the same RAD as their memory
              footprints.

       Load balancing

              To minimize the requirement for remote memory
              allocation (overflow), the scheduler will take into
              account memory availability on a RAD as well as the
              processor load average for the RAD. Although these
              two factors may at times conflict with one another,
              the scheduler will attempt to balance the load so
              that processes run where there are memory pages as
              well as processor cycles available. This balancing
              involves both the initial selection of a RAD at
              process creation and migration of processes or
              individual pages in response to changing loads as
              processes come and go or their resource
              requirements or access patterns change.
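
       The placement and load-balancing behavior described above can
       be pictured with a small toy model. The sketch below is not
       Tru64 kernel code and does not use any Tru64 API: the
       rad_state structure, the hop table, and both functions are
       hypothetical names invented for illustration, and the real
       scheduler's heuristics are not specified in this reference
       page. It assumes four RADs, picks an initial RAD by skipping
       RADs without enough free pages and then preferring the lowest
       load average, and lets an allocation "overflow" to the nearest
       RAD that still has room.

```c
#include <assert.h>
#include <stddef.h>

#define NRADS 4

/* Hypothetical per-RAD snapshot; not a Tru64 structure. */
struct rad_state {
    long   free_pages;  /* free memory pages on this RAD */
    double load_avg;    /* processor load average for this RAD */
};

/* Hypothetical hop distances between RADs (0 = same RAD). */
static const int hops[NRADS][NRADS] = {
    {0, 1, 2, 2},
    {1, 0, 2, 2},
    {2, 2, 0, 1},
    {2, 2, 1, 0},
};

/*
 * Pick a RAD for a new process: ignore RADs that cannot hold
 * need_pages, then prefer the lowest load average.
 * Returns -1 if no RAD has enough free pages.
 */
static int pick_initial_rad(const struct rad_state rads[NRADS], long need_pages)
{
    int best = -1;
    for (int r = 0; r < NRADS; r++) {
        if (rads[r].free_pages < need_pages)
            continue;
        if (best < 0 || rads[r].load_avg < rads[best].load_avg)
            best = r;
    }
    return best;
}

/*
 * Allocate pages on the home RAD when it has room; otherwise
 * "overflow" to the RAD with the fewest hops from home that has
 * room. Returns the RAD used, or -1 if no RAD can satisfy it.
 */
static int alloc_pages(struct rad_state rads[NRADS], int home, long pages)
{
    int best = -1;
    for (int r = 0; r < NRADS; r++) {
        if (rads[r].free_pages < pages)
            continue;
        if (best < 0 || hops[home][r] < hops[home][best])
            best = r;
    }
    if (best >= 0)
        rads[best].free_pages -= pages;
    return best;
}
```

       Because the home RAD is zero hops from itself, alloc_pages()
       always chooses it when it has enough free pages, and falls
       back to a neighbor only when it does not.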

   NUMA Enhancements to Application Programming Interfaces
       Application  programmers  can  use new or modified library
       routines to further increase local  accesses  on  CC  NUMA
       platforms.  Using  these  APIs,  programmers can write new
       applications or modify  old  ones  to  provide  additional
       information  to  the  operating system or to take explicit
       control over process, thread, memory object placement,  or
       some combination of these.

       Following  are  tables that list the NUMA library routines
       that deal with RADs and RAD sets, processes  and  threads,
       memory  management, CPUs and CPU sets, and NUMA Scheduling
       Groups. Routines are listed alphabetically in each  table,
       and some routines are listed in more than one table.

       For information about NUMA types, structures, and symbolic
       values, see numa_types(4).   For  information  about  NUMA
       Scheduling Groups, see numa_scheduling_groups(4).

       RADs and RAD Sets


       -----------------------------------------------------------------------
       Function            Purpose              Library   Reference Page
       -----------------------------------------------------------------------
       nloc()              Returns   the  RAD   libnuma    nloc(3)
                           set  that   is   a
                           specified distance
                           from a resource.

       rad_attach_pid()    Attaches a process   libnuma    rad_attach_pid(3)
                           to  a RAD (assigns
                           a  home  RAD   but
                           allows   execution
                           on other RADs).

       rad_bind_pid()      Binds a process to   libnuma    rad_attach_pid(3)
                           a RAD (assigns a
                           home RAD and
                           restricts
                           execution to the
                           home RAD).

       rad_foreach()       Scans  a  RAD  set   libnuma    rad_foreach(3)
                           for  members   and
                           returns  the first
                           member found.

       rad_get_current_home()
                           Returns the          libnuma    rad_get_current_home(3)
                           caller's home RAD.

       rad_get_cpus()      Returns the set of   libnuma    rad_get_num(3)
                           CPUs that are in a
                           RAD.

       rad_get_freemem()   Returns a snapshot   libnuma    rad_get_num(3)
                           of the free memory
                           pages  that are in
                           a RAD.

       rad_get_info()      Returns   informa-   libnuma    rad_get_num(3)
                           tion  about a RAD,
                           including      its
                           state  (online  or
                           offline)  and  the
                           number of CPUs and
                           memory  pages   it
                           contains.

       rad_get_max()       Returns the number   libnuma    rad_get_num(3)
                           of  RADs  in   the
                           system.  **

       rad_get_num()       Returns the number   libnuma    rad_get_num(3)
                           of RADs in the
                           caller's
                           partition. **

       rad_get_physmem()   Returns the number   libnuma    rad_get_num(3)
                           of  memory   pages
                           assigned to a RAD.

       rad_get_state()     Reserved for         libnuma    rad_get_num(3)
                           future use.
                           (Currently, RAD
                           state is always
                           set to RAD_ONLINE.)

       radaddset()         Adds a  RAD  to  a   libnuma    radsetops(3)
                           RAD set.

       radandset()         Performs a logical   libnuma    radsetops(3)
                           AND  operation  on
                           two    RAD   sets,
                           storing the result
                           in a RAD set.

       radcopyset()        Copies   the  con-   libnuma    radsetops(3)
                           tents of  one  RAD
                           set to another RAD
                           set.

       radcountset()       Returns the number   libnuma    radsetops(3)
                           of members of a
                           RAD set.

       raddelset()         Removes a RAD from   libnuma    radsetops(3)
                           a RAD set.

       raddiffset()        Finds  the logical   libnuma    radsetops(3)
                           difference between
                           two    RAD   sets,
                           storing the result
                           in   another   RAD
                           set.

       rademptyset()       Initializes a  RAD   libnuma    radsetops(3)
                           set  such  that no
                           RADs are included.

       radfillset()        Initializes  a RAD   libnuma    radsetops(3)
                           set such  that  it
                           includes all RADs.

       radisemptyset()     Tests  whether   a   libnuma    radsetops(3)
                           RAD  set is empty.

       radismember()       Tests  whether   a   libnuma    radsetops(3)
                           RAD  belongs  to a
                           given RAD set.

       radorset()          Performs a logical   libnuma    radsetops(3)
                           OR   operation  on
                           two   RAD    sets,
                           storing the result
                           in   another   RAD
                           set.

       radsetcreate()      Allocates   a  RAD   libnuma    radsetops(3)
                           set and sets it to
                           empty.

       radsetdestroy()     Releases  the mem-   libnuma    radsetops(3)
                           ory allocated  for
                           a RAD set.

       radxorset()         Performs a logical   libnuma    radsetops(3)
                           XOR  operation  on
                           two    RAD   sets,
                           storing the result
                           in   another   RAD
                           set.

       -----------------------------------------------------------------------

       ** On a partitioned system, the system and  the  partition
       are  equivalent.   In  this  case,  the  operating  system
       returns information only for the partition in which it  is
       installed.
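
       The set operations in the table above follow ordinary set
       algebra. The sketch below models a RAD set as a 64-bit mask
       to show what each operation computes. It is an illustration
       only: toy_radset and the toy_* functions are hypothetical
       names, not the libnuma interfaces, and the sketch makes no
       claim about how the real RAD set type is laid out. It also
       assumes that "logical difference" means the members of the
       first set that are not members of the second.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for a RAD set: bit i set means RAD i is a member.
 * Valid RAD identifiers in this model are 0..63. */
typedef uint64_t toy_radset;

static toy_radset toy_empty(void)                    { return 0; }
static toy_radset toy_add(toy_radset s, int rad)     { return s | (UINT64_C(1) << rad); }
static toy_radset toy_del(toy_radset s, int rad)     { return s & ~(UINT64_C(1) << rad); }
static int        toy_ismember(toy_radset s, int rad) { return (int)((s >> rad) & 1); }

static toy_radset toy_and(toy_radset a, toy_radset b)  { return a & b; }
static toy_radset toy_or(toy_radset a, toy_radset b)   { return a | b; }
static toy_radset toy_xor(toy_radset a, toy_radset b)  { return a ^ b; }

/* Assumed meaning of "difference": in a but not in b. */
static toy_radset toy_diff(toy_radset a, toy_radset b) { return a & ~b; }

/* Count members by clearing the lowest set bit until none remain. */
static int toy_count(toy_radset s)
{
    int n = 0;
    while (s) {
        s &= s - 1;
        n++;
    }
    return n;
}
```

       With a = {0, 2} and b = {2, 3}, the AND is {2}, the OR is
       {0, 2, 3}, the XOR is {0, 3}, and the difference a minus b
       is {0}.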

       Processes and Threads


       ----------------------------------------------------------------------------------
       Function               Purpose                 Library      Reference Page
       ----------------------------------------------------------------------------------
       nfork()                Creates  a child pro-   libnuma       nfork(3)
                              cess that is an exact
                              copy  of  its  parent
                              process. See also the
                              table    entry    for
                              rad_fork().

       nmadvise()             Tells the system what   libnuma       nmadvise(3)
                              behavior to expect
                              from a process with
                              respect to
                              referencing mapped
                              files and shared
                              memory regions.

       nsg_attach_pid()       Attaches a process to   libnuma      nsg_attach_pid(3)
                              a   NUMA   scheduling
                              group.

       nsg_detach_pid()       Detaches a process      libnuma      nsg_attach_pid(3)
                              from a NUMA
                              scheduling group.

       pthread_nsg_attach()   Attaches  a thread to   libpthread   pthread_nsg_attach(3)
                              a   NUMA   scheduling
                              group.

       pthread_nsg_detach()   Detaches a thread       libpthread   pthread_nsg_detach(3)
                              from a NUMA
                              scheduling group.

       pthread_rad_attach()   Attaches  a thread to   libpthread   pthread_rad_attach(3)
                              a RAD set.

       pthread_rad_bind()     Attaches a thread to    libpthread   pthread_rad_attach(3)
                              a RAD set and
                              restricts its
                              execution to the
                              home RAD.

       pthread_rad_detach()   Detaches   a   thread   libpthread   pthread_rad_detach(3)
                              from a RAD set.

       rad_attach_pid()       Attaches a process to   libnuma      rad_attach_pid(3)
                              a RAD (assigns a home
                              RAD but allows
                              execution on other
                              RADs).

       rad_bind_pid()         Binds  a process to a   libnuma      rad_attach_pid(3)
                              RAD (assigns  a  home
                              RAD   and   restricts
                              execution to the home
                              RAD).

       rad_fork()             Creates  a child pro-   libnuma       rad_fork(3)
                              cess on  a  RAD  that
                              optionally  does  not
                              inherit    the    RAD
                              assignment   of   its
                              parent. See also  the
                              table    entry    for
                              nfork().

       ----------------------------------------------------------------------------------
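
       The table distinguishes attaching (a home RAD is assigned,
       but execution on other RADs is allowed) from binding (a home
       RAD is assigned and execution is restricted to it). The toy
       model below illustrates that difference only. The toy_proc
       structure, the enum, and both functions are hypothetical and
       are not part of libnuma or libpthread; the fallback rule in
       toy_choose_rad() (take the first idle RAD) is an assumption
       made for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical placement kinds mirroring "attach" and "bind". */
enum toy_placement { TOY_ATTACHED, TOY_BOUND };

struct toy_proc {
    int                home_rad;   /* RAD assigned as home */
    enum toy_placement placement;  /* attached or bound */
};

/* May this process execute on candidate_rad at all? */
static bool toy_may_run_on(const struct toy_proc *p, int candidate_rad)
{
    if (candidate_rad == p->home_rad)
        return true;                      /* home is always allowed */
    return p->placement == TOY_ATTACHED;  /* only attached may roam */
}

/*
 * Choose where to run next. idle[r] is true when RAD r has an
 * idle CPU. Prefer the home RAD; an attached process may fall
 * back to the first idle RAD, while a bound process stays on
 * its home RAD and waits there.
 */
static int toy_choose_rad(const struct toy_proc *p, const bool idle[], int nrads)
{
    if (idle[p->home_rad])
        return p->home_rad;
    if (p->placement == TOY_ATTACHED) {
        for (int r = 0; r < nrads; r++)
            if (idle[r])
                return r;
    }
    return p->home_rad;
}
```

       In this model, attaching trades some memory locality for
       better CPU utilization when the home RAD is busy, while
       binding keeps execution next to the process's memory at the
       cost of possibly waiting for a CPU.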

       Memory Management


       ----------------------------------------------------------------------
       Function            Purpose                 Library   Reference Page
       ----------------------------------------------------------------------
       memalloc_attr()     Returns the memory      libnuma   memalloc_attr(3)
                           allocation policy for
                           a RAD set specified
                           by its virtual
                           address.

       nacreate()          Sets up an arena for    libc       amalloc(3)
                           memory allocation for
                           use with the
                           amalloc() function.
                           An arena is used in
                           multithreaded
                           programs when there
                           is a need for
                           thread-specific heap
                           memory allocation.

       nmadvise()          Tells the system what   libnuma    nmadvise(3)
                           behavior to expect
                           from a process with
                           respect to
                           referencing mapped
                           files and shared
                           memory regions.

       nmmap()             Maps an open file (or   libnuma    nmmap(3)
                           anonymous     memory)
                           onto   the    address
                           space  for  a process
                           by using a  specified
                           memory     allocation
                           policy.

       nshmget()           Returns  or   creates   libnuma    nshmget(3)
                           the  ID  for a shared
                           memory region.

       ----------------------------------------------------------------------

       CPUs and CPU Sets


       -----------------------------------------------------------------------
       Function            Purpose                  Library   Reference Page
       -----------------------------------------------------------------------
       cpu_foreach()       Enumerates the members   libc      cpu_foreach(3)
                           of a CPU set.

       cpu_get_current()   Returns the identifier   libc      cpu_get_current(3)
                           of the current CPU on
                           which the calling
                           process is running.

       cpu_get_info()      Returns  CPU  informa-   libc      cpu_get_info(3)
                           tion for  the  system.
                           **

       cpu_get_max()       Returns the number of    libc      cpu_get_info(3)
                           CPU slots available in
                           the caller's
                           partition. **

       cpu_get_num()       Returns the number  of   libc      cpu_get_info(3)
                           available CPUs.

       cpu_get_rad()       Returns the RAD  iden-   libnuma   cpu_get_rad(3)
                           tifier for a CPU.

       cpuaddset()         Adds  a  CPU  to a CPU   libc       cpusetops(3)
                           set.

       cpuandset()         Performs a logical AND   libc       cpusetops(3)
                           operation on the
                           contents of two CPU
                           sets, storing the
                           result in a third CPU
                           set.

       cpucopyset()        Copies the contents of   libc       cpusetops(3)
                           one CPU set to another
                           CPU set.

       cpucountset()       Returns the number  of   libc       cpusetops(3)
                           CPUs in a CPU set.

       cpudelset()         Deletes  a  CPU from a   libnuma    cpusetops(3)
                           CPU set.

       cpudiffset()        Finds the logical dif-   libnuma    cpusetops(3)
                           ference   between  two
                           CPU sets, storing  the
                           result  in a third CPU
                           set.

       cpuemptyset()       Initializes a CPU  set   libnuma    cpusetops(3)
                           such  that it includes
                           no CPUs.

       cpufillset()        Initializes  a CPU set   libnuma    cpusetops(3)
                           such  that it includes
                           all CPUs.

       cpuisemptyset()     Tests  whether  a  CPU   libnuma    cpusetops(3)
                           set is empty.

       cpuismember()       Tests whether a CPU is   libnuma    cpusetops(3)
                           a member of a
                           particular CPU set.

       cpuorset()          Performs a logical OR    libnuma    cpusetops(3)
                           operation on the
                           contents of two CPU
                           sets, storing the
                           result in a third CPU
                           set.

       cpusetcreate()      Allocates  a  CPU  set   libnuma    cpusetops(3)
                           and sets it to  empty.

       cpusetdestroy()     Releases   the  memory   libnuma    cpusetops(3)
                           allocated  to  a   CPU
                           set.

       cpuxorset()         Performs a logical XOR   libnuma    cpusetops(3)
                           operation on the
                           contents of two CPU
                           sets, storing the
                           result in a third CPU
                           set.

       -----------------------------------------------------------------------

       **  On  a partitioned system, the system and the partition
       are  equivalent.   In  this  case,  the  operating  system
       returns  information only for the partition in which it is
       installed.

       NUMA Scheduling Groups


       ---------------------------------------------------------------------------------
       Function               Purpose                Library      Reference Page
       ---------------------------------------------------------------------------------
       nsg_attach_pid()       Attaches  a  process   libnuma      nsg_attach_pid(3)
                              to a NUMA scheduling
                              group.

       nsg_destroy()          Removes    a    NUMA   libnuma       nsg_destroy(3)
                              scheduling group and
                              deallocates      its
                              structures.

       nsg_detach_pid()       Detaches a process     libnuma      nsg_attach_pid(3)
                              from a NUMA
                              scheduling group.

       pthread_nsg_attach()   Attaches a thread to   libpthread   pthread_nsg_attach(3)
                              a   NUMA  scheduling
                              group.

       pthread_nsg_detach()   Detaches a thread      libpthread   pthread_nsg_detach(3)
                              from a NUMA
                              scheduling group.

       nsg_get()              Returns  the  status   libnuma       nsg_get(3)
                              of a NUMA scheduling
                              group.

       nsg_get_nsgs()         Returns  a  list  of   libnuma       nsg_get_nsgs(3)
                              NUMA      scheduling
                              groups    that   are
                              active.

       nsg_get_pids()         Returns  a  list  of   libnuma       nsg_get_pids(3)
                              processes   attached
                              to a NUMA scheduling
                              group.

       nsg_init()             Looks up (and possi-   libnuma       nsg_init(3)
                              bly creates) a  NUMA
                              scheduling group.

       nsg_set()              Sets group ID, user    libnuma       nsg_set(3)
                              ID, and permissions
                              for a NUMA
                              scheduling group.

       pthread_nsg_get()      Returns  a  list  of   libpthread   pthread_nsg_get(3)
                              threads  attached to
                              a  NUMA   scheduling
                              group.

       ---------------------------------------------------------------------------------


   NUMA Enhancements to System Utilities and Daemons
       A number of system commands display RAD-specific information
       or perform RAD-specific operations. The following list
       briefly describes the NUMA options supported by system
       utilities and daemons:

       The runon -r command executes an application on a specific
       RAD.

       The vmstat -r command displays virtual memory statistics for
       a specific RAD.

       The netstat -R command displays network routing tables for
       each RAD.

       The ps -o RAD command includes RAD binding in the
       information displayed about processes running on the system.

       The hwmgr -view hier command displays the RAD location of
       CPUs and devices. In this case, in place of a RAD identifier,
       the command identifies the construct in hardware that
       corresponds to a RAD. When run on a GS80, GS160, or GS320
       AlphaServer platform, the command shows the hierarchy of CPUs
       and devices within QBBs. When run on an ES80 or GS1280
       AlphaServer platform, the command shows the hierarchy of CPUs
       and devices within PIDs (processing unit IDs).

       The sched_stat -R command also displays the RAD location of
       system CPUs. In addition, this command shows the relative
       distance (number of hops) between CPUs.

       The -t and -u options on the nfsd command allow customization
       of the number of TCP and UDP server threads, respectively,
       that are spawned per RAD. This feature allows the NFS server
       to automatically scale the number of TCP and UDP server
       threads according to the size of the system.

       The -r option on the inetd command allows customization of
       the RAD locations on which to start Internet server child
       daemons. By default, one child daemon is started on each RAD.

       The route -R command of the kdbx kernel debugger displays
       network route tables for all RADs.
