refcnt(5)							     refcnt(5)


NAME

     Memory Reference Counters - Analysis of memory access patterns

DESCRIPTION

     The Origin hardware and Cellular Irix provide memory reference counters
     to assist application programmers in tuning their algorithms for optimal
     performance on a NUMA system. These counters reveal the exact memory
     reference patterns exhibited by an application or a specific algorithm,
     enabling the programmer to optimize the application data layout and to
     provide specific memory placement hints to the Operating System in order
     to maximize cache utilization and locality of memory access, thereby
     achieving the best memory access performance.

IMPLEMENTATION

   Hardware Reference Counters
     Origin 2000 and Origin 200 systems provide a set of counters for every 4
     KB hardware page of memory. The number of counters per set depends on the
     number of nodes in the system: for systems with 64 or fewer nodes (that
     is, up to 128 processors) a counter set has one counter per node, and for
     systems with more than 64 nodes a counter set has one counter for every 8
     nodes. For systems with 64 or fewer nodes, each counter in a counter set
     counts the number of references from one node. Thus, the application
     programmer can tell exactly how many references have been issued to a
     page from each node in the system. For systems with more than 64 nodes,
     each counter in a counter set holds the number of references to a page
     issued by a group of 8 nodes.

     Note that a hardware page is not equivalent to a base software page (or
     just page). A hardware page defines the granularity at which the hardware
     does reference counting and other hardware operations; a base software
     page is the smallest unit of memory that can be mapped by user processes
     via the Translation Look-aside Buffer or the Page Tables. For Origin 2000
     and Origin 200 systems a hardware page, and therefore the memory
     reference counting granularity, is 4 KB, and a base software page is 16
     KB.
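
     The relationship between the two page sizes can be checked with a little
     arithmetic. The sketch below is illustrative only: the macro and function
     names are hypothetical and do not come from any IRIX header; the 4 KB and
     16 KB sizes are the ones stated above for Origin 2000 and Origin 200.

```c
#include <stdint.h>

#define HW_PAGE_SHIFT   12  /* 4 KB hardware page, as stated in the text */
#define BASE_PAGE_SHIFT 14  /* 16 KB base software page, as stated in the text */

/* Number of hardware pages (and so counter sets) per base software page. */
unsigned hw_pages_per_base_page(void)
{
        return 1u << (BASE_PAGE_SHIFT - HW_PAGE_SHIFT);
}

/* Base software page frame number that contains a physical address. */
uint64_t base_page_frame(uint64_t paddr)
{
        return paddr >> BASE_PAGE_SHIFT;
}

/* Position of the hardware page inside its base software page. */
unsigned hw_page_within_base_page(uint64_t paddr)
{
        return (unsigned)((paddr >> HW_PAGE_SHIFT) &
                          (hw_pages_per_base_page() - 1));
}
```

     With these sizes each base software page spans 4 hardware pages, so
     reference counts are available at a finer granularity than the unit the
     operating system maps.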

     For example, consider an 8	node (16 cpu) Origin 2000 system with the
     memory configuration shown	in the table below. This table shows the
     number of hardware	pages (equivalent to the number	of counter sets), the
     number of total counters, and the number of base software pages per node.
     For this configuration of 8 nodes,	each counter set has 8 counters	(one
     per node).












                                 Memory Configuration
                                  Hardware   Counter     Total         Base
        Module   Slot   Memory     Pages      Sets     Counters   Software Pages
                        [bytes]   Mem/4KB    1/Hpage    8*Sets       Mem/16KB
          1       n1     512M      128K       128K      1024K          32K
          1       n2     256M       64K        64K       512K          16K
          1       n3     256M       64K        64K       512K          16K
          1       n4     512M      128K       128K      1024K          32K
          2       n1     256M       64K        64K       512K          16K
          2       n2      64M       16K        16K       128K           4K
          2       n3      64M       16K        16K       128K           4K
          2       n4     256M       64K        64K       512K          16K
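
     The columns of the table follow directly from the memory size of each
     node. The sketch below reproduces that arithmetic; all names are
     hypothetical, and rounding the group count up for systems with more than
     64 nodes is an assumption, since the text only says "one counter for
     every 8 nodes".

```c
#include <stdint.h>

#define HW_PAGE_SIZE   4096ULL    /* 4 KB hardware page */
#define BASE_PAGE_SIZE 16384ULL   /* 16 KB base software page */

typedef struct node_counter_layout {
        uint64_t hw_pages;        /* hardware pages == counter sets */
        uint64_t total_counters;  /* counter sets * counters per set */
        uint64_t base_pages;      /* base software pages */
} node_counter_layout_t;

/* One counter per node up to 64 nodes, one per group of 8 nodes beyond. */
unsigned counters_per_set(unsigned numnodes)
{
        /* Rounding up is an assumption, not something the text specifies. */
        return (numnodes <= 64) ? numnodes : (numnodes + 7) / 8;
}

/* Compute the table columns for one node holding mem_bytes of memory. */
node_counter_layout_t node_counter_layout(uint64_t mem_bytes, unsigned numnodes)
{
        node_counter_layout_t l;

        l.hw_pages = mem_bytes / HW_PAGE_SIZE;
        l.total_counters = l.hw_pages * counters_per_set(numnodes);
        l.base_pages = mem_bytes / BASE_PAGE_SIZE;
        return l;
}
```

     For a 512M node in the 8 node system this yields 128K hardware pages,
     1024K counters, and 32K base software pages, matching the first row.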


     The length of each counter also depends on the system configuration. For
     systems with more than 16 nodes (32 cpus), the counters have a length of
     19 bits (maximum count is 0x7ffff). For systems with fewer than 16 nodes,
     the length of the counters depends on the kind of directory SIMMs
     installed on the machine. If STANDARD SIMMs are installed, then the
     counters are 11-bit (maximum count 0x7ff); if PREMIUM SIMMs are
     installed, then the counters are 19-bit.


   Software Extended Reference Counters
     The hardware counters peg (stop counting) when they reach their maximum
     count. This is a problem for the 11-bit counters, which would peg after
     only 0x7ff (2047) references to a page from one node. To allow
     application programmers to keep track of memory references beyond this
     small number, Cellular Irix provides Software Extended Memory Reference
     Counters.

     The Extended Counters are implemented as an array of 32-bit counters that
     closely mirror the hardware counters, extending their maximum count to
     2^32 - 1. The hardware counters are set up in such a way that they send
     an interrupt when they reach a threshold close to the maximum count. When
     this interrupt is received by the operating system, the current hardware
     counter count is added to the corresponding software extended counter
     mirror, and the hardware counter is reset to 0. This update procedure is
     performed for complete counter sets; that is, when the overflow interrupt
     is received, the operating system updates not only the counter that is
     overflowing but also all the other counters in its set.
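
     The fold-and-reset scheme described above can be modelled in a few lines
     of user-level C. This is only a simulation for illustration, not the
     kernel implementation: every name is hypothetical, and the threshold
     value is an assumption (the text says only that it is close to the
     maximum count).

```c
#include <stdint.h>

#define HW_COUNTER_BITS  11
#define HW_COUNTER_MAX   ((1u << HW_COUNTER_BITS) - 1)  /* 0x7ff */
#define HW_THRESHOLD     (HW_COUNTER_MAX - 16)          /* assumed value */
#define COUNTERS_PER_SET 8

typedef struct counter_set {
        uint32_t hw[COUNTERS_PER_SET];  /* models the hardware counters */
        uint32_t sw[COUNTERS_PER_SET];  /* 32-bit software extended mirror */
} counter_set_t;

/* Models the interrupt handler: fold the whole set, then reset it. */
void fold_counter_set(counter_set_t *set)
{
        int i;

        for (i = 0; i < COUNTERS_PER_SET; i++) {
                set->sw[i] += set->hw[i];
                set->hw[i] = 0;
        }
}

/* Models one memory reference to the page from the given node. */
void record_reference(counter_set_t *set, int node)
{
        if (set->hw[node] < HW_COUNTER_MAX)
                set->hw[node]++;        /* the counter pegs at its maximum */
        if (set->hw[node] >= HW_THRESHOLD)
                fold_counter_set(set);  /* threshold interrupt fires */
}

/* Folded count plus whatever has accumulated since the last fold. */
uint64_t total_references(const counter_set_t *set, int node)
{
        return (uint64_t)set->sw[node] + set->hw[node];
}
```

     In this model no references are lost as long as the fold happens before
     the hardware counter pegs. Whether the real interface reports the
     residual hardware count as part of the extended count is not stated in
     the text, so total_references() is a modelling choice.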

INTERFACE

   Enabling Reference Counting
     To	enable reference counting for a	section	of virtual memory within an
     application, the programmer can use a Policy Module (mmci(5)) with	the
     migration policy set to "MigrationRefcnt".

          void
          refcnt_enable(char* vaddr, size_t len)
          {
                  pmo_handle_t pm;
                  policy_set_t policy_set;

                  /* Start from the default policies; override only migration. */
                  pm_filldefault(&policy_set);
                  policy_set.migration_policy_name = "MigrationRefcnt";
                  policy_set.migration_policy_args = NULL;

                  if ((pm = pm_create(&policy_set)) < 0) {
                          perror("pm_create");
                          exit(1);
                  }

                  /* Attach the policy module to the range [vaddr, vaddr + len). */
                  if (pm_attach(pm, vaddr, len) < 0) {
                          perror("pm_attach");
                          exit(1);
                  }
          }



   Hardware Reference Counters
     The hardware reference counters for a section of an address space can be
     accessed using procfs (proc(4)). The ioctl	command	code used for this
     purpose is	PIOCGETSN0REFCNTRS. The	third argument is used to specify both
     the virtual address space range we	need the counters for, and the buffer
     where the system should copy the counter values to. This argument is of
     type sn0_refcnt_args_t, as	defined	in <sys/SN/hwcntrs.h>:

	  typedef struct sn0_refcnt_args {
		  caddr_t	      vaddr;
		  long		      len;
		  sn0_refcnt_buf_t*   buf;
	  } sn0_refcnt_args_t;


     The first field vaddr is the base of the virtual address space range, the
     field len is the corresponding length in bytes, and the field buf is a
     pointer to	a user buffer where the	system will store the counter values
     and additional information. This buffer is	an array of elements of	type
     sn0_refcnt_buf_t, where each element corresponds to the counter
     information associated with one hardware page:

          typedef struct sn0_refcnt_buf {
                  sn0_refcnt_set_t   refcnt_set;
                  __uint64_t         paddr;
                  __uint64_t         page_size;
                  cnodeid_t          cnodeid;
          } sn0_refcnt_buf_t;


     The field refcnt_set contains the set of counters associated with the
     virtual address passed via sn0_refcnt_args, paddr is the address of the
     physical page associated with this virtual address, page_size is the page
     size being used to map it, and cnodeid is the physical page home node,
     expressed in terms of Compact Node Identifiers which can be mapped back
     to node names using the command topology(1).  The refcnt_set type is
     defined by

	  typedef struct sn0_refcnt_set	{
		  refcnt_t    refcnt[SN0_REFCNT_MAX_COUNTERS];
		  __uint64_t  flags;
	  } sn0_refcnt_set_t;


     The field refcnt is the actual set	of counters (one counter per node),
     and flags is a state vector reserved for future use.  The counters	in
     refcnt are	ordered	according to the Compact Node Identifiers, also	known
     as	cnodeids (numa(5)).
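
     Because the user buffer holds one sn0_refcnt_buf_t element per hardware
     page, the caller has to work out how many hardware pages a virtual
     address range touches before allocating it. The sketch below shows one
     way to do that; the helper names are hypothetical, and the 4 KB hardware
     page size is the Origin 2000 and Origin 200 value given earlier.

```c
#include <stddef.h>
#include <stdint.h>

#define HW_PAGE_SIZE ((uintptr_t)0x1000)   /* 4 KB hardware page */
#define HW_PAGE_MASK (HW_PAGE_SIZE - 1)

/* Round an address down to the start of its hardware page. */
uintptr_t hw_page_base(uintptr_t vaddr)
{
        return vaddr & ~HW_PAGE_MASK;
}

/* Number of hardware pages touched by the range [vaddr, vaddr + len). */
size_t hw_page_count(uintptr_t vaddr, size_t len)
{
        uintptr_t first;
        uintptr_t last;

        if (len == 0)
                return 0;

        first = hw_page_base(vaddr);
        last = hw_page_base(vaddr + len - 1);
        return (size_t)((last - first) / HW_PAGE_SIZE) + 1;
}
```

     Counting from the first to the last page touched, rather than rounding
     len up on its own, also covers a range that starts in the middle of a
     hardware page and spills into the next one.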



   Software Extended Reference Counters
     The extended reference counters for a section of an address space can be
     accessed using procfs (proc(4)), using practically the same interface
     defined above for the hardware reference counters. The ioctl command
     code used for this purpose is PIOCGETSN0EXTREFCNTRS (the difference
     between this command and the command used for the hardware counters is
     the letters EXT before the word REFCNTRS). The third argument is again of
     type sn0_refcnt_args_t, and the buffer it points to is filled with
     elements of type sn0_refcnt_buf_t, exactly as described above for the
     hardware reference counters; the only difference is that the counts
     returned in refcnt_set are taken from the software extended counters.

     The following routine shows how to access both the hardware counters and
     the software extended counters using procfs.

	  void
	  print_refcounters(char* vaddr, int len)
	  {
		  pid_t	pid = getpid();
		  char	pfile[256];
		  int fd;
		  sn0_refcnt_buf_t* refcnt_buffer;
		  sn0_refcnt_buf_t* direct_refcnt_buffer;
		  sn0_refcnt_args_t* refcnt_args;
		  int npages;
		  int gen_start;
		  int numnodes;
		  int page;
		  int node;
		  char mem_node[512];
		  refcnt_t* set_base;

		  sprintf(pfile, "/proc/%05d", pid);
		  if ((fd = open(pfile,	O_RDONLY)) < 0)	{
                          fprintf(stderr, "Can't open /proc/%05d\n", pid);
                          exit(1);
                  }

		  vaddr	= (char	*)( (unsigned long)vaddr & ~(hw_page_size-1) );
		  npages = (len	+ (hw_page_size-1)) >> logb2(hw_page_size);

		  if ((refcnt_buffer = malloc(sizeof(sn0_refcnt_buf_t) * npages)) == NULL) {
			  perror("malloc refcnt_buffer");
			  exit(1);
		  }



		  if ((direct_refcnt_buffer = malloc(sizeof(sn0_refcnt_buf_t) *	npages)) == NULL) {
			  perror("malloc refcnt_buffer");
			  exit(1);
		  }

		  if ((refcnt_args = malloc(sizeof(sn0_refcnt_args_t)))	== NULL) {
			  perror("malloc refcnt_args");
			  exit(1);
		  }

		  refcnt_args->vaddr = (__uint64_t)vaddr;
		  refcnt_args->len = len;
		  refcnt_args->buf = refcnt_buffer;

		  if ((gen_start = ioctl(fd, PIOCGETSN0EXTREFCNTRS, (void *)refcnt_args)) < 0) {
		    perror("ioctl  PIOCGETSN0EXTREFCNTRS returns error");
		    exit(1);
	       }

		  refcnt_args->vaddr = (__uint64_t)vaddr;
		  refcnt_args->len = len;
		  refcnt_args->buf = direct_refcnt_buffer;
		  if ((gen_start = ioctl(fd, PIOCGETSN0REFCNTRS, (void *)refcnt_args)) < 0) {
		    perror("ioctl  PIOCGETSN0REFCNTRS returns error");
		    exit(1);
	       }

		  if ((numnodes	= sysmp(MP_NUMNODES)) <	0) {
			  perror("sysmp	MP_NUMNODES");
			  exit(1);
		  }

                  for (page = 0; page < npages; page++) {
                          printf("page[%05d, 0x%lx, 0x%llx (0x%llx)]:",
                                 page,
                                 (unsigned long)vaddr + page*0x1000,
                                 refcnt_buffer[page].paddr,
                                 refcnt_buffer[page].paddr >> 14);
                          /* extended count first, hardware count in parentheses */
                          for (node = 0; node < numnodes; node++) {
                                  printf(" %05lld (%06lld)",
                                         refcnt_buffer[page].refcnt_set.refcnt[node],
                                         direct_refcnt_buffer[page].refcnt_set.refcnt[node]);
                          }
                          printf("\n");
                  }

                  close(fd);
                  free(refcnt_args);
                  free(refcnt_buffer);
                  free(direct_refcnt_buffer);
          }




   Memory Mapped Software Extended Reference Counters
     The extended reference counters can also be accessed by mmapping them to
     a user application's virtual address space. This interface	is intended to
     be	used by	performance tools that provide a global	system view rather
     than a localized process view.

     This interface is based on	a device driver	associated with	a device that
     represents	the reference counters for each	node in	an Origin system.
     Here is the list of reference counter devices for an 8 node system:

	  /hw/module/2/slot/n1/node/refcnt
	  /hw/module/2/slot/n2/node/refcnt
	  /hw/module/2/slot/n3/node/refcnt
	  /hw/module/2/slot/n4/node/refcnt
	  /hw/module/1/slot/n1/node/refcnt
	  /hw/module/1/slot/n2/node/refcnt
	  /hw/module/1/slot/n3/node/refcnt
	  /hw/module/1/slot/n4/node/refcnt


     To map the counters in a node, open the refcnt device for that node and
     then use the open file descriptor to obtain the counter information,
     described by rcb_info_t in <sys/SN/hwcntrs.h>, with ioctl(fd,
     RCB_INFO_GET, &rcbinfo).

	  typedef struct rcb_info {
		  __uint64_t  rcb_len;			/* total refcnt	buffer len in bytes */

		  int	      rcb_sw_sets;		/* number of sw	counter	sets in	buffer */
		  int	      rcb_sw_counters_per_set;	/* sw counters per set -- numnodes */
		  int	      rcb_sw_counter_size;	/* sizeof(refcnt_t) -- size of sw cntr */

		  int	      rcb_base_pages;		/* number of base pages	in node	*/
		  int	      rcb_base_page_size;	/* sw base page	size */
		  __uint64_t  rcb_base_paddr;		/* base	physical address for this node */

		  int	      rcb_cnodeid;		/* cnodeid for this node */
		  int	      rcb_granularity;		/* hw page size	used for counter sets */
		  uint	      rcb_hw_counter_max;	/* max hwcounter count (width mask) */
		  int	      rcb_diff_threshold;	/* current node	differential threshold */
		  int	      rcb_abs_threshold;	/* current node	absolute threshold */
		  int	      rcb_num_slots;		/* physmem slots */

	  } rcb_info_t;


     Physical memory in a node is not always contiguous, and therefore
     additional information is necessary to determine the counter buffer
     location associated with a physical page. Physical memory within a node
     is divided into a number of contiguous sections called "slots". The slot
     configuration for a node can be obtained using ioctl(fd, RCB_SLOT_GET,
     slotconfig), where slotconfig is an array of rcb_num_slots elements of
     type rcb_slot_t, defined in <sys/SN/hwcntrs.h>.

	  typedef struct rcb_slot {
		  __uint64_t  base;    /* Base physical	address	for slot */
		  __uint64_t  size;    /* Size of slot in bytes	*/
	  } rcb_slot_t;


     The procedure below shows the complete sequence of	operations required to
     mmap the reference	counters for all nodes.	The counters in	a buffer are
     organized as follows:


	  Set for hardware page	0 in node /hw/module/1/slot/n2/node
	       counter for accesses from node with cnodeid 0
	       counter for accesses from node with cnodeid 1
	       ...
	       ...
	  Set for hardware page	1 in node /hw/module/1/slot/n2/node
	       counter for accesses from node with cnodeid 0
	       counter for accesses from node with cnodeid 1
	       ...
	       ...
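
     Given the layout listed above, locating one counter in a mapped buffer is
     a matter of simple index arithmetic. The helper below is a hypothetical
     illustration; in the mapped interface the number of counters per set
     would come from the rcb_sw_counters_per_set field of rcb_info_t.

```c
#include <stddef.h>

/*
 * Flat index of the counter that records accesses from node `cnodeid`
 * to the hardware page whose counter set is number `set_index`.
 */
size_t counter_index(size_t set_index, unsigned counters_per_set,
                     unsigned cnodeid)
{
        return set_index * counters_per_set + cnodeid;
}
```

     On an 8 node system, for example, the counter for accesses from cnodeid 5
     to the fourth hardware page (set 3) is element 29 of the buffer.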


	  /*
	   * Reference Counter Configuration for all nodes
	   */
	  rcb_info_t** rcbinfo;

	  /*
	   * Physical Memory Config for	all nodes
	   */
	  rcb_slot_t** slotconfig;

	  /*
	   * Mapped counters for all nodes
	   */
	  refcnt_t** cbuffer;


	  void
	  mmap_counters(void)
	  {
		  int fd;
		  char refcnt[1024];
		  refcnt_t* set_base;
		  int numnodes;
		  int node;

		  /* number of nodes */




                  if ((numnodes = sysmp(MP_NUMNODES)) < 0) {
                          perror("sysmp MP_NUMNODES");
                          exit(1);
                  }

		  /* space for refcnt config --	just basic array for now */

		  rcbinfo = (rcb_info_t**)malloc(sizeof(rcb_info_t*) * numnodes);
		  if (rcbinfo == NULL) {
			  perror("malloc");
			  exit(1);
		  }

		  /* space for phys mem	config -- just basic array for now*/

		  slotconfig = (rcb_slot_t**)malloc(sizeof(rcb_slot_t*)	* numnodes);
		  if (slotconfig == NULL) {
			  perror("malloc");
			  exit(1);
		  }

		  /* space for array of	pointers to the	counter	buffers	*/
		  cbuffer = (refcnt_t**)malloc(sizeof(refcnt_t*) * numnodes);
		  if (cbuffer == NULL) {
			  perror("malloc");
			  exit(1);
		  }

		  for (node = 0; node <	numnodes; node++) {
			  sprintf(refcnt, "/hw/nodenum/%d/refcnt", node);
			  if (verbose) {
                                  printf("Opening dev %s\n", refcnt);
			  }

			  if ((fd = open(refcnt, O_RDONLY)) < 0) {
				  perror("open");
				  exit(1);
			  }

			  /* get rcb info */

			  rcbinfo[node]	= (rcb_info_t*)malloc(sizeof(rcb_info_t));
			  if (rcbinfo[node] == NULL) {
				  perror("malloc");
				  exit(1);
			  }

			  if (ioctl(fd,	RCB_INFO_GET, rcbinfo[node]) < 0) {
                                  perror("ioctl RCB_INFO_GET");
				  exit(1);
			  }

			  /* get phys mem config */

                          slotconfig[node] =
                              (rcb_slot_t*)malloc(rcbinfo[node]->rcb_num_slots *
                              sizeof(rcb_slot_t));
			  if (slotconfig[node] == NULL)	{
				  perror("malloc");
				  exit(1);
			  }

			  if (ioctl(fd,	RCB_SLOT_GET, slotconfig[node])	< 0) {
				  perror("ioctl	RCB_SLOT_GET");
				  exit(1);
			  }

			  /* map the counter buffer for	this node */
			  cbuffer[node]	=
			     (refcnt_t*)mmap(0,
					     rcbinfo[node]->rcb_len,
					     PROT_READ,	MAP_SHARED, fd,	0);
			  if (cbuffer[node] == (refcnt_t*)MAP_FAILED) {
				  perror("mmap");
				  exit(1);
			  }

			  if (close(fd)	<  0) {
				  perror("close");
				  exit(1);
			  }

		  }
	  }



     All counters in a node are placed contiguously, but as mentioned earlier,
     memory may not be contiguous. Therefore, the mapping between a physical
     page and its set of counters must take the memory gaps into
     consideration, as shown below:

	  uint
	  logb2(uint v)
	  {
		  uint r;
		  uint l;

		  r = 0;
		  l = 1;
		  while	(l < v)	{
			  r++;
			  l <<=	1;
		  }

		  return (r);
	  }



	  refcnt_t*
	  paddr_to_setbase(int node, __uint64_t	paddr)
	  {
		  int slot_index;
		  int s;
		  uint set_offset;
		  int btoset_shift;
		  refcnt_t* set_base;


		  btoset_shift = logb2(rcbinfo[node]->rcb_granularity);
		  slot_index = -1;
		  set_offset = 0;

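                  for (s = 1; s < rcbinfo[node]->rcb_num_slots; s++) {
                          if (paddr < slotconfig[node][s].base) {
                                  slot_index = s - 1;
                                  break;
                          }
                          set_offset += slotconfig[node][s - 1].size >> btoset_shift;
                  }
                  if (slot_index < 0) {
                          /* paddr is not below any later slot: try the last slot */
                          slot_index = rcbinfo[node]->rcb_num_slots - 1;
                  }
                  if (slot_index < 0 ||
                      paddr < slotconfig[node][slot_index].base ||
                      paddr - slotconfig[node][slot_index].base >=
                      slotconfig[node][slot_index].size) {
                          fprintf(stderr, "Could not find slot\n");
                          exit(1);
                  }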

		  set_offset +=	(paddr - slotconfig[node][slot_index].base) >> btoset_shift;
		  set_base  = cbuffer[node] + set_offset * rcbinfo[node]->rcb_sw_counters_per_set;

		  return (set_base);
	  }



     This function finds the slot where	the physical address is	located, and
     then calculates and returns the location of the associated	set of
     reference counters.
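
     The slot search can also be tried out without Origin hardware. The
     sketch below is a simplified, self-contained variant of the lookup that
     works on a plain array of slots and returns a counter set index instead
     of a pointer. The type and function names are hypothetical; slot_t only
     mirrors the two fields of rcb_slot_t shown earlier.

```c
#include <stdint.h>

typedef struct slot {
        uint64_t base;  /* base physical address of the slot */
        uint64_t size;  /* size of the slot in bytes */
} slot_t;

/*
 * Return the index of the counter set for paddr, counting sets across all
 * slots in order, or -1 if paddr falls in a gap or outside every slot.
 * granularity_shift is log2 of the hardware page size (12 for 4 KB).
 */
long paddr_to_set_index(const slot_t *slots, int num_slots,
                        unsigned granularity_shift, uint64_t paddr)
{
        uint64_t sets_before = 0;  /* counter sets in the preceding slots */
        int s;

        for (s = 0; s < num_slots; s++) {
                if (paddr >= slots[s].base &&
                    paddr - slots[s].base < slots[s].size) {
                        return (long)(sets_before +
                            ((paddr - slots[s].base) >> granularity_shift));
                }
                sets_before += slots[s].size >> granularity_shift;
        }
        return -1;
}
```

     Multiplying the returned index by the number of counters per set gives
     the offset of the set within the mapped counter buffer for the node.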

EXAMPLES

   Accessing the Reference Counters via	procfs
	  /*****************************************************************************
	   * Copyright 2000, Silicon Graphics, Inc.
	   * ALL RIGHTS	RESERVED
	   *
	   * UNPUBLISHED -- Rights reserved under the copyright	laws of	the United
	   * States.   Use of a	copyright notice is precautionary only and does	not
	   * imply publication or disclosure.
	   *
	   * U.S. GOVERNMENT RESTRICTED	RIGHTS LEGEND:
	   * Use, duplication or disclosure by the Government is subject to restrictions
	   * as	set forth in FAR 52.227.19(c)(2) or subparagraph (c)(1)(ii) of the Rights
	   * in	Technical Data and Computer Software clause at DFARS 252.227-7013 and/or



	   * in	similar	or successor clauses in	the FAR, or the	DOD or NASA FAR
	   * Supplement.  Contractor/manufacturer is Silicon Graphics, Inc.,
	   * 2011 N. Shoreline Blvd. Mountain View, CA 94039-7311.
	   *
	   * THE CONTENT OF THIS WORK CONTAINS CONFIDENTIAL AND	PROPRIETARY
	   * INFORMATION OF SILICON GRAPHICS, INC. ANY DUPLICATION, MODIFICATION,
	   * DISTRIBUTION, OR DISCLOSURE IN ANY	FORM, IN WHOLE,	OR IN PART, IS STRICTLY
	   * PROHIBITED	WITHOUT	THE PRIOR EXPRESS WRITTEN PERMISSION OF	SILICON
	   * GRAPHICS, INC.
	   ****************************************************************************/


          #include <stdio.h>
          #include <stdlib.h>
	  #include <string.h>
	  #include <unistd.h>
	  #include <malloc.h>
	  #include <sys/types.h>
	  #include <sys/mman.h>
	  #include <sys/stat.h>
	  #include <fcntl.h>
	  #include <sys/prctl.h>
	  #include <procfs/procfs.h>
	  #include <sys/pmo.h>
	  #include <sys/syssgi.h>
	  #include <sys/sysmp.h>
	  #include <sys/SN/hwcntrs.h>

	  #define HPSIZE	   (0x1000)
	  #define HPSIZE_MASK	   (HPSIZE-1)
	  #define HPSIZE_SHIFT	   (12)
	  #define DATA_POOL_SIZE   (128*1024)
	  #define CACHE_TRASH_SIZE ((4*1024*1024)/sizeof(long))

	  char data_pool[DATA_POOL_SIZE];
	  long cache_trash_buffer[CACHE_TRASH_SIZE];


	  void
	  place_data(char* vaddr, int size, char* node)
	  {
		  pmo_handle_t mld;
		  pmo_handle_t mldset;
		  raff_info_t  rafflist;
		  pmo_handle_t pm;
		  policy_set_t policy_set;

		  if ((mld = mld_create(0, size)) < 0) {
			  perror("mld_create");
			  exit(1);
		  }

		  if ((mldset =	mldset_create(&mld, 1))	< 0) {



                          perror("mldset_create");
			  exit(1);
		  }

		  rafflist.resource = node;
		  rafflist.restype = RAFFIDT_NAME;
		  rafflist.reslen = (ushort)strlen(node);
		  rafflist.radius = 0;
		  rafflist.attr	= RAFFATTR_ATTRACTION;

		  if (mldset_place(mldset,
				   TOPOLOGY_PHYSNODES,
				   &rafflist,
				   1,
				   RQMODE_ADVISORY) < 0) {
			  perror("mldset_place");
			  exit(1);
		  }

		  pm_filldefault(&policy_set);

		  policy_set.placement_policy_name = "PlacementFixed";
		  policy_set.placement_policy_args = (void*)mld;
		  policy_set.migration_policy_name = "MigrationRefcnt";
		  policy_set.migration_policy_args = NULL;

		  if ((pm = pm_create(&policy_set)) < 0) {
			  perror("pm_create");
			  exit(1);
		  }

		  if (pm_attach(pm, vaddr, size) < 0) {
			  perror("pm_attach");
			  exit(1);
		  }

	  }

	  void
	  place_process(char* node)
	  {
		  pmo_handle_t mld;
		  pmo_handle_t mldset;
		  raff_info_t  rafflist;

		  /*
		   * The mld, radius = 0 (from one node	only)
		   */

		  if ((mld = mld_create(0, 0)) < 0) {
			  perror("mld_create");
			  exit(1);



		  }

		  /*
		   * The mldset
		   */

		  if ((mldset =	mldset_create(&mld, 1))	< 0) {
			  perror("mldset_create");
			  exit(1);
		  }

		  /*
		   * Placing the mldset	with the one mld
		   */

		  rafflist.resource = node;
		  rafflist.restype = RAFFIDT_NAME;
		  rafflist.reslen = (ushort)strlen(node);
		  rafflist.radius = 0;
		  rafflist.attr	= RAFFATTR_ATTRACTION;

		  if (mldset_place(mldset,
				   TOPOLOGY_PHYSNODES,
				   &rafflist, 1,
				   RQMODE_ADVISORY) < 0) {
			  perror("mldset_place");
			  exit(1);
		  }

		  /*
		   * Attach this process to run	only on	the node
                   * where the mld has been placed.
		   */

		  if (process_mldlink(0, mld, RQMODE_MANDATORY)	< 0) {
			  perror("process_mldlink");
			  exit(1);
		  }

	  }


	  void
	  print_refcounters(char* vaddr, int len)
	  {
		  pid_t	pid = getpid();
		  char	pfile[256];
		  int fd;
		  sn0_refcnt_buf_t* refcnt_buffer;
		  sn0_refcnt_buf_t* direct_refcnt_buffer;
		  sn0_refcnt_args_t* refcnt_args;
		  int npages;



		  int numnodes;
		  int page;
		  int node;

		  sprintf(pfile, "/proc/%05d", pid);
		  if ((fd = open(pfile,	O_RDONLY)) < 0)	{
                          fprintf(stderr, "Can't open /proc/%05d\n", pid);
                          exit(1);
                  }

		  vaddr	= (char	*)( (unsigned long)vaddr & ~HPSIZE_MASK	);
		  npages = (len	+ HPSIZE_MASK) >> (HPSIZE_SHIFT);

		  if ((refcnt_buffer = malloc(sizeof(sn0_refcnt_buf_t) * npages)) == NULL) {
			  perror("malloc refcnt_buffer");
			  exit(1);
		  }

		  if ((direct_refcnt_buffer = malloc(sizeof(sn0_refcnt_buf_t) *	npages)) == NULL) {
			  perror("malloc refcnt_buffer");
			  exit(1);
		  }

		  if ((refcnt_args = malloc(sizeof(sn0_refcnt_args_t)))	== NULL) {
			  perror("malloc refcnt_args");
			  exit(1);
		  }

		  refcnt_args->vaddr = (__uint64_t)vaddr;
		  refcnt_args->len = len;
		  refcnt_args->buf = refcnt_buffer;

		  if (ioctl(fd,	PIOCGETSN0EXTREFCNTRS, (void *)refcnt_args) < 0) {
		    perror("ioctl  PIOCGETSN0EXTREFCNTRS returns error");
		    exit(1);
	       }

		  refcnt_args->vaddr = (__uint64_t)vaddr;
		  refcnt_args->len = len;
		  refcnt_args->buf = direct_refcnt_buffer;

		  if (ioctl(fd,	PIOCGETSN0REFCNTRS, (void *)refcnt_args) < 0) {
		    perror("ioctl  PIOCGETSN0REFCNTRS returns error");
		    exit(1);
	       }

		  if ((numnodes	= sysmp(MP_NUMNODES)) <	0) {
			  perror("sysmp	MP_NUMNODES");
			  exit(1);
		  }

                  for (page = 0; page < npages; page++) {
                          printf("page[%05d, 0x%lx, 0x%llx (0x%llx)]:",
                                 page,
                                 (unsigned long)vaddr + page*0x1000,
                                 refcnt_buffer[page].paddr,
                                 refcnt_buffer[page].paddr >> 14);
                          /* extended count first, hardware count in parentheses */
                          for (node = 0; node < numnodes; node++) {
                                  printf(" %05lld (%06lld)",
                                         refcnt_buffer[page].refcnt_set.refcnt[node],
                                         direct_refcnt_buffer[page].refcnt_set.refcnt[node]);
                          }
                          printf("\n");
                  }

                  close(fd);
                  free(refcnt_args);
                  free(refcnt_buffer);
                  free(direct_refcnt_buffer);
          }


	  void
	  init_buffer(void* m, size_t size)
	  {
		  size_t i;
		  char*	p = (char*)m;

		  for (i = 0; i	< size;	i++) {
			  p[i] = (char)i;
		  }
	  }

	  long
	  buffer_auto_dotproduct_update(void* m, size_t	size)
	  {
		  size_t i;
		  size_t j;
		  char*	p = (char*)m;
		  long sum = 0;


		  for (i = 0, j	= size - 1; i <	size; i++, j--)	{
			  sum += (long)p[i]-- *	(long)p[j]++;
		  }

		  return (sum);
	  }

	  long
	  cache_trash(long* m, size_t long_size)
	  {
		  int i;
		  long sum = 0;




		  for (i = 0; i	< long_size; i++) {
			 m[i] =	i;
		  }

		  for (i = 0; i	< long_size; i++) {
			  sum += m[i];
		  }

		  return (sum);
	  }

	  void
	  do_stuff(void* m, size_t size, int loops, char* label)
	  {
		  int64_t total	= 0;
		  int count = loops;

		  while	(count--) {
			  total	+= buffer_auto_dotproduct_update(m, size);
			  total	+= cache_trash(cache_trash_buffer, CACHE_TRASH_SIZE);
		  }
                  printf("{%s}, sum after %d loops: 0x%llx\n", label, loops, total);
	  }

          int
          main(int argc, char** argv)
          {
                  char* thread_node;
                  char* mem_node;

                  if (argc != 3) {
                          fprintf(stderr, "Usage %s <thread-node> <mem-node>\n", argv[0]);
                          exit(1);
                  }

                  thread_node = argv[1];
                  mem_node = argv[2];

                  /*
                   * Place data and enable reference counting on it
                   */
                  place_data(&data_pool[0], DATA_POOL_SIZE, mem_node);
                  init_buffer(&data_pool[0], DATA_POOL_SIZE);

                  /*
                   * Place process
                   */
                  place_process(thread_node);

                  /*
                   * Reference pages & print refcnt
                   */

                  do_stuff(data_pool, DATA_POOL_SIZE, 100, "BUFFER");
                  print_refcounters(data_pool, DATA_POOL_SIZE);

                  return 0;
          }


     The program above places a	data buffer and	the running process on nodes
     specified on the command line. When the data buffer is placed, we also
     enable reference counting by specifying the migration policy to be
     "MigrationRefcnt".	Then we	just access the	buffer several times, making
     sure that we flush	the cache between loops.  At the end, we print both
     the extended reference counters and the hardware reference	counters for
     all hardware pages	being used for the data	buffer.
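
     Once the per-node counts for a page are available, a natural next step
     when tuning data placement is to ask which node references the page
     most. The helper below is a hypothetical post-processing sketch and not
     part of the example program; the width of refcnt_t is not shown in this
     page, so unsigned long long is an assumption.

```c
/*
 * Return the index (in cnodeid order) of the node with the largest
 * reference count for one page, or -1 if no node has referenced it.
 * Ties are resolved in favour of the lowest index.
 */
int busiest_node(const unsigned long long *counts, int numnodes)
{
        int node;
        int best = -1;
        unsigned long long best_count = 0;

        for (node = 0; node < numnodes; node++) {
                if (counts[node] > best_count) {
                        best_count = counts[node];
                        best = node;
                }
        }
        return best;
}
```

     A page whose busiest node differs from its home node (the cnodeid field
     of sn0_refcnt_buf_t) is a candidate for a different placement.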

     For a machine with	the following configuration

	  System Configuration
	  # hinv
	  FPU: MIPS R10010 Floating Point Chip Revision: 0.0
	  CPU: MIPS R10000 Processor Chip Revision: 2.6
	  16 180 MHZ IP27 Processors
	  Main memory size: 2048 Mbytes
	  Instruction cache size: 32 Kbytes
	  Data cache size: 32 Kbytes
	  Secondary unified instruction/data cache size: 1 Mbyte

	  Topology

	  # topology

	  Machine ricotta has 16 cpu's,	8 memory nodes,	and 4 routers.

	  The cpus are:
	  cpu	0 is /hw/module/2/slot/n1/node/cpu/a
	  cpu	1 is /hw/module/2/slot/n1/node/cpu/b
	  cpu	2 is /hw/module/2/slot/n2/node/cpu/a
	  cpu	3 is /hw/module/2/slot/n2/node/cpu/b
	  cpu	4 is /hw/module/2/slot/n3/node/cpu/a
	  cpu	5 is /hw/module/2/slot/n3/node/cpu/b
	  cpu	6 is /hw/module/2/slot/n4/node/cpu/a
	  cpu	7 is /hw/module/2/slot/n4/node/cpu/b
	  cpu	8 is /hw/module/1/slot/n1/node/cpu/a
	  cpu	9 is /hw/module/1/slot/n1/node/cpu/b
	  cpu  10 is /hw/module/1/slot/n2/node/cpu/a
	  cpu  11 is /hw/module/1/slot/n2/node/cpu/b
	  cpu  12 is /hw/module/1/slot/n3/node/cpu/a
	  cpu  13 is /hw/module/1/slot/n3/node/cpu/b
	  cpu  14 is /hw/module/1/slot/n4/node/cpu/a
	  cpu  15 is /hw/module/1/slot/n4/node/cpu/b

	  The nodes are:
	  /hw/module/1/slot/n1/node
	  /hw/module/1/slot/n2/node
	  /hw/module/1/slot/n3/node



	  /hw/module/1/slot/n4/node
	  /hw/module/2/slot/n1/node
	  /hw/module/2/slot/n2/node
	  /hw/module/2/slot/n3/node
	  /hw/module/2/slot/n4/node

	  The routers are:
	  /hw/module/1/slot/r1/router
	  /hw/module/1/slot/r2/router
	  /hw/module/2/slot/r1/router
	  /hw/module/2/slot/r2/router

	  The topology is defined by:
	  /hw/module/1/slot/n1/node/link -> /hw/module/1/slot/r1/router
	  /hw/module/1/slot/n2/node/link -> /hw/module/1/slot/r1/router
	  /hw/module/1/slot/n3/node/link -> /hw/module/1/slot/r2/router
	  /hw/module/1/slot/n4/node/link -> /hw/module/1/slot/r2/router
	  /hw/module/2/slot/n1/node/link -> /hw/module/2/slot/r1/router
	  /hw/module/2/slot/n2/node/link -> /hw/module/2/slot/r1/router
	  /hw/module/2/slot/n3/node/link -> /hw/module/2/slot/r2/router
	  /hw/module/2/slot/n4/node/link -> /hw/module/2/slot/r2/router

	  /hw/module/1/slot/r1/router/1	-> /hw/module/2/slot/r1/router
	  /hw/module/1/slot/r1/router/4	-> /hw/module/1/slot/n2/node
	  /hw/module/1/slot/r1/router/5	-> /hw/module/1/slot/n1/node
	  /hw/module/1/slot/r1/router/6	-> /hw/module/1/slot/r2/router
	  /hw/module/1/slot/r2/router/1	-> /hw/module/2/slot/r2/router
	  /hw/module/1/slot/r2/router/4	-> /hw/module/1/slot/n4/node
	  /hw/module/1/slot/r2/router/5	-> /hw/module/1/slot/n3/node
	  /hw/module/1/slot/r2/router/6	-> /hw/module/1/slot/r1/router
	  /hw/module/2/slot/r1/router/1	-> /hw/module/1/slot/r1/router
	  /hw/module/2/slot/r1/router/4	-> /hw/module/2/slot/n2/node
	  /hw/module/2/slot/r1/router/5	-> /hw/module/2/slot/n1/node
	  /hw/module/2/slot/r1/router/6	-> /hw/module/2/slot/r2/router
	  /hw/module/2/slot/r2/router/1	-> /hw/module/1/slot/r2/router
	  /hw/module/2/slot/r2/router/4	-> /hw/module/2/slot/n4/node
	  /hw/module/2/slot/r2/router/5	-> /hw/module/2/slot/n3/node
	  /hw/module/2/slot/r2/router/6	-> /hw/module/2/slot/r1/router


     we	obtain the following output when running the example program:

	  # ./refcnt_procfs /hw/module/2/slot/n3/node /hw/module/2/slot/n3/node
	  {BUFFER}, sum	after 100 loops: 0xee780000
	  page[00000, 0x10002000, 0x207ece000 (0x81fb3)]: 00000	(000038) 00000 (000000)
	  00000	(002047) 00000 (000000)	00000 (000000) 00000 (000000) 00000 (000000) 00000 (000000)
	  page[00001, 0x10003000, 0x207ecf000 (0x81fb3)]: 00000	(000065) 00000 (000000)
	   00000 (002047) 00000	(000000) 00000 (000000)	00000 (000000) 00000 (000000) 00000 (000000)
	  page[00002, 0x10004000, 0x2278d0000 (0x89e34)]: 00041	(000000) 00000 (000000)
	   01793 (001569) 00000	(000000) 00000 (000000)	00000 (000000) 00000 (000000) 00000 (000000)
	  page[00003, 0x10005000, 0x2278d1000 (0x89e34)]: 00033	(000000) 00000 (000000)
	   01664 (001504) 00000	(000000) 00000 (000000)	00000 (000000) 00000 (000000) 00000 (000000)
	  page[00004, 0x10006000, 0x2278d2000 (0x89e34)]: 00032	(000000) 00000 (000000)
	   01664 (001504) 00000	(000000) 00000 (000000)	00000 (000000) 00000 (000000) 00000 (000000)
	  page[00005, 0x10007000, 0x2278d3000 (0x89e34)]: 00032	(000000) 00000 (000000)
	   01664 (001504) 00000	(000000) 00000 (000000)	00000 (000000) 00000 (000000) 00000 (000000)
	  page[00006, 0x10008000, 0x207cd4000 (0x81f35)]: 00048	(000000) 00000 (000000)
	   03136 (000032) 00000	(000000) 00000 (000000)	00000 (000000) 00000 (000000) 00000 (000000)
	  page[00007, 0x10009000, 0x207cd5000 (0x81f35)]: 00039	(000000) 00000 (000000)
	   03586 (000068) 00000	(000000) 00000 (000000)	00000 (000000) 00000 (000000) 00000 (000000)
	  page[00008, 0x1000a000, 0x207cd6000 (0x81f35)]: 00041	(000000) 00000 (000000)
	   03136 (000065) 00000	(000000) 00000 (000000)	00000 (000000) 00000 (000000) 00000 (000000)
	  page[00009, 0x1000b000, 0x207cd7000 (0x81f35)]: 00042	(000000) 00000 (000000)
	   03104 (000064) 00000	(000000) 00000 (000000)	00000 (000000) 00000 (000000) 00000 (000000)
	  page[00010, 0x1000c000, 0x207ad8000 (0x81eb6)]: 00060	(000000) 00000 (000000)
	  01793	(001513) 00000 (000000)	00000 (000000) 00000 (000000) 00000 (000000) 00000 (000000)
	  page[00011, 0x1000d000, 0x207ad9000 (0x81eb6)]: 00035	(000000) 00000 (000000)
	  01696	(001472) 00000 (000000)	00000 (000000) 00000 (000000) 00000 (000000) 00000 (000000)
	  page[00012, 0x1000e000, 0x207ada000 (0x81eb6)]: 00032	(000000) 00000 (000000)
	  01696	(001472) 00000 (000000)	00000 (000000) 00000 (000000) 00000 (000000) 00000 (000000)
	  page[00013, 0x1000f000, 0x207adb000 (0x81eb6)]: 00035	(000000) 00000 (000000)
	   01696 (001472) 00000	(000000) 00000 (000000)	00000 (000000) 00000 (000000) 00000 (000000)
	  page[00014, 0x10010000, 0x2068dc000 (0x81a37)]: 00041	(000000) 00000 (000000)
	   01793 (001375) 00000	(000000) 00000 (000000)	00000 (000000) 00000 (000000) 00000 (000000)
	  page[00015, 0x10011000, 0x2068dd000 (0x81a37)]: 00034	(000000) 00000 (000000)
	   01792 (001376) 00000	(000000) 00000 (000000)	00000 (000000) 00000 (000000) 00000 (000000)
	  page[00016, 0x10012000, 0x2068de000 (0x81a37)]: 00034	(000000) 00000 (000000)
	   01792 (001376) 00000	(000000) 00000 (000000)	00000 (000000) 00000 (000000) 00000 (000000)
	  page[00017, 0x10013000, 0x2068df000 (0x81a37)]: 00034	(000000) 00000 (000000)
	   01792 (001376) 00000	(000000) 00000 (000000)	00000 (000000) 00000 (000000) 00000 (000000)
	  page[00018, 0x10014000, 0x206be0000 (0x81af8)]: 00035	(000000) 00000 (000000)
	   01632 (001536) 00000	(000000) 00000 (000000)	00000 (000000) 00000 (000000) 00000 (000000)
	  page[00019, 0x10015000, 0x206be1000 (0x81af8)]: 00039	(000000) 00000 (000000)
	   01632 (001536) 00000	(000000) 00000 (000000)	00000 (000000) 00000 (000000) 00000 (000000)
	  page[00020, 0x10016000, 0x206be2000 (0x81af8)]: 00034	(000000) 00000 (000000)
	   01793 (001636) 00000	(000000) 00000 (000000)	00000 (000000) 00000 (000000) 00000 (000000)
	  page[00021, 0x10017000, 0x206be3000 (0x81af8)]: 00035	(000000) 00000 (000000)
	   01664 (001504) 00000	(000000) 00000 (000000)	00000 (000000) 00000 (000000) 00000 (000000)
	  page[00022, 0x10018000, 0x226ce4000 (0x89b39)]: 00051	(000000) 00000 (000000)
	   01793 (001515) 00000	(000000) 00000 (000000)	00000 (000000) 00000 (000000) 00000 (000000)
	  page[00023, 0x10019000, 0x226ce5000 (0x89b39)]: 00044	(000000) 00000 (000000)
	   01728 (001440) 00000	(000000) 00000 (000000)	00000 (000000) 00000 (000000) 00000 (000000)
	  page[00024, 0x1001a000, 0x226ce6000 (0x89b39)]: 00037	(000000) 00000 (000000)
	   01728 (001440) 00000	(000000) 00000 (000000)	00000 (000000) 00000 (000000) 00000 (000000)
	  page[00025, 0x1001b000, 0x226ce7000 (0x89b39)]: 00034	(000000) 00000 (000000)
	   01728 (001440) 00000	(000000) 00000 (000000)	00000 (000000) 00000 (000000) 00000 (000000)
	  page[00026, 0x1001c000, 0x2066e8000 (0x819ba)]: 00033	(000000) 00000 (000000)
	  02741	(000529) 00000 (000000)	00000 (000000) 00000 (000000) 00000 (000000) 00000 (000000)
	  page[00027, 0x1001d000, 0x2066e9000 (0x819ba)]: 00041	(000000) 00000 (000000)
	   03586 (000680) 00000	(000000) 00000 (000000)	00000 (000000) 00000 (000000) 00000 (000000)
	  page[00028, 0x1001e000, 0x2066ea000 (0x819ba)]: 00033	(000000) 00000 (000000)
	   02688 (000480) 00000	(000000) 00000 (000000)	00000 (000000) 00000 (000000) 00000 (000000)
	  page[00029, 0x1001f000, 0x2066eb000 (0x819ba)]: 00034	(000000) 00000 (000000)
	   02688 (000480) 00000	(000000) 00000 (000000)	00000 (000000) 00000 (000000) 00000 (000000)
	  page[00030, 0x10020000, 0x2200ec000 (0x8803b)]: 00045	(000000) 00000 (000000)
	   02688 (000480) 00000	(000000) 00000 (000000)	00000 (000000) 00000 (000000) 00000 (000000)
	  page[00031, 0x10021000, 0x2200ed000 (0x8803b)]: 00033	(000000) 00000 (000000)
	   02688 (000480) 00000	(000000) 00000 (000000)	00000 (000000) 00000 (000000) 00000 (000000)


     We place the data buffer and the process on the same node. In this case
     we chose node /hw/module/2/slot/n3/node, which corresponds to cpus 4 and
     5 according to the information obtained using the command "topology":

	  cpu	4 is /hw/module/2/slot/n3/node/cpu/a
	  cpu	5 is /hw/module/2/slot/n3/node/cpu/b


     which corresponds to a node with cnodeid 2.

     We print one record per hardware page. Each record shows the page number
     within the data buffer, the virtual address of the page, the physical
     address of the hardware page backing that virtual address, and the page
     frame number of the physical page. A list of counters follows, two values
     per node: the first counter of each pair is the extended reference
     counter, and the second counter of each pair is the actual hardware
     reference counter.
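
     The relation between the physical address and the page frame number in
     each record can be sketched as follows. This is a minimal illustration,
     not part of the example program: the function names are hypothetical,
     and the 4 KB hardware page and 16 KB base page sizes are assumptions
     inferred from the sample output, where four consecutive records share
     one page frame number.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes, inferred from the sample output: 4 KB hardware pages
 * and 16 KB base software pages. */
#define HW_PAGE_SHIFT   12
#define BASE_PAGE_SHIFT 14

/* Page frame number of the base page that contains paddr. */
uint64_t paddr_to_pfn(uint64_t paddr)
{
        return paddr >> BASE_PAGE_SHIFT;
}

/* Index of the hardware page inside its base page (0..3 here). */
unsigned hw_page_in_base_page(uint64_t paddr)
{
        unsigned per_base = 1u << (BASE_PAGE_SHIFT - HW_PAGE_SHIFT);

        assert(per_base > 0);
        return (unsigned)(paddr >> HW_PAGE_SHIFT) & (per_base - 1);
}
```

     With these assumptions, the address 0x207ece000 of record 0 maps to
     page frame 0x81fb3 and is the third hardware page of that base page,
     which is why only two records carry that frame number before the
     buffer continues in a new base page.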

     As	expected, we see that the counters for node 2 show a high count.
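
     A check like the one made by eye above can be automated. The sketch
     below is an illustration only; busiest_node is a hypothetical helper
     that takes one record's counters, already copied into a plain array,
     and returns the index of the node with the highest count.

```c
#include <assert.h>
#include <stddef.h>

/* Return the index of the largest counter, or -1 for an empty set.
 * On a tie the lowest node index wins. */
int busiest_node(const unsigned long long *counts, size_t n)
{
        size_t i, best = 0;

        if (n == 0)
                return -1;
        assert(counts != NULL);
        for (i = 1; i < n; i++) {
                if (counts[i] > counts[best])
                        best = i;
        }
        return (int)best;
}
```

     Applied to the extended counters of record 6 in the output above
     (48, 0, 3136, 0, ...), the helper returns 2, the cnodeid of the node
     on which the process was placed.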

   Accessing the extended reference counters via mmap
     The following example mmaps the counter buffer, and uses both procfs and
     the mmapped buffer	to access and print out	the counts.

	  /*****************************************************************************
	   * Copyright 2000, Silicon Graphics, Inc.
	   * ALL RIGHTS	RESERVED
	   *
	   * UNPUBLISHED -- Rights reserved under the copyright	laws of	the United
	   * States.   Use of a	copyright notice is precautionary only and does	not
	   * imply publication or disclosure.
	   *
	   * U.S. GOVERNMENT RESTRICTED	RIGHTS LEGEND:
	   * Use, duplication or disclosure by the Government is subject to restrictions
	   * as	set forth in FAR 52.227.19(c)(2) or subparagraph (c)(1)(ii) of the Rights
	   * in	Technical Data and Computer Software clause at DFARS 252.227-7013 and/or
	   * in	similar	or successor clauses in	the FAR, or the	DOD or NASA FAR
	   * Supplement.  Contractor/manufacturer is Silicon Graphics, Inc.,
	   * 2011 N. Shoreline Blvd. Mountain View, CA 94039-7311.
	   *
	   * THE CONTENT OF THIS WORK CONTAINS CONFIDENTIAL AND	PROPRIETARY
	   * INFORMATION OF SILICON GRAPHICS, INC. ANY DUPLICATION, MODIFICATION,
	   * DISTRIBUTION, OR DISCLOSURE IN ANY	FORM, IN WHOLE,	OR IN PART, IS STRICTLY
	   * PROHIBITED	WITHOUT	THE PRIOR EXPRESS WRITTEN PERMISSION OF	SILICON
	   * GRAPHICS, INC.
	   ****************************************************************************/

	  #include <stdio.h>
	  #include <stdlib.h>
	  #include <string.h>
	  #include <unistd.h>
	  #include <malloc.h>
	  #include <sys/types.h>
	  #include <sys/mman.h>
	  #include <sys/stat.h>
	  #include <fcntl.h>
	  #include <sys/prctl.h>
	  #include <procfs/procfs.h>
	  #include <sys/pmo.h>
	  #include <sys/syssgi.h>
	  #include <sys/sysmp.h>
	  #include <sys/SN/hwcntrs.h>


	  #define DATA_POOL_SIZE  (8*16*1024)
	  #define CACHE_TRASH_SIZE ((4*1024*1024)/sizeof(long))

	  char fixed_data_pool[DATA_POOL_SIZE];
	  long cache_trash_buffer[CACHE_TRASH_SIZE];

	  /*
	   * Reference Counter Configuration for all nodes
	   */
	  rcb_info_t** rcbinfo;

	  /*
	   * Hardware Page Size
	   */
	  uint hw_page_size;

	  /*
	   * Physical Memory Config for	all nodes
	   */
	  rcb_slot_t** slotconfig;

	  /*
	   * Mapped counters for all nodes
	   */
	  refcnt_t** cbuffer;

	  /*
	   * Verbose ?
	   */

	  int verbose =	0;

	  void
	  print_rcb(int node, rcb_info_t* rcb, rcb_slot_t* slot)
	  {
		  int s;

		  printf("RCB for node [%d]\n", node);

		  printf("rcb_len: %lld\n", rcb->rcb_len);
		  printf("rcb_sw_sets: %d\n", rcb->rcb_sw_sets);
		  printf("rcb_sw_counters_per_set: %d\n", rcb->rcb_sw_counters_per_set);
		  printf("rcb_sw_counter_size: %d\n", rcb->rcb_sw_counter_size);

		  printf("rcb_base_pages: %d\n", rcb->rcb_base_pages);
		  printf("rcb_base_page_size: %d\n", rcb->rcb_base_page_size);
		  printf("rcb_base_paddr: 0x%llx\n", rcb->rcb_base_paddr);

		  printf("rcb_cnodeid: %d\n", rcb->rcb_cnodeid);
		  printf("rcb_granularity: %d\n", rcb->rcb_granularity);
		  printf("rcb_hw_counter_max: %d\n", rcb->rcb_hw_counter_max);
		  printf("rcb_diff_threshold: %d\n", rcb->rcb_diff_threshold);
		  printf("rcb_abs_threshold: %d\n", rcb->rcb_abs_threshold);

		  for (s = 0; s < rcb->rcb_num_slots; s++) {
			  printf("Slot[%d]: 0x%llx -> 0x%llx, size: 0x%llx\n",
				 s, slot[s].base, slot[s].base + slot[s].size, slot[s].size);
		  }
	  }

	  void
	  mmap_counters(void)
	  {
		  int fd;
		  char refcnt[1024];
		  refcnt_t* set_base;
		  int numnodes;
		  int node;

		  /* number of nodes */

		  numnodes = sysmp(MP_NUMNODES);

		  /* space for refcnt config --	just basic array for now */

		  rcbinfo = (rcb_info_t**)malloc(sizeof(rcb_info_t*) * numnodes);
		  if (rcbinfo == NULL) {
			  perror("malloc");
			  exit(1);
		  }

		  /* space for phys mem	config -- just basic array for now*/

		  slotconfig = (rcb_slot_t**)malloc(sizeof(rcb_slot_t*)	* numnodes);
		  if (slotconfig == NULL) {
			  perror("malloc");
			  exit(1);
		  }

		  /* space for array of pointers to the counter buffers */
		  cbuffer = (refcnt_t**)malloc(sizeof(refcnt_t*) * numnodes);
		  if (cbuffer == NULL) {
			  perror("malloc");
			  exit(1);
		  }

		  for (node = 0; node <	numnodes; node++) {
			  sprintf(refcnt, "/hw/nodenum/%d/refcnt", node);
			  if (verbose) {
				  printf("Opening dev %s\n", refcnt);
			  }

			  if ((fd = open(refcnt, O_RDONLY)) < 0) {
				  perror("open");
				  exit(1);
			  }

			  /* get rcb info */

			  rcbinfo[node]	= (rcb_info_t*)malloc(sizeof(rcb_info_t));
			  if (rcbinfo[node] == NULL) {
				  perror("malloc");
				  exit(1);
			  }

			  if (ioctl(fd,	RCB_INFO_GET, rcbinfo[node]) < 0) {
				  perror("ioctl RCB_INFO_GET");
				  exit(1);
			  }

			  /* get phys mem config */

			  slotconfig[node] = (rcb_slot_t*)malloc(rcbinfo[node]->rcb_num_slots *
					      sizeof(rcb_slot_t));
			  if (slotconfig[node] == NULL)	{
				  perror("malloc");
				  exit(1);
			  }

			  if (ioctl(fd,	RCB_SLOT_GET, slotconfig[node])	< 0) {
				  perror("ioctl	RCB_SLOT_GET");
				  exit(1);
			  }

			  /* map the counter buffer for	this node */
			  cbuffer[node]	= (refcnt_t*)mmap(0, rcbinfo[node]->rcb_len,
							  PROT_READ, MAP_SHARED, fd, 0);
			  if (cbuffer[node] == (refcnt_t*)MAP_FAILED) {
				  perror("mmap");
				  exit(1);
			  }

			  if (verbose) {
				  print_rcb(node, rcbinfo[node], slotconfig[node]);
			  }

			  if (close(fd)	<  0) {
				  perror("close");
				  exit(1);
			  }

		  }
	  }


	  uint
	  logb2(uint v)
	  {
		  uint r;
		  uint l;

		  r = 0;
		  l = 1;
		  while	(l < v)	{
			  r++;
			  l <<=	1;
		  }

		  return (r);
	  }


	  refcnt_t*
	  paddr_to_setbase(int node, __uint64_t	paddr)
	  {
		  int slot_index;
		  int s;
		  uint set_offset;
		  int btoset_shift;
		  refcnt_t* set_base;


		  btoset_shift = logb2(rcbinfo[node]->rcb_granularity);
		  slot_index = -1;
		  set_offset = 0;

		  for (s = 1; s	< rcbinfo[node]->rcb_num_slots;	s++) {
			  if (paddr < slotconfig[node][s].base)	{
				  slot_index = s - 1;
				  break;
			  }
			  set_offset +=	slotconfig[node][s - 1].size >>	btoset_shift;
		  }
		  if (slot_index < 0) {
			  fprintf(stderr, "Could not find slot\n");
			  exit(1);
		  }

		  set_offset +=	(paddr - slotconfig[node][slot_index].base) >> btoset_shift;
		  set_base  = cbuffer[node] + set_offset * rcbinfo[node]->rcb_sw_counters_per_set;

		  return (set_base);
	  }

	  void
	  place_data(char* vaddr, int size, char* node,	int migr_on)
	  {
		  pmo_handle_t mld;
		  pmo_handle_t mldset;
		  raff_info_t  rafflist;
		  pmo_handle_t pm;
		  policy_set_t policy_set;
		  migr_policy_uparms_t migr_parms;

		  if ((mld = mld_create(0, size)) < 0) {
			  perror("mld_create");
			  exit(1);
		  }

		  if ((mldset =	mldset_create(&mld, 1))	< 0) {
			  perror("mldset_create");
			  exit(1);
		  }

		  rafflist.resource = node;
		  rafflist.restype = RAFFIDT_NAME;
		  rafflist.reslen = (ushort)strlen(node);
		  rafflist.radius = 0;
		  rafflist.attr	= RAFFATTR_ATTRACTION;

		  if (mldset_place(mldset,
				   TOPOLOGY_PHYSNODES,
				   &rafflist,
				   1,
				   RQMODE_ADVISORY) < 0) {
			  perror("mldset_place");
			  exit(1);
		  }

		  pm_filldefault(&policy_set);

		  policy_set.placement_policy_name = "PlacementFixed";
		  policy_set.placement_policy_args = (void*)mld;
		  policy_set.migration_policy_name = "MigrationRefcnt";
		  policy_set.migration_policy_args = NULL;

		  if ((pm = pm_create(&policy_set)) < 0) {
			  perror("pm_create");
			  exit(1);
		  }

		  if (pm_attach(pm, vaddr, size) < 0) {
			  perror("pm_attach");
			  exit(1);
		  }

	  }

	  void
	  place_process(char* node)
	  {
		  pmo_handle_t mld;
		  pmo_handle_t mldset;
		  raff_info_t  rafflist;

		  /*
		   * The mld, radius = 0 (from one node	only)
		   */

		  if ((mld = mld_create(0, 0)) < 0) {
			  perror("mld_create");
			  exit(1);
		  }

		  /*
		   * The mldset
		   */

		  if ((mldset =	mldset_create(&mld, 1))	< 0) {
			  perror("mldset_create");
			  exit(1);
		  }

		  /*
		   * Placing the mldset	with the one mld
		   */

		  rafflist.resource = node;
		  rafflist.restype = RAFFIDT_NAME;
		  rafflist.reslen = (ushort)strlen(node);
		  rafflist.radius = 0;
		  rafflist.attr	= RAFFATTR_ATTRACTION;

		  if (mldset_place(mldset,
				   TOPOLOGY_PHYSNODES,
				   &rafflist, 1,
				   RQMODE_ADVISORY) < 0) {
			  perror("mldset_place");
			  exit(1);
		  }

		  /*
		   * Attach this process to run	only on	the node
		   * where the mld has been placed.
		   */

		  if (process_mldlink(0, mld, RQMODE_MANDATORY)	< 0) {
			  perror("process_mldlink");
			  exit(1);
		  }

	  }


	  void
	  print_refcounters(char* vaddr, int len)
	  {
		  pid_t	pid = getpid();
		  char	pfile[256];
		  int fd;
		  sn0_refcnt_buf_t* refcnt_buffer;
		  sn0_refcnt_buf_t* direct_refcnt_buffer;
		  sn0_refcnt_args_t* refcnt_args;
		  int npages;
		  int gen_start;
		  int numnodes;
		  int page;
		  int node;
		  char mem_node[512];
		  refcnt_t* set_base;

		  sprintf(pfile, "/proc/%05d", pid);
		  if ((fd = open(pfile, O_RDONLY)) < 0) {
			  fprintf(stderr, "Can't open /proc/%d\n", pid);
			  exit(1);
		  }

		  vaddr	= (char	*)( (unsigned long)vaddr & ~(hw_page_size-1) );
		  npages = (len	+ (hw_page_size-1)) >> logb2(hw_page_size);

		  if ((refcnt_buffer = malloc(sizeof(sn0_refcnt_buf_t) * npages)) == NULL) {
			  perror("malloc refcnt_buffer");
			  exit(1);
		  }

		  if ((direct_refcnt_buffer = malloc(sizeof(sn0_refcnt_buf_t) *	npages)) == NULL) {
			  perror("malloc refcnt_buffer");
			  exit(1);
		  }

		  if ((refcnt_args = malloc(sizeof(sn0_refcnt_args_t))) == NULL) {
			  perror("malloc refcnt_args");
			  exit(1);
		  }

		  refcnt_args->vaddr = (__uint64_t)vaddr;
		  refcnt_args->len = len;
		  refcnt_args->buf = refcnt_buffer;

		  if ((gen_start = ioctl(fd, PIOCGETSN0EXTREFCNTRS, (void *)refcnt_args)) < 0) {
			  perror("ioctl  PIOCGETSN0EXTREFCNTRS returns error");
			  exit(1);
		  }

		  refcnt_args->vaddr = (__uint64_t)vaddr;
		  refcnt_args->len = len;
		  refcnt_args->buf = direct_refcnt_buffer;
		  if ((gen_start = ioctl(fd, PIOCGETSN0REFCNTRS, (void *)refcnt_args)) < 0) {
			  perror("ioctl  PIOCGETSN0REFCNTRS returns error");
			  exit(1);
		  }

		  if ((numnodes	= sysmp(MP_NUMNODES)) <	0) {
			  perror("sysmp	MP_NUMNODES");
			  exit(1);
		  }

		  for (page = 0; page <	npages;	page++)	{
			  printf("page[%05d, 0x%lx, 0x%llx (0x%llx)]:",
				 page,
				 vaddr + page*0x1000,
				 refcnt_buffer[page].paddr,
				 refcnt_buffer[page].paddr >> 14);
			  for (node = 0; node <	numnodes; node++) {
				  printf(" %05llu (%06llu)",
					 refcnt_buffer[page].refcnt_set.refcnt[node],
					 direct_refcnt_buffer[page].refcnt_set.refcnt[node]);
			  }
			  printf("\n");

			  set_base = paddr_to_setbase(refcnt_buffer[page].cnodeid,
						      refcnt_buffer[page].paddr);
			  printf("MMAPPED CTRS:	");
			  for (node = 0; node <	numnodes; node++) {
				  printf(" %05llu (%06llu)",
					 set_base[node],
					 direct_refcnt_buffer[page].refcnt_set.refcnt[node]);
			  }
			  printf("\n");
		  }

		  close(fd);
		  free(refcnt_args);
		  free(refcnt_buffer);
		  free(direct_refcnt_buffer);
	  }

	  void
	  check_refcounters(char* vaddr, int len)
	  {
		  pid_t	pid = getpid();
		  char	pfile[256];
		  int fd;
		  sn0_refcnt_buf_t* refcnt_buffer;
		  sn0_refcnt_buf_t* direct_refcnt_buffer;
		  sn0_refcnt_args_t* refcnt_args;
		  int npages;
		  int gen_start;
		  int numnodes;
		  int page;
		  int node;
		  char mem_node[512];
		  refcnt_t* set_base;

		  sprintf(pfile, "/proc/%05d", pid);
		  if ((fd = open(pfile, O_RDONLY)) < 0) {
			  fprintf(stderr, "Can't open /proc/%d\n", pid);
			  exit(1);
		  }

		  vaddr	= (char	*)( (unsigned long)vaddr & ~0xfff );
		  npages = (len	+ 0xfff) >> 12;

		  if ((refcnt_buffer = malloc(sizeof(sn0_refcnt_buf_t) * npages)) == NULL) {
			  perror("malloc refcnt_buffer");
			  exit(1);
		  }

		  if ((direct_refcnt_buffer = malloc(sizeof(sn0_refcnt_buf_t) *	npages)) == NULL) {
			  perror("malloc refcnt_buffer");
			  exit(1);
		  }

		  if ((refcnt_args = malloc(sizeof(sn0_refcnt_args_t)))	== NULL) {
			  perror("malloc refcnt_args");
			  exit(1);
		  }

		  refcnt_args->vaddr = (__uint64_t)vaddr;
		  refcnt_args->len = len;
		  refcnt_args->buf = refcnt_buffer;

		  if ((gen_start = ioctl(fd, PIOCGETSN0EXTREFCNTRS, (void *)refcnt_args)) < 0) {
			  perror("ioctl  PIOCGETSN0EXTREFCNTRS returns error");
			  exit(1);
		  }

		  refcnt_args->vaddr = (__uint64_t)vaddr;
		  refcnt_args->len = len;
		  refcnt_args->buf = direct_refcnt_buffer;
		  if ((gen_start = ioctl(fd, PIOCGETSN0REFCNTRS, (void *)refcnt_args)) < 0) {
			  perror("ioctl  PIOCGETSN0REFCNTRS returns error");
			  exit(1);
		  }

		  if ((numnodes	= sysmp(MP_NUMNODES)) <	0) {
			  perror("sysmp	MP_NUMNODES");
			  exit(1);
		  }

		  for (page = 0; page <	npages;	page++)	{
			  set_base = paddr_to_setbase(refcnt_buffer[page].cnodeid,
						      refcnt_buffer[page].paddr);
			  for (node = 0; node <	numnodes; node++) {
				  if (refcnt_buffer[page].refcnt_set.refcnt[node] !=
				      set_base[node]) {
					  if (verbose) {
					       fprintf(stderr,
					       "DIFF: procfs-refcnt: %lld, mmapped-refcnt: %lld\n",
					       refcnt_buffer[page].refcnt_set.refcnt[node],
					       set_base[node]);
					  }
				  }
			  }
		  }

		  close(fd);
		  free(refcnt_args);
		  free(refcnt_buffer);
		  free(direct_refcnt_buffer);
	  }



	  void
	  init_buffer(void* m, size_t size)
	  {
		  size_t i;
		  char*	p = (char*)m;

		  for (i = 0; i	< size;	i++) {
			  p[i] = (char)i;
		  }
	  }

	  long
	  buffer_auto_dotproduct_update(void* m, size_t	size)
	  {
		  size_t i;
		  size_t j;
		  char*	p = (char*)m;
		  long sum = 0;


		  for (i = 0, j	= size - 1; i <	size; i++, j--)	{
			  sum += (long)p[i]-- *	(long)p[j]++;
		  }

		  return (sum);
	  }

	  long
	  cache_trash(long* m, size_t long_size)
	  {
		  int i;
		  long sum = 0;

		  for (i = 0; i	< long_size; i++) {
			 m[i] =	i;
		  }

		  for (i = 0; i	< long_size; i++) {
			  sum += m[i];
		  }

		  return (sum);
	  }



	  void
	  do_stuff(void* m, size_t size, int loops, char* label)
	  {
		  int64_t total	= 0;
		  int count = loops;

		  while	(count--) {
			  total	+= buffer_auto_dotproduct_update(m, size);
			  total	+= cache_trash(cache_trash_buffer, CACHE_TRASH_SIZE);
		  }

		  if (verbose) {
			  printf("{%s}, sum after %d loops: 0x%llx\n", label, loops, total);
		  }
	  }



	  int
	  main(int argc, char** argv)
	  {
		  char*	thread_node;
		  char*	mem_node;

		  if (argc != 4) {
			  fprintf(stderr,
			  "Usage %s <thread-node> <mem-node> <0|1 (verbose)>\n", argv[0]);
			  exit(1);
		  }

		  thread_node =	argv[1];
		  mem_node = argv[2];
		  verbose = atoi(argv[3]);

		  mmap_counters();
		  hw_page_size = rcbinfo[0]->rcb_granularity;

		  /*
		   * Place data, migr off
		   */
		  place_data(&fixed_data_pool[0], DATA_POOL_SIZE, mem_node, 0);
		  init_buffer(&fixed_data_pool[0], DATA_POOL_SIZE);

		  /*
		   * Place process
		   */
		  place_process(thread_node);


		  /*
		   * Reference pages & verify
		   */

		  do_stuff(fixed_data_pool, DATA_POOL_SIZE, 100, "FIXED");

		  if (verbose) {
			  print_refcounters(fixed_data_pool, DATA_POOL_SIZE);
		  }

		  check_refcounters(fixed_data_pool, DATA_POOL_SIZE);

		  return 0;



	  }
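
     The address arithmetic in paddr_to_setbase() can be tried without the
     IRIX headers. The following is a portable sketch under stated
     assumptions: struct mem_slot is a hypothetical stand-in for rcb_slot_t,
     and set_index() and log2_ceil() are illustrative names. Unlike the
     listing, this variant tests each slot's own address range, so it also
     accepts an address in the last slot and rejects one that falls in a
     gap between slots.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for rcb_slot_t: one populated memory range. */
struct mem_slot {
        uint64_t base;   /* first physical address in the slot */
        uint64_t size;   /* slot length in bytes */
};

/* Smallest r such that (1 << r) >= v, like logb2() in the listing. */
static unsigned log2_ceil(uint64_t v)
{
        unsigned r = 0;
        uint64_t l = 1;

        while (l < v) {
                r++;
                l <<= 1;
        }
        return r;
}

/*
 * Return the index of the counter set covering paddr, or -1 if paddr
 * lies in no slot. Sets of earlier slots are packed ahead of the sets
 * of later slots, one set per `granularity` bytes.
 */
long long set_index(const struct mem_slot *slots, int nslots,
                    uint64_t granularity, uint64_t paddr)
{
        unsigned shift = log2_ceil(granularity);
        uint64_t offset = 0;
        int s;

        assert(granularity > 0);
        for (s = 0; s < nslots; s++) {
                if (paddr >= slots[s].base &&
                    paddr - slots[s].base < slots[s].size) {
                        return (long long)(offset +
                                ((paddr - slots[s].base) >> shift));
                }
                /* Skip all counter sets that belong to this slot. */
                offset += slots[s].size >> shift;
        }
        return -1;
}
```

     In the example program the returned index would be multiplied by
     rcb_sw_counters_per_set and added to the mapped buffer base to reach
     the first counter of the set.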



     The output	on ricotta follows:

	  ricotta:migr>	mapcnt /hw/nodenum/3 /hw/nodenum/3  1
	  Opening dev /hw/nodenum/0/refcnt
	  RCB for node [0]
		  rcb_len: 4194304
		  rcb_sw_sets: 65536
		  rcb_
