migration(5)							  migration(5)


NAME

     migration - dynamic memory	migration

DESCRIPTION

     This document describes the dynamic memory	migration system available in
     Origin systems.


   Introduction
     Dynamic page migration is a mechanism that	provides adaptive memory
     locality for applications running on a NUMA machine such as the Origin
     systems. The Origin hardware implements a competitive algorithm based on
     comparing remote memory access counters to	a local	memory access counter;
     when the difference between the numbers of	remote and local accesses goes
     beyond a preset threshold,	an interrupt is	generated to inform the
     operating system that a physical memory page is currently experiencing
     excessive remote accesses.


     Within the	interrupt handler the operating	system makes a final decision
     whether to	migrate	the page or not. If it decides to migrate the page,
     the migration is executed immediately. The	system may decide not to
     execute the migration due to enforcement of a migration control policy or
     due to lack of resources.


     Page migration can	also be	explicitly requested by	users, and in
     addition, it is used to assist the	memory coalescing algorithms for
     multiple page size	support.


   Migration Modules
     The migration subsystem is	composed of the	following modules:

     - Detection Module. This module monitors memory accesses issued by	nodes
       in the system to	each physical memory page. In Origin systems this
       module is mostly	implemented in hardware. This detection	module informs
       the Migration Control Module that a page	is experiencing	excessive
       remote accesses via an interrupt	sent to	the page's home	node.

     - Migration Engine	Module.	This module carries out	data movement from a
       current physical	memory page to a new page in the node issuing the
       remote accesses.

     - Migration Control Module. This module decides whether the page should
       be migrated or not, based on migration control policies,	defined	by
       parameters such as migration threshold, bounce detection	and
       prevention, dampening factor, and others.

     - Migration Control Periodic Operations Module. This module executes all
       periodic operations needed for the Migration Control Module.

     - Memory Management Control Interface Module (MMCI Module). This module
       provides	an interface for users to tune the migration policy associated
       with an address space.


   Migration Detection Module
     The basic goal of memory migration	is to minimize memory access latency.
     In a NUMA system, where local memory access latency is lower than remote
     memory access latency, we can achieve this goal by moving the data to the
     node that will issue most of the memory references.

     Ideally, we would move data to the node where it is going to be needed
     just before it is referenced. Unfortunately, we cannot
     predict the future. However, common programs usually have some amount of
     temporal and spatial locality, which allows us to heuristically predict
     future behavior based on recent past behavior.

     The usual procedure used to predict future	memory accesses	to a page is
     to	count the memory references to this page issued	by each	node in	the
     system.  If the accumulated number	of remote references becomes
     considerably greater than the number of accumulated local references,
     then it may be beneficial to migrate the page to the remote node issuing
     the references, especially	if this	remote node will continue accessing
     this same page for	a long time.

     Origin systems have counters that continuously monitor all	memory
     accesses issued by	each node in the system	to each	physical memory	page.
     In a 64-node Origin (128 processors), there are 64 memory access counters
     for every 4-KB low-level physical page (4 KB is the size of a low-level
     physical page; software page sizes start at 16 KB on Origin
     systems). For every memory access, the counter associated with the node
     issuing the reference is incremented; at the same time, this counter is
     compared to the counter that keeps	track of local accesses, and if	the
     remote counter exceeds the	local counter by a threshold, an interrupt is
     generated advising	the Operating System about the existence of a page
     with excessive remote accesses.
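
     The counter comparison described above can be sketched in C as follows.
     This is only a software model of what the text attributes to hardware:
     the structure layout and the names page_counters and record_access are
     hypothetical, and the threshold is assumed to be already expressed as an
     absolute count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_NODES 64            /* 64-node Origin, as in the text */

/* Hypothetical model of the counters kept for one low-level page. */
struct page_counters {
    uint32_t count[NUM_NODES];  /* one access counter per node */
    int      home_node;         /* node whose memory holds the page */
};

/*
 * Record one access issued by `node`. Returns true when the counter of a
 * remote node exceeds the local counter by at least `threshold`, which is
 * the condition under which an interrupt would be raised.
 */
bool record_access(struct page_counters *pc, int node, uint32_t threshold)
{
    assert(node >= 0 && node < NUM_NODES);

    pc->count[node]++;
    if (node == pc->home_node)
        return false;           /* local accesses never raise the interrupt */

    uint32_t remote = pc->count[node];
    uint32_t local  = pc->count[pc->home_node];
    return remote > local && remote - local >= threshold;
}
```

     With a threshold of 3 and one prior local access, the fourth access from
     the same remote node is the first one for which record_access returns
     true.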

     Upon reception of the interrupt, the Migration Control Module in the
     Operating System decides whether to migrate the page or not.

     The threshold that	determines how large the difference between remote and
     local counters needs to be	in order for the interrupt to be generated is
     stored in a per-node hardware register, which is initialized by the
     Migration Control Module. The default system threshold, defined in
     /var/sysgen/mtune/numa by the tunable variables
     numa_migr_default_threshold and numa_migr_threshold_reference (see
     Page Migration Tunables below), and the threshold specified by users as a
     parameter of a migration policy (mmci(5)) are not stored directly in
     this register, because different pages on the same node may
     have different migration thresholds. Instead, these thresholds are used to
     initialize the reference counters when a page is initialized.


   Migration Engine Module
     This module transparently moves a page from one physical frame to
     another. The migration engine first verifies the availability of all
     resources needed to carry out the migration of a page. If any resource is
     unavailable, the operation is cancelled.

     The data transfer operation may be	done using a processor or a
     specialized Block Transfer	Engine.	Translation lookaside buffer (TLB)
     shootdowns	may be done using inter-processor interrupts or	special
     hardware known as poison bits, available only as an option	on special
     Origin systems running IRIX 6.5 or	later. TLB shootdowns are needed in
     order to avoid the	use of stale translations that may be pointing to the
     physical memory page that contained the data before migration took	place.
     Normally, a TLB shootdown operation is performed by sending interrupts to
     all processors in the system with a TLB that may have stale translation
     entries. On systems with poison bits, this	global TLB shootdown is	not
     needed: along with	the data transfer operation, hardware bits are
     automatically set to indicate that	the page is now	stale (poisonous); if
     a processor tries to access this stale page via a stale translation, the
     memory management hardware	generates a special Bus	Error which causes the
     TLB with the stale	translation to be updated. Effectively,	poison bits
     allow for the implementation of a lazy TLB	shootdown algorithm.
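
     The lazy TLB shootdown idea can be illustrated with a toy model in C.
     Everything here is hypothetical (a single virtual page, four CPUs, the
     names migrate_page and access_page); it only shows that no CPU is
     interrupted at migration time and that each stale translation is repaired
     on first use.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_CPUS   4
#define NUM_FRAMES 8

static bool poisoned[NUM_FRAMES];  /* per-frame poison bit */
static int  page_table_frame;      /* current frame of the single virtual page */
static int  tlb_frame[NUM_CPUS];   /* translation cached by each CPU */
static int  faults;                /* number of stale accesses repaired */

/* Move the page to new_frame. No CPU is interrupted here. */
void migrate_page(int new_frame)
{
    assert(new_frame >= 0 && new_frame < NUM_FRAMES);
    poisoned[page_table_frame] = true;   /* old frame is now stale */
    poisoned[new_frame] = false;
    page_table_frame = new_frame;
}

/* Access the page from `cpu`; returns the frame actually used. */
int access_page(int cpu)
{
    assert(cpu >= 0 && cpu < NUM_CPUS);
    if (poisoned[tlb_frame[cpu]]) {
        /* Models the special Bus Error: refresh this CPU's entry only. */
        faults++;
        tlb_frame[cpu] = page_table_frame;
    }
    return tlb_frame[cpu];
}
```

     After migrate_page(3), a CPU that never touches the page again keeps its
     stale entry and pays nothing; a CPU that does touch it takes one fault
     and then uses frame 3.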

     The vehicle used for the data transfer operation may be selected by the
     system administrator via a	tunable	variable in /var/sysgen/mtune/numa:
     numa_migr_vehicle.	Poison bit based TLB shootdowns	are enabled whenever
     the data transfer vehicle is the Block Transfer Engine and	the hardware
     is	equipped with the optional poison bits.


   Migration Control Module
     This module decides whether a page	should be migrated or not after
     receiving a notification (via an interrupt) from the Migration Detection
     Module alerting that a page is experiencing excessive remote accesses.
     This decision is based on applicable migration control policies and
     resource availability.

     The basic idea behind controlling migration is that it is not always a
     good idea to migrate a page when the memory reference counters are
     telling us	that a page is experiencing excessive remote accesses; the
     page may be bouncing back and forth due to	poor application behavior, the
     counters may have accumulated too much past knowledge, making them	unfit
     to	predict	near future behavior, the destination node may have little
     free memory, or the path needed to	do the migration may be	too busy.

     The Migration Control Module applies a series of filters to a reference
     counter notification or migration request,	as enumerated below. All
     tunables mentioned in this list are found in /var/sysgen/mtune/numa.

     Node Distance Filter        This filter rejects all migration requests
				 where the distance between the	source and the
				 destination is	less than
				 numa_migr_min_distance	in
				 /var/sysgen/mtune/numa. All rejected requests
				 result	in the page being frozen in order to
				 prevent this request from being re-issued too
				 soon.

     Memory Pressure Filter	 This filter rejects migration requests	to
				 nodes where physical memory is	low.  The
				 threshold for low memory is defined by	the
				 tunable numa_migr_memory_low_threshold, which
				 defines the minimum percentage	of physical
				 memory	that needs to be available in order
				 for a page to be migrated there. This filter
				 can be	enabled	and disabled using the tunable
				 numa_migr_memory_low_enabled.

     Traffic Control Filter	 Experimental filter intended to throttle
				 migration down	when the Craylink Interconnect
				 traffic reaches peak levels. Experiments have
				 shown that this filter	is unnecessary for
				 Origin	2000 systems.

     Bounce Control Filter	 Sometimes pages may start bouncing due	to
				 poor application behavior or simple page
				 level false sharing. This filter detects and
				 freezes bouncing pages. The detection is done
				 by keeping a count of the number of
				 migrations per	page in	a counter that is aged
				 (periodically decremented by a	system
                                 daemon). If the count ever goes above a
                                 threshold, the page is considered to be
                                 bouncing and is then frozen. Frozen pages start
                                 melting immediately, so after a period of
                                 time, they are unfrozen and migratable again.
                                 Note that the melting procedure is gradual,
				 not instantaneous. The	bounce control filter
				 relies	on operations executed periodically by
				 the Migration Control Periodic	Operations
				 Module	described below, for a)	aging of the
				 migration counters and	b) melting of frozen
				 pages.	The period of these bounce control
				 periodic operations is	defined	by the tunable
				 numa_migr_bounce_control_interval. The
				 default value for this	tunable	is 0, which
				 translates into a period such that 4 physical
				 pages are operated on per tick	(10[ms]
				 interval). Freezing can be enabled and
				 disabled using	the tunable
                                 numa_migr_freeze_enabled, and the freezing
                                 threshold can be set using the tunable
				 numa_migr_freeze_threshold. This threshold is
				 specified as a	percentage of the maximum
				 effective freezing threshold value, which is
				 7 for Origin 2000 systems. Melting can	be
				 enabled and disabled using the	tunable
				 numa_migr_melt_enabled, and the melting
				 threshold can be set using the	tunable
				 numa_migr_melt_threshold. The melting
				 threshold is expressed	as a percentage	of the
				 maximum effective melting threshold value,
				 which is 7 for	Origin 2000 systems.

     Migration Dampening Filter	 This filter minimizes the amount of migration
				 due to	quick temporary	remote memory
				 accesses, such	as those that occur when
				 caches	are loaded from	a cold state, or when
				 they are reloaded with	a new context. We
                                 implement this dampening filter using a
                                 per-page migration request counter that is
				 incremented every time	we receive a migration
				 request interrupt, and	aged (periodically
				 decremented) by the Migration Control
				 Periodic Operations Module. We	migrate	a page
				 only if the counter reaches a value greater
				 than some dampening threshold.	This will
				 happen	only for applications that
				 continuously generate remote accesses to the
				 same page during some interval	of time. If
				 the application experiences just a short
				 transitory sequence of	remote accesses, it is
				 very unlikely that the	migration request
				 counter will reach the	threshold value. This
				 filter	can be enabled and disabled using the
				 tunable numa_migr_dampening_enabled, and the
				 migration request count threshold can be set
				 using the tunable numa_migr_dampening_factor.


     The memory	reference counters are re-initialized to their startup values
     after every reference counter interrupt.
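
     The filters enumerated in this section can be pictured as a chain of
     checks. The sketch below is an illustration only: the structure fields,
     the enum values and the function migr_filter are hypothetical, the order
     of the checks simply follows the list above, and the experimental Traffic
     Control Filter is left out. The dampening threshold is assumed to have
     been converted to an absolute count already.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state of one migration request. */
struct migr_request {
    int  distance;        /* distance between source and destination nodes */
    int  dest_free_pct;   /* free physical memory on the destination, in % */
    bool page_frozen;     /* page was frozen by the Bounce Control Filter */
    int  request_count;   /* per-page migration request counter */
};

/* Hypothetical snapshot of the tunables used by the filters. */
struct migr_policy {
    int  min_distance;          /* numa_migr_min_distance */
    bool memory_low_enabled;    /* numa_migr_memory_low_enabled */
    int  memory_low_threshold;  /* numa_migr_memory_low_threshold (%) */
    bool dampening_enabled;     /* numa_migr_dampening_enabled */
    int  dampening_count;       /* count derived from numa_migr_dampening_factor */
};

enum migr_verdict {
    MIGR_ACCEPT,
    MIGR_REJECT_DISTANCE,
    MIGR_REJECT_MEMORY,
    MIGR_REJECT_FROZEN,
    MIGR_REJECT_DAMPENED
};

/* Apply the filters in turn; the first one that objects wins. */
enum migr_verdict migr_filter(const struct migr_request *r,
                              const struct migr_policy *p)
{
    assert(r != 0 && p != 0);

    if (r->distance < p->min_distance)
        return MIGR_REJECT_DISTANCE;
    if (p->memory_low_enabled && r->dest_free_pct < p->memory_low_threshold)
        return MIGR_REJECT_MEMORY;
    if (r->page_frozen)
        return MIGR_REJECT_FROZEN;
    /* Migrate only once the counter is greater than the threshold. */
    if (p->dampening_enabled && r->request_count <= p->dampening_count)
        return MIGR_REJECT_DAMPENED;
    return MIGR_ACCEPT;
}
```

     Returning a distinct verdict per filter makes it easy to see which policy
     rejected a request; a real implementation would also freeze the page on a
     distance rejection, as described for the Node Distance Filter.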


   Migration Control Periodic Operations Module
     The Migration Control Module relies on several periodic operations. These
     operations	are listed below:

     - Bounce Control Operations. Age migration	counter	for freezing and
       melting.

     - Unpegging. Reset memory reference counters that have reached a
       saturation level.

     - Queue Control Operations. Age queued outstanding	migration requests.
       Experimental, always disabled for production systems.

     - Traffic Control Operations. Sample the state of the Craylink
       interconnect and	correspondingly	adjust the per-node migration
       threshold. Experimental,	always disabled	for production systems.

     These operations are executed in a loop, triggered once every
     mem_tick_base_period, a tunable that defines the period of the migration
     control periodic operations in system ticks (a system tick is equivalent to
     10 [ms] on Origin systems running IRIX 6.5). This loop of operations may
     be	enabled	and disabled using the tunable mem_tick_enabled. If migration
     is	enabled	or users are allowed to	use migration, this loop must be
     enabled.

     In	order to minimize interference with user processes, we limit the
     number of pages operated on in a loop to a	few pages, trying to limit the
     time used to less than 20 [us]. Administrators can	adjust the time
     dedicated to these	periodic operations via	the following tunables:

     + mem_tick_base_period

     + numa_migr_unpegging_control_interval

     + numa_migr_traffic_control_interval

     + numa_migr_bounce_control_interval
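
     As a rough guide to how these tunables interact, the following C sketch
     estimates how long one full pass of the bounce control operations takes
     when numa_migr_bounce_control_interval is 0 (4 pages per mem_tick). The
     function name and the assumption that every page of a node is visited
     exactly once per pass are ours, not the kernel's.

```c
#include <assert.h>

#define BOUNCE_PAGES_PER_MEM_TICK 4   /* interval 0: 4 pages per mem_tick */
#define SYSTEM_TICK_MS            10  /* one system tick is 10 ms */

/*
 * Estimated time, in milliseconds, to visit `npages` pages once.
 * `mem_tick_base_period` is the number of system ticks in one mem_tick.
 */
unsigned long bounce_sweep_ms(unsigned long npages,
                              unsigned long mem_tick_base_period)
{
    assert(mem_tick_base_period > 0);

    /* Round up: a partial group of pages still costs one mem_tick. */
    unsigned long mem_ticks =
        (npages + BOUNCE_PAGES_PER_MEM_TICK - 1) / BOUNCE_PAGES_PER_MEM_TICK;

    return mem_ticks * mem_tick_base_period * SYSTEM_TICK_MS;
}
```

     For a hypothetical node with 65536 pages and a mem_tick_base_period of 1,
     bounce_sweep_ms returns 163840, that is, roughly 164 seconds per pass.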

   Description of Periodic Operations
     The following list	describes the Bounce Control Periodic Operations in
     detail:

     Aging Migration Counters	 In order to detect bouncing we	keep track of
				 the number of migrations per page using a
				 counter that is periodically decremented
				 (aged). When the counter goes beyond a
				 threshold, we consider	the page to be
				 bouncing and freeze it.

     Aging Migration Request Counters
				 In order to avoid excessive migration or
				 bouncing due to short,	transitory remote
                                 memory access sequences we have a migration
				 dampening filter that needs to	count several
				 migration requests within a limited period of
				 time before it	actually lets a	real page
				 migration take	place. The time	factor is
				 introduced in the filter by aging the
                                 migration request counters.

     Melting Frozen Pages        When a page is frozen we want to eventually
				 unfreeze it so	that it	becomes	migratable
				 again.	This behavior is desirable because the
				 events	that cause a page to be	frozen are
				 usually temporary. As part of the periodic
				 operations, we	increment a counter per	page
				 to keep track of how long the page has	been
				 frozen. When the counter goes above a
				 threshold, meaning that the page has been
				 frozen	for a sufficient time, we unfreeze the
				 page, thereby making it migratable again.
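
     The aging, freezing and melting steps just described can be modelled with
     a few lines of C. This is a sketch under stated assumptions: the names
     page_state, pct_to_count, note_migration and periodic_pass are
     hypothetical, percentages are converted with truncating integer
     arithmetic, and aging is assumed to continue while a page is frozen.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_COUNT 7   /* maximum effective count on Origin 2000, per the text */

struct page_state {
    int  migr_count;  /* number of migrations, aged periodically */
    int  melt_count;  /* periodic passes spent frozen */
    bool frozen;
};

/* Convert a percentage tunable into an absolute count (truncating). */
int pct_to_count(int pct)
{
    assert(pct >= 0 && pct <= 100);
    return pct * MAX_COUNT / 100;
}

/* Called each time the page is migrated. */
void note_migration(struct page_state *ps, int freeze_threshold)
{
    if (ps->migr_count < MAX_COUNT)
        ps->migr_count++;
    if (ps->migr_count > freeze_threshold) {
        ps->frozen = true;       /* page is considered to be bouncing */
        ps->melt_count = 0;
    }
}

/* One periodic pass over the page: age the counter, melt if frozen. */
void periodic_pass(struct page_state *ps, int melt_threshold)
{
    if (ps->migr_count > 0)
        ps->migr_count--;                        /* aging */
    if (ps->frozen && ++ps->melt_count > melt_threshold)
        ps->frozen = false;                      /* melted: migratable again */
}
```

     With a freeze threshold of 50% (count 3) the fourth migration in quick
     succession freezes the page, and with a melt threshold of 30% (count 2)
     the third periodic pass unfreezes it.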

     The Unpegging Periodic Operation consists of scanning all the memory
     reference counters	looking	for those counters that	have pegged due	to
     reaching their maximum count. When	a pegged counter is found, all
     counters associated with that page	are restarted.
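
     A minimal C sketch of the unpegging scan for a single page follows. The
     function name unpeg_page and its parameters are hypothetical; peg_value
     stands for the count derived from numa_migr_unpegging_control_threshold
     and start_value for whatever value the counters are restarted with.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_NODES 64   /* one counter per node, as in a 64-node Origin */

/*
 * If any counter of the page has reached peg_value, restart all counters
 * of that page at start_value. Returns true when a restart took place.
 */
bool unpeg_page(uint32_t counters[NUM_NODES],
                uint32_t peg_value, uint32_t start_value)
{
    assert(counters != NULL);

    bool pegged = false;
    for (size_t i = 0; i < NUM_NODES; i++) {
        if (counters[i] >= peg_value) {
            pegged = true;
            break;
        }
    }
    if (pegged) {
        for (size_t i = 0; i < NUM_NODES; i++)
            counters[i] = start_value;
    }
    return pegged;
}
```

     Restarting every counter of the page, rather than only the pegged one,
     keeps the remote and local counts comparable after the reset.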

     The current implementation	of the Migration Control module	does not
     execute Queue Control Periodic Operations or Traffic Control Periodic
     Operations.

   Page	Migration Tunables
     This is a list of all the memory migration	tunables in
     /var/sysgen/mtune/numa that define	the default memory migration policy
     used by the system.

     * numa_migr_default_mode.	This tunable defines the default migration
       mode. It	can take the following values:


	      0: MIGR_DEFMODE_DISABLED
		 Migration is completely disabled, users cannot	use migration.

	      1: MIGR_DEFMODE_ENABLED
		 Migration is always enabled, users cannot disable migration.

	      2: MIGR_DEFMODE_NORMOFF
		 Migration is normally off, users can enable migration for
		 an application.

	      3: MIGR_DEFMODE_NORMON
		 Migration is normally on, users can disable migration for
		 an application.

	      4: MIGR_DEFMODE_LIMITED
		 Migration is normally off for machine configurations with
		 a maximum Craylink distance less than numa_migr_min_maxradius
		 (defined below). Migration is normally	on otherwise. Users
                 can override this mode.

     *    numa_migr_default_threshold.  This threshold defines the minimum
	  difference between the local and any remote counter needed to
	  generate a migration request interrupt.


	      if ((remote_counter - local_counter) >=
		  ((numa_migr_threshold_reference_value	/ 100) *
		   numa_migr_default_threshold)) {
		  send_migration_request_intr();
	      }



     *	  numa_migr_threshold_reference.  This parameter defines the pegging
	  value	for the	memory reference counters.  It is machine
	  configuration	dependent. For Origin 2000 systems, it can take	the
	  following values:


             0: MIGR_THRESHREF_STANDARD = Threshold reference is 2048 (11-bit
                                          counters). Maximum threshold allowed
                                          for systems with STANDARD DIMMS. This
                                          is the default.
             1: MIGR_THRESHREF_PREMIUM  = Threshold reference is 524288 (19-bit
                                          counters). Maximum threshold allowed
                                          for systems with *all* PREMIUM DIMMS.



     *	  numa_migr_vehicle.  This tunable defines what	device the system
	  should use to	migrate	a page.	 The value 0 selects the Block
	  Transfer Engine (BTE)	and a value of 1 selects the processor.	When
	  the BTE is selected, and the system is equipped with the optional
	  poison bits, the system automatically	uses Lazy TLB Shootdown
	  Algorithms.

     *	  numa_migr_min_maxradius.  This tunable is used if
	  numa_migr_default_mode has been set to mode 4
	  (MIGR_DEFMODE_LIMITED). For this mode, migration is normally off for
	  machine configurations with a	maximum	Craylink distance less than
          numa_migr_min_maxradius. Migration is normally on otherwise.

     *	  numa_migr_auto_migr_mech.  This tunable defines the migration
	  execution mode for memory reference counter triggered	migrations: 0
	  for immediate	and 1 for delayed. Only	the Immediate Mode (0) is
	  currently available.

     *    numa_migr_user_migr_mech.  This tunable defines the migration
          execution mode for user requested migrations: 0 for immediate and 1
          for delayed. Only the Immediate Mode (0) is currently available.

     *    numa_migr_coaldmigr_mech.  This tunable defines the migration
	  execution mode for memory coalescing migrations: 0 for immediate and
	  1 for	delayed. Only the Immediate Mode (0) is	currently available.

     *	  numa_refcnt_default_mode.  Extended counters are used	in application
	  profiling (see refcnt(5)) and	to control automatic memory migration.
	  This tunable defines the default extended reference counter mode. It
	  can take the following values:


	     0:	REFCNT_DEFMODE_DISABLED
		Extended reference counters are	disabled, users	cannot access
		the extended reference counters	(refcnt(5)). In	this case
		automatic memory migration will	not be performed regardless of
		any other settings.

	     1:	REFCNT_DEFMODE_ENABLED
		Extended reference counters are	always enabled,	users cannot
		disable	them.

	     2:	REFCNT_DEFMODE_NORMOFF
		Extended reference counters are	normally disabled, users can
		disable	or enable the counters for an application.

	     3:	REFCNT_DEFMODE_NORMON
		Extended reference counters are	normally enabled, users	can
		disable	or enable the counters for an application.


     *    numa_refcnt_overflow_threshold.  This tunable defines the count at
          which the hardware reference counters notify the operating system of
          a counter overflow in order for the count to be transferred into the
          (software) extended reference counters. It is expressed as a
	  percentage of	the threshold reference	value defined by
	  numa_migr_threshold_reference.

     *    numa_migr_min_distance.  Minimum distance required by the Node
          Distance Filter in order to accept a migration request.

     *    numa_migr_memory_low_enabled.  Enable or disable the Memory Pressure
          Filter.

     *    numa_migr_memory_low_threshold.  Threshold at which the Memory
          Pressure Filter starts rejecting migration requests to a node. This
          threshold is expressed as a percentage of the total amount of
          physical memory in a node.

     *    numa_migr_freeze_enabled.  Enable or disable the freezing operation in
          the Bounce Control Filter.

     *    numa_migr_freeze_threshold.  Threshold at which a page is frozen. This
          tunable is expressed as a percentage of the maximum count supported by
          the migration counters (7 for Origin 2000).

     *    numa_migr_melt_enabled.  Enable or disable the melting operation in
          the Bounce Control Filter.

     *    numa_migr_melt_threshold.  When a migration counter goes below this
          threshold a page is unfrozen.  This tunable is expressed as a
          percentage of the maximum count supported by the migration counters (7
          for Origin 2000).

     *    numa_migr_bounce_control_interval.  This tunable defines the period
          for the loop that ages the migration counters and the dampening
          counters. It is expressed in terms of number of mem_ticks. The
          mem_tick unit is defined by mem_tick_base_period below.  If it is
          set to 0, we process 4 pages per mem_tick. In this case, the actual
          period depends on the amount of physical memory present in a node.

     *    numa_migr_dampening_enabled.  Enable or disable migration dampening.

     *    numa_migr_dampening_factor.  The number of migration requests needed
          for a page before migration is actually executed. It is expressed as
          a percentage of the maximum count supported by the migration-request
          counters (3 for Origin 2000).

     *    mem_tick_enabled.  Enable or disable the loop that executes the
          Migration Control Periodic Operations.

     *    mem_tick_base_period.  Number of 10[ms] system ticks in one mem_tick.

     *    numa_migr_unpegging_control_enabled.  Enable or disable the unpegging
          periodic operation.

     *    numa_migr_unpegging_control_interval.  This tunable defines the period
          for the loop that unpegs the hardware memory reference counters. It
          is expressed in terms of number of mem_ticks.  The mem_tick unit is
          defined by mem_tick_base_period above. If it is set to 0, we process
          8 pages per mem_tick. In this case, the actual period depends on the
          amount of physical memory present in a node.

     *    numa_migr_unpegging_control_threshold.  Hardware memory reference
          counter value at which we consider the counter to be pegged. It is
          expressed as a percentage of the maximum count defined by
          numa_migr_threshold_reference.

     *    numa_migr_traffic_control_enabled.  Enable or disable the Traffic
          Control Filter. This is an experimental module, and therefore it
          should always be disabled.

     *    numa_migr_traffic_control_interval.  Traffic control period.
          Experimental module.

     *    numa_migr_traffic_control_threshold.  Traffic control threshold for
          kicking off the batch migration of enqueued migration requests.
	  Experimental module.
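
     Several of the tunables above are percentages of a reference value. The
     expression given for numa_migr_default_threshold divides the reference
     value by 100 before multiplying; if that expression is evaluated with
     integer arithmetic (an assumption, the text does not say), the result is
     slightly lower than the exact percentage. The C sketch below compares the
     two evaluation orders. The macro and function names are hypothetical.

```c
#include <assert.h>

#define THRESHREF_STANDARD_VALUE 2048u    /* 11-bit counters */
#define THRESHREF_PREMIUM_VALUE  524288u  /* 19-bit counters */

/* Evaluation order as written in the expression: divide, then multiply. */
unsigned threshold_as_written(unsigned reference, unsigned pct)
{
    assert(pct <= 100);
    return (reference / 100) * pct;
}

/* Alternative order: multiply first to avoid losing the remainder. */
unsigned threshold_multiply_first(unsigned reference, unsigned pct)
{
    assert(pct <= 100);
    return reference * pct / 100;
}
```

     For the standard reference value and a threshold of 50, the first
     function yields 1000 and the second 1024; for the premium reference value
     the results are 262100 and 262144. Which of the two the kernel actually
     uses is not stated here.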

FILES

     /var/sysgen/mtune/numa

SEE ALSO

      
      
     numa(5), replication(5), mtune(4), refcnt(5), mmci(5), nstats(1), sn(1).