LNO(5)						      Last changed: 3-10-98

NAME

     LNO - Compiler loop nest optimization option group

SYNOPSIS

     -LNO: ...

IMPLEMENTATION

     IRIX systems

DESCRIPTION

     This man page describes the loop nest optimization	options	accepted by
     the f90(1), f77(1), CC(1),	cc(1), and c89(1) commands.

     The -LNO: option group specifies options and transformations performed
     on	loop nests.  The -LNO: option group is enabled only if the -O3
     option is also specified on the compiler command line.

     For information on	the LNO	options	that are in effect during a
     compilation, use the -LIST:options=ON option.

     You can specify more than one suboption to	the -LNO: option either	by
     using colons to separate each suboption or	by specifying multiple
     options on	the command line.  For example,	the following command lines
     are equivalent:

	  f90 -LNO:auto_dist=ON:outer=OFF b.f
	  f90 -LNO:auto_dist=ON	-LNO:outer=OFF b.f

     Some -LNO:	suboptions are specified with a	setting	that either enables
     or	disables the feature.  To enable a feature, specify the	argument
     either alone or with =1, =ON, or =TRUE.  To disable a feature, specify
     the suboption with	either =0, =OFF, or =FALSE.  For example, the
     following command lines are equivalent:

	  f90 -LNO:auto_dist:blocking=OFF:oinvar=FALSE a.f
	  f90 -LNO:auto_dist=1:blocking=0:oinvar=OFF a.f
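
     The enable/disable rules above can be sketched as a small C helper.
     This is only an illustration of the documented spellings; the function
     name lno_setting_enabled is hypothetical and is not part of the
     compiler, and exact uppercase matching is an assumption.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not part of the compiler: interprets one
   suboption string such as "blocking=OFF" using the rules above.
   Returns 1 if enabled, 0 if disabled, and -1 if the setting is not
   one of the six documented spellings. */
int lno_setting_enabled(const char *suboption)
{
    assert(suboption != NULL);
    const char *eq = strchr(suboption, '=');
    if (eq == NULL)
        return 1;                 /* name alone enables the feature */
    const char *value = eq + 1;
    if (strcmp(value, "1") == 0 || strcmp(value, "ON") == 0 ||
        strcmp(value, "TRUE") == 0)
        return 1;
    if (strcmp(value, "0") == 0 || strcmp(value, "OFF") == 0 ||
        strcmp(value, "FALSE") == 0)
        return 0;
    return -1;
}
```

     With this sketch, "auto_dist" and "auto_dist=1" both count as
     enabled, while "blocking=OFF" and "blocking=0" both count as disabled.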

     For brevity, this man page	shows only the ON or OFF settings to
     suboptions, but 0,	1, TRUE, and FALSE are also allowed as settings. In
     addition, this man	page shows the abbreviated form	for some of the
     suboption names.  You can use either the abbreviation or the complete
     suboption name when using the suboptions. The following is	a list of
     the abbreviations and the complete	suboption names:

	  Complete name			Abbreviation

	  outer_unroll			ou

	  associativity			assoc

	  clean_miss_penalty		cmp

	  dirty_miss_penalty		dmp

	  cache_size			cs

	  is_memory_level		is_mem

	  line_size			ls

	  tlb_entries			tlb

	  tlb_clean_miss_penalty	tlbcmp

	  prefetch_level		pf

     See "F77 LNO Directives" at the end of this man page for a summary of
     the F77 directives for LNO.  See the MIPSpro 7 Fortran 90 Commands and
     Directives Reference Manual, publication SR-3907, for a discussion of
     the Fortran 90 LNO directives.  See MIPSpro C and C++ Pragmas for
     descriptions of the C and C++ LNO #pragma directives.

     The descriptions of the suboptions to -LNO: are divided into the
     following categories:

	       * General options

	       * Transformation	options

	       * Cache memory management options

	       * TLB options

	       * Prefetch options

     The -LNO: option accepts the following general suboptions:

     Suboption	 Action

     auto_dist[	= ( ON|OFF )]
		 Distributes local arrays in common blocks that	are
		 accessed in parallel.	The default is OFF.

                 This optimization works with either automatic parallelism
                 or parallelism using directives.  It is always safe, does
                 not affect the layout of arrays in virtual space, and does
                 not incur addressing overhead.

     fission=n	 Controls loop fission.	 n can be one of the following:

		 0   Disables loop fission.

		 1   Performs normal fission as	necessary.  This is the
		     default.

		 2   Specifies that fission be tried before fusion.

		 If -LNO:fission=n and -LNO:fusion=n are both set to 1 or
		 to 2, fusion is performed.

     fusion=n	 Controls loop fusion.	n can be one of	the following:

		 0   Disables loop fusion.

		 1   Performs standard outer loop fusion.  This	is the
		     default.

		 2   Specifies that outer loops	should be fused, even if it
		     means partial fusion.

		 The compiler attempts fusion before fission.  The compiler
		 performs partial fusion if not	all levels can be fused	in
		 the multiple-level fusion.

                 If -LNO:fission=n and -LNO:fusion=n are both set to 1 or
                 to 2, fusion is performed.
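
                 As a rough picture of what fusion and fission mean, the
                 following C sketch writes the same computation in both
                 forms by hand.  The function names are illustrative only,
                 and the code shows the general technique rather than the
                 compiler's actual output.

```c
#include <assert.h>
#include <stddef.h>

/* Two adjacent loops over the same range: the shape that fission
   produces and that fusion takes as input. */
void scale_then_add_split(double *a, double *b, const double *c, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; i++)
        a[i] = 2.0 * c[i];
    for (size_t i = 0; i < n; i++)
        b[i] = a[i] + c[i];
}

/* The same computation with the loops fused: a[i] and c[i] are reused
   while they are still in registers or cache. */
void scale_then_add_fused(double *a, double *b, const double *c, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; i++) {
        a[i] = 2.0 * c[i];
        b[i] = a[i] + c[i];
    }
}
```

                 Both functions store identical values; only the loop
                 structure, and therefore the memory access pattern,
                 differs.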

     fusion_peeling_limit=n
		 Sets the limit	for the	number of iterations allowed to	be
		 peeled	in fusion, where n >= 0.  By default, n=5.
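
                 Peeling comes into play when adjacent loops have different
                 trip counts.  The C sketch below is a hand-written
                 illustration, with made-up function names, of fusing two
                 loops whose bounds differ by two iterations; the two
                 leftover iterations are peeled, which is within the
                 default limit of 5.

```c
#include <assert.h>
#include <stddef.h>

/* Original form: the first loop runs n iterations, the second n - 2. */
void unfused(double *a, double *b, const double *c, size_t n)
{
    assert(n >= 2);
    for (size_t i = 0; i < n; i++)
        a[i] = c[i] + 1.0;
    for (size_t i = 0; i < n - 2; i++)
        b[i] = a[i] * 2.0;
}

/* Fused form: the common n - 2 iterations share one loop, and the two
   leftover iterations of the first loop are peeled off after it. */
void fused_with_peeling(double *a, double *b, const double *c, size_t n)
{
    assert(n >= 2);
    for (size_t i = 0; i < n - 2; i++) {
        a[i] = c[i] + 1.0;
        b[i] = a[i] * 2.0;
    }
    for (size_t i = n - 2; i < n; i++)   /* peeled iterations */
        a[i] = c[i] + 1.0;
}
```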

     gather_scatter=n
		 Performs gather-scatter optimizations.	 n can be one of
		 the following:

		 0   Disables all gather-scatter optimization.

		 1   Performs gather-scatter optimizations on non-nested IF
		     statements.  This is the default.

		 2   Performs multi-level gather-scatter optimizations.
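
                 The idea behind gather-scatter on a loop containing a
                 single, non-nested IF can be sketched in C as follows.
                 This is a hand-written illustration of the general
                 technique, not the code the compiler generates; the
                 function names and the temporary index array are
                 assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Original form: a conditional inside the loop body. */
void divide_branchy(double *a, const double *b, size_t n)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++)
        if (b[i] > 0.0)
            a[i] = a[i] / b[i];
}

/* Gather-scatter form: first gather the indices that pass the test,
   then run a branch-free loop over only those indices.
   Returns 0 on success, -1 if the index array cannot be allocated. */
int divide_gathered(double *a, const double *b, size_t n)
{
    assert(a != NULL && b != NULL);
    size_t *idx = malloc((n ? n : 1) * sizeof *idx);
    if (idx == NULL)
        return -1;
    size_t count = 0;
    for (size_t i = 0; i < n; i++)       /* gather */
        if (b[i] > 0.0)
            idx[count++] = i;
    for (size_t k = 0; k < count; k++)   /* compute and scatter */
        a[idx[k]] = a[idx[k]] / b[idx[k]];
    free(idx);
    return 0;
}
```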

     ignore_pragmas[ = ( ON|OFF	)]
		 Specifies that	the command line options override
		 directives in the source file.	 The default is	OFF.

     non_blocking_loads[ = ( ON|OFF )]
		 (C/C++	and F77	only) Specifies	whether	the processor
		 blocks	on loads.  If not set, the default of the current
		 processor is used.

     oinvar[ = ( ON|OFF	)]
		 Controls outer	loop hoisting.	The default is ON.

     opt=n	 Controls the LNO optimization level.  n can be	one of the
		 following:

		 0   Disables nearly all loop nest optimization.

                 1   Performs full loop nest transformations.  This is the
		     default.

     outer[ = (	ON|OFF )]
		 Enables or disables outer loop	fusion.	 The default is	ON.

     vintr[ = (	ON|OFF )]
		 Specifies that	vectorizable versions of the math intrinsic
		 functions should be used.  The	default	is ON.

		 For information on the	math intrinsic functions, see
		 math(3M).

     The loop transformation arguments allow you to control cache blocking,
     loop unrolling, and loop interchange.  They are as	follows:

     blocking[ = ( ON|OFF )]
	       Specify blocking=OFF to disable the cache blocking
	       transformation.	The default is ON.

     blocking_size=[n1][,n2]
	       Specifies a block size that the compiler	must use when
	       performing any blocking.	 When using the	MIPSpro	7 Fortran
	       90 compiler, specify a value for	n2 when	using a	2-level
	       cache.  For n1 or n2, specify a positive	integer	number that
	       represents the number of	iterations.
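
               For orientation, the following C sketch shows cache blocking
               written by hand for a matrix transpose, with the block size
               given in iterations as it is for blocking_size.  The
               function names and the parameter bs are illustrative
               assumptions; the compiler chooses its own loop structure.

```c
#include <assert.h>
#include <stddef.h>

/* Straightforward transpose of an n x n row-major matrix. */
void transpose_plain(double *dst, const double *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            dst[j * n + i] = src[i * n + j];
}

/* Blocked transpose: work on bs x bs tiles so that the lines touched
   in src and dst can stay in cache while a tile is processed.
   bs is the block size, counted in loop iterations. */
void transpose_blocked(double *dst, const double *src, size_t n, size_t bs)
{
    assert(dst != NULL && src != NULL && bs > 0);
    for (size_t ii = 0; ii < n; ii += bs)
        for (size_t jj = 0; jj < n; jj += bs)
            for (size_t i = ii; i < n && i < ii + bs; i++)
                for (size_t j = jj; j < n && j < jj + bs; j++)
                    dst[j * n + i] = src[i * n + j];
}
```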

     interchange[ = ( ON|OFF )]
	       Specifies whether or not	loop interchange optimizations are
	       performed.  The default is ON.
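
               Loop interchange can be pictured with the following C
               sketch, written by hand with illustrative names.  Because C
               arrays are stored row by row, placing the i loop outermost
               gives the inner loop unit stride; in Fortran, which stores
               arrays column by column, the preferred order is the
               opposite.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 4
#define COLS 6

/* j outermost: consecutive inner iterations are COLS elements apart. */
void scale_j_outer(double m[ROWS][COLS], double f)
{
    assert(m != NULL);
    for (size_t j = 0; j < COLS; j++)
        for (size_t i = 0; i < ROWS; i++)
            m[i][j] *= f;
}

/* Interchanged: i outermost, so the inner loop walks each row with
   unit stride. */
void scale_i_outer(double m[ROWS][COLS], double f)
{
    assert(m != NULL);
    for (size_t i = 0; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)
            m[i][j] *= f;
}
```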

     ou=n      Indicates that all outer	loops for which	unrolling is legal
	       should be unrolled by n,	where n	is a positive integer.	The
	       compiler	unrolls	loops by this amount or	not at all.

     ou_deep[ =	( ON|OFF )]
	       Specifies that for loops	with 3-deep, or	deeper,	loop nests,
	       the compiler should outer unroll	the wind-down loops that
	       result from outer unrolling loops further out.  This results
	       in large	code size, but it generates much faster	code
	       whenever	wind-down loop execution costs are important.  The
	       default is ON.
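
               The relationship between outer unrolling and wind-down loops
               can be seen in the following C sketch, which unrolls the
               outer loop of a matrix-vector product by 2 by hand.  The
               function names are illustrative, and the code is a sketch of
               the technique rather than the compiler's output.

```c
#include <assert.h>
#include <stddef.h>

/* y = A * x for a row-major rows x cols matrix; reference version. */
void matvec_plain(double *y, const double *a, const double *x,
                  size_t rows, size_t cols)
{
    assert(y != NULL && a != NULL && x != NULL);
    for (size_t i = 0; i < rows; i++) {
        double s = 0.0;
        for (size_t j = 0; j < cols; j++)
            s += a[i * cols + j] * x[j];
        y[i] = s;
    }
}

/* Outer loop unrolled by 2, with both copies sharing one inner loop so
   that each load of x[j] serves two rows.  The wind-down loop handles
   the last row when rows is odd. */
void matvec_ou2(double *y, const double *a, const double *x,
                size_t rows, size_t cols)
{
    assert(y != NULL && a != NULL && x != NULL);
    size_t i = 0;
    for (; i + 1 < rows; i += 2) {
        double s0 = 0.0, s1 = 0.0;
        for (size_t j = 0; j < cols; j++) {
            s0 += a[i * cols + j] * x[j];
            s1 += a[(i + 1) * cols + j] * x[j];
        }
        y[i] = s0;
        y[i + 1] = s1;
    }
    for (; i < rows; i++) {              /* wind-down loop */
        double s = 0.0;
        for (size_t j = 0; j < cols; j++)
            s += a[i * cols + j] * x[j];
        y[i] = s;
    }
}
```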

     ou_further=n
               Specifies whether or not the compiler performs outer loop
               unrolling on wind-down loops.  Specify an integer for n.

               You can disable additional unrolling by specifying
               -LNO:ou_further=999999.  Unrolling is enabled as much as is
               sensible by specifying -LNO:ou_further=3.

     ou_max=n  Indicates that the compiler can unroll as many as n copies
               per loop, but no more.

     ou_prod_max=n
               Indicates that the product of unrolling of the various outer
               loops in a given loop nest is not to exceed n, where n is a
               positive integer.  The default is 16.

     pwr2[ = ( ON|OFF )]
               (C/C++ and F77 only) Specifies whether to ignore the leading
               dimension (set to OFF to ignore).

     Certain arguments allow you to describe the target	cache memory
     system.  The numbering in the following arguments starts with the
     cache level closest to the	processor and works outward:

     assoc1=n, assoc2=n, assoc3=n, assoc4=n
	       Specifies the cache set associativity.  For a fully
	       associative cache, such as main memory, set n to	any
	       sufficiently large number, such as 128.	Specify	a positive
	       integer for n.  Specifying n=0 indicates	that there is no
	       cache at	that level.

     cmp1=n, cmp2=n, cmp3=n, cmp4=n
     dmp1=n, dmp2=n, dmp3=n, dmp4=n
               Specifies, in processor cycles, the time for a clean miss
               (cmpx=) or dirty miss (dmpx=) to the next outer level of the
               memory hierarchy.  This number is approximate because the
               actual cost depends on factors such as whether the line is
               clean or dirty and whether the miss is a read or a write.
               Specify a positive integer for n.  Specifying n=0 indicates
               that there is no cache at that level.

     cs1=n, cs2=n, cs3=n, cs4=n
	       Specifies the cache size.  The value n can be 0,	or it can
	       be a positive integer followed by one of	the following
	       letters:	 k, K, m, or M.	 This specifies	the cache size in
	       Kbytes or Mbytes.  Specifying 0 indicates that there is no
	       cache at	that level.

	       The default cache size depends on your system.  You can use
	       the -LIST:options=ON option to see the default cache sizes
	       used during your	compilation.

     is_mem1[ =	( ON|OFF )]
     is_mem2[ =	( ON|OFF )]
     is_mem3[ =	( ON|OFF )]
     is_mem4[ =	( ON|OFF )]
	       Specifies that certain memory hierarchies should	be modeled
	       as memory, not cache.  The default is OFF for each option.

               Blocking can be attempted for this memory hierarchy level,
               and blocking appropriate for memory, rather than cache, is
               applied.  No prefetching is performed, and any prefetching
               options are ignored.  If an -LNO:is_memx[ = ( ON|OFF )]
               option is specified, the corresponding assocx=n
               specification and any cmpx=n and dmpx=n options on the
               command line are ignored.

     ls1=n, ls2=n, ls3=n, ls4=n
	       Specifies the line size,	in bytes.  This	is the number of
	       bytes, specified	in the form of a positive integer number,
	       n, that are moved from the memory hierarchy level further
	       out to this level on a miss.  Specifying	n=0 indicates that
	       there is	no cache at that level.
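
               The cache size, line size, and associativity together fix
               the number of lines and sets at a cache level.  The C sketch
               below works through that arithmetic; the struct and function
               names are hypothetical, and the values in the usage note are
               illustrative rather than documented defaults.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one cache level, mirroring the csN,
   lsN, and assocN suboptions. */
struct cache_level {
    size_t size_bytes;   /* csN, converted to bytes */
    size_t line_bytes;   /* lsN */
    size_t assoc;        /* assocN */
};

/* Number of cache lines at this level: size / line size. */
size_t cache_lines(const struct cache_level *c)
{
    assert(c != NULL && c->line_bytes > 0);
    return c->size_bytes / c->line_bytes;
}

/* Number of sets: lines / associativity. */
size_t cache_sets(const struct cache_level *c)
{
    assert(c != NULL && c->assoc > 0);
    return cache_lines(c) / c->assoc;
}
```

               For example, a level described by cs1=32k, ls1=32, and
               assoc1=2 has 32768 / 32 = 1024 lines arranged in
               1024 / 2 = 512 sets.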

     Certain arguments control the TLB.	 The TLB is a cache for	the page
     table, and	it is assumed to be fully associative.	The TLB	control
     arguments are as follows:

     ps1=n, ps2=n, ps3=n, ps4=n
	       Specifies the number of bytes in	a page.	 Specify a positive
	       integer for n.  The default n depends on	your system
	       hardware.

     tlb1=n, tlb2=n, tlb3=n, tlb4=n
	       Specifies the number of entries in the TLB for this cache
	       level.  Specify a positive integer for n.  The default n
	       depends on your system hardware.

     tlbcmp1=n,	tlbcmp2=n, tlbcmp3=n, tlbcmp4=n
     tlbdmp1=n,	tlbdmp2=n, tlbdmp3=n, tlbdmp4=n
               Specifies the number of processor cycles it takes to service
               a clean TLB miss (the tlbcmpx= options) or dirty TLB miss
               (the tlbdmpx= options).  Specify a positive integer for n.
               The default n depends on your system hardware.

     The following arguments control the prefetch operation:

     pf1[ = ( ON|OFF )]
     pf2[ = ( ON|OFF )]
     pf3[ = ( ON|OFF )]
     pf4[ = ( ON|OFF )]
               Selectively enables or disables prefetching for cache level
               x, where x is the digit in pfx[ = ( ON|OFF )].

	       When -r10000 is in effect, pf1=ON and pf2=ON by default.	 At
	       any other -rn setting, OFF is in	effect for all cache
	       levels.

     prefetch=n
	       Specifies levels	of prefetching.	 n can be one of the
	       following:

	       0   Disables all	prefetching.  This is the default when
		   -r4000, -r5000, or -r8000 is	in effect.

	       1   Enables conservative	prefetching.  This is the default
		   when	-r10000	is in effect.

	       2   Enables aggressive prefetching.

     prefetch_ahead=n
	       Prefetches the specified	number of cache	lines ahead of the
	       reference.  Specify a positive integer for n.  The default
	       is 2.
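
               The notion of prefetching a number of cache lines ahead of
               the current reference can be sketched by hand in C.  The
               sketch below assumes a GCC-compatible compiler and uses its
               __builtin_prefetch builtin, which is not a MIPSpro feature;
               on other compilers the macro expands to nothing.  The
               function name and the parameters ahead and line_bytes are
               illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: GCC-compatible compilers provide __builtin_prefetch.
   Elsewhere the macro only evaluates its argument. */
#if defined(__GNUC__)
#define PREFETCH(p) __builtin_prefetch(p)
#else
#define PREFETCH(p) ((void)(p))
#endif

/* Sums an array while issuing one prefetch per cache line, `ahead`
   lines in front of the element currently being read. */
double sum_with_prefetch(const double *a, size_t n,
                         size_t ahead, size_t line_bytes)
{
    assert(a != NULL && line_bytes >= sizeof(double));
    size_t per_line = line_bytes / sizeof(double);  /* elements per line */
    size_t dist = ahead * per_line;                 /* prefetch distance */
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i % per_line == 0 && i + dist < n)
            PREFETCH(&a[i + dist]);
        s += a[i];
    }
    return s;
}
```

               The prefetch is only a hint, so the result is the same as
               that of a plain summation loop.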

     prefetch_manual[ =	( ON|OFF )]
	       Specifies whether manual	prefetches (through directives)
	       should be respected or ignored.

	       prefetch_manual=OFF ignores manual prefetches.  This is the
	       default when -r8000, -r5000, or -r4000 is in effect.

	       prefetch_manual=ON respects manual prefetches.  This is the
	       default when -r10000 is in effect.

F77 LNO	Directives
     Directives	within a program unit apply only to that program unit,
     reverting to the default values at	the end	of the program unit.
     Directives	that occur outside of a	program	unit alter the default
     value, and	therefore apply	to the rest of the file	from that point	on,
     until overridden by a subsequent directive.

     Directives	within a file override the command line	options	by default.
     To	have the command line options override directives, use the command
     line option:

	  -LNO:ignore_pragmas

   Fission and Fusion Directives
     * C*$* AGGRESSIVE INNER LOOP FISSION: Fission this loop in the
       inner_fission phase into as many loops as possible.  This must be
       followed by an inner loop and has no effect if that loop is no
       longer an inner loop after the SNL phase.

     * C*$* FISSION [(n)] or C*$* FISSIONABLE:  Fission the enclosing n
       levels of loops after this directive.  A legality test is performed
       unless a FISSIONABLE directive is also specified.  Does not reorder
       statements.

     * C*$* FUSE [(n [,level] )] or C*$* FUSABLE:  Fuse the following n
       immediately adjacent loops.  Fusion is attempted on each pair of
       adjacent loops.  By default, the level is determined by the maximal
       SNL levels of the fused loops, although partial fusion is allowed.
       Iterations may be peeled as needed during fusion; the peeling limit
       is 5 or the number specified by the -LNO:fusion_peeling_limit
       option.  When the FUSABLE directive is present, no legality test is
       done and fusion is done up to the maximal SNL levels at which the
       iteration counts match for each pair of loops to be fused.  The
       default value for n is 2.

     * C*$* NO FISSION:	 The loop following this directive should not be
       fissioned in either fiz_fuse phase or inner_fission phase. Its inner
       loops, however, are allowed to be fissioned.

     * C*$* NO FUSION:	The loop following this	directive should not be
       fused with other	loops.

   SNL Transformation Directives
     The parallelizing preprocessor may do some transformations for
     parallelism that violate some of these directives.

     * C*$* INTERCHANGE	(I, J [,K ...] ):  Loops I, J and K (in	any order)
       must directly follow this directive and be perfectly nested inside
       each other. If they are not perfectly nested, the compiler may
       perform loop distribution to make them so, or may ignore	the
       annotation, or may apply	imperfect interchange (this is not likely).
       The compiler attempts to	reorder	loops so that I	is outermost, then
       J, then K.  The compiler	may ignore this	directive.  There must be a
       minimum of 2 indexes in the directive.

     * C*$* NO INTERCHANGE:  Prevents the compiler from	involving the loop
       directly	following this directive in a permutation, or any loop
       nested within this loop.

     * C*$* BLOCKING SIZE (n1,n2) or C*$* BLOCKING SIZE	(n1) or	C*$*
       BLOCKING	SIZE (,n2): If the specified loop is involved in a blocking
       for the primary or secondary cache, it will have	a blocksize of n1
       or n2. The compiler will	try to include this loop within	such a
       block.  If a blocking size is specified as 0, the loop is not
       actually	stripped, but the entire loop is inside	the block.

     * C*$* NO BLOCKING: Prevent the compiler from involving this loop in a
       cache blocking.

     * C*$* UNROLL (n [,n2] ): This directive suggests that n-1 copies of
       the loop body be added to the inner loop. If the loop that this
       directive directly precedes is an inner loop, then it indicates
       standard unrolling. If the loop that this directive directly
       precedes is not innermost, then outer loop unrolling is performed.
       n must be at least 1.  If n=1 then no unrolling will be performed.
       If n=0, then the default unrolling should be applied.  n2 is
       ignored.

     * C*$* BLOCKABLE (I,J [,K ...] ): The I, J	and K loops must be
       adjacent	and nested within each other, although not necessarily
       perfectly nested.  This directive informs the compiler that these
       loops may legally be involved in	a blocking with	each other, even if
       the compiler would consider such	a transformation illegal.  The
       loops are also interchangeable and unrollable.  This directive does
       not instruct the	compiler which of these	transformations	to apply.
       You must	specify	at least 2 loop	indexes	in the directive.

   Prefetch Directives
     * C*$* PREFETCH (n[,n]): Specify prefetching for each level of the
       cache. The scope	is the entire function containing the directive.  n
       can be one of the following values:

       0    prefetching	off (default for all processors	except R10000)

       1    prefetching	on, but	conservative

       2    prefetching	on, and	aggressive (default when prefetch is on)

     * C*$* PREFETCH_MANUAL (n):  Specify if manual prefetches (through
       directives) should be respected or ignored.  Scope: Entire function
       containing the directive.  n can	be one of the following	values:

       0    ignore manual prefetches (default for mips3	and earlier)

       1    respect manual prefetches (default for mips4)

     * C*$* PREFETCH_REF_DISABLE=A [, size=num]:  This directive explicitly
       disables prefetching of all references to array A in the current
       function. The auto-prefetcher runs (if enabled) ignoring array A.
       The size	is used	for volume analysis.  Scope: Entire function
       containing the directive.  size=num is the size of the array
       references in this loop,	in Kbytes.  This is an optional	argument
       and must	be a constant.

     * C*$* PREFETCH_REF=array-ref,[stride=[str] [,str]], [level=[lev]
       [,lev]],	[kind=[rd/wr]],	[size=[sz]]: This directive generates a
       single prefetch instruction to the specified memory location. It
       searches	for array references that match	the supplied reference in
       the current loop-nest.  If such a reference is found, that reference
       is connected to this prefetch node with the specified latency. If no
       such reference is found,	this prefetch node stays free-floating and
       is scheduled "loosely".

       All references to this array in this loop-nest are ignored by the
       automatic prefetcher (if	enabled).

       If the size is supplied,	then the auto-prefetcher (if enabled)
       reduces the effective cache size	by that	amount in its calculations.

       The compiler tries to issue one prefetch	per stride iteration, but
       cannot guarantee	it. Redundant prefetches are preferred to
       transformations (such as	inserting conditionals)	which incur other
       overhead.

       Scope: No scope.	Just generates a prefetch instruction.

       The following arguments are used	with this option:

       array-ref Required.  The	reference itself, for example, A(i, j).

       str	 Optional. Prefetch every str iterations of this loop.	The
		 default is 1.

       lev	 Optional.  The	level in memory	hierarchy to prefetch. The
		 default is 2.	If lev=1, prefetch from	L2 to L1 cache.	If
		 lev=2,	prefetch from memory to	L1 cache.

       rd/wr	 Optional.  The	default	is read/write.

       sz	 Optional.  The	size (in Kbytes) of the	array referenced in
		 this loop. This must be a constant.

   Dependence Analysis Directives
     * CDIR$ IVDEP: This applies only to inner loops. Liberalize dependence
       analysis.  Given	two memory references, where at	least one is loop
       variant,	ignore any loop-carried	dependences between the	two
       references. The following are examples of this directive.

	  do i = 1,n
	    b(k) = b(k)	+ a(i)
	  enddo

     IVDEP does	not break the dependence because b(k) is not loop-variant.

	  do i=1,n
	     a(i) = a(i-1) + 3
	  enddo

     IVDEP does	break the dependence but the compiler warns the	user that
     it	is breaking an obvious dependence.

	  do i=1,n
	     a(b(i)) = a(b(i)) + 3.
	  enddo

     IVDEP does	break the dependence.

	  do i = 1,n
	     a(i) = b(i)
	     c(i) = a(i) + 3.
	  enddo

     IVDEP does not break the dependence on a(i) because it is within an
     iteration.

     If	-OPT:cray_ivdep=ON, Cray semantics are used and	all lexically
     backwards dependences are broken. The following are examples:

	  do i=1,n
	     a(i) = a(i-1) + 3.
	  enddo

     IVDEP does	break the dependence but the compiler warns the	user that
     it's breaking an obvious dependence.

	  do i=1,n
	     a(i) = a(i+1) + 3.
	  enddo

     IVDEP does	not break the dependence because the dependence	is from	the
     load to the store,	and the	load comes lexically before the	store.

     If	-OPT:liberal_ivdep=ON, all dependences are broken.
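
     Why breaking a loop-carried dependence matters can be seen by running
     the iterations of two of the loops above in a different order.  The C
     sketch below uses reversed iteration order as a stand-in for any
     reordering the compiler might apply once a dependence is ignored; the
     function names are illustrative, and the indexed case assumes that b
     contains no repeated values.

```c
#include <assert.h>
#include <stddef.h>

/* a(i) = a(i-1) + 3, iterations in the original order. */
void recurrence_forward(double *a, size_t n)
{
    assert(a != NULL);
    for (size_t i = 1; i < n; i++)
        a[i] = a[i - 1] + 3.0;
}

/* Same statement, iterations in reverse order: each a[i] now reads
   a[i-1] before that element has been updated. */
void recurrence_reversed(double *a, size_t n)
{
    assert(a != NULL);
    for (size_t i = n; i-- > 1; )
        a[i] = a[i - 1] + 3.0;
}

/* a(b(i)) = a(b(i)) + 3, iterations in the original order. */
void indexed_forward(double *a, const size_t *b, size_t n)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++)
        a[b[i]] = a[b[i]] + 3.0;
}

/* Same statement, iterations in reverse order. */
void indexed_reversed(double *a, const size_t *b, size_t n)
{
    assert(a != NULL && b != NULL);
    for (size_t i = n; i-- > 0; )
        a[b[i]] = a[b[i]] + 3.0;
}
```

     Starting from four zeros, the forward recurrence yields 0, 3, 6, 9 and
     the reversed one yields 0, 3, 3, 3, which is why the compiler warns
     about breaking that dependence.  The two indexed loops agree as long
     as b holds distinct indexes, which is the case in which IVDEP is safe
     to assert.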
