DDOPT(1) DDOPT(1)
ddopt - MIPS Data-Dependency-based Optimizer
ddopt unopt_file opt_file [ -v -mips3 -hostcache -cachesz size ]
ddopt, the MIPS data-dependency-based optimizer, reads the input binary
ucode file on a procedure-by-procedure basis, performs loop-based
transformations on each outermost loop nest in each procedure, and
outputs the optimized binary ucode file. By convention, it takes a
binary ucode file with the extension .B or .M as input and outputs a
binary ucode file with the extension .D. In the compilation process,
ddopt runs after the front-end, after uld and usplit, and before umerge,
uopt and ugen. Currently, ddopt only takes ucode files generated from
FORTRAN.
ddopt borrows optimization techniques that originated from compilers for
supercomputers and adapts them to apply to scalar machines. It performs
high-level analysis on the behavior of array accesses in loops, deriving
what we call data dependency information. Numerous optimization
transformations on the program code are performed based on such
information (and thus the name ddopt). The transformations are
invariably associated with program loops that operate on arrays.
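The kinds of dependence involved can be illustrated with a short sketch.
It is written in C rather than FORTRAN or ucode, the function name
dependence_kinds is hypothetical, and the comments use common textbook
terminology, which is assumed (not confirmed by this page) to match the
classification ddopt uses internally.

```c
#include <stddef.h>

/* Hypothetical example loop. The comments classify the dependences
   between array references in textbook terms. */
void dependence_kinds(double *a, const double *b, double *c, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        /* True (flow) dependence carried by the loop: the a[i] written
           here is read as a[i - 1] by the next iteration. */
        a[i] = a[i - 1] + b[i];

        /* Input dependence: b[i] was already read by the statement
           above in the same iteration (read after read). */
        c[i] = b[i] * 2.0;

        /* Output dependence: c[i] is written a second time.
           Loop-independent true dependences: c[i] and a[i] are read
           in the same iteration in which they were written. */
        c[i] = c[i] + a[i];
    }
}
```

Each commented reference pair is a candidate for keeping a value in a
register instead of going back to memory.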
There are different kinds of transformations performed by ddopt that
benefit program performance:
1. Those that reduce memory references. Techniques include re-using
array references that have been allocated to registers (register
allocation for array references) and moving array references and
assignments outside loops.
2. Those that improve locality of memory references (thus reducing data
cache misses). Techniques include changing the order of loop nests (loop
interchange) and partitioning loop iterations to operate on smaller
sections of the array (strip-mining).
3. Those that reduce floating-point interlocks and promote greater
parallelism among floating-point operations by promoting larger pieces of
straight-line code in loops. Techniques include unrolling and
unroll-and-jam (unrolling the outer loop and jamming the resulting copies
of the inner loop into one bigger loop).
There are other optimizations that ddopt does just to bring in more
opportunities for doing the above transformations: local common
subexpression elimination, secondary index variable elimination, constant
propagation, copy propagation, constant folding, jump folding and dead
code elimination. Some of these optimizations duplicate those
performed in uopt. They are applied
iteratively until the code no longer changes, and they precede
the data-dependency-based analyses and transformations.
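As an illustration of why such clean-up helps, consider a loop that
steps a secondary index variable alongside the loop index. The sketch
below is in C with hypothetical function names. It assumes, as is
typical for dependence analysis, that array subscripts are easiest to
analyze when written as explicit functions of the loop index.

```c
#include <stddef.h>

/* Before: k is a secondary index variable advanced by a constant
   stride, so the subscript of src is not an explicit function of i. */
void gather_before(double *dst, const double *src, size_t n)
{
    const size_t stride = 2;
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[k];
        k += stride;
    }
}

/* After constant propagation of stride and elimination of k, the
   subscript 2 * i is a linear function of the loop index. */
void gather_after(double *dst, const double *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[2 * i];
}
```

Both functions read the same elements of src in the same order.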
The following options are interpreted by ddopt. Options starting with -X
are not recognized by the compiler driver, and have to be passed to ddopt
via -Wd,... .
-v Turns on verbose mode. In this mode, ddopt will print the name
of the procedure it is currently optimizing.
-mips3 Tells ddopt that the target machine uses the MIPS3 instruction
set.
-hostcache
Tells ddopt to assume that the target machine has the same data
cache size as the host machine, so that it can determine the cache
size via a system call.
-cachesz size
Gives ddopt the data cache size of the target machine, in bytes.
The default is 8192 bytes.
-Xbldgr Dumps the data dependency information computed, for debugging
purposes.
-Xbboptoff
Turns off the conventional global optimizations that precede the
data-dependency-related transformations.
-Xbf size
Changes the blocking factor used by ddopt in strip-mining. The
default is 36 bytes.
-Xdump Tells ddopt to dump the original and transformed program in a
compact, close-to-source-level format.
-Xdosizethreshold count
If the number of statements in a DO loop exceeds this number,
that DO loop is excluded from transformation by ddopt. The
default is 150.
-Xgcopyoff
Turns off global copy propagation.
-Xinteroff
Turns off loop interchange.
-Xindepregoff
Turns off loop-independent dependence register allocation.
-Xinputregoff
Turns off input dependence register allocation.
-Xinvarregoff
Turns off loop-invariant register allocation.
-Xlcopyoff
Turns off local copy propagation.
-Xmergepiblockoff
Disallows the merging of pi-blocks created for statements in the
same basic block.
-Xmoreunrolljam
By default, unroll-and-jam is performed only on inner loop nests
that come out of strip-mining. This flag removes this restriction
and tells ddopt to do unroll-and-jam whenever it thinks it is
advantageous.
-Xmax_int_regs
Tells ddopt the number of integer registers available in the
underlying machine. The default is 32.
-Xmax_float_regs
Tells ddopt the number of floating-point registers available in
the underlying machine. The default is 16.
-Xofffoo
Turns off all transformations for the given procedure name ("foo"
in this case).
-Xoutputregoff
Turns off output dependence register allocation.
-Xoverallocate
Tells ddopt to perform register allocation without regard to the
number of registers available in the underlying machine.
-Xstripoff
Turns off strip-mining.
-Xstriponly
Tells ddopt to perform strip-mining but prevent the newly formed
loops from being interchanged into a deeper region of the loop
nest. For debugging purposes only.
-Xstat Prints optimization statistics, giving the line numbers and the
number of times the various transformations were applied.
-Xtrueregoff
Turns off true dependence register allocation.
-Xunrolloff
Turns off loop unrolling.
-Xunrolljamoff
Turns off unroll-and-jam.
-Xunrollthreshold count
Sets the threshold that limits the extent to which unrolling can
be performed without causing the number of statements in the loop
to exceed this number. The default is 180.
-Xunrolltimes count
Sets the maximum number of times to unroll a loop. The default
is 4.
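Strip-mining and unrolling, which several of the options above tune, can
be sketched at source level as follows. The sketch is in C with
hypothetical names. The unroll count of 4 mirrors the documented default
of -Xunrolltimes, while STRIP is an arbitrary strip length in elements
chosen for illustration and is not related to the -Xbf default.

```c
#include <stddef.h>

#define STRIP 4 /* hypothetical strip length, in elements */

/* Strip-mining: the single loop over i becomes an outer loop over
   strips and an inner loop over the elements of one strip. */
void add_strip_mined(double *a, const double *b, size_t n)
{
    for (size_t is = 0; is < n; is += STRIP) {
        size_t end = (is + STRIP < n) ? is + STRIP : n;
        for (size_t i = is; i < end; i++)
            a[i] += b[i];
    }
}

/* Unrolling by 4: four copies of the body per iteration, followed by
   a remainder loop for the last n % 4 elements. */
void daxpy_unrolled(double *y, const double *x, double alpha, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        y[i]     += alpha * x[i];
        y[i + 1] += alpha * x[i + 1];
        y[i + 2] += alpha * x[i + 2];
        y[i + 3] += alpha * x[i + 3];
    }
    for (; i < n; i++)
        y[i] += alpha * x[i];
}
```

Unrolling multiplies the number of statements in the loop body, which is
the quantity that -Xunrollthreshold is described as limiting.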
ucode(1), uopt(1), btou(1), ppu(1)
ddopt assumes the input ucode file is error-free.