prof, pixstats - Analyzes profile data
prof [options] [prog_name [PC-sampling_data_file]...]
prof -pixie [options] [prog_name [Addrs_file | Counts_file]...]
prof -pixstats [options] [prog_name [Addrs_file | Counts_file]...]
pixstats [options] [prog_name [Addrs_file | Counts_file]...]
Name of the program executable to be profiled. This program should be compiled with the -g1, -g2, or -g3 option to obtain more complete profiling information. If the default symbol table level (-g0) has been used, line number information, static procedure names, and file names are unavailable to the profiling code.

Name of a profiling data file (default mon.out) produced by executing a program that has been linked with the cc -p command.

Name of an instruction-counts file produced by executing a program that has been instrumented with pixie. If no Counts_file or Addrs_file is specified, prog_name.Counts is used if found in the current working directory.

Name of an instruction-address file produced when the executable or shared library object is instrumented with pixie. By default, the path of each object.Addrs file is recorded in the Counts_file, so these files do not need to be specified. The order of precedence for finding an Addrs_file is as follows:
  1. The Addrs_file path specified on the command line
  2. The current directory
  3. The directory of the object specified in a command-line argument
  4. The directory where pixie created it
For each prof option, you need to type only enough of the
name to distinguish it from the other options. If you do
not specify any options, prof uses -procedures by default.
Always specify -pixie or -pixstats when you process the Addrs and Counts files produced by pixie.
The prof command accepts the following options:

Causes the profiles for all shared libraries (if any) described in the data file(s) to be displayed, in addition to the profile for the executable.

Causes the profiler to print the assembly instructions for each subroutine along with the cycle counts for each instruction. The subroutines are sorted from highest cycle count to lowest. The instructions for each subroutine are printed in order; they are not sorted by cycle count.

When used without the -pixie option for a PC-sampling profile, the CPU time used by each instruction is presented in milliseconds. (For uprofile and kprofile, per-instruction sample counts are also provided for events other than time.)

Alters the appropriate parts of the listing to reflect the clock speed of the CPU. By default, the cycle time of the processor on which the program was run is used. (Use this option only with the -pixie option.)

Disassembles and shows the analyzed object code. (Use this option only with the -pixstats option.)

Limits the disassembly to blocks whose execution count exceeds f percent of the total instructions. (Use this option only with the -pixstats option.)
If you use one or more -exclude options, the profiler omits the specified procedure and its descendants from the listing. If any option uses an uppercase "E" (for "Exclude"), prof also omits that procedure from the base upon which it calculates percentages. To represent all of the variations of an overloaded C++ function name, you can specify just the part of the name up to but not including the "(".

Causes the profile for the named executable or shared library not to be printed. You can use this option multiple times in a single prof command.

Produces a file with information that the compiler system can use to decide which parts of the program will benefit most from global optimization and which parts will benefit most from in-line procedure substitution (requires basic-block counting). (Use this option only with the -pixie option.)
This option is for compilers whose -feedback option requires a feedback file (rather than an executable file) and that do not support the prof command's -update option. For compilers that support the -update option, better results can be achieved using that option instead of the (prof) -feedback option.

Reports the most heavily used lines in descending order of use.

Causes the profile for the named shared library to be printed, in addition to the profile for the executable. You can use this option multiple times in a single prof command.
For each procedure, reports how many times the procedure was invoked from each of its possible callers (requires basic-block counting). For this listing, the -exclude and -only options apply to callees, but not to callers. (Use this option only with the -pixie option.)

Changes the library directory search order for shared object libraries so that prof looks for them in dir before the library recorded in profile_file and the default library directories. You can specify multiple -Ldir switches to specify several directory names.

Changes the library directory search order for shared object libraries so that prof never looks for them in the default library directories. Use this option when the default library directories should not be searched and only the directories specified by -Ldir are to be searched.

Gives the lines in order of occurrence within procedures. The procedures are sorted in descending order of use.

Sums the sampling data files (or, in pixie mode, the Counts files) and writes the result into a new file with the specified name. The -only and -exclude options have no effect on the merged data.

Uses 1 for each basic block count. (Use this option only with the -pixstats or -pixie option.)

Prints each procedure's starting line number if source file information is available from the object file.
If you use one or more -only options, the profile listing includes only the named procedures, rather than the entire program. If any option uses an uppercase "O" (for "Only"), prof uses only the named procedures, rather than the entire program, as the base upon which it calculates percentages. To represent all of the variations of an overloaded C++ function name, you can specify just the part of the name up to but not including the "(".

Selects pixie mode, as opposed to sampling mode.

Selects generation of an alternative pixie-mode report for basic-block profiling data, as previously produced by the pixstats(1) command. All options of the previous version of pixstats(1) are recognized, for compatibility.

Reports time spent per procedure (using data obtained from sampling or basic-block counting; the listing tells which one). For basic-block counting, this option also reports the number of invocations per procedure, including the aggregated invocations of any alternate entry points.
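The effect of the uppercase "E" form of the exclusion option on the percentage base can be illustrated with a small C sketch. This is not prof's implementation: the proc_entry structure, the name_matches rule for the C++ "(" shorthand, and both function names are hypothetical, and the sketch does not model descendants, which the exclusion option also omits. The uppercase "O" form of the -only option is the mirror image, with a base that contains only the named procedures.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-procedure record; prof's internal layout is not documented. */
struct proc_entry {
    const char *name;
    double value;            /* time or cycles attributed to the procedure */
};

/* Assumed matching rule: "foo" matches "foo" itself and any "foo(...)" overload. */
static int name_matches(const char *pattern, const char *name)
{
    size_t n = strlen(pattern);
    return strncmp(pattern, name, n) == 0 &&
           (name[n] == '\0' || name[n] == '(');
}

/* Percentage shown for entries[i] when `excluded` is given with the lowercase
 * form (uppercase == 0) or the uppercase "E" form (uppercase != 0). Only the
 * uppercase form removes the matching procedures from the base. */
static double percent_of_base(const struct proc_entry *entries, size_t count,
                              size_t i, const char *excluded, int uppercase)
{
    double base = 0.0;
    for (size_t k = 0; k < count; k++) {
        if (uppercase && excluded != NULL &&
            name_matches(excluded, entries[k].name))
            continue;        /* uppercase form: leave it out of the base too */
        base += entries[k].value;
    }
    return base > 0.0 ? 100.0 * entries[i].value / base : 0.0;
}
```

With entries main (50), foo(int) (30), and foo(double) (20), excluding "foo" leaves main at 50 percent with the lowercase form but raises it to 100 percent with the uppercase form.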
Truncates listings after n lines (if n is an integer), after the first entry that represents less than n percent of the total (if n is followed immediately by a "%" character), or after enough entries have been printed to account for n percent of the total (if n is followed immediately by "cum%"). For example, "-quit 15" truncates each part of the listing after 15 lines of text, "-quit 15%" truncates each part after the first line that represents less than 15 percent of the whole, and "-quit 15cum%" truncates each part after the line that brought the cumulative percentage above 15 percent.

Reports all lines that never executed. (Use this option only with the -pixie option.)

For -procedures and -invocations listings, prints cumulative statistics for the entire object file instead of for each procedure in the object.

Generates more analysis of a program to provide a more accurate reading of cycles, instead of the default, which assumes that each instruction executes in one cycle. The higher the number chosen from the arguments, the more accurate the reading, although the profiler runs more slowly, and memory-access delays are still not reflected. This option has little or no effect on EV6 (21264) and later Alpha systems. (Use this option only with the -pixie option.)
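The three forms of the -quit argument can be modeled as follows. This is a hypothetical sketch: it counts listing entries rather than lines of text, assumes the entries are already in printed (descending) order, and the enum and function names are not part of prof.

```c
#include <stddef.h>

/* The three forms of the -quit argument (hypothetical names):
 *   "-quit 15"     -> QUIT_LINES
 *   "-quit 15%"    -> QUIT_PERCENT
 *   "-quit 15cum%" -> QUIT_CUM_PERCENT */
enum quit_mode { QUIT_LINES, QUIT_PERCENT, QUIT_CUM_PERCENT };

/* Returns how many entries of one listing part are printed.
 * values[] is in printed order; total is the 100 percent base. */
static size_t quit_cutoff(const double *values, size_t count, double total,
                          enum quit_mode mode, double n)
{
    if (mode == QUIT_LINES) {
        size_t lines = (size_t)n;
        return lines < count ? lines : count;
    }
    double cum = 0.0;
    for (size_t i = 0; i < count; i++) {
        double pct = total > 0.0 ? 100.0 * values[i] / total : 0.0;
        if (mode == QUIT_PERCENT && pct < n)
            return i + 1;    /* keep the first entry below n%, then stop */
        cum += pct;
        if (mode == QUIT_CUM_PERCENT && cum > n)
            return i + 1;    /* keep the entry that pushed the sum above n% */
    }
    return count;
}
```

With entries of 50, 30, 15, and 5 percent, "-quit 20%" keeps three entries (15 percent is the first one below 20 percent), and "-quit 60cum%" keeps two (the second entry raises the cumulative total to 80 percent).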
Updates the program executable (prog_name) with profiling information in the specified .Counts files, for use in future cc -feedback prog_name commands. This option requires that prog_name have been compiled with the -feedback prog_name option; otherwise, the update fails. This option does not generate a display unless another option that forces display output is also specified. (Use this option only with the -pixie option.)

Prints the tool's version number.

Prints a list of procedures that were never invoked (requires basic-block counting). (Use this option only with the -pixie option.)
The prof command analyzes one or more data files generated by the compiler's execution-profiling system and produces a listing. The prof command can also combine those data files or produce a feedback file that lets the optimizer take into account the program's run-time behavior during a subsequent compilation. Profiling is a three-step process:
  1. Compile the program.
  2. Execute the program.
  3. Run prof to analyze the data.
The compiler system provides two kinds of profiling:
  PC sampling interrupts the program periodically, recording the value of the program counter.
  Basic-block counting divides the program into blocks delimited by labels, jump instructions, and branch instructions, and counts the number of times each block executes.
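The block boundaries used for basic-block counting follow the usual leader rule: a block starts at the first instruction, at any labeled instruction (a possible jump or branch target), and at the instruction following a jump or branch. The following C sketch applies that rule to a simplified instruction model. It is an illustration only; pixie itself works on Alpha object code, and the insn structure and function name are hypothetical.

```c
#include <stddef.h>

/* Minimal instruction model for this sketch (hypothetical). */
struct insn {
    int is_label_target;     /* some jump or branch can land here */
    int is_jump_or_branch;   /* control may leave the block after this */
};

/* Marks block leaders in leader[] (n entries) and returns the block count. */
static size_t count_basic_blocks(const struct insn *code, size_t n,
                                 unsigned char *leader)
{
    size_t blocks = 0;
    for (size_t i = 0; i < n; i++) {
        /* i == 0 short-circuits, so code[i - 1] is never read for i == 0 */
        leader[i] = (i == 0) || code[i].is_label_target ||
                    code[i - 1].is_jump_or_branch;
        blocks += leader[i];
    }
    return blocks;
}
```

Counting then amounts to incrementing one counter each time control enters a leader.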
The uprofile and kprofile tools provide a third kind of profiling, performance counter sampling, which uses the on-chip performance counters of the Alpha architecture.
The following sections describe how to perform the various
kinds of profiling.
PC-Sampling Profiles
To use PC-sampling, compile your program with the -p option (strictly speaking, it is sufficient to use this option only when linking the program). Then run the program, which contains the profiling startup routine that calls monstartup to allocate extra memory to hold the profiling data. If the program terminates normally or calls exit(2), it records the data in a file at the end of execution.
If your program uses shared libraries, note that only its
call-shared portion is profiled in detail. Only the total
time spent in each shared library is recorded. To individually
profile all library routines a program uses, build
the program with the -non_shared switch (by default, the
compiler produces a call-shared object unless -non_shared
is explicitly specified), or set the PROFFLAGS environment
variable as described in the Environment Variables section.
After running your program, use prof to analyze the PC-sampling data file. For example:

    cc -c myprog.c
    cc -p -o myprog myprog.o
    myprog                      (generates mon.out)
    prof myprog mon.out
When you use prof for PC-sampling, the program name
defaults to a.out. The PC-sampling data file name defaults
to mon.out; if you specify more than one PC-sampling data
file, prof reports the sum of the data.
PC-Sampling Environment Variables
You can use environment variables to change the default PC-sampling and profile data collection behavior. The variables are PROFDIR and PROFFLAGS. The general form for setting these variables is:

    For C shell:       setenv varname "value"
    For Bourne shell:  varname="value"; export varname
    For Korn shell:    export varname=value
In the preceding example, varname can be one of the following:

This environment variable causes PC-sampling data files to be generated with unique file names in a specified directory.

You specify a directory path as the value, and the profiling data is placed in the file path/pid.progname, where path is the pathname, pid is the process ID of the executing program, and progname is the program name.

This environment variable can take any of the following values:

Causes a separate data file to be generated for each thread. The name of the data file takes the following form:

    pid.sid.progname

In this name, pid is the process ID of the program, sid is the sequence number of the thread, and progname is the name of the program being profiled.

Causes the program to fully profile all the permanently loaded shared libraries, in addition to the nonshared or call-shared executable.

Causes the program to profile only the named executable or shared library.

Causes the program not to profile the named executable or shared library.

Causes prof to change the ratio of text segment stride size to PC-sample counter buffer size, that is, the number of instructions that are counted together in a single counter word. The appropriate ratio involves a tradeoff of size versus precision. Strides of 1, 2, 4, and 8 are supported. A special stride of 0 causes a single PC-sample count to be recorded for each text segment.
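To show how the stride trades buffer size for precision, the following sketch maps a sampled PC to a counter word. It relies on Alpha instructions being 4 bytes long and assumes that counter words are laid out contiguously from the start of the text segment; the actual buffer layout used by monstartup is not described here, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define ALPHA_INSN_BYTES 4u  /* every Alpha instruction is 4 bytes long */

/* Counter words needed for a text segment of text_size bytes (assumed layout). */
static size_t counter_words(size_t text_size, unsigned stride)
{
    if (stride == 0)
        return 1;            /* one count for the whole text segment */
    size_t bytes_per_counter = (size_t)ALPHA_INSN_BYTES * stride;
    return (text_size + bytes_per_counter - 1) / bytes_per_counter;
}

/* Counter word that a sampled PC is charged to (assumed layout). */
static size_t counter_index(uint64_t pc, uint64_t text_start, unsigned stride)
{
    if (stride == 0)
        return 0;
    return (size_t)((pc - text_start) /
                    ((uint64_t)ALPHA_INSN_BYTES * stride));
}
```

Under these assumptions, a 4096-byte text segment needs 1024 counter words with a stride of 1, 512 with a stride of 2, and a single word with a stride of 0.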
The default stride is 2 for the executable, and 0 for each of its shared libraries. If -all or -incobj is specified, all selected objects are profiled with the same stride.

Automatically establishes monitor_signal(3) as the signal handler for the named signal, and causes monitor_signal(3) to zero the profile after it is written to a file. This allows a signal to be sent several times without the successive profiles overlapping, provided that the file is renamed between signals. The asynchronous nature of a signal may cause small variations in the profile. Unrecognized signal names are ignored. The -threads option is ignored if combined with -sigdump.

Specifies the directory path in which the profiling data file or files are created.

Disables or enables the addition of the process-id number to the name of the profiling data file or files.
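The file names produced under PROFDIR and with the per-thread option can be sketched with snprintf. The function name is hypothetical, and the sketch assumes that the per-thread form pid.sid.progname is also placed under the PROFDIR directory.

```c
#include <stdio.h>

/* Builds path/pid.progname, or path/pid.sid.progname when sid >= 0
 * (one file per thread). Returns the snprintf result; a value >= len
 * means the name was truncated. */
static int profdir_filename(char *buf, size_t len, const char *path,
                            long pid, long sid, const char *progname)
{
    if (sid < 0)
        return snprintf(buf, len, "%s/%ld.%s", path, pid, progname);
    return snprintf(buf, len, "%s/%ld.%ld.%s", path, pid, sid, progname);
}
```

For example, with PROFDIR set to /tmp/prof, process 1234 of myprog would write /tmp/prof/1234.myprog; in a real program the pid would come from getpid(2).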
You can use the PROFDIR and PROFFLAGS environment variables
together. For more information, see the Programmer's
Guide.
Basic-Block Counting
To use basic-block counting, compile your program without the -p option. Use the pixie program to translate your program into a profiling version and generate a file (prog_name.Addrs) containing block addresses. Then run the pixie version of the program, which (assuming the program terminates normally or calls exit(2)) generates a file (prog_name.Counts) containing block counts.
After running the pixie version of your program, use prof with the -pixie option to analyze the .Addrs and .Counts files. Notice that you must specify the name of your original program, not the name of the pixie version. For example:

    cc -c myprog.c
    cc -o myprog myprog.o
    pixie myprog                (generates myprog.Addrs and myprog.pixie)
    myprog.pixie                (generates myprog.Counts)
    prof -pixie myprog myprog.Addrs myprog.Counts
When you use prof with the -pixie option, the Addrs_file name defaults to prog_name.Addrs, and the Counts_file name defaults to prog_name.Counts. Note that, when the Counts_file name defaults to prog_name.Counts, prof does not attach any path prefix to prog_name, and it looks for the file in the current working directory. If you specify more than one Counts_file, prof reports the sum of the data.
For each shared library selected for profiling, the prof command searches for an Addrs_file in the following locations if the file location is not explicitly specified on the command line:
  1. The current directory
  2. The directory in which the object file is located, if the location of the object file is explicitly specified on the command line
  3. The directory in which pixie created it, as recorded in the Counts_file

For each selected shared library, the prof command searches for an object file in the following locations:
  1. Directories specified in -Ldir options
  2. The directory in which pixie found it, as recorded in the pixie data files, if the -L option is specified
  3. Standard library search directories, as searched by ld, if the -L option is not specified
Basic-Block Statistics
Use the -pixstats option to get an alternative profile.
All options of the previous version of the pixstats(1)
command are recognized, for compatibility.
If a disassembly is requested, all basic blocks (or those
whose execution count exceeds the -dislimit percentage of
total instructions) are disassembled, in increasing
address order. Each block is labeled with its procedure
name and any offset from the start of the procedure. For
each instruction, the relative estimated CPU cycle at
which the instruction executes is printed, plus its source
line, address, binary code, and assembly language. The
total CPU cycles used by one execution of the block, the
number of times it was executed, and its percentage of all
instructions executed are printed at the end of the block,
following any line reporting a non-zero delay caused to a
follow-on block.
The main report begins with a record of the command line. This is followed by a summary of the program's behavior:
  Total CPU cycles used by the profiled objects, plus the equivalent number of seconds
  Total number of instructions executed
  Total delay caused by instructions executed in the preceding basic block
  Total integer and floating-point no-op, arithmetic and logical, logical, shift, load, store, load and store, load followed by load, load and store and fetch (data bus use), load and store relative to the stack or global pointers, floating-point, floating-point compare, and conditional branch instructions executed (itemized); also, the total number of branch instructions executed whose target instruction is another branch, and the total number of such branches that are estimated to be taken, rather than executing the next instruction in line
  Total basic blocks, procedure calls, and branches that skip a single instruction that were executed
Next, some ratios are printed:
  Stores : stores + loads
  Instructions : basic block
  Instructions : branches
  Backward branches : branches
  CPU cycles : procedure calls
  Instructions : procedure calls
  Integer no-ops : integer and floating-point no-ops
  Floating-point no-ops : integer and floating-point no-ops
  Floating-point pipeline interlocks : floating-point operators
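These ratios are plain quotients of the summary totals. The following sketch computes several of them, plus the seconds equivalent of the cycle total for a given clock speed. The structure and field names, the use of MHz as the clock unit, and reporting 0 when a denominator is 0 are assumptions of this sketch, not documented pixstats behavior.

```c
#include <stdint.h>

/* Totals taken from the summary section (hypothetical field names). */
struct pix_totals {
    uint64_t cycles, instructions, basic_blocks, branches;
    uint64_t backward_branches, calls, loads, stores;
};

/* Some of the ratios listed above. */
struct pix_ratios {
    double stores_per_mem;      /* stores : stores + loads        */
    double insns_per_block;     /* instructions : basic block     */
    double insns_per_branch;    /* instructions : branches        */
    double backward_per_branch; /* backward branches : branches   */
    double cycles_per_call;     /* CPU cycles : procedure calls   */
    double insns_per_call;      /* instructions : procedure calls */
};

/* a / b, or 0 when b is 0 (for example, a program with no calls). */
static double ratio(uint64_t a, uint64_t b)
{
    return b ? (double)a / (double)b : 0.0;
}

static struct pix_ratios compute_ratios(const struct pix_totals *t)
{
    struct pix_ratios r;
    r.stores_per_mem      = ratio(t->stores, t->stores + t->loads);
    r.insns_per_block     = ratio(t->instructions, t->basic_blocks);
    r.insns_per_branch    = ratio(t->instructions, t->branches);
    r.backward_per_branch = ratio(t->backward_branches, t->branches);
    r.cycles_per_call     = ratio(t->cycles, t->calls);
    r.insns_per_call      = ratio(t->instructions, t->calls);
    return r;
}

/* Seconds equivalent of a cycle count at clock_mhz (assumed unit). */
static double cycles_to_seconds(uint64_t cycles, double clock_mhz)
{
    return (double)cycles / (clock_mhz * 1.0e6);
}
```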
Next, basic blocks are analyzed according to how many
instructions they contain. For each size, pixstats reports
the execution count, its percentage and cumulative percentage
relative to both instructions and basic blocks,
the number of instructions contained in blocks of that
size, the percentage and cumulative percentage of this
relative to all instructions, and the CPU-cycle cost per
instruction of blocks of that size. Then, pixstats prints
various averages and quartiles of basic block size, plus
the largest basic block execution count encountered (to
indicate the chance of integer overflow in the analysis).
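The block-size table can be modeled as a histogram keyed by block size. In this sketch, each row accumulates the executions of blocks of that size and the instructions those executions contributed, and the largest single block count is returned as the overflow indicator mentioned above. The names are hypothetical, and lumping sizes above MAX_BLOCK_SIZE into the last row is an arbitrary choice of the sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_BLOCK_SIZE 64    /* hypothetical cap for this sketch */

struct block { unsigned size; uint64_t count; };  /* instructions, executions */

struct size_row {
    uint64_t executions;     /* times blocks of this size ran  */
    uint64_t instructions;   /* size * executions, summed      */
};

/* Fills rows[0..MAX_BLOCK_SIZE]; returns the largest block execution count. */
static uint64_t block_size_histogram(const struct block *b, size_t n,
                                     struct size_row *rows)
{
    uint64_t largest = 0;
    for (size_t s = 0; s <= MAX_BLOCK_SIZE; s++)
        rows[s].executions = rows[s].instructions = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned s = b[i].size > MAX_BLOCK_SIZE ? MAX_BLOCK_SIZE : b[i].size;
        rows[s].executions   += b[i].count;
        rows[s].instructions += (uint64_t)b[i].size * b[i].count;
        if (b[i].count > largest)
            largest = b[i].count;
    }
    return largest;
}

/* Cumulative percentage of executed instructions in blocks of size <= s. */
static double cum_instr_percent(const struct size_row *rows, unsigned s)
{
    uint64_t upto = 0, total = 0;
    for (unsigned k = 0; k <= MAX_BLOCK_SIZE; k++) {
        total += rows[k].instructions;
        if (k <= s)
            upto += rows[k].instructions;
    }
    return total ? 100.0 * (double)upto / (double)total : 0.0;
}
```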
Next, pixstats analyzes the number of registers (integer
and floating-point) that are saved on procedure entry (and
restored on exit). It prints the number of procedure
entries that save a given number of registers, and the
percentage and cumulative percentage of this relative to
all procedure entries, all registers saved, and all
instructions executed. Finally, it prints some averages
and ratios.
The next two tables contain information on the sizes of
executed procedures' stack frames and the frequency of
execution of each kind of instruction. Frame sizes are
reported in "bits"; for example, 6 bits means a 32- to
48-byte stack frame. The number, percentage, and cumulative
percentage of executed calls to procedures with the
given frame size is printed. Similarly, the execution
count is printed for each machine instruction code, but
this table is ordered by decreasing usage.
The next four tables are similar. They provide information about the size of literals used by various categories of Alpha instructions:
  ADD, SUB, CMP instructions
  AND, BIC, BIS, XOR, CMOV instructions
  MUL instructions
  SHIFT, EXT, INS, MSK, ZAP instructions
(Note that a table may be omitted if there is no use of literals in the program for the particular instruction category.) For each of these tables, the size of the literal is reported in bits (for example, 4 bits means the literal is greater than or equal to 8 and less than 16).
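The bit sizes in the frame-size and literal tables agree with the number of significant bits in the value: 8 through 15 need 4 bits, and 32- and 48-byte frames both need 6 bits. A minimal sketch, assuming that a value of 0 is reported as 0 bits (the text does not say); the function name is hypothetical.

```c
#include <stdint.h>

/* Number of significant bits in v; 0 for v == 0 (assumption). */
static unsigned bit_size(uint64_t v)
{
    unsigned bits = 0;
    while (v != 0) {
        bits++;
        v >>= 1;
    }
    return bits;
}
```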
The next six tables are similar. They contain information on the size of the memory displacement from a base register:
  LDA displacement from 0 (used like a load immediate instruction)
  LDAH displacement from 0 (used like a load immediate high)
  Branch
  SP-based load/store (load or store within a stack frame)
  GP-based load/store (load or store within a global offset table)
  All load or store instructions
Again, the "size" of the displacement is reported in bits; for example, 6 bits means a 32- to 63-byte displacement. For both positive displacements (in the "0-extend" column) and negative displacements (in the "1-extend" column), the execution count is printed along with the percentage and cumulative percentage. The summed cumulative percentage is printed last (in the "Total" column).
In the "static" analysis of instructions, each instruction
is counted once per executed basic block. The "static"
distribution will be the same as the regular opcode distribution
when -nocounts is specified. Following "static"
totals for instructions and basic blocks, the number and
percentage of each instruction code is listed.
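The difference between the regular and "static" instruction totals can be sketched as follows. The bb structure and function names are hypothetical, and "executed" is taken to mean a nonzero block count.

```c
#include <stddef.h>
#include <stdint.h>

struct bb { unsigned size; uint64_t count; };  /* instructions, executions */

/* Regular total: every execution counts every instruction in the block. */
static uint64_t dynamic_instructions(const struct bb *b, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += (uint64_t)b[i].size * b[i].count;
    return total;
}

/* "Static" total: each instruction counted once per executed block. */
static uint64_t static_instructions(const struct bb *b, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        if (b[i].count > 0)
            total += b[i].size;
    return total;
}
```

For blocks of 3, 5, and 2 instructions executed 100, 0, and 7 times, the regular total is 314 instructions and the static total is 5.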
The next two tables contain information on how many times
each integer and floating-point register was accessed,
plus its percentage, ordered by register number. For
integer registers, the number and percent of uses as a
base register in memory operations is also listed.
Finally, pixstats prints a flat profile of CPU cycles used
by procedures. This includes the CPU cycles used by the
procedure, the percentage of the total, the cumulative
percentage, the number of instructions executed as part of
the procedure, its average number of CPU cycles per
instruction, the number of calls made to the procedure,
the average number of CPU cycles per call, and the procedure
name. If -numbers is specified, the object and source
file names and line number are also printed.
Performance Counter Samples
After running the uprofile or kprofile utility to collect profiling data for your program or the kernel, respectively, run prof to examine the resulting mon.out or kmon.out file, as follows:

    For uprofile output:  prof prog_name mon.out
    For kprofile output:  prof /vmunix kmon.out
Use prof as for PC sampling, except that only the executable
has a profile. Old performance counter sample
data files, generated on versions of the operating system
prior to DIGITAL UNIX Version 4.0, must be analyzed as if
they contained PC-sampling data.
The -pixstats option models execution assuming a perfect memory system. Memory system events such as cache misses increase execution time beyond the -pixstats predictions.
The set of statistics reported by the -pixstats option and the format of the report are the same as for previous versions of the pixstats(1) command, but note the following:
  The labels on disassembled basic blocks take the form procedure-name (or proc_at_0x... if no symbol is available) for an initial block and procedure-name+offset for subsequent blocks.
  All reported cycles reflect CPU pipeline interlocks, so they usually do not match the reported instruction counts.
  If not all the shared objects used by a program are profiled, the procedure-call counts may be smaller than the jsr/bsr instruction counts.
  Normal startup code
  Startup code for PC-sampling
  Library for PC-sampling
  Default kprofile data file
  Default PC-sampling data file
  Default uprofile data file
Introduction: prof_intro(1)
Commands: as(1), cc(1), gprof(1), pixie(1), uprofile(1),
kprofile(1), dxprof(1). (dxprof is available as an
option.)
Functions: monitor(3), profil(2)
Programmer's Guide
prof(1)