hiprof - CPU-time and page-fault call-graph profiler for
performance analysis
hiprof [-flat] [-pthread | -threads] [hiprof-option...]
[gprof-option...] program [argument...]
hiprof { -cycles | -faults } [hiprof-option...] [gprof-option...]
program [argument...]
See the start of the OPTIONS section for details of hiprof
options that may be essential for the correct execution of
the program.
The atom -tool hiprof interface is still available, for
compatibility with earlier releases. However, it is now
undocumented, and it will be retired in a future release.
See prof_intro(1) for an introduction to the application
performance tuning tools provided with Tru64 UNIX.
The hiprof command creates an instrumented version of a
program (program.hiprof) that produces call-graph and flat
profiles of one of a range of performance statistics:

    The CPU time spent in each procedure (or optionally,
    each source line or instruction), measured by sampling
    the program counter about every millisecond (the
    default)

    The CPU time spent in each procedure and procedure
    call, measured as machine cycles, including the effects
    of any memory-access delays (with the -cycles option)

    The number of page faults that occur during each
    procedure and procedure call (with the -faults option)

See the limitations of each performance statistic in the
RESTRICTIONS section below.
If you specify program arguments (argument...) or -run,
the instrumented program is also executed.
If you specify -display or any of the gprof-options, the
hiprof command runs the instrumented program and then displays
the profile by running the gprof tool (with any
specified gprof-options).
If you omit the program name, a usage message is printed.
The following example shows how to instrument, run, and
display the profile for a multithreaded program:

    cc *.c -pthread -L. -g1 -O2 -o program -lapp1 -lapp2
    hiprof -pthread -L. -all program data/*

The -all option requests that all shared libraries be
profiled, but threads-related system libraries cannot be
safely instrumented to count the procedure calls that are
needed to print a call graph. By default, these libraries
are still sampled to provide flat CPU-time profiles.

The -cycles and -faults options cannot be used with
threaded programs. In profiles made with those options,
the time or page-fault count displayed for a procedure
includes the time or count for any procedures that it
calls but that were not selected for instrumentation (for
example, any procedures in libraries not selected by the
-all or -incobj options). This means that time is not lost
from these profiles by excluding shared libraries.
program
    File name of a fully linked call-shared or nonshared
    executable to be profiled. This program should be
    compiled with the -g or -gn option (n>=1) to obtain
    more complete profiling information. If the default
    symbol table level (-g0) is used, line number
    information, static procedure names, and file names are
    unavailable. Inlined procedure calls are also
    unavailable. Programs that are stripped or are
    optimized by spike or cc -om are not supported.

argument...
    All arguments following the program name are considered
    to be arguments needed by the instrumented program to
    execute the procedures, lines, and instructions of
    interest. Multiple arguments can be specified. They
    imply -run if any are specified, and they can be
    replaced by -run if none are needed.
OPTIONS
Options can be abbreviated to three characters. The
gprof-options, which are provided as alternatives to the
-display option, can be abbreviated to one character.
For options that specify a procedure name (proc), C++ procedures
can omit the argument type list, though this will
match all overloaded procedures with that name. To select
a specific procedure, specify the full symbol name (as
printed by the nm command). Symbol names containing
spaces, asterisks, and so on must be quoted.
Essential Options
Some or all of these options may be needed to prevent the
instrumented program from malfunctioning:

Specify -pthread if the program or any of its libraries
calls pthread_create(3) (for example, if it was compiled
with either the -pthread option or the -threads
compatibility option). This will make the collection of
profile data thread-safe.

The -fork option is maintained for compatibility with
earlier releases. By default, hiprof now profiles
subprocesses that do not call exec(2), and produces
separate profiling data files for the forked subprocesses,
including the process ID in their file names as if -pids
was specified.

By default, the hiprof code running in the program's
process allocates memory for its own use at address
38000000000. If the program needs to use memory between
38000000000 and 3ff00000000, specify the address that the
hiprof code should use.

Specify -sigdump to force the instrumented program to
write the current profile data to its file(s) on receipt
of the named signal. By default, the program writes the
profiling data file(s) only when the process terminates,
but some processes never terminate normally, so this
option lets you generate the file(s) on demand. After a
file is written, the instruction counts of the profile are
all set to zero, so by sending two signals, any interval
of a test run can be profiled, with the second signal's
file(s) overwriting the first. For example, to use the
default kill pid command to signal the program, specify
-sigdump TERM. Choose a signal that the program does not
use for another purpose.
Profiling Statistics Options
-flat
    Generates a flat profile; that is, it avoids the
    intrusiveness of collecting the default call-graph
    information. If the -display option is specified, it
    defaults to gprof -procedures. Do not use the -flat
    option with the -cycles or -faults options.

-cycles
    Profiles CPU time by counting the machine cycles used
    in each procedure call. Use this option only for
    nonthreaded programs.

-faults
    Profiles page faults that occur during each procedure
    instead of the default time spent in each procedure.
    Use this option only for nonthreaded programs.
File Generating Options
Does not print informational and progress messages on the
standard error stream.

Prints the command lines used to instrument the program
and to execute the instrumented program. Prints the names
of any procedures that were not instrumented.

Names the instrumented program file instead of the default
program.hiprof.

Specifies the directory to which the instrumented program
writes the profiling data file(s) for each test run. The
default is the current directory.

Adds the process ID of the instrumented program's test run
to the name of the profiling data file produced (that is,
program.pid.hiout). By default, the file is named
program.hiout.

When profiling a threaded program, specify -threads to
produce a separate profile for each pthread in the
program. The files are named
program[.pid].sequence.hiout, where sequence is the thread
sequence number assigned by pthread_create(3). The
-threads option implies the -pthread option. If -sigdump
is needed, -pthread is recommended instead of -threads, to
avoid possible synchronization problems.
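
The file-naming rules described in this section can be summarized in a
short shell sketch. The function name hiout_name is hypothetical and is
not part of hiprof; it only assembles the names given above
(program.hiout, program.pid.hiout, and program[.pid].sequence.hiout).
The process ID 1234 and sequence number 3 are made-up values.

```shell
# hiout_name PROGRAM [PID] [SEQUENCE]
# Hypothetical helper (not part of hiprof): prints the profiling data
# file name that the text describes for the given combination.
hiout_name() {
    name=$1
    # With -pids, the process ID follows the program name.
    if [ -n "${2:-}" ]; then
        name="$name.$2"
    fi
    # With -threads, the thread sequence number comes next.
    if [ -n "${3:-}" ]; then
        name="$name.$3"
    fi
    printf '%s.hiout\n' "$name"
}

hiout_name program            # default name
hiout_name program 1234       # with -pids
hiout_name program "" 3       # with -threads
hiout_name program 1234 3     # with -threads and -pids
```

Knowing the pattern in advance makes it easier to collect the right set
of files for a later gprof run.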
Shared-Library Profiling Options
Profiles all of the shared libraries in addition to the
program's executable.

If -all was specified, does not profile the shared library
lib. Can be repeated to exclude multiple libraries.

Profiles the shared library lib. Can be repeated to
include multiple libraries.

Searches for shared libraries in the specified directory
before searching the default directories. Can be repeated
to make a search path. Use the same options that were used
when linking the program with ld.

Does not instrument the procedure proc. This option can be
used to exclude procedures that are uninteresting or that
interfere with the instrumentation (such as nonstandard
assembly code).
Execution Control Options
Prints the tool's version number.

Executes the instrumented program, even if no arguments
are specified. By default, the program is only
instrumented (for later execution).

Executes the instrumented program, and runs gprof with
default options on the resulting file(s).

Executes the instrumented program, and runs gprof on the
resulting file(s). The following gprof options are
supported:

    Profiles each instruction within selected procedures.

    Does not report on called procedures.

    Excludes procedure proc and its descendants from the
    profile, but totals all procedures.

    Includes only procedure proc and its descendants in the
    profile, but totals all procedures.

    Profiles procedures as an indexed call graph (default).

    Profiles source lines, listing the most heavily used
    first.

    Profiles source lines, in order within selected
    procedures.

    Merges all input files into file.

    Prints each procedure's starting line number.

    Profiles procedures, listing the most heavily used
    first (default).

    Profiles the whole executable and any shared libraries.

    Reports procedures that were never called.

If hiprof finds any previously instrumented shared
libraries in the working directory, it will reuse them if
they meet current requirements, to reduce re-instrumentation
costs.
Temporary instrumentation files are created in /tmp. Set
the TMPDIR environment variable to a different directory
to create the files elsewhere, for example, in a disk partition
with more space.
RESTRICTIONS
The default sampled profile only estimates the CPU time
spent in each procedure call; profiles made with the
-cycles and -faults options measure it.
When timing a program's procedures by measuring machine
cycles (with the -cycles option), the 32-bit cycle-counting
hardware will wrap if no procedure call or return is
executed by the program every few seconds -- for example,
because of a long-running loop. If the counter wraps, the
profile will be incorrect. Using the -all or -incobj
options to profile all nonsystem libraries and procedures
can help avoid this restriction.
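
To get a feel for "every few seconds", the wrap interval of a 32-bit
counter can be worked out directly. The sketch below assumes that the
counter advances once per CPU clock cycle, which this page does not
state; the function name cycle_wrap_seconds is hypothetical. The
500-MHz figure matches the EV6 system mentioned later on this page, and
the 1-GHz figure is only an illustration.

```shell
# cycle_wrap_seconds HZ
# Prints roughly how many seconds a 32-bit counter takes to wrap when
# it advances HZ times per second.
# Assumption: one counter tick per CPU clock cycle.
cycle_wrap_seconds() {
    # 4294967296 is 2^32, the number of distinct 32-bit counter values.
    LC_ALL=C awk -v hz="$1" 'BEGIN { printf "%.1f\n", 4294967296 / hz }'
}

cycle_wrap_seconds 500000000     # 500-MHz clock
cycle_wrap_seconds 1000000000    # 1-GHz clock (illustrative)
```

Under that assumption, a 500-MHz processor wraps the counter after
about 8.6 seconds, so a loop that runs longer than that without any
instrumented call or return could corrupt a -cycles profile.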
The -cycles option generates an inaccurate profile if the
instrumented program is run on a system whose processors
have different cycle speeds. This inaccuracy can be
avoided by using hiprof's default sampling profiler or the
cc -p/-pg profilers instead, or by running the application
on a subset of the processors:

    Select a single processor using the runon command.

    Check the processor speeds using the psrinfo -v command
    and run the application in a processor set comprising
    only processors that run at the same speed (see
    processor_sets(4)).

Approximate performance estimates are as follows but will
vary according to the application and the machine's CPU
count, type, and clock rate. The hiprof instrumentation
takes ~2s per Mb of program file on a 500-MHz EV6 (21264)
Alpha system, using ~10 Mb of memory plus another ~10 Mb
per Mb of the largest file. The instrumented files are
~20% larger than the originals, plus ~1 Mb of hiprof code.
They run ~4 times slower. By default, each profile data
file is at least the size of the instrumented code (and
uses this much memory), but these files are very small for
the -cycles and -faults options.
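
The rough figures above can be combined into a back-of-envelope
estimate. The following is only a sketch: the function name
hiprof_estimate is hypothetical, the inputs are whole numbers of Mb,
and the results inherit every caveat stated above (they were quoted for
a 500-MHz EV6 system and will vary by application and machine).

```shell
# hiprof_estimate PROGRAM_MB LARGEST_FILE_MB
# Hypothetical helper: applies the approximate figures quoted in the
# text. Inputs are whole numbers of Mb; results are rough estimates.
hiprof_estimate() {
    prog_mb=$1
    largest_mb=$2
    # ~2 s of instrumentation time per Mb of program file
    echo "instrumentation time: ~$(( 2 * prog_mb )) s"
    # ~10 Mb of memory plus ~10 Mb per Mb of the largest file
    echo "instrumentation memory: ~$(( 10 + 10 * largest_mb )) Mb"
    # ~20% larger than the original, plus ~1 Mb of hiprof code
    echo "instrumented file size: ~$(( prog_mb * 12 / 10 + 1 )) Mb"
    # instrumented programs run ~4 times slower
    echo "run-time slowdown: ~4x"
}

hiprof_estimate 5 5
```

For a 5-Mb program that is also the largest file, this suggests about
10 seconds of instrumentation, about 60 Mb of memory, and an
instrumented file of about 7 Mb.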
If a procedure contains interprocedural branches or interprocedural
jumps, that procedure will not be instrumented
with the -cycles or -faults option, and no information
will be reported about that procedure. Use the -v option
to see which procedures were not instrumented. Compilers
can optimize return statements or non-returning function
calls to interprocedural branches. To avoid this, recompile
with the -O0 or -no_inline option.
Instrumented version of program produced by hiprof
Profile data file produced by program.hiprof
Instrumented shared libraries produced by hiprof
Temporary file created and deleted in the current and
-dirname path directories
Introduction: prof_intro(1)
atom(1), cc(1), dxprof(1), fork(2), gprof(1), kill(1),
ld(1), pixie(1), processor_sets(4), psrinfo(1),
pthread(3), runon(1), uprofile(1). (dxprof is available
as an option.)
Programmer's Guide
hiprof(1)