prof(1)

NAME

       prof, pixstats - Analyzes profile data

SYNOPSIS

       prof [options] [prog_name [PC-sampling_data_file]...]

       prof   -pixie    [options]   [prog_name   [Addrs_file    |
       Counts_file]...]

       prof  -pixstats   [options]  [prog_name   [Addrs_file    |
       Counts_file]...]

       pixstats     [options]     [prog_name     [Addrs_file    |
       Counts_file]...]

OPERANDS

       prog_name
              Name of the program executable to be profiled. This
              program should be compiled with the -g1, -g2, or -g3
              option to obtain more complete profiling information.
              If the default symbol table level (-g0) has been used,
              line number information, static procedure names, and
              file names are unavailable to the profiling code.

       PC-sampling_data_file
              Name of a profiling data file (default mon.out)
              produced by executing a program that has been linked
              with the cc -p command.

       Counts_file
              Name of an instruction-counts file produced by
              executing a program that has been instrumented with
              pixie. If no Counts_file or Addrs_file is specified,
              prog_name.Counts is used if found in the current
              working directory.

       Addrs_file
              Name of an instruction-address file produced when the
              executable or shared library object is instrumented
              with pixie. By default, the path of each object.Addrs
              file is recorded in the Counts_file, so these files do
              not need to be specified. The order of precedence for
              finding an Addrs_file is as follows:

              1.  Addrs_file path specified on the command line
              2.  Current directory
              3.  Directory of the object specified in a command
                  line argument
              4.  Directory where pixie created it

OPTIONS

       For each prof option, you need to type only enough of the
       name to distinguish it from the other options. If you do
       not specify any options, prof uses -procedures by default.
       Always specify -pixie or -pixstats when you process .Addrs
       and .Counts files.

       The prof command accepts the following options:

              Causes the profiles for all shared libraries (if any)
              described in the data file(s) to be displayed, in
              addition to the profile for the executable.

              Causes the profiler to print the assembly instructions
              for each subroutine along with the cycle counts for
              each instruction. The subroutines are sorted from
              highest cycle count to lowest. The instructions for
              each subroutine are printed in order; they are not
              sorted by cycle count.

              When used without the -pixie option for a PC-sampling
              profile, the CPU time used by each instruction is
              presented in milliseconds. (For uprofile and kprofile,
              per-instruction sample counts are also provided for
              events other than time.)

              Alters the appropriate parts of the listing to reflect
              the clock speed of the CPU. By default, the cycle time
              of the processor on which the program was run is used.
              (Use this option only with the -pixie option.)

              Disassembles and shows the analyzed object code. (Use
              this option only with the -pixstats option.)

       -dislimit f
              Limits the disassembly to blocks with f% frequency.
              (Use this option only with the -pixstats option.)

       -exclude, -Exclude
              If you use one or more -exclude options, the profiler
              omits the specified procedure and its descendants from
              the listing. If any option uses an uppercase "E" (for
              "Exclude"), prof also omits that procedure from the
              base upon which it calculates percentages. To
              represent all of the variations of an overloaded C++
              function name, you can specify just the part of the
              name up to but not including the "(".

              Causes the profile for the named executable or shared
              library not to be printed. You can use this option
              multiple times in a single prof command.

       -feedback
              Produces a file with information that the compiler
              system can use to decide which parts of the program
              will benefit most from global optimization and which
              parts will benefit most from in-line procedure
              substitution (requires basic-block counting). (Use
              this option only with the -pixie option.)

              This option is for compilers whose -feedback option
              requires a feedback file (rather than an executable
              file) and that do not support the prof command's
              -update option. For compilers that support the -update
              option, better results can be achieved using that
              option instead of the (prof) -feedback option.

              Reports the most heavily used lines in descending
              order of use.

              Causes the profile for the named shared library to be
              printed, in addition to the profile for the
              executable. You can use this option multiple times in
              a single prof command.

       -invocations
              For each procedure, reports how many times the
              procedure was invoked from each of its possible
              callers (requires basic-block counting). For this
              listing, the -exclude and -only options apply to
              callees, but not to callers. (Use this option only
              with the -pixie option.)

       -Ldir  Changes the library directory search order for shared
              object libraries so that prof looks for them in dir
              before the library recorded in profile_file and the
              default library directories. You can specify multiple
              -Ldir switches to specify several directory names.

       -L     Changes the library directory search order for shared
              object libraries so that prof never looks for them in
              the default library directories. Use this option when
              the default library directories should not be searched
              and only the directories specified by -Ldir are to be
              searched.

              Gives the lines in order of occurrence within
              procedures. The procedures are sorted in descending
              order of use.

              Sums the sampling data files (or, in pixie mode, the
              .Counts files) and writes the result into a new file
              with the specified name. The -only and -exclude
              options have no effect on the merged data.

       -nocounts
              Uses 1 for each basic block count. (Use this option
              only with the -pixstats or -pixie option.)

       -numbers
              Prints each procedure's starting line number if source
              file information is available from the object file.

       -only, -Only
              If you use one or more -only options, the profile
              listing includes only the named procedures, rather
              than the entire program. If any option uses an
              uppercase "O" (for "Only"), prof uses only the named
              procedures, rather than the entire program, as the
              base upon which it calculates percentages. To
              represent all of the variations of an overloaded C++
              function name, you can specify just the part of the
              name up to but not including the "(".

       -pixie Selects pixie mode, as opposed to sampling mode.

       -pixstats
              Selects generation of an alternative pixie-mode report
              for basic-block profiling data, as previously produced
              by the pixstats(1) command. All options of the
              previous version of pixstats(1) are recognized, for
              compatibility.

       -procedures
              Reports time spent per procedure (using data obtained
              from sampling or basic-block counting; the listing
              tells which one). For basic-block counting, this
              option also reports the number of invocations per
              procedure, including the aggregated invocations of any
              alternate entry points.

       -quit n
              Truncates listings after n lines (if n is an integer),
              after the first entry that represents less than n
              percent of the total (if n is followed immediately by
              a "%" character), or after enough entries have been
              printed to account for n percent of the total (if n is
              followed immediately by "cum%"). For example,
              "-quit 15" truncates each part of the listing after 15
              lines of text, "-quit 15%" truncates each part after
              the first line that represents less than 15 percent of
              the whole, and "-quit 15cum%" truncates each part
              after the line that brought the cumulative percentage
              above 15 percent.

              Reports all lines that never executed. (Use this
              option only with the -pixie option.)

              For -procedures and -invocations listings, prints
              cumulative statistics for the entire object file
              instead of for each procedure in the object.

              Generates more analysis of a program to provide a more
              accurate reading of cycles, instead of the default,
              which assumes each instruction executes in one cycle.
              The higher the number chosen from the arguments, the
              more accurate the reading, although the profiler runs
              more slowly, and memory-access delays are still not
              reflected. This option has little or no effect on EV6
              (21264) and later Alpha systems. (Use this option only
              with the -pixie option.)

       -update
              Updates the program executable (prog_name) with
              profiling information in the specified .Counts files,
              for use in future cc -feedback prog_name command(s).
              This option requires that prog_name have been compiled
              with the -feedback prog_name option or updating will
              fail. This option will not generate a display unless
              another option forcing the display behavior is
              specified. (Use this option only with the -pixie
              option.)

              Prints the tool's version number.

              Prints a list of procedures that were never invoked
              (requires basic-block counting). (Use this option only
              with the -pixie option.)
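
       The three forms of the -quit argument described above can be
       sketched in Python as follows. This is only an illustration
       of the documented rules, not code taken from prof: the
       function name truncate_listing is hypothetical, each listing
       entry is assumed to occupy one line, and the "cum%" form is
       assumed to stop at the entry that reaches the limit.

```python
def truncate_listing(percents, quit_arg):
    """Return how many leading entries of a listing are kept.

    percents: per-entry percentages of the total, in listing order.
    quit_arg: "15", "15%", or "15cum%" (the three forms of -quit n).
    This helper is hypothetical; it only mirrors the rules in the text.
    """
    if quit_arg.endswith("cum%"):
        limit = float(quit_arg[:-4])
        running = 0.0
        for kept, pct in enumerate(percents, start=1):
            running += pct
            if running >= limit:   # this entry reaches the cumulative limit
                return kept
        return len(percents)
    if quit_arg.endswith("%"):
        limit = float(quit_arg[:-1])
        for kept, pct in enumerate(percents, start=1):
            if pct < limit:        # first entry below the limit is the last kept
                return kept
        return len(percents)
    # Plain integer: keep at most n entries (one entry per line assumed).
    return min(int(quit_arg), len(percents))


listing = [40.0, 25.0, 14.0, 11.0, 10.0]
print(truncate_listing(listing, "2"))
print(truncate_listing(listing, "15%"))
print(truncate_listing(listing, "70cum%"))
```

       The suffix test checks "cum%" before "%" because every
       "cum%" argument also ends in "%".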

DESCRIPTION

       The prof command analyzes one or more data files generated
       by the compiler's execution-profiling system and produces
       a listing. The prof command can also combine those data
       files or produce a feedback file that lets the optimizer
       take into account the program's run-time behavior during a
       subsequent compilation. Profiling is a three-step process:

       1.  Compile the program.
       2.  Execute the program.
       3.  Run prof to analyze the data.

       The compiler system provides two kinds of profiling:

       PC-sampling
              Interrupts the program periodically, recording the
              value of the program counter.

       Basic-block counting
              Divides the program into blocks delimited by labels,
              jump instructions, and branch instructions. It counts
              the number of times each block executes.
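
       The way a program is divided into blocks delimited by labels
       and by jump and branch instructions can be sketched as
       follows. This is a toy model, not pixie's implementation: the
       instruction list, the BRANCH_OPS subset of Alpha mnemonics,
       and the function name split_basic_blocks are all assumptions
       made for illustration.

```python
from collections import Counter

# Illustrative subset of Alpha branch and jump mnemonics (an assumption).
BRANCH_OPS = {"br", "beq", "bne", "jmp", "jsr", "ret"}


def split_basic_blocks(instructions):
    """Split a list of (label_or_None, opcode) pairs into basic blocks.

    A label starts a new block; a branch or jump ends the current one.
    Returns a list of blocks, each a list of instruction indexes.
    """
    blocks, current = [], []
    for index, (label, opcode) in enumerate(instructions):
        if label is not None and current:
            blocks.append(current)   # a label closes the running block
            current = []
        current.append(index)
        if opcode in BRANCH_OPS:
            blocks.append(current)   # a branch or jump ends the block
            current = []
    if current:
        blocks.append(current)
    return blocks


program = [
    ("main", "lda"), (None, "addq"), (None, "beq"),
    (None, "subq"),
    ("loop", "addq"), (None, "bne"),
    (None, "ret"),
]
blocks = split_basic_blocks(program)
print(blocks)

# Counting executions: a hypothetical trace of executed block numbers.
trace = [0, 2, 2, 2, 3]
print(Counter(trace))
```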

       The uprofile and kprofile tools provide a third kind of
       profiling, performance counter sampling, which uses the
       on-chip performance counters of the Alpha architecture.

       The following sections describe how to perform the various
       kinds of profiling.

   PC-Sampling Profiles
       To use PC-sampling, compile your program with the -p
       option (strictly speaking, it is sufficient to use this
       option only when linking the program). Then, run the
       program containing the profiling startup routine that calls
       monstartup to allocate extra memory to hold the profiling
       data. If the program terminates normally or calls exit(2),
       it records the data in a file at the end of execution.

       If  your program uses shared libraries, note that only its
       call-shared portion is profiled in detail. Only the  total
       time spent in each shared library is recorded. To individually
 profile all library routines a program  uses,  build
       the  program  with the -non_shared switch (by default, the
       compiler produces a call-shared object unless  -non_shared
       is explicitly specified), or set the PROFFLAGS environment
       variable as described in the  Environment  Variables  section.


       After running your program, use prof to analyze the
       PC-sampling data file. For example:

              cc -c myprog.c
              cc -p -o myprog myprog.o
              myprog                   (generates mon.out)
              prof myprog mon.out

       When  you  use  prof  for  PC-sampling,  the  program name
       defaults to a.out. The PC-sampling data file name defaults
       to  mon.out; if you specify more than one PC-sampling data
       file, prof reports the sum of the data.
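
       Reporting "the sum of the data" from several data files
       amounts to adding the sample counts bucket by bucket. The
       sketch below shows the idea on hypothetical in-memory
       dictionaries keyed by procedure name; it does not read the
       real mon.out format, and sum_profiles is an illustrative
       name.

```python
from collections import Counter


def sum_profiles(profiles):
    """Add per-procedure sample counts from several runs.

    profiles: iterable of {procedure_name: sample_count} mappings,
    a hypothetical stand-in for decoded PC-sampling data files.
    """
    total = Counter()
    for samples in profiles:
        total.update(samples)   # Counter.update adds counts per key
    return total


run1 = {"main": 3, "compute": 40}
run2 = {"compute": 25, "output": 7}
merged = sum_profiles([run1, run2])
print(dict(merged))
```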

   PC-Sampling Environment Variables
       You can use environment variables to change the default
       PC-sampling and profile data collection behavior. The
       variables are PROFDIR and PROFFLAGS. The general form for
       setting these variables is:

       For C shell:        setenv varname "value"
       For Bourne shell:   varname="value"; export varname
       For Korn shell:     export varname=value

       In the preceding forms, varname can be one of the
       following:

       PROFDIR
              This environment variable causes PC-sampling data
              files to be generated with unique file names in a
              specified directory.

              You specify a directory path as the value, and your
              prof results are placed in the file path/pid.progname,
              where path is the directory path, pid is the process
              ID of the executing program, and progname is the
              program name.

       PROFFLAGS
              This environment variable can take any of the
              following values:

              -threads
                     Causes a separate data file to be generated for
                     each thread. The name of the data file takes
                     the form pid.sid.progname, where pid is the
                     process ID of the program, sid is the sequence
                     number of the thread, and progname is the name
                     of the program being profiled.

              -all   Causes the program to fully profile all the
                     permanently loaded shared libraries, in
                     addition to the nonshared or call-shared
                     executable.

              -incobj
                     Causes the program to profile only the named
                     executable or shared library.

                     Causes the program not to profile the named
                     executable or shared library.

                     Changes the ratio of text segment stride size
                     to PC-sample counter buffer size, that is, the
                     number of instructions that are counted
                     together in a single counter word. The
                     appropriate ratio involves a tradeoff of size
                     versus precision. Strides of 1, 2, 4, and 8 are
                     supported. A special stride of 0 causes a
                     single PC-sample count to be recorded for each
                     text segment.

                     The default stride is 2 for the executable, and
                     0 for each of its shared libraries. If -all or
                     -incobj is specified, all selected objects are
                     profiled with the same stride.

              -sigdump
                     Automatically establishes monitor_signal(3) as
                     the signal handler for the named signal, and
                     causes monitor_signal(3) to zero the profile
                     after it is written to a file. This allows a
                     signal to be sent several times without the
                     successive profiles overlapping, if the file is
                     renamed. The asynchronous nature of a signal
                     may cause small variations in the profile.
                     Unrecognized signal names are ignored. The
                     -threads option is ignored if combined with
                     -sigdump.

                     Specifies the directory path in which the
                     profiling data file or files are created.

                     Disables or enables the addition of the
                     process ID number to the name of the profiling
                     data file or files.
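
       The size-versus-precision tradeoff of the stride can be made
       concrete with a little arithmetic. The sketch below assumes
       that a stride of n means n consecutive instructions share one
       counter, and relies on Alpha instructions being 4 bytes wide;
       the function names and the flat counter layout are
       illustrative assumptions, not the documented buffer format.

```python
ALPHA_INSTRUCTION_BYTES = 4   # Alpha instructions are fixed 32-bit words


def counters_needed(text_bytes, stride):
    """Number of PC-sample counters for a text segment of text_bytes."""
    if stride == 0:
        return 1                          # one count for the whole segment
    if stride not in (1, 2, 4, 8):
        raise ValueError("supported strides are 0, 1, 2, 4, and 8")
    instructions = text_bytes // ALPHA_INSTRUCTION_BYTES
    return -(-instructions // stride)     # ceiling division


def counter_index(pc, text_start, stride):
    """Index of the counter that a sampled program counter falls into."""
    if stride == 0:
        return 0
    return (pc - text_start) // (ALPHA_INSTRUCTION_BYTES * stride)


for stride in (1, 2, 8, 0):
    print(stride, counters_needed(4096, stride))
print(counter_index(0x1000 + 20, 0x1000, 2))
```

       Under these assumptions a larger stride shrinks the counter
       buffer but makes neighboring instructions indistinguishable
       in the profile.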

       You can use the PROFDIR and PROFFLAGS environment
       variables together. For more information, see the
       Programmer's Guide.
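
       The data file names described in this section can be
       reproduced as follows, which may help when scripting the
       collection of many runs. The helper names are hypothetical,
       and the sketch only formats the documented path/pid.progname
       and pid.sid.progname patterns; it does not create or read any
       profiling data.

```python
import posixpath


def profdir_data_file(path, pid, progname):
    """File name used when PROFDIR is set: path/pid.progname."""
    return posixpath.join(path, f"{pid}.{progname}")


def thread_data_file(pid, sid, progname):
    """File name used for per-thread data files: pid.sid.progname."""
    return f"{pid}.{sid}.{progname}"


print(profdir_data_file("/tmp/profiles", 1234, "myprog"))
print(thread_data_file(1234, 2, "myprog"))
```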

   Basic-Block Counting
       To use basic-block counting, compile your program without
       the -p option. Use the pixie program to translate your
       program into a profiling version and generate a file
       (prog_name.Addrs) containing block addresses. Then, run
       the pixie version of the program, which (assuming the
       program terminates normally or calls exit(2)) will generate
       a file (prog_name.Counts) containing block counts.

       After running the pixie version of your program, use prof
       with the -pixie option to analyze the .Addrs and .Counts
       files. Notice that you must specify the name of your
       original program, not the name of the pixie version. For
       example:

              cc -c myprog.c
              cc -o myprog myprog.o
              pixie myprog     (generates myprog.Addrs and myprog.pixie)
              myprog.pixie     (generates myprog.Counts)
              prof -pixie myprog myprog.Addrs myprog.Counts

       When you use prof with the -pixie option, the .Addrs file
       name defaults to prog_name.Addrs, and the .Counts file name
       defaults to prog_name.Counts. Note that, when the file name
       defaults to prog_name.Counts, prof does not attach any
       path prefix to prog_name, and it looks for the file in the
       current working directory. If you specify more than one
       .Counts file, prof reports the sum of the data.

       For each shared library selected for profiling, the prof
       command searches for an .Addrs file in the following
       locations if the file location is not explicitly specified
       on the command line:

       1.  Current directory
       2.  Directory in which the object file is located, if the
           location of the object file is explicitly specified on
           the command line
       3.  Directory in which pixie created it, as recorded in the
           .Counts file

       For each selected shared library, the prof command
       searches for an object file in the following locations:

       1.  Directories specified in -Ldir options
       2.  Directory in which pixie found it, as recorded in the
           file, if the -L option is specified
       3.  Standard library search directories, as searched by ld,
           if the -L option is not specified

   Basic-Block Statistics
       Use  the  -pixstats  option to get an alternative profile.
       All options of the previous  version  of  the  pixstats(1)
       command are recognized, for compatibility.

       If  a disassembly is requested, all basic blocks (or those
       whose execution count exceeds the -dislimit percentage  of
       total   instructions)   are  disassembled,  in  increasing
       address order. Each block is labeled  with  its  procedure
       name  and  any offset from the start of the procedure. For
       each instruction, the  relative  estimated  CPU  cycle  at
       which the instruction executes is printed, plus its source
       line, address, binary code, and  assembly  language.   The
       total  CPU  cycles used by one execution of the block, the
       number of times it was executed, and its percentage of all
       instructions executed are printed at the end of the block,
       following any line reporting a non-zero delay caused to  a
       follow-on block.

       The main report begins with a record of the command line.
       This is followed by a summary of the program's behavior:

       o  Total CPU cycles used by the profiled objects, plus the
          equivalent number of seconds
       o  Total number of instructions executed
       o  Total delay caused by instructions executed in the
          preceding basic block
       o  Total integer and floating-point no-op, arithmetic and
          logical, logical, shift, load, store, load and store,
          load followed by load, load and store and fetch (data
          bus use), load and store relative to the stack or global
          pointers, floating-point, floating-point compare, and
          conditional branch instructions executed (itemized).
          Also, the total number of branch instructions executed
          whose target instruction is another branch, and the
          total number of such branches that are estimated to be
          taken, rather than executing the next instruction in
          line.
       o  Total basic blocks, procedure calls, and branches that
          skip a single instruction that were executed

       Next, some ratios are printed:

       o  Stores : stores + loads
       o  Instructions : basic block
       o  Instructions : branches
       o  Backward branches : branches
       o  CPU cycles : procedure calls
       o  Instructions : procedure calls
       o  Integer no-ops : integer and floating-point no-ops
       o  Floating-point no-ops : integer and floating-point no-ops
       o  Floating-point pipeline interlocks : floating-point
          operators

       Next,  basic  blocks  are  analyzed  according to how many
       instructions they contain. For each size, pixstats reports
       the execution count, its percentage and cumulative
       percentage relative to both instructions and basic blocks,
       the  number  of  instructions  contained in blocks of that
       size, the percentage and  cumulative  percentage  of  this
       relative  to  all instructions, and the CPU-cycle cost per
       instruction of blocks of that size. Then, pixstats  prints
       various  averages  and quartiles of basic block size, plus
       the largest basic block execution  count  encountered  (to
       indicate  the chance of integer overflow in the analysis).

       Next, pixstats analyzes the number of  registers  (integer
       and floating-point) that are saved on procedure entry (and
       restored on exit).  It  prints  the  number  of  procedure
       entries  that  save  a  given number of registers, and the
       percentage and cumulative percentage of this  relative  to
       all  procedure  entries,  all  registers  saved,  and  all
       instructions executed. Finally, it  prints  some  averages
       and ratios.

       The  next  two  tables contain information on the sizes of
       executed procedures' stack frames  and  the  frequency  of
       execution  of  each  kind  of instruction. Frame sizes are
       reported in "bits"; for example, 6 bits  means  a  32-  to
       48-byte  stack  frame. The number, percentage, and cumulative
 percentage of executed calls to procedures  with  the
       given  frame  size  is  printed.  Similarly, the execution
       count is printed for each machine  instruction  code,  but
       this table is ordered by decreasing usage.

       The next four tables are similar. They provide information
       about the size of literals used by various categories of
       Alpha instructions:

       o  ADD, SUB, CMP instructions
       o  AND, BIC, BIS, XOR, CMOV instructions
       o  MUL instructions
       o  SHIFT, EXT, INS, MSK, ZAP instructions

       (Note that a table may be omitted if there is no use of
       literals in the program for the particular instruction
       category.) For each of these tables, the size of the
       literal is reported in bits (for example, 4 bits means the
       literal is greater than or equal to 8 and less than 16).

       The next six tables are similar. They contain information
       on the size of the memory displacement from a base
       register:

       o  LDA displacement from 0 (used like a load immediate
          instruction)
       o  LDAH displacement from 0 (used like a load immediate
          high)
       o  Branch
       o  SP-based load/store (load or store within a stack frame)
       o  GP-based load/store (load or store within a global offset
          table)
       o  All load or store instructions


       Again, the "size" of the displacement is reported in bits;
       for  example,  6  bits means a 32 to 63 byte displacement.
       For both positive displacements (in the "0-extend" column)
       and negative displacements (in the "1-extend" column), the
       execution count is printed along with percentage and cumulative
  percentage.  The  summed  cumulative percentage is
       printed last (in the "Total" column).
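
       The "bits" sizes used in the frame-size, literal, and
       displacement tables above correspond to the number of binary
       digits needed to write the value. The following sketch
       reproduces the examples given in the text; the function names
       are illustrative, and only non-negative values are handled,
       because the text does not say how the size of a negative
       displacement is computed.

```python
def size_in_bits(value):
    """Number of bits needed to represent a non-negative value."""
    if value < 0:
        raise ValueError("only non-negative values are handled here")
    return value.bit_length()


def bucket_range(bits):
    """Smallest and largest values that are reported as `bits` bits."""
    if bits == 0:
        return (0, 0)
    return (1 << (bits - 1), (1 << bits) - 1)


print(size_in_bits(8), size_in_bits(15))   # both fall in the 4-bit bucket
print(bucket_range(4))
print(bucket_range(6))
```

       Under this reading, a 6-bit bucket covers 32 through 63,
       which matches the displacement example; the 32- and 48-byte
       frame sizes quoted earlier both fall in that range.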

       In the "static" analysis of instructions, each instruction
       is  counted  once  per  executed basic-block. The "static"
       distribution will be the same as the regular  opcode  distribution
  when -nocounts is specified. Following "static"
       totals for instructions and basic blocks, the  number  and
       percentage of each instruction code is listed.

       The  next two tables contain information on how many times
       each integer and  floating-point  register  was  accessed,
       plus  its  percentage,  ordered  by  register number.  For
       integer registers, the number and percent  of  uses  as  a
       base register in memory operations is also listed.

       Finally, pixstats prints a flat profile of CPU cycles used
       by procedures.  This includes the CPU cycles used  by  the
       procedure,  the  percentage  of  the total, the cumulative
       percentage, the number of instructions executed as part of
       the  procedure,  its  average  number  of  CPU  cycles per
       instruction, the number of calls made  to  the  procedure,
       the  average number of CPU cycles per call, and the procedure
 name. If -numbers is specified, the object and source
       file names and line number are also printed.
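
       The columns of the flat profile are simple functions of three
       per-procedure totals: cycles, instructions, and calls. The
       sketch below derives them from hypothetical figures; the
       function name flat_profile, the dictionary keys, and the
       ordering by descending cycle count are assumptions for
       illustration, not the pixstats report format.

```python
def flat_profile(procedures):
    """Derive flat-profile columns from per-procedure totals.

    procedures: list of (name, cycles, instructions, calls) tuples.
    Returns one dict per procedure, heaviest procedure first.
    """
    total_cycles = sum(cycles for _, cycles, _, _ in procedures)
    rows, cumulative = [], 0.0
    ordered = sorted(procedures, key=lambda p: p[1], reverse=True)
    for name, cycles, instructions, calls in ordered:
        percent = 100.0 * cycles / total_cycles if total_cycles else 0.0
        cumulative += percent
        rows.append({
            "name": name,
            "cycles": cycles,
            "percent": percent,
            "cum_percent": cumulative,
            "instructions": instructions,
            # Averages are undefined when the divisor is zero.
            "cycles_per_instruction":
                cycles / instructions if instructions else None,
            "calls": calls,
            "cycles_per_call": cycles / calls if calls else None,
        })
    return rows


rows = flat_profile([("main", 100, 50, 1), ("work", 300, 200, 10)])
for row in rows:
    print(row["name"], row["percent"], row["cum_percent"])
```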

   Performance Counter Samples
       After running the uprofile or kprofile utility to collect
       profiling data for your program or the kernel,
       respectively, run prof to examine the resulting mon.out or
       kmon.out file, as follows:

       For uprofile output:   prof prog_name mon.out
       For kprofile output:   prof /vmunix kmon.out

       Use prof as for PC-sampling, except that only the
       executable has a profile. Old performance counter sample
       data files, generated on versions of the operating system
       prior to DIGITAL UNIX Version 4.0, must be analyzed as if
       they contained PC-sampling data.

RESTRICTIONS

       The -pixstats option models execution assuming a perfect
       memory system. Memory system events such as cache misses
       will increase execution time above the -pixstats
       predictions.

       The set of statistics reported by the -pixstats option and
       the format of the report are the same as for previous
       versions of the pixstats(1) command, but note the
       following:

       o  The labels on disassembled basic blocks take the form
          procedure-name (or proc_at_0x... if no symbol is
          available) for an initial block and procedure-name+offset
          for subsequent blocks.
       o  All reported cycles reflect CPU pipeline interlocks, so
          they usually do not match the reported instruction
          counts.
       o  If not all the shared objects used by a program are
          profiled, the procedure-call counts may be smaller than
          the jsr/bsr instruction counts.

FILES

       Normal startup code
       Startup code for PC-sampling
       Library for PC-sampling
       Default kprofile data file
       Default PC-sampling data file (mon.out)
       Default uprofile data file

SEE ALSO

       Introduction: prof_intro(1)

       Commands:  as(1), cc(1), gprof(1), pixie(1),  uprofile(1),
       kprofile(1),   dxprof(1).   (dxprof  is  available  as  an
       option.)

       Functions:  monitor(3), profil(2)

       Programmer's Guide


