uprofile(1)

NAME

       uprofile, kprofile - Profile a program (uprofile) or kernel
       (kprofile) with Alpha on-chip performance counters

SYNOPSIS

       uprofile [-v] [-quiet] [-dirname path] [-[no]pids] [-all
       | -each | -one] [-stride n] [-average] [-pixie] [-display
       | prof-option...] [statistic...] program [argument...]

       kprofile [-v] [-quiet] [-dirname path] [-[no]pids] [-all
       | -each | -one] [-stride n] [-average] [-pixie] [-display
       | prof-option...] [-k kernel_name] [-t] [-ra]
       [statistic...] [program [argument...]]

DESCRIPTION

       See  prof_intro(1)  for an introduction to the application
       performance tuning tools provided with Tru64 UNIX.

       The uprofile command uses the Alpha on-chip performance
       counters to produce a fine-grained program-counter profile
       of a user program. The command runs the program you
       specify with the arguments you specify, collecting the
       selected statistics on the program's process and its
       descendants. By default, it writes the profile data to the
       umon.out file. If the program calls shared libraries,
       those libraries are not profiled.

       The kprofile command uses the Alpha on-chip performance
       counters to produce a detailed program-counter profile of
       the kernel. If you specify a program, kprofile runs the
       program with the arguments you specify and collects the
       selected statistics on the kernel for the duration of the
       program's execution. If you do not specify a program,
       kprofile collects the selected statistics on the kernel
       until you enter Ctrl/C or the kprofile process receives a
       SIGTERM signal. Note that if SIGINT (usually generated by
       entering Ctrl/C at the controlling terminal) is currently
       being ignored, it continues to be ignored, and SIGTERM
       must be used to terminate data collection. By default,
       kprofile writes the profile data to the kmon.out file.

       If you specify -display or any of the prof-options, the
       uprofile and kprofile commands display the profile by
       running the prof tool (with any specified prof-options).

       You can also run the prof command separately to help
       analyze the data in the umon.out or kmon.out file. The
       following examples show how to invoke the prof command to
       analyze data in the respective files:

            % prof a.out umon.out
            % prof /vmunix kmon.out

       The CPU-time profile displayed by prof will not be
       accurate if the processors that executed the application
       do not all run at the same speed, as in certain
       multiprocessor systems containing EV67 or later
       processors. The inaccuracy may be avoided by using the
       hiprof (sampling) or cc -p/-pg profilers, or by running
       the application on a subset of the processors:

       o  Select a single processor using the runon command.

       o  Check the processor speeds using the psrinfo -v command
          and run the application in a processor set comprising
          only processors that run at the same speed (see
          processor_sets(4)).

OPERANDS

       statistic
              The name of an event that your particular Alpha
              hardware can profile, as detailed in the STATISTICS
              section, below. If no statistic is named, machine
              cycles are counted, giving a CPU-time profile. One
              statistic can be specified for each of the hardware
              counters on your machine.

       program
              The name of the executable to run while profiling
              operations are being performed.

       argument
              An argument to pass to the program that is run.
              Multiple arguments can be specified, as needed by
              the program.

OPTIONS

       Options can be abbreviated to three characters, except the
       prof-options, which can be abbreviated (usually to one
       character) as in a prof command. For example, -qui is
       interpreted as -quiet, but -q is interpreted as -quit. (See
       the -display option for the supported prof-options.)

       For options that specify a procedure name (proc), C++
       procedures can omit the argument type list, though this
       will match all overloaded procedures with that name. To
       select a specific procedure, specify the full symbol name
       (as printed by the nm command). Symbol names containing
       spaces, *, and so on must be quoted.

       -v     Engages verbose mode, which prints some useful
              information about the program being profiled.

       -quiet Prevents informational and progress messages from
              being printed.

       -dirname path
              Specifies the directory path in which the profiling
              data file or files are created.

       -[no]pids
              [Disables] or enables the addition of the process-id
              number to the name of the profiling data file or
              files.

       -all | -each | -one
              Specifies which mode to use for profiling on
              multiprocessor machines. Using the -all option (the
              default) aggregates the data for all CPUs into one
              umon.out file. Using the -each option collects
              separate profiles for each CPU and writes the output
              into a set of files named umon.out.n, where n is the
              CPU number. Using the -one option profiles only the
              current CPU. For the -one option to work, the
              uprofile or kprofile program must be run using the
              runon command.

       -stride n
              Sets the granularity of the sample counts, where n
              is the number of consecutive instructions grouped
              together for each sample count. The default is
              -stride 4. The -asm, -heavy, and -lines prof-options
              need a separate sample count for each instruction
              (for their reports to be precise enough), so these
              options imply -stride 1. This makes the output file
              four times bigger than the default size. The -stride
              argument must be a power of two (for example, 1, 2,
              4, 8).

       -average
              Attempts to average samples within basic blocks so
              that each instruction within a basic block shows the
              same number of samples. Ensures fine-grained
              profiles by setting stride to 1.

       -pixie Produces files similar to those produced by running
              an executable instrumented with pixie (see
              pixie(1)). Uses the cycles0 statistic (freq on EV67)
              by default. Ensures fine-grained profiles by setting
              stride to 1.

       -k kernel_name
              Overrides the name of the kernel to profile. (The
              default is the booted kernel.)

       -t     Enables triggered mode for kprofile. This option
              sets up all required information for running the
              performance counters, but does not invoke them. See
              the STATISTICS section for additional information.

       -ra    Enables PCNTCALLER mode for kprofile. Collects
              profiling data on the caller of certain kernel
              utility routines (for example, bcopy, bzero,
              simple_lock), instead of the routine itself.

       -display
              Runs prof on the resulting profile data file(s).

       The following prof options are supported:

       -asm   Reports the profile as an annotated disassembly.

       -exclude proc
              Excludes procedure proc from the profile but
              includes its CPU time or other statistic in the
              total.

       -Exclude proc
              Excludes procedure proc from the profile and from
              the total.

       -heavy Profiles source lines, printing those with the
              highest CPU time or other statistic first.

       -lines Reports the profile per source line within each
              procedure.

       -merge file
              Merges all profile data files into file.

       -numbers
              Prints each procedure's starting line number.

       -only proc
              Includes only procedure proc in the profile, but
              totals all procedures.

       -Only proc
              Includes only procedure proc in the profile and in
              the total.

       -procedures
              Profiles procedures, printing those with the highest
              CPU time or other statistic first.

       -quit n
              Truncates the reports after n lines or after
              (cumulative) n percent of the whole.
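
       The -stride rules described above can be sketched in a few
       lines of Python. This is only an illustration of the
       documented behavior, not the tool's implementation. The
       helper names (effective_stride, relative_size) are
       hypothetical, and letting the stride-1 options override an
       explicit -stride value is an assumption.

```python
# Illustrative sketch only; these helpers are hypothetical and are not
# part of uprofile, kprofile, or prof.

DEFAULT_STRIDE = 4

# Options documented as implying or setting a stride of 1.
STRIDE_ONE_OPTIONS = {"-asm", "-heavy", "-lines", "-average", "-pixie"}


def is_power_of_two(n):
    """Return True for 1, 2, 4, 8, ... and False otherwise."""
    return n > 0 and (n & (n - 1)) == 0


def effective_stride(stride=DEFAULT_STRIDE, options=()):
    """Return the stride that the documented rules imply."""
    if not is_power_of_two(stride):
        raise ValueError("-stride argument must be a power of two")
    # Assumption: a stride-1 option wins over an explicit -stride value.
    if STRIDE_ONE_OPTIONS & set(options):
        return 1
    return stride


def relative_size(stride):
    """Number of sample counts relative to the default stride.

    One count covers `stride` consecutive instructions, so halving the
    stride doubles the number of counts.
    """
    return DEFAULT_STRIDE / stride
```

       With these definitions, relative_size(effective_stride(4,
       ["-asm"])) evaluates to 4, matching the statement that
       -stride 1 makes the output four times bigger than the
       default.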

STATISTICS

       You  specify  the  statistics that you want to collect for
       the program  being  profiled  in  one  or  more  statistic
       operands.

       If  you specify multiple statistics, uprofile and kprofile
       accumulate their results. You cannot then view the results
       of any single statistic separately. Because collected data
       is merged into a single buffer, interpretation of multiply
       collected statistics may be difficult.

       The Alpha architecture implemented on your machine
       determines which statistics can be collected and the number
       of counters available for collecting multiple statistics at
       the same time. The implementation is indicated by the
       Alpha chip number, which can be displayed with the show
       config console command before booting Tru64 UNIX or, after
       booting, by using the psrinfo -v command or by calling
       getsysinfo(GSI_PROC_TYPE). Also, if the uprofile command
       is run without arguments, it shows how many counters and
       what statistics are available on your machine.

       All  of  the  chips in the EV4 family (21064 [EV4], 21064A
       [EV45], 21066/21068 [LCA4]) have two  performance  counter
       registers, each of which can be separately programmed. The
       statistics that each counter can collect are shown in  the
       following table:

       ------------------------------
       Counter0Stats   Counter1Stats
       ------------------------------
       0disabled       1disabled
       issues          dcache
       pipedry         icache
       loads           dualissues
       pipefrozen      mispredicts
       branches        floatops
       cycles          intops
       PALcycles       stores
       nonissues       novictims
       victims
       ------------------------------

       All of the chips in the EV5 family (21164 [EV5], 21164A
       [EV56], and 21164PC [PCA56]) have three performance
       counter registers, each of which can be separately
       programmed. Some of the counters are common to all EV5
       implementations, some are specific to EV5 and EV56, and
       some are specific to PCA56.

       The statistics that each of the common  EV5  counters  can
       collect are shown in the following table:

       --------------------------------------------------
       Counter0Stats   Counter1Stats   Counter2Stats
       --------------------------------------------------
       0disabled       1disabled       2disabled
       cycles0         nonissues       longstalls
       issues          splitissue      pcmispredicts
                       pipedry         branchmispredicts
                       replay          icachemisses
                       singleissues    itbmisses
                       dualissues      dcacheldmisses
                       tripleissues    dtbmisses
                       quadissues      ldsmerged
                       flowchanges     ldureplays
                       intops          fullreplays
                       floatops        externalinput
                       loads           cycles2
                       stores          memorybarriers
                       icacheacc       lockedloads
                       dcacheacc
       --------------------------------------------------

       The  statistics  that  each  of the EV5- and EV56-specific
       counters can collect are shown in the following table:

       -----------------------------------
       Counter1Stats   Counter2Stats
       -----------------------------------
       scacheacc       scachemisses
       scachereads     scachereadmisses
       scachewrites1   scachewritemisses
       scachevictim    scachesharedwrites
       bcacheref       scachewrites2
       bcachevictim    bcachemisses
       sysreqs         systeminvalidates
                       systemreadrequests
       -----------------------------------

       The statistics that each of  the  PCA56-specific  counters
       can collect are shown in the following table:

       ------------------------------------------
       Counter1Stats          Counter2Stats
       ------------------------------------------
       bcachereads            bcachedreads
       bcachedreadhits        bcachereadhits
       bcachedreadfills       bcachereadfills
       bcachewrites           bcachewritehits
       bcachecleanwritehits   bcachewritefills
       bcachevictims          sysreadflushhits
       readmisstwo            sysreadflushmisses
                              readmissthree
       ------------------------------------------

       The EV6 chip has two performance counter registers, each
       of which can be separately programmed. The statistics that
       each of the EV6-specific counters can collect are shown in
       the following table:

       ------------------------------
       Counter0Stats   Counter1Stats
       ------------------------------
       0disabled       1disabled
       cycles0         cycles1
       retinst         retcondbranch
                       retdtb1miss
                       retdtb2miss
                       retitbmiss
                       retunaltrap
                       replay
       ------------------------------

       The default is to  gather  cycle  statistics  in  the  0th
       counter and to disable other counters.

       The EV67 chip has two kinds of performance counters:
       traditional aggregate counters and profile-me counters.
       The traditional aggregate statistics that each of the
       EV67-specific counters can collect are shown in the
       following table. Any one statistic or statistic
       combination may be selected.

       ------------------------------
       Counter0Stats   Counter1Stats
       ------------------------------
       0disabled       1disabled
       cycles0         replay
       retinst         cycles1
       retinst         bcachemisses
       ------------------------------

       If no aggregate statistics are  selected,  one  profile-me
       statistic may be selected:

       -----------------------------------------------------------------------------
       Profile-me Statistics
       -----------------------------------------------------------------------------
       2disabled             abort               abort_per_ret    arith_trap
       cbr_taken             cbr_taken_per_ret   cycles           cycles_per_ret
       delay                 delay_per_ret       dstream_fault    dtb_miss
       dtb_miss_per_ret      dtb_miss3           dtb_miss4        early_kill
       early_kill_per_ret    fp_disabled         freq             icache_miss
       icache_miss_per_ret   icache_parity       inflt_bcache     inflt_replays
       inflt_retires         interrupt           istream_accvio   itb_miss
       ldst_order            ldst_unalign        map_stall        map_stall_per_ret
       mispredict            mispredict_per_ret  opcdec           replay_trap
       replay_trap_per_ret   retire              trap             trap_per_ret
       valid
       -----------------------------------------------------------------------------

       The default is to  gather  cycle  statistics  in  the  0th
       counter and to disable other counters.

       For  descriptions  of the statistics for all EV4, EV5, and
       EV6 implementations, refer to pfm(7).

       You can disable any counter by specifying 0disabled,
       1disabled, or 2disabled as the counter statistic. You can
       use this feature to isolate specific event types, such as
       loads, without extraneous data being generated. You cannot
       disable all counters at the same time, choose two
       statistics for the same counter, or disable a counter once
       its statistic is specified.
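
       The selection rules above can be illustrated with a small
       Python sketch that checks a list of statistic operands
       against the EV6 table shown earlier. It is not how uprofile
       or kprofile validate their operands; check_selection and
       counter_of are hypothetical names, and only explicitly
       named statistics are considered (the default assignment of
       cycles to counter 0 is not modeled).

```python
# Illustrative sketch only. The table is copied from the EV6 table in
# this page; the helper functions are hypothetical.

EV6_COUNTERS = {
    0: {"0disabled", "cycles0", "retinst"},
    1: {"1disabled", "cycles1", "retcondbranch", "retdtb1miss",
        "retdtb2miss", "retitbmiss", "retunaltrap", "replay"},
}


def counter_of(statistic, counters=EV6_COUNTERS):
    """Return the number of the counter that can collect the statistic."""
    for number, names in counters.items():
        if statistic in names:
            return number
    raise ValueError("unknown statistic: " + statistic)


def check_selection(statistics, counters=EV6_COUNTERS):
    """Map each counter to its statistic, enforcing the stated rules."""
    chosen = {}
    for stat in statistics:
        number = counter_of(stat, counters)
        if number in chosen:
            # Covers "two statistics for the same counter" and
            # "disable a counter once its statistic is specified".
            raise ValueError(
                "counter %d already has %s" % (number, chosen[number]))
        chosen[number] = stat
    all_named = len(chosen) == len(counters)
    all_off = all(s.endswith("disabled") for s in chosen.values())
    if chosen and all_named and all_off:
        raise ValueError("cannot disable all counters at the same time")
    return chosen
```

       For example, check_selection(["0disabled", "retitbmiss"])
       returns a mapping of counter 0 to 0disabled and counter 1
       to retitbmiss, while ["cycles0", "retinst"] is rejected
       because both statistics belong to counter 0.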

       When you specify no counter statistics, uprofile and
       kprofile count cycles on counter 0 by default, and display
       (through prof) a profile in terms of seconds used by each
       procedure in the program, except for any shared libraries.

       For noncycle statistics, the displayed profile shows the
       number of samples recorded, the sampling interval (events
       per second), and the total number of events that this
       implies. Most noncycle statistics of the EV5 family CPUs
       are recorded about six cycles after the instruction that
       triggered the sample. So, when using prof's -asm or -lines
       option, the samples should be associated with one of the
       few previously executed instructions or lines. The
       icacheacc, icachemisses, and dtbmisses statistics are
       usually attributed precisely.

       To perform a detailed analysis of short sections of kernel
       code,  use  the  kprofile  command  with  triggered   mode
       (invoked  with  the  -t  option).  When you use this mode,
       kprofile performs all of the required setup  for  enabling
       the  counters as normal, but does not invoke them. You can
       insert counter start or stop commands into the kernel code
       to be instrumented as follows:

            Turn counters on:   wrperfmon (PFOPT, 1)
            Turn counters off:  wrperfmon (0)

       You can turn the counters on and off repeatedly to collect
       data over many iterations or multiple sections of code.

       The macro PFOPT is defined in <sys/pfcntr.h>.

NOTES

       The interrupt load that profiling places on the system may
       affect performance, but usually the effect is
       insignificant.

       The kernel in use must have the pfm pseudo-device
       configured into it. To do this, use one of the following
       methods:

       o  Add the following line to the kernel configuration
          file, and rebuild the kernel. Do not use this method if
          CPU hot-swap is supported by the system, because it does
          not allow pfm to be easily unconfigured, as required for
          a hot-swap; instead, use the sysconfig method below.

               pseudo-device       pfm

       o  Enter the following command from the root account. Do
          not configure pfm if CPU hot-swap is anticipated.

               # sysconfig -c pfm

          If pfm is configured, the CPU hot-swap procedure
          requires that it be unconfigured, using the following
          command, before any CPU is swapped:

               # sysconfig -u pfm

          The autosysconfig program can be used to automatically
          load the configurable pfm device at each system startup.

       The format of the data files produced by uprofile in Tru64
       UNIX is different from the format produced in versions of
       DIGITAL UNIX prior to Version 4.0. The Tru64 UNIX data
       files include the names of selected statistics in profile
       displays. To convert these data files to the
       industry-standard format, at the expense of losing the
       names of the statistics, use the pdtostd command.


RESTRICTIONS

       The EV4 victims and novictims statistics rely on the
       external performance counter pin connections as described
       in the EV4 chip specification. The DEC 3000/400, /500,
       /600, and /800 workstations have these connections.
       Attempts to display either of these statistics on other
       platforms (while allowed) will typically generate empty
       data.

       The uprofile command is only supported on EV4  Pass  3  or
       later processors. Attempts to use it on a Pass 2 processor
       will gather PC samples for every process  running  on  the
       system.

       Using kprofile to generate statistics for a single command
       is only possible  on  EV4  Pass  3  or  later  processors.
       Attempts  to  do  this  on  a Pass 2 processor will gather
       statistics for the entire system, as  if  no  command  had
       been specified.

       Using  kprofile  with  triggered mode also requires an EV4
       Pass 3 or later processor and  cannot  be  performed  with
       per-process monitoring.

       Only one tool can use the performance counters at a time.
       A message similar to "the counter device is busy"
       indicates that some other tool is using the performance
       counters (or has used them but not cleaned up properly).
       If you are sure no one else is using the performance
       counters, running uprofile/kprofile with superuser
       privilege will attempt to reset the busy status and
       proceed.

FILES

       o  The performance counter device file.

       o  The statistics file(s) generated by uprofile (umon.out
          by default).

       o  The statistics file(s) generated by kprofile (kmon.out
          by default).

       o  The statistics file(s) generated with the -pids option.

       o  The default kernel to profile (the booted kernel).

SEE ALSO

       Introduction: prof_intro(1)

       pdtostd(1), pfm(7), prof(1), runon(1), psrinfo(1),
       sysconfig(8), autosysconfig(8), processor_sets(4)

       Programmer's Guide


