pfm - Tru64

pfm(7)

NAME

       pfm - The on-chip performance counter pseudo-device

SYNOPSIS

       pseudo-device pfm

DESCRIPTION

       The pfm pseudo-device is the interface to Alpha implementation-specific
       on-chip performance counters. A set of ioctl calls forms the
       interface, as defined in the <sys/pfcntr.h> header file.

       The kernel in use must have the pfm pseudo-device configured into it.
       To do this, use one of the following methods:

       Add the following line to the kernel configuration file and rebuild
       the kernel. Do not use this method if CPU hot-swap is supported by
       the system, because it does not allow pfm to be easily unconfigured
       as required for a hot-swap; instead, use the sysconfig method below.
            pseudo-device       pfm

       Enter the following command from the root account. Do not configure
       pfm if CPU hot-swap is anticipated.
            # sysconfig -c pfm

              If pfm is configured, the CPU hot-swap procedure requires that
              it be unconfigured, using the following command, before any
              CPU is swapped.
                   # sysconfig -u pfm

              The autosysconfig program can be used to automatically load
              the configurable pfm device at each system startup.

EV4 INTERFACE DESCRIPTION

       The EV4 implementations (21064, 21064A, 21066, and 21068) have two
       counters, each of which can be independently programmed to count
       certain internal or external events. Each counter interrupts the
       system when a certain number of the selected events have been
       counted. Any one of the following three actions can happen at each
       interrupt (tick):

       Counters (PFM_COUNTERS)
       IPL histogramming (PFM_IPL)
       User or kernel PC profiling (PFM_PROFILING)

       These  values  are  defined  in  <sys/pfcntr.h> and can be
       selected orthogonally  by  bitwise  ORing  the  selections
       together  and passing the result to the PCNTSETITEMS ioctl
       request.

       If counters are enabled,  the  interrupt  count  for  this
       event  is  incremented.   This records the number of times
       each event has happened, in  multiples  of  the  interrupt
       frequency  selected (PCNTSETMUX). Note that the driver can
       only count the interrupts generated; no direct  access  to
       the EV4 on-chip counter values is provided.
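
       Because the driver reports interrupt counts rather than raw counter
       values, an event total has to be estimated by multiplying the
       interrupt count by the interrupt frequency chosen with PCNTSETMUX.
       The following is a minimal sketch of that arithmetic only. The
       function names and the freq_bit parameter are hypothetical and are
       not part of <sys/pfcntr.h>; the frequencies are the ones this page
       gives for EV4 (2^12 or 2^16 events for counter 0, 2^8 or 2^12 events
       for counter 1).

```c
#include <stdint.h>

/* Hypothetical helper, not part of <sys/pfcntr.h>.
 * counter:  0 or 1 (the two EV4 on-chip counters)
 * freq_bit: nonzero if the frequency-control bit for that counter is set
 * Returns the number of events per interrupt, or 0 for an invalid counter. */
uint64_t ev4_events_per_interrupt(int counter, int freq_bit)
{
    switch (counter) {
    case 0:
        return freq_bit ? (UINT64_C(1) << 12) : (UINT64_C(1) << 16);
    case 1:
        return freq_bit ? (UINT64_C(1) << 8) : (UINT64_C(1) << 12);
    default:
        return 0;
    }
}

/* Estimate the event total from the interrupt count kept by the driver. */
uint64_t ev4_estimate_events(int counter, int freq_bit, uint64_t interrupts)
{
    return interrupts * ev4_events_per_interrupt(counter, freq_bit);
}
```

       Events counted since the most recent interrupt are not visible to
       the driver, so the estimate is a lower bound on the true total.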

       If IPL histogramming is enabled, the appropriate entry in the IPL
       array is incremented. The entries are:

       0-5  Refer to IPL0-IPL5.
       6    Unused. (IPL6 is the level of the performance counter
            interrupts.)
       7    Counts "idle" ticks (IPL = 0 and current_thread = idle_thread).
       8    Counts user mode ticks.
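
       When printing an IPL histogram it can help to turn an entry index
       into a label. The sketch below encodes only the index meanings
       listed above; the function name and the PFM_IPL_ENTRIES macro are
       illustrative, and the actual layout of struct pfipls should be taken
       from <sys/pfcntr.h>.

```c
#include <stddef.h>

#define PFM_IPL_ENTRIES 9   /* indices 0..8; illustrative name */

/* Return a label for an IPL histogram index, or NULL if out of range. */
const char *pfm_ipl_label(size_t index)
{
    static const char *const labels[PFM_IPL_ENTRIES] = {
        "IPL0", "IPL1", "IPL2", "IPL3", "IPL4", "IPL5",
        "unused",   /* IPL6 is the performance counter interrupt level */
        "idle",     /* IPL = 0 and current_thread = idle_thread */
        "user"      /* user mode ticks */
    };
    return index < PFM_IPL_ENTRIES ? labels[index] : NULL;
}
```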

       If profiling is enabled, a PC sample is added to the profile
       histogram if the mode is correct (kernel or user).

       Each CPU in a multiprocessor platform has separate counters, and the
       device can be opened in three different ways:

       PCNTOPENONE
              Opens and collects data on only the CPU that the program is
              running on.

       PCNTOPENEACH
              Opens all CPUs but keeps data for each one separately.

       PCNTOPENALL
              Opens all CPUs, aggregating the data for all CPUs into one
              collection.

       These values are defined in <sys/pfcntr.h> and are bitwise ORed into
       the mode passed to the device open call. Note that if PCNTOPENONE is
       selected, the opening thread/process must be bound to that
       processor; otherwise, the open will fail. It must also remain bound
       to that processor for the duration of the driver usage or extremely
       unpredictable results will occur.

       The following ioctl calls apply to the performance counter
       pseudo-device. Note that most of the EV4 ioctls can also be used on
       EV5, EV6, and EV7:

       PCNTDISABLE
              Disables performance counter interrupts on the CPU. Takes no
              arguments.

       Enables performance counter interrupts on the CPU. Takes no
       arguments.

       PCNTSETMUX
              Selects the statistics to be counted by each performance
              counter and the interrupt frequency. Takes a pointer to a
              struct iccsr that contains the MUX register values desired.
              The fields in this register are:

              Controls the interrupt frequency of performance counter 0. If
              set, interrupt frequency is every 2^12 events. If clear,
              interrupt frequency is every 2^16 events.

              Controls the interrupt frequency of performance counter 1. If
              set, interrupt frequency is every 2^8 events. If clear,
              interrupt frequency is every 2^12 events.

              Selects the event counted by counter 0. One of: PF_ISSUES,
              PF_PIPEDRY, PF_LOADI, PF_PIPEFROZEN, PF_BRANCHI, PF_CYCLES,
              PF_PALMODE, PF_NONISSUES, PF_EXTPIN0

              Selects the event counted by counter 1. One of: PF_DCACHE,
              PF_ICACHE, PF_DUAL, PF_BRANCHMISS, PF_FPINST, PF_INTOPS,
              PF_STOREI, PF_EXTPIN1

              Contains two bits, each of which disables data collection on
              the specified counter. For example, set to 2 to disable
              counter 1 and enable counter 0. Cannot be set to 3 (which
              disables both counters, causing PCNTSETMUX to return EINVAL).

              Do not set these fields. Must be zero.

       PCNTSETITEMS
              Selects the data items to be collected at each tick:

              Counters (PFM_COUNTERS)
              IPL histogramming (PFM_IPL)
              User or kernel PC profiling (PFM_PROFILING - see
                PCNTSETUADDR, PCNTSETURANGE, PCNTSETKADDR, and
                PCNTSETKRANGE)

              These values are defined in <sys/pfcntr.h> and can be
              selected orthogonally by bitwise ORing the selections
              together into the integer argument. If no items are selected,
              returns EINVAL.

       PCNTLOGALL
              Sets the on-chip counters to count all system activity. Takes
              no arguments and returns no errors.

       Sets the on-chip counters to count only those threads/processes with
       the PCB_PME_BIT set in their PCBs, and sets the PCB_PME_BIT for this
       process. This bit is inherited across fork/exec, setting it for all
       children. Takes no arguments and returns no errors.

       Clears the PCB_PME_BIT in the PCB of the current process. Takes no
       arguments and returns no errors.

       Clears the driver's internal counters appropriate to the actions
       selected. If PFM_COUNTERS is enabled, the interrupt counters and
       cycle counter value are reset. If PFM_IPL is enabled, the IPL
       histogram is reset. If neither is enabled (PFM_PROFILING only),
       returns EINVAL and nothing is cleared. Takes no arguments.

       PCNTGETCNT
              Returns the driver's counter values and the pcc value(s).
              Takes a pointer to an array of struct pfcntrs; the array is
              filled in with the values. Sample usage of this ioctl is:

                   struct pfcntrs cntrs[NUM_OF_CPUS];
                   struct pfcntrs *pfcntrs = cntrs;
                   ioctl (fd, PCNTGETCNT, &pfcntrs);

              If the driver is opened in mode PCNTOPENEACH, the underlying
              array must be big enough to hold all of the data for each
              CPU; otherwise, EFAULT is returned. If the driver is opened
              in mode PCNTOPENONE or PCNTOPENALL, the array can be one
              element. If PFM_COUNTERS is not enabled, returns EINVAL.

       PCNTGETRSIZE
              Returns the number of bytes of data available to read for
              getting the PC profiling samples. By default this will be
              equal to one fourth of the address range being profiled. (By
              default, profiling data is kept as one bucket per four
              instructions, which corresponds to a default profiling stride
              of 4 instructions per sample count.) If the driver is opened
              in mode PCNTOPENEACH, this number of bytes will be multiplied
              by the number of CPUs.

              To set the profiling address range and stride (and select
              user or kernel profiling), use the PCNTSETURANGE or
              PCNTSETKRANGE ioctl, respectively. To set the address range
              without changing the stride, you can also use the
              PCNTSETUADDR or PCNTSETKADDR ioctl.

              The PCNTGETRSIZE ioctl takes a pointer to a long and returns
              no errors. The returned value will be 0 if profiling is not
              currently selected or if the address range and mode have not
              been specified.

       PCNTGETIPLHIS
              Returns the current IPL histogram(s). Takes a pointer to an
              array of struct pfipls; the array is filled in with the
              values. Sample usage of this ioctl is:

                   struct pfipls ipls[NUM_OF_CPUS];
                   struct pfipls *pfipls = ipls;
                   ioctl (fd, PCNTGETIPLHIS, &pfipls);

              If  the  driver is opened in mode PCNTOPENEACH, the
              underlying array must be big enough to hold all  of
              the  data  for each CPU. If the underlying array is
              not big enough, EFAULT might be returned  or  other
              data in the program might be overwritten.

              If the driver is opened in mode PCNTOPENONE or PCNTOPENALL,
              the array can be one element. If PFM_IPL is not enabled,
              returns EINVAL.

       PCNTCALLER
              If kernel mode profiling is turned on (with PCNTSETKADDR or
              PCNTSETKRANGE), directs the profiler to collect data on the
              caller of certain system utility routines (for example,
              bcopy, bzero, simple_lock). If kernel mode profiling is not
              turned on, returns EINVAL. (See also the descriptions of
              PCNTSETKADDR and PCNTSETKRANGE for information about their
              use in PCNTCALLER mode.)

       PCNTSETKADDR
              Sets the kernel address range to profile and turns on kernel
              mode PC profiling. If the device is not open for profiling,
              returns EINVAL. If memory cannot be obtained for the sample
              data, returns ENOMEM.

              If  PCNTCALLER  kernel  profiling  mode is engaged,
              specifies an additional address  range  to  collect
              profiling  data on the caller of a routine, instead
              of the  routine  itself.  Takes  a  start  and  end
              address  range.  Up  to 4 additional address ranges
              may  be  added;  additional  attempts  will  return
              ENOSPC. If the addresses are out of range of kernel
              text, not aligned, or  otherwise  invalid,  returns
              EFAULT.

              Note that PCNTSETKRANGE performs the same functions as
              PCNTSETKADDR and, in addition, lets you set the profiling
              stride.

       PCNTSETKRANGE
              Sets the kernel address range to profile and sets the profile
              stride (the number of consecutive instructions grouped
              together for each sample count). The stride must be zero or a
              power of two (for example, 0, 1, 2, 4, 8). A zero stride
              means there should be only one counter for the whole address
              range. This ioctl also turns on kernel mode PC profiling. If
              the device is not open for profiling, returns EINVAL. If
              memory cannot be obtained for the sample data, returns
              ENOMEM.

              If PCNTCALLER kernel profiling mode is engaged, specifies an
              additional address range to collect profiling data on the
              caller of a routine, instead of the routine itself. Takes a
              start and end address range, and ignores the stride. Up to 4
              additional address ranges may be added; additional attempts
              will return ENOSPC. If the addresses are out of range of
              kernel text, not aligned, or otherwise invalid, returns
              EFAULT.

       PCNTSETUADDR
              Sets the user address range to profile and turns on user mode
              PC profiling. If the device is not open for profiling,
              returns EINVAL. If memory cannot be obtained for the sample
              data, returns ENOMEM. Note that PCNTSETURANGE performs the
              same functions as PCNTSETUADDR and, in addition, lets you set
              the profiling stride.

       PCNTSETURANGE
              Sets the user address range to profile and sets the profile
              stride (the number of consecutive instructions grouped
              together for each sample count). The stride must be zero or a
              power of two (for example, 0, 1, 2, 4, 8). A zero stride
              means there should be only one counter for the whole address
              range. This ioctl also turns on user mode PC profiling. If
              the device is not open for profiling, returns EINVAL. If
              memory cannot be obtained for the sample data, returns
              ENOMEM.
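
       A caller may want to check a stride before passing it to
       PCNTSETKRANGE or PCNTSETURANGE, and to predict what PCNTGETRSIZE
       will report for the default stride. The following is a minimal
       sketch under stated assumptions: the function names are
       hypothetical, the address range is assumed to be half-open
       (start inclusive, end exclusive), and only the default case
       described above (one fourth of the range, multiplied by the number
       of CPUs for PCNTOPENEACH) is modeled.

```c
#include <stdint.h>

/* Nonzero if stride is acceptable per the text: zero or a power of two.
 * For stride == 0 the unsigned subtraction wraps, and 0 & x is 0. */
int pfm_stride_ok(uint64_t stride)
{
    return (stride & (stride - 1)) == 0;
}

/* Predicted PCNTGETRSIZE result for the default stride.
 * ncpus is the number of CPUs for PCNTOPENEACH, or 1 for PCNTOPENONE
 * and PCNTOPENALL. Returns 0 for an empty or reversed range. */
uint64_t pfm_default_rsize(uint64_t start, uint64_t end, unsigned ncpus)
{
    if (end <= start)
        return 0;
    return (end - start) / 4 * ncpus;
}
```

       The value returned by PCNTGETRSIZE itself remains authoritative;
       the prediction is only useful as a sanity check.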

       Only one process can have the pfm device open at any point
       in  time.  If  the device is opened with PCNTOPENONE, only
       the specified CPU  is  considered  open;  subsequent  open
       attempts  will  return EBUSY. If the device is opened with
       PCNTOPENALL or PCNTOPENEACH, all CPUs must  be  available;
       otherwise, returns EBUSY.

       EBUSY will also be returned if another tool is using the performance
       counters (or has used them but has not restored the default
       performance counter interrupt handler). In this case, if you are
       sure no other users are using the performance counters, re-execute
       the open call with superuser privilege. This will reset the busy
       status and proceed to use the counters.

       It is sufficient to open the device read-only. Opening the
       device will disable interrupts (PCNTDISABLE) and  log  all
       system  activity  (PCNTLOGALL), generating simple counters
       only. The counters are not  cleared.  Closing  the  device
       automatically  disables  interrupts and resets the service
       routines (PCNTDISABLE).

EV4 DETAILED STAT DESCRIPTIONS

       Following are more detailed descriptions of each of the events that
       can be counted by the two on-chip counters associated with the EV4
       implementations. For more information, consult the 21064 chip
       specification.

       Counter 0:

       PF_ISSUES
              This counter is incremented by one for each cycle in which
              two instructions are issued and is incremented by 1/2 for
              each cycle in which one instruction is issued. The number of
              cycles in which one instruction is issued can be found by
              using the Dual Issues field and the equation S = (I - D) * 2,
              where S = Single Issues, D = Dual Issues, and I = Issues.

       PF_PIPEDRY
              This counter is incremented by one for each cycle in which
              nothing is issued due to the lack of valid instruction stream
              data. The causes could be instruction cache refill operations
              (due to normal sequential operation or delays while fetching
              the target of a branch) or delays caused by the draining of
              the pipeline in response to an exception.

       PF_LOADI
              This counter is incremented for each load instruction. Note:
              If a load misses in the primary data cache, the replay of the
              instruction will cause the load counter to be incremented
              again.

       PF_PIPEFROZEN
              This counter is incremented for each cycle in which nothing
              is issued due to a resource conflict within the pipeline.
              Examples are:

              Not all source and destination registers are available
              A load miss or write buffer overflow occurs
              A conditional branch cannot be issued in the cycle following
                a jump
              Memory Barrier instruction processing can cause the pipe to
                freeze

       PF_BRANCHI
              This counter is incremented for each branch instruction.

       PF_CYCLES
              This counter is incremented for each cycle.

       PF_PALMODE
              This counter is incremented for each cycle spent in PALmode.

       PF_NONISSUES
              This counter is incremented by one for each cycle in which no
              instructions are issued and is incremented by 1/2 for each
              cycle in which only one instruction is issued. This counter
              is the inverse of the Issues counter: Non-issues = 1 -
              Issues.

       PF_EXTPIN0
              This counter is incremented for each external event supplied
              to external pin 0. On the DEC 3000/500 and DEC 3000/400, this
              pin is connected to logic that indicates external cache
              misses with victims. A victim is a data block that must be
              written back to main memory before it is reused.
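
       The single-issue equation given for the Issues counter can be
       applied directly to event totals. The sketch below assumes that I
       and D are event totals for Issues (selected on counter 0) and Dual
       Issues (selected on counter 1) gathered over the same interval; the
       function name is hypothetical. Because the driver supplies only
       interrupt counts, both inputs are themselves estimates.

```c
#include <stdint.h>

/* S = (I - D) * 2, where I = Issues and D = Dual Issues.
 * Returns 0 if the inputs are inconsistent (D > I), which can happen
 * when both totals are coarse estimates. */
uint64_t ev4_single_issue_cycles(uint64_t issues, uint64_t dual_issues)
{
    if (dual_issues > issues)
        return 0;
    return (issues - dual_issues) * 2;
}
```

       For example, 150 issues with 100 dual-issue cycles gives 100
       single-issue cycles: 100 cycles contribute 1 each and 100 cycles
       contribute 1/2 each, for a total of 150.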

       Counter 1:

       PF_DCACHE
              This counter is incremented for each primary data cache miss.
              Note: this counter actually is incremented each time a
              primary data cache probe does not complete in one cycle. This
              includes all misses, but also includes hits that are stalled
              for other reasons such as bus traffic holding previous misses
              pending.

       PF_ICACHE
              This counter is incremented for each primary instruction
              cache miss.

       PF_DUAL
              This counter is incremented for each cycle in which two
              instructions are dual-issued.

       PF_BRANCHMISS
              This counter is incremented for each incorrectly predicted
              branch.

       PF_FPINST
              This counter is incremented for each floating-point operate
              instruction. The floating-point operate instructions do not
              include the floating-point load, floating-point branch, and
              floating-point store instructions.

       PF_INTOPS
              This counter is incremented for each integer operate
              instruction as well as for each Load Address and Load Address
              High instruction.

       PF_STOREI
              This counter is incremented for each store instruction.

       PF_EXTPIN1
              This counter is incremented for each external event supplied
              to external pin 1. On the DEC 3000/500 and DEC 3000/400, this
              pin is connected to logic that indicates external cache
              misses without victims.

       Most items count  the  instances  of  different  types  of
       instructions.  These  counters  are  incremented  for each
       occurrence, and they do not  give  information  about  the
       cost  of  executing  the  instruction. The Pipe Frozen/Dry
       counter increments for each frozen or dry cycle,  not  for
       each instance of pipe freeze or pipe dry.

EV5 INTERFACE DESCRIPTION

       The EV5 implementations (21164, 21164A, and 21164PC) have three
       counters, each of which can be independently programmed to count
       certain internal or external events. They operate in much the same
       way as on EV4. Most of the EV4 ioctl calls can also be used on EV5.
       Here are some descriptions for EV5-specific ioctl calls:

       Selects the events counted by all three counters. The argument is a
       bitwise OR of one event name for each counter. See <sys/pfcntr.h>
       for the identifiers for the events: PF5_MUX0_*, PF5_MUX1_*,
       PF5_MUX2_*.

       Selects the sampling interrupt frequency for all three counters. The
       argument is a bitwise OR of one frequency indicator for each
       counter. A frequency of 256 requires superuser privilege because it
       can place an extremely heavy load on the system. Only carefully
       selected rare events should be counted with such a high frequency. A
       lower frequency is usually advisable, for example:

              PF5_C0_INT_EVERY_65536
              PF5_C1_INT_EVERY_65536
              PF5_C2_INT_EVERY_16384

       Enables selected counters. (PCNT5RESTART zeroes them first.) The
       argument is the address of the pmctrs_ev5_long member of a union
       pmctrs_ev5, with the following additional field-member assignments:

              pmctrs_ev5_cpu = PMCTRS_ALL_CPUS
              pmctrs_ev5_select = any combination of PF5_SEL_COUNTER_0,
                PF5_SEL_COUNTER_1, and PF5_SEL_COUNTER_2 using a bitwise OR
                operator

       Disables selected counters.

       Clears or writes selected counters on selected CPUs. The argument is
       the address of the pmctrs_ev5_long member of a union pmctrs_ev5. See
       <sys/pfcntr.h> for more information.

       Sets contexts in which to count. The argument is a bitwise OR of
       selected PF5_CTXT_* values.

       PCNT5GETCNT
              Similar to EV4's PCNTGETCNT except that the argument is a
              pointer to an array of struct pfcntrs_ev5.

       Similar to PCNT5GETCNT except that the driver's counter values
       (i.e., the number of interrupts from each counter) are shifted left
       by the counter width. The current raw hardware counters are read and
       added to the tally.

       Reads the hardware counters from the selected CPU. The argument is
       the address of the pmctrs_ev5_long member of a union pmctrs_ev5. See
       <sys/pfcntr.h> for more information.
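
       The request that is similar to PCNT5GETCNT forms its tally by
       shifting the interrupt count left by the counter width and adding
       the raw hardware counter. The sketch below shows that arithmetic in
       isolation. The function name is hypothetical, and the counter width
       is passed in as a parameter because this page does not state it.

```c
#include <stdint.h>

/* Combine a driver interrupt count with the current raw hardware
 * counter value: (interrupts << width_bits) + raw.
 * A width of 64 or more cannot be shifted portably, so in that case
 * only the raw value is returned. */
uint64_t pfm_combine_count(uint64_t interrupts, uint64_t raw,
                           unsigned width_bits)
{
    if (width_bits >= 64)
        return raw;
    return (interrupts << width_bits) + raw;
}
```

       With a 16-bit width, 3 interrupts and a raw value of 100 combine to
       3 * 65536 + 100 = 196708 events.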

EV5 DETAILED STAT DESCRIPTIONS

       Following are more detailed descriptions of each of the events that
       can be counted by the three on-chip counters associated with the EV5
       implementations. For more information, see the 21164 or 21164PC chip
       specification.

   All EV5 Implementations (EV5, EV56, PCA56)
       Counter 0:

              This counter is incremented for each cycle. (Note that
              counter 2 also has a cycles counter.)

              This counter is incremented for each instruction.

       Counter 1:

              This counter is incremented for each cycle in which valid
              instructions are ready for issue, but none are issued because
              of a pipeline stall or because the resources they need are
              not available.

              This counter is incremented for each cycle in which some but
              not all of the maximum of four instructions are issued.

              This counter is incremented for each cycle in which no
              instructions are ready to issue.

              This counter is incremented for each time an instruction has
              to be executed again (instead of those behind it in the
              pipeline) because resources it needed were found to be
              unavailable the first time it executed.

              This counter is incremented for each cycle in which one
              instruction is issued.

              This counter is incremented for each cycle in which two
              instructions are issued.

              This counter is incremented for each cycle in which three
              instructions are issued.

              This counter is incremented for each cycle in which four
              instructions are issued.

              This counter is incremented for each branch, jump, or return
              instruction.

              This counter is incremented for each integer operation.

              This counter is incremented for each floating-point
              operation.

              This counter is incremented for each load operation.

              This counter is incremented for each store operation.

              This counter is incremented for each Instruction Cache
              access.

              This counter is incremented for each Data Cache access.

       Counter 2:

              This counter is incremented for each long pipeline stall
              (over 15 cycles).

              This counter is incremented for each PC misprediction.

              This counter is incremented for each branch misprediction.

              This counter is incremented for each instruction not found in
              either the Instruction Cache or the associated Refill Buffer.

              This counter is incremented for each Instruction Cache miss
              for which the instruction's page entry is not stored in the
              Instruction Translation Buffer.

              This counter is incremented for each load of a value that is
              not in the Data Cache.

              This counter is incremented for each Data Cache miss for
              which the data page entry is not stored in the Data
              Translation Buffer.

              This counter is incremented for each load from an address
              that misses in the Data Cache but is merged with another load
              from the same address that is already in the Missed Address
              File.

              This counter is incremented for each Data Cache miss (for a
              load) that causes the replay of a later instruction that uses
              the loaded value.

              This counter is incremented for each store that is replayed
              because the Write Buffer is full and for each load that is
              replayed because the Missed Address File is full.

              This counter is incremented for each cycle for which the
              perf_mon_h External Input pin is true.

              This counter is incremented for each cycle. (Note that
              counter 0 also has a cycles counter.)

              This counter is incremented for each stall cycle resulting
              from a Memory Barrier.

              This counter is incremented for each Locked Load instruction.

   EV5 and EV56 Implementations Only
       Counter 1:

              This counter is incremented for each Secondary Cache access
              (for either instructions or data).

              This counter is incremented for each read from the Secondary
              Cache.

              This counter is incremented for each write to the Secondary
              Cache. (Note that counter 2 also has a scachewrites counter.)

              This counter is incremented for each time a data block in the
              Secondary Cache must be written back to main memory before it
              is reused.

              This counter is incremented for each access to the optional,
              board-level Backup Cache.

              This counter is incremented for each time a data block in the
              Backup Cache must be written back to main memory before it is
              reused.

              This counter is incremented for each system request.

       Counter 2:

              This counter is incremented for each Secondary Cache miss.

              This counter is incremented for each Secondary Cache Read
              miss.

              This counter is incremented for each Secondary Cache Write
              miss.

              This counter is incremented for each Secondary Cache Shared
              Write operation.

              This counter is incremented for each Secondary Cache Write
              operation. (Note that counter 1 also has a scachewrites
              counter.)

              This counter is incremented for each miss in the optional
              board-level Backup Cache.

              This counter is incremented for each System Invalidate
              operation.

              This counter is incremented for each System Read Request.

   PCA56 Implementation Only
       Counter 1:

              This counter is incremented for each read request from the
              MBOX.

              This counter is incremented for each Dstream read request
              that hits in the Bcache.

              This counter is incremented for each Dstream read fill to the
              Bcache.

              This counter is incremented for each write request from the
              MBOX.

              This counter is incremented for each write that hits a clean
              block in the Bcache.

              This counter is incremented for each VICTIM command issued by
              the 21164PC.

              This counter is incremented each time a second READ_MISS is
              sent to the system while an earlier READ_MISS command is
              still outstanding.

       Counter 2:

              This counter is incremented for each Dstream read request
              from the MBOX.

              This counter is incremented for each read request that hits
              in the Bcache.

              This counter is incremented for each read fill to the Bcache.

              This counter is incremented for each write that hits in the
              Bcache.

              This counter is incremented for each write fill to the
              Bcache.

              This counter is incremented for each system READ or FLUSH hit
              in the Bcache.

              This counter is incremented for each system READ or FLUSH
              request.

              This counter is incremented each time a third READ_MISS is
              sent to the system while two earlier READ_MISS commands are
              still outstanding.

EV6 INTERFACE DESCRIPTION

       The EV6 implementation (21264) has two counters, each of which can
       be programmed to count certain internal or external events. They
       operate in much the same way as the counters on EV4 and EV5. Most of
       the EV4 ioctl calls can also be used on EV6. Below are some
       descriptions for EV6-specific ioctl calls. Note that the EV6
       interface should also be used on EV7 systems.

       Selects the events counted by the two counters. The argument is a
       bitwise OR of one event name for each counter. See <sys/pfcntr.h>
       for the identifiers for the events: PF6_MUX0_*, PF6_MUX1_*.

       Enables selected counters. PCNT6RESTART zeros them first.
       PCNT6ENABWRITE sets them to specified values. The argument is the
       address of the pmctrs_ev6_long member of a union pmctrs_ev6, with
       the following additional field-member assignments:

              pmctrs_ev6_cpu = PMCTRS_ALL_CPUS
              pmctrs_ev6_select = any combination of PF6_SEL_COUNTER_0 and
                PF6_SEL_COUNTER_1 using a bitwise OR operator.

       Disables selected counters.

       Clears or writes selected counters on selected CPUs. The argument is
       the address of the pmctrs_ev6_long member of a union pmctrs_ev6. See
       <sys/pfcntr.h> for more information.

       PCNT6GETCNT
              Similar to EV4's PCNTGETCNT except that the argument is a
              pointer to an array of struct pfcntrs_ev6.

       Reads the hardware counters from the selected CPU. The argument is
       the address of the pmctrs_ev6_long member of a union pmctrs_ev6. See
       <sys/pfcntr.h> for more information.

       Similar to PCNT6GETCNT except that the driver's counter values
       (i.e., the number of interrupts from each counter) are shifted left
       by the counter width. The current raw hardware counters are read and
       added to the tally.

EV6 DETAILED STAT DESCRIPTIONS

       Following are more detailed descriptions of each of the events that
       can be counted by the two on-chip counters associated with the EV6
       implementation. For more information, see the 21264 chip
       specification.

       Counter 0:

              This counter is incremented for each cycle. (Note that
              counter 1 also has a cycles counter.)

              This counter is incremented for every retired instruction.

       Counter 1:

              This counter is incremented for each cycle. (Note that
              counter 0 also has a cycles counter.)

              This counter is incremented for each retired conditional
              branch.

              This counter is incremented twice for each retired single
              dstream translation buffer (DTB) miss.

              This counter is incremented for each retired double DTB miss.

              This counter is incremented for each retired instruction
              translation buffer (ITB) miss.

              This counter is incremented for each retired unaligned trap.

              This counter is incremented for each replay trap.

EV67 AND EV7 DETAILED STAT DESCRIPTIONS [Toc] [Back]

       Following are some descriptions of events that can be
       counted by the on-chip counters associated with the EV67
       implementation. The EV67 counters may be used in two
       mutually exclusive modes: traditional aggregate and
       profile-me. The EV67 traditional aggregate counters are not
       completely independent. Any one statistic may be selected,
       or one of the following pairs may be selected: (cycles0,
       replay); (retinst, cycles1); (retinst, bcachemisses). EV7
       provides the same statistics that EV67 does.
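
       The pairing restriction above can be expressed as a small
       lookup. The following is a minimal sketch, not the driver's
       interface: the enum and the function ev67_pair_allowed are
       hypothetical names chosen for illustration, and only the
       statistic names come from the text. The check accepts a
       pair in either order, which is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical identifiers; not defined in <sys/pfcntr.h>. */
enum ev67_stat {
    EV67_CYCLES0,
    EV67_RETINST,
    EV67_CYCLES1,
    EV67_BCACHEMISSES,
    EV67_REPLAY,
    EV67_NSTATS
};

/* Returns true if the two aggregate statistics may be selected together. */
static bool
ev67_pair_allowed(enum ev67_stat a, enum ev67_stat b)
{
    /* The three pairs listed in the text. */
    static const enum ev67_stat pairs[][2] = {
        { EV67_CYCLES0, EV67_REPLAY },
        { EV67_RETINST, EV67_CYCLES1 },
        { EV67_RETINST, EV67_BCACHEMISSES },
    };
    size_t i;

    assert(a < EV67_NSTATS && b < EV67_NSTATS);
    for (i = 0; i < sizeof(pairs) / sizeof(pairs[0]); i++) {
        if ((a == pairs[i][0] && b == pairs[i][1]) ||
            (a == pairs[i][1] && b == pairs[i][0]))
            return true;
    }
    return false;
}
```

       Under this sketch, (cycles0, cycles1) and (retinst, replay)
       are rejected, because neither appears in the list of
       permitted pairs.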

       Counter 0:

              cycles0
                     This counter is incremented for each cycle.
                     (Note that counter 1 also has a cycles
                     counter.)

              retinst
                     This counter is incremented for every retired
                     instruction.

       Counter 1:

              cycles1
                     This counter is incremented for each cycle.
                     (Note that counter 0 also has a cycles
                     counter.)

              bcachemisses
                     This counter is incremented for each miss in
                     the Backup Cache.

              replay
                     This counter is incremented for each replay
                     trap.

       EV67 profile-me mode works differently from the traditional
       aggregate counters: instead of counting events,
       instructions are uniformly selected and various events are
       recorded during the execution of each selected instruction.

       The descriptions below are written from the perspective of
       a uprofile or kprofile user. For example, the *_per_ret
       statistics actually cause the pfm driver to return
       (statistic, retired) pairs, which are later processed by
       uprofile or kprofile. Similarly, the freq statistic is
       the same as the retired statistic until uprofile or
       kprofile postprocesses it.
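
       The postprocessing of a (statistic, retired) pair into a
       ratio can be sketched as follows. This is a guess at the
       arithmetic only, based on the ratio descriptions in this
       section; the function names are hypothetical, the use of
       floating point is an assumption, and returning 0 when no
       instruction retired is an assumed convention rather than
       documented uprofile or kprofile behavior.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Ratio scaled by 100, as described for most *_per_ret statistics.
 * Hypothetical helper; not part of uprofile, kprofile, or the driver.
 */
static double
pfm_per_ret_percent(uint64_t stat, uint64_t retired)
{
    if (retired == 0)
        return 0.0;             /* assumed convention */
    return (double)stat * 100.0 / (double)retired;
}

/*
 * Unscaled ratio, as described for the cycles statistic divided by
 * the retired statistic.
 */
static double
pfm_per_ret_plain(uint64_t stat, uint64_t retired)
{
    if (retired == 0)
        return 0.0;             /* assumed convention */
    return (double)stat / (double)retired;
}
```

       For example, 5 recorded events against 20 retired samples
       gives a scaled ratio of 25.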

       Any one of the following profile-me statistics may be
       selected.

              This statistic is incremented if the profiled
              execution is aborted.

              This ratio is the abort statistic scaled by 100 and
              divided by the retired statistic.

              This statistic is incremented if the profiled
              execution causes an arithmetic trap.

              This statistic is incremented if the profiled
              execution is a taken conditional branch.

              This ratio is the cbr_taken statistic scaled by 100
              and divided by the retired statistic.

              This statistic is incremented by the approximate
              number of cycles the execution was in flight.

              This ratio is the cycles statistic divided by the
              retired statistic.

              This statistic is incremented by the approximate
              retire delay of the profiled execution.

              This ratio is the delay statistic scaled by 100 and
              divided by the retired statistic.

              This statistic is incremented if the profiled
              execution causes a Dstream fault.

              This statistic is incremented if the profiled
              execution causes a DTB single miss.

              This ratio is the dtb_miss statistic scaled by 100
              and divided by the retired statistic.

              This statistic is incremented if the profiled
              execution causes a DTB double miss (3-level page
              tables).

              This statistic is incremented if the profiled
              execution causes a DTB double miss (4-level page
              tables).

              This statistic is incremented if the profiled
              execution is killed early in the pipeline.

              This ratio is the early_kill statistic scaled by 100
              and divided by the retired statistic.

              This statistic is incremented if the profiled
              execution causes a floating-point disabled trap.

              This statistic is incremented if the profiled
              execution retires. uprofile and kprofile average
              this statistic within basic blocks to provide
              instruction execution frequency estimates.

              This statistic is incremented if the profiled
              execution was not yet prefetched for the cache. Note
              that the profiled instruction may experience an
              unrecorded icache miss if the fetch is in progress.

              This ratio is the icache_miss statistic scaled by
              100 and divided by the retired statistic.

              This statistic is incremented if the profiled
              execution experienced an icache parity error.

              This statistic is incremented by the approximate
              number of bcache misses during the profiled
              execution.

              This statistic is incremented by the approximate
              number of replay traps during the profiled
              execution.

              This statistic is incremented by the approximate
              number of instruction retires during the profiled
              execution.

              This statistic is incremented if the profiled
              execution is pre-empted by an interrupt.

              This statistic is incremented if the profiled
              execution causes an istream access violation.

              This statistic is incremented if the profiled
              execution causes an ITB miss.

              This statistic is incremented if the profiled
              execution causes a load-store order trap.

              This statistic is incremented if the profiled
              execution causes an unaligned load or store.

              This statistic is incremented if the profiled
              execution stalled before it was mapped.

              This ratio is the map_stall statistic scaled by 100
              and divided by the retired statistic.

              This statistic is incremented if the profiled
              execution experiences a misprediction.

              This ratio is the mispredict statistic scaled by 100
              and divided by the retired statistic.

              This statistic is incremented if the profiled
              execution causes a reserved opcode trap.

              This statistic is incremented if the profiled
              execution causes a replay trap.

              This ratio is the replay_trap statistic scaled by
              100 and divided by the retired statistic.

              This statistic is incremented if the profiled
              execution retires.

              This statistic is incremented if the profiled
              execution causes a trap.

              This ratio is the trap statistic scaled by 100 and
              divided by the retired statistic.

              This statistic is incremented if the profiled
              execution is valid.
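
       The basic-block averaging mentioned for the frequency
       estimate can be illustrated as follows. The exact method
       used by uprofile and kprofile is not described here, so
       this sketch assumes a plain arithmetic mean of the
       per-instruction retired sample counts in one basic block;
       the function name block_freq_estimate is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: mean of the retired sample counts recorded
 * for the n instructions of a single basic block. Returns 0 for an
 * empty block (assumed convention).
 */
static double
block_freq_estimate(const uint64_t *retired, size_t n)
{
    uint64_t sum = 0;
    size_t i;

    if (n == 0)
        return 0.0;
    for (i = 0; i < n; i++)
        sum += retired[i];
    return (double)sum / (double)n;
}
```

       Averaging is plausible here because every instruction in a
       basic block executes the same number of times, so the
       per-instruction sample counts differ only by sampling
       noise.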

       For more information, see the 21264a chip specification.

NOTES

       The  notes in this section pertain only to EV4 processors.

       Disabling an EV4 counter does not actually stop it from
       interrupting the CPU. However, the interrupt will be
       dismissed without recording any data.

       Connections of the CPU's External Input pins  to  external
       events  are  platform  dependent.  The DEC 3000/400, /500,
       /600, /800 workstations have these connections; they count
       BCache Misses and BCache Misses with Victims.

       Generating statistics on a per-process basis is possible
       only on 21064 Pass 3 or later processors. Attempts to do
       this on a Pass 2 or earlier processor will gather
       statistics for the entire system.

FILES

       The device entry (character, dev# 26/0)

       Structure definitions
