PCTR(4)

NAME

     pctr - driver for CPU performance counters

SYNOPSIS

     pseudo-device pctr 1

DESCRIPTION

     The pctr device provides access to the performance counters on Intel
     brand processors, and to the TSC on others.

     Intel processors have two 40-bit performance counters which can be
     programmed to count events such as cache misses, branch target buffer
     hits, TLB misses, dual-issues, interrupts, pipeline flushes, and more.

     There is one ioctl call to read the status of all counters, and one
     ioctl call to program the function of each counter.  All require the
     following includes:

           #include <sys/types.h>
           #include <machine/cpu.h>
           #include <machine/pctr.h>

     The current state of all counters can be read with the PCIOCRD ioctl,
     which takes an argument of type struct pctrst:

           #define PCTR_NUM 2
           struct pctrst {
                   u_int pctr_fn[PCTR_NUM];
                   pctrval pctr_tsc;
                   pctrval pctr_hwc[PCTR_NUM];
                   pctrval pctr_idl;
           };

     In this structure, pctr_fn contains the functions of the two counters,
     as previously set by the PCIOCS0 and PCIOCS1 ioctls (see below).
     pctr_hwc contains the actual value of the two hardware counters.
     pctr_tsc is a free-running, 64-bit cycle counter.  Finally, pctr_idl is
     a 64-bit count of idle-loop iterations.

     The functions of the two counters can be programmed with ioctls PCIOCS0
     and PCIOCS1, which require a writable file descriptor and take an
     argument of type unsigned int.  The meaning of this integer is
     dependent on the particular CPU.

   Time stamp counter
     The time stamp counter is available on all machines with Pentium and
     Pentium Pro counters, as well as on some 486s and non-Intel CPUs.  It
     is set to zero at boot time, and then increments with each cycle.
     Because the counter is 64 bits wide, it does not overflow in practice.

     The time stamp counter can be read directly from user-mode using the
     rdtsc() macro, which returns a 64-bit value of type pctrval.  The
     following example illustrates a simple use of rdtsc to measure the
     execution time of a hypothetical subroutine called functionx():

           void
           time_functionx(void)
           {
                   pctrval tsc;

                   tsc = rdtsc();
                   functionx();
                   tsc = rdtsc() - tsc;
                   printf("Functionx took %qd cycles.\n", tsc);
           }

     The value of the time stamp counter is also returned by the PCIOCRD
     ioctl, so that one can get an exact timestamp on readings of the
     hardware event counters.

   Pentium counters
     The Pentium counters are programmed with a 9-bit function.  The top
     three bits contain the following flags:

     P5CTR_K  Enables counting of events that occur in kernel mode.

     P5CTR_U  Enables counting of events that occur in user mode.  You must
              set at least one of P5CTR_U and P5CTR_K to count anything.

     P5CTR_C  When this flag is set, the counter attempts to count the
              number of cycles spent servicing a particular event, rather
              than simply the number of occurrences of that event.

     The bottom 6 bits set the particular event counted.  Here is the event
     type of each permissible value for the bottom 6 bits of the counter
     function:


           0x00  Data read
           0x01  Data write
           0x02  Data TLB miss
           0x03  Data read miss
           0x04  Data write miss
           0x05  Write (hit) to M or E state lines
           0x06  Data cache lines written back
           0x07  Data cache snoops
           0x08  Data cache snoop hits
           0x09  Memory accesses in both pipes
           0x0a  Bank conflicts
           0x0b  Misaligned data memory references
           0x0c  Code read
           0x0d  Code TLB miss
           0x0e  Code cache miss
           0x0f  Any segment register load
           0x12  Branches
           0x13  BTB hits
           0x14  Taken branch or BTB hit
           0x15  Pipeline flushes
           0x16  Instructions executed
           0x17  Instructions executed in the V-pipe
           0x18  Bus utilization (clocks)
           0x19  Pipeline stalled by write backup
           0x1a  Pipeline stalled by data memory read
           0x1b  Pipeline stalled by write to E or M line
           0x1c  Locked bus cycle
           0x1d  I/O read or write cycle
           0x1e  Non-cacheable memory references
           0x1f  AGI (Address Generation Interlock)
           0x22  Floating-point operations
           0x23  Breakpoint 0 match
           0x24  Breakpoint 1 match
           0x25  Breakpoint 2 match
           0x26  Breakpoint 3 match
           0x27  Hardware interrupts
           0x28  Data read or data write
           0x29  Data read miss or data write miss

   Pentium Pro counters
     The Pentium Pro counter functions contain several parts.  The most
     significant byte (an 8-bit integer shifted left by P6CTR_CM_SHIFT)
     contains a counter mask.  If non-zero, this sets a threshold for the
     number of times an event must occur in one cycle for the counter to be
     incremented.  The counter mask can therefore be used to count cycles in
     which an event occurs at least some number of times.  The next byte
     contains several flags:

     P6CTR_U   Enables counting of events that occur in user mode.

     P6CTR_K   Enables counting of events that occur in kernel mode.  You
               must set at least one of P6CTR_K and P6CTR_U to count
               anything.

     P6CTR_E   Counts edges rather than cycles.  For some functions this
               allows you to get an estimate of the number of events rather
               than the number of cycles occupied by those events.

     P6CTR_EN  Enable counters.  This bit must be set in the function for
               counter 0 in order for either of the counters to be enabled.
               This bit should probably be set in counter 1 as well.

     P6CTR_I   Inverts the sense of the counter mask.  When this bit is set,
               the counter only increments on cycles in which there are no
               more events than specified in the counter mask.

     The next byte, also known as the unit mask, contains flags specific to
     the event being counted.  For events dealing with the L2 cache, the
     following flags are valid:

     P6CTR_UM_M  Count events involving modified cache lines.

     P6CTR_UM_E  Count events involving exclusive cache lines.

     P6CTR_UM_S  Count events involving shared cache lines.

     P6CTR_UM_I  Count events involving invalid cache lines.

     To measure all L2 cache activity, all these bits should be set.  They
     can be set with the macro P6CTR_UM_MESI, which contains the bitwise OR
     of all of the above.

     For event types dealing with bus transactions, there is another flag
     that can be set in the unit mask:

     P6CTR_UM_A  Count all appropriate bus events, not just those initiated
                 by the processor.

     Finally, the least significant byte of the counter function is the
     event type to count.  The following values are available:

     0x03 LD_BLOCKS
           Number of store buffer blocks.
     0x04 SB_DRAINS
           Number of store buffer drain cycles.
     0x05 MISALIGN_MEM_REF
           Number of misaligned data memory references.
     0x06 SEGMENT_REG_LOADS
           Number of segment register loads.
     0x10 FP_COMP_OPS_EXE (ctr0 only)
           Number of computational floating-point operations executed.
     0x11 FP_ASSIST (ctr1 only)
           Number of floating-point exception cases handled by microcode.
     0x12 MUL (ctr1 only)
           Number of multiplies.
     0x13 DIV (ctr1 only)
           Number of divides.
     0x14 CYCLES_DIV_BUSY (ctr0 only)
           Number of cycles during which the divider is busy.
     0x21 L2_ADS
           Number of L2 address strobes.
     0x22 L2_DBUS_BUSY
           Number of cycles during which the data bus was busy.
     0x23 L2_DBUS_BUSY_RD
           Number of cycles during which the data bus was busy transferring
           data from L2 to the processor.
     0x24 L2_LINES_IN
           Number of lines allocated in the L2.
     0x25 L2_M_LINES_INM
           Number of modified lines allocated in the L2.
     0x26 L2_LINES_OUT
           Number of lines removed from the L2 for any reason.
     0x27 L2_M_LINES_OUTM
           Number of modified lines removed from the L2 for any reason.
     0x28 L2_IFETCH/mesi
           Number of L2 instruction fetches.
     0x29 L2_LD/mesi
           Number of L2 data loads.
     0x2a L2_ST/mesi
           Number of L2 data stores.
     0x2e L2_RQSTS/mesi
           Number of L2 requests.
     0x43 DATA_MEM_REFS
           All memory references, both cacheable and noncacheable.
     0x45 DCU_LINES_IN
           Total lines allocated in the DCU.
     0x46 DCU_M_LINES_IN
           Number of M state lines allocated in the DCU.
     0x47 DCU_M_LINES_OUT
           Number of M state lines evicted from the DCU.  This includes
           evictions via snoop HITM, intervention or replacement.
     0x48 DCU_MISS_OUTSTANDING
           Weighted number of cycles while a DCU miss is outstanding.
     0x60 BUS_REQ_OUTSTANDING
           Number of bus requests outstanding.
     0x61 BUS_BNR_DRV
           Number of bus clock cycles during which the processor is driving
           the BNR pin.
     0x62 BUS_DRDY_CLOCKS/a
           Number of clocks during which DRDY is asserted.
     0x63 BUS_LOCK_CLOCKS/a
           Number of clocks during which LOCK is asserted.
     0x64 BUS_DATA_RCV
           Number of bus clock cycles during which the processor is
           receiving data.
     0x65 BUS_TRAN_BRD/a
           Number of burst read transactions.
     0x66 BUS_TRAN_RFO/a
           Number of read for ownership transactions.
     0x67 BUS_TRANS_WB/a
           Number of write back transactions.
     0x68 BUS_TRAN_IFETCH/a
           Number of instruction fetch transactions.
     0x69 BUS_TRAN_INVAL/a
           Number of invalidate transactions.
     0x6a BUS_TRAN_PWR/a
           Number of partial write transactions.
     0x6b BUS_TRANS_P/a
           Number of partial transactions.
     0x6c BUS_TRANS_IO/a
           Number of I/O transactions.
     0x6d BUS_TRAN_DEF/a
           Number of deferred transactions.
     0x6e BUS_TRAN_BURST/a
           Number of burst transactions.
     0x6f BUS_TRAN_MEM/a
           Number of memory transactions.
     0x70 BUS_TRAN_ANY/a
           Number of all transactions.
     0x79 CPU_CLK_UNHALTED
           Number of cycles during which the processor is not halted.
     0x7a BUS_HIT_DRV
           Number of bus clock cycles during which the processor is driving
           the HIT pin.
     0x7b BUS_HITM_DRV
           Number of bus clock cycles during which the processor is driving
           the HITM pin.
     0x7e BUS_SNOOP_STALL
           Number of clock cycles during which the bus is snoop stalled.
     0x80 IFU_IFETCH
           Number of instruction fetches, both cacheable and noncacheable.
     0x81 IFU_IFETCH_MISS
           Number of instruction fetch misses.
     0x85 ITLB_MISS
           Number of ITLB misses.
     0x86 IFU_MEM_STALL
           Number of cycles that the instruction fetch pipe stage is
           stalled, including cache misses, ITLB misses, ITLB faults, and
           victim cache evictions.
     0x87 ILD_STALL
           Number of cycles that the instruction length decoder is stalled.
     0xa2 RESOURCE_STALLS
           Number of cycles during which there are resource-related stalls.
     0xc0 INST_RETIRED
           Number of instructions retired.
     0xc1 FLOPS (ctr0 only)
           Number of computational floating-point operations retired.
     0xc2 UOPS_RETIRED
           Number of UOPs retired.
     0xc4 BR_INST_RETIRED
           Number of branch instructions retired.
     0xc5 BR_MISS_PRED_RETIRED
           Number of mispredicted branches retired.
     0xc6 CYCLES_INT_MASKED
           Number of processor cycles for which interrupts are disabled.
     0xc7 CYCLES_INT_PENDING_AND_MASKED
           Number of processor cycles for which interrupts are disabled and
           interrupts are pending.
     0xc8 HW_INT_RX
           Number of hardware interrupts received.
     0xc9 BR_TAKEN_RETIRED
           Number of taken branches retired.
     0xca BR_MISS_PRED_TAKEN_RET
           Number of taken mispredicted branches retired.
     0xd0 INST_DECODER
           Number of instructions decoded.
     0xd2 PARTIAL_RAT_STALLS
           Number of cycles or events for partial stalls.
     0xe0 BR_INST_DECODED
           Number of branch instructions decoded.
     0xe2 BTB_MISSES
           Number of branches that miss the BTB.
     0xe4 BR_BOGUS
           Number of bogus branches.
     0xe6 BACLEARS
           Number of times BACLEAR is asserted.

     Events marked /mesi require the P6CTR_UM_[MESI] bits in the unit mask.
     Events marked /a can take the P6CTR_UM_A bit.

     Unlike the Pentium counters, the Pentium Pro counters can be read
     directly from user-mode without need to invoke the kernel.  The macro
     rdpmc(ctr) takes 0 or 1 as an argument to specify a counter, and
     returns that counter's 40-bit value (which will be of type pctrval).
     This is generally preferable to making a system call as it introduces
     less distortion in measurements.  However, you should be aware of the
     possibility of an interrupt between invocations of rdpmc() and/or
     rdtsc().

FILES

     /dev/pctr

ERRORS

     [ENODEV]  An attempt was made to set the counter functions on a CPU
               that does not support counters.

     [EINVAL]  An invalid counter function was provided as an argument to
               the PCIOCS0 or PCIOCS1 ioctl.

     [EPERM]   An attempt was made to set the counter functions, but the
               device was not open for writing.

SEE ALSO

     pctr(1), ioctl(2)

HISTORY

     A pctr device first appeared in OpenBSD 2.0.

AUTHORS

     The pctr device was written by David Mazieres.

BUGS

     Not all counter functions are completely accurate.  Some of the
     functions don't seem to make any sense at all.

OpenBSD 3.6                     August 15, 1996