NAME
pfm - The on-chip performance counter pseudo-device

SYNOPSIS
pseudo-device pfm

DESCRIPTION
The pfm pseudo-device is the interface to the Alpha implementation-specific on-chip performance counters. The interface consists of a set of ioctl calls, as defined in the <sys/pfcntr.h> header file.
The kernel in use must have the pfm pseudo-device configured into it. To do this, use one of the following methods:

1. Add the following line to the kernel configuration file and rebuild the kernel:

       pseudo-device pfm

   Do not use this method if the system supports CPU hot-swap, because a statically configured pfm cannot easily be unconfigured as a hot-swap requires; use the sysconfig method instead.

2. Enter the following command from the root account. Do not configure pfm if a CPU hot-swap is anticipated.

       # sysconfig -c pfm

If pfm is configured, the CPU hot-swap procedure requires that it be unconfigured with the following command before any CPU is swapped:

       # sysconfig -u pfm

The autosysconfig program can be used to load the configurable pfm device automatically at each system startup.
EV4 INTERFACE DESCRIPTION

The EV4 implementations (21064, 21064A, 21066, and 21068) have two counters, each of which can be independently programmed to count certain internal or external events. Each counter interrupts the system when a certain number of the selected events have been counted. Any combination of the following three actions can happen at each interrupt (tick):

- Counters (PFM_COUNTERS)
- IPL histogramming (PFM_IPL)
- User or kernel PC profiling (PFM_PROFILING)

These values are defined in <sys/pfcntr.h> and can be selected independently by ORing them together and passing the result to the PCNTSETITEMS ioctl request.
If counters are enabled, the interrupt count for this
event is incremented. This records the number of times
each event has happened, in multiples of the interrupt
frequency selected (PCNTSETMUX). Note that the driver can
only count the interrupts generated; no direct access to
the EV4 on-chip counter values is provided.
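Because the driver sees only interrupts, a total event count can be recovered only approximately, as the interrupt count multiplied by the interrupt period selected with PCNTSETMUX. The following C sketch assumes the EV4 periods given in the PCNTSETMUX description (2^12 or 2^16 events for counter 0, 2^8 or 2^12 for counter 1); the function and its parameters are hypothetical and are not part of <sys/pfcntr.h>.

```c
#include <stdint.h>

/*
 * Hypothetical helper: approximate EV4 event total from the number of
 * interrupts the pfm driver recorded for one counter.
 *   counter      : 0 or 1
 *   freq_bit_set : nonzero if that counter's PCNTSETMUX frequency field is set
 * Events counted since the last interrupt (less than one period) are not
 * visible to the driver, so the result is approximate.
 */
uint64_t ev4_estimated_events(int counter, int freq_bit_set, uint64_t interrupts)
{
    unsigned shift;

    if (counter == 0)
        shift = freq_bit_set ? 12 : 16;   /* every 2^12 or 2^16 events */
    else
        shift = freq_bit_set ? 8 : 12;    /* every 2^8 or 2^12 events */

    return interrupts << shift;
}
```

For example, 3 interrupts from counter 0 with its frequency field clear correspond to roughly 3 x 65536 events.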
If IPL histogramming is enabled, the appropriate entry in the IPL array is incremented. The entries are:

0-5   IPL0 through IPL5
6     Unused (IPL6 is the level of the performance counter interrupts)
7     "Idle" ticks (IPL = 0 and current_thread = idle_thread)
8     User mode ticks
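The histogram layout above can be restated as code. This C sketch is an illustration only: the function name, its inputs, and the enum are hypothetical, and the driver's real classification logic is not documented here.

```c
#include <stdbool.h>

/* Slots of the 9-entry EV4 IPL histogram described above. */
enum {
    IPL_HIST_UNUSED = 6,   /* IPL6: level of the counter interrupt itself */
    IPL_HIST_IDLE   = 7,   /* IPL 0 while the idle thread runs */
    IPL_HIST_USER   = 8,   /* user-mode ticks */
    IPL_HIST_SIZE   = 9
};

/*
 * Hypothetical classification of one tick. Returns the histogram slot,
 * or -1 for an IPL at which the counter interrupt cannot be taken.
 */
int ipl_hist_slot(int ipl, bool user_mode, bool idle_thread)
{
    if (user_mode)
        return IPL_HIST_USER;
    if (ipl < 0 || ipl > 5)
        return -1;                 /* IPL6 and above mask the interrupt */
    if (ipl == 0 && idle_thread)
        return IPL_HIST_IDLE;
    return ipl;                    /* slots 0-5: IPL0-IPL5 */
}
```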
If profiling is enabled, a PC sample is added to the profile
histogram if the mode is correct (kernel or user).
Each CPU in a multiprocessor platform has separate counters, and the device can be opened in three different ways:

- PCNTOPENONE opens and collects data on only the CPU that the program is running on.
- PCNTOPENEACH opens all CPUs but keeps the data for each CPU separately.
- PCNTOPENALL opens all CPUs and aggregates the data for all CPUs into one collection.
These values are defined in <sys/pfcntr.h> and are bitwise
ORed into the mode passed to the device open call. Note
that if PCNTOPENONE is selected, the opening thread/process
must be bound to that processor; otherwise, the open
will fail. It must also remain bound to that processor for as long as it uses the driver; otherwise, the results are unpredictable.
The following ioctl calls apply to the performance counter pseudo-device. Note that most of the EV4 ioctl calls can also be used on EV5, EV6, and EV7.

PCNTDISABLE: Disables performance counter interrupts on the CPU. Takes no arguments.

Enables performance counter interrupts on the CPU. Takes no arguments.

PCNTSETMUX: Selects the statistics to be counted by each performance counter and the interrupt frequency. Takes a pointer to a struct iccsr that contains the desired MUX register values. The fields in this register are:

  - Controls the interrupt frequency of performance counter 0. If set, the interrupt frequency is every 2^12 events; if clear, every 2^16 events.
  - Controls the interrupt frequency of performance counter 1. If set, the interrupt frequency is every 2^8 events; if clear, every 2^12 events.
  - Selects the event counted by counter 0. One of: PF_ISSUES, PF_PIPEDRY, PF_LOADI, PF_PIPEFROZEN, PF_BRANCHI, PF_CYCLES, PF_PALMODE, PF_NONISSUES, PF_EXTPIN0.
  - Selects the event counted by counter 1. One of: PF_DCACHE, PF_ICACHE, PF_DUAL, PF_BRANCHMISS, PF_FPINST, PF_INTOPS, PF_STOREI, PF_EXTPIN1.
  - Contains two bits, each of which disables data collection on the corresponding counter. For example, set it to 2 to disable counter 1 and enable counter 0. It cannot be set to 3, which would disable both counters; that value causes PCNTSETMUX to return EINVAL.
  - Do not set these fields; they must be zero.

PCNTSETITEMS: Selects the data items to be collected at each tick:

  - Counters (PFM_COUNTERS)
  - IPL histogramming (PFM_IPL)
  - User or kernel PC profiling (PFM_PROFILING; see PCNTSETUADDR, PCNTSETURANGE, PCNTSETKADDR, and PCNTSETKRANGE)

These values are defined in <sys/pfcntr.h> and can be selected independently by ORing them together into the integer argument. If no items are selected, returns EINVAL.

PCNTLOGALL: Sets the on-chip counters to count all system activity. Takes no arguments and returns no errors.

Sets the on-chip counters to count only those threads/processes that have the PCB_PME_BIT set in their PCBs, and sets the PCB_PME_BIT for the calling process. This bit is inherited across fork/exec, so it is set for all children. Takes no arguments and returns no errors.

Clears the PCB_PME_BIT in the PCB of the current process. Takes no arguments and returns no errors.
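Two of the argument rules above lead to EINVAL. The sketch below models them in C; the HYP_PFM_* values are hypothetical placeholders, since the real PFM_* constants (and the struct iccsr field layout) come from <sys/pfcntr.h>. It assumes bit 0 of the disable field controls counter 0 and bit 1 controls counter 1, which matches the example value 2 given above.

```c
#include <errno.h>

/* Hypothetical placeholders: the real PFM_* values are in <sys/pfcntr.h>. */
#define HYP_PFM_COUNTERS  0x1u
#define HYP_PFM_IPL       0x2u
#define HYP_PFM_PROFILING 0x4u

/* PCNTSETMUX disable field: bit 0 disables counter 0, bit 1 counter 1.
 * Disabling both (value 3) is rejected. */
int check_disable_field(unsigned disable)
{
    return (disable & 3u) == 3u ? EINVAL : 0;
}

/* PCNTSETITEMS: at least one item must be selected. */
int check_items(unsigned items)
{
    const unsigned known = HYP_PFM_COUNTERS | HYP_PFM_IPL | HYP_PFM_PROFILING;
    return (items & known) == 0 ? EINVAL : 0;
}
```

For example, check_items(HYP_PFM_COUNTERS | HYP_PFM_IPL) succeeds, while check_disable_field(3) yields EINVAL.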
Clears the driver's internal counters appropriate to the actions selected. If PFM_COUNTERS is enabled, the interrupt counters and the cycle counter value are reset. If PFM_IPL is enabled, the IPL histogram is reset. If neither is enabled (PFM_PROFILING only), returns EINVAL and nothing is cleared. Takes no arguments.

PCNTGETCNT: Returns the driver's counter values and the pcc value(s). Takes a pointer to an array of struct pfcntrs; the array is filled in with the values. Sample usage of this ioctl is:

    struct pfcntrs cntrs[NUM_OF_CPUS];
    struct pfcntrs *pfcntrs = cntrs;
    ioctl (fd, PCNTGETCNT, &pfcntrs);

If the driver is opened in mode PCNTOPENEACH, the underlying array must be big enough to hold the data for each CPU; otherwise, EFAULT is returned. If the driver is opened in mode PCNTOPENONE or PCNTOPENALL, the array can have one element. If PFM_COUNTERS is not enabled, returns EINVAL.
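The required array length for PCNTGETCNT (and likewise PCNTGETIPLHIS) depends only on the open mode. A minimal C sketch, with hypothetical enum names standing in for the PCNTOPEN* values and assuming ncpus is the number of CPUs the driver reports data for:

```c
#include <stddef.h>

/* Hypothetical stand-ins for PCNTOPENONE, PCNTOPENEACH, PCNTOPENALL. */
enum open_mode { MODE_ONE, MODE_EACH, MODE_ALL };

/*
 * Number of struct pfcntrs (or struct pfipls) elements to pass to
 * PCNTGETCNT (or PCNTGETIPLHIS). Only the per-CPU mode needs more than one.
 */
size_t result_elements(enum open_mode mode, size_t ncpus)
{
    return mode == MODE_EACH ? ncpus : 1;
}
```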
PCNTGETRSIZE: Returns the number of bytes of data available to read for getting the PC profiling samples. By default, this is one fourth of the address range being profiled. (By default, profiling data is kept as one bucket per four instructions, which corresponds to a default profiling stride of 4 instructions per sample count.) If the driver is opened in mode PCNTOPENEACH, this number of bytes is multiplied by the number of CPUs.

To set the profiling address range and stride (and select user or kernel profiling), use the PCNTSETURANGE or PCNTSETKRANGE ioctl, respectively. To set the address range without changing the stride, you can also use the PCNTSETUADDR or PCNTSETKADDR ioctl.

The PCNTGETRSIZE ioctl takes a pointer to a long and returns no errors. The returned value is 0 if profiling is not currently selected or if the address range and mode have not been specified.
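The size arithmetic can be checked with a small C model. It assumes, beyond what this page states, that each bucket is a 4-byte count; with Alpha's 4-byte instructions and the default stride of 4, that reproduces the "one fourth of the address range" figure above. All names are hypothetical.

```c
#include <stdint.h>

#define ALPHA_INSN_BYTES 4u  /* every Alpha instruction is 4 bytes */
#define HYP_BUCKET_BYTES 4u  /* assumption: one 32-bit count per bucket */

/*
 * Hypothetical model of the size PCNTGETRSIZE reports.
 *   range_bytes  : size of the profiled address range
 *   stride       : instructions per bucket (0 = one bucket for the range)
 *   ncpus_factor : number of CPUs for PCNTOPENEACH, otherwise 1
 */
uint64_t profile_bytes(uint64_t range_bytes, uint64_t stride, uint64_t ncpus_factor)
{
    uint64_t buckets;

    if (range_bytes == 0)
        return 0;                       /* no range specified */
    if (stride == 0) {
        buckets = 1;
    } else {
        uint64_t span = stride * ALPHA_INSN_BYTES;
        buckets = (range_bytes + span - 1) / span;   /* round up */
    }
    return buckets * HYP_BUCKET_BYTES * ncpus_factor;
}
```

Under these assumptions, a 64 KB range at the default stride yields 16 KB per CPU.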
PCNTGETIPLHIS: Returns the current IPL histogram(s). Takes a pointer to an array of struct pfipls; the array is filled in with the values. Sample usage of this ioctl is:

    struct pfipls ipls[NUM_OF_CPUS];
    struct pfipls *pfipls = ipls;
    ioctl (fd, PCNTGETIPLHIS, &pfipls);

If the driver is opened in mode PCNTOPENEACH, the underlying array must be big enough to hold the data for each CPU. If it is not big enough, EFAULT might be returned or other data in the program might be overwritten. If the driver is opened in mode PCNTOPENONE or PCNTOPENALL, the array can have one element. If PFM_IPL is not enabled, returns EINVAL.

PCNTCALLER: If kernel mode profiling is turned on (with PCNTSETKADDR or PCNTSETKRANGE), directs the profiler to collect data on the caller of certain system utility routines (for example, bcopy, bzero, simple_lock). If kernel mode profiling is not turned on, returns EINVAL. (See also the descriptions of PCNTSETKADDR and PCNTSETKRANGE for information about their use in PCNTCALLER mode.)

PCNTSETKADDR: Sets the kernel address range to profile and turns on kernel mode PC profiling. If the device is not open for profiling, returns EINVAL. If memory cannot be obtained for the sample data, returns ENOMEM.

If PCNTCALLER kernel profiling mode is engaged, specifies an additional address range for which profiling data is collected on the caller of a routine instead of on the routine itself. Takes a start and end address. Up to 4 additional address ranges can be added; further attempts return ENOSPC. If the addresses are outside kernel text, are not aligned, or are otherwise invalid, returns EFAULT.
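The PCNTCALLER range rules (at most 4 extra ranges, ENOSPC when full, EFAULT for bad addresses) can be modeled as follows. This is a sketch: the struct, the kernel-text bounds passed as parameters, the 4-byte alignment test, and the order in which errors are checked are all assumptions, not the driver's documented behavior.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_CALLER_RANGES 4  /* "Up to 4 additional address ranges" */

/* Hypothetical table of caller ranges kept while PCNTCALLER is engaged. */
struct caller_ranges {
    uint64_t start[MAX_CALLER_RANGES];
    uint64_t end[MAX_CALLER_RANGES];
    int count;
};

/* Adds one range; ktext_lo/ktext_hi bound kernel text (assumed inputs). */
int add_caller_range(struct caller_ranges *t, uint64_t start, uint64_t end,
                     uint64_t ktext_lo, uint64_t ktext_hi)
{
    if (start % 4 != 0 || end % 4 != 0 || start >= end)
        return EFAULT;                 /* not aligned or otherwise invalid */
    if (start < ktext_lo || end > ktext_hi)
        return EFAULT;                 /* outside kernel text */
    if (t->count == MAX_CALLER_RANGES)
        return ENOSPC;                 /* table already full */
    t->start[t->count] = start;
    t->end[t->count] = end;
    t->count++;
    return 0;
}
```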
Note that PCNTSETKRANGE performs the same functions as PCNTSETKADDR and, in addition, lets you set the profiling stride.

PCNTSETKRANGE: Sets the kernel address range to profile and sets the profile stride (the number of consecutive instructions grouped together for each sample count). The stride must be zero or a power of two (for example, 0, 1, 2, 4, 8). A zero stride means there is only one counter for the whole address range. This ioctl also turns on kernel mode PC profiling. If the device is not open for profiling, returns EINVAL. If memory cannot be obtained for the sample data, returns ENOMEM.

If PCNTCALLER kernel profiling mode is engaged, specifies an additional address range for which profiling data is collected on the caller of a routine instead of on the routine itself. Takes a start and end address and ignores the stride. Up to 4 additional address ranges can be added; further attempts return ENOSPC. If the addresses are outside kernel text, are not aligned, or are otherwise invalid, returns EFAULT.

PCNTSETUADDR: Sets the user address range to profile and turns on user mode PC profiling. If the device is not open for profiling, returns EINVAL. If memory cannot be obtained for the sample data, returns ENOMEM. Note that PCNTSETURANGE performs the same functions as PCNTSETUADDR and, in addition, lets you set the profiling stride.

PCNTSETURANGE: Sets the user address range to profile and sets the profile stride (the number of consecutive instructions grouped together for each sample count). The stride must be zero or a power of two (for example, 0, 1, 2, 4, 8). A zero stride means there is only one counter for the whole address range. This ioctl also turns on user mode PC profiling. If the device is not open for profiling, returns EINVAL. If memory cannot be obtained for the sample data, returns ENOMEM.
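The stride rule for PCNTSETKRANGE and PCNTSETURANGE (zero or a power of two) and the resulting PC-to-bucket mapping can be sketched in C. The bit test is a standard power-of-two check; the half-open [lo, hi) range and the pc_bucket helper are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stride rule: zero or a power of two (0, 1, 2, 4, 8, ...). */
bool stride_is_valid(uint64_t stride)
{
    return (stride & (stride - 1)) == 0;   /* 0 - 1 wraps, so 0 passes too */
}

/*
 * Hypothetical mapping from a sampled PC to its histogram bucket.
 * Returns -1 if pc is outside [lo, hi). Alpha instructions are 4 bytes.
 */
int64_t pc_bucket(uint64_t pc, uint64_t lo, uint64_t hi, uint64_t stride)
{
    if (pc < lo || pc >= hi)
        return -1;
    if (stride == 0)
        return 0;                          /* one counter for the whole range */
    return (int64_t)((pc - lo) / 4 / stride);
}
```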
Only one process can have the pfm device open at any time. If the device is opened with PCNTOPENONE, only the specified CPU is considered open; subsequent open attempts return EBUSY. If the device is opened with PCNTOPENALL or PCNTOPENEACH, all CPUs must be available; otherwise, the open returns EBUSY.

EBUSY is also returned if another tool is using the performance counters (or has used them but has not restored the default performance counter interrupt handler). In this case, if you are sure that no one else is using the performance counters, re-execute the open call with superuser privilege; this resets the busy status and lets you use the counters.

It is sufficient to open the device read-only. Opening the device disables interrupts (PCNTDISABLE) and logs all system activity (PCNTLOGALL), generating simple counters only. The counters are not cleared. Closing the device automatically disables interrupts and resets the service routines (PCNTDISABLE).
EV4 DETAILED STAT DESCRIPTIONS

Following are more detailed descriptions of each of the events that can be counted by the two on-chip counters associated with the EV4 implementations. For more information, consult the 21064 chip specification.
Counter 0:

- PF_ISSUES: This counter is incremented by one for each cycle in which two instructions are issued and by 1/2 for each cycle in which one instruction is issued. The number of cycles in which one instruction is issued can be found by using the Dual Issues statistic (PF_DUAL, counter 1) and the equation S = (I - D) * 2, where S = Single Issues, D = Dual Issues, and I = Issues.
- PF_PIPEDRY: This counter is incremented by one for each cycle in which nothing is issued because of a lack of valid instruction-stream data. The causes can be instruction cache refill operations (due to normal sequential operation or to delays while fetching the target of a branch) or delays caused by draining the pipeline in response to an exception.
- PF_LOADI: This counter is incremented for each load instruction. Note: If a load misses in the primary data cache, the replay of the instruction causes the load counter to be incremented again.
- PF_PIPEFROZEN: This counter is incremented for each cycle in which nothing is issued because of a resource conflict within the pipeline. Examples are:
    - Not all source and destination registers are available.
    - A load miss or write buffer overflow occurs.
    - A conditional branch cannot be issued in the cycle following a jump.
    - Memory Barrier instruction processing can cause the pipe to freeze.
- PF_BRANCHI: This counter is incremented for each branch instruction.
- PF_CYCLES: This counter is incremented for each cycle.
- PF_PALMODE: This counter is incremented for each cycle spent in PALmode.
- PF_NONISSUES: This counter is incremented by one for each cycle in which no instructions are issued and by 1/2 for each cycle in which only one instruction is issued. It is the complement of the Issues counter: in every cycle, the Non-issues increment equals 1 minus the Issues increment, so over any interval Non-issues = Cycles - Issues.
- PF_EXTPIN0: This counter is incremented for each external event supplied to external pin 0. On the DEC 3000/500 and DEC 3000/400, this pin is connected to logic that indicates external cache misses with victims. A victim is a data block that must be written back to main memory before it is reused.
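Because PF_ISSUES and PF_NONISSUES count half-increments, the derived cycle counts are easiest to compute in floating point. The following C sketch applies S = (I - D) * 2 and Non-issues = Cycles - Issues. It assumes all inputs are totals over the same interval; note that PF_ISSUES and PF_CYCLES are both counter 0 events, so they cannot be counted at the same time and would have to come from separate, comparable runs. The function names are hypothetical.

```c
/*
 * Derived EV4 issue statistics. All arguments are totals over the same
 * interval: issues = PF_ISSUES, dual = PF_DUAL, cycles = PF_CYCLES.
 */
double ev4_single_issue_cycles(double issues, double dual)
{
    return (issues - dual) * 2.0;            /* S = (I - D) * 2 */
}

double ev4_zero_issue_cycles(double cycles, double issues, double dual)
{
    /* every cycle issues two, one, or zero instructions */
    return cycles - dual - ev4_single_issue_cycles(issues, dual);
}

double ev4_non_issues(double cycles, double issues)
{
    return cycles - issues;                  /* per cycle: 1 - Issues */
}
```

For example, over 100 cycles with 30 dual-issue and 40 single-issue cycles, Issues = 30 + 40/2 = 50, and the functions return 40 single-issue cycles, 30 zero-issue cycles, and 50 Non-issues.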
Counter 1:

- PF_DCACHE: This counter is incremented for each primary data cache miss. Note: this counter is actually incremented each time a primary data cache probe does not complete in one cycle. This includes all misses, but also hits that are stalled for other reasons, such as bus traffic that keeps previous misses pending.
- PF_ICACHE: This counter is incremented for each primary instruction cache miss.
- PF_DUAL: This counter is incremented for each cycle in which two instructions are dual-issued.
- PF_BRANCHMISS: This counter is incremented for each incorrectly predicted branch.
- PF_FPINST: This counter is incremented for each floating-point operate instruction. The floating-point operate instructions do not include the floating-point load, floating-point branch, and floating-point store instructions.
- PF_INTOPS: This counter is incremented for each integer operate instruction as well as for each Load Address and Load Address High instruction.
- PF_STOREI: This counter is incremented for each store instruction.
- PF_EXTPIN1: This counter is incremented for each external event supplied to external pin 1. On the DEC 3000/500 and DEC 3000/400, this pin is connected to logic that indicates external cache misses without victims.
Most items count the instances of different types of
instructions. These counters are incremented for each
occurrence, and they do not give information about the
cost of executing the instruction. The Pipe Frozen/Dry
counter increments for each frozen or dry cycle, not for
each instance of pipe freeze or pipe dry.
EV5 INTERFACE DESCRIPTION

The EV5 implementations (21164, 21164A, and 21164PC) have three counters, each of which can be independently programmed to count certain internal or external events. They operate in much the same way as on EV4, and most of the EV4 ioctl calls can also be used on EV5. The EV5-specific ioctl calls are described below.

Selects the events counted by all three counters. The argument is a bitwise OR of one event name for each counter. See <sys/pfcntr.h> for the event identifiers: PF5_MUX0_*, PF5_MUX1_*, and PF5_MUX2_*.

Selects the sampling interrupt frequency for all three counters. The argument is a bitwise OR of one frequency indicator for each counter. A frequency of 256 requires superuser privilege because it can place an extremely heavy load on the system; only carefully selected rare events should be counted at such a high frequency. A lower frequency is usually advisable, for example:

    PF5_C0_INT_EVERY_65536 | PF5_C1_INT_EVERY_65536 | PF5_C2_INT_EVERY_16384

Enables selected counters. (PCNT5RESTART zeroes them first.) The argument is the address of the pmctrs_ev5_long member of a union pmctrs_ev5, with the following additional member assignments:

    pmctrs_ev5_cpu = PMCTRS_ALL_CPUS
    pmctrs_ev5_select = any combination of PF5_SEL_COUNTER_0, PF5_SEL_COUNTER_1, and PF5_SEL_COUNTER_2, combined with a bitwise OR operator

Disables selected counters.

Clears or writes selected counters on selected CPUs. The argument is the address of the pmctrs_ev5_long member of a union pmctrs_ev5. See <sys/pfcntr.h> for more information.

Sets the contexts in which to count. The argument is a bitwise OR of selected PF5_CTXT_* values.

PCNT5GETCNT: Similar to EV4's PCNTGETCNT, except that the argument is a pointer to an array of struct pfcntrs_ev5.

Similar to PCNT5GETCNT, except that the driver's counter values (that is, the number of interrupts from each counter) are shifted left by the counter width, and the current raw hardware counters are read and added to the tally.

Reads the hardware counters from the selected CPU. The argument is the address of the pmctrs_ev5_long member of a union pmctrs_ev5. See <sys/pfcntr.h> for more information.
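The tally described for the PCNT5GETCNT variant amounts to the arithmetic below. In this C sketch, width is the counter width in bits, and it is assumed that one recorded interrupt stands for 2^width events; raw is the value read from the hardware counter. The function is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical model of the extended tally: interrupts recorded by the
 * driver, shifted left by the counter width, plus the raw hardware value.
 * Assumes width < 64 and that one interrupt stands for 2^width events.
 */
uint64_t ev5_full_count(uint64_t interrupts, unsigned width, uint64_t raw)
{
    return (interrupts << width) + raw;
}
```

For instance, 3 interrupts from a 16-bit counter whose hardware value currently reads 100 give 3 x 65536 + 100 = 196708 events.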
EV5 DETAILED STAT DESCRIPTIONS

Following are more detailed descriptions of each of the events that can be counted by the three on-chip counters associated with the EV5 implementations. For more information, see the 21164 or 21164PC chip specification.
All EV5 Implementations (EV5, EV56, PCA56)
Counter 0:

- This counter is incremented for each cycle. (Note that counter 2 also has a cycles counter.)
- This counter is incremented for each instruction.

Counter 1:

- This counter is incremented for each cycle in which valid instructions are ready for issue but none are issued, because of a pipeline stall or because the resources they need are not available.
- This counter is incremented for each cycle in which some, but not all, of the maximum of four instructions are issued.
- This counter is incremented for each cycle in which no instructions are ready to issue.
- This counter is incremented each time an instruction has to be executed again (instead of the instructions behind it in the pipeline) because resources it needed were found to be unavailable the first time it executed.
- This counter is incremented for each cycle in which one instruction is issued.
- This counter is incremented for each cycle in which two instructions are issued.
- This counter is incremented for each cycle in which three instructions are issued.
- This counter is incremented for each cycle in which four instructions are issued.
- This counter is incremented for each branch, jump, or return instruction.
- This counter is incremented for each integer operation.
- This counter is incremented for each floating-point operation.
- This counter is incremented for each load operation.
- This counter is incremented for each store operation.
- This counter is incremented for each Instruction Cache access.
- This counter is incremented for each Data Cache access.

Counter 2:

- This counter is incremented for each long pipeline stall (over 15 cycles).
- This counter is incremented for each PC misprediction.
- This counter is incremented for each branch misprediction.
- This counter is incremented for each instruction not found in either the Instruction Cache or the associated Refill Buffer.
- This counter is incremented for each Instruction Cache miss for which the instruction's page entry is not stored in the Instruction Translation Buffer.
- This counter is incremented for each load of a value that is not in the Data Cache.
- This counter is incremented for each Data Cache miss for which the data page entry is not stored in the Data Translation Buffer.
- This counter is incremented for each load from an address that misses in the Data Cache but is merged with another load from the same address that is already in the Missed Address File.
- This counter is incremented for each Data Cache miss (for a load) that causes the replay of a later instruction that uses the loaded value.
- This counter is incremented for each store that is replayed because the Write Buffer is full and for each load that is replayed because the Missed Address File is full.
- This counter is incremented for each cycle for which the perf_mon_h External Input pin is true.
- This counter is incremented for each cycle. (Note that counter 0 also has a cycles counter.)
- This counter is incremented for each stall cycle resulting from a Memory Barrier.
- This counter is incremented for each Locked Load instruction.
EV5 and EV56 Implementations Only

Counter 1:

- This counter is incremented for each Secondary Cache access (for either instructions or data).
- This counter is incremented for each read from the Secondary Cache.
- This counter is incremented for each write to the Secondary Cache. (Note that counter 2 also has a scachewrites counter.)
- This counter is incremented each time a data block in the Secondary Cache must be written back to main memory before it is reused.
- This counter is incremented for each access to the optional, board-level Backup Cache.
- This counter is incremented each time a data block in the Backup Cache must be written back to main memory before it is reused.
- This counter is incremented for each system request.

Counter 2:

- This counter is incremented for each Secondary Cache miss.
- This counter is incremented for each Secondary Cache Read miss.
- This counter is incremented for each Secondary Cache Write miss.
- This counter is incremented for each Secondary Cache Shared Write operation.
- This counter is incremented for each Secondary Cache Write operation. (Note that counter 1 also has a scachewrites counter.)
- This counter is incremented for each miss in the optional, board-level Backup Cache.
- This counter is incremented for each System Invalidate operation.
- This counter is incremented for each System Read Request.
PCA56 Implementation Only

Counter 1:

- This counter is incremented for each read request from the MBOX.
- This counter is incremented for each Dstream read request that hits in the Bcache.
- This counter is incremented for each Dstream read fill to the Bcache.
- This counter is incremented for each write request from the MBOX.
- This counter is incremented for each write that hits a clean block in the Bcache.
- This counter is incremented for each VICTIM command issued by the 21164PC.
- This counter is incremented each time a second READ_MISS is sent to the system while an earlier READ_MISS command is still outstanding.

Counter 2:

- This counter is incremented for each Dstream read request from the MBOX.
- This counter is incremented for each read request that hits in the Bcache.
- This counter is incremented for each read fill to the Bcache.
- This counter is incremented for each write that hits in the Bcache.
- This counter is incremented for each write fill to the Bcache.
- This counter is incremented for each system READ or FLUSH hit in the Bcache.
- This counter is incremented for each system READ or FLUSH request.
- This counter is incremented each time a third READ_MISS is sent to the system while two earlier READ_MISS commands are still outstanding.
EV6 INTERFACE DESCRIPTION

The EV6 implementation (21264) has two counters, each of which can be programmed to count certain internal or external events. They operate in much the same way as the counters on EV4 and EV5, and most of the EV4 ioctl calls can also be used on EV6. The EV6-specific ioctl calls are described below. Note that the EV6 interface should also be used on EV7 systems.

Selects the events counted by the two counters. The argument is a bitwise OR of one event name for each counter. See <sys/pfcntr.h> for the event identifiers: PF6_MUX0_* and PF6_MUX1_*.

Enables selected counters. PCNT6RESTART zeroes them first; PCNT6ENABWRITE sets them to specified values. The argument is the address of the pmctrs_ev6_long member of a union pmctrs_ev6, with the following additional member assignments:

    pmctrs_ev6_cpu = PMCTRS_ALL_CPUS
    pmctrs_ev6_select = any combination of PF6_SEL_COUNTER_0 and PF6_SEL_COUNTER_1, combined with a bitwise OR operator

Disables selected counters.

Clears or writes selected counters on selected CPUs. The argument is the address of the pmctrs_ev6_long member of a union pmctrs_ev6. See <sys/pfcntr.h> for more information.

PCNT6GETCNT: Similar to EV4's PCNTGETCNT, except that the argument is a pointer to an array of struct pfcntrs_ev6.

Reads the hardware counters from the selected CPU. The argument is the address of the pmctrs_ev6_long member of a union pmctrs_ev6. See <sys/pfcntr.h> for more information.

Similar to PCNT6GETCNT, except that the driver's counter values (that is, the number of interrupts from each counter) are shifted left by the counter width, and the current raw hardware counters are read and added to the tally.
EV6 DETAILED STAT DESCRIPTIONS

Following are more detailed descriptions of each of the events that can be counted by the two on-chip counters associated with the EV6 implementation. For more information, see the 21264 chip specification.

Counter 0:

- This counter is incremented for each cycle. (Note that counter 1 also has a cycles counter.)
- This counter is incremented for every retired instruction.

Counter 1:

- This counter is incremented for each cycle. (Note that counter 0 also has a cycles counter.)
- This counter is incremented for each retired conditional branch.
- This counter is incremented twice for each retired single Dstream translation buffer (DTB) miss.
- This counter is incremented for each retired double DTB miss.
- This counter is incremented for each retired instruction translation buffer (ITB) miss.
- This counter is incremented for each retired unaligned trap.
- This counter is incremented for each replay trap.
EV67 AND EV7 DETAILED STAT DESCRIPTIONS

Following are descriptions of some of the events that can be counted by the on-chip counters associated with the EV67 implementation. The EV67 counters can be used in two mutually exclusive modes: traditional aggregate and profile-me.

The EV67 traditional aggregate counters are not completely independent. Any one statistic can be selected, or one of the following pairs can be selected: (cycles0, replay), (retinst, cycles1), or (retinst, bcachemisses). EV7 provides the same statistics as EV67.

Counter 0:

- This counter is incremented for each cycle. (Note that counter 1 also has a cycles counter.)
- This counter is incremented for every retired instruction.

Counter 1:

- This counter is incremented for each cycle. (Note that counter 0 also has a cycles counter.)
- This counter is incremented for each miss in the Backup Cache.
- This counter is incremented for each replay trap.
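The EV67 aggregate-mode pairing rule can be expressed as a lookup. This C sketch uses the statistic names from the pairs listed above; treating those five names as the complete set of single selections, and accepting each pair in either order, are assumptions.

```c
#include <stdbool.h>
#include <string.h>

/* Statistic names taken from the pairs in the text (assumed complete). */
static const char *const ev67_single[] = {
    "cycles0", "retinst", "cycles1", "bcachemisses", "replay"
};
static const char *const ev67_pairs[][2] = {
    {"cycles0", "replay"}, {"retinst", "cycles1"}, {"retinst", "bcachemisses"}
};

/* Hypothetical check: b == NULL means a single statistic is selected. */
bool ev67_selection_ok(const char *a, const char *b)
{
    size_t i;

    if (b == NULL) {
        for (i = 0; i < sizeof ev67_single / sizeof ev67_single[0]; i++)
            if (strcmp(a, ev67_single[i]) == 0)
                return true;
        return false;
    }
    for (i = 0; i < sizeof ev67_pairs / sizeof ev67_pairs[0]; i++)
        if ((strcmp(a, ev67_pairs[i][0]) == 0 && strcmp(b, ev67_pairs[i][1]) == 0) ||
            (strcmp(a, ev67_pairs[i][1]) == 0 && strcmp(b, ev67_pairs[i][0]) == 0))
            return true;
    return false;
}
```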
EV67 profile-me mode and traditional aggregate counters
work differently: instead of counting events as done by
traditional aggregate counters, instructions in profile-me
mode are uniformly selected and various events are
recorded during the execution of each selected
instruction.
The descriptions below are written from the perspective of a uprofile or kprofile user. For example, the *_per_ret statistics actually cause the pfm driver to return (statistic, retired) pairs, which are later processed by uprofile or kprofile. Similarly, the freq statistic is the same as the retired statistic until uprofile or kprofile postprocesses it.
Any one of the following profile-me statistics can be selected:

- This statistic is incremented if the profiled execution is aborted.
- This ratio is the abort statistic scaled by 100 and divided by the retired statistic.
- This statistic is incremented if the profiled execution causes an arithmetic trap.
- This statistic is incremented if the profiled execution is a taken conditional branch.
- This ratio is the cbr_taken statistic scaled by 100 and divided by the retired statistic.
- This statistic is incremented by the approximate number of cycles the execution was in flight.
- This ratio is the cycles statistic divided by the retired statistic.
- This statistic is incremented by the approximate retire delay of the profiled execution.
- This ratio is the delay statistic scaled by 100 and divided by the retired statistic.
- This statistic is incremented if the profiled execution causes a Dstream fault.
- This statistic is incremented if the profiled execution causes a DTB single miss.
- This ratio is the dtb_miss statistic scaled by 100 and divided by the retired statistic.
- This statistic is incremented if the profiled execution causes a DTB double miss (3-level page tables).
- This statistic is incremented if the profiled execution causes a DTB double miss (4-level page tables).
- This statistic is incremented if the profiled execution is killed early in the pipeline.
- This ratio is the early_kill statistic scaled by 100 and divided by the retired statistic.
- This statistic is incremented if the profiled execution causes a floating-point disabled trap.
- This statistic is incremented if the profiled execution retires. uprofile and kprofile average this statistic within basic blocks to provide instruction execution frequency estimates.
- This statistic is incremented if the profiled execution was not yet prefetched into the cache. Note that the profiled instruction may experience an unrecorded icache miss if the fetch is in progress.
- This ratio is the icache_miss statistic scaled by 100 and divided by the retired statistic.
- This statistic is incremented if the profiled execution experienced an icache parity error.
- This statistic is incremented by the approximate number of Bcache misses during the profiled execution.
- This statistic is incremented by the approximate number of replay traps during the profiled execution.
- This statistic is incremented by the approximate number of instruction retires during the profiled execution.
- This statistic is incremented if the profiled execution is preempted by an interrupt.
- This statistic is incremented if the profiled execution causes an Istream access violation.
- This statistic is incremented if the profiled execution causes an ITB miss.
- This statistic is incremented if the profiled execution causes a load-store order trap.
- This statistic is incremented if the profiled execution causes an unaligned load or store.
- This statistic is incremented if the profiled execution stalled before it was mapped.
- This ratio is the map_stall statistic scaled by 100 and divided by the retired statistic.
- This statistic is incremented if the profiled execution experiences a misprediction.
- This ratio is the mispredict statistic scaled by 100 and divided by the retired statistic.
- This statistic is incremented if the profiled execution causes a reserved opcode trap.
- This statistic is incremented if the profiled execution causes a replay trap.
- This ratio is the replay_trap statistic scaled by 100 and divided by the retired statistic.
- This statistic is incremented if the profiled execution retires.
- This statistic is incremented if the profiled execution causes a trap.
- This ratio is the trap statistic scaled by 100 and divided by the retired statistic.
- This statistic is incremented if the profiled execution is valid.
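The *_per_ret ratios are simple arithmetic on the (statistic, retired) pairs the driver returns. A C sketch of that postprocessing follows; the function names are hypothetical, and returning 0 when no samples retired is an assumption, not documented uprofile or kprofile behavior.

```c
/*
 * Most *_per_ret ratios: statistic scaled by 100, divided by retired.
 * Returns 0 when nothing retired (assumed convention).
 */
double per_ret_scaled(double stat, double retired)
{
    return retired > 0.0 ? stat * 100.0 / retired : 0.0;
}

/* The cycles ratio is not scaled by 100. */
double cycles_per_ret(double cycles, double retired)
{
    return retired > 0.0 ? cycles / retired : 0.0;
}
```

For example, 5 aborts against 200 retired samples gives an abort ratio of 2.5.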
For more information, see the 21264a chip specification.
NOTES

The notes in this section pertain only to EV4 processors.

Disabling an EV4 counter does not actually prevent it from interrupting the CPU; however, the interrupt is dismissed without recording any data.

Connections of the CPU's external input pins to external events are platform dependent. The DEC 3000/400, /500, /600, and /800 workstations have these connections; they count Bcache misses and Bcache misses with victims.

Generating statistics on a per-process basis is possible only on 21064 Pass 3 or later processors. On Pass 2 or earlier processors, such attempts gather statistics for the entire system.
FILES
The device entry (character, dev# 26/0)
Structure definitions

SEE ALSO
Commands: kprofile(1), uprofile(1), prof(1), sysconfig(8), autosysconfig(8)
pfm(7)