PERF_COUNTERS(5) PERF_COUNTERS(5)
r10k_evcntrs, r10k_event_counters, r10k_counters - Programming the
processor event counters
The R1x000 processors include counters that can be used to count the
frequency of events during the execution of a program. The information
returned by the counters can be helpful in optimizing the program. The
perfex(1) and ssrun(1) commands provide convenient interfaces to hardware
counter information.
The R10000 processor supplies two performance counters for counting
certain hardware events. Each counter can track one event at a time, and
there is a choice of sixteen events per counter. There are also two
associated control registers, which specify which event the
relevant counter is counting.
The R12000 and R14000 processors supply two performance counters for
counting hardware events. Each counter can track one event at a time, and
you can choose among 32 events per counter.
Using performance counters in a machine with both R10000 and
R12000/R14000 processors is currently undefined.
Each counter is a 32-bit read / write register and is incremented by one
each time the event specified in its associated control register occurs.
Furthermore, the control registers allow one to indicate that the events
are only counted in a specific mode. The modes may be user mode or
several choices of kernel mode, or some combination of kernel and user
mode.
The counters can optionally assert an interrupt upon overflow, which is
defined to be when the most significant bit of one of the counter
registers (bit 31) becomes set. If such an overflow interrupt is enabled
for that event in the associated control register, then the interrupt
will be presented to the cpu. Whether or not the interrupt is asserted,
the counting of events will continue after overflow.
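The overflow rule above can be sketched in portable C. This is only an illustrative model of a 32-bit counter register, not kernel or hardware code; the names hwc_overflowed and hwc_count_event are hypothetical, and the wrap-around modulo 2^32 is an assumption that follows from the register being 32 bits wide.

```c
#include <stdint.h>

/* Bit 31 is the overflow indicator described in the text. */
#define HWC_OVERFLOW_BIT 0x80000000u

/* Nonzero when the most significant bit of the counter is set. */
int hwc_overflowed(uint32_t counter)
{
    return (counter & HWC_OVERFLOW_BIT) != 0;
}

/*
 * Model one occurrence of the selected event. The register simply
 * increments, so counting continues after overflow; in this model the
 * value wraps modulo 2^32 (unsigned arithmetic).
 */
uint32_t hwc_count_event(uint32_t counter)
{
    return counter + 1u;
}
```

In this model a counter holding 0x7fffffff is not yet in overflow, and one more event sets bit 31.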
THE CONTROL REGISTERS
The format of the control registers is as follows:
 31                        8       5   4    3    2    1    0
 _____________________________________________________________
|        (unused)         |  event  | IE | U  | S  | K  | EXL |
|_________________________|_________|____|____|____|____|_____|
Bit 4 is the interrupt enable bit, which specifies whether overflows for
the specified event will generate interrupts or not. Bits 3 through 0
specify either the mode the event is counted in or the count enable bits.
These bits will enable counting when they match the equivalent KSU
settings in the status register of the R10000 or R12000/R14000. That is:
U bit <----> KSU = 2, EXL = 0, ERL = 0 (user mode)
S bit <----> KSU = 1, EXL = 0, ERL = 0 (supervisor mode, not supported)
K bit <----> KSU = 0, EXL = 0, ERL = 0 (kernel mode)
EXL bit <---> EXL = 1, ERL = 0 (transient kernel mode)
ERL is a field in the status register on coprocessor 0. It is set when
the processor hits an error and is forced into kernel mode.
If the KSU bits in the status register are 2, and the ERL and EXL bits
are both off, events enabled with the U bit will be counted. In this way,
a program that intends to use the performance counters directly must
specify the events that are to be counted and the modes in which they are
to be counted.
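The matching rule between the mode bits and the status register can be sketched as follows. This is a simplified model, not the hardware definition. The bit positions are assumptions: the text places the mode bits in bits 3 through 0 and the interrupt enable in bit 4; the order U, S, K, EXL from bit 3 down and an event field starting at bit 5 are assumed here for illustration. All names (CTL_U, ctl_encode, ctl_counting_enabled) are hypothetical.

```c
#include <stdint.h>

/* Assumed control register layout (see the lead-in for caveats). */
#define CTL_EXL         (1u << 0)  /* count while EXL = 1, ERL = 0 */
#define CTL_K           (1u << 1)  /* count in kernel mode         */
#define CTL_S           (1u << 2)  /* count in supervisor mode     */
#define CTL_U           (1u << 3)  /* count in user mode           */
#define CTL_IE          (1u << 4)  /* overflow interrupt enable    */
#define CTL_EVENT_SHIFT 5          /* assumed start of event field */

/* Build a control register value from an event number and mode bits. */
uint32_t ctl_encode(unsigned event, int intr_enable, unsigned mode_bits)
{
    return ((uint32_t)event << CTL_EVENT_SHIFT)
         | (intr_enable ? CTL_IE : 0u)
         | (mode_bits & 0xFu);
}

/*
 * Decide whether an event would be counted for the given status
 * register fields, following the four mappings listed in the text.
 */
int ctl_counting_enabled(uint32_t ctl, unsigned ksu, int exl, int erl)
{
    if (erl)
        return 0;                     /* every mapping requires ERL = 0 */
    if (exl)
        return (ctl & CTL_EXL) != 0;  /* transient kernel mode */
    switch (ksu) {
    case 2:  return (ctl & CTL_U) != 0;
    case 1:  return (ctl & CTL_S) != 0;
    case 0:  return (ctl & CTL_K) != 0;
    default: return 0;
    }
}
```

For example, a register encoded with only CTL_U counts while KSU = 2 with EXL and ERL clear, and stops counting as soon as EXL is set.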
The following events can be tracked by the performance counters on R10000
processors:
0=cycles
Incremented on each clock cycle.
1=issued instructions
Incremented each time an instruction is issued to ALU, FPU or
load/store units.
2=issued loads
Incremented when a load, prefetch, or synchronization instruction is
issued.
3=issued stores
Incremented when a store instruction is issued.
4=issued store conditionals
Incremented when a conditional store instruction is issued.
5=failed store conditionals
Incremented when a store-conditional instruction fails. A failed
store-conditional instruction will, in the normal course of events,
graduate; so this event represents a subset of the store conditional
instructions counted on event 20 (graduated store conditionals).
6=Decoded branches
Incremented when a branch is decoded (for revision 2.x processors)
or resolved (for revision 3.x processors).
7=Quadwords written back from secondary cache
Incremented when data is written back from secondary cache to the
system interface.
8=correctable secondary cache data array ECC errors
Incremented when single-bit ECC errors are detected on data read from
secondary cache.
9=primary instruction cache misses
Incremented when the next instruction is not in primary instruction
cache.
10=secondary instruction cache misses
Incremented when the next instruction is not in secondary
instruction cache.
11=instruction misprediction from secondary cache way prediction table
Incremented when the secondary cache way mispredicted an
instruction.
12=external interventions
Incremented when an external intervention is entered into the Miss
Handling Table (MHT), provided that the intervention is not an
invalidate type.
13=external invalidations
Incremented when an intervention is entered into the Miss Handling
Table, provided that the intervention is an invalidate type.
14=virtual coherency conditions or ALU/FPU completion cycles
Incremented on virtual coherency conditions (on revision 2.x R10000
processors) or on ALU/FPU functional unit completion cycles (on
revision 3.x R10000 processors).
15=graduated instructions
Incremented when an instruction is graduated.
16=cycles
Incremented on each clock cycle.
17=graduated instructions
Incremented when an instruction is graduated.
18=graduated loads
Incremented on a graduated load, prefetch, or synchronization
instruction.
19=graduated stores
Incremented on a graduated store instruction.
20=graduated store conditionals
Incremented when a graduated conditional store instruction is
issued.
21=graduated floating-point instructions
Incremented when a graduated floating-point instruction is issued.
22=quadwords written back from primary data cache
Incremented when data is written back from primary data cache to
secondary cache.
23=TLB misses
Incremented when a translation lookaside buffer (TLB) refill
exception occurs.
24=mispredicted branches
Incremented when a branch is mispredicted.
25=primary (L1) data cache misses.
Incremented when the next data item is not in primary data cache.
26=secondary (L2) data cache misses.
Incremented when the next data item is not in secondary data cache.
27=data mispredicted from secondary cache way prediction table
Incremented when the secondary cache way mispredicted a data item.
28=external intervention hits in secondary cache (L2)
Set as follows when an external intervention is determined to have
hit in secondary cache:
00 Invalid, no hit detected
01 Clean, shared
10 Clean, exclusive
11 dirty, exclusive
29=external invalidation hits in secondary cache (L2)
Set when an external invalidate request is determined to have hit in
the secondary cache. Its value is equivalent to that described for
event 28.
30=store/fetch exclusive to clean block in secondary cache (L2)
Incremented on each cycle by the number of entries in the Miss
Handling Table (MHT) waiting for a memory operation to complete.
31=store/fetch exclusive to shared block in secondary cache (L2)
Incremented when an update request is issued for a line in the
secondary cache. If the line is in the clean state, the counter is
incremented by one. If the line is in the shared state, the counter
is incremented by two. The conditional counting mechanism can be
used to select whether one, both, or neither of these events is
chosen.
Note that the definitions of events 6 and 14 on counter 0 differ depending
on the R10000 chip revision. The chip revision can be determined via the
command hinv(1).
The following events can be tracked by the performance counters on R12000
and R14000 processors:
0=cycles
Incremented on each clock cycle.
1=decoded instructions
Incremented by the total number of instructions decoded on the
previous cycle. Since decoded instructions may later be killed (for
a variety of reasons), this count reflects the overhead due to
incorrectly speculated branches and exception processing.
2=decoded loads
Incremented when a load instruction was decoded on the previous
cycle. Prefetch, cache operations, and synchronization instructions
are not included in the count of decoded loads.
3=decoded stores
Incremented if a store instruction was decoded on the previous
cycle. Store conditionals are included in this count.
4=Miss Handling Table occupancy
Incremented on each cycle by the number of currently valid entries
in the Miss Handling Table (MHT). The MHT has five entries. Four
entries are used for internally generated accesses; the fifth entry
is reserved for externally generated events. All five entries are
included in this count. See event 8 for a related definition.
5=failed store conditionals
Incremented when a store-conditional instruction fails. A failed
store-conditional instruction will, in the normal course of events,
graduate; so this event represents a subset of the store-conditional
instructions counted on event 20 (graduated store-conditionals).
6=resolved conditional branches
Incremented both when a branch is determined to have been
mispredicted and when a branch is determined to have been correctly
predicted. Once the accuracy of a branch prediction
is known, the branch is said to be "resolved." This counter
correctly reflects the case of multiple floating-point conditional
branches being resolved in a single cycle.
7=Quadwords written back from secondary cache
Incremented on each cycle that the data for a quadword is written
back from secondary cache to the system interface unit.
8=correctable secondary cache data array ECC errors
Incremented on the cycle following the correction of a single-bit
error in a quadword read from the secondary cache data array.
9=primary instruction cache misses
Incremented one cycle after an instruction fetch request is entered
into the Miss Handling Table.
10=secondary instruction cache misses
Incremented the cycle after a refill request is sent to the system
interface module of the CPU. This is normally just after the L2 tags
are checked and a miss is detected, but it may be delayed if the
system interface module is busy with another request.
11=instruction misprediction from secondary cache way prediction table
Incremented when the secondary cache control begins to retry an
access because it hit in the unpredicted way, provided the request
that initiated the access was an instruction fetch.
12=external interventions
Incremented on the cycle after an intervention is entered into the
Miss Handling Table, provided that the intervention is not an
invalidate type.
13=external invalidations
Incremented on the cycle after an intervention is entered into the
Miss Handling Table, provided that the intervention is an invalidate
type.
14=ALU/FPU progress cycles
Incremented on the cycle after either ALU1, ALU2, FPU1, or FPU2
marks an instruction as done.
15=graduated instructions
Incremented by the number of instructions that were graduated on the
previous cycle. Integer multiply and divide instructions each count
two graduated instructions because they occupy two entries in the
active list.
16=executed prefetch instructions
Incremented on the cycle after a prefetch instruction does its tag check,
regardless of whether a data cache line refill is initiated.
17=prefetch primary data cache misses
Incremented on the cycle after a prefetch instruction does its tag check
and a refill of the corresponding data cache line is
initiated.
18=graduated loads
Incremented by the number of loads that graduated on the previous
cycle. Prefetch instructions are included in this count. Up to four
loads can graduate in one cycle.
19=graduated stores
Incremented on the cycle after a store graduates. Only one store can
graduate per cycle. Store conditionals are included in this count.
20=graduated store conditionals
Incremented on the cycle following the graduation of a store-conditional
instruction. Both failed and successful store-conditional
instructions are included in this count, so successful
store-conditionals can be determined as the difference between this
event and event 5 (failed store-conditionals).
21=graduated floating-point instructions
Incremented by the number of floating-point instructions that
graduated on the previous cycle. There can be 0 to 4 such
instructions.
22=quadwords written back from primary data cache
Incremented on each cycle that a quadword of data is valid and is
written from primary data cache to secondary cache.
23=TLB misses
Incremented on the cycle after the translation lookaside buffer
(TLB) miss handler is invoked.
24=mispredicted branches
Incremented on the cycle after a branch is restored because it was
mispredicted.
25=primary data cache misses
Incremented one cycle after a request is entered into the SCTP
logic, provided that the request was initially targeted at the
primary data cache. Such requests fall into three categories:
1) Primary data cache misses.
2) Requests to change the state of
secondary and primary data cache
lines from clean to dirty ("update"
requests) due to stores that hit
a clean line in the primary data
cache.
3) Requests initiated by cache
operation instructions.
26=secondary data cache misses
Incremented the cycle after a refill request is sent to the system
interface module of the CPU. This is normally just after the L2 tags
are checked and a miss is detected, but it can be delayed if the
system interface module is busy with another request.
27=data misprediction from secondary cache way prediction table
Incremented when the secondary cache control begins to retry an
access because it hit in the unpredicted way. The counter is
incremented only if the request that initiated the access was not an
instruction fetch.
28=state of external intervention hits in secondary cache
Set on the cycle after an external intervention is determined to
have hit in the secondary cache. The value of the event is equal to
the state of the secondary cache line that was hit. Setting a
performance control register to select this event has a special
effect on the conditional counting behavior. If event 28 or 29 is
selected, the sense of the "Negated conditional counting" bit is
inverted. See the description of conditional counting for details.
The values are:
00 Invalid, no hit detected
01 Clean, shared
10 Clean, exclusive
11 dirty, exclusive
29=state of invalidation hits in secondary cache (L2)
Set on the cycle after an external invalidate request is determined
to have hit in secondary cache. Its value is equivalent to that
described for event 28.
30=Miss Handling Table entries accessing memory
Incremented on each cycle by the number of entries in the Miss
Handling Table (MHT) waiting for a memory operation to complete. It
is always less than or equal to the value tracked by event 4. An
entry is considered to begin accessing memory when the cache control
logic recognizes that a request must be sent via the SysA/D bus. An
entry is included in this count from that point until the entry is
removed from the MHT. For example, once the secondary cache tags are
checked and a secondary cache miss is recognized, the entry that
originated the request is included in this count. It continues to be
included until the last word of the refilled line is written into
the secondary cache and the MHT entry is removed. Unlike event 4,
the fifth slot of the MHT, which is reserved for externally
generated requests, is not included in this count.
31=store/prefetch exclusive to shared block in secondary cache (L2)
Incremented on the cycle after an update request is issued for a
line in the secondary cache. If the line is in the clean state, the
counter is incremented by one. If the line is in the shared state,
the counter is incremented by two. The conditional counting
mechanism can be used to select whether one, both, or neither of
these events is chosen.
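Two of the event descriptions above translate directly into small helpers, sketched here in C. The first applies the relationship stated for event 20 (successful store-conditionals are event 20 minus event 5); the second names the two-bit state value listed for events 28 and 29. The function names are hypothetical, and clamping the difference at zero is a defensive choice of this sketch, not something the source specifies.

```c
#include <stdint.h>

/*
 * Successful store-conditionals = graduated (event 20) - failed (event 5).
 * The result is clamped at zero in case the two counts were sampled at
 * slightly different times.
 */
uint64_t successful_store_conditionals(uint64_t graduated_sc,
                                       uint64_t failed_sc)
{
    return graduated_sc >= failed_sc ? graduated_sc - failed_sc : 0;
}

/* Name the secondary cache line state reported by events 28 and 29. */
const char *l2_hit_state_name(unsigned state)
{
    switch (state & 0x3u) {
    case 0:  return "invalid, no hit detected";
    case 1:  return "clean, shared";
    case 2:  return "clean, exclusive";
    default: return "dirty, exclusive";
    }
}
```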
The kernel maintains 64-bit virtual counters for the user program using
the hardware counters. The view of the counters as being 64 bits wide is
maintained through the programming interfaces that use them, even though
the actual counters are only 32 bits. Similarly, there are only two
hardware counters per CPU, but the programming interface supports the
view that there are actually 32 counters. That is, a user program can
specify that more than one event per hardware counter is to be counted,
up to sixteen events per counter. The kernel will then multiplex the
events across clock tick boundaries. So, if a program is tracking more
than one event per counter, on every clock tick the kernel will check to
see if it is necessary to switch the events being tracked. If necessary,
it will save the counts for the previous events and set up the counters
for the next event. Thus, to the program there are 32 64-bit counters
available.
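The multiplexing scheme described above can be illustrated with a small model. This is a sketch under assumptions, not the kernel's implementation: it assumes the sixteen-events-per-counter layout, so virtual counters 0 through 15 map to hardware counter 0 and 16 through 31 to hardware counter 1, and it assumes a simple round-robin choice of the next event at a clock tick. All names are hypothetical.

```c
#include <stdint.h>

#define EVENTS_PER_COUNTER 16  /* sixteen events per hardware counter  */
#define NUM_VIRTUAL        32  /* two hardware counters x sixteen each */

/* Which hardware counter (0 or 1) serves virtual counter ev (0..31). */
int hw_counter_for(int ev)
{
    return ev / EVENTS_PER_COUNTER;
}

/*
 * Pick the next requested event on hardware counter hw, round-robin.
 * enabled has bit i set when virtual counter i was requested.
 * Returns current when it is the only requested event on that counter,
 * or -1 when no event is requested on it.
 */
int next_event(uint32_t enabled, int hw, int current)
{
    int base = hw * EVENTS_PER_COUNTER;
    int i;

    for (i = 1; i <= EVENTS_PER_COUNTER; i++) {
        int cand = base + ((current - base + i) % EVENTS_PER_COUNTER);
        if (enabled & (UINT32_C(1) << cand))
            return cand;
    }
    return -1;
}

/* Fold a 32-bit hardware reading into the 64-bit virtual counter. */
void save_count(uint64_t vctr[NUM_VIRTUAL], int ev, uint32_t hw_value)
{
    vctr[ev] += hw_value;
}
```

With events 9, 25, and 26 requested, hardware counter 0 stays on event 9 while hardware counter 1 alternates between 25 and 26 at each switch.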
The performance counters are available to the user program primarily
through the perfex(1) and ssrun(1) commands. You can also access the
counters through the /proc(4) interface. A limited and more specialized
functionality is also provided through the syssgi(2) interface, but this
is not intended to be the general interface.
Using perfex, you can select the events to be counted on hardware
counters and the executable program to be run. The perfex command prints
the values of the hardware counters following the run. See the perfex(1)
man page for more information.
The ssrun command is part of the SpeedShop performance analysis package,
and it provides input to the WorkShop cvperf(1) user interface or, in
ASCII format, to the prof(1) command. See the various man pages, the
SpeedShop User's Guide, and the Developer Magic: Performance Analyzer
User's Guide for more information.
Through /proc, ioctls allow you to start or stop using the counters, to
read the counts in your own counters, or to modify the way the counters
are being used. Since this interface specifies a process ID as a
parameter, it is possible, in general, for a process to read or
manipulate the counters of another process, as long as the process
belongs to the same process group or is root.
There are also ioctls that allow the program to specify overflow
thresholds on a per-event basis and to supply a signal to be sent to the
program upon overflow. That is, the fact that an interrupt can be
generated whenever a particular counter overflows can be exploited to
allow a program to specify a threshold n for an event such that after n
occurrences of the event an interrupt will be generated. In addition,
while the kernel is servicing the counter overflow interrupt, it can
perform some user-specified action, such as sending a user-specified
signal to the program whenever an overflow is generated or incrementing a
PC bucket for profiling. The latter choice is a more specialized
functionality and is not part of the general /proc interface.
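One way a threshold n could be mapped onto the bit-31 overflow rule is to preload the counter so that bit 31 becomes set after exactly n more events. The source does not say how the kernel implements thresholds, so the following is only a sketch of that idea; hwc_preload_for_threshold is a hypothetical name.

```c
#include <stdint.h>

/*
 * Compute the value to load into a 32-bit counter so that bit 31
 * becomes set after exactly n further events. Valid thresholds are
 * 1 through 0x80000000. Returns 0 on success, -1 if n is out of range.
 */
int hwc_preload_for_threshold(uint32_t n, uint32_t *preload)
{
    if (n == 0 || n > 0x80000000u)
        return -1;
    *preload = 0x80000000u - n;
    return 0;
}
```

For a threshold of 1000, the preload value is 0x80000000 - 1000; bit 31 is still clear after 999 events and set after the 1000th.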
For a process using the counters in user mode, the control block for the
counters is kept in the u-area. Thus, once the process forks, the child
acquires the same state of the counters as the parent, which implies that
the next time the child runs the performance counters will be run for the
child, tracking the same events as its parent. Therefore, the counter
values are zeroed for the child upon fork so that at a later time the
child's counters will accurately depict the activity of the child. For
this reason, it is possible for the parent to fork and then wait for the
child to exit. When the child exits, if the kernel sees that the parent
is waiting for the child it will add the child's 64-bit counters to those
of the parent, and the parent will thus have the event trace of the
child. Other methods for a parent to acquire a child's counters are
discussed with the PIOCSAVECCNTRS ioctl.
Operation Modes for the Performance Counters
There are two basic modes that the counters are used in, user mode and
system mode. Using them in user mode allows the counters to be shared
among any number of user programs. In this mode the kernel saves and
restores the counts and state of the counters across context switch
boundaries. System mode is defined when a user with root privileges uses
the counters in kernel mode (user mode and/or EXL mode may also be
specified, but kernel mode is essential). In this mode there are no
context switch boundaries and so other programs will not be able to use
the counters when they are in use in system mode.
Therefore, when the counters are already in use in user mode, a program
which attempts to use them in system mode will fail with EBUSY since the
two modes cannot coexist (unless certain commands are employed to force
the release of the counters in user mode and their acquisition in
system mode, as discussed later). Likewise, if the counters are in use
in system mode, any program attempting to use the counters will fail with
EBUSY (root-level or otherwise).
The approach taken to these two operating modes is that system mode has a
higher priority. For this reason there is a syssgi command to forcibly
acquire the counters in system mode if the counters are not actively in
use by a running program. And any users of the counters who are not
currently running will not be able to acquire them when they run again.
This latter situation holds at all times. That is, there may be several
programs sharing the counters in user mode. If at any moment they happen
to all be switched out, the counters are temporarily free. At this point
it is possible for a super-user to acquire the counters in system mode.
Then, when the other programs are run again, they won't be able to
acquire the counters since they are in use in system mode. Since this
program will then be run at this point without the intended event
counting, the kernel will arrange it such that this program will not use
the counters again, unless they are explicitly restarted. This is because
the values in the counters are no longer representative of the program.
To re-iterate, a root-level program may receive EBUSY from the kernel if
it tries to acquire the counters in system mode through /proc and they
are actively in use at the time of the system call. If they are in use in
user mode by other programs but those programs are not running at the
time of the system call, then the counters will be successfully acquired
in system mode and the other programs will not be able to acquire them
again; the kernel will not try to start up the counters for those other
programs again.
In order to make this situation visible to the program, a generation
number is employed to reflect the current state of the counters. In this
case, whenever the kernel does turn off the use of the counters for a
program because the mode of operation has switched from user mode to
system mode, the generation number for the counters for the user programs
will be increased. Thus, subsequent reads of the counters will return the
new number and should signal the program that the counter values are not
to be trusted. The number will be discussed in greater detail later.
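The generation check described above amounts to comparing the number returned when the counters were acquired with the number returned by a later read. A minimal sketch, assuming (as stated elsewhere on this page) that successful calls return a positive generation number; counts_trustworthy is a hypothetical helper name.

```c
/*
 * Returns 1 when a counter read can be trusted: both calls succeeded
 * (positive generation numbers) and the generation has not changed
 * since the counters were acquired. Returns 0 otherwise.
 */
int counts_trustworthy(int gen_at_enable, int gen_at_read)
{
    return gen_at_enable > 0 && gen_at_read == gen_at_enable;
}
```

A program that itself changes the counter setup would need to remember the newer generation number returned by that call and compare against it instead.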
To support using the counters in system mode, each cpu has its own
control block for the counters, pointed to in its private area. There is
also a global counter control block which maintains counter state for the
entire system. When the counters are being used in system mode they are
not read and stored across context switch boundaries. In fact, unless
they are explicitly read by a program, the counters are not read by the
kernel until there is an overflow interrupt. When this occurs, the cpu on
which the interrupt occurs updates its own private virtual counters; no
changes are made to the global counter control block.
When the counters are read in system mode via PIOCGETEVCTRS through
/proc, the per-cpu counters are all added together into the global
counters so that the global counters represent the sum total of the
counted events for the entire system. This same coalescing of the per-cpu
counters happens when the counters are released. Note that it is also
possible to read a particular cpu's counters via the syssgi
HWPERF_GET_CPUCNTRS command.
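The coalescing step can be pictured as a per-event sum over all cpus. The sketch below is a model only: the array layout and the name coalesce_counters are hypothetical, and it recomputes the global totals from scratch on each call, which is an assumption since the source does not say whether the kernel resets or accumulates.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_EVENTS 32  /* one 64-bit virtual counter per event index */

/* Sum each event's per-cpu virtual counter into the global array. */
void coalesce_counters(uint64_t global[NUM_EVENTS],
                       uint64_t percpu[][NUM_EVENTS],
                       size_t ncpus)
{
    size_t cpu;
    int ev;

    for (ev = 0; ev < NUM_EVENTS; ev++)
        global[ev] = 0;

    for (cpu = 0; cpu < ncpus; cpu++)
        for (ev = 0; ev < NUM_EVENTS; ev++)
            global[ev] += percpu[cpu][ev];
}
```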
/proc Commands for the Performance Counters
To support the /proc interface for the counters, there are several data
structures defined in /usr/include/sys/hwperftypes.h that are used to
either pass parameters with the calls or to receive data back from the
kernel.
struct hwperf_ctrlreg {
        ushort_t hwp_ev  :11,   /* event counted */
                 hwp_ie  :1,    /* overflow intr enable */
                 hwp_mode:4;    /* user/kernel/EXL */
};

typedef union {
        short                 hwperf_spec;
        struct hwperf_ctrlreg hwperf_creg;
} hwperf_ctrl_t;

typedef struct {
        hwperf_ctrl_t hwp_evctrl[HWPERF_EVENTMAX];
} hwperf_eventctrl_t;
Each event is described to the kernel through an hwperf_ctrl_t. Where
relevant, the ioctls take the address of an hwperf_eventctrl_t, the array
of 32 hwperf_ctrl_t's. If the user is not interested in an event, then
care must be taken to ensure that the corresponding element in this array
is zero.
For a user to gain access to the counters, it must indicate which events
are of interest and how they are to be counted; whether overflow
thresholds are to be used to generate overflow interrupts or not, and
what those thresholds are per event; and what signal the user program
would like to receive from the kernel upon overflow interrupt. All of
this information is conveyed with the structure hwperf_profevctrarg_t:
typedef struct hwperf_profevctrarg {
        hwperf_eventctrl_t hwp_evctrargs;
        int                hwp_ovflw_freq[HWPERF_EVENTMAX];
        int                hwp_ovflw_sig;   /* SIGUSR1,2 */
} hwperf_profevctrarg_t;
With the above structure as parameter the user program must take care to
zero the hwp_ovflw_freq elements for which no overflow thresholds are
intended. The hwp_ovflw_sig field is used to tell the kernel which signal
the program wants to receive upon overflow interrupt. The acceptable
signals are between 1 and 32 (SIG32). This field should be zero if no
signals are wanted.
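The zeroing and range rules above can be checked before the structure is handed to the kernel. The sketch below uses a simplified stand-in type rather than the real hwperf_profevctrarg_t so that it compiles anywhere; struct prof_args, prof_args_init, and prof_args_check are hypothetical names. Rejecting an out-of-range signal with EINVAL is an assumption of this sketch; the page only states that EINVAL is returned for improperly specified events and negative overflow frequencies.

```c
#include <errno.h>
#include <string.h>

#define EVENTMAX 32  /* stand-in for HWPERF_EVENTMAX */

/* Simplified stand-in for hwperf_profevctrarg_t (illustration only). */
struct prof_args {
    unsigned short spec[EVENTMAX];  /* 0 means "event not of interest" */
    int ovflw_freq[EVENTMAX];       /* 0 means "no overflow threshold" */
    int ovflw_sig;                  /* 0 means "no signal wanted"      */
};

/* Start from an all-zero structure so unused entries are well defined. */
void prof_args_init(struct prof_args *a)
{
    memset(a, 0, sizeof *a);
}

/* Returns 0 if the argument looks acceptable, EINVAL otherwise. */
int prof_args_check(const struct prof_args *a)
{
    int i;

    if (a->ovflw_sig < 0 || a->ovflw_sig > 32)
        return EINVAL;              /* signals are 1..32, or 0 for none */
    for (i = 0; i < EVENTMAX; i++)
        if (a->ovflw_freq[i] < 0)
            return EINVAL;          /* thresholds must not be negative */
    return 0;
}
```

Initializing with prof_args_init and then filling in only the events of interest avoids leaving stale values in the unused array elements.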
The following structure is an array of 32 64-bit virtual counters and is
used when a program wants to read the virtual counters of a process:
typedef struct {
        __uint64_t hwp_evctr[HWPERF_EVENTMAX];
} hwperf_cntr_t;
The ioctls available through /proc are the following:
PIOCENEVCTRS - Start using the counters for a process, either in user
mode or system mode. It initializes the counters for the
target process and, if the process is running, starts
them. Otherwise, the counters will be started the next
time the process is run. Fails with EINVAL if events are
specified improperly, or if an input overflow
frequency (threshold) is negative.
If supervisor or kernel mode is specified for any of
the events and the caller does not have root privileges,
it will fail with EPERM. EBUSY may be returned for two
possible reasons:
(1) the counters are already in use in system mode or,
(2) the caller is requesting the counters in system
mode and, at the time of the request, the counters are
in use in user mode, on at least one cpu (this command
will not forcibly acquire the counters for a root
process).
Returns a positive generation number if successful.
PIOCGETEVCTRS - Read the virtual counters of the target process.
The address of an hwperf_cntr_t must be supplied in
the call.
Returns a positive generation number if successful.
PIOCGETEVCTRL - Retrieve the control information for the process's
counters: which events are being counted and the mode
they are being counted in. The kernel will copyout an
array of 32 event specifiers, so the user must supply
an address of an hwperf_eventctrl_t.
Returns a positive generation number if successful.
PIOCSETEVCTRL - Modify how a program is using the counters, whether it
be events and/or their associated mode of operation, or
overflow threshold values, or overflow signal. Once the
counters have been acquired this is how their operation
for a program is modified without releasing the
counters. Each time the PIOCSETEVCTRL is made the
generation number for the target process's counters will
be incremented. The parameter to this call is the
address of an hwperf_profevctrarg_t.
Returns a positive generation number if successful.
PIOCRELEVCTRS - Release the performance counters; the target process
will not have any events counted after this call. Note
that the virtual counters associated with the target
may still be read as long as the process has not exited.
No parameters are necessary.
PIOCSAVECCNTRS - Allow a parent process to receive the counter values
of one of its children when it exits, without having to
wait for the child (when the parent is waiting no
explicit call is necessary). When the child exits its
counter values will be added to the parent's, whether
the parent is using its counters or not. No parameters
are necessary other than target pid.
An example of how these commands would be used is given here. Suppose
that we wanted to count instruction cache misses and data cache misses
for our own program. That means that we want to count event 9 on both
counters (virtual counter indices 9 and 25), and these events would be
counted in user mode. The following
code would accomplish this. Note that the constants used are defined in
/usr/include/sys/hwperfmacros.h, and evctr_args is an
hwperf_profevctrarg_t.
char pfile[32];
int fd, i, generation1, generation2;
pid_t pid;
hwperf_profevctrarg_t evctr_args;
hwperf_cntr_t cnts;

pid = getpid();
sprintf(pfile, "/proc/%05d", (int)pid);
fd = open(pfile, O_RDWR);
if (fd < 0) {
        perror("failed to open /proc entry");
        exit(errno);
}

/* counter 0: select event 9 (primary instruction cache misses) */
for (i = 0; i < HWPERF_CNTEVENTMAX; i++) {
        if (i == 9) {
                evctr_args.hwp_evctrargs.hwp_evctrl[i].hwperf_creg.hwp_mode = HWPERF_CNTEN_U;
                evctr_args.hwp_evctrargs.hwp_evctrl[i].hwperf_creg.hwp_ie = 1;
                evctr_args.hwp_evctrargs.hwp_evctrl[i].hwperf_creg.hwp_ev = i;
                evctr_args.hwp_ovflw_freq[i] = 0;
        } else {
                evctr_args.hwp_evctrargs.hwp_evctrl[i].hwperf_spec = 0;
                evctr_args.hwp_ovflw_freq[i] = 0;
        }
}

/* counter 1: select its event 9, i.e. index 25 (primary data cache misses) */
for (i = HWPERF_CNT1BASE; i < HWPERF_EVENTMAX; i++) {
        if (i == HWPERF_CNT1BASE + 9) {
                evctr_args.hwp_evctrargs.hwp_evctrl[i].hwperf_creg.hwp_mode = HWPERF_CNTEN_U;
                evctr_args.hwp_evctrargs.hwp_evctrl[i].hwperf_creg.hwp_ie = 1;
                evctr_args.hwp_evctrargs.hwp_evctrl[i].hwperf_creg.hwp_ev = i - HWPERF_CNT1BASE;
                evctr_args.hwp_ovflw_freq[i] = 0;
        } else {
                evctr_args.hwp_evctrargs.hwp_evctrl[i].hwperf_spec = 0;
                evctr_args.hwp_ovflw_freq[i] = 0;
        }
}

/* no overflow signal wanted */
evctr_args.hwp_ovflw_sig = 0;

generation1 = ioctl(fd, PIOCENEVCTRS, (void *)&evctr_args);
if (generation1 < 0) {
        perror("failed to acquire counters");
        exit(errno);
}

/* . . . . . (body of program) . . . . */

/* now read the counter values */
if ((generation2 = ioctl(fd, PIOCGETEVCTRS, (void *)&cnts)) < 0) {
        perror("PIOCGETEVCTRS returns error");
        exit(errno);
}

/* generation number should be the same */
if (generation1 != generation2) {
        printf("program lost event counters\n");
        exit(0);
}

/* release the counters */
if ((ioctl(fd, PIOCRELEVCTRS)) < 0) {
        perror("prioctl PIOCRELEVCTRS returns error");
        exit(errno);
}

/* print out the counts (the virtual counters are 64 bits wide) */
printf("instruction cache misses: %llu\n", (unsigned long long)cnts.hwp_evctr[9]);
printf("data cache misses: %llu\n", (unsigned long long)cnts.hwp_evctr[25]);
exit(0);
Syssgi Commands for the Performance Counters
The syssgi commands that access the event counters are not intended for
general use. Rather, specialized commands are implemented through this
interface. Note that all the commands are the first argument to the
syssgi command SGI_EVENTCTR. The available commands are:
HWPERF_PROFENABLE - Enable sprofil-like profiling using the
performance counters rather than the clock.
Returns EINVAL on incorrect input, or EBUSY
if the counters are already in use in system
mode. The second argument to this command is
the address of an hwperf_profevctrarg_t, the third
argument is a profp, and the fourth is the profcnt,
both referring to input necessary for profiling.
Returns a positive generation number if
successful.
HWPERF_ENSYSCNTRS - Forcibly acquire the counters in system mode.
ROOT PERMISSIONS ARE REQUIRED FOR THIS COMMAND.
Note that the counters must be set up in kernel
mode (user and EXL may be included, but kernel mode
is required); otherwise EINVAL will be returned.
That is, at least one of the events must be
counted in kernel mode. Will fail with EBUSY if
the counters are already in use in system mode.
Otherwise, the command is guaranteed to return
the counters in system mode. Starts up the
counters on all the cpus, with all the cpus
counting the same events.
Takes as input (third parameter of syssgi call)
the address of an hwperf_profevctrarg_t, which
is set up just as it is for PIOCENEVCTRS
(see example above).
Returns a positive generation number if
successful.
HWPERF_GET_SYSCNTRS - Read the global system counters to get the global
event counts. All of the per-cpu counters will be
aggregated into the global counters and the
results will be returned to the caller. Caller
must supply in third argument the address of
an hwperf_cntr_t.
Returns a positive generation number if
successful.
HWPERF_GET_CPUCNTRS - Read a particular cpu's event counters. The third
parameter is a cpuid, the fourth is the address
of an hwperf_cntr_t.
Returns a positive generation number if
successful, 0 otherwise (which would indicate
an invalid cpuid.)
HWPERF_GET_SYSEVCTRL - Retrieve the control information for the system's
event counters: which events are being counted
and the modes they are being counted in. The third
parameter must be the address of an
hwperf_eventctrl_t. Returns EINVAL if the counters
are not in use.
Returns a positive generation number if
successful.
HWPERF_SET_SYSEVCTRL - Modify how the system counters are operating,
whether it be events being counted and/or their
associated mode of operation, or overflow
threshold values, or overflow signal.
MUST BE ROOT TO ISSUE THIS COMMAND, or else EPERM
will be returned.
Once the counters have been acquired this is how
their operation is modified without releasing
them. Each time the system call
syssgi(SGI_EVENTCTR, HWPERF_SET_SYSEVCTRL,...)
is issued the generation number for the system's
counters is incremented. The third parameter to
this call is the address of an
hwperf_profevctrarg_t.
Returns a positive generation number if
successful.
HWPERF_RELSYSCNTRS - Stop using the counters in system mode and
make the counters available again.
ROOT PERMISSION REQUIRED.
Returns 0 upon success.
The following list, ordered by events traced, details revision 3 of the
R10000 CPU counters that return information different from the
R12000/R14000 CPU counters. If an event is not listed here, it is the
same on both CPU types.
Event   R10000                           R12000/R14000
1       Issued instructions              Decoded instructions
2       Issued loads                     Decoded loads
3       Issued stores                    Decoded stores
4       Issued store conditionals        Decoded store conditionals
16      Cycles                           Executed prefetch instructions
17      Graduated instructions           Data cache misses
30      Store/fetch exclusive to clean   MHT entries
FILES
/usr/include/sys/hwperftypes.h
/usr/include/sys/hwperfmacros.h
SEE ALSO
ecadmin(1M), ecstats(1M), perfex(1M), libperfex(3C), and libperfex(3F).