PROFILER(1M) PROFILER(1M)
profiler: prfld, prfstat, prfdc, prfsnap, prfpr - UNIX system profiler
prfld [ system_namelist ]
prfstat [ range [ domain ] ]
prfdc file [ period [ off_hour ] ]
prfsnap file
prfpr file [ cutoff [ system_namelist ] ]
Prfld, prfstat, prfdc, prfsnap, and prfpr form a system of programs to
facilitate an activity study of the UNIX operating system.
Prfld is used to initialize the recording mechanism in the system. It
generates a table containing the starting address of each system
subroutine as extracted from system_namelist.
Prfstat is used to enable or disable the sampling mechanism. The range
parameter selects what values will be sampled at the sampling points.
The current choices for range are pc to select PC-style sampling, stack
to sample stack backtraces, and off to disable profile sampling. The
domain parameter selects when the sample values will be collected and
defaults to time which uses a 1ms sampling clock. The current choices
for domain are:
domain       description
______________________________________________________________
time         time
switch       context switches
ipl          non-zero interrupt priority level
cycles       instruction cycles
dcache1      primary data cache misses
dcache2      secondary data cache misses
icache1      primary instruction cache misses
icache2      secondary instruction cache misses
scfail       failed store conditional instructions
brmiss       mispredicted branch instructions
upgclean     exclusive upgrades on clean secondary cache lines
upgshared    exclusive upgrades on shared secondary cache lines
All platforms support the time, switch, and ipl domains but only
platforms based on the R10K CPU and its successors support the other
domains. Samples which occur while executing user code will be
attributed to the synthetic function user_code.
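
The range and domain choices listed above can be checked before prfstat
is invoked. The following is a minimal sketch and not part of the
profiler: the function name prf_check_args is hypothetical, it only
prints the prfstat command line it would run, and it does not try to
detect whether the CPU is R10K-based.

```shell
#!/bin/sh
# Hypothetical helper (not part of the profiler): validate a range and an
# optional domain against the documented choices, then print the prfstat
# command line that would be run.
prf_check_args() {
    range=$1
    domain=${2:-time}          # domain defaults to time

    case $range in
        pc|stack) ;;
        off) echo "prfstat off"; return 0 ;;
        *) echo "unknown range: $range" >&2; return 1 ;;
    esac

    case $domain in
        time|switch|ipl)
            ;;                 # supported on all platforms
        cycles|dcache1|dcache2|icache1|icache2|scfail|brmiss|upgclean|upgshared)
            # Only platforms based on the R10K CPU and its successors
            # support these domains.
            echo "note: $domain requires an R10K or later CPU" >&2
            ;;
        *) echo "unknown domain: $domain" >&2; return 1 ;;
    esac

    echo "prfstat $range $domain"
}
```

For example, prf_check_args stack switch prints "prfstat stack switch",
while an unrecognized domain returns a non-zero status.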
The time and cycles domains both produce time-based samplings, but they
are not equivalent: time samples on the 1ms sampling clock, while cycles
samples on instruction cycles. The cycles domain can be useful when you
believe that the activity of the kernel may be correlated with the time
domain sampling. Such correlations can occur when, for example,
application activity is triggered by clock timeouts.
The switch domain allows profiling in performance situations where MP
contention causes processes to be constantly descheduled, resulting in
an idle system. Profiling such a problem in the time domain would show
most of the system's time being spent under the kernel idle() routine
with a smattering of time elsewhere, which is not very useful. Profiling
in the switch domain allows you to determine the common code paths
leading up to the context switch.
The ipl domain is a special subset of the time domain. It produces a
time-based sampling but only those samples which occur when the interrupt
priority level is non-zero are taken. All other samples are attributed
to user_code or low_ipl depending on whether the interrupt occurred while
executing user code or executing kernel code at IPL0, respectively. This
allows one to rapidly find where interrupts are being held off by code
holding non-zero interrupt priority levels.
For PC sampling, profiler overhead is less than 1% as calculated for 500
text addresses. For stack sampling, profiling overhead is less than 10%
of run time. Without any arguments, prfstat displays the current
sampling mode. Prfstat also reports the number of text addresses being
measured.
Prfdc and prfsnap perform the PC sampling data collection function of the
profiler by copying the current value of all the text address counters to
a file where the data can be analyzed. Prfdc will store the counters
into file every period minutes and will turn off at off_hour (valid
values for off_hour are 0-24). Prfsnap collects data at the time of
invocation only, appending the counter values to file.
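
Because prfsnap appends to file, a workload can be bracketed by two
snapshots. The wrapper below is a minimal sketch under stated
assumptions: the function name prf_bracket and the PRFSNAP override
variable are hypothetical, introduced only so the sketch can be tried on
a machine that lacks the profiler.

```shell
#!/bin/sh
# Hypothetical wrapper: snapshot the counters, run a workload, and
# snapshot again. PRFSNAP is an assumed override hook (not a documented
# variable); it defaults to the real prfsnap command.
PRFSNAP=${PRFSNAP:-prfsnap}

prf_bracket() {
    file=$1
    shift
    "$PRFSNAP" "$file" || return 1   # counters before the workload
    status=0
    "$@" || status=$?                # the workload itself
    "$PRFSNAP" "$file" || return 1   # counters after; prfsnap appends
    return $status                   # report the workload's exit status
}
```

A typical use would be prf_bracket /tmp/P find /usr/bin -name xxx -print,
followed by prfpr /tmp/P to format the collected data.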
Prfpr formats the data collected by prfdc or prfsnap. Each text address
is converted to the nearest text symbol (as found in system_namelist) and
is printed if the percent activity for that range is greater than cutoff.
cutoff may be given as a floating-point number >= 0.01. If cutoff is
zero, then all samples collected are printed, even if their percentage is
less than 0.01%.
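
Saved prfpr output can also be re-filtered at a higher cutoff and ranked
without rerunning prfpr. The following is a minimal sketch, not a prfpr
feature: the function name prf_top is hypothetical, and it assumes the
two-column "symbol percentage" layout that prfpr prints in the example
on this page.

```shell
#!/bin/sh
# Hypothetical post-processing helper: read prfpr output on stdin, keep
# the symbols whose percentage is at least the given cutoff, and print
# them in descending order of activity.
# Assumes a "symbol percentage" two-column layout; header lines and the
# Total line are skipped.
prf_top() {
    cutoff=${1:-0}
    awk -v cutoff="$cutoff" '
        NF == 2 && $1 != "Total" && $2 ~ /^[0-9]+\.[0-9]+$/ &&
        $2 + 0 >= cutoff + 0 {
            printf "%-24s %8.4f\n", $1, $2
        }' | LC_ALL=C sort -k2,2nr
}
```

For example, prfpr /tmp/P 0 | prf_top 1.0 would list only the symbols
that account for at least 1% of the samples, busiest first.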
For stack sampling, the SpeedShop kernprof(1) special executable and the
rtmond(1M) kernel data transport are used to collect the stack trace
data. This data can then be analyzed with SpeedShop tools like prof(1)
to produce a performance profile which provides far more information than
that offered by PC sampling. The data may be collected on the machine
being profiled or on any machine that can be reached via the network.
See kernprof(1) for a description of all the options it supports.
PC sampling:
# prfld
# prfstat pc
PC profiling enabled
9055 kernel text addresses
# prfsnap /tmp/P;find /usr/bin -name xxx -print; prfsnap /tmp/P
# prfpr /tmp/P .3
IRIX anchor 6.2 03131015 IP22
03/17/96 20:36
03/17/96 20:36
CPU 0 - 1253 total samples; cutoff 0.300000
wait_for_interrupt      51.1572
bzero                    0.4789
bcopy                    0.4789
get_buf                  1.4366
bflush                   0.3990
syscall                  0.3192
idle                    37.1907
dnlc_search              0.3192
efs_dirlookup            0.3192
iget                     0.3192
user                     0.7981
Total                   93.22
# prfstat off
profiling disabled
9055 kernel text addresses
Stack sampling on machine alpha and collecting data on machine beta:
alpha# prfld <kernel-file>
alpha# prfstat stack
STACK profiling enabled
9055 kernel text addresses
alpha# /usr/etc/rtmond
...
beta% ssrun -usertime /usr/bin/kernprof -t 5 -p 0 alpha
beta% prof -gprof <alpha's-kernel> kernprof.usertime.<pid>.cpu0
(This assumes that rtmond(1M) has not been enabled with chkconfig(1M) on
the machine alpha and thus needs to be started manually.)
/dev/prf    interface to profile data and text addresses
/unix       default for system namelist file