Command: uprofile | Section: 1 | Source: Digital UNIX | File: uprofile.1.gz
uprofile(1) General Commands Manual uprofile(1)
NAME
uprofile, kprofile - Profile a program (uprofile) or kernel (kprofile)
with Alpha on-chip performance counters
SYNOPSIS
uprofile [-v] [-quiet] [-dirname path] [-[no]pids] [-all|-each|-one]
[-stride n] [-display|prof-option...]
[statistic...] program [argument...]
kprofile [-v] [-quiet] [-dirname path] [-[no]pids] [-all|-each|-one]
[-stride n] [-display|prof-option...] [-k kernel_name] [-t]
[statistic...] [program [argument...]]
DESCRIPTION
The uprofile command uses the Alpha on-chip performance counters to
produce a finely-grained program-counter profile of a user program.
The command runs the program you specify with the arguments you spec-
ify, collecting the selected statistics on the program's process and
its descendants. It writes the profile data to the umon.out file, by
default. If the program calls shared libraries, those libraries are not
profiled.
The kprofile command uses the Alpha on-chip performance counters to
produce a detailed program-counter profile of the kernel. If you
specify a program, kprofile runs the program with the arguments you
specify, and it collects the selected statistics on the kernel for the
duration of the program's execution. If you do not specify a program,
kprofile collects the selected statistics on the kernel until you enter
Ctrl/C. It writes the profile data to the kmon.out file, by default.
If you specify -display or any of the prof-options, the uprofile and
kprofile commands display the profile by running the prof tool (with
any specified prof-options).
You can also run the prof command separately, to help analyze the data
in the umon.out or kmon.out file. The following examples show how to
invoke the prof command to analyze data in the respective files:
% prof a.out umon.out
% prof /vmunix kmon.out
PARAMETERS
statistic
    The name of an event that your particular Alpha hardware can
    profile, as detailed in the STATISTICS section, below. If no
    statistic is named, machine cycles are counted, giving a CPU-time
    profile. One statistic can be specified for each of the hardware
    counters on your machine.
program
    The name of the executable to run while profiling operations are
    being performed.
argument
    An argument to pass to the program that is run. Multiple arguments
    can be specified, as needed by the program.
OPTIONS
Options can be abbreviated to three characters, except the
prof-options, which can be abbreviated (usually to one character) as in
a prof command. For example, -qui is interpreted as -quiet, but -q is
interpreted as -quit. (See the -display option for the supported
prof-options.)
For options that specify a procedure name (proc), C++ procedures can
omit the argument type list, though this will match all overloaded
procedures with that name. To select a specific procedure, specify the
full symbol name (as printed by the nm command). Symbol names
containing spaces, *, and so on must be quoted.
-v
    Engages verbose mode, which prints some useful information about
    the program being profiled.
-quiet
    Prevents informational and progress messages from being printed.
-dirname path
    Specifies the directory path in which the profiling data file or
    files are created.
-[no]pids
    [Disables] or enables the addition of the process-id number to the
    name of the profiling data file or files.
-all | -each | -one
    Specifies which mode to use for profiling on multi-processor
    machines. Using the -all option (the default) aggregates the data
    for all CPUs into one umon.out file. Using the -each option
    collects separate profiles for each CPU and writes the output into
    a set of files named umon.out.n, where n is the CPU number. Using
    the -one option profiles only the current CPU. For the -one option
    to work, the uprofile or kprofile program must be run using the
    runon command.
-stride n
    Sets the granularity of the sample counts, where n is the number of
    consecutive instructions grouped together for each sample count.
    The default is -stride 4. The -asm, -heavy, and -lines prof-options
    need a separate sample count for each instruction (for their
    reports to be precise enough), so these options imply -stride 1.
    This makes the output file four times bigger than the default size.
    The -stride argument must be a power of two (for example, 1, 2, 4,
    8).
-k kernel_name
    Overrides the name of the kernel to profile. (The default is the
    booted kernel.)
-t
    Enables triggered mode for kprofile. This option sets up all
    required information for running the performance counters, but
    does not invoke them. See the STATISTICS section for additional
    information.
-display
    Runs prof on the resulting profile data file(s). The following
    prof options are supported:
    -asm
        Reports the profile as an annotated disassembly.
    -exclude proc
        Excludes procedure proc and its descendants from the profile,
        but totals all procedures.
    -Exclude proc
        Excludes procedure proc and its descendants from the profile
        and from the total.
    -heavy
        Reports the lines that executed the most instructions.
    -lines
        Reports the profile per source line within each procedure.
    -merge file
        Merges all profile data files into file.
    -numbers
        Prints each procedure's starting line number.
    -only proc
        Includes only procedure proc in the profile, but totals all
        procedures.
    -Only proc
        Includes only procedure proc in the profile and in the total.
    -procedures
        Profiles the instructions executed in each procedure and the
        calls to procedures.
    -quit n
        Truncates the reports after n lines or after (cumulative) n
        percent of the whole.
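The -stride constraints described above can be illustrated with a
short Python sketch. It is not part of uprofile; the function names are
hypothetical, and the size estimate assumes that the data file is
dominated by one sample count per group of n instructions, which is the
reading suggested by the "four times bigger" remark for -stride 1.

```python
DEFAULT_STRIDE = 4  # documented default for -stride


def is_valid_stride(n: int) -> bool:
    """Return True if n is a positive power of two (1, 2, 4, 8, ...)."""
    return n > 0 and (n & (n - 1)) == 0


def relative_size(stride: int) -> float:
    """Estimate output size relative to the default stride.

    Assumption: size scales with the number of sample counts, that is,
    inversely with the number of instructions grouped per count.
    """
    if not is_valid_stride(stride):
        raise ValueError(f"-stride must be a power of two, got {stride}")
    return DEFAULT_STRIDE / stride


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, relative_size(n))
```

Under that assumption, relative_size(1) gives 4.0, matching the
statement that -stride 1 makes the file four times the default size.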
STATISTICS
You specify the statistics that you want to collect for the program be-
ing profiled in one or more statistic parameters.
If you specify multiple statistics, uprofile and kprofile accumulate
their results. You cannot then view the results of any single statis-
tic separately. Because collected data is merged into a single buffer,
interpretation of multiply collected statistics may be difficult.
The Alpha architecture implemented on your machine determines which
statistics can be collected and the number of counters available for
collecting multiple statistics at the same time. The implementation is
indicated by the Alpha chip number, which can be displayed with the
show config console command before booting Digital UNIX, or, after
booting, by using the psrinfo -v command, or by calling getsysinfo
(GSI_PROC_TYPE). Also, if the uprofile command is run without argu-
ments, it will show how many counters and what statistics are available
on your machine.
All of the chips in the EV4 chip set (21064 [EV4], 21064A [EV45],
21066/21068 [LCA4]) have two performance counter registers, each of
which can be separately programmed. The statistics that each counter
can collect are shown in the following table:
------------------------------------
Counter 0 Stats    Counter 1 Stats
------------------------------------
0disabled          1disabled
issues             dcache
pipedry            icache
loads              dualissues
pipefrozen         mispredicts
branches           floatops
cycles             intops
PALcycles          stores
nonissues          novictims
victims
------------------------------------
All of the chips in the EV5 chip set (21164 [EV5], 21164A [EV56], and
21164PC [PCA56]) have three performance counter registers, each of
which can be separately programmed. Some of the counters are common to
all EV5 implementations, some are specific to EV5 and EV56, and some
are specific to PCA56.
The statistics that each of the common EV5 counters can collect are
shown in the following table:
-------------------------------------------------------
Counter 0 Stats    Counter 1 Stats    Counter 2 Stats
-------------------------------------------------------
0disabled          1disabled          2disabled
cycles0            nonissues          longstalls
issues             splitissue         pcmispredicts
                   pipedry            branchmispredicts
                   replay             icachemisses
                   singleissues       itbmisses
                   dualissues         dcacheldmisses
                   tripleissues       dtbmisses
                   quadissues         ldsmerged
                   flowchanges        ldureplays
                   intops             fullreplays
                   floatops           externalinput
                   loads              cycles2
                   stores             memorybarriers
                   icacheacc          lockedloads
                   dcacheacc
-------------------------------------------------------
The statistics that each of the EV5- and EV56-specific counters can
collect are shown in the following table:
------------------------------------
Counter 1 Stats    Counter 2 Stats
------------------------------------
scacheacc          scachemisses
scachereads        scachereadmisses
scachewrites1      scachewritemisses
scachevictim       scachesharedwrites
bcacheref          scachewrites2
bcachevictim       bcachemisses
sysreqs            systeminvalidates
                   systemreadrequests
------------------------------------
The statistics that each of the PCA56-specific counters can collect are
shown in the following table:
-----------------------------------------
Counter 1 Stats         Counter 2 Stats
-----------------------------------------
bcachereads             bcachedreads
bcachedreadhits         bcachereadhits
bcachedreadfills        bcachereadfills
bcachewrites            bcachewritehits
bcachecleanwritehits    bcachewritefills
bcachevictims           sysreadflushhits
readmisstwo             sysreadflushmisses
                        readmissthree
-----------------------------------------
All of the chips in the EV6 chip set have two performance counter
registers, each of which can be separately programmed. The statistics
that each of the EV6-specific counters can collect are shown in the
following table:
------------------------------------
Counter 0 Stats    Counter 1 Stats
------------------------------------
0disabled          1disabled
cycles0            cycles1
retinst            retcondbranch
                   retbranchmiss
                   retdtb1miss
                   retdtb2miss
                   retitbmiss
                   retunaltrap
                   replay
------------------------------------
The default is to gather cycle statistics in the 0th counter and to
disable other counters.
For descriptions of the statistics for all EV4, EV5, and EV6 implemen-
tations, refer to pfm(7).
You can disable any counter by specifying 0disabled, 1disabled, or
2disabled as the counter statistic. You can use this feature to iso-
late specific event types, such as loads, without extraneous data being
generated. You cannot disable all counters at the same time, choose
two statistics for the same counter, or disable a counter once its sta-
tistic is specified.
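The counter-selection rules above can be sketched as a small Python
check. This is only an illustration, not how uprofile is implemented:
the function name is hypothetical, the statistic names are copied from
the EV4 table, and the sketch assumes that an empty request falls back
to cycles on counter 0, as described for the default.

```python
# Statistic names per counter, copied from the EV4 table.
EV4_COUNTERS = {
    0: {"0disabled", "issues", "pipedry", "loads", "pipefrozen",
        "branches", "cycles", "PALcycles", "nonissues", "victims"},
    1: {"1disabled", "dcache", "icache", "dualissues", "mispredicts",
        "floatops", "intops", "stores", "novictims"},
}


def assign_counters(statistics, counters=EV4_COUNTERS):
    """Map each requested statistic to its counter, or raise ValueError."""
    chosen = {}
    for stat in statistics:
        owner = next((c for c, names in counters.items() if stat in names),
                     None)
        if owner is None:
            raise ValueError(f"unknown statistic: {stat}")
        # One statistic per counter. This also rejects disabling a
        # counter after a statistic was already chosen for it.
        if owner in chosen:
            raise ValueError(
                f"counter {owner} already set to {chosen[owner]}")
        chosen[owner] = stat
    if not chosen:
        return {0: "cycles"}  # documented default
    # Reject a request that explicitly disables every counter.
    if len(chosen) == len(counters) and all(
            name.endswith("disabled") for name in chosen.values()):
        raise ValueError("cannot disable all counters at the same time")
    return chosen


if __name__ == "__main__":
    print(assign_counters(["loads", "1disabled"]))
```

For example, requesting loads together with 1disabled isolates load
events on counter 0, while requesting issues and loads together is
rejected because both belong to counter 0.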
When you specify no counter statistics, uprofile and kprofile count cy-
cles on counter 0 by default, and display (through prof) a profile in
terms of seconds used by each procedure in the program, except for any
shared libraries.
For non-cycle statistics, the displayed profile shows the number of
samples recorded, the sampling interval (events per second), and the
total number of events that this implies. Most non-cycle statistics of
the EV5 family CPUs are recorded about six cycles after the instruction
that triggered the sample. So, when using prof's -asm or -lines option,
the samples should be associated with one of the few previously
executed instructions or lines. The icacheacc, icachemisses, and
dtbmisses statistics are usually attributed precisely.
Because EV6 is an out-of-order machine, precise attribution is much
more difficult.
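The way a total event count follows from the sample count can be
sketched in Python. This assumes, as an interpretation rather than a
documented fact, that each recorded sample stands for a fixed number of
events; the function name and the numbers used are hypothetical.

```python
def implied_total_events(samples: int, events_per_sample: int) -> int:
    """Estimate total events, assuming a fixed number of events per sample."""
    if samples < 0 or events_per_sample <= 0:
        raise ValueError("samples must be >= 0 and the interval positive")
    return samples * events_per_sample


if __name__ == "__main__":
    # Hypothetical figures, not taken from a real profile.
    print(implied_total_events(1000, 4096))
```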
To perform a detailed analysis of short sections of kernel code, use
the kprofile command with triggered mode (invoked with the -t option).
When you use this mode, kprofile performs all of the required setup for
enabling the counters as normal, but does not invoke them. You can
insert counter start or stop commands into the kernel code to be
instrumented as follows:
Turn counters on:     wrperfmon (PFOPT, 1)
Turn counters off:    wrperfmon (0)
You can turn the counters on and off repeatedly to collect data over
many iterations or multiple sections of code.
The macro PFOPT is defined in <sys/pfcntr.h>.
NOTES
The interrupt load that profiling places on the system may affect per-
formance, but usually the effect is insignificant.
The kernel in use must have the pfm pseudo-device configured into it.
To do this, add the following line to the kernel configuration file and
rebuild the kernel:
pseudo-device pfm
The format of the data files produced by uprofile changed in Digital
UNIX V4.0 to support improved profile display in terms of the selected
statistics. To convert the data files to the industry-standard format,
at the expense of losing the names of the statistics, use the pdtostd
command.
RESTRICTIONS
The victims and novictims statistics rely on the external performance
counter pin connections as described in the EV4 chip specification.
The DEC 3000/400, /500, /600, and /800 workstations have these
connections. Attempts to display either of these statistics on other
platforms (while allowed) will typically generate empty data.
The uprofile command is only supported on EV4 Pass 3 or later proces-
sors. Attempts to use it on a Pass 2 processor will gather PC samples
for every process running on the system.
Using kprofile to generate statistics for a single command is only pos-
sible on EV4 Pass 3 or later processors. Attempts to do this on a Pass
2 processor will gather statistics for the entire system, as if no com-
mand had been specified.
Using kprofile with triggered mode also requires an EV4 Pass 3 or later
processor and cannot be performed with per-process monitoring.
FILES
The performance counter device file.
The statistics file(s) generated by uprofile (umon.out by default).
The statistics file(s) generated by kprofile (kmon.out by default).
The statistics file(s) generated with the -pids option.
The default kernel to profile.
SEE ALSO
prof_intro(1), pdtostd(1), pfm(7), prof(1), runon(1), psrinfo(1),
sysconfig(8), autosysconfig(8)
Programmer's Guide
uprofile(1)