Command: hiprof | Section: 5 | Source: Digital UNIX | File: hiprof.5.gz
hiprof(5) File Formats Manual hiprof(5)
NAME
hiprof - Hierarchical instruction profiler
SYNOPSIS
atom appl_prog -tool hiprof [-env threads] [-toolargs="arg1 arg2..."]
[atom_options...]
PARAMETERS
appl_prog
File name of a fully linked shared or nonshared executable to be
profiled. This program should be compiled with the -g1, -g2, or -g3
option to obtain more complete profiling information. If the default
symbol table level (-g0) has been used, line number information, static
procedure names, and file names are unavailable to the profiler.
OPTIONS
-tool hiprof
Identifies the hiprof tool to atom.

-env threads
Specifies that the hiprof tool is being invoked on an application that
runs in a threaded environment. To make run-time analysis of an
application threadsafe, you must specify -env threads in the hiprof
command. Only POSIX threads created using the pthread_create function
are supported.
The threadsafe instrumented executable is named
appl_prog.hiprof.threads by default. You may omit the -env
threads option if the application does not create threads; in
this case the instrumented executable is named appl_prog.hiprof.
-toolargs="arg1 arg2..."
Passes arguments (listed below, in this section) to the hiprof
tool's instrumentation routines. Use whitespace characters to
separate arguments from their parameters (if any) and from other
arguments.
If you need to represent spaces within argument parameters (such
as within a parameter to the -exc argument), use matching sin-
gle-quotes or matching double-quotes, making sure that you avoid
having the shell interpret those characters as shell-special
characters. For example:
-toolargs="-exc 'strstreambase::strstreambase(char*, \
int, char*)'"
-toolargs='-exc "operator -" -exc "ostream::operator \
<<" -exc main -exc "operator new(unsigned long)"'

atom_options...
Specifies options to the atom command. See the atom(1) reference page
for descriptions of other options accepted by the atom command, such
as those that enable instrumentation of shared libraries, specify the
names of instrumented objects, and request debugging information.
After you have instrumented an application that uses libc.so,
libpthread.so, or other shared libraries, you must set the
LD_LIBRARY_PATH environment variable to point to the directory
containing the instrumented shared libraries. Typically, this
would be the current directory or the directory specified by the
-shlibdir option. (You may leave LD_LIBRARY_PATH pointing to
this directory while running other, uninstrumented applica-
tions.)
The hiprof tool allows the following options to be passed in the
-toolargs option for use by the hiprof tool's instrumentation routine
when instrumenting appl_prog. Except where noted, these options can
also be passed to the instrumented program at execution time by being
defined as part of the HIPROF_ARGS environment variable.

-calltime
Causes hiprof to apply more precise, pthread-dependent profiling
process-wide. This style of profiling measures the cost of calls
during each call. By default, hiprof uses threadsafe,
pthread-independent profiling, which shows the cost of calls
proportional to the number of calls. This option cannot be defined as
part of the HIPROF_ARGS environment variable.

-cputime
Causes hiprof to use CPU time obtained from the hardware cycle counter
rather than from instruction counts. This option cannot be defined as
part of the HIPROF_ARGS environment variable.

-dirname path
Specifies the directory path in which hiprof creates the profile
files. The path specified with -dirname is prepended to the path and
filename specified with -hiout, if any. See Specifying Profile File
Names and Locations.

-exc procname
Excludes time spent in procname from the profile. This switch can be
used multiple times to exclude multiple procedures. To represent all
of the variations of an overloaded C++ function name, you can specify
just the part of the name up to but not including the "(".

-fastrecur
Invokes a simpler heuristic for mapping recursion into a hierarchical
report when used with the -calltime, -cputime, or -pagefaults option.

-fork
Indicates that a call-shared program forks. You must specify the -fork
option if libc.so is not being fully instrumented and the call-shared
program being instrumented makes a fork or vfork system call. When the
-fork option is specified, each child process produces a separate
profiling data file (or possibly several if the -threads option is
also specified) unless it makes an exec system call. A profile
generated from all of the profiling data files represents the behavior
of the parent process and its children; a profile generated from any
single profiling data file represents the single process or thread
associated with that file.

-hiout filename
Specifies a name and, optionally, a directory path for the profile
file. The filename specified overrides the default appl_prog portion
of the profile filename. Any directory path specified with -dirname is
prepended to filename. See Specifying Profile File Names and
Locations.

Disables use of a trace buffer for -cputime. This is useful for
studying the performance of hiprof. This option cannot be defined as
part of the HIPROF_ARGS environment variable.

-nouser
Excludes user execution time from the profile. This option cannot be
defined as part of the HIPROF_ARGS environment variable.

-pids
Includes (or does not include) the process ID of the process running
the program in the name of the hiprof profile file produced by the
instrumented application.

-pagefaults
Measures page faults instead of program execution time. Works only for
nonthreaded programs. This option cannot be defined as part of the
HIPROF_ARGS environment variable.

-sigdump sig
Causes the process running the instrumented application to catch the
signal indicated by sig (see signal(4)). When it receives that signal,
the process writes the current profiling data to the output file,
reinitializes the profile by setting the execution time to zero, and
resumes execution.

-systime
Incorporates cycle counter estimates of system time into instruction
count estimates of user time when used with the -calltime option.

-threads
When used with the -calltime or -cputime options (and -env threads is
specified on the atom command line), causes hiprof to apply more
localized, pthread-dependent profiling to each individual thread in
the process. Otherwise, hiprof provides process-wide profiling for the
modes enabled by these options.

-textout
When used with the -calltime, -cputime, or -pagefaults options,
produces a text-format profile file instead of a binary profiling data
file. This file is similar to the output from gprof, although it
cannot be combined or filtered. It also contains additional statistics
on the instrumentation that has been used on appl_prog. By default,
the profile file contains binary data that the gprof utility can
combine with other profiles and filter, prior to generating a report.
When -textout is specified with -env threads, each thread is in-
dividually profiled, as if -threads had also been specified.
While the instrumented appl_prog is being executed, options specified
in the definition of the HIPROF_ARGS environment variable override any
corresponding settings in the -toolargs options. For example:
% setenv HIPROF_ARGS "-dirname /tmp/profiles -pids"
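The override rule can be pictured as a dictionary merge. The following
Python sketch is a model only: the helper names and the simplified
parsing (each option is either a flag or takes one parameter) are
assumptions made for illustration, not hiprof's actual implementation.

```python
import shlex


def parse_args(text):
    """Parse an option string into {option: parameter-or-True} (hypothetical helper)."""
    opts = {}
    tokens = shlex.split(text)
    i = 0
    while i < len(tokens):
        name = tokens[i]
        # Assumption: a following token that does not start with "-" is
        # this option's parameter; otherwise the option is a plain flag.
        if i + 1 < len(tokens) and not tokens[i + 1].startswith("-"):
            opts[name] = tokens[i + 1]
            i += 2
        else:
            opts[name] = True
            i += 1
    return opts


def effective_options(toolargs, hiprof_args):
    """Settings from HIPROF_ARGS replace corresponding -toolargs settings."""
    merged = parse_args(toolargs)
    merged.update(parse_args(hiprof_args))
    return merged


# -dirname is given in both places; the HIPROF_ARGS value is the one kept.
opts = effective_options("-dirname /work -textout",
                         "-dirname /tmp/profiles -pids")
```

In this model, -textout survives from -toolargs because HIPROF_ARGS does
not mention it, while -dirname takes the value from HIPROF_ARGS.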
DESCRIPTION
The hiprof tool is an Atom-based program profiling tool that produces
both flat and hierarchical profiles. The flat profile shows the execu-
tion time spent in any given procedure. The hierarchical profile shows
the time spent in a given procedure and all its descendants. The
hierarchical profile enables the user to answer questions of the form "How
much time is spent in printf() and all procedures called by printf()?".
The hiprof tool's output is similar to that generated by the -pg option
of the cc command. However, hiprof uses code instrumentation rather
than PC-sampling to gather statistics. The gprof command is usually
used to filter and merge output files and to format profile reports.
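The difference between the flat and hierarchical figures can be shown
with a small Python sketch. The call tree, its times, and the callee
names format_output and flush_buffer are hypothetical values chosen
for illustration; they are not hiprof output.

```python
def inclusive_time(node):
    """Time spent in a procedure and in all procedures it calls."""
    name, self_time, children = node
    return self_time + sum(inclusive_time(child) for child in children)


# Hypothetical call tree: (procedure, time in the procedure itself, callees)
tree = ("main", 5, [
    ("printf", 2, [("format_output", 7, []), ("flush_buffer", 3, [])]),
    ("compute", 20, []),
])

printf_node = tree[2][0]
flat = printf_node[1]                        # time in printf itself
hierarchical = inclusive_time(printf_node)   # printf plus everything it calls
```

Here the flat figure for printf is 2 and the hierarchical figure is 12,
which answers the "printf() and all procedures called by printf()"
question.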
The hiprof tool generates an instrumented version of appl_prog. The
instrumented program behaves identically to the original except that it
writes out an execution profile after it is done.
If you are instrumenting a shared-library program, you will probably
need to set the LD_LIBRARY_PATH environment variable (see atom(1) for
more information).
Multiple profile files can be created by a single program run because a
separate profile can optionally be generated for each thread of each
process. Nonthreaded programs are treated as programs with just one
thread.
Specifying Profile File Names and Locations
By default, the profile file is created in the current directory and
its name has the following form:
appl_prog.pid.tid.hiout
The pid (process ID) portion of the filename appears only if you spec-
ify the -pids option using either the atom command's -toolargs option
or the HIPROF_ARGS environment variable. The tid (thread ID) portion
appears only if you specify both -env threads on the atom command line
and -threads in either the atom command's -toolargs option or the
HIPROF_ARGS environment variable.
You can specify that the file be created in another directory by using
the -dirname option.
You can specify a different name (including a directory path) for the
appl_prog portion of the filename by using the -hiout option. For ex-
ample, the following -toolargs entry in the atom command line:
-toolargs="-hiout /test/file1"
causes the profile filename to have the form /test/file1.pid.tid.hiout
Any directory path specified with -dirname is pre-pended to the direc-
tory path and filename specified with -hiout, if any.
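The naming rules above can be summarized in a short Python sketch. The
function name and the way -dirname is joined to the rest of the path
(a single "/") are assumptions made for illustration; only the
appl_prog.pid.tid.hiout form comes from this reference page.

```python
def profile_filename(appl_prog, pid=None, tid=None, hiout=None, dirname=None):
    """Build a profile file name of the form appl_prog.pid.tid.hiout."""
    # -hiout replaces the appl_prog portion of the name.
    base = hiout if hiout is not None else appl_prog
    if dirname is not None:
        # Assumption: "prepended" is modeled as joining with one "/".
        base = dirname.rstrip("/") + "/" + base.lstrip("/")
    parts = [base]
    if pid is not None:      # present only when -pids is in effect
        parts.append(str(pid))
    if tid is not None:      # present only with -env threads and -threads
        parts.append(str(tid))
    parts.append("hiout")
    return ".".join(parts)


default_name = profile_filename("a.out")
renamed = profile_filename("a.out", pid=1234, tid=2, hiout="/test/file1")
relocated = profile_filename("a.out", pid=1234, dirname="/tmp/profiles")
```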
Resetting the Profile
It is sometimes useful to start profiling part way into the execution
of a program. For example, a user may wish to omit program initializa-
tion from the profile. Also, it is sometimes useful to force the pro-
gram to print its profile even before it has finished executing. For
example, a user might wish to extract the profile of a running file
server. The hiprof tool provides a mechanism to do these things.
If you specify the -sigdump option in the atom command line or define
the -sigdump option in the HIPROF_ARGS environment variable, the speci-
fied signal will be caught by the process. When it receives that sig-
nal, the process writes the current profiling data to the output file,
reinitializes the profile by setting the execution time to zero, and
resumes execution.
The process can be signaled any number of times during its execution.
If you do not specify the -textout option in the atom command line or
define it in the HIPROF_ARGS environment variable (that is, when you
are producing binary profile files for gprof), each signal causes the
process to overwrite any existing file.
If you do specify the -textout option (that is, when you are producing
text-format profile files), the output file will contain two sets of
profile data when the process completes execution:
o From the beginning of the program to the point at which the signal
  was received
o From the point each signal was received to the end of the program
For example:
% setenv HIPROF_ARGS "-sigdump USR1"
% application_program.hiprof &
<wait until the desired time>
% kill -USR1 pid
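The dump-and-reset behavior can be modeled in Python as follows. This
is a sketch under stated assumptions: the counter dictionary, the
one-line-per-procedure output, and the "--" separator are hypothetical,
and a text stream stands in for both binary and text-format files.

```python
import io
import signal


def dump_and_reset(counts, out, textout):
    """Write the current profile to `out`, then zero the counters."""
    if not textout:
        # Binary-file behavior described above: each dump overwrites
        # whatever the file already holds.
        out.seek(0)
        out.truncate()
    for proc, time in sorted(counts.items()):
        out.write(f"{proc} {time}\n")
    out.write("--\n")            # hypothetical separator between data sets
    for proc in counts:
        counts[proc] = 0         # reinitialize the profile


def install(sig, counts, out, textout):
    """Catch `sig` and dump the profile each time it is delivered."""
    signal.signal(sig, lambda signum, frame: dump_and_reset(counts, out, textout))


counts = {"main": 40, "serve": 60}
text_file = io.StringIO()
dump_and_reset(counts, text_file, textout=True)   # signal arrives
counts["serve"] += 25                             # execution resumes
dump_and_reset(counts, text_file, textout=True)   # program exits
```

With textout=True the stream ends up holding both data sets in order;
with textout=False only the most recent dump remains.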
User Time Estimates
The hiprof tool provides two different ways of estimating user execu-
tion time: instruction counts and the cycle counter. By default, the
hiprof tool estimates execution time by counting the number of user-
level instructions executed. However, if the -cputime option is speci-
fied during instrumentation (that is, to the -toolargs option in the
atom command line), CPU time is estimated using the hardware cycle
counter. This involves looking at the value of the hardware cycle
counter before and after a procedure call to determine the time spent
in the procedure.
The advantage of instruction counts is that they are repeatable and are
unaffected by the presence of the instrumentation code. If a program
is run twice with identical inputs, the instruction counts for both
runs will be identical. The disadvantage of instruction counts is that
they do not account for various second-order effects (cache misses, TLB
misses, and pipeline stalls) which degrade the execution time of a real
program.
The advantage of using the cycle counter is that the effects of cache
misses, TLB misses, and pipeline stalls are accounted for. The disad-
vantage is that the presence of the instrumentation code can degrade
the performance of the cache and TLB seen by the application. If an ap-
plication procedure is short (100 or so instructions), then times re-
ported for both the short procedure and the procedure calling the short
procedure can be unrealistically pessimistic. If a significant frac-
tion of an application's time is spent in a short procedure, it may be
better not to instrument that procedure at all. To exclude procedure
procname from instrumentation, you can specify the -exc procname option
in the atom command line or define it in the HIPROF_ARGS environment
variable. If a procedure is not instrumented, its run time is charged
to its parent and all calls made by the procedure appear to be made by
the parent.
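The effect of leaving a procedure uninstrumented can be sketched in
Python. The call tree, the times, and the function name charge are
hypothetical; the sketch only models the attribution rule stated
above.

```python
def charge(node, excluded, parent=None, times=None, calls=None):
    """Attribute times and call edges as if excluded procedures
    were not instrumented."""
    if times is None:
        times, calls = {}, set()
    name, self_time, children = node
    # An uninstrumented procedure is invisible: its time is charged to
    # its caller, and its own calls appear to be made by that caller.
    hidden = name in excluded and parent is not None
    owner = parent if hidden else name
    times[owner] = times.get(owner, 0) + self_time
    if parent is not None and not hidden:
        calls.add((parent, name))
    for child in children:
        charge(child, excluded, owner, times, calls)
    return times, calls


# Hypothetical tree: (procedure, time in the procedure itself, callees)
tree = ("main", 10, [("tiny", 3, [("log", 4, [])])])
times, calls = charge(tree, excluded={"tiny"})
```

In this example the 3 units spent in tiny are added to main, and the
call from tiny to log is reported as a call from main to log.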
System Time Estimates
By default, the hiprof tool uses instruction counts and omits system
time from its estimates of execution time. However, passing the
-cputime option in the -toolargs option to hiprof's instrumentation
routine causes the instrumentation routine to use the hardware cycle
counter to measure both user and system CPU time. If you specify the
-calltime option to the -toolargs option on the atom command line, you
can specify the -systime option (either in -toolargs or in the
HIPROF_ARGS environment variable) to incorporate cycle counter esti-
mates of system time into instruction count estimates of user time. You
can exclude user execution time from the profile by using the -nouser
option in the -toolargs option at instrumentation time.
Multiple Processes and Threads
When a program calls fork, an additional output file is created for the
new child process. The child's output file reports only the execution
time used by the child process following the fork. The parent's output
file reports the execution time of the parent process both before and
after the fork. Similarly, when a threaded application creates a new
thread, a separate profile is created for that thread.
If a process calls exec and the exec succeeds, then all execution time
statistics from the creation of the process up to the exec are lost.
This occurs because the profile statistics are lost when the exec over-
writes the address space. For the most part, this is not a problem be-
cause calls to exec are usually immediately preceded by a fork. If the
program being invoked by the exec call is instrumented, then the execu-
tion time of the process following the exec is reported in that new
program's output file.
Recursion
Recursion causes complications for hierarchical profilers because the
call graph is not a tree. The hiprof tool uses a heuristic to map the
times from a cyclic graph to a hierarchical report. While the applica-
tion runs, hiprof dynamically detects edges that close cycles in the
call graph. Then, hiprof breaks the cycle by stopping the clock for all
edges in the cycle. Edges that close cycles in the call graph are
marked in the text-format report (generated when the -textout option is
specified in the -toolargs option or in the HIPROF_ARGS environment
variable) with a '+' character and will have zero time assigned to
them.
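Detection of cycle-closing edges can be sketched in Python. The sketch
assumes that an edge closes a cycle when it calls a procedure that is
already active on the call stack; the event format and function name
are hypothetical, and the clock handling itself is not modeled.

```python
def closing_edges(events):
    """Return the call edges (caller, callee) whose callee is already active."""
    stack, closing = [], set()
    for kind, name in events:
        if kind == "call":
            if name in stack:
                # The callee is already on the stack, so this call
                # closes a cycle in the call graph.
                closing.add((stack[-1], name))
            stack.append(name)
        else:  # "ret"
            stack.pop()
    return closing


# main calls a, a calls b, and b calls a again (mutual recursion).
events = [
    ("call", "main"), ("call", "a"), ("call", "b"), ("call", "a"),
    ("ret", "a"), ("ret", "b"), ("ret", "a"), ("ret", "main"),
]
edges = closing_edges(events)
```

Under this assumption the edge from b to a is the one that would carry
the '+' mark in the text-format report.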
Although the above heuristic produces the most intuitive reports, it
can be inefficient for some programs that are highly recursive. A sim-
pler algorithm can be invoked by including -fastrecur in the -toolargs
option to the atom command line or in the definition of the HIPROF_ARGS
environment variable. In the simpler algorithm, the clock is stopped
only for the edge closing the cycle. All of the other edges in the cy-
cle continue to accumulate time -- with the result that the sum of the
times of the edges leaving a node can sum to more than the execution
time of the program.
Algorithm
Although hiprof's output format was modeled after gprof's PC-sampling
format, its algorithms (except in the default mode) are different. A
couple of improvements result. For example, the amount of time spent by
a child procedure on behalf of its parent is measured rather than esti-
mated, as it is in PC-sampling. Unlike profilers based on pixie, both
the source and destination of indirect calls can be reported.
The hiprof tool dynamically constructs the procedure call graph during
the execution of the program. This allows the profiler to handle indi-
rect calls that would otherwise be ambiguous from a static analysis of
the program. Nodes in the graph represent procedures, and arcs between
nodes represent procedure calls. During the execution of the program,
the profiler maintains a model of the procedure call stack. When a
procedure is called, the profiler pushes the identity of the called
procedure and the time of the call onto its stack. When a procedure
returns, the profiler pops the top entry off its simulated stack. The
difference in the times of the call and return gives the time spent in
the called procedure and all of its descendants.
To avoid counting time twice when recursion occurs, the algorithm
checks for outstanding calls. If multiple calls to the same procedure
are outstanding simultaneously, the profiler times only the first call.
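The simulated call stack and the recursion test can be sketched in
Python. The event format, timestamps, and function name are
hypothetical; the sketch illustrates the approach described above and
is not hiprof's implementation.

```python
def inclusive_times(events):
    """events: ("call" | "ret", procedure, timestamp) in execution order."""
    stack = []     # simulated call stack of (procedure, time of call)
    active = {}    # number of outstanding calls per procedure
    totals = {}    # time in each procedure and the procedures it calls
    for kind, proc, now in events:
        if kind == "call":
            stack.append((proc, now))
            active[proc] = active.get(proc, 0) + 1
        else:
            called, started = stack.pop()
            active[called] -= 1
            # Only the first (outermost) outstanding call is timed, so
            # recursive invocations are not counted twice.
            if active[called] == 0:
                totals[called] = totals.get(called, 0) + (now - started)
    return totals


events = [
    ("call", "main", 0),
    ("call", "f", 10),
    ("call", "f", 20),    # recursive call while the first is outstanding
    ("ret", "f", 30),
    ("ret", "f", 40),
    ("ret", "main", 100),
]
totals = inclusive_times(events)
```

The recursive call to f between times 20 and 30 is already covered by
the outer call from 10 to 40, so f is charged 30 units rather than 40.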
FILES
appl_prog.hiprof
Default name for instrumented version of appl_prog
appl_prog.hiout
Default name of profile output file
BUGS
If the cycle counter is used to measure the execution time of a proce-
dure and the procedure call executes more than 2^32 cycles without mak-
ing another procedure call, the reported execution time for that proce-
dure will be too small because the wraparound of the 32-bit cycle
counter is not detected. Wraparound may also occur if not all proce-
dures or shared libraries are profiled. Consequently, when you specify
the -cputime option, you should also specify the -all option.
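The wraparound arithmetic can be illustrated in Python. The sketch
assumes that elapsed time is computed by subtracting two readings of a
32-bit counter modulo 2^32; the function name and sample values are
hypothetical.

```python
COUNTER_BITS = 32
MODULUS = 1 << COUNTER_BITS


def measured_cycles(start, true_elapsed):
    """Elapsed cycles as seen through 32-bit readings taken before and after."""
    end = (start + true_elapsed) % MODULUS    # the counter wraps silently
    return (end - start) % MODULUS


# The counter wraps once during the interval, but the interval is shorter
# than 2^32 cycles, so modular subtraction still gives the right answer.
short = measured_cycles(start=4_000_000_000, true_elapsed=500_000_000)

# The interval exceeds 2^32 cycles: a full wrap goes undetected and the
# reported time is too small.
long_ = measured_cycles(start=100, true_elapsed=MODULUS + 5)
```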
SEE ALSO
atom(1), hiprof(1), gprof(1), cc(1), dxprof(1). (dxprof(1) is avail-
able only if the Graphical Program Analysis Tools subset is installed
on your system.)
Programmer's Guide
hiprof(5)