From TAU Wiki

Revision as of 16:17, 22 April 2013; Scottb (Talk | contribs)


Configure TAU for CUDA with:

./configure -cuda=<path to cuda toolkit> -bfd=download

or, for OpenCL, with:

./configure -opencl=<opencl headers/libraries> -bfd=download

(along with any other options you would normally give to TAU.)


make install

Add <arch>/bin to your PATH and <arch>/lib to your LD_LIBRARY_PATH.
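As a minimal sketch of this step, assuming a hypothetical install location of /opt/tau and an architecture directory named x86_64 (both are placeholders; substitute the directory that make install created on your system):

```shell
# Hypothetical values: adjust to your own TAU installation.
TAU_ROOT=/opt/tau
TAU_ARCH=x86_64

# Put the TAU tools (tau_exec and friends) on the search path.
export PATH="$TAU_ROOT/$TAU_ARCH/bin:$PATH"

# Let the dynamic loader find the TAU shared libraries.
# ${VAR:+...} avoids a trailing colon when LD_LIBRARY_PATH was unset.
export LD_LIBRARY_PATH="$TAU_ROOT/$TAU_ARCH/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

Placing these lines in your shell startup file makes the setting persistent across sessions.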

To collect performance data, run your application with tau_exec, giving either the '-cupti' option (for CUDA applications) or the '-opencl' option (for OpenCL applications).

tau_exec -T serial,cupti <-cupti|-opencl> ./a.out

MPI applications can be run like this:

mpirun -np 4 tau_exec -T mpi,cupti <-cupti|-opencl> ./a.out

(For CUDA version < 4.1 use -cuda instead of -cupti.)
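The version rule above can be captured in a small helper. This is only a sketch: tau_gpu_flag is a hypothetical function name, not part of TAU, and it assumes the CUDA version is passed as MAJOR.MINOR.

```shell
# Hypothetical helper (not part of TAU): print the tau_exec option
# that matches a CUDA toolkit version given as MAJOR.MINOR.
tau_gpu_flag() {
    major=${1%%.*}      # text before the first dot
    rest=${1#*.}        # text after the first dot
    minor=${rest%%.*}   # ignore any patch level such as 4.1.28

    # Versions older than 4.1 use -cuda; 4.1 and newer use -cupti.
    if [ "$major" -lt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -lt 1 ]; }; then
        echo "-cuda"
    else
        echo "-cupti"
    fi
}
```

With this definition, tau_gpu_flag 4.0 prints -cuda and tau_gpu_flag 5.0 prints -cupti.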

To collect traces, type:

export TAU_TRACE=1

before the tau_exec command.

Then post-process the trace files with:

tau2slog2 tau.trc tau.edf -o tau.slog2
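Put together, a tracing run for a serial CUDA application might look like the sketch below. It makes one assumption that goes beyond the text above: that the run leaves per-thread trace files which must first be merged into tau.trc and tau.edf with tau_treemerge.pl. If your run already produces tau.trc and tau.edf, drop that line. The TAU commands are skipped when tau_exec is not on the PATH.

```shell
# Turn on tracing for the commands that follow.
export TAU_TRACE=1

if command -v tau_exec >/dev/null 2>&1; then
    # Run the CUDA application under TAU (./a.out is a placeholder).
    tau_exec -T serial,cupti -cupti ./a.out

    # Assumption: merge per-thread traces into tau.trc and tau.edf.
    tau_treemerge.pl

    # Convert the merged trace to slog2.
    tau2slog2 tau.trc tau.edf -o tau.slog2
else
    echo "tau_exec not found; check your PATH" >&2
fi
```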

Viewing data

To view profiles type:


To view slog2 traces type:

jumpshot tau.slog2

CUPTI Counters

The CUPTI counters available on a given machine can be accessed by typing:


Set the counters you wish to collect by exporting them as a colon-separated list in the TAU_METRICS variable. For example:

export TAU_METRICS=CUDA.GeForce_GT_240.domain_b.instructions

Then run the application with tau_exec.
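When collecting more than one counter, the colon-separated list can be built from one name per line. The sketch below reuses the counter from the example above; the second name is a made-up placeholder, since the real names depend on your GPU.

```shell
# One counter per line. The first name comes from the example above;
# the second is a hypothetical placeholder, not a real counter.
counters="CUDA.GeForce_GT_240.domain_b.instructions
CUDA.GeForce_GT_240.domain_a.hypothetical_counter"

# Join the lines with colons, the separator TAU_METRICS expects.
TAU_METRICS=$(printf '%s\n' "$counters" | paste -s -d: -)
export TAU_METRICS
```

Keeping one counter per line makes it easy to comment out or reorder entries between runs.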

PGI OpenACC compiler

PGI uses the driver API to generate CUDA code for its accelerated regions, so you need to set:

export TAU_CUPTI_API=driver

before running a PGI OpenACC application.
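A complete run might then look like the following sketch, where ./a.out stands for your PGI-compiled OpenACC executable and the serial,cupti configuration is assumed to be the one built earlier. The tau_exec line is skipped when TAU is not on the PATH.

```shell
# Tell TAU's CUPTI layer to follow the CUDA driver API.
export TAU_CUPTI_API=driver

# ./a.out is a placeholder for the OpenACC executable.
if command -v tau_exec >/dev/null 2>&1; then
    tau_exec -T serial,cupti -cupti ./a.out
fi
```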
