OpenACC

From TAU Wiki


Matrix Multiply

TAU v2.25.1 supports the OpenACC directives available in PGI 12.3 and later. TAU instruments the PGI runtime library layer and records detailed source information. This simple matrix multiply application, written with OpenACC annotations, was compiled with the PGI -ta=nvidia flag to generate the executable. To profile it with TAU:

Configure TAU:

./configure -c++=pgCC -cc=pgcc -fortran=pgi
make install
export TAU_MAKEFILE=<path to TAU>/x86_64/lib/Makefile.tau-pgi

Compile:

make

Run:

tau_exec -T pgi -openacc ./mm

Use TAU's analysis tools to view the performance data:

pprof
paraprof


Here we see the time spent in the PGI runtime library routines. The download time for variable a in the source code dominates the execution. The nature of each operation is shown in parentheses.


Next, this data is presented in ParaProf's thread statistics window.


The driver code.

Clicking a runtime-layer routine shows the application function in which the kernel was invoked, along with the associated variable, the source line number, and the size of the array. Right-clicking and choosing 'Show Source Code' displays the source line where the transfer takes place. For the downloadxx_multiply_matrices routine with the variable 'a', the time is attributed on the host to the source location shown below. This time includes both the transfer itself and the time the host spends waiting for results to be returned from the GPU.

OpenACC example source code

Matrix multiply source code using the OpenACC directives, and the Makefile used to run it with TAU.

Mm2.f90

Mmdriv.f90

Makefile

Mm openacc.ppk (ParaProf packed profile)
