From TAU Wiki



Link Code Version Machine Date
LLNL website git repo Kyle Spafford fork Keeneland March 2012

These instructions can also be used for CoMD

Building Cruft

For OpenCL:

export OPENCL_INCLUDE_DIR=<path to OpenCL include dir>

Modify CMakeLists.txt and add these lines:


Then issue

cmake .

You can safely proceed when you encounter reversions.

Selective instrumentation of loops (contents of select.tau):


loops file="eam.c" routine="eamForce#"
loops file="ljForce.c" routine="LJ#"
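
As a minimal sketch, the two loops directives above can be written into a selective instrumentation file from the shell. The file name select.tau is taken from the TAU_OPTIONS setting on this page. The BEGIN_INSTRUMENT_SECTION and END_INSTRUMENT_SECTION wrapper is the usual form for TAU instrumentation directives and is not shown in the listing above, so treat it as an assumption about what the original file contained. The trailing # in each routine name is TAU's wildcard.

```shell
# Create select.tau in the current directory (the location TAU_OPTIONS points to).
# The section markers are assumed; only the two "loops" lines come from this page.
cat > select.tau <<'EOF'
BEGIN_INSTRUMENT_SECTION
loops file="eam.c" routine="eamForce#"
loops file="ljForce.c" routine="LJ#"
END_INSTRUMENT_SECTION
EOF
```

Run this in the directory where the build is started, since TAU_OPTIONS resolves the file through `pwd`.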


For the OpenCL binary, edit src-ocl/eam_kernels.c and move the following section above the line typedef CL_REAL_T real_t;

#if defined(cl_khr_fp64)  // Khronos extension available?
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#elif defined(cl_amd_fp64)  // AMD extension available?
#pragma OPENCL EXTENSION cl_amd_fp64 : enable
#endif

Then set:

export TAU_OPTIONS="-optShared -optVerbose -optTauSelectFile=`pwd`/select.tau"
export TAU_MAKEFILE=<path to TAU>/x86_64/lib/Makefile.tau-icpc-pdt

Running Cruft

./cruft -p ag -e -f data/8k.inp.gz


./cruft -f data/8k.inp.gz

And for the OpenCL-accelerated version:

tau_exec -T serial -opencl ./cruftOCL -p ag -e -f data/8k.inp.gz
tau_exec -T serial -opencl ./cruftOCL -f data/8k.inp.gz
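
After a run, TAU by default leaves profile.* files in the working directory. The sketch below shows one way to inspect them and bundle them into a packed file like the .ppk files linked under Performance Data. It assumes pprof and paraprof from the TAU installation are on the PATH; the archive name is only an example, and the exact steps used to produce the files on this page are not documented here.

```shell
# Example archive name, chosen to mirror the files linked on this page.
ARCHIVE=Cruft-EAM.ppk

# Print a text summary of the profiles in the current directory, if pprof is available.
if command -v pprof >/dev/null 2>&1; then
    pprof | head -n 40
fi

# Pack the profiles into a single file that ParaProf can open later.
if command -v paraprof >/dev/null 2>&1; then
    paraprof --pack "$ARCHIVE"
fi
```

Run it once per experiment (EAM or LJ, serial or OpenCL) with a different ARCHIVE name so that the profiles from separate runs are not mixed.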

Performance Data

EAM method:

First, the serial version of Cruft shows that two loops in eam.c consume most of the time.

In comparison, in the OpenCL-accelerated version two kernels dominate the runtime.

One thing you can check with an OpenCL application is the time spent in the command queue. Here is the table for each kernel:

Profile Data:

Image:Cruft-EAM.ppk, Image:CruftOCL-EAM.ppk

LJ method:

First, the serial version of Cruft shows that a single loop accounts for the runtime.

In comparison, in the OpenCL-accelerated version the LJ_Force kernel dominates the runtime.

Once again, here is the time spent in the queue for this kernel.

Profile Data:

Image:Cruft-LJ.ppk, Image:CruftOCL-LJ.ppk
