From TAU Wiki




The Blue Waters SPP Benchmark overview is located here: spp-methodology. Download and build/run instructions for the individual packages are located here: spp-benchmarks

The build and run instructions provided above are specific to the Blue Waters platform. The modifications outlined for the individual benchmarks below cover building and running on the NERSC Cori system, with and without TAU instrumentation. Unload the darshan module before using TAU. The programming environment suggested by each benchmark is used where possible. If the environment used in the default Blue Waters build is unavailable, the PrgEnv-intel module is used unless otherwise noted. TAU was configured for instrumentation and sampling with a command like:

"./configure -arch=craycnl -useropt=-g\ -O2 -pdt=/global/homes/w/wspear/bin/pdtoolkit-3.24 -mpi -bfd=download -iowrapper -unwind=download"

This results in a TAU_MAKEFILE like: /global/homes/w/wspear/bin/tau2/craycnl/lib/Makefile.tau-intel-mpi-pdt

TAU_MAKEFILE tags will vary depending on the compiler selected.
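The environment setup described above can be sketched as a few shell lines. This is a minimal sketch, not a tested Cori recipe: TAU_ROOT and TAU_ARCH are names introduced here for illustration, derived from the TAU_MAKEFILE path shown above, and list_tau_makefiles is a hypothetical helper for finding which compiler tags are installed.

```shell
# Assumption: TAU_ROOT is the install prefix implied by the TAU_MAKEFILE path above.
TAU_ROOT=/global/homes/w/wspear/bin/tau2
TAU_ARCH=craycnl

# darshan must be unloaded before using TAU. The guard skips this step on
# machines where the module command is not available.
if command -v module >/dev/null 2>&1; then
    module unload darshan || true
fi

# Select the configuration built above and put the TAU wrappers and tau_exec on PATH.
export TAU_MAKEFILE="$TAU_ROOT/$TAU_ARCH/lib/Makefile.tau-intel-mpi-pdt"
export PATH="$TAU_ROOT/$TAU_ARCH/bin:$PATH"

# Hypothetical helper: list the installed configurations, since the
# TAU_MAKEFILE tag depends on the compiler that was selected.
list_tau_makefiles() {
    ls "$TAU_ROOT/$TAU_ARCH/lib"/Makefile.tau-* 2>/dev/null
}
```

After switching compilers, running list_tau_makefiles and pointing TAU_MAKEFILE at the matching entry avoids guessing the tag.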

Some system-specific optimizations from the Blue Waters build/run instructions may have been removed without replacement. Compilation options will generally have -dynamic added to allow sampling with tau_exec. Slurm batch scripts are modified versions of the Blue Waters PBS scripts and may include PBS-style directives where Slurm still supports them.


The AWP-ODC benchmark builds on Cori. The uninstrumented binary can be profiled with tau_exec, and source instrumentation with PDT works. Use this makefile to build on Cori. To build uninstrumented, replace tau_f90.sh and tau_cc.sh with ftn and cc.

OFLAGS = tau_f90.sh -O3 -dynamic -mp1 -c -extend_source -assume byterecl -o
PFLAGS = tau_f90.sh -O3 -dynamic -mp1 -extend_source -assume byterecl -o

LIB = -lmpich
BIN = ../bin

OBJECTS = memory.o bound.o relax.o swap.o io.o source.o \
          custompi.o station.o structure.o operator.o \
          viscoop.o pml.o cerjan.o pmcl3d.o md5.o set_names.o \
          sgsndyna.o sgsnswap.o

pmcl3d: $(OBJECTS)
        $(PFLAGS) pmcl3d    $(OBJECTS)  $(LIB) 
        cp pmcl3d $(BIN)
md5.o: md5.c
        tau_cc.sh -O2 -c -g -o $@ md5.c

pmcl3d.o: pmcl3d.f
        $(OFLAGS) pmcl3d.o      pmcl3d.f

sgsndyna.o: sgsndyna.f
        $(OFLAGS) sgsndyna.o    sgsndyna.f

sgsnswap.o: sgsnswap.f
        $(OFLAGS) sgsnswap.o    sgsnswap.f

memory.o: memory.f
        $(OFLAGS) memory.o      memory.f

relax.o: relax.f
        $(OFLAGS) relax.o       relax.f

bound.o: bound.f
        $(OFLAGS) bound.o       bound.f

swap.o: swap.f
        $(OFLAGS) swap.o        swap.f

io.o: io.f
        $(OFLAGS) io.o          io.f

structure.o: structure.f
        $(OFLAGS) structure.o   structure.f

operator.o: operator.f
        $(OFLAGS) operator.o    operator.f

viscoop.o: viscoop.f
        $(OFLAGS) viscoop.o     viscoop.f

pml.o: pml.f
        $(OFLAGS) pml.o         pml.f

cerjan.o: cerjan.f
        $(OFLAGS) cerjan.o      cerjan.f

source.o: source.f
        $(OFLAGS) source.o      source.f

station.o: station.f
        $(OFLAGS) station.o     station.f

custompi.o: custompi.f
        $(OFLAGS) custompi.o    custompi.f

set_names.o: set_names.f
        $(OFLAGS) set_names.o   set_names.f

clean:
        rm -f pmcl3d $(BIN)/pmcl3d *.o parstat*

        rm fort.* core.* CHK* SS* V* SRCT*
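The switch between the instrumented and uninstrumented builds is a plain text substitution, which could be scripted rather than edited by hand. The following is a minimal sketch: make_uninstrumented and the file names in the example are hypothetical, and it assumes the wrapper names appear in the makefile only as compiler commands.

```shell
# make_uninstrumented IN OUT
# Copy makefile IN to OUT, replacing the TAU wrapper scripts with the
# Cray compiler drivers (tau_f90.sh -> ftn, tau_cc.sh -> cc).
make_uninstrumented() {
    sed -e 's/tau_f90\.sh/ftn/g' -e 's/tau_cc\.sh/cc/g' "$1" > "$2"
}

# Example (file names are hypothetical):
#   make_uninstrumented makefile makefile.uninstrumented
#   make -f makefile.uninstrumented pmcl3d
```

Keeping both makefiles side by side makes it easy to compare an instrumented run against a tau_exec run of the uninstrumented binary.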

Use this SLURM batch script instead of the provided scripts to run a small test.

#!/bin/bash -login
#PBS -N awp-odc_cpu_small
#PBS -l walltime=00:03:00,nodes=1
#PBS -j eo
#SBATCH --constraint=haswell
#SBATCH --partition=debug

#module swap $( module list | grep -o PrgEnv-.*$ ) PrgEnv-cray


bash ../pre-run

#export TAU_VERBOSE=1

unlink IN3D
ln -s ../IN3D_small IN3D

start_time=$(date +%s)
#srun -n 8 tau_exec -T INTEL,PDT -ebs ../../../../src-v1.1.2/pmcl3d
srun -n 8 ../../../../src-v1.1.2/pmcl3d
end_time=$(date +%s)
echo Completed in $(( end_time - start_time )) seconds
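The script above switches between the plain and the sampled run by commenting one srun line in or out. As an alternative, the choice could be driven by a variable. This is only a sketch: launch_cmd and USE_TAU are names invented here and are not TAU or Slurm features; the srun and tau_exec arguments are copied from the script above.

```shell
# launch_cmd BINARY
# Print the srun command line for BINARY. When USE_TAU=1 the binary is
# run under tau_exec with event-based sampling (-ebs).
# USE_TAU is a hypothetical switch, not a TAU or Slurm variable.
launch_cmd() {
    if [ "${USE_TAU:-0}" = "1" ]; then
        echo "srun -n 8 tau_exec -T INTEL,PDT -ebs $1"
    else
        echo "srun -n 8 $1"
    fi
}

# In the batch script (left unquoted on purpose so the words split into arguments):
#   $(launch_cmd ../../../../src-v1.1.2/pmcl3d)
```

Submitting with USE_TAU=1 exported in the environment would then select the sampled run without editing the script.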


In progress. The Blue Waters package (uninstrumented) does not yet build on Cori.


MILC builds and runs on Cori with minimal adjustments when using the build.sh_NERSC script. I advise downloading and running the 'small' test case for initial setup. tau_exec works properly if -dynamic is added to LDFLAGS in the Makefile in the ks_imp_rhmc directory.

Compilation with the TAU compiler wrapper scripts fails when they are simply substituted for the compiler commands. Field_utilities.c causes compiler-based instrumentation to hang. Set TAU_OPTIONS like:

export TAU_OPTIONS="-optVerbose -optTauSelectFile=/global/homes/w/wspear/SPP/milc/MILC-apex_BW/select.tau"

with select.tau containing:
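The file contents are not reproduced on this page. A minimal sketch, assuming the intent is to exclude the source file that hangs instrumentation, uses TAU's file exclude list syntax. The helper write_select_file is hypothetical, and the lowercase file name in the example is an assumption about how the file is named in the source tree.

```shell
# write_select_file PATH SOURCE
# Write a TAU selective instrumentation file at PATH that excludes the
# single source file SOURCE from instrumentation.
write_select_file() {
    cat > "$1" <<EOF
BEGIN_FILE_EXCLUDE_LIST
$2
END_FILE_EXCLUDE_LIST
EOF
}

# Example (the lowercase file name is an assumption):
#   write_select_file select.tau field_utilities.c
```

The path passed to -optTauSelectFile in TAU_OPTIONS must point at the file written here.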


When running, the run_small.sh script can be adjusted to better suit Cori (e.g. increasing cores per node to 32). Setting export PROF_NUM_ITERS=100 makes the run finish very quickly, which is helpful when setting up initial tests. To use tau_exec or set TAU runtime variables, edit milc_in.sh. The srun case can be edited as follows to invoke tau_exec: command="srun -n $N --ntasks-per-socket=$S tau_exec -T INTEL,PDT -ebs ./su3_rhmd_hisq"


The standard CPU build works if you manually load PrgEnv-gnu before running it. The GPU build fails (there is no cudatoolkit module on Cori). The benchmark package contains a 100-core run example. I would prefer to find a smaller test case.


Tested on BRIDGES with Intel compilers. Testing was successful except that standard profiles were not generated. Merged XML profile output was produced as expected.
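A quick way to see which kind of output a run left behind is to look for TAU's usual file names: profile.<node>.<context>.<thread> for standard profiles and tauprofile.xml for merged output. The helper below is a hypothetical sketch that only inspects file names and does not validate their contents.

```shell
# tau_output_kind [DIR]
# Report which TAU profile output is present in DIR (default: current
# directory): "standard" for profile.N.C.T files, "merged" for
# tauprofile.xml, or "none" if neither is found.
tau_output_kind() {
    dir=${1:-.}
    if ls "$dir"/profile.*.*.* >/dev/null 2>&1; then
        echo standard
    elif [ -f "$dir/tauprofile.xml" ]; then
        echo merged
    else
        echo none
    fi
}
```

Running tau_output_kind in the job's working directory after a run distinguishes the case reported above (merged only) from a run that produced no profile data at all.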
