[Tau-announcements] TAU v2.21.2 released

Sameer Shende sameer at cs.uoregon.edu
Mon Mar 26 18:00:21 PDT 2012


	We are pleased to announce the release of TAU v2.21.2:


The following new features have been added since TAU 2.21.1, released on Dec. 14, 2011.

1. Port to IBM BlueGene/Q
To use TAU on IBM BlueGene/Q, please use the -arch=bgq configuration flag.
For example:
./configure -arch=bgq -mpi -BGQTIMERS -pdt=/soft/perftools/tau/pdtoolkit-3.17 -pdt_c++=xlC -bfd=download; make install

2. ParaProf
ParaProf now includes an updated 3D display with support for showing topology views for both interval and atomic events. It supports topology displays for IBM BlueGene/Q.

3. Support for OpenACC instrumentation using the PGI v12.3 compiler
The PGI v12.3 compiler supports the OpenACC [http://www.openacc-standard.org] standard.
This release of TAU supports OpenACC code by instrumenting at the PGI runtime
library level. This lets us view the events seen from the host for data transfer and kernel execution (both synchronous and asynchronous). The captured information shows low-level details, including the time to upload and download data, along with the variable name, function name, and block sizes, and associates them with the source file and line number. We would like to thank the Portland Group for their support of the TAU project. For an example, please refer to:


4. TAU supports CUDA 4.1.

TAU's measurement layer is capable of utilizing the new features of the
CUPTI package available in CUDA 4.1. The preferred method for
instrumentation is to use the '-cupti' option to tau_exec which takes
advantage of these capabilities (the '-cuda' option is retained for CUDA
4.0 and earlier). We would like to thank NVIDIA Corporation for their support of the TAU project.
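
As a rough illustration of the option choice described above, the following
shell sketch picks between '-cupti' and '-cuda' from a CUDA version string and
prints the resulting tau_exec command instead of running it. The helper
tau_gpu_flag and the binary name ./matmult are hypothetical, and treating
versions newer than 4.1 the same as 4.1 is an assumption.

```shell
#!/bin/sh
# Hypothetical application binary; replace with your own CUDA program.
APP=${APP:-./matmult}

# tau_gpu_flag VERSION
# Prints the tau_exec option for a CUDA version string such as "4.1":
#   -cupti for CUDA 4.1 (assumed here to hold for later versions too)
#   -cuda  for CUDA 4.0 and earlier
tau_gpu_flag() {
    major=${1%%.*}
    case "$1" in
        *.*) minor=${1#*.}; minor=${minor%%.*} ;;
        *)   minor=0 ;;
    esac
    if [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -ge 1 ]; }; then
        printf '%s\n' "-cupti"
    else
        printf '%s\n' "-cuda"
    fi
}

# Show the command that would be used for CUDA 4.1 (not executed here).
echo "tau_exec $(tau_gpu_flag 4.1) $APP"
```

Printing the command rather than executing it keeps the sketch safe to try on
a machine without TAU installed.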

5. Device memory tracking in CUDA.

Details about GPU device memory usage (block size, local memory,
registers, and shared memory, both static and dynamic) are tracked for each
invocation of a CUDA kernel. For an example, please refer to:


6. Tracking queue wait time in OpenCL.

The time a kernel spends waiting (queued or submitted) between when the host
enqueues it and when it begins executing on the device is now tracked
for each OpenCL kernel. For an example, please refer to:

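To make the queue wait quantity concrete, here is a small sketch of the
arithmetic only. It assumes two timestamps in nanoseconds, one taken when the
host enqueues the kernel and one when the kernel starts on the device (OpenCL
event profiling exposes such values as CL_PROFILING_COMMAND_QUEUED and
CL_PROFILING_COMMAND_START). The function name and the sample numbers are
hypothetical, and how TAU computes the value internally is not stated in this
announcement.

```shell
#!/bin/sh
# queue_wait_ns QUEUED_NS START_NS
# Prints the time (ns) between enqueue on the host and start on the device.
queue_wait_ns() {
    echo $(( $2 - $1 ))
}

# Hypothetical timestamps for one kernel, in nanoseconds.
queued=1000
start=4500

echo "queue wait: $(queue_wait_ns "$queued" "$start") ns"
```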

7. Updates to Opari2
TAU now includes the latest Opari2 v1.0.1 release for better support of OpenMP programs.

8. Debugging callstack support in TAU
When TAU_TRACK_SIGNALS=1 is used, an abnormal program termination triggers capturing the state of the program callstack (stored as program metadata). This capability now works with multi-language programs, including Python, Fortran, C++, and C.
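
A minimal sketch of enabling this from a shell session follows. The variable
name comes from the text above; the binary name ./my_instrumented_app is
hypothetical and stands for any program already built or launched with TAU.

```shell
#!/bin/sh
# Enable TAU's signal tracking for programs started from this shell.
export TAU_TRACK_SIGNALS=1

# Hypothetical TAU-instrumented binary; replace with your own program.
APP=${APP:-./my_instrumented_app}

if [ -x "$APP" ]; then
    # Run the program; on abnormal termination TAU records the callstack.
    "$APP"
else
    echo "TAU_TRACK_SIGNALS=$TAU_TRACK_SIGNALS is set; $APP not found, nothing was run"
fi
```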

9. Videos
We have updated our documentation and videos on our website.

   There are also other bug fixes in this release.

   Please let us know if we may assist you with our tools in any way.
   - Sameer
  (for tau-team@ cs.uoregon.edu)
