[Tau-announcements] TAU v2.20, LiveDVD released

Sameer Shende sameer at cs.uoregon.edu
Thu Nov 11 15:37:46 PST 2010

We are pleased to announce the release of TAU v2.20 and the POINT VI-HPS LiveDVD.


The following new features have been added since TAU 2.19.2, released on July 9, 2010.

1. Support for tracking GPGPU performance data

TAU now supports profiling and tracing of GPGPU applications that use CUDA and OpenCL. To enable these measurements, configure TAU with -cuda=<dir> and use:
% tau_exec -cuda ./a.out
% tau_exec -opencl ./a.out

when running an uninstrumented or instrumented application. TAU intercepts the interactions between the host and the device and generates performance information at the CUDA driver or OpenCL library level. It also tracks the volume of data transferred between the host and the device. We would like to thank the NVIDIA Corporation for their support of the TAU project and the assistance provided.

2. Binary rewriting
TAU now supports DyninstAPI v7.0, rewriting of static binaries, and instrumentation of routines at the outer-loop level. The usage remains the same:
% configure -dyninst=<dir> -dwarf=<dir> [other options]; make install
% tau_run a.out -o a.inst -f select.tau
% cat select.tau
loops routine="foo#"
# instruments foo1, foo32 etc. at the outer-loop level
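
For readers new to selective instrumentation files, here is a minimal sketch of creating select.tau from the shell. The BEGIN_INSTRUMENT_SECTION and END_INSTRUMENT_SECTION markers are an assumption based on the usual layout of TAU selective instrumentation files; the listing above shows only the loops line. The routine name foo is a placeholder.

```shell
# Sketch only: write a selective instrumentation file for tau_run.
# The '#' in routine="foo#" acts as a wildcard (foo1, foo32, etc.).
# The section markers are assumed; check them against your TAU version.
cat > select.tau <<'EOF'
BEGIN_INSTRUMENT_SECTION
loops routine="foo#"
END_INSTRUMENT_SECTION
EOF

# Show the resulting file.
cat select.tau
```

The file can then be passed to tau_run with -f select.tau as shown above.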

3. ParaProf enhancements
ParaProf has a new 3D display for visualizing hardware topologies. Currently, we support the IBM BlueGene/P's topology metadata collected in TAU and all platforms supported by the Cube data format.
TAU also supports parsing Google perftools data (paraprof -f google).
ParaProf's derived metric window has been updated. It is now possible to rename, cut, copy, and paste applications, experiments, and trials across different databases, much as in a file browser.

4. Event based sampling
TAU's runtime interposition tool tau_exec now supports event-based sampling:
% tau_exec -ebs ./a.out
% tau_exec -ebs -ebs_source=PAPI_FP_INS -ebs_period=1000000 ./a.out
This generates periodic time-based or hardware-counter-based interrupts to produce a trace of events. These traces may be processed (using tau_ebs_process.pl) and loaded in paraprof, or converted to OTF format (using the tau_ebs2otf.pl tool) and loaded in Vampir. Event-based sampling may be used on uninstrumented or instrumented binaries to measure performance using a hybrid probe-based and sampling-based approach. See README.sampling and our ICPP 2010 paper for further information.


5. Profiling ARMCI
TAU supports profiling of ARMCI constructs using a wrapper interposition library that may be pre-loaded in an application using tau_exec. To use this, please configure TAU with the -armci=<dir> option. This requires Global Arrays v5.0, which supports the PARMCI interface. You may use:
% mpirun -np 4 tau_exec -armci ./a.out
to instrument the ARMCI calls; set TAU_COMM_MATRIX=1 to also generate communication matrix data.

6. Profiling external libraries
tau_wrap allows us to generate wrapper interposition libraries using the interface header files. These instrumented wrapper libraries may now be pre-loaded using the tau_exec -loadlib=<file.so> command. See examples/iowrapper/hdf5_tau_exec.

7. Pre-computed metrics
Setting the environment variable TAU_PROFILE_FORMAT=merged now generates the profile data (tauprofile.xml) with pre-computed metrics (mean and standard deviation). This reduces the time needed to load a large profile dataset and creates a single file instead of one profile file per rank. At the end of MPI_Finalize, a tree-based reduction is performed and these statistics are computed. The XML snapshot file format allows old paraprof readers to read the new format, while the new paraprof can read both old and new formats. The new format is read efficiently because the metrics no longer have to be computed sequentially in paraprof.
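
To illustrate why a mean and standard deviation lend themselves to a reduction, here is a small sketch. It is not TAU's implementation. It assumes each rank contributes a count, a sum, and a sum of squares, which combine by simple addition, and it assumes a population (not sample) standard deviation. The timing values are made up for illustration.

```shell
# Hypothetical per-rank times (seconds) for one routine; not real data.
times="2.0 4.0 4.0 4.0 5.0 5.0 7.0 9.0"

# Each value contributes (1, x, x*x). Partial triples combine by addition,
# so they can be merged pairwise up a tree and finalized once at the root.
# $times is left unquoted on purpose so that it splits into one value per line.
stats=$(printf '%s\n' $times | awk '
  { n += 1; sum += $1; sumsq += $1 * $1 }
  END {
    mean = sum / n
    var = sumsq / n - mean * mean   # population variance (assumption)
    printf "mean=%.2f stddev=%.2f", mean, sqrt(var)
  }')
echo "$stats"
```

Because only three running totals are carried per routine, the root never needs to see every rank's individual value.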

8. Measurement library enhancements
Throttled functions are now explicitly marked [THROTTLED] in the profile output and are stored in the TAU_DISABLE group. TAU's memory and I/O tracking support has been enhanced (tau_exec -io and tau_exec -memory), including on Mac OS X. The measurement library supports the new Score-P measurement substrate (www.score-p.org). Score-P can generate TAU profile data using the SCOREP_ENABLE_PROFILING=1 and SCOREP_ProfileFormat=TauSnapshot environment variables. Using the Score-P measurement system, TAU can also generate OTF2 traces natively, without the need for merging or conversion prior to analysis in Vampir [www.vampir.eu]. Visit the OTF BOF at SC'10 for further information.

9. Added support for Java Virtual Machine Tool Interface (JVMTI)

TAU now supports both JVMTI and the older JVMPI (found in 1.4 or earlier) configurations. To use this, please configure TAU with the -jdk=<dir> configuration flag and launch the Java code using tau_java, a new tool:
% java app
changes to
% tau_java app

Using the tau_java script, TAU can instrument the Java byte code using the new byte code injection feature of JVMTI and considerably reduce the overhead for instrumentation compared to the older JVMPI implementation. No changes are needed to the Java application source code, virtual machine or the .class files.

10. Instrumentation and Analysis Enhancements
Before reverting to compiler-based instrumentation, TAU now prompts the user. Use -optRevert in the TAU_OPTIONS environment variable to override this feature. TAU's support for the now freely available ParaVer trace visualizer [www.bsc.es/paraver] has been enhanced to support inter-process communication in multi-threaded applications. To use ParaVer with TAU:
% setenv TAU_TRACE 1
% mpirun -np <procs> a.out
% tau_treemerge.pl
% tau_convert -paraver tau.trc tau.edf app.prv
% wxparaver app.prv &
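
The listing above uses csh syntax (setenv). For sh-compatible shells, a minimal sketch of the equivalent setup follows. The PATH check is an illustrative addition and not part of TAU; it only uses the tool names that appear in the steps above.

```shell
# sh/bash equivalent of "setenv TAU_TRACE 1".
export TAU_TRACE=1

# Illustrative pre-flight check (not part of TAU): report any tool from the
# steps above that is not on PATH before starting a long run.
missing=0
for tool in tau_treemerge.pl tau_convert wxparaver; do
  if ! command -v "$tool" >/dev/null 2>&1; then
    echo "not found on PATH: $tool"
    missing=$((missing + 1))
  fi
done
echo "TAU_TRACE=$TAU_TRACE, missing tools: $missing"
```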

11. New platforms and languages
TAU has been ported to the Cray XE6 and XMT platforms and supports compiler-based instrumentation (-optCompInst in TAU_OPTIONS) for the Cray CCE compilers. TAU also supports the Yorick programming language [http://yorick.sourceforge.net/].


We have the following new packages in the LiveDVD:
1. TAU v2.20
2. Vampir 7.2.0 and VampirServer 2.1.1
3. Scalasca 1.3.2
4. Cube 3.3.1
5. PerfSuite 1.0.0b1
6. DyninstAPI 7.0 beta (b19703c)
7. Kcachegrind 0.6
8. OpenMPI 1.4.2
9. ISP 0.2.0
10. WXParaver 3.99
11. Periscope 1.3
12. Eclipse 3.6 with PTP 4.0.5, GEM 4.0.5, and PyDev 1.6.3

We have also improved the installation of the LiveDVD to hard drive. You may download the ISO image from:


We would like to thank our partners and sponsors for their help in making this distribution and for providing extended, core-count-limited demo licenses for the Vampir and TotalView tools.

The TAU group will be giving demonstrations of our tools at this year's Supercomputing conference (SC'10) at the following times:

    * 7-9pm Monday (Nov. 15th)
    * 10am-12pm and 3-6pm Tuesday (Nov. 16th)
    * 10am-12pm and 3-6pm Wednesday (Nov. 17th)
    * 12-1pm Thursday (Nov. 18th)
    * 1-1:45pm Thursday Collaboration Area (Nov. 18th)

Please visit us at demo station 3 at the SC'10 NNSA/ASC booth #2438

    * Tutorial M08: Hands-on Practical Parallel Application Performance Engineering using PAPI, PerfSuite, Scalasca, Vampir, and TAU. Room 394, Monday, Nov. 15, at SC'10. We will be using the POINT VI-HPS LiveDVD with pre-installed tools.
    * OTF BOF: Room 397 Tuesday, 5:30pm - 6:30pm

	Please let us know if we may assist you further.
	- Sameer
	(for tau-team @ cs.uoregon.edu)