A.1. Introduction

As systems grow larger and applications become distributed, there is a need to sample and analyze MPI applications. For local applications, Freja analyzes one process at a time. The same is true for distributed applications: Freja samples and analyzes each rank individually.

It is important, however, to analyze the correct process in the distributed environment. As Figure A.1 shows, it is possible to end up analyzing the bootstrapping program mpirun rather than the distributed instances of the application. For Freja to produce useful samples of the application, the sampler must be distributed along with the application. Use the command:

$ mpirun ... sample -r application

to launch the sampler on each node. Each sampler then launches and samples an instance of your application and creates a fingerprint file from the execution. Take care to give each of these files a unique filename, since samplers that share a filename may overwrite one another's output.
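
One way to obtain unique filenames is to derive them from the rank number that the MPI launcher exports to each process. The following is a minimal sketch, not part of Freja: the helper name fingerprint_name and the naming scheme are hypothetical, and it assumes that the launcher sets one of OMPI_COMM_WORLD_RANK (Open MPI), PMI_RANK (MPICH-derived launchers) or SLURM_PROCID (Slurm). How the resulting name is passed to the sampler depends on the sampler's own options, which are not shown here.

```shell
# Hypothetical helper: print a per-rank filename such as "app.rank3".
# $1 is an optional prefix (default "sample").
fingerprint_name() {
    # Use the first rank variable that is set; otherwise fall back
    # to the shell's process ID, which is unique only per node.
    rank="${OMPI_COMM_WORLD_RANK:-${PMI_RANK:-${SLURM_PROCID:-pid$$}}}"
    printf '%s.rank%s\n' "${1:-sample}" "$rank"
}
```

Because the rank variables exist only inside the launched processes, such a helper has to run in a small wrapper script started by mpirun, not in the shell that invokes mpirun.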

Figure A.1. MPI Sampling Principles


On some systems (e.g., SGI, Cray), and with some MPI variants (e.g., MPT), the MPI runtime system may use an extra shepherd process to optimize the launching of several ranks per node and to improve communication performance between these nodes. In such cases, tell Freja which process generation to sample so that it does not sample the shepherd process.

Figure A.2. Message Passing Toolkit, runtime system and shepherd process


Use sample -g 1 to sample the child processes of a shepherd process.

Note

If only one rank is requested, the shepherd process turns into a compute process, and you should use sample -g 0 instead.
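
The choice between the two generations can be made in a launch script. The following is a minimal sketch under the assumption that the MPI runtime uses a shepherd process as described above; the function name sample_generation is hypothetical and not part of Freja, and combining -g with -r on one command line is assumed to be accepted.

```shell
# Hypothetical helper: print the process generation to sample.
# $1 is the number of ranks requested from the MPI launcher.
sample_generation() {
    if [ "$1" -eq 1 ]; then
        echo 0   # single rank: the shepherd itself becomes the compute process
    else
        echo 1   # several ranks: sample the children of the shepherd
    fi
}

# Usage (the mpirun arguments are elided as in the command above):
#   mpirun ... sample -g "$(sample_generation "$NRANKS")" -r application
```

Here NRANKS stands for whatever variable the launch script already uses to hold the requested rank count.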