Guide:TAUChapel

From TAU Wiki

Revision as of 17:19, 5 October 2013

Chapel

MonteCarlo example

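To try out some of Chapel's language features, let us write a Monte Carlo simulation to estimate PI. We draw random points with coordinates x,y and count how many fall inside the unit circle, i.e. x^2+y^2<=1. If the points are drawn uniformly from the square enclosing the circle, the fraction that lands inside approximates PI/4, so PI is about 4 times that fraction.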
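Before any parallelism is involved, the method itself can be sketched serially. This is a minimal sketch rather than the guide's own code: it assumes the sample count is a config const named n (the routines in this guide use n without declaring it) and that the points are generated with fillRandom from Chapel's standard Random module, which fills an array with uniform values between 0 and 1.

```chapel
use Random;

// Assumed declaration: number of sample points, overridable with --n=<value>.
config const n = 1000000;

var p_x: [1..n] real(64);
var p_y: [1..n] real(64);

// Fill both coordinate arrays with uniform random values between 0 and 1.
fillRandom(p_x);
fillRandom(p_y);

// Serial count of the points that satisfy x^2 + y^2 <= 1.
var c = 0;
for i in 1..n {
  if p_x[i] ** 2 + p_y[i] ** 2 <= 1 then
    c += 1;
}

writeln("PI is approximately ", c * 4.0 / n);
```

With coordinates between 0 and 1 the points cover one quadrant of the circle, which is why the count is scaled by 4.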

Basic

Here is the basic routine that computes PI:

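proc compute_pi(p_x: [] real(64), p_y: [] real(64)) : real {

 // c is shared by every parallel iteration, so it is a sync variable.
 var c : sync int = 0;
 forall i in 1..n {
   if (p_x[i] ** 2 + p_y[i] ** 2 <= 1) then
       // readFE empties c, so other tasks wait until writeEF refills it.
       c.writeEF(c.readFE() + 1);
 }
 return c.readFE() * 4.0 / n;

}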

Notice that the forall runs its iterations in parallel, hence the need to declare c as a sync variable. Performance is limited by the need to synchronize every access to c. Take a look at this profile:

Image:pi_with_tasks.png

70% of the time is spent in synchronization. Let's see if we can do better.

Procedure promotion

One feature of Chapel is procedure promotion: when a procedure that takes scalar arguments is called with arrays instead, it behaves as if each element of the arrays were passed to the procedure in parallel:

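proc compute_pi(p_x: [] real(64), p_y: [] real(64)) : real {

 // c must start out full, otherwise the first readFE would block forever.
 var c : sync int = 0;
 // in_circle is promoted over the arrays and yields one bool per point.
 forall i in in_circle(p_x, p_y) {
   c.writeEF(c.readFE() + (i: int));
 }
 return c.readFE() * 4.0 / n;

}
proc in_circle(x: real(64), y: real(64)): bool
{
  return (x ** 2 + y ** 2) <= 1;
}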
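Promotion can be seen in isolation with a smaller example. The sketch below is illustrative only: the procedure dist2 and the arrays xs and ys are hypothetical names that do not appear elsewhere in this guide.

```chapel
// Hypothetical scalar procedure: it knows nothing about arrays.
proc dist2(x: real, y: real): real {
  return x ** 2 + y ** 2;
}

var xs = [0.0, 3.0, 1.0];
var ys = [1.0, 4.0, 1.0];

// Passing arrays promotes the call: it behaves like a zippered parallel
// loop that evaluates dist2(xs[i], ys[i]) for every index i.
var d = dist2(xs, ys);

// d now holds the three values 1.0, 25.0 and 2.0.
writeln(d);
```

The promoted call pairs the arrays element by element, which is how in_circle(p_x, p_y) produces one result per point.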

Reduction

A further reorganization lets us take advantage of Chapel's built-in reduction:

proc compute_pi(p_x: [] real(64), p_y: [] real(64)) : real {

 var c : int;
 c = + reduce in_circle(p_x, p_y);
 return c * 4.0 / n;

}

This also improves performance:

Image:pi_with_data.png
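For readers new to reductions, here is a minimal standalone sketch of the reduce expression. The array a and the variable names are hypothetical and chosen only for illustration.

```chapel
var a = [1, 2, 3, 4];

var total = + reduce a;         // combines all elements with +
var biggest = max reduce a;     // other operators work the same way
var sumSq = + reduce (a * a);   // the operand can be a promoted expression

writeln(total, " ", biggest, " ", sumSq);   // prints "10 4 30"
```

Because the reduction combines per-task partial results internally, no sync variable is needed in user code.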

Multiple Locales

Let's look at how the arrays of x and y values are allocated:

var p_x: [1..n] real(64);
var p_y: [1..n] real(64);

However, Chapel provides a way to distribute these arrays across multiple locales:


use BlockDist;   // provides the Block distribution

const space = {1..n};
var Dom: domain(1) dmapped Block(boundingBox=space) = space;

var p_x: [Dom] real(64);
var p_y: [Dom] real(64);

This Block mapping will allocate the elements block-wise among the locales. Furthermore the reduction used earlier will continue to work.
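One way to check where the elements end up is to record the owning locale of each index. The following is a sketch under assumptions: it is written against the same Block syntax used above (other Chapel releases may spell the distribution differently), it assumes a multi-locale build of Chapel, and the array name owner and the small default n are hypothetical.

```chapel
use BlockDist;

// Assumed declaration: a small n so that the output is easy to read.
config const n = 8;

const space = {1..n};
var Dom: domain(1) dmapped Block(boundingBox=space) = space;

// owner[i] will hold the id of the locale that stores element i.
var owner: [Dom] int;

// A forall over a distributed domain runs each iteration on the locale
// that owns the index, so here.id identifies that locale.
forall i in Dom do
  owner[i] = here.id;

writeln(owner);
```

Run on a single locale, every entry is 0. Launched on several locales (for example with the -nl flag of the generated executable), consecutive blocks of indices should report different locale ids.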

Performance Results

There are a couple of options for collecting Chapel performance data with TAU. To begin, configure TAU with PDT, pthreads and bfd (for sampling).

Compiling Chapel with --savec c_code will store the intermediate C source files in c_code. Compiling the C code with TAU is easy:

make -f c_code/Makefile CC=tau_cc.sh

However, since each generated source file is included as a header, none of them will be instrumented.
