Revision as of 23:44, 15 May 2008

ENZO Performance Study Summary

This is a short overview of the performance results for the ENZO application. For each experiment we used these inits/param files:

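inits (http://giusto.nic.uoregon.edu/~scottb/SingleGrid_dmonly.inits)
param (http://giusto.nic.uoregon.edu/~scottb/SingleGrid_dmonly_amr.param)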

This is a relatively small experiment, but it was sufficient to generate some interesting performance results. For this study we used the TAU Performance System® (http://tau.uoregon.edu) to gather information about ENZO's performance; in particular, we are interested in the performance of the AMR simulation at scale. We ran these experiments on NCSA's Intel 64 Linux Cluster (Abe).

TAU Measurement overhead

Here is a short table listing the run-times for various experiments and the instrumentation overhead observed. Each run was on 64 processors (8 nodes).

Run Type	Runtime (seconds)	Overhead %
Uninstrumented runtime	1072	NA
Trace of only MPI events	1085	4.8%
Profile of all significant events	1136	6.0%
Profile with Call-path information	1196	11.6%
Profile of each Phase of execution	1208	12.7%
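
The overhead column can be checked with a short calculation. The sketch below assumes that overhead is defined as the extra runtime relative to the uninstrumented run; the page does not state the formula, and the function and variable names are illustrative only. Under that assumption the three profile rows reproduce the listed percentages, while the MPI-only trace row works out to about 1.2% rather than the 4.8% listed, so that row may use a different definition or contain a typo.

```python
def overhead_percent(instrumented_s, baseline_s):
    """Extra runtime of an instrumented run, as a percentage of the baseline.

    Assumed definition: 100 * (instrumented - baseline) / baseline.
    """
    return 100.0 * (instrumented_s - baseline_s) / baseline_s


# Runtimes in seconds, copied from the table above.
BASELINE_S = 1072
INSTRUMENTED_RUNS_S = {
    "Trace of only MPI events": 1085,
    "Profile of all significant events": 1136,
    "Profile with Call-path information": 1196,
    "Profile of each Phase of execution": 1208,
}

for run_type, runtime_s in INSTRUMENTED_RUNS_S.items():
    pct = overhead_percent(runtime_s, BASELINE_S)
    print(f"{run_type}: {pct:.1f}%")
```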

Runtime Breakdown on 64 processors

Here is a chart showing the contribution each function makes to the overall runtime. Notice that MPI communication time takes over 60% of the total runtime.

Experiment Scalability

Given the amount of time spent in MPI communication, we do not expect this experiment to scale well. This chart shows that MPI communication time dominates the runtime to an even greater extent at scale.

Experiment Trace

This graphic shows how load imbalances cause long wait times for MPI_Allreduce. Some processors are experiencing as much as 8 seconds of wait time per reduce.
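
The link between load imbalance and wait time can be illustrated with a small model. This is a minimal sketch, not a measurement from the study: it assumes an idealized collective in which no rank can leave the reduce until the slowest rank has arrived, and it ignores the cost of the communication itself. The per-rank compute times are hypothetical values chosen for illustration.

```python
def allreduce_wait_times(compute_times_s):
    """Idealized wait per rank at a blocking collective.

    Every rank waits until the slowest rank arrives, so a rank's wait is
    the slowest compute time minus its own compute time.
    """
    slowest = max(compute_times_s)
    return [slowest - t for t in compute_times_s]


# Hypothetical compute times (seconds) for four ranks before one reduce.
compute_times_s = [2.0, 4.0, 10.0, 3.0]
waits_s = allreduce_wait_times(compute_times_s)

for rank, wait in enumerate(waits_s):
    print(f"rank {rank}: waits {wait:.1f} s")
```

In this model the most heavily loaded rank never waits, and every other rank idles for the difference, which is why a single slow rank can inflate the time attributed to MPI_Allreduce on all the others.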

Experiment Call-Paths

We observe the following relationships in the experiment call-path:

Almost all of the time spent in MPI_Bcast occurs when it is called from MPI_Allreduce.
Almost all of the time spent in MPI_Recv occurs when it is called from grid::CommunicationSendRegion.
Almost all of the time spent in MPI_Allgather occurs when it is called from CommunicationShareGrids.
Almost all of the time spent in MPI_Allreduce occurs when it is called from CommunicationMinValue.
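
Statements of this kind come from attributing each callee's time to its callers. The sketch below shows one way to compute those shares from (caller, callee) timings. The data layout, the helper name, the `OtherCaller` entry, and all of the times are hypothetical; this is not TAU's output format and these are not values from the study.

```python
from collections import defaultdict


def caller_shares(callpath_times_s):
    """Fraction of each callee's total time attributable to each caller.

    callpath_times_s maps (caller, callee) pairs to seconds.
    """
    callee_totals = defaultdict(float)
    for (_caller, callee), seconds in callpath_times_s.items():
        callee_totals[callee] += seconds

    shares = {}
    for (caller, callee), seconds in callpath_times_s.items():
        total = callee_totals[callee]
        # Guard against callees that recorded no time at all.
        shares[(caller, callee)] = seconds / total if total > 0 else 0.0
    return shares


# Hypothetical timings in seconds, for illustration only.
example_times_s = {
    ("CommunicationMinValue", "MPI_Allreduce"): 95.0,
    ("OtherCaller", "MPI_Allreduce"): 5.0,
}

for (caller, callee), share in caller_shares(example_times_s).items():
    print(f"{caller} => {callee}: {share:.0%} of {callee} time")
```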

This chart shows the details:

