Chapter 6. Optimization Workflow

Table of Contents

6.1. Initial State: Correct, Measurable Program, Good Test Case
6.2. Avoid Unnecessary Memory Accesses
6.3. Optimize Data Layout
6.4. Optimize Access Patterns
6.5. Utilize Reuse Opportunities
6.6. Use Non-Temporal Hints for Data without Temporal Reuse
6.7. Avoid False Sharing
6.8. Avoid Communication between Caches (Coherence Traffic)
6.9. Hide Remaining Misses

Optimizing an application for good cache performance involves distinct phases, each targeting a specific category of problems. The order of the phases is somewhat important, because some problems obscure others and certain transformations enable other approaches.

This chapter outlines one way to approach this rather complex situation from a memory hierarchy standpoint. There are many more aspects of improving application performance than are listed here. They range from the macro scale (establishing an efficient architecture and a matching development process, selecting the optimal algorithms, managing the time and space complexity of different storage methods, optimizing database schemas, minimizing database and communication overheads, arranging for parallelization, and judicious inlining and denormalization) all the way down to tuning at CPU pipeline granularity.

You may need to optimize in any of these areas in addition to the memory hierarchy.