8.5. Blocking

Blocking Issue

Figure 8.7. Blocking Issue


A blocking issue means that there is an opportunity to reduce the number of cache misses or fetches by processing a smaller piece of the data set at a time, thereby reusing cache lines before they are evicted from the cache. For a general description of blocking, see Section 5.2.3.1, “Blocking”.

Freja can suggest three types of blocking:

The blocking issues contain several sections describing different loop levels in the loop hierarchy related to the issue, descriptions of these and other issue specific sections follows. The issues contain the following sections:

Different loops play different roles for blocking with respect to data reuse of a particular instruction. The following example contains four loop levels, some of which possess important traits to consider when blocking this piece of code. From the inside out:

for (i = 0; i < ITER; i++)
  for (j = 0; j < J_SIZE; j++) {
    a[j][0] = 0;
    for (k = 0; k < K_SIZE; k++)
      for (m = 0; m < M_SIZE; m++)
        b[m] += a[j][k];
  }
    

The following would be one proper blocking of this loop. The for j loop has been split up by the for i loop:

for (jj = 0; jj < J_SIZE; jj += BLOCK)
  for (i = 0; i < ITER; i++)
    for (j = jj; j < min(jj + BLOCK, J_SIZE); j++) {
      a[j][0] = 0;
      for (k = 0; k < K_SIZE; k++)
        for (m = 0; m < M_SIZE; m++)
          b[m] += a[j][k];
    }
    

The ideas underlying the blocking issue, and the sections presented here represents an ideal situation. In reality, due to effects of the sampling process, Freja may not have complete information about all loop levels. In such cases the closest identified loop level tends to be selected.

To help the programmer to judge how close the identified loop is from the optimal blocking level, Freja displays the number of iterations of that loop that corresponds to one data reuse by the instruction under consideration. The programmer can take this number and, while moving outwards from the target instruction, scan loops until the reported iteration count has been accounted for.