A Shared Memory Parallel Block Streaming Model for Irregular Applications
Abstract
Due to worsening machine balance, a lightweight irregular application can utilize only a small fraction of the peak computational capacity on modern processors. Performance of such an application is also unpredictable due to the scattered data accesses. Even though architectural features such as a cache system, hardware prefetchers etc., that reduce the cost of irregular access, are commonly found in most of the modern processors, their design parameters differ widely from one processor to another. Therefore, a performance improving programming technique still needs extensive tuning to gain maximum benefit on a target processor. In this scenario, achieving portable performance becomes difficult. This work proposes a block streaming machine model and hypothesizes that an algorithm based on the model has predictable execution time. To enable adaptation of this model for irregular applications, we also provide algorithmic transformations that can be used to replace the scattered accesses with streaming accesses in a cost predictable way. Further, we experimentally demonstrate usefulness of the model and the transformations for static lightweight irregular computations such as those performed by a numerical partial differential equation solver on modern multicore processors.
Keywords
Full Text:
PDFRefbacks
- There are currently no refbacks.