Exploiting Hierarchical Parallelism Using UPC
High-Performance Computing (HPC) systems are increasingly moving towards a deeply hierarchical and heterogeneous architecture composed of multicore processors and hardware accelerators. This architectural shift has added significant programming complexity, as end users must now understand and exploit parallelism at multiple levels. Unfortunately, the single-level execution model embodied in legacy parallel programming models falls short of exploiting the multi-level parallelism these architectures offer. Richer execution models are therefore needed to fully exploit hierarchical parallelism. This thesis explores multi-level parallelism opportunities in such architectures at the language and application levels by investigating possible extensions to the Unified Parallel C (UPC) parallel programming language and by integrating it with other complementary programming paradigms. UPC is a parallel extension of ISO C that supports the Partitioned Global Address Space (PGAS) programming model. It presents a globally shared address space that enables one-sided communication constructs, which makes it easier to use for programmers. In addition, being a PGAS language, it provides data locality awareness for higher performance. This research identifies programming language features and runtime support that can help improve programmability and application performance on hierarchical architectures. It proposes two approaches. The first orchestrates computations across multiple thread groups, whereas the second extends UPC with nested, shared-memory multithreading to make hierarchical execution easier to express. This thesis formally proposes these approaches and evaluates their applicability through high-performance implementations of several parallel application benchmarks, such as NAS FT and Unbalanced Tree Search.
The results demonstrate that the explicit hierarchical programming model is better suited to modern HPC systems than the single-level execution model.