Efficient Virtualization and Scheduling for Productive GPU-based High Performance Computing Systems Open Access
Modern Graphics Processing Units (GPUs) are widely used as application accelerators in High Performance Computing (HPC) owing to their massive floating-point throughput and highly data-parallel architecture. GPU co-processors are present in top supercomputers such as Titan, Blue Waters, and Tianhe-1A. However, harnessing the full potential of these systems, while limiting power and energy consumption to the extent possible, poses productivity challenges. Improving productivity requires recognizing the following facts. First, the most prevalent execution model on these heterogeneous supercomputers is Single-Program Multiple-Data (SPMD). The SPMD model calls for a balance between compute resources, namely GPUs and CPU cores, which is often absent due to architectural, cost, and system constraints; this imbalance creates performance and portability issues for parallel applications. Additionally, multitasking, which is well supported on CPU-based systems, remains inefficient on heterogeneous GPU-based computers. Second, as an expensive resource, the GPU is likely to be shared by two or more processes simultaneously, which can degrade performance if the sharing is not synergistic and application-aware. Third, with thousands of active nodes in contemporary high-performance computers, and with each GPU board consuming two to three times the power of a microprocessor chip, total power consumption is very high.

This research addresses these challenges through three core ideas on which productive run-time systems can be built:

1. A GPU virtualization approach that hides the underlying asymmetry in the number of computational resources.
The framework is based on a resource-management layer that handles the GPU on behalf of all microprocessors, and is supported by a detailed theoretical model that predicts the achievable gains for a given application.

2. A symbiotic task-scheduling technique that maximizes the performance of a set of tasks destined to share the GPU. Symbiosis is achieved by reducing resource contention among concurrent GPU tasks and by matching tasks with complementary execution characteristics.

3. A power-efficient GPU task-scheduling approach coupled with both power and performance optimization techniques. This model-driven approach predicts the power consumption of candidate schedules.

Experimental results demonstrated up to a 16x speedup with the proposed GPU virtualization approach, and measurements indicated minimal overhead within the virtualization layer. These results also verified the accuracy of the theoretical model that analyzes GPU sharing through virtualization. Further benchmarking demonstrated the efficacy of the proposed symbiotic scheduling technique, showing near-optimal performance and energy savings as verified against an exhaustive search. By also considering power as an optimization objective, the proposed symbiotic scheduling solution showed potential for effective power reduction.
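To make the first idea concrete, the minimal Python sketch below shows one way a resource-management layer can broker a single GPU among many CPU workers, hiding the asymmetry in resource counts. All names here are illustrative assumptions, and run_on_gpu is a placeholder standing in for a real kernel launch, not an API from the dissertation.

```python
import threading
import queue

def run_on_gpu(task_id):
    # Placeholder for a real kernel launch; here we just square the id.
    return task_id * task_id

class GPUBroker:
    """Hypothetical resource-management layer: one thread owns the GPU
    and services kernel requests from any number of CPU workers."""

    def __init__(self):
        self.requests = queue.Queue()
        self.thread = threading.Thread(target=self._serve, daemon=True)
        self.thread.start()

    def _serve(self):
        # Single consumer: all GPU work funnels through this thread,
        # so CPU workers never contend for the device directly.
        while True:
            task_id, reply = self.requests.get()
            if task_id is None:
                break
            reply.put(run_on_gpu(task_id))

    def submit(self, task_id):
        reply = queue.Queue(maxsize=1)
        self.requests.put((task_id, reply))
        return reply.get()  # blocks until the broker replies

    def shutdown(self):
        self.requests.put((None, None))
        self.thread.join()

broker = GPUBroker()
results = [broker.submit(i) for i in range(8)]  # 8 CPU "workers", 1 GPU
broker.shutdown()
print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

The design choice mirrors the abstract's framing: workers see a uniform interface regardless of how many physical GPUs exist, and the serialization point is where a real system would interleave or batch requests.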
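The symbiotic scheduling idea (matching tasks with complementary execution characteristics) can be sketched as a greedy pairing by resource profile. The task names and compute-intensity scores below are invented for illustration and are not from the dissertation's benchmarks.

```python
def symbiotic_pairs(tasks):
    """Greedily co-schedule the most compute-bound task with the most
    memory-bound one, repeating until the task list is exhausted.

    tasks: list of (name, compute_intensity) pairs, where intensity is
    the assumed fraction of time spent in arithmetic vs. memory access.
    """
    ranked = sorted(tasks, key=lambda t: t[1])
    pairs = []
    while len(ranked) >= 2:
        memory_bound = ranked.pop(0)   # lowest compute intensity
        compute_bound = ranked.pop()   # highest compute intensity
        pairs.append((compute_bound[0], memory_bound[0]))
    leftover = ranked[0][0] if ranked else None
    return pairs, leftover

tasks = [("matmul", 0.9), ("stream", 0.1), ("fft", 0.6),
         ("scan", 0.2), ("nbody", 0.8)]
pairs, leftover = symbiotic_pairs(tasks)
print(pairs)     # [('matmul', 'stream'), ('nbody', 'scan')]
print(leftover)  # fft
```

Pairing opposites this way reduces contention for any single resource: a memory-bound kernel leaves arithmetic units idle that its compute-bound partner can use, which is the intuition behind the symbiosis the abstract describes.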
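The third idea, model-driven power-aware scheduling, amounts to scoring candidate schedules by predicted cost before running any of them. A minimal sketch, assuming a simple energy model (predicted average power times predicted makespan) and invented candidate numbers:

```python
def predicted_energy(schedule):
    # Energy (J) = predicted average power (W) x predicted makespan (s).
    # A real model would derive both from per-task power/performance
    # profiles; these fields are illustrative placeholders.
    return schedule["power_w"] * schedule["makespan_s"]

candidates = [
    {"name": "serial",  "power_w": 180.0, "makespan_s": 10.0},
    {"name": "pair-AB", "power_w": 240.0, "makespan_s": 6.0},
    {"name": "pair-AC", "power_w": 260.0, "makespan_s": 7.0},
]
best = min(candidates, key=predicted_energy)
print(best["name"], predicted_energy(best))  # pair-AB 1440.0
```

Note that the co-scheduled option wins here despite drawing more instantaneous power, because its shorter makespan yields lower total energy; capturing exactly this trade-off is the point of predicting power for whole schedules rather than individual tasks.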