Novel Techniques for Graph Algorithm Acceleration Open Access
Downloadable ContentDownload PDF
The concept of graph has been around since Euler brought up the Seven Bridges of Knigsberg problem in 1736. Recent years have seen graph computing regains its momentum because of many emerging graph relevant applications, e.g., World-Wide-Web (WWW) networks, social and computer networks, metabolic interactions and chemical compound design graphs. This dissertation strives to provide graph computing systems which are able to quickly compute very large graph datasets with relatively low cost and expose easy programming interface to programmers.The first part of this dissertation introduces the Graphics Processing Units (GPUs) accelerated graph traversal which consists of two projects – Enterprise and iBFS. Particularly, Enterprise is the first work that achieves atomic operation free Breadth-First Search (BFS) on GPUs and iBFS is the first to conduct multiple traversals together on GPUs. Both projects achieve orders of magnitude speedup over state-of-the-art.The second part introduces SIMD-X, a graph framework that supports a variety of graph algorithms on GPUs. SIMD-X not only provides a simple Active-Compute-Combine (ACC) programming model for end users to express graph algorithms on Single Instruction Multiple Data (SIMD) GPUs, but also creates opportunities for system-level optimizations. Together, SIMD-X allows programmers to develop a typical graph algorithm with less than 100 Lines Of Code (LOCs) and achieve an order of magnitude speedup over Gunrock.Finally, this dissertation describes Graphene which can tackle trillion-edge graphs on a single machine with an array of Solid State Drives (SSDs). To enhance the bandwidth utilization of such an array of SSDs, we introduce a bitmap based IO request management component that improves bandwidth efficiency by 4 - 8× and a row-column 2D graph partition approach to balance the graph data access across the disks. Notably, Graphene achieves comparable performance to in-memory systems, e.g., Galois with merely 10% of memory consumption.