GraphScope: A One-Stop Large Graph Processing System

Abstract

Because graph data and algorithms are so diverse, programming and orchestrating complex computation pipelines have become major challenges in applying graph applications to Web-scale data analysis. GraphScope aims to provide a one-stop, efficient solution for a wide range of graph computations at scale. It extends previous systems by offering a unified, high-level programming interface and by allowing specialized graph engines to be integrated seamlessly into a general data-parallel computing environment. As we will show in this demo, GraphScope enables developers to write sequential graph programs in Python and executes them in parallel on a cluster automatically. This design also lets GraphScope integrate seamlessly with existing data processing systems in the PyData ecosystem. To validate GraphScope’s efficiency, we will compare a complex, multi-stage processing pipeline for a real-life fraud detection task with a manually assembled implementation comprising multiple systems. GraphScope achieves a 2.86× speedup on a trillion-scale graph in real production at Alibaba.

Publication
Proc. VLDB Endow.