Dr. Longbin Lai obtained his Degree of Bachelor and Master from Shanghai Jiao Tong University (SJTU) in 2010 and 2013 respectively. After that, He joined the Database group in CSE, UNSW, Sydney for Ph.D study under the supervision of Prof. Xuemin Lin and Dr. Lu Qin. He obtained his Ph.D degree in Mar. 2017, and his thesis is available here. He now joins Alibaba Damo Academy to build intelligent systems for big (graph) data processing. His research interests include graph algorithms, graph database, database management, distributed algorithms and systems.
PhD in Computer Science and Engineering, 2017
University of New South Wales, Sydney
Master in Computer Engineering, 2013
Shanghai Jiao Tong University
BSc in Information Security, 2010
Shanghai Jiao Tong University
Subgraph enumeration is a fundamental problem in graph analytics, which aims to find all instances of a given query graph on a large data graph. In this paper, we propose a system called HUGE to efficiently process subgraph enumeration at scale in the distributed context. HUGE features 1) an optimiser to compute the generic optimal execution plan without the constraints of existing works; 2) a hybrid communication layer that supports both pushing and pulling communication; 3) a novel two-stage execution mode with a lock-free and zero-copy cache design, 4) a BFS/DFS-adaptive scheduler to bound memory consumption, and 5) two-layer intra- and inter-machine load balancing. HUGE is generic such that all existing distributed subgraph enumeration algorithms can be plugged into the system to enjoy automatic speed up and bounded-memory execution.
GAIA is a distributed system designed specifically to make it easy for a variety of users to interactively analyze big graph data on large clusters at low latency. It adopts a high-level language called Gremlin for graph traversal, and provides automatic parallel execution. In particular, we advocate a powerful new abstraction called Scope that caters to the specific needs in this new computation model to scale graph queries with complex dependencies and runtime dynamics, while at the same time maintaining the simple and concise programming model. GAIA has been deployed in production clusters at Alibaba to support a variety of business-critical scenarios. Extensive evaluations using both benchmarks and real-world applications have validated the effectiveness of the proposed techniques, which enables GAIA to execute complex Gremlin traversal with orders-of-magnitude better performance than existing high-performance engines, and at much larger scales than recent state-of-the-art Gremlin-enabled systems such as JanusGraph.
Recently there emerge many distributed algorithms that aim at solving subgraph matching at scale. Existing algorithm-level comparisons failed to provide a systematic view of distributed subgraph matching mainly due to the intertwining of strategy and optimization. In this paper, we identify four strategies and three general-purpose optimizations from representative state-of-the-art algorithms. We implement the four strategies with the optimizations based on the common Timely dataflow system for systematic strategy-level comparison. Our implementation covers all representative algorithms. We conduct extensive experiments for both unlabelled matching and labelled matching to analyze the performance of distributed subgraph matching under various settings, which is finally summarized as a practical guide.
Graph pattern matching is one of the most fundamental problems in graph database and is associated with a wide spectrum of applications. Due to its computational intensiveness, researchers have primarily devoted their efforts to improving the performance of the algorithm while constraining the graphs to have singular labels on vertices (edges) or no label. Whereas in practice graphs are typically associated with rich properties, thus the main focus in the industry is instead on powerful query languages that can express a sufficient number of pattern matching scenarios. We demo PatMat in this work to glue together the academic efforts on performance and the industrial efforts on expressiveness. To do so, we leverage the state-of-the-art join-based algorithms in the distributed contexts and Cypher query language - the most widely-adopted declarative language for graph pattern matching. The experiments demonstrate how we are capable of turning complex Cypher semantics into a distributed solution with high performance.