Mining, Indexing, and Searching Graphs in Biological Databases

Jiawei Han
University of Illinois at Urbana-Champaign

Graph search and graph pattern discovery form an important research frontier in Search and Knowledge Building for Biological Datasets.
These problems are challenging because a large biological dataset can contain an exponential number of frequent subgraphs.
In this talk, we present our recent progress on developing efficient and scalable methods for mining and searching graphs in large biological databases. We introduce gSpan and CloseGraph, two efficient methods for mining frequent graph patterns in graph databases, as well as constraint-based graph mining methods. We then present a graph indexing method, gIndex, and an approximate graph search method, Grafil, both of which take advantage of frequent graph mining to construct a compact but highly effective graph index and to perform similarity search with such indexing structures. These methods not only facilitate mining and querying graph patterns in massive datasets but also have broad applications in related fields. Finally, we outline some open research problems and discuss potential methods for solving them.

Workshop IV: Search and Knowledge Building for Biological Datasets