Two Clustering Problems for the Stochastic Block Model

Ioana Dumitriu
University of Washington

The Stochastic Block Model (SBM) and its variants are the most widely-used "synthetic" examples for network modeling. They are used for devising, testing, analyzing and benchmarking algorithms in community detection. The problem is formulated as follows: given a network (perhaps known through its local connections, i.e., the adjacency matrix) created by sampling the SBM, under what parameter regimes is it possible to devise an algorithm that correctly or approximately identifies the "clusters" or "communities" in the network? Any such algorithm would of course only work with high probability, as the model could in principle generate the empty graph or the complete graph, which have no non-trivial communities. The analysis that is needed in order to calculate or estimate is quite complex and it involves non-trivial random matrix theory results, as well as combinatorics, convex optimization, linear algebra, etc.

While a lot has been learned in the last five years about the regimes in which such algorithms may exist, the problem in its generality is still wide open. We will discuss progress we made on a couple of models and talk about a number of open questions.

Presentation (PDF File)

Back to Workshop IV: Synergies between Machine Learning and Physical Models