Data Driven Algorithm Design

Nina Balcan
Carnegie Mellon University

Data-driven algorithm design for combinatorial problems is an important aspect of modern data science and algorithm design. Rather than using off-the-shelf algorithms that have only worst-case performance guarantees, practitioners typically optimize over large families of parametrized algorithms, tuning the parameters on a training set of problem instances from their domain to find a configuration with high expected performance on future instances. However, most of this work comes with no performance guarantees. The challenge is that for many combinatorial problems, including partitioning and subset selection problems, a small tweak to the parameters can cause a cascade of changes in the algorithm’s behavior, so the algorithm’s performance is a discontinuous function of its parameters.
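To make the tuning setup concrete, here is a minimal sketch in Python. It assumes a family of greedy knapsack heuristics indexed by one parameter rho, which ranks items by value / weight**rho. This family is an illustrative choice and is not named in the abstract, and the names greedy_knapsack, sample_instance, average_value, and tune_rho, the instance distribution, and the parameter grid are all hypothetical. The sketch picks the rho with the best average value on a training set and shows on a two-item instance that the selected value jumps as rho crosses a threshold.

```python
import random


def greedy_knapsack(values, weights, capacity, rho):
    """Take items in decreasing order of value / weight**rho while they fit."""
    order = sorted(
        range(len(values)),
        key=lambda i: values[i] / weights[i] ** rho,
        reverse=True,
    )
    total, remaining = 0.0, capacity
    for i in order:
        if weights[i] <= remaining:
            total += values[i]
            remaining -= weights[i]
    return total


def sample_instance(rng, n_items=10):
    """Draw one random instance (an assumed stand-in for domain data)."""
    values = [rng.uniform(1, 10) for _ in range(n_items)]
    weights = [rng.uniform(1, 10) for _ in range(n_items)]
    capacity = 0.3 * sum(weights)
    return values, weights, capacity


def average_value(instances, rho):
    """Empirical performance of parameter rho on a set of instances."""
    return sum(greedy_knapsack(v, w, c, rho) for v, w, c in instances) / len(instances)


def tune_rho(instances, grid):
    """Return the grid point with the highest average training value."""
    return max(grid, key=lambda rho: average_value(instances, rho))


rng = random.Random(0)
train = [sample_instance(rng) for _ in range(50)]
test = [sample_instance(rng) for _ in range(50)]
grid = [i / 10 for i in range(31)]  # rho in {0.0, 0.1, ..., 3.0}

best_rho = tune_rho(train, grid)
print("tuned rho:", best_rho)
print("train average:", average_value(train, best_rho))
print("test average:", average_value(test, best_rho))

# Discontinuity on a single instance: the item order flips once
# 10 / 5**rho drops below 6 / 1**rho, and the output value jumps.
small = ([10.0, 6.0], [5.0, 1.0], 5.0)
print(greedy_knapsack(*small, rho=0.0))  # 10.0: the heavy item is taken first
print(greedy_knapsack(*small, rho=1.0))  # 6.0: the light item is taken first
```

A grid search like this carries no guarantee on its own: because performance is piecewise constant in rho, the best parameter may lie between grid points, and the gap between the training and test averages is exactly what statistical guarantees of the kind described in the abstract would need to bound.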

In this talk, I will present new work that helps put data-driven combinatorial algorithm selection on firm foundations. We provide strong computational and statistical performance guarantees for several subset selection and combinatorial partitioning problems (including various forms of clustering). The guarantees cover both the batch scenario, where a collection of typical problem instances from the given application is presented all at once, and the online scenario, where the instances are presented one at a time.

Workshop IV: New Architectures and Algorithms