Sample complexity of estimation in logistic regression

Arya Mazumdar
University of California, San Diego (UCSD)

The logistic regression model is one of the most popular data generation models for noisy binary classification problems. In this talk, we will discuss the sample complexity of estimating the parameters of the logistic regression model up to a given $\ell_2$ error, in terms of the dimension and the inverse temperature, with standard normal covariates. The inverse temperature controls the signal-to-noise ratio of the data generation process. While both generalization bounds and the asymptotic performance of the maximum-likelihood estimator for logistic regression are well-studied, the non-asymptotic sample complexity of parameter estimation, showing the dependence on both the error and the inverse temperature, is absent from previous analyses. We show that the sample complexity curve has two change-points (or critical points) as a function of the inverse temperature, clearly separating the low-, moderate-, and high-temperature regimes.
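For concreteness, here is one standard formulation of the data generation process (a sketch only; the unit-norm convention for the parameter and the $\pm 1$ label encoding are assumptions, and the exact scaling used in the talk may differ). Given a parameter $\theta \in \mathbb{R}^d$ with $\|\theta\|_2 = 1$ and inverse temperature $\beta > 0$, each sample is a covariate-label pair $(x, y)$ with $x \sim \mathcal{N}(0, I_d)$ and

$$\Pr[y = 1 \mid x] = \sigma(\beta \langle \theta, x \rangle) = \frac{1}{1 + e^{-\beta \langle \theta, x \rangle}}, \qquad y \in \{-1, +1\}.$$

Under this parameterization, large $\beta$ (low temperature) makes the labels nearly deterministic given $x$, while small $\beta$ (high temperature) makes them close to unbiased coin flips, which is the sense in which $\beta$ controls the signal-to-noise ratio.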

This is joint work with Daniel Hsu.

