In this talk, we present a theoretical framework for interpreting neural network decisions by formalizing the problem in rate-distortion terms. The solution of the associated
optimization problem, which we call the Rate-Distortion Explanation (RDE), is then amenable to mathematical analysis. We discuss theoretical results and present numerical experiments showing that our algorithmic approach outperforms established methods, in particular for sparse explanations of neural network decisions.