Statistical and Learning-Theoretic Challenges in Data Privacy

February 22 - 26, 2010


Privacy is a fundamental problem in modern data analysis. Collections of personal and sensitive data, previously the purview of governments and statistical agencies, have become ubiquitous. Increasing volumes of personal and sensitive data are collected and archived by health networks, government agencies, search engines, social networking websites, and other organizations. The potential social benefits of analyzing these databases are significant: better informed policy decisions, more efficient markets, and more accurate public health data, just to name a few. At the same time, releasing information from repositories of sensitive data can cause devastating damage to the privacy of individuals or organizations whose information is stored there. The challenge is to enable analysis of these databases, without compromising the privacy of the individuals whose data they contain. This problem is studied in several scientific communities and under several names, e.g. “statistical disclosure limitation”, “privacy-preserving data mining”, and “private data analysis”.

The goal of the workshop is to establish a coherent theoretical foundation for research on data privacy. This entails work on (1) how the conflicting goals of privacy and utility can or should be formulated mathematically; and (2) how the constraints of privacy—in their various incarnations—affect the accuracy of statistical inference and machine learning. In particular, the goal is to shed light on the interplay between privacy and concepts such as consistency and efficiency of estimators, generalization error of learning, robustness and stability of estimation algorithms, and the generation of synthetic data.
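As a concrete illustration of the privacy–utility tradeoff described above, consider the Laplace mechanism of differential privacy, one of the mathematical formulations studied in this area: a query answer is released only after adding noise calibrated to the query's sensitivity and a privacy parameter ε. The sketch below (the function names `private_count` and `laplace_noise` are illustrative, not from any workshop material) shows how a stronger privacy guarantee (smaller ε) directly forces a noisier, less accurate answer.

```python
import math
import random

def laplace_noise(scale):
    # Sample Laplace(0, scale) via inverse-CDF sampling.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(records, predicate, epsilon):
    # A counting query has sensitivity 1: adding or removing one
    # individual's record changes the true answer by at most 1.
    # Laplace noise with scale 1/epsilon then yields
    # epsilon-differential privacy for the released count.
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Smaller epsilon -> stronger privacy -> noisier released answer.
ages = [23, 45, 67, 34, 52, 29, 71, 40]
noisy_answer = private_count(ages, lambda a: a >= 40, epsilon=0.5)
```

The expected error of the released count is proportional to 1/ε, making precise the sense in which tighter privacy constraints degrade the accuracy of even the simplest statistical estimates.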

Organizing Committee

Cynthia Dwork (Microsoft Research)
Stephen Fienberg (Carnegie Mellon University)
Aleksandra Slavkovic (Pennsylvania State University)
Adam Smith, Chair (Pennsylvania State University)