The Fundamental Law of Information Recovery tells us that “overly accurate” estimates of “too many” statistics computed from a dataset completely destroy the privacy of the dataset. It follows that any system releasing a large collection of statistics must intentionally introduce noise in order to maintain privacy.
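As a minimal sketch of what “intentionally introducing noise” can look like, the snippet below perturbs a counting query with Laplace noise scaled to 1/ε, the standard Laplace mechanism for sensitivity-1 queries. The function names and the toy data are illustrative, not from any particular deployed system.

```python
import math
import random

def laplace_sample(scale: float) -> float:
    # Inverse-CDF sampling from a Laplace(0, scale) distribution.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(data, predicate, epsilon: float) -> float:
    # A counting query has sensitivity 1: adding or removing one person
    # changes the true count by at most 1, so Laplace noise with scale
    # 1/epsilon suffices for epsilon-differential privacy.
    true_count = sum(1 for row in data if predicate(row))
    return true_count + laplace_sample(1.0 / epsilon)

# Illustrative use: a noisy count of records below a threshold.
toy_data = list(range(100))
noisy = dp_count(toy_data, lambda x: x < 50, epsilon=0.5)
```

Smaller values of ε mean stronger privacy but noisier answers; the analyst sees a useful estimate of the count, while no single individual's presence in the data can be confidently inferred from the output.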
Differential privacy is a mathematically rigorous definition of privacy tailored to statistical analysis of large datasets. Differentially private systems simultaneously provide useful statistics to the well-intentioned data analyst and strong protection against arbitrarily powerful adversarial system users — without needing to distinguish between the two. Differentially private systems “don’t care” what the adversary knows, now or in the future. Finally, differentially private systems can rigorously bound and control the cumulative privacy loss that accrues over many interactions with confidential data. These unique properties have led to extensive deployment of differential privacy in industry, and to its adoption by the US Census Bureau as the disclosure avoidance methodology for the 2020 Decennial Census.
After motivating the definition of differential privacy and giving some intuition on how it can be achieved, we will discuss challenges faced by the Census Bureau in the gargantuan task of deploying differential privacy to provide estimates of billions of statistics (see the Fundamental Law), as well as the difficulties faced by the data consumers whose traditional methods of interacting with Census statistics have been upended.