Estimating Re-identification Risks in the Context of Population-Based Statistics

Bradley Malin
Vanderbilt University

In this talk, I will discuss practical approaches that use statistics from population data (e.g., Census count tables) to estimate the likelihood that a particular record in a sample can be linked to fewer than k people in a population. As an example application, I will demonstrate how we have applied this method to evaluate the relative risks of data sharing policies associated with the HIPAA Privacy Rule. I will also provide a glimpse into how we are leveraging such approaches to perform risk evaluations for specific datasets in the NIH-sponsored Electronic Medical Record and Genomics (eMERGE) network.
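
As a rough illustration of the idea, and not the method presented in the talk, the sketch below flags a sample record as at risk when the population cell matching its quasi-identifiers holds fewer than k people. The count table, the choice of quasi-identifiers (age group, sex, three-digit ZIP code), and the function name fraction_at_risk are all hypothetical and chosen only for the example.

```python
def fraction_at_risk(sample, population_counts, k):
    """Return the share of sample records whose population cell holds fewer than k people.

    sample: list of quasi-identifier tuples (hypothetical format).
    population_counts: dict mapping the same tuples to population counts,
        standing in for a Census-style count table.
    A tuple missing from the table is treated as a count of 0 (an assumption).
    """
    if not sample:
        return 0.0
    at_risk = sum(1 for record in sample if population_counts.get(record, 0) < k)
    return at_risk / len(sample)


# Made-up count table: (age_group, sex, zip3) -> number of residents.
population_counts = {
    ("30-39", "F", "372"): 1200,
    ("30-39", "M", "372"): 1100,
    ("80-89", "F", "372"): 4,
    ("80-89", "M", "372"): 2,
}

# Made-up sample: one record per population cell.
sample = [
    ("30-39", "F", "372"),
    ("30-39", "M", "372"),
    ("80-89", "F", "372"),
    ("80-89", "M", "372"),
]

risk_k5 = fraction_at_risk(sample, population_counts, k=5)
print(f"Share of sample records linkable to fewer than 5 people: {risk_k5:.2f}")
```

Repeating the calculation for the same sample under different generalizations of the quasi-identifiers (for instance, coarser age bands) is one simple way to compare the relative risk of alternative sharing policies.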


Back to Statistical and Learning-Theoretic Challenges in Data Privacy