Privacy-preserving Classification

Kamalika Chaudhuri
University of California, San Diego (UCSD)

Privacy-preserving machine learning is a growing concern as sensitive data, such as financial and medical records, are increasingly collected, stored, and mined to draw meaningful conclusions. In this talk, we address the problem of privacy-preserving classification: how to design and release a linear classifier trained on sensitive data so that the privacy of individuals in the training data is preserved. We examine this problem in the differential privacy model of Dwork et al.
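
For reference, the standard definition of differential privacy can be written as follows. This is background, not text from the talk, and the symbols (mechanism A, data sets D and D', output set S, privacy parameter \epsilon) are notation chosen here for illustration.

```latex
A randomized algorithm $A$ is $\epsilon$-differentially private if, for all
data sets $D$ and $D'$ that differ in a single individual's record, and for
all measurable sets $S$ of outputs,
\[
  \Pr[A(D) \in S] \;\le\; e^{\epsilon} \, \Pr[A(D') \in S].
\]
```

A smaller \epsilon means the released classifier reveals less about any one individual in the training data.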

We present an efficient, differentially private linear classifier. It works in the ERM (empirical risk minimization) framework, and includes privacy-preserving logistic regression and privacy-preserving support vector machines. We prove that the classifier is private, provide analytical bounds on its sample requirement, and evaluate it on real data.
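
To make the idea of a differentially private ERM classifier concrete, here is a minimal sketch of one known technique, output perturbation for L2-regularized logistic regression. The abstract does not say which mechanism the talk uses, so this should not be read as the authors' method. The function names, the step count, and the learning rate are hypothetical choices. The sketch assumes every feature vector has Euclidean norm at most 1 and labels are +1 or -1; under those assumptions the exact minimizer changes by at most 2 / (n * lam) when one record is replaced, and the noise is calibrated to that bound.

```python
import math
import random


def train_logistic(X, y, lam, steps=500, lr=0.5):
    """Gradient descent on (1/n) * sum log(1 + exp(-y_i * w.x_i)) + (lam/2) * ||w||^2."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        grad = [lam * wj for wj in w]  # gradient of the regularizer
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            # d/dm log(1 + exp(-m)) = -1 / (1 + exp(m))
            coeff = -yi / (1.0 + math.exp(margin)) / n
            for j in range(d):
                grad[j] += coeff * xi[j]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def sample_noise(d, scale, rng):
    """Draw b in R^d with density proportional to exp(-||b|| / scale)."""
    # Uniform direction: normalize a standard Gaussian vector.
    direction = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(g * g for g in direction))
    # The radius of such a vector follows a Gamma(shape=d, scale=scale) law.
    radius = rng.gammavariate(d, scale)
    return [radius * g / norm for g in direction]


def private_logistic(X, y, lam, epsilon, rng=None):
    """Output perturbation: train, then add noise scaled to 2 / (n * lam * epsilon)."""
    rng = rng or random.Random()
    n, d = len(X), len(X[0])
    for xi in X:
        if math.sqrt(sum(v * v for v in xi)) > 1.0 + 1e-12:
            raise ValueError("sketch assumes every feature vector has norm <= 1")
    # Assumption: gradient descent has converged close to the exact minimizer;
    # the sensitivity bound is stated for the exact minimizer.
    w = train_logistic(X, y, lam)
    b = sample_noise(d, 2.0 / (n * lam * epsilon), rng)
    return [wj + bj for wj, bj in zip(w, b)]
```

A call such as `private_logistic(X, y, lam=0.1, epsilon=1.0)` returns a noisy weight vector. The noise scale shrinks as the number of examples n grows, which is one intuition for why sample-requirement bounds matter for private classifiers.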

Based on joint work with Claire Monteleoni (CCLS, Columbia) and Anand Sarwate (UCSD).

Back to Statistical and Learning-Theoretic Challenges in Data Privacy