Secure Logistic Regression with Distributed Databases

Yuval Nardi
Technion - Israel Institute of Technology

Privacy concerns in single databases have prompted an influx of research papers that provide
appropriate solutions. When databases are merged across multiple sources, new concerns
arise, and different solutions must be devised. We develop a possible solution for computing
logistic regression when the data are held by separate parties, without actually combining
the information sources, by exploiting results from the literature on secure multi-party
computation. Unlike other methods, which share intermediate values and thus present an
opportunity to compromise values in the combined database, our approach reveals only the
final result of the calculation. We illustrate the nature of the calculations and their accuracy
using an extract of data from the Current Population Survey divided between two parties.
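A standard building block from the secure multi-party computation literature is secure summation via additive secret sharing: each party splits its private value into random shares, and only the total is ever reconstructed. The sketch below is an illustration under stated assumptions and is not the protocol described above. It assumes the records are split by rows between parties (horizontal partitioning), and it uses a prime modulus `PRIME`, a fixed-point scale `SCALE`, and helper functions (`encode`, `share`, `secure_sum_vector`, `local_gradient`) that are all hypothetical. Note that summing per-party logistic-regression score vectors this way still reveals the aggregated gradient at each iteration, which is exactly the kind of intermediate value the approach above is designed to withhold.

```python
import math
import secrets

PRIME = 2**61 - 1  # hypothetical field modulus for additive shares
SCALE = 10**6      # hypothetical fixed-point scale for encoding reals


def encode(x: float) -> int:
    """Map a real number to a field element using fixed-point encoding."""
    return round(x * SCALE) % PRIME


def decode(v: int) -> float:
    """Invert encode, treating field elements above PRIME // 2 as negative."""
    if v > PRIME // 2:
        v -= PRIME
    return v / SCALE


def share(value: int, n_parties: int) -> list[int]:
    """Split a field element into n random shares that sum to it mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def secure_sum(local_values: list[float]) -> float:
    """Sum one private value per party; no party sees another party's input."""
    n = len(local_values)
    # Party i splits its encoded value and sends share j to party j.
    all_shares = [share(encode(x), n) for x in local_values]
    # Party j adds the shares it received; each partial sum looks uniformly random.
    partial = [sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)]
    # Combining the published partial sums reveals only the total.
    return decode(sum(partial) % PRIME)


def secure_sum_vector(local_vectors: list[list[float]]) -> list[float]:
    """Apply secure_sum component-wise to equal-length vectors."""
    return [secure_sum(list(column)) for column in zip(*local_vectors)]


def local_gradient(rows: list[list[float]], labels: list[int],
                   beta: list[float]) -> list[float]:
    """Logistic-regression score sum_i x_i (y_i - p_i) over one party's rows."""
    grad = [0.0] * len(beta)
    for x, y in zip(rows, labels):
        eta = sum(b * xi for b, xi in zip(beta, x))
        p = 1.0 / (1.0 + math.exp(-eta))  # fitted probability at current beta
        for k, xk in enumerate(x):
            grad[k] += xk * (y - p)
    return grad
```

For example, with two parties holding different rows and the current estimate at zero, the combined score vector is obtained without either party disclosing its own contribution:

```python
beta = [0.0, 0.0]
g_a = local_gradient([[1, 2], [1, 0]], [1, 0], beta)  # party A's private rows
g_b = local_gradient([[1, 1]], [1], beta)             # party B's private rows
print(secure_sum_vector([g_a, g_b]))                  # [0.5, 1.5]
```

The fixed-point encoding introduces a rounding error of at most `0.5 / SCALE` per value; `secrets` is used so that the shares come from a cryptographically strong generator.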

Back to Statistical and Learning-Theoretic Challenges in Data Privacy