Matching Methods for High-Dimensional Data with Applications to Text (with Brandon Stewart and Richard Nielsen)

Molly Roberts
University of California, San Diego (UCSD)

Matching is a popular technique for preprocessing observational data to facilitate
causal inference and reduce model dependence by ensuring that treated and
control units are balanced along pre-treatment covariates. While most applications
of matching balance on a small number of covariates, we identify situations where
matching with thousands of covariates may be desirable, such as causal inference
where confounders are measured with text. With high-dimensional covariates,
traditional matching methods are less effective and may be difficult or impossible to
implement. We characterize the problem of matching in a high-dimensional context
as a tradeoff between dimension reduction and imbalance bounding. We develop a
new method called Topical Inverse Regression Matching (TIRM) that optimizes this
tradeoff by including both a low-dimensional projection of covariates and information
about the probability of treatment. We illustrate our approach by estimating
the effect of censorship on the writing of Chinese bloggers, the effects of gender on
citation counts in international relations, and the effects of targeted killings and
capture by counterterrorists on the popularity of jihadist writings.
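
To make the general idea concrete, the following is a minimal sketch of matching on a low-dimensional projection of the covariates together with an estimated probability of treatment. It is not the TIRM procedure itself, which the abstract does not specify in detail. It is a generic greedy nearest-neighbor match with a caliper, and it assumes the projection and the treatment probabilities have already been estimated elsewhere. The function name `match_units`, the field names (`proj`, `pscore`), the caliper value, and the toy data are all hypothetical.

```python
import math


def match_units(units, caliper=0.2):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    Each unit is a dict with hypothetical keys:
      'id'      : unit identifier
      'treated' : True for treated units, False for controls
      'proj'    : low-dimensional summary of the text (assumed given)
      'pscore'  : estimated probability of treatment (assumed given)
    A treated unit is left unmatched if no unused control lies
    within `caliper` in Euclidean distance.
    """
    def features(u):
        # Match on the projection and the treatment probability jointly.
        return list(u["proj"]) + [u["pscore"]]

    treated = [u for u in units if u["treated"]]
    controls = [u for u in units if not u["treated"]]
    used = set()
    pairs = []
    for t in treated:
        best, best_d = None, caliper
        for c in controls:
            if c["id"] in used:
                continue
            d = math.dist(features(t), features(c))
            if d <= best_d:
                best, best_d = c, d
        if best is not None:
            used.add(best["id"])
            pairs.append((t["id"], best["id"]))
    return pairs


# Toy data: two treated documents and two control documents.
units = [
    {"id": "t1", "treated": True, "proj": [0.7, 0.3], "pscore": 0.6},
    {"id": "t2", "treated": True, "proj": [0.1, 0.9], "pscore": 0.8},
    {"id": "c1", "treated": False, "proj": [0.68, 0.32], "pscore": 0.55},
    {"id": "c2", "treated": False, "proj": [0.5, 0.5], "pscore": 0.2},
]

print(match_units(units))
```

In this toy example only `t1` finds a control within the caliper. `t2` has no sufficiently similar control and is dropped, which illustrates how bounding imbalance can cost sample size.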

Workshop III: Cultural Patterns: Multiscale Data-driven Models