Data analyses typically rely on assumptions about the missingness mechanism that determines which data are observed and which are missing. We explore an approach in which the joint distribution of the observed and missing data is specified through non-standard conditional distributions. In this formulation, which traces back to a factorization of the joint distribution apparently first proposed by J.W. Tukey in a discussion, the modeling assumptions about the conditional factors are either testable or designed to incorporate substantive knowledge about the problem at hand, thereby offering a potentially realistic portrayal of both the missing and the observed data. We apply Tukey’s conditional representation to exponential family models and propose a computationally tractable inferential strategy for this class of models.
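For orientation, the factorization in question can be sketched as follows. The notation is assumed here and does not come from the abstract: Y denotes the outcome, R = 1 marks an observed value, and R = 0 a missing one. The identity follows from Bayes' rule alone.

```latex
% Assumed notation: f is a density, R the response indicator (1 = observed).
\begin{align*}
f(y, R = r) &= f(y \mid R = 1)\,\Pr(R = 1)\,
               \frac{\Pr(R = r \mid y)}{\Pr(R = 1 \mid y)},
               \qquad r \in \{0, 1\},\\
f(y \mid R = 0) &\propto f(y \mid R = 1)\,
               \frac{\Pr(R = 0 \mid y)}{\Pr(R = 1 \mid y)}.
\end{align*}
```

The factor f(y | R = 1) involves only observed values and can therefore be checked against the data, while the selection odds are where substantive knowledge would enter. As a purely illustrative case, not a model stated in the abstract: if the observed-data density is normal with mean mu and variance sigma^2, and Pr(R = 1 | y) is logistic in alpha + beta*y, then the odds equal exp(-(alpha + beta*y)), and the missing-data density is an exponential tilt of the observed one, namely normal with mean mu - beta*sigma^2 and the same variance.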
This is joint work with Edo Airoldi and Alexander Franks.