In this talk, we will investigate the problem of detecting the dependency between two random databases represented as matrices. This is formalized as a hypothesis testing problem, where under the null hypothesis, the two databases are independently generated; under the alternative, the two databases are dependent under some latent row permutation/alignment, but have the same marginal distributions as the null. We determine sharp thresholds (information-theoretic and computational) at which optimal testing error probability exhibits a phase transition from zero to one, as a function of the database's dimensions and their generative distributions.
Back to Long Programs