A unified method for detecting copy number variants and their association with gene expression levels

Mahlet Tadesse
Georgetown University

Copy number variants (CNVs) are chromosomal aberrations in which DNA segments are present in an abnormal number of copies. The problem of detecting CNVs has received considerable attention, and several methods have been developed to infer CNVs from high-throughput array-based technologies. There is also strong interest in identifying associations between CNVs and biological functions. These analyses are commonly done in two stages: CNV calls are first inferred and then used in the association analysis as if they were the true copy numbers, which ignores the uncertainty in the calls. Another common procedure performs the association analysis directly on the normalized raw intensity measurements, which are only noisy surrogates for the underlying copy numbers. We propose a hierarchical Bayesian model that handles CNV detection and association analysis in a unified manner by integrating array CGH and gene expression data collected on the same set of subjects. We specify a measurement error model that relates the gene expression levels to the latent copy number states, which in turn are related to the observed surrogate fluorescence intensity measurements via a hidden Markov model. The model also incorporates latent selection indicators that exploit the dependencies between copy number states at adjacent chromosomal locations. Model fitting and posterior inference are carried out via Markov chain Monte Carlo stochastic search techniques. We demonstrate the performance of the method on simulated data and illustrate its application to a genomic study.
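To make the hidden Markov layer of this model concrete, here is a minimal sketch of how latent copy number states along a chromosome can be inferred from noisy intensity measurements. It assumes three states (0 = loss, 1 = normal, 2 = gain), Gaussian emissions with fixed, assumed means and standard deviation, and a known transition matrix; all names (`simulate`, `posterior_states`, `STATE_MEANS`, `SIGMA`) are hypothetical. It computes posterior state probabilities with the standard scaled forward-backward algorithm and does not implement the measurement error model for gene expression, the selection indicators, or the MCMC stochastic search described above.

```python
import numpy as np

# Hypothetical 3-state copy number model (an assumption, not the author's
# exact specification): 0 = loss, 1 = normal, 2 = gain.
STATE_MEANS = np.array([-0.5, 0.0, 0.5])  # assumed mean log2 ratio per state
SIGMA = 0.2                                # assumed emission standard deviation


def simulate(n_probes, trans, rng):
    """Simulate latent states from a Markov chain plus noisy log2 intensities."""
    states = np.empty(n_probes, dtype=int)
    states[0] = 1  # start in the normal state
    for j in range(1, n_probes):
        states[j] = rng.choice(3, p=trans[states[j - 1]])
    y = rng.normal(STATE_MEANS[states], SIGMA)
    return states, y


def gaussian_lik(y):
    """Emission likelihoods p(y_j | s_j = k), shape (n_probes, 3)."""
    z = (y[:, None] - STATE_MEANS[None, :]) / SIGMA
    return np.exp(-0.5 * z**2) / (SIGMA * np.sqrt(2 * np.pi))


def posterior_states(y, trans, init):
    """Scaled forward-backward: returns P(s_j = k | y) for every probe j."""
    n, k = len(y), len(init)
    lik = gaussian_lik(y)
    alpha = np.empty((n, k))
    scale = np.empty(n)
    # Forward pass, rescaled at each probe to avoid numerical underflow.
    alpha[0] = init * lik[0]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for j in range(1, n):
        alpha[j] = (alpha[j - 1] @ trans) * lik[j]
        scale[j] = alpha[j].sum()
        alpha[j] /= scale[j]
    # Backward pass using the same scaling constants.
    beta = np.ones((n, k))
    for j in range(n - 2, -1, -1):
        beta[j] = trans @ (lik[j + 1] * beta[j + 1]) / scale[j + 1]
    post = alpha * beta
    return post / post.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Sticky transitions encode dependence between adjacent probes.
    trans = np.array([[0.95, 0.05, 0.00],
                      [0.01, 0.98, 0.01],
                      [0.00, 0.05, 0.95]])
    states, y = simulate(200, trans, rng)
    post = posterior_states(y, trans, init=np.array([0.1, 0.8, 0.1]))
    calls = post.argmax(axis=1)
    print("agreement with true states:", (calls == states).mean())
```

Within an MCMC scheme, one common choice is to sample the state sequence jointly (forward-filtering backward-sampling) rather than take posterior argmax calls, so that uncertainty in the copy number states propagates into any downstream association parameters instead of being ignored as in the two-stage approach.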


Blackwell-Tapia Conference and Awards Ceremony