An Information-theoretic Approach to Methylation Data Analysis

John Goutsias
Johns Hopkins University

We present an information-theoretic approach to analyzing methylation data obtained by whole-genome bisulphite sequencing (WGBS). We quantify stochasticity in DNA methylation using a normalized version of Shannon’s entropy and show that this measure can be used to identify the boundaries of topologically associated domains (TADs). TADs are highly conserved structural features of the genome whose loci interact frequently with each other, whereas interactions between loci of adjacent domains are much less frequent. We also discuss the Jensen-Shannon distance as a measure of epigenetic discordance among biological samples and demonstrate its use in delineating lineages and identifying developmentally critical genes. By viewing methylation maintenance as a communications system, we introduce the notion of a methylation channel and discuss its information-theoretic properties. Finally, we introduce a sensitivity index that quantifies the rate at which environmental or external perturbations influence methylation stochasticity along the genome, and show that genomic loci with high sensitivity are those most affected by such perturbations.
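
As a rough illustration of the two quantities named above, the following Python sketch computes a normalized Shannon entropy and a Jensen-Shannon distance for discrete probability distributions. It is not taken from the talk: the function names, the representation of a distribution as a list of probabilities over methylation patterns, and the choice to normalize by the base-2 logarithm of the number of patterns are all assumptions made for illustration.

```python
import math


def normalized_entropy(p):
    """Shannon entropy (in bits) of p, divided by its maximum possible value.

    Assumption: p is a list of probabilities over a finite set of
    methylation patterns, and the maximum is log2(len(p)), so the
    result lies in [0, 1].
    """
    n = len(p)
    if n < 2:
        return 0.0  # a single outcome carries no uncertainty
    h = -sum(x * math.log2(x) for x in p if x > 0)
    return h / math.log2(n)


def jensen_shannon_distance(p, q):
    """Square root of the Jensen-Shannon divergence (base 2) between p and q.

    Assumption: p and q are probability lists over the same patterns,
    in the same order. With base-2 logarithms the result lies in [0, 1].
    """
    m = [(a + b) / 2 for a, b in zip(p, q)]  # mixture distribution

    def kl(u, v):
        # Kullback-Leibler divergence; terms with u_i = 0 contribute 0.
        return sum(a * math.log2(a / b) for a, b in zip(u, v) if a > 0)

    jsd = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(jsd, 0.0))  # guard against tiny negative rounding
```

Under these assumptions, a uniform distribution over patterns gives a normalized entropy of 1 (fully stochastic), a distribution concentrated on one pattern gives 0 (fully ordered), and two samples with identical distributions have a Jensen-Shannon distance of 0.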


Back to Regulatory and Epigenetic Stochasticity in Development and Disease