Enabling transcript quantification in non-model organisms with RNA-Seq and generative probabilistic models

Colin Dewey
University of Wisconsin-Madison
Biostatistics & Medical Informatics and Computer Science

RNA-Seq is a powerful technology for analyzing transcriptomes that is predicted to replace microarrays. A key feature of this technology is that it requires no prior knowledge of transcript or genome sequences. RNA-Seq therefore allows for transcriptome analyses in non-model organisms, for which we typically have limited sequence data. Our group focuses on developing computational and statistical methodology for this exciting application of RNA-Seq. We have developed two methods, RSEM and PSGInfer, which address two major challenges in analyzing RNA-Seq data in the absence of a reference genome. The first method, RSEM, provides a principled approach to handling multireads, that is, reads that map to multiple transcripts, which are common when using de novo transcriptome assemblies. The second method, PSGInfer, addresses the problem of modeling alternatively spliced genes, for which full-length transcripts are difficult to reconstruct from RNA-Seq data alone. Both methods are based on efficient inference algorithms for generative probabilistic models of the RNA-Seq process. We show how the combination of RNA-Seq with these methods is enabling insights into how regeneration occurs in the axolotl salamander, whose genome is extremely large and currently unsequenced.
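
To make the multiread idea concrete, the sketch below shows one common way a generative model can allocate ambiguous reads: an expectation-maximization (EM) loop that splits each read fractionally among the transcripts it maps to, in proportion to the current abundance estimates. It is a deliberately simplified illustration and not RSEM's actual model or code. It assumes every alignment is equally likely a priori, and it ignores transcript length, sequencing errors, and quality scores. The function name `em_abundances` and the input format `read_compat` are hypothetical, chosen only for this example.

```python
def em_abundances(read_compat, n_transcripts, n_iters=200):
    """Estimate the fraction of reads originating from each transcript.

    read_compat: list of lists; read_compat[i] holds the indices of the
    transcripts that read i maps to (hypothetical input format).
    Simplifying assumption: all alignments of a read are equally good,
    and transcript lengths are ignored.
    """
    n_reads = len(read_compat)
    # Start from a uniform guess over transcripts.
    theta = [1.0 / n_transcripts] * n_transcripts
    for _ in range(n_iters):
        expected_counts = [0.0] * n_transcripts
        # E-step: split each read among its candidate transcripts in
        # proportion to the current abundance estimates.
        for hits in read_compat:
            total = sum(theta[t] for t in hits)
            for t in hits:
                expected_counts[t] += theta[t] / total
        # M-step: re-estimate abundances from the fractional counts.
        theta = [c / n_reads for c in expected_counts]
    return theta


# Toy data: 3 reads unique to transcript 0, 1 read unique to
# transcript 1, and 4 multireads that map to both.
reads = [[0]] * 3 + [[1]] + [[0, 1]] * 4
theta = em_abundances(reads, n_transcripts=2)
print(theta)
```

In this toy case the uniquely mapping reads favor transcript 0 by three to one, so the EM loop assigns the multireads in roughly the same ratio and the estimates settle near 0.75 and 0.25. Discarding the multireads would leave only half of the data, and splitting them evenly would pull the estimates toward 0.5.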

Workshop II: Transcriptomics and Epigenomics