Characterizing transcriptomes from high-throughput sequencing data: from yeast to mammals

Moran Yassour
Broad Institute

Experimentally defining the complete transcriptome of a eukaryotic organism has traditionally been a challenging task, but advances in RNA sequencing (RNAseq) offer new and powerful approaches to the study of transcriptomes. Most studies have used RNAseq to quantify the expression levels of known genes, identify splice isoforms and refine gene boundaries. However, many studies depend on an existing annotation or a sequenced genome, which limits their ability to discover novel transcripts and to study diverse organisms. I will present a series of studies on the development of technologies and tools for RNAseq analysis and their application in organisms ranging from yeast to mouse. I will focus on different approaches I have developed for transcriptome reconstruction, from mapping-first methods that rely only on an available genome sequence to Trinity, a method for de novo assembly of full-length transcripts that does not require a sequenced genome yet achieves a sensitivity similar to that of methods that rely on genome alignments. In addition, I will describe systematic approaches to assess the quality of RNAseq experiments for annotation and expression quantification, and how we used them in a comparative study of library construction methods for strand-specific RNAseq. Finally, I will show how these approaches scale to organisms from yeasts to vertebrates, helping in the genome annotation of newly discovered organisms from the Schizosaccharomyces clade; the identification of extensive regulated long antisense transcripts that are conserved across yeast species; transcriptome analysis in the whitefly Bemisia tabaci, for which no genome sequence is available; and the discovery of alternatively spliced isoforms in mouse.

Workshop I: Next-generation Sequencing Technology and Algorithms for Primary Data Analysis