Due to alternative splicing, a gene in a eukaryotic species may be transcribed into several different mRNA transcripts, called isoforms. Detecting isoforms on a genomic scale and measuring their abundance levels in a cell is a central problem in transcriptomics, with broad applications in biology and medicine. Traditional experimental methods for this purpose are time-consuming and costly.
Although deep sequencing technologies such as RNA-Seq offer a potentially effective way to address this problem, inferring isoforms from the tens of millions of short sequence reads produced by RNA-Seq has remained computationally challenging. In this talk, I will first describe the algorithmic framework of IsoInfer, a method based on mathematical programming for inferring isoforms from RNA-Seq data. The design of IsoInfer combines optimization techniques (e.g., convex quadratic programming) with statistical concepts (e.g., maximum likelihood estimation) in an interesting way. Next, I will introduce IsoLasso, our recent improvement of IsoInfer. The new method incorporates the well-known LASSO regression technique into the quadratic program of IsoInfer and tends to produce isoform solutions that are both accurate and sparse. Our extensive experiments on both simulated and real RNA-Seq data demonstrate that this addition helps IsoLasso filter out lowly expressed isoforms (which are often noisy) and achieve simultaneously higher sensitivity and precision than other state-of-the-art transcriptome assembly tools.
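To illustrate why adding a LASSO term to a quadratic program yields sparser isoform abundances, here is a minimal toy sketch. It is not the IsoInfer or IsoLasso formulation. The design matrix (isoform-to-segment coverage with unit segment lengths), the squared-error objective, the hypothetical function name `isoform_lasso`, and the cyclic coordinate-descent solver are all assumptions made for illustration. The sketch minimizes ||Ax - b||^2 + lam * sum(x) subject to x >= 0. With lam = 0 this reduces to plain nonnegative least squares, which is a convex quadratic program.

```python
from typing import List


def isoform_lasso(A: List[List[float]], b: List[float], lam: float,
                  n_iter: int = 1000) -> List[float]:
    """Toy nonnegative LASSO: minimize ||A x - b||^2 + lam * sum(x), x >= 0.

    Assumed (hypothetical) model: A[j][i] is the expected read contribution of
    candidate isoform i to exon segment j per unit abundance, and b[j] is the
    observed read count on segment j. Solved by cyclic coordinate descent.
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    r = list(b)  # residual b - A x (x starts at zero)
    for _ in range(n_iter):
        for i in range(n):
            col = [A[j][i] for j in range(m)]
            norm2 = sum(c * c for c in col)
            if norm2 == 0.0:
                continue  # isoform covers no segment; leave it at zero
            # Correlation of column i with the residual that excludes isoform i.
            rho = sum(col[j] * (r[j] + col[j] * x[i]) for j in range(m))
            # Exact 1-D minimizer of the penalized quadratic, clipped at 0.
            new = max(0.0, (rho - lam / 2.0) / norm2)
            delta = new - x[i]
            if delta != 0.0:
                for j in range(m):
                    r[j] -= col[j] * delta  # keep residual in sync with x
                x[i] = new
    return x


# Hypothetical toy gene: 3 exon segments (rows), 3 candidate isoforms (columns).
# Isoform 0 uses E1,E2,E3; isoform 1 skips E2; isoform 2 covers only E3.
A = [[1.0, 1.0, 0.0],
     [1.0, 0.0, 0.0],
     [1.0, 1.0, 1.0]]
b = [15.0, 10.0, 16.0]  # made-up counts with one extra read on E3

print([round(v, 3) for v in isoform_lasso(A, b, lam=0.0)])
print([round(v, 3) for v in isoform_lasso(A, b, lam=4.0)])
```

In this made-up example, the unpenalized fit (lam = 0) converges to roughly [10, 5, 1]. It assigns a small abundance to the third candidate to absorb the single extra read on E3. With lam = 4, the L1 penalty drives that candidate to exactly zero, giving roughly [10, 4.5, 0]. This mirrors, in a very simplified setting, the filtering of lowly expressed isoforms described above. Coordinate descent was chosen only because each one-dimensional subproblem has a closed-form solution. Any convex QP solver would reach the same optimum.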
This is joint work with Jianxing Feng (Tsinghua/Tongji U) and Wei Li (UCR).