Mathematical and Computational Challenges in Reconstructing Evolution

Tandy Warnow
University of Illinois at Urbana-Champaign
Computer Sciences

The estimation of species trees from multi-locus datasets is a basic step in many biological research projects. However, heterogeneity between the loci resulting from processes such as incomplete lineage sorting and horizontal gene transfer makes standard approaches (such as concatenation using maximum likelihood) statistically inconsistent. In this talk, I will present state-of-the-art methods for species tree estimation from multi-locus datasets when gene trees can differ from the species tree due to incomplete lineage sorting. I will also discuss the current understanding of statistical consistency in two settings: when the sequence length per gene and the number of genes both go to infinity (essentially assuming perfect gene trees), and when the sequence length per gene is bounded but the number of genes goes to infinity. I will then present state-of-the-art methods for large-scale species tree estimation, along with new techniques for improving the scalability of these methods to large datasets. Much of this talk covers unpublished research, joint with Erin Molloy (Illinois), Mike Nute (Illinois), and Sebastien Roch (Wisconsin).

Back to Workshop III: HPC for Computationally and Data-Intensive Problems