Scaling species tree estimation methods to large datasets using NJMerge

Erin Molloy
University of Illinois at Urbana-Champaign
Computer Science

In this talk, I will present a new divide-and-conquer approach for scaling phylogeny estimation methods to large datasets that does not require supertree estimation. Instead, the approach operates by (1) dividing the species set into disjoint (instead of overlapping) subsets, (2) constructing trees on the subsets, and (3) merging the subset trees using a distance matrix computed on the full set of species. For this merger step, I will present a new method, called NJMerge, which is a polynomial-time extension of the Neighbor Joining algorithm of Saitou and Nei. I will then show the results of an extensive simulation study demonstrating NJMerge’s utility in scaling two popular species tree estimation methods. This is joint work with my advisor Tandy Warnow.
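As a rough illustration of the merger step's starting point, here is a minimal sketch of the plain Neighbor Joining algorithm of Saitou and Nei, which NJMerge extends. It is not NJMerge itself: the function name `neighbor_joining`, the dict-of-dicts distance format, and the Newick string output are all assumptions made for this example, and branch lengths are omitted. A merger in the spirit of NJMerge would presumably also have to check each candidate pair against the subset trees before joining; that check is only marked with a comment here.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Plain Neighbor Joining; returns an unrooted topology as a Newick string.

    names: list of leaf labels.
    dist:  symmetric dict-of-dicts, dist[a][b] is the distance between a and b.
    This is the unconstrained base algorithm, NOT NJMerge.
    """
    # Copy the matrix so the caller's data is left untouched.
    d = {a: dict(dist[a]) for a in names}
    nodes = list(names)

    while len(nodes) > 2:
        n = len(nodes)
        # Row sums used by the Q-criterion.
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Choose the pair minimizing Q(a, b) = (n - 2) * d(a, b) - r(a) - r(b).
        a, b = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]],
        )
        # Assumption: a constrained merger would test here whether joining
        # a and b is compatible with the subset trees. Omitted in this sketch.
        new = f"({a},{b})"
        d[new] = {}
        for c in nodes:
            if c in (a, b):
                continue
            # Standard NJ distance from the new internal node to c.
            d_new = 0.5 * (d[a][c] + d[b][c] - d[a][b])
            d[new][c] = d_new
            d[c][new] = d_new
        nodes = [c for c in nodes if c not in (a, b)] + [new]

    return f"({nodes[0]},{nodes[1]});"


# Toy additive distances for the tree ((A,B),(C,D)) with unit branch lengths.
toy = {
    "A": {"B": 2, "C": 3, "D": 3},
    "B": {"A": 2, "C": 3, "D": 3},
    "C": {"A": 3, "B": 3, "D": 2},
    "D": {"A": 3, "B": 3, "C": 2},
}
print(neighbor_joining(["A", "B", "C", "D"], toy))
```

On the toy matrix the sketch recovers the grouping of A with B and C with D. In the divide-and-conquer setting described above, the distance matrix would cover the full species set while the subset trees constrain which joins are allowed.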

Back to Science at Extreme Scales: Where Big Data Meets Large-Scale Computing