Next-generation sequence characterization of complex genome structural variation

Can Alkan
University of Washington

Structural variation, in the broadest sense, is defined as the genomic changes among individuals that are not single nucleotide variants. Rapid computational methods are needed to comprehensively detect and characterize specific classes of structural variation using next-generation sequencing technology. We have developed a suite of tools using a new aligner, mrFAST, and algorithms focused on the characterization of structural variants that have been more difficult to assay: (i) inversions and mobile element insertions using read-pair signatures (VariationHunter), (ii) novel sequence insertions by coupling read-pair data with local sequence assembly (NovelSeq), and (iii) absolute copy number of duplicated genes using read-depth analysis coupled with singly unique nucleotide (SUN) identifiers. I will present a summary of our results from eight high-coverage human genomes regarding these particular classes of structural variation, compared to other datasets. Our results demonstrate, for the first time, the ability to assay both copy and content of complex regions of the human genome, opening these regions to disease association studies and further population and evolutionary analyses. The algorithms we have developed will provide a much-needed step towards a highly reliable and comprehensive structural variation discovery framework, which, in turn, will enable genomics researchers to better understand the variations in the genomes of newly sequenced human individuals, including patient genomes.
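
As a rough illustration of what a "read-pair signature" means in item (i), the following minimal sketch labels a single mapped read pair by comparing its orientation and mapped span against the expected library insert-size range. It is not the VariationHunter algorithm, which is not described in this abstract; the `ReadPair` type, the `classify_pair` function, the forward/reverse orientation convention, and the span thresholds are all hypothetical names and assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    """One paired-end read mapped to a single chromosome (hypothetical type)."""
    left_pos: int       # leftmost reference coordinate of the upstream read
    right_pos: int      # rightmost reference coordinate of the downstream read
    left_strand: str    # '+' or '-'
    right_strand: str   # '+' or '-'


def classify_pair(pair: ReadPair, min_span: int, max_span: int) -> str:
    """Label a read pair with the structural-variant signature it suggests.

    Assumes a library where concordant pairs map as '+' then '-' with a
    span between min_span and max_span (both thresholds are assumptions
    that would normally be estimated from the insert-size distribution).
    """
    span = pair.right_pos - pair.left_pos
    # Both reads on the same strand: one end lies inside an inverted segment.
    if pair.left_strand == pair.right_strand:
        return "inversion"
    # Reads point away from each other: often seen at tandem duplications.
    if pair.left_strand == "-" and pair.right_strand == "+":
        return "everted"
    # Correct orientation, but mapped too far apart: sequence present in
    # the reference is missing from the sample.
    if span > max_span:
        return "deletion"
    # Correct orientation, but mapped too close: the sample carries
    # sequence between the reads that the reference lacks.
    if span < min_span:
        return "insertion"
    return "concordant"


if __name__ == "__main__":
    pairs = [
        ReadPair(1000, 1400, "+", "-"),
        ReadPair(1000, 6000, "+", "-"),
        ReadPair(1000, 1100, "+", "-"),
        ReadPair(1000, 1400, "+", "+"),
    ]
    for p in pairs:
        print(p, classify_pair(p, min_span=300, max_span=500))
```

In practice a single discordant pair is weak evidence; tools in this space typically cluster many pairs that support the same breakpoint before making a call.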

Workshop I: Next-generation Sequencing Technology and Algorithms for Primary Data Analysis