IPAM Institute for Pure and Applied Mathematics UCLA NSF

Combining multiple genomic data sources into reliable predictions of protein complexes

Ronald Jansen
Memorial Sloan-Kettering Cancer Center
Computational Biology Center

Genome-wide screens for protein-protein interactions can magnify the inaccuracies of interaction experiments because the protein pairs that do not interact vastly outnumber those that do. One strategy to address this problem is to look at the combined evidence in multiple interaction datasets. I will illustrate Bayesian network methods for integrating multiple interaction datasets into a single interaction map that best replicates the observed physical contacts in structures of known protein complexes. Similar methods can be used to integrate genome-wide experimental interaction datasets into a unified map of protein complexes, with assignments of probabilities rather than binary values (‘existing’ or ‘not existing’) to each protein pair. In addition to the integration of existing interaction data, we can achieve reliable de novo predictions of protein complexes by combining genomic features that are only weakly associated with interaction (such as mRNA co-expression, co-essentiality and co-localization). We observe that such de novo predictions are more accurate than existing high-throughput experimental datasets at comparable levels of sensitivity. We validated some predictions with new TAP-tagging experiments.
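To make the idea of combining weakly predictive features concrete, here is a minimal sketch using a naive Bayes model, the simplest kind of Bayesian network. It assumes the features are conditionally independent given whether a pair interacts, so the posterior odds are the prior odds multiplied by one likelihood ratio per feature. The function names, the prior odds, and the likelihood ratios are all hypothetical values chosen for illustration; they are not taken from the talk, and the speaker's actual model may differ.

```python
from math import prod


def posterior_odds(prior_odds, likelihood_ratios):
    """Combine evidence under a naive Bayes (conditional independence) assumption.

    posterior odds = prior odds * product of per-feature likelihood ratios,
    where each ratio is P(feature value | interacting) / P(feature value | not interacting).
    """
    return prior_odds * prod(likelihood_ratios)


def odds_to_probability(odds):
    """Convert odds O into a probability O / (1 + O)."""
    return odds / (1.0 + odds)


# Hypothetical numbers, for illustration only.
# Non-interacting pairs vastly outnumber interacting ones, so the prior odds are small.
EXAMPLE_PRIOR_ODDS = 1 / 600
EXAMPLE_LIKELIHOOD_RATIOS = {
    "mRNA co-expression": 5.0,
    "co-essentiality": 4.0,
    "co-localization": 6.0,
}

if __name__ == "__main__":
    odds = posterior_odds(EXAMPLE_PRIOR_ODDS, EXAMPLE_LIKELIHOOD_RATIOS.values())
    print(f"posterior odds: {odds:.3f}")
    print(f"posterior probability: {odds_to_probability(odds):.3f}")
```

In this sketch no single feature comes close to overcoming the small prior, but their product raises the odds by more than two orders of magnitude. The result is a probability for each protein pair rather than a binary call, in line with the probabilistic assignments described in the abstract.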
