Combining multiple genomic data sources into reliable predictions of protein complexes
Ronald Jansen, Memorial Sloan-Kettering Cancer Center, Computational Biology Center
Genome-wide screens for protein-protein interactions can magnify the inaccuracies of interaction experiments because the protein pairs that do not interact vastly outnumber those that do. One strategy to address this problem is to look at the combined evidence in multiple interaction datasets. I will illustrate Bayesian network methods for integrating multiple interaction datasets into a single interaction map that best replicates the observed physical contacts in structures of known protein complexes. Similar methods can be used to integrate genome-wide experimental interaction datasets into a unified map of protein complexes, with assignments of probabilities rather than binary values (‘existing’ or ‘not existing’) to each protein pair. In addition to the integration of existing interaction data, we can achieve reliable de novo predictions of protein complexes by combining genomic features that are only weakly associated with interaction (such as mRNA co-expression, co-essentiality and co-localization). We observe that such de novo predictions are more accurate than existing high-throughput experimental datasets at comparable levels of sensitivity. We validated some predictions with new TAP-tagging experiments.
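
To give a feel for how several weakly predictive features can add up to a confident probability, here is a minimal sketch of a naive Bayes combination, the simplest kind of Bayesian network, in which features are assumed conditionally independent. It is not necessarily the model used in this work. The function name, the prior, and the likelihood ratios are hypothetical values chosen for illustration, not numbers from the study.

```python
from math import prod


def posterior_probability(prior, likelihood_ratios):
    """Combine independent evidence for one protein pair (naive Bayes).

    prior: assumed prior probability that a random protein pair interacts.
    likelihood_ratios: one value per feature, each equal to
        P(feature value | interacting) / P(feature value | non-interacting).
    Returns the posterior probability that the pair interacts.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    # Under conditional independence, the likelihood ratios multiply.
    posterior_odds = prior_odds * prod(likelihood_ratios)
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # Hypothetical numbers: a low prior reflects that non-interacting pairs
    # vastly outnumber interacting ones.
    hypothetical_prior = 1 / 600
    hypothetical_ratios = {
        "mRNA co-expression": 5.0,
        "co-essentiality": 3.0,
        "co-localization": 4.0,
    }
    p = posterior_probability(hypothetical_prior, hypothetical_ratios.values())
    print(f"posterior probability of interaction: {p:.3f}")
```

With these made-up inputs, no single feature moves the pair far from the low prior, but the product of the three ratios raises the odds sixtyfold. The output is a probability per pair rather than a binary call, so a map can be thresholded at whatever trade-off between sensitivity and accuracy is wanted.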