Annotation of Protein Misfunction: SNPs, Databases and Disease

John Moult
University of Maryland

Many human inherited disease traits are related to single nucleotide polymorphisms (SNPs) in the genes of individuals, but much remains to be learned about the mechanisms that link SNPs to disease. We have developed structure- and sequence-based models of the impact of SNPs on protein function in vivo. The models have been applied to a set of single nucleotide variants known to cause monogenic disease, and to a set found in the human population and not known to be associated with disease. The analysis yields two surprising findings. First, most monogenic disease-causing variants act by mildly destabilizing protein structure, which suggests that most proteins are only just sufficiently stable to operate effectively in vivo. Second, about one third of the SNPs found in the population and not known to be associated with disease appear to seriously impair function at the molecular level. Examination of a set of these cases suggests a variety of mechanisms that make the larger-scale system robust with respect to component defects. Some mechanisms, such as simple feedback loops and component redundancy, are familiar from systems engineering. Others, such as ‘fuzzy switches’, in which control is distributed over multiple connections in the protein network, are novel. Network-level robustness analysis provides insight into the complex trait properties of common diseases, in which disease susceptibility is a subtle property of a combination of SNPs and environmental factors. It also allows the identification of those SNPs most likely to contribute to susceptibility to complex diseases.

More generally, this study illustrates the utility of combining information from multiple databases with more traditional computational biology and structure analysis methods. Integration of the data is facilitated by a ‘knowledge net’ interface, intended to allow rapid assessment of the known relationships between proteins relevant to a particular disease, as well as access to molecular-level information and to the supporting literature.


