Proteins that are most essential for the functioning and viability of a bacterial cell have been shown to interact with a larger number of other cell components. Thus, by identifying the most connected proteins (hubs) in protein interaction networks (PINs), one may discover prospective drug targets for combating emergent and drug-resistant pathogens such as methicillin-resistant Staphylococcus aureus 252 (MRSA). Hub proteins are attractive drug targets because of their essentiality, their irreplaceable position in the PIN and their lower mutation rate, which can help counter bacterial resistance.
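Identifying hubs amounts to ranking proteins by how many distinct interaction partners they have in the PIN. The sketch below assumes the network is available as an undirected edge list of protein identifiers and that a hub is any protein whose degree meets a chosen cutoff. The abstract does not state how hubs were defined, so `min_degree`, the helper names and the protein identifiers in the usage example are all hypothetical.

```python
from collections import Counter


def degree_counts(edges):
    """Count distinct interaction partners per protein in an undirected PIN."""
    seen = set()
    degree = Counter()
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        key = (a, b) if a < b else (b, a)
        if key in seen:
            continue  # count each undirected interaction only once
        seen.add(key)
        degree[a] += 1
        degree[b] += 1
    return degree


def find_hubs(edges, min_degree):
    """Return proteins with at least `min_degree` partners (hypothetical cutoff), most connected first."""
    degree = degree_counts(edges)
    hubs = [p for p, d in degree.items() if d >= min_degree]
    return sorted(hubs, key=lambda p: (-degree[p], p))
```

For example, with the toy edges `("A","B"), ("A","C"), ("A","D"), ("B","C"), ("C","A")`, protein `A` has three partners and `find_hubs(edges, 3)` returns `["A"]`; the duplicate `("C","A")` interaction is counted once.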
However, finding or predicting such hub proteins remains challenging: the corresponding experiments are very costly, and traditional bioinformatics approaches generally fail to predict PIN data because the existing datasets largely disagree with one another.
We therefore used various structural and physicochemical features of proteins, analogous to the descriptors of traditional QSAR modeling, to predict highly connected proteins. Using our own in-house PIN for the MRSA cell, we trained a boosted decision-tree classifier on 75 physicochemical QSAR descriptors computed for every protein in the interaction network.
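The abstract does not specify which boosting algorithm was used, so the following is only a minimal stand-in: discrete AdaBoost over depth-1 decision trees (stumps), written in plain Python so that it runs without dependencies. Each protein is a vector of descriptor values, labelled `+1` (hub) or `-1` (non-hub). The function names, the two-descriptor toy data and `n_rounds` are hypothetical; a real model over 75 descriptors would more likely rely on a library implementation such as scikit-learn's `GradientBoostingClassifier`.

```python
import math


def fit_stump(X, y, w):
    """Find the single-descriptor threshold split with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted(set(row[j] for row in X)):
            for polarity in (1, -1):
                err = sum(wi for row, label, wi in zip(X, y, w)
                          if (polarity if row[j] >= thr else -polarity) != label)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best


def stump_predict(row, j, thr, polarity):
    return polarity if row[j] >= thr else -polarity


def adaboost_fit(X, y, n_rounds=10):
    """Train discrete AdaBoost on labels in {+1, -1}; returns (alpha, j, thr, polarity) stumps."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform weights over proteins
    ensemble = []
    for _ in range(n_rounds):
        err, j, thr, pol = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, j, thr, pol))
        # Up-weight misclassified proteins, down-weight correctly classified ones
        w = [wi * math.exp(-alpha * label * stump_predict(row, j, thr, pol))
             for row, label, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
        if err <= 1e-10:
            break  # a perfect stump was found; further rounds add nothing
    return ensemble


def adaboost_predict(ensemble, row):
    """Weighted vote of all stumps: +1 predicts a hub, -1 a non-hub."""
    score = sum(alpha * stump_predict(row, j, thr, pol) for alpha, j, thr, pol in ensemble)
    return 1 if score >= 0 else -1


def accuracy(ensemble, X, y):
    correct = sum(adaboost_predict(ensemble, row) == label for row, label in zip(X, y))
    return correct / len(y)
```

Validation accuracy, as reported in the study, would be computed by calling `accuracy` on proteins held out from `adaboost_fit`.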
The resulting QSAR model achieved a high prediction accuracy of 80% on the validation set and was then used to predict additional hubs in the rest of the MRSA proteome. The predicted hubs were evaluated experimentally, and 55% of them were confirmed as high interactors, which corresponds to a more than 5-fold enrichment of potential hub proteins provided by the QSAR model.
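Assuming fold enrichment is defined as the fraction of predicted hubs that were confirmed divided by the fraction of hubs expected by chance (the background rate), the reported figures imply a background hub rate below 0.55 / 5 = 0.11. The short sketch below makes that arithmetic explicit; the 0.10 background rate in the usage note is a hypothetical value, not a number from the study.

```python
def enrichment_factor(hit_rate, background_rate):
    """Fold enrichment = fraction of predicted hubs confirmed / fraction of hubs expected by chance."""
    if not 0 < background_rate <= 1:
        raise ValueError("background_rate must be in (0, 1]")
    return hit_rate / background_rate


# A 55% confirmation rate exceeds 5-fold enrichment only if the
# background hub rate is below this value (0.55 / 5 = 0.11).
implied_max_background = 0.55 / 5
```

With a hypothetical background rate of 0.10, `enrichment_factor(0.55, 0.10)` gives 5.5, consistent with the ">5-fold" claim.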