Protein activity is tightly regulated in biological environments. For many proteins, regulation can be understood in terms of how their 3D structures, or conformations, change during activation. Such molecular models of regulation, however, cannot be constructed for several pharmaceutically important proteins, such as GPCRs and immune cell receptors, whose active and inactive states can be distinguished from each other only when their finite-temperature conformational ensembles are considered alongside their minimum-energy conformations. Molecular simulations have contributed enormously to the mechanistic understanding of the former category of proteins, but applying them to the latter category requires new methods that can statistically analyze differences between high-dimensional ensemble data and relate those differences to the regulation of protein activity. Partial solutions have been formulated, but none yet takes into account ensemble information from multiple states, which is necessary for understanding regulation. Here I will present the development of a new class of rigorous solutions based on “inverse” machine learning. Traditionally, machine learning is used for data classification: a classification function, or machine, is first trained on a set of data points with known group identities and then used to predict the group identities of unclassified data. In principle, conformational ensembles can also serve as training data, and the trained classification functions can, in turn, be used to predict the group identities of new conformations. We have now shown that if the classification function is constructed and trained appropriately in some Hilbert space, then the function itself can reveal causal relationships in physical space and yield molecular-level mechanistic insight into protein activity regulation.
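To make the distinction between the traditional and the "inverse" use of a classifier concrete, here is a minimal sketch. It is not the Hilbert-space method described in the abstract: it assumes a plain logistic-regression classifier, and the two "ensembles" are hypothetical synthetic data with three features per conformation, of which only feature 1 differs between states. The names `make_ensemble`, `train_logistic`, and `predict` are illustrative. The classifier is first trained on labeled conformations (the traditional use), and its learned weights are then inspected to see which feature separates the two states (a simple stand-in for the inverse use).

```python
import math
import random

def make_ensemble(rng, n, shift):
    """Hypothetical ensemble: n conformations, 3 features each.
    Only feature 1 is shifted between states (an assumption for illustration)."""
    return [[rng.gauss(0, 1), rng.gauss(shift, 1), rng.gauss(0, 1)] for _ in range(n)]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=0.5, epochs=300):
    """Full-batch gradient descent on the mean logistic loss."""
    n_feat = len(X[0])
    w = [0.0] * n_feat
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_feat
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_feat):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict(w, b, x):
    """Return 1 (active) or 0 (inactive) for one conformation."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else 0

rng = random.Random(0)  # fixed seed so the run is reproducible
inactive = make_ensemble(rng, 200, 0.0)
active = make_ensemble(rng, 200, 2.0)
X = inactive + active
y = [0] * 200 + [1] * 200

# Traditional use: train on labeled conformations, then classify.
w, b = train_logistic(X, y)
accuracy = sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(X)

# "Inverse" use (simplified): read the trained function itself to see
# which feature carries the difference between the two ensembles.
most_informative = max(range(len(w)), key=lambda j: abs(w[j]))

print("weights:", [round(wj, 2) for wj in w])
print("training accuracy:", round(accuracy, 2))
print("most discriminating feature:", most_informative)
```

In this toy setting the largest weight only indicates which feature discriminates the two states. Establishing causal relationships in physical space, as the abstract claims for the Hilbert-space formulation, requires more than inspecting linear weights.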
Workshop I: Machine Learning Meets Many-Particle Problems