Historically, the relationship between structure and function in biology has been elucidated iteratively through experimentation and interpretation: experiments are performed, scientists interpret the results, and new experiments are then designed to interrogate the questions that remain open. There is growing hope that machine learning, especially supervised learning, will enhance scientists' ability to interpret results and thereby accelerate the pace of scientific discovery. Here, we argue that good experiment selection is a critical partner to supervised learning in this goal. Good selection can make our understanding robust to flaws in supervised learning predictions, while poor selection can do the opposite. After surveying work on Bayesian sequential experimental design and Bayesian active learning for building interpretable physical models, we focus on iterative experimentation for optimizing peptides. In this setting, we show how ideas from Bayesian optimization can make the search robust to prediction errors when using an interpretable supervised learning model. This approach can dramatically improve the effectiveness of machine learning in materials and drug discovery tasks. We show how these ideas were used in a recent collaboration to discover short peptides with specific enzymatic activity, which led to the creation of an orthogonal peptide labeling system.
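The claim that experiment selection can buffer against prediction errors can be illustrated with a toy example. The sketch below is not the method used in the collaboration; it assumes a hypothetical pool of peptides grouped by sequence motif, where the model's error is shared within a motif (if the model is wrong about a motif, every peptide carrying it fails). All names, motifs, and probabilities are invented for illustration. Ranking peptides by individual predicted hit probability spends the whole batch on one motif, whereas greedily maximizing the probability that at least one peptide in the batch is a hit spreads the batch across motifs.

```python
from math import prod

# Hypothetical candidate pool: (name, motif, P(hit | motif is truly active)).
# All numbers are illustrative assumptions, not data from the study.
CANDIDATES = [
    ("A1", "A", 0.9), ("A2", "A", 0.9), ("A3", "A", 0.9),
    ("B1", "B", 0.9),
    ("C1", "C", 0.9),
]
# Hypothetical posterior probability that each motif is truly active.
MOTIF_PROB = {"A": 0.6, "B": 0.5, "C": 0.5}


def prob_at_least_one_hit(batch):
    """P(at least one hit in the batch) when errors are shared within a motif.

    A motif is active with probability MOTIF_PROB[m]; given that it is,
    each of its candidates hits independently with its own probability.
    """
    p_none = 1.0
    for motif, q in MOTIF_PROB.items():
        # Probability that every batch member with this motif misses
        # (equals 1 when the batch contains none of them).
        miss_all = prod(1 - p for _, m, p in batch if m == motif)
        # Either the motif is inactive, or it is active and all its members miss.
        p_none *= (1 - q) + q * miss_all
    return 1 - p_none


def top_k_by_marginal(k):
    """Naive choice: the k candidates with the highest individual hit probability."""
    ranked = sorted(CANDIDATES, key=lambda c: MOTIF_PROB[c[1]] * c[2], reverse=True)
    return ranked[:k]


def greedy_batch(k):
    """Greedily add the candidate that most increases P(at least one hit)."""
    batch = []
    for _ in range(k):
        remaining = [c for c in CANDIDATES if c not in batch]
        best = max(remaining, key=lambda c: prob_at_least_one_hit(batch + [c]))
        batch.append(best)
    return batch


naive = top_k_by_marginal(3)
diverse = greedy_batch(3)
print([c[0] for c in naive], round(prob_at_least_one_hit(naive), 3))      # ['A1', 'A2', 'A3'] 0.599
print([c[0] for c in diverse], round(prob_at_least_one_hit(diverse), 3))  # ['A1', 'B1', 'C1'] 0.861
```

In this toy setting the naive batch succeeds only if the model is right about motif A, while the diversified batch hedges against that single point of failure, which is the kind of robustness to prediction errors the abstract refers to.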
This is joint work with L. Tallorin, J. Wang, W.E. Kim, S. Sahu, N.M. Kosa, P. Yang, M. Thompson, M.K. Gilson, N.C. Gianneschi and M.D. Burkart.