Using experimental results of protein design to guide biomolecular energy-function development


by Hugh K. Haddox, Gabriel J. Rocklin, Francis C. Motta, Devin Strickland, Samer F. Halabiya, Cameron Cordray, Hahnbeom Park, Eric Klavins, David Baker, Frank DiMaio

Computational models of macromolecules have many applications in biochemistry, but physical inaccuracies limit their utility. One class of models uses energy functions rooted in classical mechanics. The standard datasets used to train these models are limited in diversity, pointing to a need for new training data. Here, we sought to explore a new paradigm for training an energy function, where the Rosetta energy function was used to design de novo proteins. Experimental results on these designs were then used to identify failure modes of design, which were subsequently used as a “guiding principle” to retrain the energy function. Specifically, we examined a diverse set of de novo protein designs experimentally tested for their ability to stably fold, identifying unstable designs that were predicted to be stable by the Rosetta energy function. Using deep mutational scanning, we identified single amino-acid mutations that rescued the stability of these designs, providing insight into common failure modes of the energy function. We identified one key failure mode, involving steric clashing in protein cores. We identified similar overpacking when using Rosetta to refine high-resolution protein crystal structures, quantified the degree of overpacking, and refit a small set of energy-function parameters to better recapitulate native-like packing. Following fitting, we largely eliminated the failure mode in the refinement task, while retaining performance on other benchmarks, resulting in an updated version of the Rosetta energy function. This work shows how learning from protein designs can guide energy-function development.
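
As a rough illustration of the first step described above (finding designs that the energy function predicts to be stable but that are unstable in experiment), the selection could be sketched as a simple filter. This is a minimal sketch and not the authors' pipeline: the `Design` record, its field names, the two cutoff values, and the example data are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Design:
    """Hypothetical record for one de novo design (field names are assumptions)."""
    name: str
    energy_per_residue: float  # computed energy; lower means predicted more stable
    stability_score: float     # experimental measurement; higher means more stable


def predicted_stable_but_unstable(designs, energy_cutoff=-3.0, stability_cutoff=1.0):
    """Return names of designs that score well computationally but fail experimentally.

    Both cutoffs are illustrative placeholders, not values from the paper.
    """
    return [
        d.name
        for d in designs
        if d.energy_per_residue <= energy_cutoff and d.stability_score < stability_cutoff
    ]


# Made-up example data, for illustration only.
designs = [
    Design("design_a", energy_per_residue=-3.4, stability_score=1.6),  # predicted and observed stable
    Design("design_b", energy_per_residue=-3.5, stability_score=0.2),  # predicted stable, observed unstable
    Design("design_c", energy_per_residue=-2.1, stability_score=0.1),  # predicted and observed unstable
]

flagged = predicted_stable_but_unstable(designs)
print(flagged)
```

Designs flagged this way are the ones where prediction and experiment disagree, which is where the abstract looks for failure modes of the energy function.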