D-LIM: A neural network for interpretable gene–gene interactions


by Shuhui Wang, Alexandre Allauzen, Philippe Nghe, Vaitea Opuu

Recent advances in gene editing can produce large genotype–fitness maps for targeted genes, yet predicting the combined effects of mutations in different genes remains challenging. Biochemical models require knowledge of the underlying parameters and interactions, whereas machine learning methods typically lack interpretability because they do not link model parameters to biological quantities. We introduce D-LIM, a neural network that infers low-dimensional fitness landscapes directly from mutation–fitness data. The distinctive feature of D-LIM is its assumption that genes act through independent, gene-specific molecular phenotypes whose nonlinear interactions determine fitness. When this assumption holds, the model yields accurate predictions and interpretable effective phenotypes. Conversely, when it fails, this reveals that a low-dimensional model is insufficient. Applied to deep mutational scanning of metabolic pathways, protein–protein interactions, and yeast environmental adaptation, D-LIM achieves state-of-the-art predictive accuracy. The inferred phenotype–fitness landscapes reveal whether epistatic interactions can be captured by a low-dimensional continuous model and identify potential trade-offs. Moreover, D-LIM estimates the effects of mutations on the effective phenotypes, enabling weak extrapolation beyond the training domain. D-LIM demonstrates how simple structural constraints in a neural network can aid inference and hypothesis generation in biology.
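To make the structural constraint concrete, here is a minimal forward-pass sketch written in NumPy. It is not the authors' implementation. It assumes one scalar effective phenotype per gene, stored as a lookup table indexed by mutation, and a small tanh MLP that maps the vector of per-gene phenotypes to fitness. The class name `DLIMSketch`, the hidden size, the activation, and the initialization are all hypothetical choices for illustration, and training (for example, gradient descent on squared error against measured fitness) is omitted.

```python
import numpy as np


class DLIMSketch:
    """Hypothetical D-LIM-style structure (illustrative, not the authors' code).

    Assumption: each gene g has its own table phi[g] giving one scalar effective
    phenotype per mutation; fitness depends on a genotype only through these
    per-gene phenotypes, combined nonlinearly by a small MLP.
    """

    def __init__(self, n_mutations_per_gene, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        # Independent, gene-specific phenotype tables (one scalar per mutation).
        self.phi = [rng.normal(size=n) for n in n_mutations_per_gene]
        n_genes = len(n_mutations_per_gene)
        # Assumed two-layer MLP: phenotypes -> hidden (tanh) -> scalar fitness.
        self.W1 = rng.normal(scale=0.5, size=(n_genes, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(scale=0.5, size=(hidden, 1))
        self.b2 = np.zeros(1)

    def phenotypes(self, genotypes):
        """Map (batch, n_genes) mutation indices to (batch, n_genes) phenotypes."""
        genotypes = np.asarray(genotypes)
        # Gene g's phenotype is looked up only from gene g's own mutation index.
        return np.stack(
            [self.phi[g][genotypes[:, g]] for g in range(len(self.phi))], axis=1
        )

    def fitness(self, genotypes):
        """Combine per-gene phenotypes nonlinearly into one fitness value each."""
        z = self.phenotypes(genotypes)
        h = np.tanh(z @ self.W1 + self.b1)
        return (h @ self.W2 + self.b2)[:, 0]


# Usage: two genes with 5 and 7 candidate mutations; three double mutants.
model = DLIMSketch([5, 7])
geno = np.array([[0, 0], [3, 6], [4, 2]])
print(model.phenotypes(geno).shape)  # (3, 2)
print(model.fitness(geno).shape)  # (3,)
```

Because every gene passes through a single scalar bottleneck, any epistasis must come from the nonlinear MLP acting on those phenotypes. This bottleneck is what would make the learned per-gene values readable as effective phenotypes, and a poor fit would suggest that such a low-dimensional description is insufficient.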