VEFill: accurate and generalizable deep mutational scanning score imputation across protein domains
- Author(s)
- Polunina, PV; Maier, W; Rubin, AF
- Journal Title
- Molecular Systems Biology
- Publication Type
- Mar 20
- Abstract
- Deep Mutational Scanning (DMS) assays can systematically assess the effects of amino acid substitutions on protein function, but many datasets have incomplete variant coverage due to technical constraints. We developed VEFill (Variant Effect Fill), a gradient boosting model for imputing missing DMS scores across protein domains. Trained on the Human Domainome 1, VEFill integrates ESM-1v sequence embeddings, evolutionary conservation (EVE scores), amino acid substitution matrices, and physicochemical descriptors. The model achieved robust predictive performance (Pearson r = 0.80) and generalized reliably to unseen proteins in stability-based datasets, while showing weaker performance on activity-based assays. Per-protein models confirmed VEFill's effectiveness under limited-data conditions, and a reduced two-feature version performed comparably to the full model, suggesting an efficient alternative. Across multiple benchmarking settings, VEFill consistently outperformed baselines once ≥20% of experimental measurements were available. However, true zero-shot prediction without positional context remains challenging, particularly for functionally complex proteins. Overall, VEFill offers an interpretable, scalable framework for DMS score imputation and enables systematic mutation prioritization, including the design of sparse experimental libraries for variant effect studies.
- Publisher
- Springer Nature
- Keywords
- DMS Score Imputation; Deep Mutational Scanning; Machine Learning; Protein Stability; Variant Effect Prediction
- Research Division(s)
- Bioinformatics and Computational Biology
- Publisher's Version
- https://doi.org/10.1038/s44320-026-00203-y
- Open Access at Publisher's Site
- https://doi.org/10.1038/s44320-026-00203-y
- Terms of Use/Rights Notice
- Refer to copyright notice on published article.
Creation Date: 2026-03-24 02:09:49
Last Modified: 2026-03-24 02:16:41