PON-Del predictor for sequence retaining protein deletions

by Haoyang Zhang, Muhammad Kabir, Mauno Vihinen

Protein deletions are frequent among both disease-causing and tolerated variants. Several mechanisms at the DNA, RNA and protein levels can lead to deletions. Many deletions are misclassified in the literature and databases, especially when the mRNA is degraded by the cellular quality-control mechanism. We developed a novel predictor for sequence-retaining protein deletions, i.e., variants that do not alter the sequence downstream of the deletion site. We collected an extensive dataset of verified protein deletions and described each with a comprehensive set of context, content, position, and gene-based features. We evaluated both statistical and deep learning algorithms and selected a gradient boosting–based approach to develop the PON-Del predictor for short sequence-retaining deletions of 1–10 amino acids. Variants are typically classified as either pathogenic or benign. However, there is always a third class, variants of uncertain significance (VUSs), which all previous methods have ignored. PON-Del is the first deletion interpretation method that includes VUSs. It provides two outputs: a binary prediction and a three-state prediction that includes VUSs. The performance of PON-Del was superior to that of previous methods. The tool is freely available at https://structure.bmc.lu.se/pon_del/.
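The definition of a sequence-retaining deletion given above can be made concrete with a small check. The following is a minimal sketch, not the PON-Del implementation: the function name `find_retaining_deletion` and the toy sequences are hypothetical, and the only details taken from the abstract are the unchanged downstream sequence and the 1–10 residue length range.

```python
from typing import Optional, Tuple

# Length range for short deletions, as stated in the abstract.
MIN_LEN, MAX_LEN = 1, 10


def find_retaining_deletion(wild_type: str, variant: str) -> Optional[Tuple[int, int]]:
    """Return (start, length) of the deleted span, with a 0-based start,
    if `variant` equals `wild_type` with one contiguous run of 1-10
    residues removed and everything downstream unchanged; else None.

    Hypothetical helper for illustration only.
    """
    length = len(wild_type) - len(variant)
    if not MIN_LEN <= length <= MAX_LEN:
        return None

    # Walk along the longest common prefix of the two sequences.
    start = 0
    while start < len(variant) and wild_type[start] == variant[start]:
        start += 1

    # The rest of the variant must match the wild type after the deleted span.
    if wild_type[start + length:] == variant[start:]:
        return start, length
    return None


# Toy sequences (invented for illustration).
print(find_retaining_deletion("MKTAYIAKQR", "MKTIAKQR"))   # "AY" removed
print(find_retaining_deletion("MKTAYIAKQR", "MKTAYLLP"))   # downstream altered
```

Because the prefix is matched greedily, a deletion inside a run of identical residues is reported at its most C-terminal possible position, which is one reasonable convention for resolving that ambiguity.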