Revolutionizing CRISPR technology with artificial intelligence

Introduction

Early genome engineering was developed on the basis of DNA-binding proteins such as zinc-finger proteins1,2 and transcription activator-like effector arrays3,4,5. Protein-based genome editing technologies have worked well so far with their unique advantages, but their design often involves a time-consuming process of protein engineering6,7,8. The later discovery of the CRISPR–Cas system9,10,11,12 brought a fundamental turn by providing a simplified and versatile method for targeted gene perturbation13. Among the various CRISPR applications, the CRISPR–Cas9 system has become the most widely used, using a guide RNA (gRNA) to navigate the Cas9 protein to a specific genomic locus for precise DNA cleavage. This breakthrough has expanded the possibilities of genetic engineering and positioned CRISPR as a key tool across multiple disciplines, including genetics, molecular biology and biomedicine.

As CRISPR technology evolves, its applications extend beyond the original CRISPR–Cas9 system to various platforms for specific genetic changes14,15,16. There are currently three CRISPR-derived DNA editing technologies that have entered clinical trials17: Cas nucleases, base editors and prime editors (Fig. 1). CRISPR nucleases, such as Cas9 (refs. 9,10,11,12) and its engineered variants, induce double-strand breaks (DSBs) that trigger DNA repair, resulting in insertions or deletions (indels). However, unintended indels caused by inaccurate nuclease activity of CRISPR pose risks of permanent damage, which limits precision in applications. To overcome this concern, alternative technologies have emerged, incorporating effector domains that enable desired changes without the direct action of Cas nucleases. Base editing, for example, combines deaminases with catalytically defective Cas proteins to precisely convert C-to-T or A-to-G nucleotides18,19,20.
Prime editing, a more refined technique, supports a wider range of genetic modifications—including insertions, deletions and all other types of substitution—by fusing an engineered reverse transcriptase and an extended gRNA21 (Fig. 1). In addition to DNA editing technologies, other CRISPR-based tools have been devised that utilize RNA-modifying enzymes for RNA editing, transcriptional effectors for gene regulation and epigenomic modifiers for epigenome editing14.

Fig. 1: CRISPR-based genome editing tools.
CRISPR-based genome editing tools include CRISPR nucleases, base editors and prime editors. Cas nucleases are composed of the Cas9 protein and single guide RNA (sgRNA), which induce DSBs. CRISPR nucleases can be used for insertions, deletions and point mutations, as well as enabling chromosomal translocations. Base editors are composed of a catalytically impaired Cas9, sgRNA and a deaminase. Base editors primarily mediate C-to-T or A-to-G conversions without generating DSBs. Prime editors consist of a Cas9 nickase, an engineered reverse transcriptase and pegRNA, which contains a PBS and reverse transcription (RT) template encoding the desired edit. Prime editors are capable of small insertions and deletions and can facilitate all types of point mutation. The editing efficiency of each tool can be enhanced through AI-driven approaches.

Despite this remarkable progress, challenges still exist with CRISPR technology: variable efficiencies across cell types, dependence on sequence context for on-target activities and frequent off-target activities throughout the genome. To address these issues, advanced computational approaches are increasingly being used to optimize CRISPR systems22. Artificial intelligence (AI), particularly machine learning (ML), has become indispensable in refining gRNA design, predicting off-target effects and improving editing efficiency through the analysis of large datasets from diverse experiments.
AI-driven models are effective in enhancing current CRISPR technologies, as well as guiding the development of cutting-edge tools. In this Review, we will explore how AI is revolutionizing CRISPR technology, focusing on improving CRISPR nuclease, base editing and prime editing for more precise and scalable genome-editing applications.

AI-driven innovation in genome engineering

AI enables computers to perform tasks that typically require human intelligence. A subset of AI, ML, involves algorithms that learn from data to identify patterns and make predictions or decisions23. It encompasses various approaches (Table 1), including supervised learning, unsupervised learning and reinforcement learning. Deep learning (DL)24, a specialized area within ML, leverages artificial neural networks and supports various learning approaches, making it a powerful tool for processing complex data.

Table 1 AI models used in this Review and their acronyms.

In ML25, supervised learning is a common approach in which a model is trained on a labeled dataset, with each training example paired with an output label. This method is effective when input data and corresponding labels are available, enabling the model to learn a function that generates the correct outputs based on the input data. Alternatively, unsupervised learning processes unlabeled data, allowing the model to identify hidden patterns. This often involves clustering data points to identify similarities or characteristics. Reinforcement learning26 is a method of interacting with an environment, taking specific actions and receiving feedback (rewards) based on the outcomes, gradually learning to maximize these rewards through repeated interactions.
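As a minimal illustration of the supervised setting described above, the sketch below fits a one-feature linear model to labeled examples. Everything in it is an assumption made for illustration: the guide sequences are invented, the single feature (GC fraction) is chosen only for simplicity, and the labels are generated from a known rule (y = 2x + 1) so that the fit can be checked. It is not taken from any model discussed in this Review.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases in a sequence (a toy input feature)."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = w * x + b."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = sxy / sxx
    b = mean_y - w * mean_x
    return w, b


# Hypothetical, synthetic training set: sequences are invented and the
# labels come from a known rule, not from any experiment.
guides = ["ACGTACGTAC", "GGGCCCGGGC", "ATATATATAT", "GCGCATATGC"]
features = [gc_fraction(g) for g in guides]
labels = [2.0 * x + 1.0 for x in features]

# "Training": learn the mapping from feature to label.
w, b = fit_line(features, labels)

# "Prediction" for an unseen input.
predicted = w * gc_fraction("GGCCAATT") + b
```

Because the labels follow the rule exactly, the fitted slope and intercept recover it; with real screening data the fit would only approximate the measurements.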
Unlike traditional ML methods, DL can be applied across supervised, unsupervised and reinforcement learning, making it especially versatile and effective in processing large and complex datasets.

Building on the foundational principles of DL, generative AI, including language models, has emerged as a powerful tool for generating new content and structures by learning from existing data. Recently developed DL technologies have enabled the prediction of protein structures based on amino acid sequences. Unlike traditional physics-based models, AI-driven approaches derive insights from large-scale data27. In 2020, Google DeepMind introduced AlphaFold1 (ref. 28), marking the beginning of DL-based protein structure prediction. The subsequent development of AlphaFold2 (ref. 29) in 2021 achieved near-experimental accuracy. Around the same time, the Baker laboratory introduced RoseTTAFold30, another DL-based protein structure prediction model. After that, models such as RoseTTAFoldNA31 and RoseTTAFold All-Atom32 were developed to extend structure prediction capabilities to nucleic acids and atomic-level modeling, respectively. More recently, AlphaFold3 (ref. 33) was developed, integrating generative AI models to extend beyond protein structure prediction, enabling the modeling of interactions with nucleic acids and other biomolecules. The development of AI-driven protein structure prediction models has profoundly transformed structural biology and was recognized with the 2024 Nobel Prize in Chemistry. These systems illustrate the broad potential of AI-driven models, with applications spanning fields from language processing to biological research.

The integration of AI technologies has addressed key limitations in conventional CRISPR technology, thereby enhancing genome editing with greater precision and efficiency.
In particular, AI-based models are valuable in designing optimal gRNAs and predicting off-target effects from large genomic datasets, notably improving genome editing performance. By automating and optimizing processes previously performed manually, AI allows researchers to focus on solving more complex problems. Moreover, beyond simple data analysis, AI-driven models generate novel DNA, RNA and amino acid sequences34,35, bringing innovation to genome editing fields that were difficult to access. The following sections discuss the contribution of AI to the advancement of CRISPR technology, drawing from relevant research articles.

Enhancing CRISPR–Cas nucleases with AI

The CRISPR–Cas nuclease system is an RNA-guided endonuclease originally identified in bacterial immune systems9. It stores information about the nucleic acids of previously invading pathogens to cut them if they invade again. The CRISPR–Cas system can be divided into two parts: the Cas protein, which binds and cleaves DNA, and the gRNA, which directs the Cas protein to specific target DNA (Fig. 1). Due to its relatively easy design, the CRISPR–Cas system has become a powerful tool for genome editing. However, it can induce off-target effects, and DNA repair mechanisms against Cas-mediated cleavage are difficult to predict36.

Optimizing gRNA design for high activity in the CRISPR–Cas system

The outcomes of experiments using the CRISPR–Cas system can vary considerably depending on the gRNA used. Therefore, knowing the activity of a gRNA in advance can greatly enhance the success rate of experiments. Accordingly, several AI-based models have been developed to predict gRNA activity (Fig. 2 and Table 2).

Fig. 2: AI-driven prediction and engineering in CRISPR systems.
The timeline illustrates AI-assisted advancements in CRISPR-based genome editing, categorizing developments in gRNA design, off-target prediction, editing outcome prediction and protein engineering.
ML and DL models are applied to predict the on-target and off-target activities of gRNA, along with editing efficiency and patterns. These models also facilitate the design of optimized gRNAs. Furthermore, AI-driven protein design can be utilized for protein engineering. Advanced tools such as AlphaFold3 enable structure-based discovery, amino acid homology-based exploration and Cas miniaturization. The color-coded classification differentiates models targeting Cas nucleases (in green), base editors (BE, in orange) and prime editors (PE, in blue), highlighting their specific contributions to genome engineering. Through these advancements, AI continues to refine CRISPR technologies, enhancing precision and broadening their applications in genome engineering.

Table 2 CRISPR–Cas nuclease system-associated AI prediction models.

First, Doench et al.37 assembled tiling gRNA pools for six and three endogenous genes in mice and humans, respectively, and evaluated their ability to generate null alleles through SpCas9. They classified the top 20% of gRNAs with high activity, investigated their sequence features and applied this information to develop the gRNA activity prediction model, Rule Set 1. Wong et al.38 utilized the previous dataset37 to classify and compare the top 20% and bottom 20% of gRNAs, identifying structural and sequence features to develop an improved gRNA activity prediction model. Subsequently, Doench et al.39 constructed a human/mouse genome-targeting gRNA library and leveraged it to establish the on-target activity prediction model, Rule Set 2. They also conducted screening with a library that included not only perfectly matched gRNAs but also those with insertions, deletions or mismatches, which led to the derivation of the cutting frequency determination (CFD) score for off-target activity prediction. Following this, Rule Set 3 (ref.
40) elucidated how variations among trans-activating CRISPR RNA (tracrRNA) variants could influence gRNA activity, incorporating these insights into a prediction model using tools such as light gradient boosting machine (LightGBM).

To build a reliable prediction model, it is crucial to either obtain a sufficient quantity of data or to secure high-quality data. To this end, some studies have utilized available datasets, while others have improved screening platforms. Chari et al. generated an in vivo library-on-library methodology, which involved lentiviral integration of targets and conducting two rounds of testing, to evaluate multiple gRNAs across approximately 1,400 genomic loci. Using screening results from multiple human cell lines with SpCas9 and St1Cas9, they created sgRNAScorer41, a tool for predicting gRNA activity. In a separate case, Chuai et al. formulated DeepCRISPR42, a DL model that utilizes gRNA data with known on-target efficacy and off-target profiles. The model predicts both on-target efficiencies and genome-wide off-target effects of Cas9 simultaneously. This study addressed data imbalances through augmentation and bootstrapping to enhance model performance. Additionally, Kim et al. performed a high-throughput screening of 12,832 target sequences in human cells using a library that included the target DNA and the corresponding gRNA. They designed the activity prediction model DeepSpCas9 (ref. 43) using a convolutional neural network (CNN), which showed better generalization across different datasets compared with existing models. Another large dataset of 23,902 gRNAs led to the creation of CRISPRon44, an efficiency prediction model for gRNAs. This study also confirmed that the binding energy between gRNA and DNA is a key factor in feature analysis.

In addition to human and mouse cell lines, researchers have also developed prediction models based on data from other species and cell lines.
For instance, a study on zebrafish embryos tested 1280 gRNAs targeting 128 genes, as well as 640 alternative gRNAs, including truncated, extended and 5′-mismatched variants. Based on the screening results, the prediction model CRISPRscan45 was created using logistic and linear regression to analyze sequence features. This research identified several determinants of gRNA activity, such as guanine enrichment and adenine depletion. A distinct study conducted a genome-scale screening in Escherichia coli using a library of approximately 70,000 gRNAs to develop a gRNA prediction model for Cas9 and its variants46. This study is expected to aid in designing new antimicrobial agents.

Because existing prediction models were tailored to the organism from which the training data were derived, a platform capable of designing gRNAs specific to individual organisms was required. In response, Deepguide47 was designed using gRNA activity profiles for SpCas9 and LbCas12a in Yarrowia lipolytica and was used to predict high-activity gRNAs on the basis of this dataset. This model can be applied to various other species through retraining. An additional study used a two-plasmid positive selection system in E. coli, which included a toxin expression plasmid that allowed only cells with the corresponding toxin removed by on-target activity to survive. Using these data, the prediction model crisprHAL was developed48, which enables accurate predictions and can be applied to other bacteria.

Reducing off-target effects in the CRISPR–Cas system

The term off-target effect refers to the unintended activity of genome engineering tools at sites other than the intended target. While on-target activity is certainly important, severe off-target effects may make it challenging to apply this system. Thus, predicting off-target activity is essential for enhancing the safety and specificity of the CRISPR–Cas system (Fig. 2 and Table 2).
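To make the notion of an off-target site concrete, the following sketch scans a sequence for windows that resemble a guide within a mismatch budget. It is a deliberately simplified illustration rather than any of the published predictors: it ignores the PAM, the reverse strand, bulges and position-dependent mismatch tolerance, and the function names, the toy sequences and the `max_mismatches` parameter are all hypothetical.

```python
def count_mismatches(guide: str, site: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    return sum(a != b for a, b in zip(guide.upper(), site.upper()))


def find_candidate_sites(guide: str, genome: str, max_mismatches: int = 3):
    """Slide the guide along the forward strand and report every window
    with at most `max_mismatches` mismatches as (start, window, mismatches)."""
    hits = []
    width = len(guide)
    for start in range(len(genome) - width + 1):
        window = genome[start:start + width]
        mismatches = count_mismatches(guide, window)
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits


# Toy example with invented sequences: one perfect match (the intended
# target) and one near-match (a candidate off-target site).
toy_guide = "ACGTACGT"
toy_genome = "TTACGTACGTTTACGAACGTTT"
sites = find_candidate_sites(toy_guide, toy_genome, max_mismatches=1)
```

A sequence-similarity scan of this kind only nominates candidate sites; the models discussed in this section go further by scoring how likely each candidate is to be cleaved.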
Several studies have focused on addressing this issue through better prediction methods.

Listgarten et al.49 developed a model named Elevation for off-target prediction, which performs gRNA–target pair scoring and gRNA summary scoring. In a parallel study, Lin et al. utilized data from previous research to create CRISPR-Net50, a model that predicts gRNA off-target activity through a long-term recurrent convolutional network (LRCN). Another study51 focused on analyzing the molecular interactions between RNA and DNA in the CRISPR system through simulations. This research collected various genome-wide off-target datasets and trained an extreme gradient boosting (XGBoost) classification model to calculate off-target scores and gRNA specificity scores51. By contrast, Toufikuzzaman et al.52 formulated CRISPR-DIPOFF, an off-target prediction model leveraging recurrent neural networks (RNNs) on the datasets utilized in DeepCRISPR42. CRISPR-DIPOFF addresses the precision–recall trade-off, a common limitation in conventional off-target prediction models, by using RNN variants such as vanilla RNN, long short-term memory (LSTM) and gated recurrent unit (GRU) to enhance the balance between precision and recall. Among these, LSTM outperformed the others and was selected as the final model for CRISPR-DIPOFF.

Improving performance with Cas variants

Numerous variants of SpCas9 with enhanced activity and specificity have emerged, prompting the development of prediction models for these engineered variants. One of the first models, DeepHF53, was developed to predict gRNA activity based on genome-scale screening results in human cells, focusing on the high-fidelity versions of SpCas9, such as eSpCas9(1.1) (ref. 54) and SpCas9-HF1 (ref. 55). After that, Kim et al.56 conducted high-throughput screening, targeting more than 26,000 lentiviral integration sites and 78 endogenous sites in human cells to evaluate and compare the protospacer adjacent motif (PAM) compatibility of SpCas9, xCas9 (ref. 57) and SpCas9-NG (ref. 58).
Based on these results, a DL model was formulated to predict on-target and off-target activities of Cas9 variants. Furthermore, a model that can predict and compare the activity of various Cas9 variants on target sequences was created using screening data from 13 SpCas9 variants across 26,891 target sequences59, facilitating the selection of the appropriate platform.

Refining editing outcomes and mutation patterns in the CRISPR–Cas system

The ability to predict the pattern of editing outcomes before designing an experiment can effectively enhance the chances of achieving the desired results (Fig. 2 and Table 2). First, Shen et al. conducted a screening to characterize the repair products after template-free Cas9 cleavage in five human and mouse cell lines. From these results, they designed inDelphi, a model to predict genotypes and frequencies of 1–60-bp deletions and 1-bp insertions60. They found that certain gRNAs could guide the repair process to produce a single genotype in at least 50% of editing products. In addition, they pinpointed human pathogenic alleles that could be corrected without a template and achieved over 50% correction in practice. In a different study, Allen et al. used a library containing both the gRNA and the target sequence, with variable contextual configurations around the target sequence, to screen over 40,000 gRNAs and examine the repair outcomes. They found that the repair results are biased and local sequence dependent, leading to the development of FORECasT61. This research also revealed that repair tendencies vary across cell lines. Similarly, Leenay et al. developed a model called SPROUT62 using the repair outcomes of Cas9-mediated cleavage at >1600 target sites in primary T cells obtained from 18 individuals. The model demonstrated higher accuracy than inDelphi and FORECasT when tested on data from other primary T cells and induced pluripotent stem cells.
In another study, mutations resulting from Cas9 cleavage and the resulting repair patterns were profiled for over 6000 targets in human cells63. With this dataset, Lindel was generated to predict mutation outcomes considering the local sequence context. This model exhibited greater accuracy in predicting the ratio of insertions to deletions compared with FORECasT.

Expanding applications to other CRISPR systems

In addition to the Cas9 system, other systems have been discovered64, and considerable research has been conducted on models that predict their functionality. Unlike the Cas9 system, which requires an RNA composed of CRISPR RNA (crRNA) and tracrRNA, the Cas12a (Cpf1) system needs only crRNA65,66. For AsCas12a (AsCpf1), a CNN-based DL model called DeepCpf1 (ref. 67) was developed. This model incorporates chromatin accessibility information to enhance performance and can accurately predict AsCas12a activity in 125 cell lines. Compared with an earlier classification model, CINDEL68, DeepCpf1 notably improves the accuracy of activity predictions. Both models used libraries containing the target sequence and corresponding gRNA, enabling the evaluation of Cas12a activity in a high-throughput manner.

Unlike the CRISPR–Cas9 and CRISPR–Cas12 systems, which target DNA, the CRISPR–Cas13 system targets RNA69,70,71. TIGER72, a DL model based on CNN, was designed for RfxCas13d (CasRx)71. This model outperformed two earlier models, Cas13design (ref. 73) and DeepCas13 (ref. 74), which are based on a random forest (RF) and a convolutional recurrent neural network (CRNN), respectively, when compared across four evaluation metrics: Pearson correlation, Spearman correlation, AUROC (area under the receiver operating characteristic curve) and AUPRC (area under the precision-recall curve).
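For readers unfamiliar with these metrics, the sketch below computes three of them (Pearson, Spearman and AUROC) from first principles using only the Python standard library; AUPRC is omitted for brevity. The helper names and the example numbers are illustrative assumptions and are unrelated to the benchmark values reported for any model.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation: linear agreement between two value lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def auroc(labels, scores):
    """Probability that a random positive outscores a random negative
    (ties count as one half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Invented example: measured versus predicted activities, and a binary
# "active / inactive" label scored by a hypothetical classifier.
r = pearson([1, 2, 3], [2, 4, 6])
rho = spearman([1, 2, 3], [1, 4, 9])
auc = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Pearson and Spearman suit continuous efficiency predictions, whereas AUROC and AUPRC suit settings in which guides are labeled as active or inactive.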
The study found that the model can predict the efficacy for both perfect-match and mismatched gRNAs and that such mismatches can affect Cas13d activity.

Generating Cas proteins and discovering new proteins

The research on Cas proteins has been as comprehensive as studies involving gRNA or related sequences (Fig. 2). Alonso-Lerma et al.75 investigated the origins of the Cas9 protein using ancestral sequence reconstruction techniques to resurrect the ancestors of Cas9 (anCas). They selected five anCas proteins and then utilized the DL model AlphaFold2 to predict their structure. Through deep structural analysis and activity profiling of these anCas proteins, they uncovered the functional evolutionary trajectory of Cas9. Furthermore, Zhao et al.76 devised an interaction, dynamics and conservation (IDC) strategy to minimize the protein size using structural information such as nucleoprotein interaction, dynamic conformation reorganization and ortholog conservation. The authors utilized AlphaFold2 to predict the structure of the Cas13 protein, and developed five miniaturized variants by using the IDC strategy, which effectively minimizes protein size while maintaining enzymatic structure by removing unnecessary regions. In a different study, Yoon et al.77 created a structure homology-based search pipeline by leveraging a Foldseek-clustered AlphaFold database78,79. They used two representative HEPN domains (HEPN1 and HEPN2) of Cas13 as search queries and identified a novel ancestral clade of Cas13 (Cas13an). Cas13an, one-third the size of other Cas13 proteins, exhibited strong RNA depletion and effective antiphage defense in E. coli.

Moreover, a generative model named Evo80 was devised to predict and generate biological sequences ranging from the molecular level to a genome-wide scale. Evo was trained on over 80,000 genome and metagenome datasets from prokaryotic and phage genomes.
It successfully predicted functions across DNA, RNA and proteins, as well as gene essentiality at nucleotide resolution. Evo also generated novel CRISPR–Cas complexes along with their gRNAs, transposable elements and genome-scale coding sequences of approximately 650 kilobases. In a separate study, Ruffolo et al.81 fine-tuned a protein language model such as ProGen2 (ref. 82) to generate multiple new proteins. They utilized the CRISPR–Cas Atlas, a dataset comprising 1,246,163 CRISPR operons, to create millions of novel CRISPR–Cas sequences. In addition, they selected 238,917 Cas9 sequences from the CRISPR–Cas Atlas to train the model and generate Cas9-like sequences. From the generated Cas9-like sequences, 209 were tested for functionality in human cells, and the top hit, OpenCRISPR-1, was identified. Despite having lower sequence similarity to SpCas9, OpenCRISPR-1 exhibited increased activity. Importantly, this study represents the first successful instance of human genome editing using an entirely AI-designed complex, highlighting the transformative potential of AI-driven gene-editing technologies.

Advancing CRISPR base editing with AI

The most well-known base editors are cytosine base editors (CBEs)50 and adenine base editors (ABEs)18 that mediate C-to-T and A-to-G substitutions, respectively (Fig. 1). Base editors were engineered by fusing a deaminase protein to a catalytically impaired Cas protein, which retains its binding activity, enabling precise single-base substitution without generating DSBs. Deaminases are base-modifying enzymes that act on single-strand DNA (ssDNA). After gRNA-dependent CRISPR targeting, the CRISPR complex unwinds the target DNA, exposing a short ssDNA segment. The cytosine or adenine deaminase then catalyzes the conversion of cytosine or adenine within the exposed ssDNA, transforming the deaminated cytosine to thymine or adenine to guanine through DNA repair or replication mechanisms.
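The conversion step described above can be sketched as a simple string operation: every substrate base that falls inside the exposed window is converted. This is an idealized illustration, not a predictor. The function name is hypothetical, and the default window (protospacer positions 4 to 8, counted from the PAM-distal end) is an assumption chosen for the example; real editing windows differ between editors, and real editing is probabilistic rather than all-or-nothing.

```python
def apply_base_editor(protospacer: str, editor: str = "CBE", window=(4, 8)) -> str:
    """Return the idealized edited protospacer.

    editor: "CBE" converts C to T, "ABE" converts A to G.
    window: 1-based inclusive positions treated as exposed ssDNA
            (an assumed, illustrative default).
    """
    substrate, product = {"CBE": ("C", "T"), "ABE": ("A", "G")}[editor]
    first, last = window
    edited = []
    for position, base in enumerate(protospacer.upper(), start=1):
        if first <= position <= last and base == substrate:
            edited.append(product)  # base lies in the window: converted
        else:
            edited.append(base)     # outside the window or not a substrate
    return "".join(edited)


# Invented 20-nt protospacer with cytosines at positions 3, 4 and 8.
toy_site = "GACCTAGCTAGGCTAGCTAA"
cbe_result = apply_base_editor(toy_site, "CBE")
abe_result = apply_base_editor(toy_site, "ABE")
```

In this toy site the cytosine at position 3 is left untouched because it lies outside the assumed window, while both cytosines inside the window are converted together.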
Base editors show higher editing efficiency with low indel frequencies83; however, they can cause bystander editing near the target base and cannot install all types of point mutation.

Improving the efficiency of base editors

Base editors possess substantial potential as genome editing tools for fundamental research and gene therapy. Yet, their application has been restricted by the considerable variability in editing performance across target sites. With the help of AI technology, several research teams have sought to address this issue by developing predictive models for base editing efficiency (Fig. 2 and Table 3).

Table 3 Base editor system-associated AI prediction model.

Through the analysis of a dataset from 38,538 target sequences in mammalian cells, Arbab et al. generated an ML model called BE-Hive84, characterizing the relationship between sequence context and the activity of CBE and ABE. They used BE-Hive to correct over 3000 disease-associated single-nucleotide variants (SNVs) and 174 pathogenic transversion SNVs with high precision. Furthermore, they developed new CBE variants, such as EA-BE4 and eA3A-BE5, providing narrow-window base editing. EA-BE4 (BE4 H47E + S48A) is a precise base editor that reduces undesired cytosine-to-guanine transversions while preserving the editing properties of the original BE4. eA3A-BE5 (eA3A-BE4 T44D + S45A) enhances single-nucleotide precision and reduces byproducts without sacrificing efficiency. In another investigation, Song et al. evaluated the efficiency of BE4 (rAPOBEC1-CBE) and ABE7.10 (ref. 18) (ecTadA7.10-ABE) in over 13,000 target sequences in human cells and created DeepBaseEditor (DeepCBE and DeepABE)85, a DL-based computational model. They applied this tool to predict the efficiency and outcome frequencies of CBE and ABE in target sequences. In a separate study, Marquart et al.
performed an extensive analysis of ABEmax (ecTadA7.10-ABE)86, BE4max (rAPOBEC1-CBE)86, ABE8e (ecTadA8e-ABE)87 and Target-AID (PmCDA1-CBE)20 using a large lentiviral library containing a gRNA-expressing cassette and target DNA88. They formulated an attention-based DL algorithm called BE-DICT, which can predict base editing results with remarkable accuracy. The authors conducted a comprehensive comparison of BE-Hive, DeepBaseEditor and BE-DICT, finding similar levels of accuracy. However, unlike the other models, BE-DICT offers a per-base module that identifies highly preferred or disfavored motifs in existing base editors, enabling the prediction of new variants with enhanced activity. Similarly, Pallaseni et al.89 measured the editing frequencies of FNLS and BE4GamRA for CBE (engineered rAPOBEC1-CBE), and ABE8e and ABE8.20-m for ABE (engineered ecTadA7.10-ABE)87, across approximately 14,000 target sequences in a human cell line and discovered a new sequence bias that considerably impacts the editing efficiency at specific positions. Based on these findings, they developed FORECasT-BE, an ML model for predicting per-position editing activity. Despite variations in datasets depending on cell type, FORECasT-BE demonstrated predictive accuracy comparable to that of BE-Hive84 and DeepBaseEditor85 across various high-throughput cellular contexts.

C-to-G base editors (CGBEs)90,91 are an alternative base editing tool that induces C-to-G transversion mutations, whereas traditional base editors, such as CBEs and ABEs, primarily induce transition mutations. Koblan et al.92 generated an additional dataset by characterizing 10 CGBE variants (various CBE derivatives, lacking a uracil DNA glycosylase inhibitor or incorporating DNA-repair proteins) at more than 10,000 target sites in mammalian cells.
These data were used to train a model similar to the original BE-Hive84, leading to the development of the CGBE-Hive model92, which predicts the purity and yield of editing outcomes for CGBEs. Using CGBE-Hive, they designed CGBEs and their gRNAs to introduce desired edits, successfully correcting the amino acid sequences of 546 disease-related SNVs with over 90% accuracy.

Moreover, researchers have expanded the scope by modeling a large number of base editor variants. Kim et al.93 devised DeepCas9variants and DeepBE, which predict the base editing activity and outcomes of 63 base editors, formed by combining nine PAM-compatible Cas9 variants and seven deaminase variants including CBEs, ABEs and CGBEs. They built these DL-based computational models using three sublibraries consisting of a total of 47,475 pairs of gRNAs and targets. When tested on three base editors not included in the training datasets, DeepBE achieved an average Pearson correlation value of 0.77, demonstrating high performance.

Strengthening base editing predictions through endogenous target analysis

Most studies have relied on synthetic environments to generate high-throughput data for training models. However, artificial target sites using lentiviral integration often overestimate the editing activity or exhibit poor correlation with editing outcomes at some endogenous targets. Li et al.94 designed an automated platform for genome editing at 1210 endogenous target sites. Based on results from in situ genome engineering, they generated the chromatin accessibility enabled learning model (CAELM), which integrates chromatin accessibility and sequence context to predict the outcomes of CBE (BE4max, AncBE4max and hyA3A-BE4max)86. In a comparable approach, Yuan et al.95 obtained a genome-wide dataset from about 5000 endogenous target sites to address these discrepancies.
By analyzing these data, they discovered that base editing by ABEmax-F148A or YE1-BE3-FNLS is affected by endogenous factors such as transcriptional activity, chromatin accessibility, and DNA and histone modifications at endogenous target sites. The researchers also developed a DL algorithm called BE_Endo, which incorporates endogenous factors and sequence information. Because CBE activity is more likely to be influenced by endogenous factors than ABE activity, BE_Endo shows better predictive performance for CBE.

Minimizing off-target effects in CBE and ABE

The genome contains numerous sequences that are similar to the target site where gRNAs can bind. As base editors utilize DNA-binding properties of CRISPR, they can induce genome-wide off-target effects96,97, similar to CRISPR nucleases. In addition, the deaminase enzyme in base editors may exhibit CRISPR-independent off-target activity98. Therefore, predicting off-target effects of base editors and understanding the mismatch tolerance between gRNAs and target DNA are crucial. Zhang et al.99 designed gRNA off-target pairs for ABE and CBE, generating off-target efficiency datasets comprising 54,663 and 55,727 entries in human cells, respectively (Fig. 2 and Table 3). Leveraging these data, they developed DL models, ABEdeepoff and CBEdeepoff (BEdeepoff), to predict off-target sites at endogenous loci. These tools can help to reduce the off-target effects associated with base editing.

Discovering enhanced cytidine deaminases for base editing

Huang et al.100 used AlphaFold2 to model the structure of 238 protein sequences from various deaminase families and found many ssDNA and dsDNA cytidine deaminases by clustering them on the basis of structural similarities. They further applied AI-assisted structural prediction to minimize the size of ssDNA deaminase (Sdd). Using the miniaturized Sdd, they created a CBE that could be packaged into a single vector for AAV-based CRISPR–Cas9 base editing.
Likewise, Xu et al.101 conducted both amino acid homology searches and structure-based similarity analysis using the three-dimensional structures generated by AlphaFold2. A total of 1483 APOBEC-like deaminases were identified through amino acid homology analysis, and they were classified into 184 clusters using AI-based structural prediction. The study uncovered deaminases with high editing efficiency and robust activity across diverse sequence contexts. In another study, multiple synthetic adenine deaminases were designed from a TadA-like sequence dataset using ProGen2 (ref. 82) and ProteinMPNN102, just as OpenCRISPR-1 was identified from a Cas9-like sequence dataset81. The newly engineered adenine deaminases, when fused to SpCas9 or OpenCRISPR-1 to create ABEs, demonstrated robust A-to-G editing. These findings suggest that AI could substantially broaden the scope of base editor applications (Fig. 2).

Elevating CRISPR prime editing with AI

Prime editing is a gene-editing technology that allows more diverse edits than CRISPR nucleases and base editing. The prime editor21 consists of two main components: a protein component, which is a nickase Cas9 fused to a reverse transcriptase, and the prime editing gRNA (pegRNA), which is a gRNA extended with a primer binding site (PBS) and a reverse transcription template (RTT). Once the spacer sequence of the pegRNA binds to the target site, the nickase Cas9 generates a single-strand break on the non-target DNA strand. After nicking, a segment of ssDNA is exposed near the cut site, providing a substrate for the reverse transcriptase. This exposed ssDNA anneals to the PBS sequence of the pegRNA to initiate reverse transcription. The reverse transcriptase then synthesizes DNA complementary to the RTT sequence, using the pegRNA as a template. This newly synthesized DNA contains the information for the intended edit and replaces the original DNA sequence through intracellular DNA repair mechanisms.
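The relationship between the nick site, the PBS and the RTT described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the design procedure of any tool discussed in this Review: the function name `design_pegrna_extension` and its parameters are hypothetical, the target is assumed to be given on the PAM-containing (nicked) strand, and the nick is assumed to fall 3 nt upstream of the PAM, as is typical for SpCas9.

```python
# Minimal sketch: derive the PBS and RTT of a pegRNA 3' extension.
# Assumptions (not from the Review): SpCas9 nickase, nick 3 nt upstream of
# the PAM, sequences written 5'->3' on the PAM-containing strand, DNA alphabet.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def design_pegrna_extension(protospacer: str, edited_downstream: str,
                            pbs_len: int = 13):
    """Return (pbs, rtt, extension), each written 5'->3'.

    protospacer: target sequence immediately 5' of the PAM (nicked strand).
    edited_downstream: desired sequence 3' of the nick on the same strand,
        already containing the intended edit.
    pbs_len: number of nucleotides upstream of the nick that anneal to the PBS.
    """
    nick = len(protospacer) - 3  # assumed nick position within the protospacer
    if not 0 < pbs_len <= nick:
        raise ValueError("pbs_len must be between 1 and the nick position")
    # The PBS pairs with the exposed 3' end of the nicked strand.
    pbs = reverse_complement(protospacer[nick - pbs_len:nick])
    # The RTT templates the edited sequence that is copied after the nick.
    rtt = reverse_complement(edited_downstream)
    # In the pegRNA the RTT sits 5' of the PBS.
    return pbs, rtt, rtt + pbs


if __name__ == "__main__":
    pbs, rtt, extension = design_pegrna_extension(
        "ACGTACGTACGTACGTACGT", "CGTTGG", pbs_len=5)
    print(pbs, rtt, extension)
```

The returned strings use the DNA alphabet; replacing T with U gives the corresponding RNA sequence. Real pegRNA design also has to weigh PBS and RTT length, GC content and RNA folding, which is what the prediction models discussed in this section are trained to do.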
Through this mechanism103, prime editing mediates substitutions, small insertions and deletions at the target sites without generating DSBs21,104,105 (Fig. 1). However, its relatively low editing efficiency remains a drawback. DL and ML models can predict the efficiency of prime editors by leveraging large experimental datasets. In particular, using AI models to optimize pegRNAs reduces the number of experimental attempts, saving both time and costs (Fig. 2 and Table 4).

Table 4: Prime editor system-associated AI prediction models.

Optimizing pegRNA design for prime editing

The various versions of prime editors differ in their efficiency and accuracy, making the selection of the appropriate version and the prediction of its efficiency a critical challenge in prime editing. In response to this hurdle, multiple research teams have focused on using AI to predict prime editing efficiency (Fig. 2 and Table 4).

Kim et al.106 evaluated the efficiency of prime editor 2 (PE2) across various sequences and genomic contexts using 54,836 combinations of pegRNAs and target sequences. They analyzed factors that affect PE2 efficiency, such as the length of the pegRNA, including its PBS and RTT regions, and the GC content of the target sequence. Based on these data, the authors established a computational model, DeepPE, to predict PE2 efficiency. In a different study, Easy-Prime107 was developed to optimize the pegRNA together with an additional nicking gRNA. The nicking gRNA improves PE activity by nicking the non-edited strand, which favors repair toward the edited sequence and thereby increases the overall efficiency of prime editing, an approach referred to as the PE3 system. The researchers assessed the features affecting PE efficiency and found that the RNA folding feature was critical.

With a different large dataset using self-targeting pegRNA libraries, PRIDICT108 was created to predict the efficiency of PE and expected editing outcomes.
Trained on data from 92,423 pegRNAs targeting 13,349 mutations, PRIDICT assigns candidate pegRNAs a PRIDICT score based on their editing efficiencies and unintended editing rates. In another study, Yu et al.109 tested prime editing efficiency on 338,996 pairs of pegRNAs and target sequences, including 3979 engineered pegRNAs. Based on these datasets, they devised computational models called DeepPrime and DeepPrime-FT to predict the editing efficiency of all types of edits, up to three base pairs, across seven cell types and eight prime editing systems. The researchers also formulated DeepPrime-OFF to predict editing efficiency at mismatched targets, which is useful for reducing off-target effects.

By incorporating transfer learning, an optimized prime editing design (OPED)110 was generated to predict efficiency and optimize the design of pegRNAs. This model improves both the accuracy and generalizability of pegRNA efficiency prediction. OPED demonstrated the successful introduction of various ClinVar pathogenic mutations with PE2, PE3/PE3b and ePE systems. In addition, the authors offer the OPEDVar database, a web application with optimized PE designs for over two billion pathogenic variant candidates.

Focusing on short sequence insertions among the editing outcomes of prime editors, the MinsePIE (modeling insertion efficiency for prime insertion experiments) algorithm111 was developed to predict insertion efficiency. The underlying analysis covered 3604 pegRNAs across various cell lines and genomic sites and identified key factors for efficient PE activity, such as sequence length, cytosine content and secondary structure, as well as DNA repair factors such as TREX1 and TREX2.
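Two of the sequence-level factors named above, insert length and cytosine content, are simple to compute for a candidate insert. The following is a minimal sketch under stated assumptions: the function name `insert_sequence_features` and the returned keys are hypothetical and do not reproduce the feature set or implementation of MinsePIE; secondary structure is omitted because it requires an RNA folding tool.

```python
# Minimal sketch: basic sequence features of a candidate insertion sequence.
# Hypothetical helper for illustration only; not the MinsePIE implementation.

def insert_sequence_features(insert_seq: str) -> dict:
    """Return length, cytosine fraction and GC fraction of a DNA insert."""
    seq = insert_seq.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("insert must be a non-empty string over A, C, G, T")
    n = len(seq)
    return {
        "length": n,
        "c_fraction": seq.count("C") / n,
        "gc_fraction": (seq.count("G") + seq.count("C")) / n,
    }


if __name__ == "__main__":
    print(insert_sequence_features("CCAT"))
```

Features of this kind would typically be combined with structural and cellular descriptors before being passed to a regression model.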
The authors also showed that MinsePIE aids in selecting codon variants of common fusion tags that achieve high insertion efficiency.

Enhancing prime editing efficiency in chromatin contexts

The efficiency of prime editing can vary depending on the genomic environment, such as the chromatin state and epigenetic modifications surrounding the target gene. One of the best-known cellular factors that substantially influence prime editing is the mismatch repair (MMR) pathway. The MMR pathway is a DNA repair mechanism that recognizes and corrects mismatches in the DNA112. During prime editing, the MMR pathway may identify the newly reverse transcribed DNA introduced by the prime editor as an error and remove it, thereby reducing the overall efficiency of prime editing. To account for these factors, the PRIDICT108 model was further refined into PRIDICT2.0 and ePRIDICT113, enhancing the accuracy of prime editing efficiency predictions. PRIDICT2.0 was trained using datasets from different cell lines and contexts, such as MMR-proficient and MMR-deficient cells, allowing predictions across diverse genomic conditions. It can predict editing outcomes, including insertions, deletions and substitutions, for edits up to 15 base pairs in length. Furthermore, ePRIDICT expanded on PRIDICT2.0 by integrating local chromatin features, so that the model specifically accounts for the chromatin environment, which affects the efficiency of prime editing. It is particularly useful for predicting editing efficiency in different chromatin states, such as active transcription regions or heterochromatin, helping researchers to customize their approach on the basis of the genomic environment.

Boosting prime editing efficiency with additional factors

As the MMR pathway can limit the success of prime editing by removing DNA containing the desired edits, inhibiting the MMR pathway helps to achieve higher prime editing activity114,115,116.
Prime editing efficiency was improved by expressing a dominant-negative form of the key MMR pathway protein MLH1 (MLH1dn), a strategy known as the PE4 system114. Taking a new route to the same goal, Park et al.117 used AI to generate a small binder protein that binds to MLH1 and inhibits the MMR pathway. The authors utilized RFdiffusion118 and AlphaFold3 to design this small binder protein (MLH1-SB), and by incorporating it into the prime editor, they observed an improvement in prime editing efficiency. AI-assisted protein engineering of new elements that boost genome editing efficiency is therefore likely to yield a much wider variety of genome editing tools.

Conclusion

The convergence of AI and CRISPR technology has substantially increased activity and specificity in genome engineering. AI technology, which excels at analyzing and recognizing features and patterns in large amounts of experimental data, can predict the efficiency and accuracy of gene editing. Computational tools can rapidly search vast amounts of genomic information on the basis of amino acid sequence or protein structural homology, identifying previously unknown CRISPR-like systems. In addition, AI-driven protein design models are used to design new proteins for genome engineering that do not exist in nature. AI technology also plays an important role in reducing the number of experiments by computationally suggesting the optimal pathway, which lowers the probability of failure. In this Review, we have outlined how AI tools can improve existing CRISPR-mediated genome engineering systems and uncover new ones.

Nevertheless, current technical limitations constrain the full potential of AI. The performance of AI models is highly dependent on the conditions and quality of the experimental dataset. If the specific conditions of an organism, tissue or cell type differ, the predictions made by ML may not match the experimental results.
In addition, while generative AI enables the design of new proteins, the data on which these models are trained are derived from existing proteins in nature; therefore, they are limited in their ability to suggest entirely new designs. There may also be missing information about the properties of the protein, and the functionality of any AI-designed protein needs to be validated. However, AI technology is evolving at an accelerating pace, and these problems are likely to be addressed in the near future.

With the recent clinical approvals of gene-editing therapies, genome engineering technologies have become critical for their ability to directly correct the DNA mutations that cause rare genetic diseases17. In therapeutic applications, CRISPR technologies must achieve high precision, efficiency and safety, not only at the cellular level but also within the human body. AI-enhanced CRISPR activity prediction and functional improvements will help to develop personalized treatments for each patient, considering the complexity of their individual genomes. By effectively applying AI, researchers can develop safer and more precise CRISPR tools for clinical use, leading the way forward in gene therapy.

Consequently, AI decreases the burden on researchers in genome engineering by improving existing CRISPR technologies and making it possible to create new systems. By learning from the extensive experimental data in genome editing, along with predictive modeling, AI will continue to elevate the efficiency, accuracy and safety of CRISPR technologies. As AI-driven CRISPR technologies become more sophisticated, they will accelerate the development of next-generation personalized therapies for genetic disorders, bringing a new era of precision medicine.

References

Bibikova, M., Beumer, K., Trautman, J. K. & Carroll, D. Enhancing gene targeting with designed zinc finger nucleases. Science 300, 764–764 (2003).
Porteus, M. H. & Baltimore, D.
Chimeric nucleases stimulate gene targeting in human cells. Science 300, 763–763 (2003).
Kim, H. & Kim, J.-S. A guide to genome engineering with programmable nucleases. Nat. Rev. Genet. 15, 321–334 (2014).
Kim, Y. et al. A library of TAL effector nucleases spanning the human genome. Nat. Biotechnol. 31, 251–258 (2013).
Miller, J. C. et al. A TALE nuclease architecture for efficient genome editing. Nat. Biotechnol. 29, 143–148 (2011).
Cho, S. I. et al. Engineering TALE-linked deaminases to facilitate precision adenine base editing in mitochondrial DNA. Cell 187, 95–109 e26 (2024).
Lim, K. Mitochondrial genome editing: strategies, challenges, and applications. BMB Rep. 57, 19–29 (2024).
Lim, K., Cho, S. I. & Kim, J. S. Nuclear and mitochondrial DNA editing in human cells with zinc finger deaminases. Nat. Commun. 13, 366 (2022).
Jinek, M. et al. A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science 337, 816–821 (2012).
Cho, S. W., Kim, S., Kim, J. M. & Kim, J.-S. Targeted genome engineering in human cells with the Cas9 RNA-guided endonuclease. Nat. Biotechnol. 31, 230–232 (2013).
Cong, L. et al. Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819–823 (2013).
Mali, P. et al. RNA-guided human genome engineering via Cas9. Science 339, 823–826 (2013).
Eid, A. & Mahfouz, M. M. Genome editing: the road of CRISPR/Cas9 from bench to clinic. Exp. Mol. Med. 48, e265 (2016).
Adli, M. The CRISPR tool kit for genome editing and beyond. Nat. Commun. 9, 1911 (2018).
Anzalone, A. V., Koblan, L. W. & Liu, D. R.
Genome editing with CRISPR–Cas nucleases, base editors, transposases and prime editors. Nat. Biotechnol. 38, 824–844 (2020).
Jang, H. K., Song, B., Hwang, G. H. & Bae, S. Current trends in gene recovery mediated by the CRISPR–Cas system. Exp. Mol. Med. 52, 1016–1027 (2020).
Healey, N. Next-generation CRISPR-based gene-editing therapies tested in clinical trials. Nat. Med. 30, 2380–2381 (2024).
Gaudelli, N. M. et al. Programmable base editing of A*T to G*C in genomic DNA without DNA cleavage. Nature 551, 464–471 (2017).
Komor, A. C., Kim, Y. B., Packer, M. S., Zuris, J. A. & Liu, D. R. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420–424 (2016).
Nishida, K. et al. Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems. Science 353, aaf8729 (2016).
Anzalone, A. V. et al. Search-and-replace genome editing without double-strand breaks or donor DNA. Nature 576, 149–157 (2019).
Abudayyeh, O. O. & Gootenberg, J. S. Programmable biology through artificial intelligence: from nucleic acids to proteins to cells. Nat. Methods 21, 1384–1386 (2024).
Sarker, I. H. Machine learning: algorithms, real-world applications and research directions. SN Comput. Sci. 2, 160 (2021).
LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
Tarca, A. L., Carey, V. J., Chen, X. W., Romero, R. & Draghici, S. Machine learning and its applications to biology. PLOS Comput. Biol. 3, e116 (2007).
Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction 2nd edn (The MIT Press, Cambridge, 2018).
Baek, M. & Baker, D.
Deep learning and protein structure modeling. Nat. Methods 19, 13–14 (2022).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Bryant, P., Pozzati, G. & Elofsson, A. Improved prediction of protein–protein interactions using AlphaFold2. Nat. Commun. 13, 1265 (2022).
Baek, M. et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 373, 871–876 (2021).
Baek, M. et al. Accurate prediction of protein–nucleic acid complexes using RoseTTAFoldNA. Nat. Methods 21, 117–121 (2024).
Krishna, R. et al. Generalized biomolecular modeling and design with RoseTTAFold All-Atom. Science 384, eadl2528 (2024).
Abramson, J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 (2024).
Madani, A. et al. Large language models generate functional protein sequences across diverse families. Nat. Biotechnol. 41, 1099–1106 (2023).
Callaway, E. ‘ChatGPT for CRISPR’ creates new gene-editing tools. Nature 629, 272 (2024).
Zhang, X. H., Tee, L. Y., Wang, X. G., Huang, Q. S. & Yang, S. H. Off-target effects in CRISPR/Cas9-mediated genome engineering. Mol. Ther. Nucleic Acids 4, e264 (2015).
Doench, J. G. et al. Rational design of highly active sgRNAs for CRISPR–Cas9-mediated gene inactivation. Nat. Biotechnol. 32, 1262–1267 (2014).
Wong, N., Liu, W. & Wang, X. WU-CRISPR: characteristics of functional guide RNAs for the CRISPR/Cas9 system. Genome Biol. 16, 218 (2015).
Doench, J. G. et al.
Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR–Cas9. Nat. Biotechnol. 34, 184–191 (2016).
DeWeirdt, P. C. et al. Accounting for small variations in the tracrRNA sequence improves sgRNA activity predictions for CRISPR screening. Nat. Commun. 13, 5255 (2022).
Chari, R., Mali, P., Moosburner, M. & Church, G. M. Unraveling CRISPR–Cas9 genome engineering parameters via a library-on-library approach. Nat. Methods 12, 823–826 (2015).
Chuai, G. et al. DeepCRISPR: optimized CRISPR guide RNA design by deep learning. Genome Biol. 19, 80 (2018).
Kim, H. K. et al. SpCas9 activity prediction by DeepSpCas9, a deep learning-based model with high generalization performance. Sci. Adv. 5, eaax9249 (2019).
Xiang, X. et al. Enhancing CRISPR–Cas9 gRNA efficiency prediction by data integration and deep learning. Nat. Commun. 12, 3238 (2021).
Moreno-Mateos, M. A. et al. CRISPRscan: designing highly efficient sgRNAs for CRISPR–Cas9 targeting in vivo. Nat. Methods 12, 982–988 (2015).
Guo, J. et al. Improved sgRNA design in bacteria via genome-wide activity profiling. Nucleic Acids Res. 46, 7052–7069 (2018).
Baisya, D., Ramesh, A., Schwartz, C., Lonardi, S. & Wheeldon, I. Genome-wide functional screens enable the prediction of high activity CRISPR–Cas9 and –Cas12a guides in Yarrowia lipolytica. Nat. Commun. 13, 922 (2022).
Ham, D. T. et al. A generalizable Cas9/sgRNA prediction model using machine transfer learning with small high-quality datasets. Nat. Commun. 14, 5514 (2023).
Listgarten, J. et al.
Prediction of off-target activities for the end-to-end design of CRISPR guide RNAs. Nat. Biomed. Eng. 2, 38–47 (2018).
Lin, J. C., Zhang, Z. L., Zhang, S. X., Chen, J. Y. & Wong, K. C. CRISPR-Net: a recurrent convolutional network quantifies CRISPR off-target activities with mismatches and indels. Adv. Sci. 7, 1–17 (2020).
Chen, Q. et al. Genome-wide CRISPR off-target prediction and optimization using RNA–DNA interaction fingerprints. Nat. Commun. 14, 7521 (2023).
Toufikuzzaman, M., Hassan Samee, M. A. & Sohel Rahman, M. CRISPR-DIPOFF: an interpretable deep learning approach for CRISPR Cas-9 off-target prediction. Brief. Bioinform. 25, 1–10 (2024).
Wang, D. et al. Optimized CRISPR guide RNA design for two high-fidelity Cas9 variants by deep learning. Nat. Commun. 10, 4284 (2019).
Slaymaker, I. M. et al. Rationally engineered Cas9 nucleases with improved specificity. Science 351, 84–88 (2016).
Kleinstiver, B. P. et al. High-fidelity CRISPR–Cas9 nucleases with no detectable genome-wide off-target effects. Nature 529, 490–495 (2016).
Kim, H. K. et al. High-throughput analysis of the activities of xCas9, SpCas9-NG and SpCas9 at matched and mismatched target sequences in human cells. Nat. Biomed. Eng. 4, 111–124 (2020).
Hu, J. H. et al. Evolved Cas9 variants with broad PAM compatibility and high DNA specificity. Nature 556, 57–63 (2018).
Nishimasu, H. et al. Engineered CRISPR–Cas9 nuclease with expanded targeting space. Science 361, 1259–1262 (2018).
Kim, N. et al. Prediction of the sequence-specific cleavage activity of Cas9 variants. Nat. Biotechnol. 38, 1328–1336 (2020).
Shen, M. W. et al.
Predictable and precise template-free CRISPR editing of pathogenic variants. Nature 563, 646–651 (2018).
Allen, F. et al. Predicting the mutations generated by repair of Cas9-induced double-strand breaks. Nat. Biotechnol. https://doi.org/10.1038/nbt.4317 (2018).
Leenay, R. T. et al. Large dataset enables prediction of repair after CRISPR–Cas9 editing in primary T cells. Nat. Biotechnol. 37, 1034–1037 (2019).
Chen, W. et al. Massively parallel profiling and predictive modeling of the outcomes of CRISPR/Cas9-mediated double-strand break repair. Nucleic Acids Res. 47, 7989–8003 (2019).
Moon, S. B., Kim, D. Y., Ko, J. H. & Kim, Y. S. Recent advances in the CRISPR genome editing tool set. Exp. Mol. Med. 51, 1–11 (2019).
Zetsche, B. et al. Cpf1 is a single RNA-guided endonuclease of a class 2 CRISPR–Cas system. Cell 163, 759–771 (2015).
Yamano, T. et al. Crystal structure of Cpf1 in complex with guide RNA and target DNA. Cell 165, 949–962 (2016).
Kim, H. K. et al. Deep learning improves prediction of CRISPR–Cpf1 guide RNA activity. Nat. Biotechnol. 36, 239–241 (2018).
Kim, H. K. et al. In vivo high-throughput profiling of CRISPR–Cpf1 activity. Nat. Methods 14, 153–159 (2017).
Cox, D. B. T. et al. RNA editing with CRISPR–Cas13. Science 358, 1019–1027 (2017).
Abudayyeh, O. O. et al. RNA targeting with CRISPR–Cas13. Nature 550, 280–284 (2017).
Konermann, S. et al. Transcriptome engineering with RNA-targeting type VI-D CRISPR effectors. Cell 173, 665–676 (2018).
Wessels, H. H. et al. Prediction of on-target and off-target activity of CRISPR–Cas13d guide RNAs using deep learning. Nat.
Biotechnol. 42, 628–637 (2024).
Wessels, H. H. et al. Massively parallel Cas13 screens reveal principles for guide RNA design. Nat. Biotechnol. 38, 722–727 (2020).
Cheng, X. et al. Modeling CRISPR–Cas13d on-target and off-target effects using machine learning approaches. Nat. Commun. 14, 752 (2023).
Alonso-Lerma, B. et al. Evolution of CRISPR-associated endonucleases as inferred from resurrected proteins. Nat. Microbiol. 8, 77–90 (2023).
Zhao, F. et al. A strategy for Cas13 miniaturization based on the structure and AlphaFold. Nat. Commun. 14, 5545 (2023).
Yoon, P. H. et al. Structure-guided discovery of ancestral CRISPR–Cas13 ribonucleases. Science 385, 538–543 (2024).
Barrio-Hernandez, I. et al. Clustering predicted structures at the scale of the known protein universe. Nature 622, 637–645 (2023).
van Kempen, M. et al. Fast and accurate protein structure search with Foldseek. Nat. Biotechnol. 42, 243–246 (2024).
Nguyen, E. et al. Sequence modeling and design from molecular to genome scale with Evo. Science 386, eado9336 (2024).
Ruffolo, J. A. et al. Design of highly functional genome editors by modeling the universe of CRISPR–Cas sequences. Preprint at bioRxiv https://doi.org/10.1101/2024.04.22.590591 (2024).
Nijkamp, E., Ruffolo, J. A., Weinstein, E. N., Naik, N. & Madani, A. ProGen2: exploring the boundaries of protein language models. Cell Syst. 14, 968–978 e3 (2023).
Song, Y. et al. Large-fragment deletions induced by Cas9 cleavage while not in the BEs system. Mol. Ther. Nucleic Acids 21, 523–526 (2020).
Arbab, M. et al.
Determinants of base editing outcomes from target library analysis and machine learning. Cell 182, 463–480 (2020).
Song, M. et al. Sequence-specific prediction of the efficiencies of adenine and cytosine base editors. Nat. Biotechnol. 38, 1037–1043 (2020).
Koblan, L. W. et al. Improving cytidine and adenine base editors by expression optimization and ancestral reconstruction. Nat. Biotechnol. 36, 843–846 (2018).
Richter, M. F. et al. Phage-assisted evolution of an adenine base editor with improved Cas domain compatibility and activity. Nat. Biotechnol. 38, 883–891 (2020).
Marquart, K. F. et al. Predicting base editing outcomes with an attention-based deep learning algorithm trained on high-throughput target library screens. Nat. Commun. 12, 5114 (2021).
Pallaseni, A. et al. Predicting base editing outcomes using position-specific sequence determinants. Nucleic Acids Res. 50, 3551–3564 (2022).
Chen, L. et al. Programmable C:G to G:C genome editing with CRISPR–Cas9-directed base excision repair proteins. Nat. Commun. 12, 1384 (2021).
Kurt, I. C. et al. CRISPR C-to-G base editors for inducing targeted DNA transversions in human cells. Nat. Biotechnol. 39, 41–46 (2021).
Koblan, L. W. et al. Efficient C*G-to-G*C base editors developed using CRISPRi screens, target-library analysis, and machine learning. Nat. Biotechnol. 39, 1414–1425 (2021).
Kim, N. et al. Deep learning models to predict the editing efficiencies and outcomes of diverse base editors. Nat. Biotechnol. 42, 484–497 (2024).
Li, S. et al. Automated high-throughput genome editing platform with an AI learning in situ prediction model. Nat. Commun.
13, 7386 (2022).
Yuan, T. et al. Deep learning models incorporating endogenous factors beyond DNA sequences improve the prediction accuracy of base editing outcomes. Cell Discov. 10, 20 (2024).
Kim, D. et al. Genome-wide target specificities of CRISPR RNA-guided programmable deaminases. Nat. Biotechnol. 35, 475–480 (2017).
Kim, D., Lim, K., Kim, D. E. & Kim, J. S. Genome-wide specificity of dCpf1 cytidine base editors. Nat. Commun. 11, 4072 (2020).
Grunewald, J. et al. Transcriptome-wide off-target RNA editing induced by CRISPR-guided DNA base editors. Nature 569, 433–437 (2019).
Zhang, C. et al. Prediction of base editor off-targets by deep learning. Nat. Commun. 14, 5358 (2023).
Huang, J. et al. Discovery of deaminase functions by structure-based protein clustering. Cell 186, 3182–3195 (2023).
Xu, K. et al. Structure-guided discovery of highly efficient cytidine deaminases with sequence-context independence. Nat. Biomed. Eng. https://doi.org/10.1038/s41551-024-01220-8 (2024).
Dauparas, J. et al. Robust deep learning-based protein sequence design using ProteinMPNN. Science 378, 49–56 (2022).
Shuto, Y. et al. Structural basis for pegRNA-guided reverse transcription by a prime editor. Nature 631, 224–231 (2024).
Choi, J. et al. Precise genomic deletions using paired prime editing. Nat. Biotechnol. 40, 218–226 (2022).
Anzalone, A. V. et al. Programmable deletion, replacement, integration and inversion of large DNA sequences with twin prime editing. Nat. Biotechnol. 40, 731–740 (2022).
Kim, H. K. et al. Predicting the efficiency of prime editing guide RNAs in human cells. Nat. Biotechnol.
39, 198–206 (2021).
Li, Y., Chen, J., Tsai, S. Q. & Cheng, Y. Easy-Prime: a machine learning-based prime editor design tool. Genome Biol. 22, 235 (2021).
Mathis, N. et al. Predicting prime editing efficiency and product purity by deep learning. Nat. Biotechnol. 41, 1151–1159 (2023).
Yu, G. et al. Prediction of efficiencies for diverse prime editing systems in multiple cell types. Cell 186, 2256–2272 (2023).
Liu, F. et al. Design of prime-editing guide RNAs with deep transfer learning. Nat. Mach. Intell. 5, 1261 (2023).
Koeppel, J. et al. Prediction of prime editing insertion efficiencies using sequence features and DNA repair determinants. Nat. Biotechnol. 41, 1446–1456 (2023).
Li, G.-M. Mechanisms and functions of DNA mismatch repair. Cell Res. 18, 85–98 (2008).
Mathis, N. et al. Machine learning prediction of prime editing efficiency across diverse chromatin contexts. Nat. Biotechnol. https://doi.org/10.1038/s41587-024-02268-2 (2024).
Chen, P. J. et al. Enhanced prime editing systems by manipulating cellular determinants of editing outcomes. Cell 184, 5635–5652 (2021).
Ferreira da Silva, J. et al. Prime editing efficiency and fidelity are enhanced in the absence of mismatch repair. Nat. Commun. 13, 760 (2022).
Park, J.-C. et al. MutSα and MutSβ as size-dependent cellular determinants for prime editing in human embryonic stem cells. Mol. Ther. Nucleic Acids 32, 914–922 (2023).
Park, J.-C., Uhm, H., Kim, Y.-W., Oh, Y. E. & Bae, S. AI-generated small binder improves prime editing. Preprint at bioRxiv https://doi.org/10.1101/2024.09.11.612443 (2024).
Watson, J. L. et al. De novo design of protein structure and function with RFdiffusion.
Nature 620, 1089–1100 (2023).

Acknowledgements
This work was supported by the National Research Foundation of Korea (NRF) grants funded by the Korea government (MSIT) (RS-2023-00210965 and RS-2024-00339116) and the Korea Institute of Science and Technology (KIST) Institutional Program (2V10573 and 2E33791). Figures were created using BioRender.com.

Author information
Author notes
These authors contributed equally: Min-gyeong Kim, Min-ji Go.

Authors and Affiliations
Biomedical Research Division, Korea Institute of Science and Technology, Seoul, Republic of Korea
Min-gyeong Kim, Min-ji Go, Soo-hwan Jeong & Kayeong Lim
Division of Bio-Medical Science and Technology, KIST School, University of Science and Technology, Seoul, Republic of Korea
Min-gyeong Kim, Min-ji Go, Soo-hwan Jeong & Kayeong Lim
Graduate School of Biomedical Science and Engineering, Hanyang University, Seoul, Republic of Korea
Seung-Hun Kang

Corresponding author
Correspondence to Kayeong Lim.

Ethics declarations
Competing interests
The authors declare no competing interests.

Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.