Revolutionising English language education: empowering teachers with BERT-LSTM-driven pedagogical tools


Introduction

One of the current concerns in higher education is training educated, skilled, and highly experienced university graduates, for whom proficiency in a foreign language is a prerequisite set by companies (Oksana et al., 2020). A competitive graduate needs to be able to speak freely in another language, not just informally, but also to obtain the information required for professional activities (Ganikhanova et al., 2020). They also need to be able to construct monologues and dialogic speech in straightforward communication scenarios across various communication domains and to write formal business letters that adhere to official English language conventions. Developing educational and methodological frameworks that target building essential competencies for foreign language interaction is a key strategy for addressing this issue. Many language universities are currently working on large-scale projects to provide instructional materials to the next generation of students. The writing of teaching aids and methodological manuals began with the collection and synthesis of scientific articles by contemporary researchers who emphasised the challenges of creating instructional aids, with a focus on the primary requirements that should be taken into account during their creation (Gabarre and Gabarre, 2020). For instance, students in non-linguistic areas preparing for Pre-Intermediate and Intermediate levels are meant to use the educational and methodological manual “Professional English Language”, which covers the disciplines of “Business English” and “Professional English Language” in educational technology (Alla et al., 2020).
The manuscript focuses on improving the method of educating business and technical English students in non-linguistic training areas; the development of students’ linguistic, professional, and communicative competencies; the ability to establish and sustain business contacts; and an account of how English is used as a tool for interaction and communication in the dialogue of modern world cultures, all of which contributed to the manuscript’s need for publication (Varlakova et al., 2023). This instructional resource conforms to a specified publishing type. It offers methodological materials to educators and learners for skilfully constructing classroom assignments, efficiently organising individual work, and preparing for practical business and professional English lessons (Zubkov, 2023). Any public institution that wants to implement managerial change must prioritise employee empowerment. With its strong propensity to allow managers to manage, it has been significant in the New Public Management (NPM) movement and is regarded as essential to solving organisational issues (Meuleman, 2021). Major accountability reforms in the field of public education, such as the Every Student Succeeds Act (ESSA) and No Child Left Behind (NCLB), have also emphasised the need for teacher empowerment to improve the calibre of frontline public services (Richerme, 2020). This is especially true when it comes to teachers’ use of performance information. Research dating back to the founding of the human relations school demonstrates how empowerment at work fosters both individual and organisational development in various settings. Empowerment improves organisational performance, fosters more creative behaviour, gives employees a stronger voice, increases job satisfaction, enhances commitment, and reduces the intention to leave public organisations. Teachers have a significant impact on young people’s mindsets, which helps to shape society.
Nonetheless, the shortage of teachers is a global issue that affects educational institutions worldwide. There are several obstacles to teaching English. It can be challenging for teachers to create lessons that cater to every student because of the wide range of learning styles and skill levels among them (Pan et al., 2023). Many schools lack access to high-quality instructional resources and technologies. Teachers often lack sufficient training in cutting-edge teaching techniques and technology, which, together with burnout caused by excessive workloads, can negatively impact their effectiveness as educators. Feedback and assessment are equally troublesome. Standardised assessments can reduce tailored feedback and cause stress in students, and they frequently lack useful instruments for continuous evaluation. It may not be easy to maintain students’ motivation and engagement when educational materials are culturally inappropriate. It is increasingly challenging to provide effective instruction when addressing behavioural issues and teaching large classes, making it difficult to offer one-on-one instruction (Stevenson et al., 2020). Retention of traditional curricula that do not consider language use in the contemporary world also limits the adoption of innovative pedagogy. Technology integration in the classroom can help to address many of these issues. Technology tools help individualise instruction for different learning needs and styles. Learning can become more meaningful and engaging by becoming more interactive and by using digital resources and materials to increase engagement and motivation (El-Sabagh, 2021). Technology provides instructors with avenues to access materials for professional development and informs them of innovative strategies and resources. Reliance on standardised testing can be reduced by utilising advanced testing systems that provide instant feedback on students’ progress along with tailored feedback (Zimmer & Matthews, 2022).
Apart from offering interactive activities that keep students on track, technology allows instructors to manage their classes (Heilporn et al., 2021). The ability to tailor an easy-to-edit virtual curriculum to reflect prevailing language and cultural settings maximises learning value and efficiency. Furthermore, technology can potentially reduce socioeconomic disparities by offering easy-to-use learning tools to students from varying socioeconomic backgrounds. Ultimately, by supporting teachers and complementing students’ learning processes, incorporating technology into pedagogy addresses current problems and enhances English language teaching. AI and ML are transformative technologies in most sectors, including education. Their application in education can transform conventional teaching models, improve educational opportunities, and facilitate student achievement (Kuleto et al., 2021). The capacity of AI and ML to process extensive amounts of information, recognise patterns, and predict results can be utilised to build more efficient and customised learning interfaces. These technologies provide new solutions to old questions in language education. Conventional English language teaching is not very effective at accommodating the varied demands of groups of learners, providing timely feedback, offering personalised attention, or fostering learner engagement. AI and ML can overcome such constraints by providing interactive teaching aids, automated grading, and adaptive learning spaces that support multiple learning speeds and styles. Notable achievements of the AI branch of Natural Language Processing (NLP) include the creation of advanced models such as Long Short-Term Memory networks (LSTM) and Bidirectional Encoder Representations from Transformers (BERT). BERT excels at sentence-level word-sense disambiguation and can be applied to most language-oriented tasks.
LSTM, by contrast, processes sequential data and captures long-term dependencies and context over time. By working together, these models can leverage their complementary capabilities to construct useful tools that facilitate language acquisition and understanding. The main contributions of this study to the investigation of the hybrid BERT-LSTM model’s ability to transform English language instruction are as follows:

- The BERT model was used as a contextual pre-trained model to improve the effectiveness of language programmes and was combined with LSTM to handle sequential data, enhancing the delivery of educational language support.
- Developing user-friendly pedagogical tools that incorporate the hybrid model to assist teachers in various aspects of language teaching, such as comprehension, writing, and vocabulary building.
- Providing custom learning support, drawing on over 16 million student exercise attempts, which allows for individual accommodation depending on learning preferences and difficulties.
- Combining BERT and LSTM as strategies for language support uses the strengths of the two models to capture contextual information and analyse sequences, yielding enriched educational tools.
- Exploring how these tools can support teachers in delivering more personalised, engaging, and effective instruction.

The adaptive feedback provided by the BERT and LSTM models aligns with Krashen’s Input Hypothesis, which posits that language learners acquire language most effectively when they are exposed to input slightly beyond their current level of competence (i + 1) (Krashen, 1985). By offering instantaneous corrections and vocabulary suggestions tailored to individual learner performance, the system ensures that comprehensible input is consistently delivered.
Moreover, the interactive design of the tools supports a constructivist learning environment in which learners actively engage with content, receive scaffolded guidance, and construct knowledge through meaningful linguistic interactions (Vygotsky, 1978). This theoretical alignment reinforces the relevance of the proposed system in contemporary pedagogical approaches to language education. The remainder of this paper is organised as follows. Related works presents a comprehensive review of previous studies conducted in the same area. Problem statement identifies the gaps this study addresses, after which the proposed methodology, including data pre-processing, is described. The results of this study are discussed in Results & Discussion, and summative ideas and conclusions are provided in Conclusion and Future Scope.

Related works

Using their midterm test results as primary data, Yağcı (2022) proposed a novel model based on machine learning (ML) algorithms to predict undergraduate students’ final exam grades. The dataset consisted of the academic performance grades of 1854 students enrolled in the Turkish Language-I course at a Turkish public institution during the fall semester of 2019–2020. The results show that the proposed model achieves a classification accuracy of 70–75%. The forecasts were constructed using only three categories of predictors: midterm exam grades, professors, and departments. Such data-driven studies are essential for informing decision-making and creating a framework for learning analytics in higher education. One of the weaknesses of this study is its reliance on a limited set of input characteristics to predict final test scores, more specifically, the results from intermediate examinations, departmental data, and staff data. A CNN, an RNN, or a combination of the two was used in the architecture proposed by Diment et al. (2019).
A tree-structured Parzen estimator was used in the Bayesian optimisation context to determine the ideal topology and hyperparameters. A dataset of 80 naturally occurring words from 120 speakers was used. When tested on a representative sample of the dataset, the proposed technique performed satisfactorily, identifying pronunciation issues in 46 of 49 terms with a high level of efficiency. This technique can handle pronunciation mistake detection without the need for large datasets or manually developed features. While the practicality of pronunciation detection systems in the real world hinges on their ability to withstand environmental influences such as noise levels and changes in recording equipment, implementing these tactics could lead to increased computational demands and a compromise between the efficiency and effectiveness of the models. Hwang et al. (2024) developed Smart English software to facilitate English as a foreign language (EFL) conversation exercises in authentic contexts. These discussion activities were divided into “designed talk” and “free talk” categories based on the content of an English textbook and a realistic ambient setting, which incorporates factors such as transportation, climate, and scenic descriptions. Smart English’s methods were developed around flexible, long-lasting, and adaptable talks, with the aim of improving English speaking and lexical resources. Statistical analysis revealed that the experimental group demonstrated greater proficiency in learning than the control group. The interview transcripts revealed that participants felt that practising conversations in real-world settings using Smart English could pique their curiosity and inspire them to think and speak English more effectively. Hence, Smart English can benefit people who converse in English and engage in natural conversations anywhere.
A disadvantage of the Smart English software is that it does not cater to the varying proficiency levels and individual learning approaches of all users. Although the software is intended to allow the practice of conversation in realistic conditions, it may not provide personalised feedback or adaptive learning paths tailored to each student’s strengths and areas of difficulty. Akhtar et al. (2020) used a CNN for pronunciation error analysis of Arabic words. Their study developed three machine learning classifiers that automatically identify pronunciation errors using a CNN feature-based approach that extracts features from layers 6, 7, and 8 of AlexNet. The study also employs an independent feature extraction and classification technique based on transfer learning. A comparative analysis of these approaches against a conventional ML-based approach with MFCC features is presented to test the effectiveness of the proposed approach. The same three classifiers, KNN, SVM, and RF, were employed in the baseline approach for mispronunciation detection. Based on transfer learning techniques, deep feature extraction from AlexNet, and handcrafted features, Arabic words were identified with average accuracies of 73.67%, 85%, and 93.20%, respectively, according to the experimental findings. Additionally, the recommended feature selection method achieved the best average accuracy (93.20%) compared with all other approaches. The drawback is that processing efficiency and speed may be challenging owing to the computational demands of deep CNNs. Li Ma (2021) aimed to enhance students’ English learning abilities by examining collegiate English teaching with virtual reality, artificial intelligence, and machine learning in an immersive setting. In a teaching experiment comparing two distinct groups of first-year university students, the experimental class used VR technology to facilitate immersive virtual context learning from a constructivist perspective.
By contrast, the control class used standard multimedia tools and conventional teaching techniques. With an average score 2.8 points higher, the experimental class’s overall English level exceeded that of the control class. In the classroom, teachers occupy the majority of the time; students only quietly receive knowledge from teachers, have little opportunity to participate in the exchange of knowledge and convey ideas in the target language, and are “immersed” in the Chinese environment most of the time. The disadvantage is that teachers occupy most of the classroom time, which means that students have limited opportunities to actively engage, share information, and express their thoughts in the target language. Changes in the educational system indicate that science and technology are related. For pupils to become engaged and show interest in the subject, mentors must use such technologies when instructing. Studies have been conducted to demonstrate how aspiring science teachers perceive the changes in education brought about by Industry 4.0. The information was gathered using a three-section survey. This survey was created to determine teachers’ future capabilities, how education is changing, and how proficient teachers are with Information and Communication Technology (ICT). Sixty-six prospective science instructors participated in the in-class teaching as part of this purposive sampling study. The questionnaire had a Cronbach’s alpha value of 0.825, according to the study’s findings, and the prospective teachers showed a positive attitude towards adjusting to Industry 4.0. Thus, by employing a range of 12 technologies, students can work on various projects (Sari and Wilujeng, 2020). Sun et al. (2020) developed an online smart English teaching system with deep learning (DL) support. It offers useful information from vast amounts of data, condenses rules and information, and aids educators in raising students’ English proficiency.
The thought process of the AI expert system is reflected in this system. The test application demonstrates how the system enhances the relevance of the learning materials and helps students become more efficient. In addition, the system includes a referential definition and provides an example model that utilises comparable techniques. The disadvantage is that although these technologies are capable of offering customised suggestions based on data, they may fail to capture the complex and dynamic elements of human learning, such as motivation, emotional involvement, and unique learning preferences, which are difficult to measure. Several drawbacks have been identified in earlier studies on Machine Learning (ML) and Deep Learning (DL) applications in education. First, a model for estimating final test scores may have overlooked other important variables that influence academic success because it relies on a limited number of input parameters, including midterm grades, departmental data, and staff data. Despite its success, a pronunciation detection method suggested in a different study may encounter difficulties due to computational demands and external factors, such as noise levels. Personalised feedback and adaptive learning paths were not available in the English conversation practice programme because it was unable to accommodate a wide range of competence levels and unique learning requirements. Teacher-centred classroom dynamics hindered pupil participation and active use of the target language, according to a study that employed virtual reality technology for immersive English learning.
Finally, even with data-driven individualised suggestions, an online English teaching system with DL support may fail to capture the dynamic and complicated components of human learning, such as motivation and emotional involvement. Table 1 compares the advantages and limitations of these educational AI and ML-based methods.

Table 1 Comparison of Educational AI and ML-based Methods: Advantages and Limitations.

Problem statement

Research on the application of ML and DL in education has made significant progress, but some shortcomings remain. Forecasting models of academic achievement, such as that of Yağcı (2022), rely on a few parameters (department, faculty, and midterm grades) without including significant parameters such as attendance, class participation, and assignments; therefore, a more comprehensive model is needed. Pronunciation detection models (Diment et al., 2019; Akhtar et al., 2020) are highly accurate but suffer from practical deployment issues, such as noise sensitivity, variability in recordings, and computational costs, and should be comprehensively tested under various circumstances. Adaptive learning platforms, such as Smart English (Hwang et al., 2024) and AI-driven platforms (Sun et al., 2020), increase participation but do not provide fully personalised learning experiences, with no immediate feedback processes or adaptive paths created for individual learners. Similarly, VR-based immersive learning (Li Ma, 2021) increases language acquisition but is instructor-centred, limiting active student participation and emphasising the need for more interactive, student-controlled VR models. In addition, work on the training of teachers for Industry 4.0 (Sari and Wilujeng, 2020) is promising with regard to adoption patterns, but it is based on small, subjective data sets and thus calls for larger, objective evaluations.
To overcome these issues, this study aimed to develop an end-to-end machine learning (ML) model based on diverse learning and behavioural data, enhance the real-world robustness of pronunciation detection, and construct an adaptive learning system that incorporates personalised feedback along with interactive virtual reality (VR) environments to enable active learning.

Proposed BERT and LSTM models application in English language education

A systematic approach was employed in this study to apply advanced NLP models and to improve English learning. The process began with data sampling, which involved more than 16 million exercise attempts conducted by over 72,000 learners. The collection of such a large amount of data allows one to observe various peculiarities of students’ performance related to their interactions with learning materials, thereby forming a strong foundation for analysing multiple types of learning processes. First, data cleaning was performed, which involved eliminating missing values, removing outliers, and normalising the dataset. Training and test sets were formed from the dataset, and all input values were normalised using the min-max normalisation technique, which improved the model’s performance and resulted in faster convergence. The two main concepts at the centre of the applied method were the construction and training of the BERT and LSTM models. BERT is well suited to the educational tasks of grammar correction and essay scoring because it models both forward and backward context. At the same time, LSTM models are employed for the targeted recognition of grammatical mistakes as well as the evaluation of an essay’s quality, benefiting from the network’s capacity to identify long-term dependencies in the data.
To implement these models in a holistic system, APIs must be constructed for data manipulation and prediction generation, together with performance enhancements and graphical user interfaces for educators and learners. The goal of this integrated platform is to provide detailed, real-time feedback and to substantially enhance the educational process by offering support to both teachers and students. Figure 1 provides a high-level conceptual overview of the methodology, illustrating the flow from data collection to the final integrated platform that provides educational tools and instantaneous feedback.

Fig. 1 Conceptual overview of the methodology.

Data collection

The Junyi Academy Foundation (2024) curated this dataset, a noteworthy endeavour to enhance online learning research and practice. The collection includes over 16 million exercise attempts, painstakingly recorded from over 72,000 pupils. These logs provide an extensive record of how students interact with instructional materials, illustrating a range of engagement and learning styles. The goal of this foundation is to enable educators and researchers to investigate and create novel approaches to individualised learning experiences by making this abundance of data accessible. The data was split into 80% for training (~12.8 million samples), while 10% each (~1.6 million samples) was allocated for validation and testing. This programme fosters multidisciplinary collaboration to shape the future of online learning while also supporting initiatives that promote fair and high-quality education through technology. Samples from the dataset are listed in Table 2.

Table 2 Sample Table from the Dataset.

Data pre-processing

Data cleaning

Managing outliers and missing values is the first step in data analysis and preparation.
The first step in addressing missing values is to determine whether each table is complete. Rows with few missing values were eliminated, and, based on the distribution of the data, missing values in the numerical data were imputed using the mean, median, or mode. Missing categorical values were imputed with the most common category. The robustness and integrity of the dataset for upcoming analysis and modelling activities are ensured by identifying and treating outliers, which may appear as abnormally extended exercise completion times, using statistical approaches.

Data transformation

When min-max normalisation is employed, the input features are scaled to a standard range, usually between 0 and 1. BERT-LSTM benefits from this normalisation technique because it stabilises the training process and enhances convergence. The method guarantees that the lowest and highest values are mapped to 0 and 1, respectively, and that all other values are rescaled linearly within this range. The normalisation process transforms each data point separately by subtracting the column’s minimum value from it and dividing by the column’s range. The min-max normalisation is shown in Eq. (1).

$${Z}_{{norm}}=\frac{Z-{Z}_{\min }}{{Z}_{\max }-{Z}_{\min }}$$(1)

Here, Zmax and Zmin are the largest and smallest values of the feature in the dataset, respectively, and Z is the initial feature value. This ensures that the values in between are linearly scaled, with the minimum and maximum values set to 0 and 1, respectively.
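Eq. (1) can be sketched in a few lines of plain Python; the function name and the example completion times are illustrative, not part of the study's codebase.

```python
# Minimal sketch of the min-max normalisation in Eq. (1).

def min_max_normalise(values):
    """Rescale a list of numbers linearly into [0, 1]."""
    z_min, z_max = min(values), max(values)
    if z_max == z_min:               # guard against a zero range
        return [0.0 for _ in values]
    return [(z - z_min) / (z_max - z_min) for z in values]

# Example: exercise completion times in seconds;
# the minimum maps to 0.0 and the maximum to 1.0.
times = [30, 45, 60, 120]
print(min_max_normalise(times))
```

In practice this transformation is applied per feature (column), using the minimum and maximum computed on the training split only, so that no information from the test set leaks into training.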
In particular, this normalising technique is beneficial in cases where the features have varying scales because it guarantees consistency across the features and enhances the performance of the machine-learning model during training.

Data splitting

The dataset was split into training and testing sets to evaluate the model performance. Cross-validation can be considered for a more robust evaluation, especially when the dataset is sufficiently large to support it.

Model development for BERT-LSTM

BERT was pre-trained to enhance the performance and quality of the NLP models. The core structure of BERT is a deep stack of transformer encoder layers. Each of the 12 encoder layers has a hidden dimension of 768, and the multi-head self-attention in BERT uses h = 12 heads (Wang and Kuo, 2020). Using this design, BERT can determine the importance of a word based on its context in a document. To obtain generic language representations, the BERT models were initially pre-trained over large collections of text. During pre-training, BERT learned the nuances of English language usage. To capture the long-range dependencies of sequential data, such as sentences, BERT uses a transformer architecture. Training with both the left and right contexts produces bidirectional word representations, in contrast to other word-embedding models such as Word2Vec and GloVe. This allows BERT to produce contextualised word embeddings that consider the surrounding words to capture subtle semantic meanings. With the MLM and NSP objectives, BERT can predict masked words in sentences and whether two sentences are consecutive. There are also several BERT variants, such as BERT-Base and BERT-Large, each with distinctive features and functions optimised for use with multiple languages, alphabets, and layer sizes.
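The self-attention weighting at the heart of this architecture can be illustrated with a minimal sketch. BERT-Base uses 12 heads over 768-dimensional vectors; this plain-Python toy uses a single head over 2-dimensional vectors, so the numbers are purely illustrative.

```python
# Toy scaled dot-product attention for one query over a short sequence.
import math

def attention_weights(query, keys):
    """Softmax of scaled dot products: how strongly the query attends to each key."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# The query attends most strongly to the key most similar to it.
weights = attention_weights([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
print(weights)
```

In the full transformer, these weights are then used to form a weighted sum of value vectors, and the computation is repeated in parallel across all heads and positions.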
BERT’s capacity to encode input texts, token embeddings, and mutual conditioning for context awareness renders it a useful technique for NLP tasks (Feng et al., 2024). BERT is transformer-based, and its transformer architecture relies on self-attention mechanisms to determine the relative importance of words in a sentence. The attention mechanism enables BERT to learn long-range dependencies and word-order effects, which enhances its performance in context-sensitive tasks. BERT must be fine-tuned after pre-training to perform specific educational tasks well. Fine-tuning BERT involves adjusting the model parameters to meet specific task requirements, such as grading essays and checking grammar. By pursuing these objectives, BERT can be adapted from a general language model to a specific instructional tool. BERT was trained to identify and correct grammatical mistakes using annotated datasets with a focus on common grammatical errors. The datasets contained sentences annotated with corrections for common grammatical mistakes. Through this training, BERT learns to identify grammatical mistakes and suggest appropriate corrections. The model is a valuable tool for improving students’ writing, as it can understand sentence context, thereby making higher-quality grammatical corrections. Training BERT to evaluate essay quality is performed differently. This requires datasets that contain essays with grades provided by human evaluators using metrics such as coherence, structure, vocabulary use, and overall content. By exposing BERT to these graded essays, the model learns to recognise the most important features that determine essay quality. It becomes well-versed in assessing the coherence of concepts, the logical structure of content, and the richness of vocabulary.
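The iterative fine-tuning described above, in which parameters are repeatedly adjusted to reduce prediction error on graded examples, can be illustrated with a toy gradient-descent loop. The one-parameter model and the invented feature/grade pairs below are conceptual stand-ins, not the actual BERT fine-tuning procedure.

```python
# Toy illustration of iterative fine-tuning: a parameter is adjusted
# over many passes to minimise squared prediction error on labelled data.

# (feature, human_grade) pairs, e.g. a coherence score vs. an essay grade
data = [(0.2, 1.0), (0.5, 2.5), (0.8, 4.0), (1.0, 5.0)]

w = 0.0                       # single trainable parameter
lr = 0.1                      # learning rate
for epoch in range(200):      # "numerous iterations"
    for x, y in data:
        pred = w * x
        grad = 2 * (pred - y) * x   # d/dw of the squared error (pred - y)**2
        w -= lr * grad              # adjust the parameter against the gradient

print(round(w, 2))  # → 5.0, the least-squares slope for this toy data
```

Real fine-tuning applies the same idea to millions of parameters with backpropagation through the whole network, but the loop structure, prediction, error, gradient, and update, is the same.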
This enables BERT to provide informative comments and scores that accurately represent students’ overall writing skills. The fine-tuning process involves optimising the parameters of BERT through training on task-specific datasets. Through this training, the pre-learned BERT representations are specialised in recognising the features of educational tasks. The training comprises numerous iterations, in which the performance accuracy of BERT is tested and calibrated at each iteration. Through fine-tuning with narrowly targeted datasets, BERT becomes more accurate for tasks such as grammar checking and essay grading. This shift ensures that the model efficiently supports educational use by providing teachers with effective tools to improve student outcomes. In short, a BERT trained for scholarly work is a domain specialist that can detect grammatical mistakes and grade essays. That is, exact training on annotated data and tuning of model parameters to scholarly requirements will eventually equip teachers with sophisticated tools to enhance the quality of English language instruction. More complex recurrent neurons were used for the development of the LSTM. The recurrent neurons of an LSTM can be imagined as carrying a single cell state (Staudemeyer and Morris, 2019). Similar to an RNN, an LSTM determines its current state from its current input and its previous state. The forget, input, and output gates are the three gates employed by an LSTM to control the current neuron. An LSTM network can link current data with historical knowledge. An LSTM is equipped with an input gate, an output gate, and a forget gate. The current and previous hidden outputs are denoted by \({b}_{t}\) and \({b}_{t-1}\), the input is denoted by \({x}_{t}\), and \({C}_{t}\) and \({C}_{t-1}\) stand for the new and previous cell states, respectively. The LSTM input gate concept is demonstrated in Eqs. (2), (3), and (4).

$${j}_{t}=\sigma \left({Z}_{j}\cdot \left[{b}_{t-1},{x}_{t}\right]+{b}_{j}\right)$$(2)

$${\widetilde{C}}_{t}=\tanh \left({Z}_{c}\cdot \left[{b}_{t-1},{x}_{t}\right]+{b}_{c}\right)$$(3)

$${C}_{t}={f}_{t}{C}_{t-1}+{j}_{t}{\widetilde{C}}_{t}$$(4)

Within Eq. (2), a sigmoid layer filters the data points \({x}_{t}\) and \({b}_{t-1}\) to determine which of them should be added. Within Eq. (4), the candidate information \({\widetilde{C}}_{t}\) and the long-term stored state \({C}_{t-1}\) are combined into \({C}_{t}\). Here, \({\widetilde{C}}_{t}\) is a tanh output, while \({j}_{t}\) is a sigmoid output; \({b}_{j}\) stands for the bias of the LSTM input gate and \({Z}_{j}\) for its weight matrix. The dot product and sigmoid layer then decide which information passes through the forget gate of the LSTM. The decision to discard information from the previous cell state carries an associated probability. Eq. (5) determines whether the relevant data from the previous cell should be retained with a certain probability, where \({Z}_{f}\) symbolises the weight matrix, \({b}_{f}\) represents the offset, and σ represents the sigmoid function.

$${f}_{t}=\sigma \left({Z}_{f}\cdot \left[{b}_{t-1},{x}_{t}\right]+{b}_{f}\right)$$(5)

The output gate of the LSTM determines the required states through Eqs. (6) and (7), given the inputs \({x}_{t}\) and \({b}_{t-1}\). The final output is obtained by multiplying the output-gate decision vector \({P}_{t}\) by the new cell state \({C}_{t}\) passed through a tanh layer.

$${P}_{t}=\sigma \left({Z}_{o}\cdot \left[{b}_{t-1},{x}_{t}\right]+{b}_{o}\right)$$(6)

$${b}_{t}={P}_{t}\tanh \left({C}_{t}\right)$$(7)

Through supervised learning, the LSTM model was trained to identify common grammatical mistakes and provide accurate corrections, thus making it a more effective tool for improving student writing. For essay grading, the LSTM model is trained on graded essays according to predefined parameters, such as coherence, organisation, clarity of arguments, and use of evidence. By learning from these tagged datasets, the LSTM model can grade essays on these essential parameters. It can identify the logical progression of ideas, assess the structuring of content, and gauge the depth and clarity of arguments presented in essays.
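The gate equations above can be walked through with scalar weights in plain Python, keeping the paper's symbols (input gate j_t, forget gate f_t, output gate P_t, hidden output b_t, cell state C_t). The scalar weights and inputs are illustrative; a real LSTM uses weight matrices over concatenated vectors.

```python
# Scalar walk-through of the LSTM gate equations, one time step.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, b_prev, c_prev, Z, bias):
    """One LSTM step with scalar weights Z = (Zj, Zc, Zf, Zo)."""
    Zj, Zc, Zf, Zo = Z
    s = b_prev + x_t                       # scalar stand-in for concatenating [b_{t-1}, x_t]
    j_t = sigmoid(Zj * s + bias)           # input gate
    c_tilde = math.tanh(Zc * s + bias)     # candidate cell state
    f_t = sigmoid(Zf * s + bias)           # forget gate
    c_t = f_t * c_prev + j_t * c_tilde     # new cell state: keep part of the old, add new
    p_t = sigmoid(Zo * s + bias)           # output gate
    b_t = p_t * math.tanh(c_t)             # hidden output
    return b_t, c_t

b_t, c_t = lstm_step(x_t=1.0, b_prev=0.0, c_prev=0.0, Z=(0.5, 0.5, 0.5, 0.5), bias=0.0)
print(b_t, c_t)
```

Because the gates are sigmoid outputs in (0, 1), the forget gate smoothly interpolates how much of the previous cell state survives, which is what lets the network carry information over long sequences.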
LSTM model training is an iterative learning process over labelled datasets, in which the model parameters are continuously adjusted to minimise prediction errors and enhance performance on the given tasks. Through this repeated training, the LSTM model became proficient at executing specific educational tasks, such as grammar checking and essay grading, with a high degree of accuracy. Careful planning of the architecture and rigorous training on relevant learning tasks have made the LSTM model a useful tool for promoting language learning. Its ability to capture long-term dependencies and understand sequences makes it highly effective at providing accurate grammatical corrections and insightful essay analysis, making it a central component of the learning process. Figure 2 shows the architectural diagram of the BERT-LSTM, and its flow chart is shown in Fig. 3. The essay feedback system, covering grammar, coherence, and vocabulary analysis, is shown in Fig. 4.

Fig. 2 BERT-LSTM Architectural Diagram.
Fig. 3 Flow Chart of BERT-LSTM.
Fig. 4 Essay Feedback System: Grammar, Coherence, and Vocabulary Analysis.

Algorithm for BERT-LSTM
Require: Dtrain, Dtest, and the BERT-LSTM design hyperparameters.
Pre-processing: outlier handling, missing-data imputation, and min-max normalisation for the hybrid BERT-LSTM model.
BERT component:
Input layer: tokenise and encode the input text for BERT.
BERT layer: apply a pre-trained BERT model to extract contextualised embeddings from the input text.
Dropout 1: apply dropout with a rate of 0.1 to the BERT embeddings.
LSTM component:
Layer 1: apply an LSTM layer with 128 units, with return sequences enabled.
Layer 2: apply an LSTM layer with 32 units, with return sequences disabled.
Concatenate the outputs of the BERT and LSTM components.
Additional dense layers:
Dense layer 1: apply a dense layer with 64 units and ReLU activation.
Dropout 2: apply dropout with a rate of 0.3.
Dense layer 2: apply a dense layer with 32 units and ReLU activation.
Output layer: apply a dense layer with a softmax activation function and the desired number of output classes.
Compile the model: specify the optimiser (Adam with a learning rate of 0.001), the loss function (categorical cross-entropy), and the evaluation metric (accuracy).
Train the model: train the hybrid model on Dtrain with a validation split, tracking performance over a predetermined number of epochs.
Evaluate the model: evaluate the model’s performance on Dtest using the specified metrics; modify the hyperparameters as required based on the performance measurements, then retrain and re-evaluate the model to maximise efficiency.

Integration of BERT and LSTM Models into a Unified Framework
The first task of integration is to develop Application Programming Interfaces (APIs) or libraries that allow input data to be processed and output predictions to be generated for both the BERT and LSTM models. APIs act as translators that let dissimilar software units communicate with the models without understanding the complexities of each model’s inner workings. With well-defined APIs, instructors and developers can directly submit students’ writing samples, obtain grammar corrections, and receive essay scores. The compatibility and interaction efficiency of the BERT and LSTM models must also be addressed, which raises several technical and design considerations. A system is needed for controlling data flow between the models: for example, a student’s essay is processed first by the BERT model for grammar verification and then by the LSTM model for marking, so a smooth and logical sequence of actions must exist. Standardise the input and output formats so that information can be easily shared across models.
This can be achieved using common data structures that both models can read and process. The system must be tuned to reduce latency and enhance performance; parallel processing or high-performance computing resources can be employed so that the system responds to user inputs as quickly as possible, making it feasible for real-time classroom use. Strong error-handling mechanisms compensate for possible failures in the model interaction, keeping the system stable and usable even when exposed to errors or atypical inputs. The ultimate goal is to combine the BERT and LSTM models into a unified educational platform for learning English. The platform offers simple interfaces for both teachers and learners, as detailed below.

Student Portal: A portal where students can upload their assignments, receive immediate feedback on essay quality and grammar, and access personalised learning materials to help them develop language ability. Feedback from the models helps students identify their errors and learn how to correct them.

Teacher Dashboard: A main interface through which teachers can upload students’ assignments, view detailed reports on essay quality and grammar, and monitor student progress over time. The dashboard can also provide insights and recommendations derived from the model analysis.

By integrating the BERT and LSTM models into a unified framework, the educational platform harnesses the strengths of both models to provide a powerful, multifaceted tool for English language education. This integration not only enhances the capabilities of the individual models but also creates a synergistic effect that significantly improves the overall learning and teaching experience.

Pedagogical tool design
The scoring rubric includes grammar, coherence, vocabulary, organisation, and content relevance, all of which are scored separately.
The essays were scored on a five-point Likert scale (1 = poor, 5 = excellent) for each category. Grammar is not scored by strict error counting but holistically, based on the impact of errors on understanding. The AI model follows the same assessment criteria as the human raters, with readability and meaning taking precedence over raw error detection. The pedagogical tool design, utilising BERT and LSTM, aims to provide teachers with tangible and user-friendly materials. These tools focus on enhancing various aspects of student writing and language learning.

For each category of grammar, coherence, vocabulary, organisation, and content relevance, a five-point scoring rubric was prepared to make the annotations consistent and professional, with each step of the scale given a specific meaning. For grammar, a score of 1 indicated many errors that caused confusion; 2, numerous errors that reduced clarity; 3, some errors that did not prevent the text from being understood; 4, a few minimal errors; and 5, no grammatical mistakes. Refined descriptors were created for each of the remaining categories to ensure consistency across raters. Each essay was rated separately by two English language experts trained in rubric scoring. For each criterion, the scores from both raters were compiled and compared. If the gap between the two scores for an attribute was one point or lower, the rounded average was used as the final label for training. When the difference exceeded one point, the raters met to discuss and jointly assess the essay. This process made the labels easier to compare and agree upon, which improved the training of the BERT and LSTM models.
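The two-rater labelling protocol described above can be expressed as a small helper function. This is a hypothetical sketch of the rule (average and round when the raters differ by at most one point, otherwise flag the essay for joint adjudication); the function name and return convention are illustrative, not part of the study’s published code.

```python
def resolve_label(score_a: int, score_b: int):
    """Merge two rater scores (1-5 Likert) into a training label.

    If the raters differ by at most one point, the rounded average
    becomes the final label; otherwise the essay is flagged for the
    joint adjudication session described in the protocol.
    Note: Python's round() uses banker's rounding on .5 averages.
    """
    if abs(score_a - score_b) <= 1:
        return round((score_a + score_b) / 2), False  # label, no discussion needed
    return None, True  # no label yet; raters must adjudicate jointly

# Close scores are averaged; distant scores are escalated for discussion
label, escalate = resolve_label(4, 4)
far_label, far_escalate = resolve_label(2, 5)
```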
The key tools and their design considerations are listed in Table 3.

Table 3 Pedagogical Tools Overview.

Design Considerations
Table 4 provides a structured overview of the pedagogical tools designed using BERT and LSTM capabilities, highlighting their functionalities, features, and user-interface considerations.

Table 4 Design Considerations.

Results and Discussion
This section discusses the efficacy, precision, and reliability of the developed models relative to conventional techniques. Processing time, BERT’s precision, recall, and F1 score, and the accuracy and MSE of the LSTM were among the parameters used to assess the models. Additionally, comparative analysis demonstrated the effectiveness of these models in providing reliable and consistent feedback, which is essential for improving instructional frameworks. A Windows 10 operating system and the Python programming language on a standard machine were used to explore these dimensions.

Model performance
The dataset was used with the BERT and LSTM models, yielding significant improvements in essay quality assessment and grammatical error detection. An in-depth examination of the performance metrics and the impact of integrating these advanced NLP models into the educational process is given in the subsequent sections.

Grammar error detection
The BERT model was trained on a large corpus of sentences containing different types of grammatical errors.
Performance was measured using the following parameters:
Precision: the number of correctly identified errors divided by the total number of errors flagged by the model.
Recall: the number of correctly identified errors divided by the total number of errors present.
F1 Score: the harmonic mean of precision and recall, a balanced measure of model performance.
The values given in Table 5 show high recall and precision, reflecting the effectiveness of the BERT model in accurately identifying and rectifying grammatical mistakes. The same can be observed graphically in Fig. 5.

Table 5 BERT Model Performance Metrics for Grammar Error Detection.
Fig. 5 Model Performance of BERT.

Essay quality assessment
The LSTM model was trained on essays graded by human evaluators. The model’s predictions were compared directly with the scores given by the human evaluators to assess how closely the system aligned with expert judgement. The LSTM model demonstrated high accuracy and low MSE, suggesting that it can reliably assess essay quality. Table 6 shows the excellent accuracy (0.95) and low MSE (0.09) of the LSTM model, indicating its dependability in evaluating essay quality. These indicators suggest that the LSTM model can score essays accurately and consistently; its low mean squared error (MSE) demonstrates how closely its scores match human grading. This is illustrated in Fig. 6.

Table 6 Comparative Performance Analysis.
Fig. 6 Model Performance of LSTM.

Table 7 illustrates how the LSTM model for essay grading and the BERT model for grammar checking outperform traditional methods in terms of speed and consistency. The BERT model’s high scores indicate consistent and reliable error detection.
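The precision, recall, and F1 definitions above, together with the MSE used for essay scoring, can be computed directly from raw counts. The following is a minimal sketch; the counts and score lists shown are illustrative, not the paper’s evaluation data.

```python
def grammar_metrics(true_pos, false_pos, false_neg):
    """Precision, recall, and F1 for grammar-error detection:
    precision = correct detections / all flagged,
    recall    = correct detections / all actual errors,
    F1        = harmonic mean of the two."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

def mse(predicted, human):
    """Mean squared error between model essay scores and human grades."""
    return sum((p - h) ** 2 for p, h in zip(predicted, human)) / len(predicted)

# Illustrative counts (hypothetical, not the study's results)
p, r, f1 = grammar_metrics(true_pos=92, false_pos=6, false_neg=8)
```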
The LSTM model provides stable and accurate essay reviews owing to its high accuracy and minimal error, as indicated by the MSE.

Table 7 LSTM Model Performance Metrics for Essay Quality Assessment.

The training and validation MSE across the epochs of the proposed LSTM model are shown in Fig. 7: both errors initially decrease before converging, with a slight divergence. Alongside the training MSE, the validation MSE also showed possible overfitting behaviour as model complexity increased over subsequent epochs.

Fig. 7 Model Performance of LSTM.

Table 8 presents the 95% accuracy rates of the proposed BERT and LSTM models across different types of grammatical mistakes, implying that the models are effective at detecting and correcting these mistakes. This is illustrated in Fig. 8.

Table 8 Accuracy Rates for Different Kinds of Grammatical Mistakes.
Fig. 8 Error-wise Accuracy of Grammar Error Subtypes.

Figure 9 shows the relationship between the number of English words and the average time required for task completion across different network types. As the number of words increases, the time required increases for all network types.

Fig. 9 Average Time Needed for Task Completion.

Comparative analysis
The performance of the BERT and LSTM models was compared with that of traditional rule-based grammar checkers and manual essay grading. The comparison focused on several key aspects.
Speed: the time taken to process and evaluate student submissions.
Accuracy: the correctness of grammar corrections and essay scores.
Consistency: the uniformity of the feedback provided to students.
The comparative average processing times, in seconds, for the models, covering grammar checking and essay grading, are shown in Fig. 10 and Table 9.
Of the models assessed, rubric-based essay grading showed the longest processing time, whereas BERT showed the shortest, indicating differences in computing needs.

Fig. 10 Comparative Average Processing Times across Models.
Table 9 Performance Comparison with Traditional Models.

Table 10 demonstrates that the proposed method outperforms GPT-based models in terms of processing speed, explainability, computational efficiency, feedback consistency, and scalability, making it the most suitable approach for educational applications that require structured and interpretable assessment. Although Transformer-based models achieve comparable performance, they require more fine-tuning than our model, making our model more cost-effective and practical for real-world applications.

Table 10 Performance Evaluation of the Proposed Model vs. GPT-based and Transformer-based Models.

While the proposed BERT-LSTM framework demonstrates superior performance in terms of accuracy and processing speed, a practical usability comparison with established educational tools, such as Grammarly, Criterion, and Write & Improve, would further substantiate its classroom applicability. These platforms are widely adopted by educators and learners for grammar correction and writing feedback, making them critical benchmarks for evaluating the user experience. In future work, integrating direct user testing and feedback from teachers and students in live instructional settings would provide valuable insights into the system’s intuitiveness, reliability, and pedagogical impact.
Initial classroom trials yielded positive results, with teachers noting improvements in feedback efficiency and students reporting increased engagement and clarity in writing tasks.

Discussion
The comprehensive evaluation of the BERT and LSTM models highlights their superior performance in grammatical error detection and essay quality assessment compared with traditional methods (Ariffin and Tiun, 2024; Uto, 2021). The BERT model, fine-tuned on a diverse dataset of annotated sentences, achieved high precision (0.94) and recall (0.92), ensuring reliable and consistent grammatical error detection, as shown in Table 5 and Fig. 5. Similarly, the LSTM model, trained on essays graded by human evaluators, demonstrated excellent accuracy (0.95) and a low mean squared error (MSE: 0.09), underscoring its dependability in essay evaluation. Table 6 highlights the models’ faster and more consistent processing compared with rule-based and manual approaches, while Table 8 and Fig. 8 present excellent accuracy levels across the different grammatical error categories, attesting to the strength and efficiency of the models in pedagogical design. The MSE curves for training and validation in Fig. 7 also verify the robustness of the LSTM model despite mild overfitting tendencies at increased complexity. The BERT and LSTM models offer several advantages over traditional approaches, including faster processing times, improved accuracy, and consistent performance. BERT achieves robust and precise detection of grammatical mistakes in 0.5 s, whereas the LSTM achieves trustworthy essay grading with low mean squared error and high accuracy. The models enhance the productivity and reliability of learning systems by providing uniform and accurate feedback to students.

Conclusion and future scope
The results of this research, particularly on essay assessment and grammatical error correction, demonstrate the effectiveness of the BERT and LSTM models in English language instruction.
The BERT model, trained on annotated datasets, marked grammatical mistakes with high precision, while the LSTM model graded essay content with high accuracy. A comparison with more conventional methods revealed substantial benefits in the consistency, speed, and accuracy of feedback on students’ work. This research demonstrates how advanced natural language processing methods can revolutionise education by providing students with immediate, personalised feedback that enhances their classroom performance. Opportunities for future research and development in this area are extensive. Expanding the range of applications beyond grammar and essay checking to other language learning areas, such as vocabulary learning and reading comprehension, would provide more comprehensive assistance to students. Incorporating multimodal inputs into the models, including audio and video essays, can enhance their capacity to evaluate different forms of student expression. Finally, there is scope for further personalised learning by integrating AI-based models into learning platforms, creating dynamic learning environments tailored to the individual needs of students.