Investigating the Impact of the Stationarity Hypothesis on Heart Failure Detection using Deep Convolutional Scattering Networks and Machine Learning


Introduction

According to the WHO, CVDs are the leading cause of death worldwide. Each year, around 17.9 million people die from these diseases. Most of these deaths are due to heart attacks caused by coronary heart disease and strokes caused by cerebrovascular conditions; together, these two account for 80% of CVD-related fatalities. Middle-aged and older adults are the groups most affected by these health issues1,2.

In the early 1900s, CVD accounted for less than 10% of global deaths. However, this figure had risen to 30% by 2001. The majority of CVD-related deaths, about 80%, now occur in low- and middle-income countries. In 2020, CVD became the leading cause of death and disability worldwide, with a particularly sharp increase in lower-income countries2. By 2001, CVD had already become the leading cause of death in developing countries, a trend that had been noted in developed nations since the mid-20th century3.

By 2030, researchers predict that non-communicable diseases will account for more than three-quarters of all deaths worldwide, meaning that chronic conditions such as CVD, diabetes, and cancer will become the primary causes of death globally, surpassing infectious diseases. Specifically, CVD is expected to be responsible for more deaths in low-income countries than infectious diseases such as HIV/AIDS, tuberculosis, and malaria combined, as well as maternal and newborn health issues and nutritional problems4. It is also estimated that around 23.3 million people will die from CVDs each year by 20305. In fact, the WHO has stated that CVDs will continue to play a dominant role in global mortality trends6.

The main cardiovascular diseases, which account for at least 80% of the CVD burden across all income regions, include three leading causes of illness and death: Ischemic Heart Disease (IHD)7, stroke8, and heart failure9. Early identification of arrhythmias is essential to prevent dangerous and worsening conditions.
Patients can receive immediate care if these arrhythmias are detected in time, which reduces the risk of complications and allows for more effective treatment10. The diagnosis of heart conditions is carried out by cardiologists or medical experts who analyze and interpret Electrocardiogram (ECG) signals to assess heart health.

Einthoven revolutionized the discipline of electrocardiography. He designed a triaxial bipolar system that incorporates three limb leads to record the electrical activity of the heart. His approach included Lead I, Lead II, and Lead III, which form the sides of an equilateral triangle known as Einthoven’s Triangle11. The ECG signal measures the electrical activity of the heart, relying on its conduction system. It is composed of five waves and complexes: P, Q, R, S, and T, each corresponding to a particular phase of the cardiac cycle12, as illustrated in Fig. 113.

Fig. 1 Morphology of an ECG heartbeat13.

ECG signals are widely used in the diagnosis of CVDs. With advancements in digital technology and the development of low-cost miniaturized acquisition units, digital acquisition and processing of ECG signals have become common practice14. Manual ECG analysis is labor-intensive and prone to errors, especially for long recordings, due to the complex nature of these signals, whose interpretation requires significant training15,16. Faulty ECG analyses may therefore easily lead to an incorrect diagnosis and treatment. Hence, automatic arrhythmia classification in ECG signals offers significant benefits: it may not only provide an unbiased diagnosis but also reduce the workload of medical personnel.
Therefore, the detection and classification of ECG signals are of great significance and have paved the way for advancements in CVD research17.

There have been significant developments in automatic ECG classification algorithms using classical Machine Learning (ML) models, such as Decision Trees (DT), Linear Discriminant (LD) analysis18, and logistic regression19, for diagnosing cardiac arrhythmias. Other techniques such as Naive Bayes, Support Vector Machines (SVM), and K-Nearest Neighbors (KNN) have also been used in this field20,21. Artificial Neural Networks (ANN) have become a powerful tool, capable of detecting arrhythmias in real time through the recognition of intricate patterns and correlations in ECG signals22. Other methods combine feature extraction with ML models, using time-domain features, frequency-domain features, or a combination of both23.

More recently, deep learning has proven very promising for ECG signal analysis and has mostly outperformed traditional ML methods. Convolutional Neural Networks (CNN)24, Recurrent Neural Networks (RNN)25, and Long Short-Term Memory (LSTM) neural networks26 have achieved state-of-the-art performance thanks to their ability to automatically extract features from raw ECG signals.

The most common cardiac rhythm seen in medical-surgical units is Normal Sinus Rhythm (NSR)27. This rhythm is controlled by the Sinoatrial (SA) node, which generates the electrical impulses that start each heartbeat and create a regular rhythm on an ECG. NSR indicates a normal heart rate, which typically ranges from 60 to 100 beats per minute (bpm). This rhythm ensures that blood circulates throughout the entire body, delivering oxygen and nutrients to all organs and tissues.

There are various types of ECG alterations which accompany the manifestation of Arrhythmia Rhythm (ARR) and Congestive Heart Failure (CHF) illnesses.
For instance, in ARR, the heart rate is below 60 bpm (bradycardia) or exceeds 100 bpm (tachycardia), the PP and RR intervals fluctuate constantly, the heart rate is irregular, and the P-wave amplitude is diminished28. Even though CHF is primarily a mechanical disruption of the heart, several ECG alterations are present in CHF. The ECG shows a prolonged QRS complex lasting 120 milliseconds or more; in some cases, the QRS complex is inverted, the ST-segment is elevated, and the P-wave’s morphology changes over time29. As a result, the discriminative patterns in ECG signals can provide valuable signatures for classifying both conditions.

The objective of this research work is to classify closely related cardiac disorders, namely ARR and CHF, against normal patients with NSR using ECG signals. This study utilizes ECG fragments for this purpose, eliminating the necessity for beat-level segmentation. However, we have not attempted to classify ARR and CHF subclasses.

Related works

Many studies have focused on detecting arrhythmias, and two main approaches are commonly used to evaluate the performance of these methods: the intra-patient paradigm and the inter-patient paradigm30,31.

In the intra-patient approach, the data from each patient is split between the training and testing sets. The inter-patient approach, however, better reflects real-life scenarios: the training and testing data come from different patients. This method takes into account the unique characteristics present in each person’s ECG, such as waveform shape and slight variations in heart rhythm. The comparison between intra-patient and inter-patient techniques in ECG classification is depicted in Fig. 2.

Fig. 2 Inter-patient and intra-patient techniques.

De Chazal et al.30 showed that the intra-patient paradigm often leads to biased results because the model learns the specific traits of each patient during training.
This yields an accuracy close to 100% during testing. In real-life situations, however, the trained model needs to handle heartbeats from new patients that it has not seen during training. Moreover, Luz et al.31 demonstrated that heartbeat classification methods tend to achieve much higher accuracy when evaluated under the intra-patient paradigm than under the inter-patient paradigm.

In recent years, research has concentrated on discriminating ECG signals related to NSR, CHF, and ARR. In this article, different investigations that aim to detect these ECG rhythms are discussed. Most of the research in the literature has centered on the differentiation between NSR and ARR.

In this context, we review the work proposed by Pławiak et al.32, who used 29 ECG recordings from the Massachusetts Institute of Technology-Beth Israel Hospital (MIT-BIH) arrhythmia database33. In their study, data segmentation was applied, resulting in ECG signal lengths of 10 seconds, with the aim of classifying 13 types of ECG heartbeats. Their approach was based on feature extraction from ECG segments using spectral power density with an SVM classifier, achieving 98.85% accuracy under 10-fold cross-validation. Although their approach improved classification results, NO inter-patient paradigm was applied to test it.

Mathunjwa et al.34 used 29 ECG recordings from the MIT-BIH arrhythmia database to classify ECG segments of 2 seconds length into 5 classes. They applied the recurrence plot to convert ECG segments into images, which were fed into a 2-dimensional CNN for the classification task. Using the 5-fold cross-validation technique, their approach achieved an accuracy of 98.41% across the 5 ECG segment classes. NO inter-patient paradigm was applied in their work.

Other automated arrhythmia classification methods focus on classifying ECG segments into 2 classes, NSR and CHF.
Sudarshan et al.35 proposed a novel approach to classify ECG segments of 2 seconds length using 58 normal ECG recordings and 18 ECG recordings with CHF. Their proposal combined two feature extraction techniques: the dual-tree complex wavelet transform and statistical features. The DT and KNN algorithms were employed as classifiers to distinguish between the two classes. Their approach attained an accuracy of 99.86% using the 10-fold cross-validation technique with NO inter-patient paradigm.

Tripathy et al.36 used 17 ECG recordings as NSR and 15 as CHF. The ECG recordings were split into segments of 4 seconds length, and different types of features were applied to extract meaningful information from the ECG segments, including the Stockwell transform and time-frequency entropy. Using a hybrid classifier, their approach was evaluated with 10-fold cross-validation, achieving an accuracy of 98.78% in distinguishing between CHF and normal ECG segments. NO inter-patient paradigm was applied to evaluate their model’s effectiveness.

Few studies have focused on classifying ARR, CHF, and NSR together. Kaouter et al.37 proposed an approach to classify ECG segments of 8 minutes length into three classes: ARR, NSR, and CHF. They used the Continuous Wavelet Transform (CWT) along with a CNN to distinguish between the three ECG classes, achieving an accuracy of 83.20% with NO inter-patient paradigm being applied.

Çınar et al.38 proposed a novel approach based on the Short-Time Fourier Transform (STFT) combined with an AlexNet classifier to classify ECG segments of 6 seconds into ARR, CHF, and NSR, achieving an accuracy of 96.77%.
NO inter-patient paradigm was applied to test the model’s performance on unseen patient data. Eltrass et al.39 used the Constant-Q Non-Stationary Gabor Transform (CQNSGT) with AlexNet to classify 1.3-min ECG segments into ARR, CHF, and NSR classes, achieving an accuracy of 98.82%, again with NO inter-patient paradigm being applied.

The work proposed by Nahak et al.40 coupled a set of automated features extracted through a pre-trained Transfer Learning (TL) model with Handcrafted Features (HF) such as auto-regressive fractal dimension, Shannon’s entropy, and wavelet variance. These features were then fed into an SVM classifier to distinguish the three classes: NSR, ARR, and CHF. Instead of directly utilizing the 36 NSR, 94 ARR, and 30 CHF recordings, they divided them into 1-min signals, amplifying the dataset to 288 NSR, 752 ARR, and 250 CHF samples. Their study explored how combining these two feature extraction methods could improve classification results, achieving an accuracy of 99.06%. Nonetheless, this approach has drawbacks despite the positive results. First, HF are often hard to interpret, and adding TL raises the computational cost. In addition, the classification accuracy of the CHF class remained below 96%. The results of their proposal were evaluated under the inter-patient paradigm to assess the effectiveness of their model on unseen ECG data.

Prusty et al.41 proposed a method to classify ECG segments of 1000 samples into three classes, using the Scale-Invariant Feature Transform (SIFT) to extract features from the ECG segments and then feeding the extracted features into a 2-dimensional deep CNN-based TL technique. Their approach, combining SIFT and a 2D CNN, resulted in a classification accuracy of 99.78% with NO inter-patient paradigm applied.
The existing related works focused on classifying ECG segments into the three classes NSR, CHF, and ARR are summarized in Table 1.

Table 1 Existing research work on ARR, NSR and CHF classification.

Motivations and contributions

Despite the high accuracy attained in many studies classifying NSR against abnormalities like ARR or CHF, there is still scope for exploration. For example, closely related conditions such as CHF and ARR are difficult to differentiate from one another. Most research has explored binary classification, for example NSR versus ARR or CHF versus NSR, but the combination of these three classes into a multiclass classification framework remains underexplored.

The novelty of our proposed method lies, first and foremost, in its evaluation under the inter-patient paradigm, in contrast with many previous studies that rely on random data splitting, an approach that does not reflect real-world scenarios. Secondly, we introduce a new approach based on the stationarity hypothesis, applied for the first time to the automated classification of ECG signals. Finally, our method uniquely combines the Wavelet Scattering Network (WSN), the LD classifier, and the stationarity hypothesis. This integrated technique significantly improves performance, even under the more challenging inter-patient paradigm, where results typically degrade, as highlighted by De Chazal et al.30 and Luz et al.31.

Most previous studies consider a NO inter-patient approach, in which a model is trained and then tested on the same subjects. Though this approach is useful in controlled conditions, it clearly does not translate to the natural setting. Among the works mentioned, Nahak et al.40 stands out as one in which an inter-patient approach has been used, i.e., the model, once trained, was tested on completely unseen patients.
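For concreteness, the inter-patient paradigm described above, in which training and testing draw on disjoint patient sets, can be sketched as follows. This is an illustrative sketch only: the patient identifiers and group sizes are hypothetical placeholders, not the actual MIT-BIH/BIDMC record names.

```python
import random

def inter_patient_split(patients_by_class, test_frac=0.2, seed=0):
    """Split patient IDs (not segments) into train/test groups, stratified per
    class, so that no patient contributes data to both sets."""
    rng = random.Random(seed)
    train, test = {}, {}
    for label, patients in patients_by_class.items():
        ids = sorted(patients)
        rng.shuffle(ids)
        n_test = max(1, round(len(ids) * test_frac))
        test[label] = ids[:n_test]
        train[label] = ids[n_test:]
    return train, test

# Hypothetical patient IDs, for illustration only.
cohort = {
    "ARR": [f"arr_{i:02d}" for i in range(10)],
    "CHF": [f"chf_{i:02d}" for i in range(10)],
    "NSR": [f"nsr_{i:02d}" for i in range(10)],
}
train, test = inter_patient_split(cohort)
```

The key design point is that the split operates on patients, not on segments: only after the patient groups are fixed are the recordings segmented, which prevents segments of one patient from leaking into both sets.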
However, their approach had several limitations, such as high computational complexity due to the feature extraction techniques and TL, as well as lower classification accuracy, especially for CHF, which scored below 96%. These gaps in the prior works highlight the need for a more practical and efficient solution.

The major contributions of our work are the following:

1. A new inter-patient classification method is developed for better accuracy in discriminating NSR, CHF, and ARR.
2. The performance of the proposed method under the inter-patient and NO inter-patient settings is compared using various ML techniques.
3. The WSN is combined with the LD classifier to classify ECG segments into the three classes.
4. An innovative approach based on the stationarity property of ECG rhythms is introduced to further increase the accuracy in inter-patient scenarios.

This study introduces a more robust and realistic ECG classification framework for clinical applications.

Methodology

Our approach, which uses a deep convolutional scattering network and an LD classifier with the stationarity hypothesis to detect CHF, ARR, and NSR, is described in this "Methodology" section. The "Results and discussion" and "Conclusion" sections explain and discuss the obtained results and summarize the findings of this study, respectively.

ECG data description and pre-processing

The dataset used for the purpose of classifying ARR, CHF, and NSR is described as follows:

1. 96 recordings were from the MIT-BIH ARR Database33. This collection includes beat annotation files for 29 long-term ECG recordings of patients with arrhythmias, aged between 34 and 79 years. The group contained 8 men, 2 women, and 37 other patients of unspecified gender. The initial recordings were digitized at 360 samples per second, with a resolution of 5 µVolt/bit.

2. 36 recordings were sourced from the MIT-BIH NSR Database42. This dataset contains long-term ECG recordings from 18 patients at the BIH Arrhythmia Laboratory.
All the subjects were free from serious ARR; they were 5 men aged 26 to 45 and 13 women aged 20 to 50 years. The ECG signals were digitized at a sampling rate of 128 Hz with a 12-bit Analog-to-Digital Converter (ADC), with measurements at fixed time intervals of 7.8125 ms.

3. 30 recordings were obtained from the Beth Israel Deaconess Medical Center (BIDMC) CHF Database43. This repository features long-term ECG recordings from 15 patients with severe CHF: 11 men aged 22 to 71 years and 4 women aged 54 to 63 years. Their CHF is classified as class 3–4 according to the New York Heart Association (NYHA). Each recording lasts about 20 hours, with two ECG signals sampled at 250 samples per second, and the data has 12-bit resolution over a 10-millivolt range. The original analog recordings were collected using ambulatory ECG recorders with a bandwidth of approximately 0.1 Hz to 40 Hz at Boston’s Beth Israel Hospital.

Two leads were used for each patient to capture the heart’s electrical activity, specifically ML II and the precordial derivations (V1, …, V6). These leads were chosen because they are positioned along the heart axis and can easily pick up the cardiac rhythms occurring in this area. A summary of the dataset information used in this study is provided in Table 2.

Table 2 Description of the database used in this research.

Every raw data file was scaled as specified in the PhysioNet .info files. A standard rate of 128 Hz was used to resample the data. Two different schemes were utilized in this research paper: the inter-patient paradigm and the NO inter-patient paradigm.

NO inter-patient paradigm

We worked with the raw dataset, which included 96 ARR, 30 CHF, and 36 NSR ECG recordings, each signal containing 65,536 samples. The only preprocessing technique applied was segmentation, dividing each ECG signal into segments of 2048 samples.
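The segment counts that follow from this 2048-sample segmentation can be checked with a short sketch; this is pure bookkeeping over the recording counts listed above, not part of the actual processing pipeline.

```python
def count_segments(n_recordings, samples_per_recording=65_536, segment_len=2048):
    """Number of non-overlapping segments obtained from the given recordings."""
    return n_recordings * (samples_per_recording // segment_len)

arr_segments = count_segments(96)  # MIT-BIH ARR recordings
chf_segments = count_segments(30)  # BIDMC CHF recordings
nsr_segments = count_segments(36)  # MIT-BIH NSR recordings
total = arr_segments + chf_segments + nsr_segments
```

Each 65,536-sample recording yields 32 segments of 2048 samples, which is where the class totals reported next come from.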
This resulted in 3072 ARR segments, 960 CHF segments, and 1152 NSR segments, each containing 2048 samples and resampled at 128 Hz. In total, we used 5184 ECG segments for this study. The ECG segments for the three classes used in our research are illustrated in Fig. 3. The different steps followed to classify ECG segments into ARR, CHF, and NSR are depicted in Fig. 4.

Fig. 3 ECG segments of 2048 samples.

Fig. 4 Classification process under the NO inter-patient paradigm.

To evaluate how effective our proposed model is, we used a hold-out validation technique with stratification. This means that our dataset is divided into two parts, one for training and the remainder for testing, while guaranteeing a proper distribution of each class in both sets. Specifically, for each class we allocated 80% for training and 20% for testing. As a result, the testing set contains 614 ECG segments classified as ARR, 192 as CHF, and 230 as NSR, while the training set contains 2458 ARR, 768 CHF, and 922 NSR segments. In total, we had 4148 ECG segments for training and 1036 for testing. To further test the effectiveness of our proposed method, we employed the 5-fold cross-validation technique on the training set.

Inter-patient paradigm

In the inter-patient scheme, the patients are divided into two groups with stratification: 80% of the patients from each class are used for training, and the remaining 20% are used for testing. After this division, segmentation was applied following the same technique used in the NO inter-patient paradigm. The inter-patient scheme followed in this research paper is summarized in Table 3.

Table 3 Inter-patient split of ECG segments.

Feature extraction using WSN

Bruna and Mallat44 proposed a new feature extraction method, namely the Wavelet Scattering Transform (WST).
This technique uses complex wavelets to make a good trade-off between its ability to detect the different patterns in the signal and its ability to produce stable time-frequency features. This makes the WST particularly suitable for analyzing signals, because it captures important details while remaining resistant to noise and small changes in the data.

The WST stands out as one of the most effective techniques for extracting features from non-stationary signals in both the time and frequency domains. Its benefits include invariance to time shifts and rotations, robustness to noise distortions, and dimensionality reduction, all of which are critical for modern signal processing. Thanks to these characteristics, the WST has proved to be a very effective tool for processing signals.

The Wavelet Scattering Network (WSN) is an equivalent deep convolutional network formed by a cascade of wavelets, non-linearities, and low-pass filters. This structure enables the derivation of low-variance features, with minimal configuration, from real-valued time series and images for use in ML and deep learning applications. The challenge with deep CNNs is that they often work like a black box: despite their success, the reasons behind their effectiveness are still not entirely clear. Thus, the scientific community created a white-box counterpart of the deep CNN, in which it is possible to interpret exactly what happens inside. The architectures of both the deep CNN and the WSN, enabling a comparison between them, are illustrated in Fig. 5.

Fig. 5 Feature extraction comparison between WSN and CNN.

When preparing to use the WSN, the first thing to decide is which kind of wavelet is going to be used. As we are dealing with ECG signals, the selection of the wavelet should be optimized for this purpose.
After evaluating several options, such as the Mexican hat, Morlet, and Haar wavelets, we decided to use the complex Gabor wavelet. The reason for this decision is the close resemblance of the real and imaginary parts of the Gabor wavelet to the QRS complex found in ECG signals. This makes the Gabor wavelet especially sensitive to the structure of the ECG signal, allowing it to extract more detailed and relevant information. Both the real and imaginary parts of the Gabor wavelet are depicted in Fig. 6.

Fig. 6 Gabor wavelet filters with coarsest scale (lowest frequency).

The following expression presents the mathematical representation of the complex wavelet employed:$$\psi (t) = \frac{1}{\sqrt{2\pi \sigma^{2}}}\, e^{-t^{2}/(2\sigma^{2})}\, e^{i\omega t}$$(1)In this expression, \(t\) represents time and \(\sigma\) the standard deviation of the Gaussian envelope. \(\omega\) is defined as \(2\pi f\), where \(f\) is the center frequency of \(\psi\), and \(i\) is the imaginary unit. The envelope of the complex wavelet is a low-pass filter denoted \(\Phi\):$$\Phi \left(t\right)=\left|\psi (t)\right|$$(2)In our study, the signal \(x\left(t\right)\) is an ECG signal with 2048 samples at a sampling frequency of 128 Hz. As mentioned earlier, the first step in the WSN convolves this signal with the low-pass filter \(\Phi\), whose bandwidth determines the outcome of this step. The result, the zeroth-order output of the wavelet scattering network, is a vector of 4 time windows. The low-pass filter acts as a moving-average filter, smoothing out high-frequency components.
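Equations (1) and (2) can be evaluated numerically with a minimal sketch; the parameter values in the comments are illustrative, not the network's actual filter settings.

```python
import math

def gabor(t, sigma, f):
    """Complex Gabor wavelet of Eq. (1): Gaussian envelope times e^{i*omega*t}."""
    omega = 2.0 * math.pi * f
    envelope = math.exp(-t**2 / (2.0 * sigma**2)) / math.sqrt(2.0 * math.pi * sigma**2)
    return envelope * complex(math.cos(omega * t), math.sin(omega * t))

def phi(t, sigma, f):
    """Low-pass filter of Eq. (2): Phi(t) = |psi(t)|, i.e. just the envelope."""
    return abs(gabor(t, sigma, f))
```

Since the complex exponential has unit modulus, \(\Phi\) reduces to the Gaussian envelope itself, which is why it behaves as a low-pass (moving-average-like) filter regardless of the center frequency \(f\).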
After applying the low-pass filter, critical down-sampling restricts the bandwidth below the cut-off frequency \({f}_{c}\). The down-sampling factor \(D\) and the number of time windows can be calculated as follows:$$D = \left\lfloor \frac{f_{s}}{2f_{c}} \right\rfloor$$(3)$$NO.\,Time\,windows = \left\lfloor \frac{Signal\,length}{D} \right\rfloor$$(4)In our case, we use the low-pass filter wavelet depicted in Fig. 7, with an estimated cut-off frequency of \({f}_{c}=0.125\) Hz. For the sampling rate of 128 Hz, the down-sampling factor comes out to 512 using the previous formula. As a result, we reduce the data significantly by keeping only samples spaced by this factor, giving \(NO.\,Time\,windows=4\).

Fig. 7 Power spectrum of the low pass wavelet filter.

These initial coefficients are represented in a vector \({S}_{0}\) of size 1 × 4. In the WSN’s zeroth order, we primarily analyze the slower variations in the signal; at this stage, we achieve good time resolution but limited frequency resolution.$${S}_{0}=x\left(t\right)*\Phi$$(5)An invariance scale duration \(T\) needs to be set in order to use a WSN. This parameter determines the maximum time span over which the scattering features remain translation invariant. For example, setting the invariance scale \(T\) equal to 1 s preserves the scattering features through any shift within that 1-second period. To ensure effective analysis, the invariance scale must be shorter than the signal duration:$$T< \frac{Signal\,length}{{f}_{s}}$$(6)where \({f}_{s}\) represents the sampling frequency of the signal. In our study, to balance computational efficiency with the requirement for shift invariance, we tested different scales from 8 to 16 s. The best results came with an invariance scale of \(T=16\) seconds, so we selected this as our final setting.

We need a bank of wavelet filters that cover different frequency ranges to design a WSN.
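Plugging the study's values (\(f_s = 128\) Hz, \(f_c = 0.125\) Hz, 2048-sample segments) into Eqs. (3) and (4) reproduces the figures quoted above:

```python
import math

def downsampling_factor(fs, fc):
    """Eq. (3): D = floor(fs / (2 * fc))."""
    return math.floor(fs / (2.0 * fc))

def num_time_windows(signal_len, D):
    """Eq. (4): number of time windows after critical down-sampling."""
    return math.floor(signal_len / D)

D = downsampling_factor(128, 0.125)   # -> 512
windows = num_time_windows(2048, D)   # -> 4
```

Keeping one sample every 512 turns each 2048-sample segment into the 4 time windows of the zeroth-order vector \(S_0\).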
These filters are ordered such that the signal is decomposed into various frequency bands according to the Nyquist theorem. In our case, the sampling frequency of the signal is 128 Hz. For the first filter bank, the wavelet with the highest frequency band will be centered at its maximum power frequency, which is:$${f}_{n}\approx \frac{{f}_{s}}{2}$$(7)where \({f}_{s}\) represents the sampling frequency of the signal and \({f}_{n}\) the central frequency of the wavelet’s highest frequency band.

In a WSN, the scale \({\lambda }_{i,j}\) for a given index \(j\) of the wavelet in the filter bank \({\Delta }_{\text{i}}\) is expressed as a function of the quality factor \({Q}_{i}\), which determines the number of wavelets per octave:$${\lambda }_{\text{i},\text{j}}={2}^{\frac{-j}{{Q}_{i}}}, {\lambda }_{\text{i},\text{j}}\in {\Delta }_{\text{i}} , j=\{\text{1,2},\dots ,{N}_{i}\}$$(8)$$N_{i} = \left\lfloor Q_{i}\,\log_{2}\!\left(\frac{f_{s}\,T}{2\,Q_{i}}\right) + 1 \right\rfloor$$(9)Next, we build the first filter bank using a quality factor \({Q}_{1}=8\), i.e., 8 wavelets covering each octave of the frequency range. This choice is intentional: it yields a refined set of wavelet filters, each covering a different frequency range, capturing intricate frequency details while preserving manageable complexity. A higher value of \({Q}_{1}\) would provide even finer frequency separation but at an increased computational cost. With \({Q}_{1}=8\), we achieve a practical division of the frequency bands, balancing the accuracy of the signal representation against processing speed.
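Equations (8) and (9) can be evaluated directly; with \(Q_1 = 8\), \(f_s = 128\) Hz, and \(T = 16\) s, Eq. (9) gives the 57 wavelets of the first filter bank. This sketch is purely arithmetic and assumes the symbols exactly as defined above.

```python
import math

def num_wavelets(Q, fs, T):
    """Eq. (9): N_i = floor(Q * log2(fs*T / (2*Q)) + 1)."""
    return math.floor(Q * math.log2(fs * T / (2.0 * Q)) + 1)

def scales(Q, N):
    """Eq. (8): lambda_{i,j} = 2^(-j/Q) for j = 1..N (dimensionless scale factors)."""
    return [2.0 ** (-j / Q) for j in range(1, N + 1)]

N1 = num_wavelets(Q=8, fs=128, T=16)  # -> 57
lam = scales(Q=8, N=N1)
```

The scales form a geometric sequence, halving every \(Q\) wavelets, which is exactly the "Q wavelets per octave" property the quality factor encodes.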
As we move to lower frequencies, the frequency bands are split into increasingly detailed sub-bands, giving \({N}_{1}=57\) wavelets, each covering its own frequency range. The power spectrum of these 57 high-pass wavelet filters is presented in Fig. 8, while their individual central frequencies and 3-dB bandwidths are highlighted in Fig. 9.

Fig. 8 Power spectrum of the first filter bank wavelets.

Fig. 9 Center frequencies and 3-dB bandwidths of the first filter bank wavelets.

To recover the high-frequency components of the signal, we use the first filter bank \({\Delta }_{1}\), which consists of \({N}_{1}=57\) high-pass wavelet filters at different scales or frequency bands. The modulus signal is convolved with the low-pass filter and critically down-sampled, resulting in 4 time windows. This process yields the first-order scattering coefficients, denoted \({S}_{1}\). At this stage, we analyze the high-frequency elements of the signal and extract these important high-frequency coefficients. The first order produces 57 × 4 coefficients.$${S}_{1}={\left\{\left|x*{\psi }_{{\lambda }_{1}}\right|*\Phi ,\ \left|x*{\psi }_{{\lambda }_{2}}\right|*\Phi ,\dots ,\left|x*{\psi }_{{\lambda }_{{N}_{1}}}\right|*\Phi \right\}}_{\{{\lambda }_{1},\dots ,{\lambda }_{{N}_{1}}\}\in {\Delta }_{1}}$$(10)However, the first order ends with a convolution with the low-pass filter \(\Phi\), which loses some high-frequency components. So, we constructed a second filter bank of high-pass wavelets \({\Delta }_{2}\) with a quality factor of \({Q}_{2}=1\) in order to recover these lost high frequencies. This choice is intentional, since specific frequency ranges are already covered by the first filter bank; a higher value of \({Q}_{2}\) would lead to redundant scales and increased computational cost. The power spectrum of the second set of high-pass wavelet filters, which consists of \({N}_{2}=9\) wavelets, is illustrated in Fig. 10.
The 3-dB bandwidths of the wavelet filters used in the second filter bank and their central frequencies are depicted in Fig. 11.

Fig. 10 Power spectrum of the second filter bank wavelets.

Fig. 11 Center frequencies and 3-dB bandwidths of the second filter bank wavelets.

To retrieve the high-frequency components, we applied the second-order WSN to obtain the second-order scattering coefficients, denoted \({S}_{2}\). In the second order, the wavelet scattering coefficients are expected to be of size 57 × 9, where 57 is the number of high-pass wavelet filters in the first filter bank and 9 the number of high-pass wavelets in the second filter bank, which may result in 513 scattering paths in the second order of the scattering network.$${S}_{2}={\left\{\left|\left|x*{\psi }_{{\lambda }_{\text{i}}}\right|*{\psi }_{{\lambda }_{\text{j}}}\right|*\Phi \right\}}_{{\lambda }_{\text{i}}\in {\Delta }_{1},\ {\lambda }_{\text{j}}\in {\Delta }_{2}},\quad i=\left\{\text{1,2},3,\dots ,{N}_{1}\right\},\ \text{j}=\left\{\text{1,2},3,\dots ,{N}_{2}\right\}$$(11)This can lead to high computational costs for feature extraction. Consequently, the path-optimization setting of the WSN was enabled. When there is a substantial overlap between the bandwidths of parent and child nodes in the scattering network, the scattering paths are computed selectively. In this context, "substantial overlap" is defined as follows: for a quality factor of 1 or 1/2, the child node’s 3-dB bandwidth is subtracted from its wavelet center frequency, and the scattering path is computed if that value is less than the parent’s 3-dB bandwidth. For quality factors larger than 1, the overlap is measured between the child’s center frequency minus the child’s 3-dB bandwidth and the parent’s 3-dB bandwidth; if this overlap occurs, the scattering path is computed.
This optimization results in 142 scattering paths in the second order of the scattering network. Finally, the second-order scattering coefficients \(S_{2}\) are obtained, with a size of 142 × 4. According to previous research, almost 99% of the scattering coefficients' energy is concentrated within the first two layers, and this energy drops off quickly in higher layers44. The final scattering coefficients, which we refer to as the feature matrix of a single ECG signal, are denoted \(S\) and have a size of 200 × 4. The scattering network is illustrated in Fig. 12.

Fig. 12 Architecture of the scattering network.

After applying the WSN to an ECG signal of 2048 samples, we obtained a 200 × 4 feature matrix, corresponding to 800 values. This transformation emphasizes the effectiveness of the WSN as a powerful signal-processing technique for dimensionality reduction: compared to the original signal, the WSN yielded a 60.94% reduction in dimensions. This is an important characteristic because it enables the use of various ML algorithms, which generally perform better on lower-dimensional data. We reshape the tensor into an adequate format to prepare the data for the classifiers. The sizes of the feature matrices used for training and testing under the inter-patient and NO inter-patient paradigms are presented in Table 4.

Table 4 Feature sizes using the inter-patient and NO inter-patient paradigms.

Comparison of the classification results of inter-patient and NO inter-patient paradigms

Computing and analyzing a confusion matrix is the traditional method for evaluating a model's performance.
This matrix records True Positives (TP), the number of beats classified into the correct disease category; True Negatives (TN), the number of beats correctly classified as normal; False Positives (FP), beats incorrectly assigned to a disease category; and False Negatives (FN), beats classified as normal when they are actually not. As performance metrics we selected accuracy, precision, sensitivity (recall), specificity, and F1-score, all derived from the confusion matrix.

$$Accuracy=\frac{TP+TN}{TP+FP+TN+FN}$$(12)

$$Precision=\frac{TP}{TP+FP}$$(13)

$$Sensitivity=\frac{TP}{TP+FN}$$(14)

$$Specificity=\frac{TN}{TN+FP}$$(15)

$$F1\ score=\frac{2\cdot SEN\cdot PRE}{SEN+PRE}$$(16)

Different ML models were used to classify the ECG signals into ARR, CHF, and NSR. The ML models were fed the scattering coefficients obtained by applying the WSN. Because there are four time windows for each ECG segment, the data were oversampled by a factor of 4; this allows the ML models to generate four predictions per ECG segment, one for each time window produced by the WSN. The classification accuracies using 5-fold cross-validation for the inter-patient and NO inter-patient splits are outlined in Tables 5 and 6, respectively.

Table 5 Validation accuracy for the data of the inter-patient split.

Table 6 Validation accuracy for the data of the NO inter-patient split.

As these tables show, several ML models were evaluated: DT, LD, Quadratic Discriminant (QD), Naïve Bayes (NB), Linear SVM, Quadratic SVM, Cubic SVM, KNN, Ensemble Bagged Trees (EBT), and Ensemble Subspace KNN (EKNN). The results of the two splits are very close.
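Equations (12)–(16) translate directly into code. The confusion counts below are purely hypothetical, chosen only to show the computation; they are not results from this study.

```python
# Direct implementation of Eqs. (12)-(16) from per-class confusion counts.

def metrics(tp, fp, tn, fn):
    accuracy    = (tp + tn) / (tp + fp + tn + fn)   # Eq. (12)
    precision   = tp / (tp + fp)                    # Eq. (13)
    sensitivity = tp / (tp + fn)                    # Eq. (14), a.k.a. recall
    specificity = tn / (tn + fp)                    # Eq. (15)
    f1 = 2 * sensitivity * precision / (sensitivity + precision)  # Eq. (16)
    return accuracy, precision, sensitivity, specificity, f1

# Hypothetical counts for one class (illustrative only, not our results):
acc, pre, sen, spe, f1 = metrics(tp=95, fp=5, tn=90, fn=10)
print(round(acc, 3), round(sen, 3))  # -> 0.925 0.905
```

In the one-vs-rest setting used later for ARR, CHF, and NSR, these counts are extracted per class from the 3 × 3 confusion matrix and the formulas applied class by class.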
This is because the validation folds do not follow the inter-patient paradigm, meaning that NO inter-patient data separation was maintained in the validation sets. Interestingly, in both schemes the validation accuracy of KNN outperforms the other ML models. Since KNN achieved the best separation, we further tested it on the testing data of both the inter-patient and NO inter-patient paradigms to explore the difference between the two. The testing accuracy using KNN under the two paradigms is illustrated in Fig. 13. The investigation conducted in this study confirms the observations of De Chazal et al.30 and Luz et al.31. Under the inter-patient paradigm, the testing accuracy was about 79.40%, whereas under the NO inter-patient paradigm it was significantly higher, approaching 100% with a remarkable accuracy of 99.30%.

Fig. 13 Accuracy comparison between inter-patient and NO inter-patient split.

Based on the results given in Table 5, we decided to proceed with the LD model, as it strikes a balance between prediction speed and validation accuracy. The LD classifier aims to maximize the separation between two or more groups by finding the optimal values of the vector \(\sigma\), which holds the weights used to compute the discriminant scores.

$$LD=\sum_{i=1}^{N}\sigma_{i}X_{i}$$(17)

Results and discussion

To enhance the testing accuracy under the inter-patient paradigm using the LD classifier, an innovative approach based on the "stationarity hypothesis of ECG rhythms" is proposed in this work. One of the earliest references to the assumption of stationarity in ECG analysis comes from Moody et al.33 and subsequent studies on the MIT-BIH ARR database, which provided methods for ECG segmentation and pattern recognition over periods in which arrhythmias are present in a non-varying manner.
This work builds on the stationarity hypothesis, which supposes that arrhythmias detected in one part of a long ECG signal from a patient are likely to appear in other parts of the same signal. To the best of our knowledge, this concept has not previously been applied to automated arrhythmia detection. We therefore decided to investigate its impact on the prediction accuracy for CHF, ARR, and NSR.

In this paper, we propose a novel approach for classifying ECG signals into one of three classes, ARR, CHF, and NSR, based on the stationarity hypothesis and using the WSN coupled with an LD classifier. Every ECG segment is first transformed into a feature matrix of size 4 × 200. This feature matrix is then fed into the LD classifier, which produces four class predictions for each segment.

In real-time ARR detection, it is probable that the same arrhythmia will manifest in successive ECG segments of the same patient. The two previous ECG segments contribute a total of 8 predictions, as each segment offers 4 class predictions. These are combined with the predictions from the current segment to increase classification accuracy: the final class of the current segment is determined by applying a Weighted Voting (WV) procedure to all 12 predictions. The workflow of our method is depicted in Fig. 14, and the weights assigned to the 12 predictions, chosen on the basis of validation results under the inter-patient paradigm, are illustrated in Fig. 15. This approach leverages past-segment information to enhance the reliability of ECG classification.

Fig. 14 Proposed model for CHF and ARR classification.

Fig. 15 Weights of different predictions used in weighted voting.

The impact of adding the stationarity hypothesis through WV was evaluated on the testing data under the inter-patient paradigm, following the suggested methodology.
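The weighted-voting step can be sketched as follows. The weights below are placeholders we chose so that recent windows count more; the weights actually used were tuned on validation data (Fig. 15), and the labels are illustrative.

```python
# Sketch of weighted voting over 12 predictions: 4 windows from the current
# segment plus 4 from each of the two previous segments.
from collections import defaultdict

def weighted_vote(predictions, weights):
    """predictions: 12 class labels; weights: 12 matching vote weights.
    Returns the class with the largest total weight."""
    score = defaultdict(float)
    for label, w in zip(predictions, weights):
        score[label] += w
    return max(score, key=score.get)

# Current segment's 4 windows, then the previous segment, then the one before:
preds   = ['ARR', 'ARR', 'NSR', 'ARR',   # current segment
           'ARR', 'NSR', 'ARR', 'ARR',   # previous segment
           'NSR', 'ARR', 'ARR', 'NSR']   # segment before that
weights = [3, 3, 3, 3,  2, 2, 2, 2,  1, 1, 1, 1]  # assumed: recency-weighted

print(weighted_vote(preds, weights))  # prints ARR
```

Because older segments carry smaller weights, a transient misclassification in a past segment cannot outvote a consistent prediction on the current one.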
The results in Fig. 16 show a significant improvement in classification accuracy across the three classes. Specifically, an accuracy increase of 20.21% was observed compared to using the WSN with the LD classifier alone, reaching an accuracy of 99.61%. These findings highlight the significant role of the stationarity hypothesis in improving ECG classification accuracy, especially under the difficult inter-patient paradigm.

Fig. 16 Testing accuracy improvement after adding stationarity hypothesis and weighted voting.

To evaluate the performance of our proposed approach under the inter-patient paradigm, several measures were employed, including the testing confusion matrix illustrated in Fig. 17. Further measurements, namely sensitivity, specificity, precision, and F1 score for each class, are summarized in Table 7.

Fig. 17 Confusion matrix of testing data under inter-patient paradigm.

Table 7 Classification performance metrics for ARR, CHF, and NSR.

In addition to the testing confusion matrix, we include the ROC_AUC curves (one-vs-rest) for the three classes ARR, CHF, and NSR in Figs. 18, 19, and 20, respectively. These curves emphasize the ability of the proposed model to distinguish accurately between positive and negative cases, achieving a TP rate of 99% and a FP rate of 0% for all three classes. The Area Under the Curve (AUC) values are close to 1, further confirming the effectiveness of our model in reliably differentiating ARR, CHF, and NSR.

Fig. 18 ROC_AUC curve for ARR as positive class.

Fig. 19 ROC_AUC curve for CHF as positive class.

Fig. 20 ROC_AUC curve for NSR as positive class.

We also conducted a detailed comparison with state-of-the-art methods for detecting heart diseases.
Existing approaches for classifying ARR, CHF, and NSR, which can be divided into those that do and those that do not follow the inter-patient paradigm, are summarized in Table 8. The results reveal that methods not adopting the inter-patient paradigm tend to report higher classification performance39,41.

Table 8 Comparison of algorithms of the existing methods in ARR, CHF, and NSR classification. Bold text indicates significant values under the inter-patient paradigm.

Among the approaches employing the inter-patient paradigm, the work of Nahak et al.40 is the only study focused on the classification of ECG segments under this paradigm. For a fair comparison, we evaluated our approach against models using the same inter-patient paradigm. The proposed method outperforms existing models, with an accuracy, sensitivity, specificity, and F1 score of 99.61%, 99.35%, 99.74%, and 99.49%, respectively.

Our work contributes significantly to the field of automated CVD detection by introducing an innovative approach that combines the deep WSN and LD classifier with the stationarity hypothesis of rhythms and the WV technique. This method enhances prediction accuracy while accounting for inter-patient differences, making it highly promising for clinical applications and early diagnosis of CVDs.

As outlined in Table 9, our analysis confirms the efficiency and speed of the proposed WSN-based model using LD for ECG classification: feature extraction takes just 0.42 s per segment, the prediction speed is 22,000 segments per second, and the memory usage per segment is 6400 bytes.

Table 9 Complexity analysis of the proposed model.

All algorithms were implemented in MATLAB R2021b on a Windows server. The system used for execution had an Intel(R) Core(TM) i5-6300U CPU with a clock speed of 2.40 GHz.
The RAM capacity was 12 GB, on a 64-bit architecture. Our model delivers swift performance, a low parameter count, and minimal computational cost.

Conclusion

This research introduces an innovative method to improve classification accuracy under the inter-patient paradigm, reflecting real-world scenarios of automated ECG signal analysis. Our approach leverages an innovative application of the stationarity hypothesis of heart rhythms in ECG signals. Using the WSN, the LD classifier, the stationarity hypothesis, and WV, we found that the stationarity hypothesis combined with weighted voting significantly increased accuracy under the inter-patient paradigm, achieving a 20.21% improvement.

Our model proved to be a good classifier of long-term ECG signals into the three main categories, owing to its ability to separate normal from abnormal heart rhythms. This ability to distinguish NSR from pathological rhythms indicates that the model generalizes well. Since it captures morphological features, notably from the QRS complex, as well as rhythm-based features, it is well suited to detecting a wide range of arrhythmias. In more specialized applications, where interpretation aims to detect specific arrhythmia types, the model can therefore be tuned to scan the relevant ECG intervals and correctly differentiate between these conditions. We attribute this versatility to the inherent robustness of the stationarity-based hypothesis, which enables the model to observe minute but critical differences in heartbeat dynamics. This property broadens the applicability of the model to diagnostic settings with disparate arrhythmic patterns.

The proposed technique has demonstrated excellent results in classifying the three ECG signal classes ARR, CHF, and NSR. In the near future, we aim to extend this work to classify every heartbeat in a long-term ECG recording.
Furthermore, we plan to lighten the computation of the WSN by selecting only the most powerful wavelet paths for classification, making the proposed approach more compact, portable, and feasible for practical implementation on embedded devices.