NextGen lung disease diagnosis with explainable artificial intelligence


Introduction

Each year, pulmonary illnesses, including COVID-19, bacterial pneumonia, viral pneumonia, and tuberculosis (TB), claim the lives of millions of people [1], and this toll is projected to rise annually. The global healthcare system came under immense strain from the severe COVID-19 pandemic and the exponential rise in COVID-19 patients; likewise, tuberculosis and pneumonia can be fatal illnesses. DL-based techniques have demonstrated high accuracy in CXR image classification [2]. Although computerized tomography (CT) and magnetic resonance imaging (MRI) produce sharper images, they are radiation-intensive and costly procedures, so CXR images are the most practical option for identifying pulmonary illnesses [3]. These lung conditions share comparable symptoms, such as fever, coughing, and breathing difficulties, whereas TB has a delayed onset of illness and a longer incubation period. Prompt and accurate identification of such disorders is therefore essential to provide the right care and save lives.

Deep learning algorithms can accurately detect and categorize various diseases without requiring human intervention [4,5,6]. Deep learning is more successful than conventional machine learning as the network grows, because larger networks permit more profound data representations. It handles and distinguishes morphological structures and identifies the classes [7]; as a result, the model automatically gathers attributes and produces more precise outcomes. The main goal of deep learning is to extract and classify information from images, and it has driven promising advances in several sectors, including healthcare. Furthermore, deep learning can be applied to build models that use CXR images to forecast and diagnose diseases precisely. Doctors, in turn, can employ contemporary technologies such as AI, together with imaging outputs like heatmaps, as decision-support tools to reduce human error and boost diagnostic accuracy [8]. Unlike conventional machine learning methods, deep learning models describe features through a series of non-linear functions combined to maximize the model's accuracy [9].

Most researchers and practitioners use traditional AI models as "black-box" models for various activities, including medical diagnostics. These conventional AI techniques lack the information and justifications needed to support doctors in reaching improved conclusions.
This prospect is made possible by XAI, which converts AI-based black-box models into more explainable and explicit gray-box representations [10]. Our proposed work addresses the central difficulty of distinguishing different lung conditions from one another, with the following significant contributions:

1. Designed an improved U-Net segmentation specifically for multi-class lung disease classification on CXR images.
2. Developed a specialized transfer learning framework using four pre-trained models, enabling precise classification of CXR images.
3. Proposed XAI-TRANS, an automated interpretation framework for CXR image classification decisions, implemented with LIME-based heatmaps and Grad-CAM.
4. Evaluated the results against traditional methods and examined each component's impact, validating the proposed approach with accuracy, precision, recall, and F1-score metrics.

Related works

Several experiments have recently been conducted to address the COVID-19 outbreak using machine learning approaches. For the early detection of infected cases, researchers suggested a multi-level threshold with a Support Vector Machine (SVM) classifier [11]. First, a multi-level threshold technique was used to extract features; an SVM classifier then analyzed the characteristics retrieved from 40 contrast-enhanced CXR images, attaining 97% classification accuracy. The authors of a different study used an enhanced SVM classifier to identify COVID-19 instances [12]. Apostolopoulos et al. developed a DL-based technique to identify COVID-19 instances [13]. The technique was applied to binary and multiclass analysis, and the dataset included 500 no-finding images, 700 bacterial pneumonia cases, and 224 COVID-19 X-rays. For binary (COVID-19 vs. no-findings) and multi-class (COVID-19 vs. no-findings vs. pneumonia) classification, the suggested model yields high accuracies of 98.78% and 93.48%, respectively. Hemdan et al. [14] also developed a DL-based COVID-19 case detection approach utilizing X-ray images. The suggested method was contrasted with seven existing DL-based COVID-19 case detection methods; it was applied only to binary classification, with an accuracy of 74.29%. Three distinct automated lung disease detection techniques have been established based on three DL models: ResNet50, InceptionV3, and InceptionResNetV2 [15]. To automatically detect infected regions, Bandyopadhyay et al. [16] created a hybrid model based on two distinct techniques, Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU); on confirmed COVID-19 cases, it achieved 87% accuracy. An ADL approach for automatic classification from CXR was provided in [17], yielding an accuracy of 89.5%, precision of 97%, and recall of 100%.

A multi-dilation deep learning method (CovXNet) was presented to automatically detect COVID-19 and other pneumonia cases from CXR images [18]. Two distinct datasets were used to assess CovXNet's performance: the first comprised 5,856 X-ray images, while the second comprised 305 X-ray images of various COVID-19 patients.
According to the findings, CovXNet detected COVID-19 vs. normal with 97.4% accuracy, other binary classes with 96.9%, and multiclass cases with 90.2%. Several pre-trained models, including VGG16, VGG19, ResNet50, DenseNet121, Xception, and capsule networks, have served as the foundation for additional deep learning techniques [19,20,21,22,23,24,25]. Using a Generative Adversarial Network (GAN) and deep transfer learning, Loey et al. [26] diagnosed COVID-19 from X-ray images. Three pre-trained transfer learning models, AlexNet, GoogLeNet, and ResNet18, were used in the suggested methodology, tested on a dataset comprising 69 COVID-19, 79 viral pneumonia, 79 bacterial pneumonia, and 79 normal patients. The experimental results showed that the highest accuracy (99.9%) for the binary classification problem is achieved when the pre-trained GoogLeNet is combined with the GAN. Grad-CAM was used to show the extracted region on CXR images using an attention model and VGG16 [27,28]. Ahsan et al. [29] aimed to identify COVID-19 cases from CXR and CT images using six deep CNN models (VGG16, MobileNetV2, InceptionResNetV2, ResNet50, ResNet101, and VGG19) trained on 400 CXR and 400 CT images. MobileNetV2 outperformed NasNetMobile, with an average accuracy of 82.94% on the CT dataset and 93.94% on the CXR dataset. In addition to CXR image analysis, DL models have recently been applied to CT image-based detection of TB, COVID-19, and pneumonia. For example, Li et al. [30] employed a pre-trained ResNet50 model to distinguish COVID-19 from pneumonia in CT scans. To increase the model's explainability, they used Grad-CAM to display important regions in the CT images after training on 4,356 CT scans, reaching 95% accuracy. However, the heat maps were insufficient to pinpoint the distinct characteristics the algorithm used to make its predictions.

Ukwuoma et al. proposed a vision transformer-based framework for automated pneumonia and COVID-19 detection, demonstrating the growing interest in transformer architectures for medical imaging tasks [31]. Addo et al. developed EVAE-Net, an ensemble variational autoencoder model with strong classification performance for COVID-19 diagnosis on CXR data [32]. Monday et al. handled the issue of low-quality images by introducing WMR-DepthwiseNet, a wavelet multi-resolution convolutional neural network designed specifically for COVID-19 diagnosis [33]. These works underline the importance of integrating explainable AI and segmentation techniques, as pursued in the proposed XAI-TRANS framework, to enhance both transparency and diagnostic reliability in lung disease classification.

Explainable Artificial Intelligence (XAI) is a set of techniques and practices developed to make AI models more transparent, interpretable, and understandable. AI offers previously unseen benefits for various everyday tasks, including manufacturing, finance, and entertainment, such as increased efficiency and broader data analysis [34,35]. However, high-risk domains, especially healthcare, lag in the application of AI [36]. The problem with many lung disease classification algorithms is that they are black boxes that humans cannot easily comprehend.
XAI aims to address this issue by helping healthcare professionals better understand the decisions made by these models. By improving transparency and interpretability, XAI plays a crucial role in enabling medical professionals to trust and interpret model decisions [37]. This understanding leads to better patient care and treatment planning while ensuring adherence to regulatory requirements [38].

Proposed methodology

This research presents an XAI approach for detecting multiple lung diseases and assessing healthy lung conditions using transfer learning with CXR images. Improved U-Net lung segmentation is performed before classification. Transfer learning is employed with the VGG16, VGG19, InceptionV3, and ResNet50 models. Our approach applies XAI techniques, specifically LIME and Grad-CAM, on top of Inception-based transfer learning to provide detailed and accurate explanations for predictions related to COVID-19 and other lung diseases. All models were compared, and InceptionV3 was integrated into the proposed XAI-TRANS model as it outperformed the others.

Data description

The dataset used in this research is the Lung Disease Dataset [39]. It comprises a diverse collection of 7,560 chest X-ray images across five classes: bacterial pneumonia, viral pneumonia, COVID-19, tuberculosis, and normal, as depicted in Fig. 1. The dataset contains 1,550 bacterial pneumonia samples, 1,490 viral pneumonia samples, 1,500 COVID-19 samples, 1,460 tuberculosis samples, and 1,560 normal samples.

Normal: These images represent healthy lung conditions and serve as a reference point for comparison.

COVID-19: This category includes X-ray images from individuals with confirmed or suspected COVID-19, helping in the early detection and tracking of the disease.

Viral pneumonia: Images in this group are linked to cases of viral pneumonia caused by influenza, RSV, SARS-CoV-2, etc., adding to our knowledge and ability to identify this particular lung infection.

Tuberculosis: These images feature tuberculosis cases, aiding in recognizing and understanding this infectious lung disease.

Bacterial pneumonia: This category contains X-ray images associated with bacterial pneumonia, commonly caused by Streptococcus, assisting in identifying and comprehending this type of lung infection.

Fig. 1 Sample chest X-rays from each category of lung disease.

Data preprocessing

Image preprocessing consists of image resizing and normalization. The dataset was then split for model training, validation, and testing, as depicted in Table 1.

Table 1 Dataset split for training, validation, and testing of the proposed XAI-TRANS model.

Resizing the image: Since the collected images vary in size, all images are resized to a fixed resolution of \(256\times 256\) pixels to maintain uniform dimensions across the dataset. This resizing step ensures consistency and compatibility with the chosen models.

Normalization: Every pixel value in the images ranges between 0 and 255. Dividing each pixel value by 255 scales it to the range [0, 1]. This normalization facilitates faster convergence during model training and helps stabilize and improve the neural network's performance.
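As a minimal sketch of these two preprocessing steps, assuming TensorFlow is used for image loading (the function name and decoding details are illustrative):

```python
import tensorflow as tf

def preprocess_image(path):
    """Load a CXR image, resize it to 256x256, and scale pixels to [0, 1]."""
    raw = tf.io.read_file(path)
    image = tf.image.decode_image(raw, channels=3, expand_animations=False)
    image = tf.image.resize(image, [256, 256])  # uniform spatial dimensions
    return tf.cast(image, tf.float32) / 255.0   # map [0, 255] to [0, 1]
```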
Image augmentation techniques

Image augmentation applies transformations and modifications such as rotations, flips, zooms, shifts, and changes in brightness or contrast to the existing images, generating new variations while preserving the underlying content. By presenting the model with a broader range of variations of the same image, it learns to recognize objects despite changes in lighting conditions, viewpoints, orientations, and other factors, leading to more accurate and reliable performance on new, unseen data. Augmentation can also address class imbalance in datasets. The parameters used were a zoom range of 0.2, a shear range of 0.2, and horizontal flipping.

Zooming: This method randomly zooms in or out of the images, amplifying an image to highlight specific components and enabling the model to capture important details.

Shearing: Modifying an image along a specific axis creates a different perception angle and improves the model's understanding of diverse viewpoints.

Random flipping: Horizontal flipping is applied to random images to further augment dataset diversity.

This augmentation unfolds dynamically, on-the-fly, during the model training phase. By incorporating these preprocessing and augmentation techniques, the model can adeptly accommodate variations in image quality, size, and perspective, notably enhancing the accuracy of lung disease detection. We assigned equal weightage to the flip, shearing, and zooming operations, aiming for a balanced application that introduces diversity into the training dataset without overemphasizing any particular operation. This balance maintains a well-rounded, representative dataset that captures the variations and distortions the model might encounter in real-world scenarios, enhancing its robustness and generalization capabilities [40].
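A sketch of this on-the-fly pipeline with the stated parameters, assuming the Keras ImageDataGenerator API (the directory path and batch size are illustrative):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# On-the-fly augmentation with the parameters reported above:
# equal emphasis on zoom, shear, and horizontal flipping.
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,      # normalization, as described earlier
    zoom_range=0.2,         # random zoom in/out
    shear_range=0.2,        # shear along an axis for varied viewpoints
    horizontal_flip=True,   # random horizontal flips
)

train_generator = train_datagen.flow_from_directory(
    "dataset/train",        # illustrative path: one subfolder per class
    target_size=(256, 256),
    batch_size=32,
    class_mode="categorical",
)
```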
Improved U-Net lung segmentation

Lung anomalies reveal details about a wide range of illnesses whose radiological features strongly overlap, such as ground-glass opacities (GGO), paving patterns, and reverse halo signs. To ascertain which lung condition affects a patient, our study takes a radical approach to the CXR. An extra lung segmentation component is added to the pipeline to improve the performance of the proposed model's detection and explanation components. Manual segmentation is not feasible for CXR images, which contain air bronchograms and cavitary lesions; moreover, human annotations introduce errors and contradictions. The proposed lung segmentation therefore forces the multi-class classification and explanation networks to detect and explain only within the lung region by feeding them masked lung images. The explanation output is displayed on the entire CXR for a more accurate understanding.

Fig. 2 Improved U-Net architecture for lung segmentation.

The U-Net model is a well-known deep-learning architecture designed for image segmentation tasks such as lung segmentation. The proposed model is shown in Fig. 2. It takes chest X-rays with dimensions \(256\times 256\times 3\) as input and produces binary segmentation masks representing the lungs. The U-Net model was trained on a publicly available dataset containing chest X-ray images and their corresponding masks [41,42]. It uses encoder and decoder structures to process the input images and generate the binary lung mask. The encoder captures high-level abstractions through convolutional and pooling layers, gradually reducing the spatial dimension while increasing the number of filters to extract hierarchical features. The decoder pathway utilizes skip connections that concatenate feature maps from corresponding encoder layers, allowing recovery of the original image resolution while retaining crucial contextual information. Deconvolution layers (transposed convolutions) are included to upsample the feature maps. The model outputs a single pixel-wise probability map with a sigmoid activation function; the final output is a binary segmentation mask of the lungs. This improved U-Net effectively analyzes lung X-ray images by segmenting and interpreting all parts, including infected and uninfected regions. We found that including the perimeter outside the lung does not significantly affect the analysis, yet including the lung in its entirety is crucial for early diagnosis. Segmentation of the lungs is carried out before classification and explanation.
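A compact sketch of this encoder-decoder structure in Keras follows; the depth and filter counts are illustrative stand-ins for the exact configuration in Fig. 2:

```python
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    """Two 3x3 convolutions with ReLU, the basic U-Net building block."""
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x

def build_unet(input_shape=(256, 256, 3)):
    inputs = layers.Input(input_shape)

    # Encoder: extract hierarchical features while halving spatial size.
    c1 = conv_block(inputs, 32); p1 = layers.MaxPooling2D()(c1)
    c2 = conv_block(p1, 64);     p2 = layers.MaxPooling2D()(c2)
    c3 = conv_block(p2, 128);    p3 = layers.MaxPooling2D()(c3)

    # Bottleneck.
    b = conv_block(p3, 256)

    # Decoder: transposed convolutions upsample, and skip connections
    # concatenate encoder features to recover spatial detail.
    u3 = layers.Conv2DTranspose(128, 2, strides=2, padding="same")(b)
    c4 = conv_block(layers.Concatenate()([u3, c3]), 128)
    u2 = layers.Conv2DTranspose(64, 2, strides=2, padding="same")(c4)
    c5 = conv_block(layers.Concatenate()([u2, c2]), 64)
    u1 = layers.Conv2DTranspose(32, 2, strides=2, padding="same")(c5)
    c6 = conv_block(layers.Concatenate()([u1, c1]), 32)

    # Pixel-wise probability map; sigmoid yields the binary lung mask.
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(c6)
    return Model(inputs, outputs)
```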
Proposed XAI-TRANS workflow architecture

Transfer learning has found significant success with the rise of deep learning, enabling the rapid development of high-performing models in natural language processing (NLP), image classification, and reinforcement learning. It can lead to faster convergence and better model performance, particularly when the new task has limited labelled data. Using the characteristics learned on the first task as a starting point allows the model to adapt to the second task more quickly and effectively, helps prevent overfitting by encouraging general features relevant to the new task, shortens the time-consuming training process, and allows the creation of a model with high classification performance [43]. Ultimately, transfer learning not only enhances accuracy in COVID-19 detection by leveraging pre-learned image features but also helps address the global impact of the pandemic, potentially saving lives through swift and precise identification of affected individuals [44].

The proposed XAI-TRANS model uses pre-trained weights instead of training a model from scratch, adapting that knowledge by fine-tuning on the new dataset; this involves updating the model's parameters to fit the new task. Lung disease detection is integrated with the improved U-Net model, classification, and explainability, as shown in Fig. 3. Pre-trained convolutional neural networks such as VGG, ResNet50, and Inception have learned a rich hierarchy of features from vast and diverse image classes. Transfer learning empowers the reuse of these learned features, allowing the development of highly accurate models with substantially reduced data requirements [45]. VGG16 was selected for its simplicity and strong performance in transfer learning tasks, particularly with limited datasets. ResNet50 was chosen for its deep architecture and skip connections, which help mitigate vanishing gradients and capture complex features effectively in medical images.

Fig. 3 Proposed XAI-TRANS architecture for lung disease classification on CXR images.

VGG16

In VGG16, VGG refers to the Visual Geometry Group of the University of Oxford, and the number 16 refers to its layers: 13 convolutional layers and 3 fully connected layers. VGG16 is a deep convolutional neural network with 1,000 outputs, designed for \(224\times 224\) images. Instead of many hyper-parameters, VGG16 uses \(3\times 3\) filters with a stride of 1 and same padding in all convolutional layers, and \(2\times 2\) max pooling with a stride of 2, following this convolution and max-pool arrangement consistently throughout the architecture. VGG16 is widely used for image classification, and a pre-trained version trained on over one million images from the ImageNet database is available. We use the pre-trained VGG16 as the base model with its fully connected layers removed; the convolutional network then serves as a fixed feature extractor, on top of which additional layers are added and fine-tuned for our classification task.
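A minimal sketch of this feature-extractor setup in Keras follows; the pooling and head width are illustrative choices, and the same pattern applies to VGG19, ResNet50, and InceptionV3 by swapping the imported base class:

```python
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG16

# Load ImageNet weights and drop the original fully connected head.
base = VGG16(weights="imagenet", include_top=False, input_shape=(256, 256, 3))
base.trainable = False  # use the convolutional base as a fixed feature extractor

# Attach a task-specific classification head for the five lung classes.
x = layers.GlobalAveragePooling2D()(base.output)
x = layers.Dense(256, activation="relu")(x)
model = Model(base.input, layers.Dense(5, activation="softmax")(x))
```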
VGG19

VGG19 is a deeper variant of the VGG convolutional neural network, comprising 16 convolutional layers, 3 fully connected layers, 5 max-pool layers, and 1 softmax layer. Every convolution filter has a \(3\times 3\) receptive field with a stride of 1, and row and column padding preserve the spatial resolution after convolution; the max-pooling window is \(2\times 2\) with a stride of 2 [46]. VGG19 borrows its architecture from its predecessor, VGG16, and performs somewhat better when pitted against it. A pre-trained version of VGG19 is also available: like VGG16, it is trained on more than a million ImageNet images spanning 1,000 classes, making it a useful feature extractor for new images [47].

ResNet50

ResNet (Residual Network) is a deep convolutional neural network proposed in 2015 by Kaiming He et al. [48] at Microsoft Research Asia. ResNet50 is 50 layers deep, with 48 convolutional layers, 1 max-pool layer, and 1 average-pool layer; it was pre-trained on the ImageNet dataset (1,000 classes) at a resolution of \(224\times 224\) and has more than 23 million trainable parameters. The architecture consists of four main parts: convolutional layers, identity blocks, convolutional blocks, and fully connected layers. Its key element is the skip (residual) connection, which enables the network to learn deeper architectures without encountering vanishing gradients, the issue that arises when the gradients of the deeper layers' parameters become too small for the network to learn from. The identity and convolutional blocks in ResNet50 both make use of skip connections: the identity block adds the input back to the output after it has passed through multiple convolutional layers, while the convolutional block uses a \(1\times 1\) convolutional layer to reduce the number of filters before the \(3\times 3\) convolutional layer and then adds the input back to the output. These skip connections improve training efficiency and avoid vanishing gradients, allowing the network to learn deeper architectures.

InceptionV3

InceptionV3 is a convolutional neural network for image classification proposed by Szegedy et al. in the 2015 paper "Rethinking the Inception Architecture for Computer Vision" [49]. It is an improved version of the Inception V1 model, introduced as GoogLeNet in 2014, is 48 layers deep, and has under 25 million parameters. InceptionV3 modifies the earlier Inception architectures with a focus on reducing computational cost. Its inception modules use multiple parallel filter sizes (\(1\times 1\), \(3\times 3\), \(5\times 5\)) to capture features at different scales, and \(1\times 1\) convolutions for dimensionality reduction before the larger convolutions improve parameter efficiency. Additionally, auxiliary classifiers act as regularizers during training, encouraging the network to learn more discriminative features at intermediate layers.

Explainable artificial intelligence

The study uses the XAI techniques LIME and Grad-CAM to identify significant features and regions in chest X-ray images and to make the transfer learning models' results interpretable, thereby improving diagnostic accuracy and facilitating effective communication between medical experts and patients.

LIME

In our research, we utilized LIME to clarify the predictions of the proposed DL model. LIME, which stands for Local Interpretable Model-agnostic Explanations, is a well-known XAI technique intended to offer local interpretability for individual predictions made by any black-box model, regardless of its underlying algorithm. The primary objective of LIME is to approximate the complex model locally with a simpler, interpretable surrogate model, providing insight into the decision-making process in the vicinity of the original input. LIME functions by slightly perturbing the input data and measuring the impact of these perturbations on the original model's predictions; it then fits a locally interpretable surrogate model (such as logistic regression) to explain the prediction differences. The central idea is that, for sufficiently small perturbations, the simpler surrogate model should behave similarly to the black-box model in the local area around the original input. By creating heatmaps, LIME identifies the areas of the input images that most influenced the model's decision, assigning relative importance scores to pixels or regions based on their contribution to the prediction [50,51].
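As a sketch of how one such explanation can be produced for a single segmented CXR, assuming the open-source lime package and a trained Keras classifier named model (names are illustrative):

```python
from lime import lime_image
from skimage.segmentation import mark_boundaries

explainer = lime_image.LimeImageExplainer()

# model.predict returns class probabilities for each perturbed sample.
explanation = explainer.explain_instance(
    image.astype("double"),       # one segmented CXR, HxWx3 in [0, 1]
    classifier_fn=model.predict,  # black-box prediction function
    top_labels=5,
    hide_color=0,                 # value used to mask "off" super-pixels
    num_samples=1000,             # number of perturbed samples
)

# Overlay the super-pixels that most supported the predicted class.
temp, mask = explanation.get_image_and_mask(
    explanation.top_labels[0], positive_only=True,
    num_features=5, hide_rest=False,
)
overlay = mark_boundaries(temp, mask)
```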
Grad-CAM

Gradient-weighted Class Activation Mapping (Grad-CAM) is an XAI technique used primarily for interpreting and visualizing the internal workings of convolutional neural networks. It grants users insight into which parts of an input image carry the greatest weight in the model's decision-making process, contributing to enhanced trust, transparency, and clarity in AI systems [52]. Grad-CAM operates on the gradients of the output with respect to the convolutional layers' activations: it computes the gradient of the output class score with respect to the feature maps produced by the final convolutional layer before the fully connected layers [53], and these gradients then weight the intermediate feature maps to create a class activation heatmap. Regions with higher activation scores in the CAM correspond to areas of the input image that contributed most to the model's prediction; the results are shown in Fig. 4. Grad-CAM is a useful technique for visually interpreting a CNN's decision-making process [54]. Its applications include image classification, object detection, semantic segmentation, medical imaging, and autonomous vehicles, each leveraging its ability to identify the most influential regions of input images, helping users refine models, address biases, and improve overall performance.

Fig. 4 Grad-CAM explanation on chest X-ray images.

Results and discussions

This study presents results for binary classification into COVID-19 and normal, and for multiclass classification into normal and the diseased lung conditions bacterial pneumonia, viral pneumonia, COVID-19, and tuberculosis. Dense objects, like metal and bone, appear white under X-rays since they absorb the radiation; the least dense regions, like the lungs, appear black, and moderately dense areas appear in grey tones. CXR assessment can therefore identify lung complications, as related research [55] has demonstrated.

Experimental setup

The experiments were conducted on a system with an Intel Core i5 processor, 8 GB RAM, an NVIDIA Quadro K600 GPU, and a 64-bit Windows operating system. The experiments were implemented in Python using libraries such as pandas, numpy, matplotlib, keras, and tensorflow.

Evaluation metrics

Assessment of the lung disease detection model's performance relies heavily on the confusion matrix. It summarizes predicted versus actual class labels, giving the counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). From these we derive accuracy, precision, recall, and F1-score, which are crucial for evaluating efficacy across the different classes.

$$\text{Accuracy (ACC)} = \frac{TP + TN}{TP + FP + FN + TN} \qquad (1)$$

$$\text{Precision (PRE)} = \frac{TP}{TP + FP} \qquad (2)$$

$$\text{Recall (REC)} = \frac{TP}{TP + FN} \qquad (3)$$

$$\text{F1 Score (F1)} = 2\times \frac{Precision \times Recall}{Precision + Recall} \qquad (4)$$
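These metrics follow directly from the confusion matrix; a brief sketch, assuming scikit-learn is available (the label arrays are illustrative):

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Illustrative labels for the five classes (0=normal ... 4=tuberculosis).
y_true = [0, 1, 2, 3, 4, 0, 2]
y_pred = [0, 1, 2, 3, 4, 1, 2]

print(confusion_matrix(y_true, y_pred))
print("ACC:", accuracy_score(y_true, y_pred))
# Weighted averages account for class sample sizes, as in the reported results.
print("PRE:", precision_score(y_true, y_pred, average="weighted"))
print("REC:", recall_score(y_true, y_pred, average="weighted"))
print("F1 :", f1_score(y_true, y_pred, average="weighted"))
```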
Tuning the hyperparameters

Learning rate: The learning rate is a hyperparameter that determines the size of the steps the optimisation algorithm takes while searching for the minimum of the loss function. If the learning rate is too small, learning will be slow; if it is too large, learning may be unstable and fail to converge. We used a learning rate of 0.001 after considering both convergence speed and stability. This choice facilitates gradual parameter adjustments, reducing the risk of overshooting during optimization and, for our COVID-19 detection task in particular, ensuring consistent progress during training while mitigating erratic behaviour. The hyperparameters used in the proposed XAI-TRANS are given in Table 2.

Table 2 Hyperparameters used in the implementation of the proposed XAI-TRANS model.

Optimizer: An optimizer is an algorithm that finds the set of neural network parameters yielding the best performance and minimal error during training, by iteratively updating the parameters based on the gradient of the loss function with respect to them. We used Adam, which combines the advantages of two other optimization algorithms, RMSprop and momentum. Adam maintains two moving averages of the gradients, the first moment (the mean) and the second moment (the uncentered variance), which adaptively adjust the learning rate for each parameter, leading to faster convergence and improved performance.

Loss function: A loss function measures the difference between a neural network's predicted output and the target values during training; training aims to reduce this loss, improving the model's ability to predict accurately. This study uses categorical cross-entropy loss, which suits a multi-class classification model with one-hot encoded output labels.

Activation function: The Rectified Linear Unit (ReLU) is the activation function in our densely connected layers. ReLU is widely adopted in deep learning because it mitigates the vanishing gradient problem and expedites convergence. By introducing non-linearity, ReLU enables the model to grasp intricate features in the data; it is well suited to our task as it enhances the network's capacity to discern and encode complex patterns in radiological images, ultimately enhancing diagnostic accuracy.
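Putting these choices together, a sketch of the training configuration in Keras; the epoch count and generator names are illustrative, while the optimizer, learning rate, and loss follow the text above:

```python
from tensorflow.keras.optimizers import Adam

# Adam with a 0.001 learning rate and categorical cross-entropy for
# one-hot encoded multi-class labels, as described above.
model.compile(
    optimizer=Adam(learning_rate=0.001),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

# train_generator / val_generator as in the augmentation sketch earlier.
history = model.fit(train_generator, validation_data=val_generator, epochs=30)
```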
Analysis on binary and multi-class classification models

Table 3 presents the results for binary classification into COVID-19 and normal; for multiclass classification, the classes are normal, bacterial pneumonia, viral pneumonia, COVID-19, and tuberculosis. Based on the grey-tone differences described above, the classifier distinguishes the different lung conditions.

Table 3 Results obtained for the binary and multiclass classification on CXR images.

The models were developed using transfer learning with segmented CXR images; lung segmentation is carried out to further enhance the performance of the DL models [56]. Weighted averages were used in the metric calculations to account for class sample sizes and yield a more precise representation. Among the pre-trained models with segmented CXR images as inputs, InceptionV3 stands out, detecting COVID-19 cases with 98% accuracy. This remarkable accuracy, with an F1-score of 0.98, shows the model's exceptional proficiency in classifying COVID-19 positive and negative cases. The ResNet50, VGG16, and VGG19 models show similar classification performance, with an accuracy of 97% and an F1-score of 0.97. Together, these scores demonstrate the models' ability to produce precise predictions, pointing to their potential for efficient COVID-19 detection.

Fig. 5 Confusion matrix of binary classification without segmentation.

Fig. 6 Confusion matrix of binary classification with the proposed improved U-Net segmentation.

As illustrated in Fig. 5 for the InceptionV3 model, the confusion matrix shows 127 true positive and 126 true negative predictions, with only 3 false positives and 4 false negatives. InceptionV3's performance improves further with the proposed improved U-Net segmented CXR images as input, as depicted in Fig. 6, with 130 true positive and 129 true negative predictions. Precision, which measures the model's ability to accurately predict positive cases among the predicted positives, reaches 0.98 for InceptionV3; recall, which indicates the model's ability to identify actual positives among all positive cases, likewise reaches 0.98, reflecting effectiveness in both true positive prediction and positive class identification.

Table 4 Comparative analysis of the metrics with state-of-the-art methods.

Table 4 compares four DL models for multiclass classification, with performance metrics illuminating each model's effectiveness in identifying lung disease from medical images. Among them, the proposed XAI-TRANS stands out with an accuracy of 97%, an F1-score of 0.97, precision of 0.98, and recall of 0.97. The other models also performed well: ResNet50 reached a good accuracy of 93%, VGG19 achieved 90%, and VGG16 achieved the lowest accuracy of 83%. The proposed model thus again demonstrates the efficacy of explainable AI and the influence of the improved U-Net segmentation, with evident improvements across the metrics.

Fig. 7 Confusion matrix of multi-class classification of lung diseases.

For multi-class classification, almost all five classes are classified correctly by all models; the InceptionV3 model classifies best, with the fewest misclassifications, as illustrated in Fig. 7.

XAI-TRANS results on CXR images

The LIME interpretable model was used for the explainability of lung disease predictions. The main objective is to provide interpretable explanations for individual predictions of complex deep learning models, giving insight into why the model made a given prediction. This step involves analyzing the visualizations and understanding the contributions of individual features or regions to the model's decision, thereby increasing trust and transparency in the model's behaviour. LIME takes the predicted image and provides a step-by-step explanation, as shown in Fig. 8. In the input image, the region highlighted by a yellow arrow was marked by the doctor as the main region indicating COVID-19.

Fig. 8 Explainability LIME representation on CXR images.

First, the instance for which an explanation is to be generated is chosen. Next, LIME generates boundaries in the input image and measures the distance between the actual and predicted feature maps by creating perturbations.
This is done by making small changes to the instance while keeping its overall characteristics intact. An interpretable model is then fitted to the perturbed samples, and explanations for the instance are calculated from the coefficients or feature-importance values obtained from this interpretable model. Finally, the distance explanation is shown by creating heatmaps.

Fig. 9 Proposed XAI-TRANS with Grad-CAM explanation on chest X-ray images of lung disease.

Table 5 Ablation trial results on different groups of the proposed XAI-TRANS model.

Further, LIME creates a visualization highlighting the important features or regions of the instance, represented in blue in the heatmap and green in the LIME explanations, that influenced the model's prediction. Grad-CAM is a powerful technique that provides visual insight into why the model makes specific predictions. After the model's prediction, Grad-CAM can elucidate which regions of the X-ray were instrumental in the decision-making process, as given in Fig. 8. Grad-CAM retraces the network's steps to understand which parts of the image were crucial for the prediction, then computes a weighted sum of the activation maps, where the weights are determined by the gradients and the activation values. This process generates a heatmap highlighting the regions of the chest X-ray image that contribute most to the model's prediction, as shown in Fig. 9.

Fig. 10 Accuracy and loss graph for the proposed XAI-TRANS.

Significance of the proposed work

The proposed XAI-TRANS integrates two techniques, LIME and Grad-CAM. Applying both allows the classification models to be evaluated thoroughly, since the two work differently; various ablation trials are reported in Table 5. Over the epochs, the network's nodes learn progressively better, and convergence is evident, as shown in Fig. 10. The two techniques have some significant differences: (i) LIME is model-agnostic, whereas Grad-CAM is model-specific; (ii) in LIME, the granularity of the important regions is tied to the granularity of the super-pixel identification algorithm; (iii) Grad-CAM produces a much smoother output because the dimensions of the last convolutional layer are far smaller than those of the original input; and (iv) the improved U-Net segmentation extracts distinguishing features, while transfer learning handles feature overlap and cuts the rate of misclassification. Together, they provide a more comprehensive approach that increases the model's reliability in a real-world context [57]. Table 4 confirms that the proposed XAI-TRANS outperforms the other state-of-the-art methods, with improvements recorded and validated against ground truths.
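To make the Grad-CAM procedure described above concrete, the following is a minimal sketch for a Keras classifier; the layer name is an assumption (for a Keras InceptionV3 backbone the final mixed block is conventionally named "mixed10"):

```python
import numpy as np
import tensorflow as tf

def grad_cam(model, image, last_conv_layer_name="mixed10", class_index=None):
    """Heatmap of the regions driving the prediction, per the steps above.

    image: preprocessed float array of shape (256, 256, 3) in [0, 1].
    """
    grad_model = tf.keras.Model(
        model.input,
        [model.get_layer(last_conv_layer_name).output, model.output],
    )
    with tf.GradientTape() as tape:
        conv_maps, preds = grad_model(image[np.newaxis, ...])
        if class_index is None:
            class_index = int(tf.argmax(preds[0]))  # predicted class
        class_score = preds[:, class_index]

    # Gradients of the class score w.r.t. the final convolutional feature maps.
    grads = tape.gradient(class_score, conv_maps)
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))  # per-channel weights

    # Weighted sum of the activation maps, rectified and normalized to [0, 1].
    cam = tf.reduce_sum(conv_maps[0] * weights, axis=-1)
    cam = tf.maximum(cam, 0) / (tf.reduce_max(cam) + 1e-8)
    return cam.numpy()  # upsample to 256x256 to overlay on the CXR
```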
Conclusion and future works

This work presents an XAI method leveraging transfer learning with CXR images to diagnose various lung diseases, including COVID-19, bacterial pneumonia, viral pneumonia, and tuberculosis. Our proposed model makes accurate predictions and provides explanations, enhancing transparency and trust. Incorporating lung segmentation significantly improves the model's classification and explanation performance, and heatmaps generated by the XAI techniques LIME and Grad-CAM offer clear visualizations of the affected regions, aiding non-expert users in understanding the AI's decisions. Lung segmentation increases processing time, and the model's reliance on a limited CXR dataset poses some challenges; additional data would enhance classification performance. The XAI component, despite the limitation of relying on a complex model, helps interpret CXR images effectively. When given a healthy CXR image, the model produces heatmaps indicating the regions that would be affected by different lung diseases, emphasizing the importance of the classification section. This study highlights the potential of XAI-TRANS approaches in healthcare, particularly for diagnosing lung diseases and providing insights into AI predictions. Noise removal in CXR images is another challenge that must be addressed in future implementations for early diagnosis on real-time images. The proposed model demonstrates strong potential for real-world applications by aiding healthcare professionals in accurately diagnosing lung diseases, even under non-ideal conditions. Transfer learning enables quick and precise analysis, while XAI-TRANS facilitates the interpretation of doubtful results, helping identify the correct disease state and severity. This supports the formulation of appropriate treatment plans and drug suggestions, ultimately leading to faster patient recovery with the right medicines in a shorter time.

Data availability

The datasets generated and/or analysed during the current study are available in the Kaggle repository and can be downloaded from https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia.

References

1. Mahbub, M. K., Biswas, M., Gaur, L., Alenezi, F. & Santosh, K. Deep features to detect pulmonary abnormalities in chest X-rays due to infectious diseaseX: COVID-19, pneumonia, and tuberculosis. Information Sciences 592, 389–401 (2022).
2. Bhandari, M., Shahi, T. B., Siku, B. & Neupane, A. Explanatory classification of CXR images into COVID-19, pneumonia and tuberculosis using deep learning and XAI. Computers in Biology and Medicine 150, 106156 (2022).
3. Rubin, G. D. et al. The role of chest imaging in patient management during the COVID-19 pandemic: a multinational consensus statement from the Fleischner Society. Radiology 296(1), 172–180 (2020).
4. Ghadezadeh, M. et al. Deep convolutional neural network-based computer-aided detection system for COVID-19 using multiple lung scans: design and implementation study. Journal of Medical Internet Research 23 (2021). https://doi.org/10.2196/27468
5. Ghadezadeh, M., Aria, M. & Asadi, F. X-ray equipped with artificial intelligence: changing the COVID-19 diagnostic paradigm during the pandemic. BioMed Research International 2021 (2021). https://doi.org/10.1155/2021/9942873
6. Ghadezadeh, M. et al. Efficient framework for detection of COVID-19 omicron and delta variants based on two intelligent phases of CNN models. Computational and Mathematical Methods in Medicine 2022, 1–10 (2022). https://doi.org/10.1155/2022/4838009
7. Self, W. H., Courtney, D. M., McNaughton, C. D., Wunderink, R. G. & Kline, J. A. High discordance of chest x-ray and computed tomography for detection of pulmonary opacities in ED patients: implications for diagnosing pneumonia. The American Journal of Emergency Medicine 31(2), 401–405 (2013).
8. Badža, M. M. & Barjaktarović, M. Č. Classification of brain tumors from MRI images using a convolutional neural network. Applied Sciences 10(6), 1999 (2020).
9. Sarp, S., Zhao, Y. & Kuzlu, M. Artificial intelligence-powered chronic wound management system: towards human digital twins. PhD thesis, Virginia Commonwealth University (2022).
10. Ayebare, R. R., Flick, R., Okware, S., Bodo, B. & Lamorde, M. Adoption of COVID-19 triage strategies for low-income settings. The Lancet Respiratory Medicine 8(4), 22 (2020).
11. Mahdy, L. N., Ezzat, K. A., Elmousalami, H. H., Ella, H. A. & Hassanien, A. E. Automatic x-ray COVID-19 lung image classification system based on multi-level thresholding and support vector machine. MedRxiv 2020-03 (2020).
12. Moraes Batista, A. F., Miraglia, J. L., Rizzi Donato, T. H. & Porto Chiavegatto Filho, A. D. COVID-19 diagnosis prediction in emergency care patients: a machine learning approach. MedRxiv 2020-04 (2020).
13. Apostolopoulos, I. D. & Mpesiana, T. A. COVID-19: automatic detection from x-ray images utilizing transfer learning with convolutional neural networks. Physical and Engineering Sciences in Medicine 43, 635–640 (2020).
14. Hemdan, E. E.-D., Shouman, M. A. & Karar, M. E. COVIDX-Net: a framework of deep learning classifiers to diagnose COVID-19 in x-ray images. arXiv preprint arXiv:2003.11055 (2020).
15. Narin, A., Kaya, C. & Pamuk, Z. Automatic detection of coronavirus disease (COVID-19) using x-ray images and deep convolutional neural networks. Pattern Analysis and Applications 24, 1207–1220 (2021).
16. Bandyopadhyay, S. & Dutta, S. Machine learning approach for confirmation of COVID-19 cases: positive, negative, death and release (preprint) (2020).
17. Khan, A. I., Shah, J. L. & Bhat, M. M. CoroNet: a deep neural network for detection and diagnosis of COVID-19 from chest x-ray images. Computer Methods and Programs in Biomedicine 196, 105581 (2020).
18. Mahmud, T., Rahman, M. A. & Fattah, S. A. CovXNet: a multi-dilation convolutional neural network for automatic COVID-19 and other pneumonia detection from chest x-ray images with transferable multi-receptive feature optimization. Computers in Biology and Medicine 122, 103869 (2020).
19. Wang, L., Lin, Z. Q. & Wong, A. COVID-Net: a tailored deep convolutional neural network design for detection of COVID-19 cases from chest x-ray images. Scientific Reports 10(1), 19549 (2020).
20. Sethy, P. K., Behera, S. K., Ratha, P. K. & Biswas, P. Detection of coronavirus disease (COVID-19) based on deep features and support vector machine. International Journal of Mathematical, Engineering and Management Sciences 5 (2020).
21. Afshar, P. et al. COVID-CAPS: a capsule network-based framework for identification of COVID-19 cases from x-ray images. Pattern Recognition Letters 138, 638–643 (2020).
22. Horry, M. J. et al. X-ray image based COVID-19 detection using pre-trained deep learning models (2020).
23. Singh, M. et al. Transfer learning-based ensemble support vector machine model for automated COVID-19 detection using lung computerized tomography scan data. Medical & Biological Engineering & Computing 59, 825–839 (2021).
24. Das, N. N., Kumar, N., Kaur, M., Kumar, V. & Singh, D. Automated deep transfer learning-based approach for detection of COVID-19 infection in chest x-rays. IRBM 43(2), 114–119 (2022).
25. Heidari, M. et al. Improving the performance of CNN to predict the likelihood of COVID-19 using chest x-ray images with preprocessing algorithms. International Journal of Medical Informatics 144, 104284 (2020).
26. Loey, M., Smarandache, F. & Khalifa, N. E. M. Within the lack of chest COVID-19 x-ray dataset: a novel detection model based on GAN and deep transfer learning. Symmetry 12(4), 651 (2020).
27. Nirmala, V., Shashank, H., Manoj, M., Satish, R. G. & Premaladha, J. Skin cancer classification using image processing with machine learning techniques. In Intelligent Data Analytics, IoT, and Blockchain, 1–15 (Auerbach Publications, 2023).
28. Selvaraju, R. R. et al. Grad-CAM: visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, 618–626 (2017).
29. Ahsan, M. M. et al. COVID-19 symptoms detection based on NasNetMobile with explainable AI using various imaging modalities. Machine Learning and Knowledge Extraction 2(4), 490–504 (2020).
30. Li, L. et al. Artificial intelligence distinguishes COVID-19 from community acquired pneumonia on chest CT. Radiology, 200905 (2020).
31. Ukwuoma, C. et al. Automated lung-related pneumonia and COVID-19 detection based on novel feature extraction framework and vision transformer approaches using chest x-ray images. Bioengineering 9, 709 (2022). https://doi.org/10.3390/bioengineering9110709
32. Addo, D. et al. EVAE-Net: an ensemble variational autoencoder deep learning network for COVID-19 classification based on chest x-ray images. Diagnostics 12(11) (2022). https://doi.org/10.3390/diagnostics12112569
33. Monday, H. N. et al. WMR-DepthwiseNet: a wavelet multi-resolution depthwise separable convolutional neural network for COVID-19 diagnosis. Diagnostics 12(3) (2022).
34. Kuzlu, M., Cali, U., Sharma, V. & Güler, Ö. Gaining insight into solar photovoltaic power generation forecasting utilizing explainable artificial intelligence tools. IEEE Access 8, 187814–187823 (2020).
35. Garg, P., Sharma, M. & Kumar, P. Transparency in diagnosis: unveiling the power of deep learning and explainable AI for medical image interpretation. Arabian Journal for Science and Engineering, 1–17 (2025).
36. Veeramani, N. & Jayaraman, P. YOLOv7-XAI: multi-class skin lesion diagnosis using explainable AI with fair decision making. International Journal of Imaging Systems and Technology 34(6), 23214 (2024).
37. Das, A. & Rad, P. Opportunities and challenges in explainable artificial intelligence (XAI): a survey. arXiv preprint arXiv:2006.11371 (2020).
38. Islam, M. K., Rahman, M. M., Ali, M. S., Mahim, S. & Miah, M. S. Enhancing lung abnormalities detection and classification using a deep convolutional neural network and GRU with explainable AI: a promising approach for accurate diagnosis. Machine Learning with Applications 14, 100492 (2023).
39. Kermany, D. S. et al. Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell 172(5), 1122–1131 (2018).
40. Veeramani, N., Premaladha, J., Krishankumar, R. & Ravichandran, K. S. Hybrid and automated segmentation algorithm for malignant melanoma using chain codes and active contours. In Deep Learning in Personalized Healthcare and Decision Support, 119–129 (Elsevier, 2023).
41. Jaeger, S. et al. Automatic tuberculosis screening using chest radiographs. IEEE Transactions on Medical Imaging 33(2), 233–245 (2013).
42. Candemir, S. et al. Lung segmentation in chest radiographs using anatomical atlases with nonrigid registration. IEEE Transactions on Medical Imaging 33(2), 577–590 (2013).
43. Sarp, S. et al. An XAI approach for COVID-19 detection using transfer learning with x-ray images. Heliyon 9(4) (2023).
44. Talukder, M. A., Layek, M. A., Kazi, M., Uddin, M. A. & Aryal, S. Empowering COVID-19 detection: optimizing performance through fine-tuned EfficientNet deep learning architecture. Computers in Biology and Medicine 168, 107789 (2024).
45. Premaladha, J., Surendra Reddy, M., Hemanth Kumar Reddy, T., Sri Sai Charan, Y. & Nirmala, V. Recognition of facial expression using Haar cascade classifier and deep learning. In Inventive Communication and Computational Technologies: Proceedings of ICICCT 2021, 335–351 (Springer, 2022).
46. V, N., K, J. A. S., G, A., N, S. S. & S, P. Automated template matching for external thread surface defects with image processing. In 2023 Fifth International Conference on Electrical, Computer and Communication Technologies (ICECCT), 1–6 (2023). https://doi.org/10.1109/ICECCT56650.2023.10179653
47. Nirmala, V. An automated detection of notable ABCD diagnostics of melanoma in dermoscopic images. In Artificial Intelligence in Telemedicine, 67–82 (CRC Press, 2023).
48. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778 (2016).
49. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J. & Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2818–2826 (2016).
50. Ahsan, M. M. et al. Deep transfer learning approaches for monkeypox disease diagnosis. Expert Systems with Applications 216, 119483 (2023).
51. Främling, K., Westberg, M., Jullum, M., Madhikermi, M. & Malhi, A. Comparison of contextual importance and utility with LIME and Shapley values. In International Workshop on Explainable, Transparent Autonomous Agents and Multi-Agent Systems, 39–54 (Springer, 2021).
52. Song, D. et al. A new XAI framework with feature explainability for tumors decision-making in ultrasound data: comparing with Grad-CAM. Computer Methods and Programs in Biomedicine 235, 107527 (2023).
53. Zou, L. et al. Ensemble image explainable AI (XAI) algorithm for severe community-acquired pneumonia and COVID-19 respiratory infections. IEEE Transactions on Artificial Intelligence 4(2), 242–254 (2022).
54. Marmolejo-Saucedo, J. A. & Kose, U. Numerical Grad-CAM based explainable convolutional neural network for brain tumor diagnosis. Mobile Networks and Applications 29(1), 109–118 (2024).
55. Shah, P. M., Zeb, A., Shafi, U., Zaidi, S. F. A. & Shah, M. A. Detection of Parkinson disease in brain MRI using convolutional neural network. In 2018 24th International Conference on Automation and Computing (ICAC), 1–6 (2018). https://doi.org/10.23919/IConAC.2018.8749023
56. Veeramani, N., Jayaraman, P., Krishankumar, R., Ravichandran, K. S. & Gandomi, A. H. DDCNN-F: double decker convolutional neural network 'F' feature fusion as a medical image classification framework. Scientific Reports 14(1), 676 (2024).
57. Teixeira, L. O. et al. Impact of lung segmentation on the diagnosis and explanation of COVID-19 in chest x-ray images. Sensors 21(21), 7116 (2021).

Acknowledgements

Our earnest thanks to SASTRA Deemed University for providing research facilities at the Computer Vision and Soft Computing Laboratory to proceed with the research.

Funding

The author(s) received no financial support for this article's research, authorship, and/or publication.

Author information

Authors and Affiliations: School of Computing, SASTRA Deemed University, Thirumalaisamudram, 613401, Thanjavur, Tamilnadu, India. Nirmala Veeramani, Reshma Sherine S.A, Sakthi Prabha S, Srinidhi S & Premaladha Jayaraman.

Contributions

Formal analysis, conceptualization, methodology, algorithm implementation, draft preparation, and visualization by N.V., R.S., S.P., and S.S.; validation, formal analysis, supervision, and project administration by N.V. and P.J. All authors have read and agreed to the published version of the manuscript.

Corresponding authors

Correspondence to Nirmala Veeramani or Premaladha Jayaraman.

Ethics declarations

Competing interests: The authors declare no competing interests.

Additional information

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information is available for this article.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.