Introduction

Brain tumors pose a significant threat to human health, and early diagnosis is crucial for improving patient prognosis. Medical imaging techniques such as Magnetic Resonance Imaging (MRI) and Computed Tomography (CT) are essential tools for brain tumor diagnosis. However, the scarcity of high-quality medical image data for brain tumors has become a major bottleneck in the development and application of medical image analysis models. This scarcity stems primarily from patient privacy concerns, the time-consuming and costly nature of data collection and labeling, and the inherent complexity of medical imaging. As a result, the advancement of artificial intelligence (AI), particularly deep learning, in medical imaging has been significantly hampered.

In recent years, deep learning models have demonstrated remarkable potential in medical image processing, showing superior performance in tasks such as image classification [1,2], segmentation [3,4], and anomaly detection [5]. These models excel at automatically learning complex feature representations from raw data, eliminating the need for manual feature engineering by experts [6,7,8]. In medical image analysis, and especially in brain cancer research, the diversity and richness of data are crucial for model training. Yet because brain cancer cases are rare and patient data are sensitive, high-quality brain cancer MRI data are often difficult to obtain, and existing datasets are limited in the morphology, size, and location of the tumors they cover. As a powerful generative model, the diffusion model can generate new and diverse image data by learning the data distribution, thereby effectively alleviating data scarcity. Its strength lies in gradually removing noise from images to generate high-quality, realistic brain cancer images. Such images enhance the diversity and complexity of the dataset, making trained models more robust and better able to generalize.

To address this challenge, researchers have explored various image generation models to synthesize high-quality medical data for augmentation. Traditional methods such as Generative Adversarial Networks (GANs) [9] and Variational Autoencoders (VAEs) [10] have been widely used but suffer from significant limitations. GANs often encounter gradient vanishing, mode collapse, and unstable training, resulting in generated images with limited diversity. VAEs, while valuable for data generation, produce blurry images with insufficient expressiveness for complex data because of their numerous assumptions and constraints.

Recently, Denoising Diffusion Probabilistic Models (DDPMs) [11] have gained substantial attention in generative modeling due to their ability to generate high-quality image samples. DDPMs have been successfully applied in various domains, including super-resolution [12], semantic segmentation [13,14], anomaly detection [5,15], and text-to-image generation [16]. However, their application in medical image analysis, particularly for brain tumor MRI data augmentation, remains underexplored.
Existing diffusion-based models also lack a comprehensive approach to the specific challenges of medical image data, such as spatial continuity and the need for accurate tumor region generation. To fill this gap, we propose a novel data augmentation technique based on a diffusion model, referred to as the Multi-Channel Fusion Diffusion Model (MCFDiffusion). This method tackles data imbalance by converting healthy brain MRI images into images containing tumors, enabling deep learning models to achieve better performance and assisting physicians in making more accurate diagnoses and treatment plans. Our approach introduces a multi-channel input mechanism and fuses defective areas with healthy images, enhancing the applicability of diffusion models in medical imaging. Through comprehensive experiments on a publicly available brain tumor MRI dataset, we demonstrate that our method significantly improves the performance of image classification and segmentation tasks, outperforming other state-of-the-art image generation models in terms of image quality and clinical relevance.

In summary, our study addresses the critical issue of data scarcity and imbalance in brain tumor MRI datasets, providing a robust and effective solution for data augmentation. By leveraging advances in diffusion models and introducing innovative techniques, we aim to bridge the gap in the literature and contribute to the development of more accurate and reliable medical image analysis models.

Related work

In the field of medical image analysis, the quality and quantity of data are critical for the development and validation of algorithms. However, access to large, high-quality, and accurately labeled medical image datasets remains a significant challenge due to privacy protection, ethical review processes, and the high costs involved. Image generation models offer an innovative solution, capable of automatically generating realistic medical images to expand datasets and facilitate algorithm training. In light of the scarcity and difficulty of obtaining medical image data, many researchers have therefore turned to image generation techniques.

In 2017, Costa et al. [17] combined adversarial autoencoders and GANs to create an end-to-end system that generates retinal images and corresponding vascular networks from random samples; models trained on the synthetic data were compared with models trained on real data, demonstrating that the generated images could, to a certain extent, be used to train medical image analysis algorithms. In 2020, Li et al. [18], from Ocean University of China and Xidian University, used TumorGAN, a multi-modal GAN framework, to augment a FLAIR dataset, achieving a Dice similarity coefficient of 0.76. In the same year, Sun et al. [19] from Fudan University and Nanjing Agricultural University applied an end-to-end GAN framework to augment a cancer dataset, improving the Dice similarity coefficient for image segmentation by 0.16 to 0.17. Around the same time, Deepak and Ameer [20] used a multi-scale GAN model to expand a cancer dataset containing three categories, achieving a classification accuracy of 93.1%. In 2021, Barile et al. [21] used GANs to augment a binary classification cancer dataset, achieving a Dice score of 81%.
In 2022, Jha et al. [22] utilized a conditional generative adversarial model to augment a four-class cancer dataset, achieving impressive results: an MCC of 0.9846 and specificity, F1-score, accuracy, and sensitivity of 99.85%, 98.72%, 98.79%, and 98.77%, respectively.

In summary, current research demonstrates that image generation plays a significant role in medical data augmentation. However, most studies focus on adversarial models, and many existing methods are aimed primarily at classification tasks. In a review of medical data augmentation, Goceri [23] observed that most current segmentation data augmentation methods still rely on simple traditional techniques such as rotation, inversion, translation, and contrast adjustment; only a few use generation-based augmentation, and those are mostly based on generative adversarial models.

Most existing medical image data augmentation techniques target medical image classification [20,21,22] and rarely touch on medical image segmentation. Among the few methods that do augment data for segmentation, a significant portion relies on traditional transformations such as image rotation and cropping [24,25]. The augmentation method proposed in this paper departs from these traditional approaches and instead introduces a brain cancer data augmentation algorithm based on DDPMs, which improves classification and segmentation models simultaneously. Some GAN-based augmentation methods do exist [17,26], but they follow a single image-to-image generation scheme, training on the masks of the training set and augmenting from those same masks; GAN training is also prone to mode collapse, which leaves the generated data with too little diversity to train a model effectively on large amounts of augmented data. In contrast, our method transforms healthy data into unhealthy data, which helps models distinguish healthy from unhealthy images as well as healthy from unhealthy regions.

The contributions of our study are as follows: (i) We introduce a novel Multi-Channel Fusion Diffusion Model (MCFDiffusion) specifically designed to augment brain tumor MRI datasets, addressing the critical issue of class imbalance in medical imaging data. This model leverages diffusion-based data augmentation to transform healthy brain MRI images into images depicting tumors, thereby enriching the dataset and improving the performance of deep learning models in medical image analysis. (ii) The versatility and effectiveness of MCFDiffusion are demonstrated through comprehensive experiments on a publicly available brain tumor MRI dataset. We compare the performance of image classification and segmentation tasks before and after augmentation, showing significant improvements in both accuracy and segmentation metrics. We also make our code publicly available to facilitate further research and establish a baseline for future studies. (iii) Our method outperforms other state-of-the-art image generation models in terms of image quality, as evidenced by the lowest Fréchet Inception Distance (FID) scores.
This indicates that the images generated by our MCFDiffusion model are closer to the real data distribution than those of other methods. (iv) To assess the clinical relevance of our augmented images, we conduct a thorough evaluation of their impact on both image classification and segmentation models. The results demonstrate consistent improvements across various models, including ResNet, U-Net, SegNet, and Mask R-CNN, highlighting the broad applicability of our data augmentation technique. (v) We explore the potential of pre-training on composite images to improve segmentation and classification models in data-limited settings. Our findings suggest that pre-training with augmented images can significantly enhance the generalization capabilities of these models, particularly in low-data regimes, a common challenge in medical imaging. This has important implications for clinical diagnosis and treatment planning of brain tumors, where high-quality datasets are often scarce.

Material and methods

Dataset

The dataset used in our analysis is the LGG Segmentation Dataset, sourced from Kaggle [27]. It consists of brain MRI images, each paired with a manual FLAIR abnormality segmentation mask, from 110 patients aged between 20 and 75 years with a nearly equal male-to-female ratio. The dataset comprises 3,642 images in total, of which 2,269 are labeled healthy and 1,373 contain tumors. The data were partitioned into training, validation, and test sets at an 8:1:1 ratio: 2,913 training images, 365 validation images, and 364 test images. The dataset clearly exhibits a significant imbalance between healthy and tumor images; we applied our proposed methodology to balance it, thereby enhancing the robustness of our subsequent analyses.

DDPM

The Denoising Diffusion Probabilistic Model (DDPM) consists of two main processes: the forward process and the reverse process. The forward process is a Gaussian Markov process, in which the image state \(\textbf{x}_t\) at each step is conditionally dependent only on the state \(\textbf{x}_{t-1}\) at the previous step. Gaussian noise is progressively added at each timestep, gradually transforming the original image into pure noise.

In the original DDPM paper, T was set to 1000, meaning it takes 1000 denoising steps to generate an image. Most subsequent diffusion papers follow suit, and we likewise set T to 1000.

Since the noise in the forward process is added artificially, the forward process is fully known, and a closed-form expression can be derived for noising \(\textbf{x}_0\) directly to \(\textbf{x}_t\), as shown in Eq. (1):

$$\begin{aligned} \textbf{x}_t = \sqrt{\bar{\alpha }_t} \textbf{x}_0 + \sqrt{1 - \bar{\alpha }_t} \epsilon \end{aligned}$$ (1)

Here \(\textbf{x}_t\) is the data at timestep t, \(\textbf{x}_0\) is the original data, and the noise \(\epsilon\) is sampled from the standard normal distribution \(\mathcal {N}(0, \textbf{I})\). The term \(\bar{\alpha }_t\) is the cumulative product of \(\alpha _s\) over all timesteps up to t, i.e., \(\bar{\alpha }_t = \prod _{s=1}^t \alpha _s\), where \(\alpha _s = 1 - \beta _s\) and \(\beta _s\) is the noise level parameter.
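For concreteness, Eq. (1) reduces to a single vectorized sampling operation. Below is a minimal PyTorch sketch of this forward noising step; the function name and the precomputed alpha_bar tensor are our own illustrative choices, not part of the original implementation.

```python
import torch

def forward_diffuse(x0: torch.Tensor, t: int, alpha_bar: torch.Tensor):
    """Sample x_t directly from x_0 via Eq. (1): x_t = sqrt(a_bar)*x_0 + sqrt(1-a_bar)*eps."""
    eps = torch.randn_like(x0)        # epsilon ~ N(0, I)
    a_bar = alpha_bar[t]              # cumulative product of alpha_s up to step t
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps
    return x_t, eps                   # eps is the regression target when training the U-Net
```

During training, the network receives \(\textbf{x}_t\) and t and regresses the sampled \(\epsilon\), which is the standard DDPM objective.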
The reverse process gradually removes the noise added during the forward process in order to recover the original image from the noisy data:

$$\begin{aligned} p_{\theta }(\textbf{x}_{t-1} | \textbf{x}_t) = \mathcal {N}(\textbf{x}_{t-1}; \mu _{\theta }(\textbf{x}_t, t), \Sigma _{\theta }(\textbf{x}_t, t)) \end{aligned}$$ (2)

In the reverse process, \(p_{\theta }(\textbf{x}_{t-1} | \textbf{x}_t)\) is the conditional distribution of the data \(\textbf{x}_{t-1}\) at timestep \(t-1\) given the data \(\textbf{x}_t\) at timestep t. The mean \(\mu _{\theta }(\textbf{x}_t, t)\) and variance \(\Sigma _{\theta }(\textbf{x}_t, t)\) of this distribution are learned by a neural network with parameters \(\theta\). In DDPM, the variance term \(\widetilde{\beta }_t\) is deterministic and does not need to be learned, while the mean \(\mu _{\theta }\) must be learned; the network's goal is to model the probability distribution of \(\textbf{x}_{t-1}\) given the current data \(\textbf{x}_t\). The image generation process of DDPM is shown in Fig. 1.

Fig. 1. The forward and reverse processes in diffusion models. The forward process (top) shows the progression from the original image \(x_0\) to a noisy image \(x_T\) through a series of steps with gradually increasing noise. The reverse process (bottom) shows the denoising steps, starting from the noisy image \(x_T\) and moving back to the original image \(x_0\). The diagram includes the transition distributions \(q(x_t | x_{t-1})\) for the forward process and \(p_\theta (x_{t-1} | x_t)\) for the reverse process, highlighting the probabilistic nature of the transformations.

DDIM

Building on the DDPM foundation, Song et al. introduced Denoising Diffusion Implicit Models (DDIMs) [28], which enhance DDPMs by turning the Markov process into a non-Markov process, thereby accelerating image generation. The forward process of DDIM is exactly the same as that of DDPM; DDIM accelerates sampling by replacing the Markov chain of DDPM with a non-Markovian process, and the diversity of the generated images can be controlled by adjusting the noise coefficient:

$$\begin{aligned} \textbf{x}_{t-1} = \underbrace{\sqrt{\alpha _{t-1}} \left( \frac{\textbf{x}_t - \sqrt{1 - \alpha _t}\, \epsilon _{\theta }^{(t)}(\textbf{x}_t)}{\sqrt{\alpha _t}} \right) }_{\text {predicted}\ x_0} + \underbrace{\sqrt{1 - \alpha _{t-1} - \sigma _t^2} \cdot \epsilon _{\theta }^{(t)}(\textbf{x}_t)}_{\text {direction pointing to}\ x_t} + \underbrace{\sigma _t \epsilon _t}_{\text {random noise}} \end{aligned}$$ (3)

Here \(\textbf{x}_{t-1}\) is the data at timestep \(t-1\), \(\epsilon _{\theta }^{(t)}(\textbf{x}_t)\) is the noise predicted by the model at timestep t, and \(\sigma _t\) is an adjustable variance parameter (in Eq. (3), \(\alpha _t\) denotes the cumulative product \(\bar{\alpha }_t\)). The reverse process in DDIM is non-Markovian, meaning it does not follow the Markov property of the original DDPM. This allows for faster sampling, as the model does not need to step through the entire Markov chain.
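The following sketch implements one reverse update of Eq. (3). It is an illustrative reconstruction under our assumptions, not the authors' released code: eps_model stands for any trained noise-prediction network, alpha_bar holds the cumulative products, and setting sigma_t = 0 yields the deterministic DDIM sampler.

```python
import torch

@torch.no_grad()
def ddim_step(x_t, t, t_prev, eps_model, alpha_bar, sigma_t=0.0):
    """One reverse update of Eq. (3); sigma_t = 0 gives deterministic DDIM sampling."""
    a_t, a_prev = alpha_bar[t], alpha_bar[t_prev]
    eps = eps_model(x_t, t)                                  # predicted noise at step t
    x0_pred = (x_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()    # "predicted x_0" term
    dir_xt = (1 - a_prev - sigma_t ** 2).clamp(min=0).sqrt() * eps  # direction pointing to x_t
    noise = sigma_t * torch.randn_like(x_t) if sigma_t > 0 else 0.0
    return a_prev.sqrt() * x0_pred + dir_xt + noise
```

Because t_prev can skip many timesteps at once, repeated calls to such a step traverse a short non-Markovian trajectory instead of all T steps.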
Fusion

We designed a simple image fusion algorithm that fuses tumor regions from MRI images with healthy brain MRI images to generate brain tumor MRI images. First, we add noise to a healthy brain MRI image for L steps. Then, we extract the tumor region from an MRI image with a tumor and add noise to it for L steps. Finally, we add the two noised results together and use DDIM to denoise them and obtain the final result.

We made some changes to the noising process. The end of DDPM's forward process is so noisy that it contributes little to sample quality. To address this, we adopt the method proposed by Nichol and Dhariwal [29] to improve the forward process: the original linear schedule for \(\bar{\alpha }_t\) is replaced with a cosine schedule, as shown in Eq. (4). Figure 2 visually compares the original and improved forward processes. This change loses less image detail at each noising step, allowing the network to train on finer detail.

$$\begin{aligned} f(t) = \cos \left( \frac{t/T+s}{1+s}\cdot \frac{\pi }{2}\right) ^2, \qquad \bar{\alpha }_t = \frac{f(t)}{f(0)} \end{aligned}$$ (4)

Fig. 2. Latent samples from the linear (top) and cosine (bottom) schedules at linearly spaced values of t from 0 to T. In the last quarter of the linear schedule, latents are almost pure noise, while the cosine schedule introduces noise more slowly.

The detailed image fusion method is shown in Fig. 3. Since two images that have each been noised for L steps are added together, simply summing them would raise the effective noise to that of 2 × L steps. To avoid degrading the quality of the original image, we modify the noising equation as shown in Eq. (5):

$$\begin{aligned} \textbf{x}_t = \sqrt{\bar{\alpha }_t} \textbf{x}_0 + \frac{1}{2} \sqrt{1 - \bar{\alpha }_t} \epsilon \end{aligned}$$ (5)

Fig. 3. Generating a brain tumor MRI image through fusion and denoising. The top row shows a healthy brain MRI image noised for L steps. The bottom row shows a tumor area extracted from a tumor MRI image, also noised for L steps, and then combined with the noisy healthy image. The combined image is denoised for L steps to produce the final result: a brain MRI image with a tumor.

There is an alternative fusion method. First, extract the tumor area from a brain tumor MRI. Then, combine it with a healthy brain MRI image. Finally, use the trained U-Net model for step-by-step denoising to obtain the final image. As this method adds noise only once, it avoids the repeated noise addition of the previous method, so no change to the noising formula is needed; noising here uses Eq. (1) alone. This fusion method is illustrated in Fig. 4.

Fig. 4. A fusion-and-denoising process for generating brain tumor MRI images. The upper part shows MRI scans of healthy brains; the lower part shows the tumor area extracted from unhealthy brain MRI images. The extracted area is combined with the healthy brain images, and the resulting image undergoes L denoising steps to produce the final image.
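To make the schedule and the modified noising concrete, here is a short sketch of the cosine \(\bar{\alpha }_t\) schedule of Eq. (4), using the offset s = 0.008 suggested by Nichol and Dhariwal [29], together with the halved-noise fusion rule of Eq. (5). The helper names are ours, and the exact fusion bookkeeping in the paper may differ.

```python
import math
import torch

def cosine_alpha_bar(T: int, s: float = 0.008) -> torch.Tensor:
    """Cosine schedule of Eq. (4): alpha_bar_t = f(t) / f(0)."""
    t = torch.arange(T + 1, dtype=torch.float64)
    f = torch.cos(((t / T + s) / (1 + s)) * (math.pi / 2)) ** 2
    return (f / f[0]).float()

def fused_noising(healthy: torch.Tensor, tumor_region: torch.Tensor,
                  t: int, alpha_bar: torch.Tensor) -> torch.Tensor:
    """Noise the healthy image and the extracted tumor region with the halved
    noise term of Eq. (5), then sum them as input to DDIM denoising."""
    a_bar = alpha_bar[t]

    def noised(x):
        # The 1/2 factor compensates for summing two independently noised images.
        return a_bar.sqrt() * x + 0.5 * (1 - a_bar).sqrt() * torch.randn_like(x)

    return noised(healthy) + noised(tumor_region)
```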
The final effects of these two image fusion methods are essentially the same, and either can be chosen as needed.

Multi-channel input

After performing simple fusion, we found that this method alone could not generate images of sufficient quality for data augmentation. We therefore exploit a characteristic of MRI images: their spatial continuity. We convert the target image and its spatially adjacent neighbors into three-channel tensors and stack them into a nine-channel tensor as the input. The noising and denoising operations of the diffusion model act only on the target tensor in the middle position. The output is the predicted noise, which is used to progressively denoise the noisy input, forming \(x_{t-1}\) from \(x_t\) step by step via Eq. (3). As shown in Fig. 5, compared with ordinary three-channel inputs, images denoised from multi-channel inputs have much higher quality.

Fig. 5. The image is divided into three parts: left, middle, and right. The rightmost part is the original image. The left part shows the result of adding noise to the original image and denoising with a three-channel-input diffusion model, while the middle part shows denoising with a nine-channel-input model. Multi-channel input clearly improves the quality of the generated images, both overall and in the details.
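A sketch of the nine-channel input assembly described above, under our assumptions about tensor shapes (three adjacent slices, each coded as a (3, H, W) tensor); only the middle slice is noised, while its neighbors supply clean spatial context.

```python
import torch

def make_nine_channel_input(prev_slice, target, next_slice, t, alpha_bar):
    """Stack three spatially adjacent 3-channel MRI slices into a 9-channel tensor.

    Only the middle (target) slice is noised via Eq. (1); the U-Net then predicts
    the noise of that middle slice, conditioned on its clean neighbors."""
    a_bar = alpha_bar[t]
    eps = torch.randn_like(target)
    noisy_target = a_bar.sqrt() * target + (1 - a_bar).sqrt() * eps
    x = torch.cat([prev_slice, noisy_target, next_slice], dim=0)  # (9, H, W)
    return x.unsqueeze(0), eps  # batch dimension added; eps is the training target
```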
Multi-channel fusion diffusion

Fig. 6. The network architecture used in the proposed method. A 9-channel tensor, formed by combining healthy brain MRI images and tumor-area MRI images, is processed through an encoder and decoder (U-Net structure) to output the predicted noise used for denoising. The condition "if \(t > 0\), denoise and set \(t = t-1\)" indicates the iterative denoising that continues until the final result is obtained at \(t = 0\).

The MCFDiffusion we propose has two key components: fusion and multi-channel input. Combining these two simple steps transforms a healthy image into an image depicting a brain tumor. First, we randomly select three spatially continuous MRI images of healthy brains and three of brain tumor areas. Noise is added only to the middle image, and the two groups are overlaid to form a 9-channel tensor as the network input. The U-Net then predicts the noise of the middle image, which is used to denoise it to step L-1. Here, the classifier-guided diffusion model of Dhariwal and Nichol [30] is applicable. Classifier guidance integrates classifier information into the diffusion model's generation process, specifically during the noise processing of each denoising step. In reverse denoising, a trained classifier assesses the probability that the generated sample (the intermediate denoising state) belongs to the target category. The denoising direction is then adjusted using the classifier's gradient, which indicates how to modify the sample to move it toward the target category. This gradient is combined with the diffusion model's denoising update to guide sample evolution toward the target category. The noise at each step is adjusted according to Eq. (6), and the adjusted noise is used to denoise the target image one step further; repeating this until the timestep reaches 0 yields the final image. The overall process is shown in Fig. 6.

$$\begin{aligned} \epsilon ' = \epsilon _{\theta } - w\sqrt{1-\bar{\alpha }_t}\,\nabla \log p_{\phi }(y|x_t) \end{aligned}$$ (6)

In this equation, \(\epsilon '\) is the perturbed noise after introducing class-conditional guidance; it steers the reverse diffusion process toward samples that match the target class y. \(\epsilon _{\theta }\) is the original noise estimated by the diffusion model at timestep t, which maintains the structural information of the image during generation. The parameter w determines how strongly the classifier influences the diffusion process: a larger w places more emphasis on class-conditional information, endowing generated images with more distinctive class-specific features, but it also increases the risk of over-regularization. \(\bar{\alpha }_t\) is the cumulative product of the noise schedule up to timestep t; it reflects the overall noise level at t and balances the contribution of the classifier gradient against the current noise level. Finally, \(\nabla \log p_{\phi }(y|x_t)\) is the gradient of the classifier's log-probability with respect to the input \(x_t\) for class y; it provides directional information on how to adjust the input to increase the likelihood of it belonging to class y, effectively guiding the diffusion process toward images that better fit the target class. The validity of Eq. (6) comes from integrating the denoising capability of diffusion models with the class-discriminative power of classifiers: adding the scaled gradient of the classifier's log-probability to the original noise estimate encourages the model to denoise in a way that aligns with the target class, improving class-conditional image generation by leveraging the complementary strengths of the two models.
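Eq. (6) can be realized with a classifier gradient obtained through autograd. The sketch below is our hedged reconstruction of this adjustment, not the authors' code: eps_model and classifier stand for the trained noise predictor and the trained (noise-conditioned) classifier, and w corresponds to the guidance scale in Eq. (6).

```python
import torch

def guided_noise(x_t, t, y, eps_model, classifier, alpha_bar, w=1.0):
    """Classifier-guided noise of Eq. (6):
    eps' = eps_theta - w * sqrt(1 - alpha_bar_t) * grad_x log p_phi(y | x_t)."""
    eps = eps_model(x_t, t)
    with torch.enable_grad():
        x_in = x_t.detach().requires_grad_(True)
        log_probs = torch.log_softmax(classifier(x_in, t), dim=-1)
        selected = log_probs[torch.arange(len(y)), y].sum()  # log p(y | x_t) per sample
        grad = torch.autograd.grad(selected, x_in)[0]        # direction toward class y
    return eps - w * (1 - alpha_bar[t]).sqrt() * grad
```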
Algorithm 1. MCFDiffusion.

The computational complexity of the networks used in this study is shown in Table 1; the classifier can be replaced by any classification network.

Table 1. Parameter counts (params) and floating-point operations (FLOPs) of the two models: the U-Net and the classifier.

Results

Training process

In this study, we implement and train the MCFDiffusion model using the PyTorch framework, which is widely recognized for its flexibility and efficiency in building deep learning models. To ensure smooth training and fast computation, we used a hardware configuration with an Intel(R) Xeon(R) Silver 4316 CPU, a processor designed for efficient parallel processing and therefore suited to the computational demands of training large neural networks. In addition, we employed an NVIDIA A30 GPU with approximately 24 GB of high-performance memory, which significantly accelerates computation, especially for the large image data encountered in medical image processing and analysis.

The core architecture of the diffusion model is a U-Net, which has proven highly effective in segmentation tasks thanks to its encoder-decoder design and skip connections. We set the base number of channels in the U-Net to 32, and each downsampling stage includes 4 residual blocks. The residual blocks preserve gradient flow during training, mitigating vanishing gradients in deeper networks. The model is optimized with AdamW, which combines adaptive moment estimation with weight decay regularization for better convergence and less overfitting. The model is trained for a total of 3000 epochs across all datasets, with a learning rate of \(1 \times 10^{-3}\) and a batch size of 16. These hyperparameters were chosen from preliminary experiments to balance training time and model performance.

We also trained a classifier as part of the overall pipeline, designed to improve performance by enabling classification based on the learned features. The classifier is built from the downsampling part of the U-Net architecture together with its skip connections, which retain spatial information from earlier layers; this is particularly important for tasks like segmentation and classification where spatial precision matters. The classifier is optimized with Adam at the same learning rate of \(1 \times 10^{-3}\) and a larger batch size of 32, as it generally benefits from more stable updates than the diffusion model. It is trained for 500 epochs with two target classes, healthy and unhealthy, i.e., whether a brain MRI contains a tumor.

Together, training the MCFDiffusion model and the classifier lets us address brain tumor segmentation and classification effectively: the combination enhances the dataset through data augmentation, improving generalization, particularly on the imbalanced datasets common in medical image analysis.

Generated image quality

We compare our method with other state-of-the-art image generation models using the Fréchet Inception Distance (FID), computed as in Eq. (7):

$$\begin{aligned} \text {FID} = \Vert \mu _1 - \mu _2\Vert ^2 + {\text {Tr}}\left( \Sigma _1 + \Sigma _2 - 2(\Sigma _1\Sigma _2)^{1/2}\right) \end{aligned}$$ (7)

where \(\mu _1\) and \(\mu _2\) are the mean feature vectors of the real data and the generative model, and \(\Sigma _1\) and \(\Sigma _2\) are the corresponding covariance matrices. The FID evaluates the Fréchet distance between the two distributions from their means and covariances; a smaller FID value indicates that the generated images are closer to the real data distribution.
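For reference, the FID of Eq. (7) can be computed from feature means and covariances as in the sketch below; in practice the statistics come from Inception-v3 activations, and the helper name is illustrative.

```python
import numpy as np
from scipy import linalg

def fid_score(mu1, sigma1, mu2, sigma2):
    """Frechet Inception Distance of Eq. (7) from feature means and covariances."""
    diff = mu1 - mu2
    covmean = linalg.sqrtm(sigma1 @ sigma2)   # matrix square root of Sigma1 * Sigma2
    if np.iscomplexobj(covmean):              # discard tiny imaginary parts from numerics
        covmean = covmean.real
    return diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean)
```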
Table 2. FID (Fréchet Inception Distance) scores of various generative models, used to assess the quality and diversity of generated images.

As shown in Table 2, the images generated by our method exhibit higher quality than those of the other approaches. We also provide example images in Fig. 7 for a more intuitive view of the MRI images produced by this method.

Fig. 7. Examples of generated brain MRI images. Each row shows a different set of images, with variations in tumor size, shape, and location, demonstrating the diversity and quality of the generated MRI images. The first and third columns display the original images; the second and fourth columns show the brain tumor images produced by MCFDiffusion.

In addition to the FID evaluation, we conducted a comprehensive evaluation of the methods under consideration using MRI signal intensity values. The MRI data were first converted to grayscale to ensure uniform analysis. We then computed the mean signal intensity at each pixel position along both the vertical and horizontal axes across the entire test dataset. Figures 8 and 9 illustrate the distributions of these mean signal intensities. The horizontal axis (X-axis) represents the pixel position, and the vertical axis (Y-axis) the mean signal intensity. The blue curve is the signal intensity distribution of the real MRI dataset, serving as the benchmark; our proposed method is the red curve, and the remaining curves represent the alternative approaches.

The signal intensity distributions of all methods closely followed the trends of the real MRI dataset except CycleGAN, which exhibited noticeable deviations, while Pix2Pix showed some instability in its distribution curve. Notably, our method's intensity distribution matched the real MRI dataset markedly more closely than the other methodologies. We also observed that the intensity values of the other methods were marginally lower than those of the real MRI data. This suggests that our approach generates MRI images that closely approximate the characteristics of real MRI, enhancing the authenticity of the generated images.

Fig. 8. Average horizontal intensity profiles of the generated MRI images compared with real MRI. The left side shows a sample MRI image with a tumor; the right side shows the intensity profiles. The red line (MCFDiffusion) closely follows the blue line of the real MRI images, indicating similar intensity distributions.

Fig. 9. Average vertical intensity profiles of the generated MRI images compared with real MRI. Again, the red line (MCFDiffusion) closely matches the blue line (real MRI), further demonstrating the quality of the generated images.
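The intensity-profile comparison in Figs. 8 and 9 amounts to averaging grayscale values per pixel position over the test set. A minimal NumPy sketch, assuming the slices are already grayscale arrays of equal size:

```python
import numpy as np

def intensity_profiles(images: np.ndarray):
    """Mean signal intensity per pixel position for a stack of grayscale slices.

    images: array of shape (N, H, W).
    Returns the horizontal profile (length W) and the vertical profile (length H)."""
    images = np.asarray(images, dtype=np.float64)
    horizontal = images.mean(axis=(0, 1))  # average over images and rows
    vertical = images.mean(axis=(0, 2))    # average over images and columns
    return horizontal, vertical
```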
For image classification models

We categorized the dataset into two classes: healthy brain images and brain tumor images. Initially, the training set consisted of 1,815 healthy brain images and 1,098 brain tumor images, i.e., 717 more healthy images than tumor images. To achieve class balance, each generation model produced 717 additional brain tumor images. Training was conducted with ResNet18 and ResNet34 [35], DenseNet121 [36], and ShuffleNet [37], and the efficacy of the generated data was assessed by comparing the training outcomes, as shown in Table 3. Data augmentation with our method clearly improves the accuracy of the ResNet models, yielding favorable results.

Table 3. Classification accuracy of four models (ResNet18, ResNet34, DenseNet121, and ShuffleNet) on the different datasets.

Based on these results, this data augmentation method leads to a noticeable improvement in all models, further substantiating its effectiveness for image classification.

For image segmentation models

To demonstrate the versatility of our method, we conducted a comparative analysis using multiple models and evaluation metrics. Specifically, we trained U-Net [38], SegNet [39], and Mask R-CNN [40] and assessed the impact of the expanded image dataset on each model. Performance was evaluated with four metrics: Dice Coefficient (DC), Jaccard Coefficient (JC), 95th-percentile Hausdorff Distance (HD95), and Average Symmetric Surface Distance (ASD).

$$\begin{aligned} \text {DC} = \frac{2 |X \cap Y|}{|X| + |Y|} \end{aligned}$$ (8)

where X is the predicted segmentation, Y is the true segmentation, \(|X \cap Y|\) is the size of their intersection, and |X| and |Y| are the sizes of X and Y, respectively.

$$\begin{aligned} \text {JC} = \frac{|X \cap Y|}{|X \cup Y|} \end{aligned}$$ (9)

where \(|X \cup Y|\) is the size of the union of X and Y.

$$\begin{aligned} \text {HD95} = \text {95th percentile of } \left\{ \min _{y \in Y} \Vert x - y\Vert : x \in X \right\} \cup \left\{ \min _{x \in X} \Vert x - y\Vert : y \in Y \right\} \end{aligned}$$ (10)

where \(\Vert x - y\Vert\) is the Euclidean distance between points x and y. HD95 is the 95th percentile of all point-to-surface distances, making it robust to outliers compared with the maximum Hausdorff distance.

$$\begin{aligned} \text {ASD} = \frac{1}{2}\left( \frac{1}{|X|} \sum _{x \in X} \min _{y \in Y} \Vert x - y\Vert + \frac{1}{|Y|} \sum _{y \in Y} \min _{x \in X} \Vert x - y\Vert \right) \end{aligned}$$ (11)

The ASD is the average of the point-to-surface distances between the predicted and true segmentations, symmetrized over both directions.
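As a quick reference, the overlap metrics of Eqs. (8) and (9) reduce to a few array operations on binary masks; HD95 and ASD additionally require surface-distance computations, available in packages such as MedPy. A sketch:

```python
import numpy as np

def dice_jaccard(pred: np.ndarray, target: np.ndarray, eps: float = 1e-8):
    """Dice coefficient (Eq. 8) and Jaccard coefficient (Eq. 9) for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    dc = 2.0 * inter / (pred.sum() + target.sum() + eps)
    jc = inter / (np.logical_or(pred, target).sum() + eps)
    return float(dc), float(jc)
```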
Table 4. Segmentation performance of the U-Net model with different data augmentation methods.

Table 5. Segmentation performance of the SegNet model with different data augmentation methods.

Table 6. Segmentation performance of the Mask R-CNN model with different data augmentation methods.

As shown in Tables 4, 5, and 6, the training outcomes of U-Net, SegNet, and Mask R-CNN are all significantly enhanced by our data augmentation technique. Figure 10 likewise shows that models trained on data augmented by our method produce outputs more consistent with the ground truth. Notably, this augmentation method yielded the most substantial improvements across all comparisons, underscoring its impact on the training efficacy of image segmentation models. Furthermore, these results substantiate the method's broad applicability, indicating its potential to enhance the performance of a diverse array of models.

Fig. 10. Segmentation results of the different models (U-Net, SegNet, Mask R-CNN) on non-augmented and augmented data. Each column represents a model, with "non-aug" and "aug" indicating results on non-augmented and augmented data, respectively. The last column shows the target segmentation masks. The augmented data visibly improves the segmentation performance of all models.

Conclusion

This study introduces MCFDiffusion, a novel approach for augmenting brain tumor datasets and addressing class imbalance. Our method transforms healthy brain MRI images into images depicting tumors, thereby enriching the dataset. To achieve this transformation, we developed a fusion algorithm grounded in diffusion models. Recognizing the potential degradation in image quality when this technique is used in isolation, we integrated a multi-channel input mechanism, expanding from a three-channel to a nine-channel setup. This allows us to input three sequentially continuous brain MRI images and focus on predicting and removing noise from the central image, enhancing output quality.

To assess the efficacy of our method, we conducted experiments along three dimensions: image quality, impact on image classification models, and impact on image segmentation models. The results were consistent and promising, with improvements in all areas. The generated images achieved the lowest Fréchet Inception Distance (FID) scores among the compared methods, indicating superior quality. Classification accuracy improved by 2% to 3% with our augmented data, and three distinct segmentation models (U-Net, SegNet, and Mask R-CNN) showed varying degrees of enhancement after augmentation.

In summary, our research presents a viable solution for augmenting brain tumor segmentation datasets, with the potential to significantly enhance the clinical diagnosis and treatment of brain tumors. MCFDiffusion not only advances medical image analysis but also paves the way for future applications in diverse medical imaging contexts.

Future work

In this study, the MCFDiffusion model was successfully applied to augment a brain tumor MRI dataset, yielding significant performance improvements. Given the model's demonstrated potential to address data imbalance and improve the accuracy of medical image analysis, we plan to extend future research to other key areas, including liver and lung diseases, to further advance medical image analysis and provide stronger support for clinical diagnosis.

First, we will explore applying MCFDiffusion to liver and lung datasets, adapting the existing model to the specific anatomical and pathological features of these organs. We plan to introduce 3D convolutional networks to better process volumetric medical image data and capture the three-dimensional structure of liver and lung lesions more accurately. Additionally, we will investigate the fusion of multimodal datasets, combining data from multiple imaging techniques such as CT, MRI, and PET.
This multimodal approach will provide more comprehensive lesion information and may reveal new diagnostic and therapeutic biomarkers. We will explore advanced fusion techniques to optimize the integration of information across modalities and improve the model's recognition of complex lesions. With these directions, we expect MCFDiffusion not only to enhance the diagnosis and treatment of brain tumors but also to play a key role in the broader field of medical image analysis, particularly in improving the accuracy and efficiency of liver and lung disease diagnosis.

Discussion

Theoretical and practical significance

This study introduces the Multi-Channel Fusion Diffusion Model (MCFDiffusion), which makes several important contributions to medical image analysis. First, it extends the application of diffusion models in medical imaging by proposing a novel approach to data imbalance. Diffusion models have been widely used in image generation tasks, but their potential for medical image data augmentation, especially for brain tumor MRI, has not been fully explored. Our study fills this gap by demonstrating the effectiveness of diffusion models for generating high-quality brain tumor MRI images, enriching the dataset, and improving the performance of deep learning models.

The multi-channel input mechanism is a significant innovation. By exploiting the spatial continuity of MRI and converting the target image and its adjacent images into a nine-channel tensor, our method enhances the quality of the generated images: the model captures more comprehensive information from the original images, leading to more accurate and realistic generation. Fusing defective areas with healthy images further improves the applicability of the model in medical imaging, providing a new perspective for data augmentation in this domain.

Second, the practical significance of this study is substantial. The scarcity of high-quality data has been a major obstacle to developing and applying deep learning models in medical image analysis. MCFDiffusion offers a practical solution by generating synthetic brain tumor MRI images that closely resemble real data, helping to balance datasets for training. As our experiments show, the augmented data significantly improved the performance of image classification and segmentation models, with important implications for clinical diagnosis and treatment planning of brain tumors. For classification tasks, our method improved accuracy by approximately 3% over the original dataset, which can support more accurate diagnosis and better-informed treatment decisions. For segmentation tasks, the Dice coefficient improved by 1.5-2.5%, indicating better delineation; accurate segmentation of brain tumors is crucial for determining tumor extent and planning surgical interventions.
Therefore, our data augmentation technique can improve the accuracy and reliability of medical image analysis, benefiting both patients and healthcare providers.

Advantages of the proposed method

The MCFDiffusion method offers several significant advantages over existing approaches for brain tumor MRI data augmentation.

Novel fusion approach based on diffusion models

Our method introduces a fusion algorithm grounded in diffusion models, specifically designed to transform healthy brain MRI images into images depicting tumors. This effectively addresses data imbalance by enriching the dataset with synthetic tumor images. Unlike traditional methods that rely on simple image manipulation or on existing generative models such as GANs and VAEs, our fusion technique leverages the power of diffusion models to generate high-quality, realistic tumor images.

Multi-channel input mechanism

Instead of a standard three-channel input, we use a nine-channel setup built from three sequentially continuous brain MRI images. This gives the model access to spatial information from adjacent slices, which is crucial for generating high-quality medical images; by focusing on predicting and removing noise from the central image, we significantly enhance output quality.

Superior image quality

The images generated by MCFDiffusion exhibit superior quality compared with other state-of-the-art generation models. As demonstrated by the FID scores, our method achieves the lowest values, indicating that the generated images are closest to the real data distribution. The MRI signal intensity analysis additionally shows that our method closely mirrors the characteristics of real MRI images, further validating the quality of the generated data.

Significant improvement in model performance

Our method significantly improves both image classification and segmentation models: classification accuracy improved by approximately 3% with the augmented data, and the Dice coefficient improved by 1.5-2.5%. These gains matter for clinical diagnosis and treatment planning, where more accurate classification and segmentation support better-informed decisions by healthcare professionals.

Broad applicability

MCFDiffusion has been validated across multiple deep learning models, including ResNet, U-Net, SegNet, and Mask R-CNN, and can be integrated into existing pipelines without significant modification. This versatility makes our method a valuable tool for researchers and practitioners in medical image analysis.

Clinical relevance

The augmented images have been thoroughly evaluated for their impact on both image classification and segmentation models, showing consistent improvements across models and demonstrating the technique's potential to enhance a diverse array of models in clinical settings.
By providing more accurate and reliable data for model training, our method can contribute to better clinical diagnosis and treatment planning for brain tumors.

Data availability

The dataset used in this study, "Brain MRI Segmentation," was obtained from Kaggle. It is publicly available at: https://www.kaggle.com/datasets/mateuszbuda/lgg-mri-segmentation/data

References

1. Kia, M. et al. Innovative fusion of VGG16, MobileNet, EfficientNet, AlexNet, and ResNet50 for MRI-based brain tumor identification. Iran J. Comput. Sci. 8, 185-215. https://doi.org/10.1007/s42044-024-00216-6 (2025).
2. Sarshar, N. T. et al. Advancing brain MRI image classification: Integrating VGG16 and ResNet50 with a multi-verse optimization method. BioMed 4, 499-523. https://doi.org/10.3390/biomed4040038 (2024).
3. Anari, S., Sadeghi, S., Sheikhi, G., Ranjbarzadeh, R. & Bendechache, M. Explainable attention based breast tumor segmentation using a combination of UNet, ResNet, DenseNet, and EfficientNet models. Sci. Rep. 15, 1027. https://doi.org/10.1038/s41598-024-84504-y (2025).
4. Anari, S. et al. EfficientUNetViT: Efficient breast tumor segmentation utilizing UNet architecture and pretrained vision transformer. Bioengineering 11. https://doi.org/10.3390/bioengineering11090945 (2024).
5. Wolleb, J., Bieder, F., Sandkühler, R. & Cattin, P. C. Diffusion models for medical anomaly detection. arXiv:2203.04306 (2022).
6. Lu, S. et al. Mutually aided uncertainty incorporated dual consistency regularization with pseudo label for semi-supervised medical image segmentation. Neurocomputing 548, 126411. https://doi.org/10.1016/j.neucom.2023.126411 (2023).
7. Ayadi, W., Elhamzi, W. & Atri, M. A deep conventional neural network model for glioma tumor segmentation. Int. J. Imaging Syst. Technol. 33, 1593-1605 (2023).
8. Roy, S. & Maji, P. Tumor delineation from 3-D MR brain images. Signal Image Video Process. 17, 3433-3441 (2023).
9. Goodfellow, I. J. et al. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 27, 2672-2680 (2014).
10. Kingma, D. P. & Welling, M. Auto-encoding variational Bayes. arXiv:1312.6114 (2022).
11. Ho, J., Jain, A. & Abbeel, P. Denoising diffusion probabilistic models. arXiv:2006.11239 (2020).
12. Lugmayr, A. et al. RePaint: Inpainting using denoising diffusion probabilistic models. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 11451-11461. https://doi.org/10.1109/CVPR52688.2022.01117 (2022).
13. Amit, T., Shaharbany, T., Nachmani, E. & Wolf, L. SegDiff: Image segmentation with diffusion probabilistic models. arXiv:2112.00390 (2022).
14. Wu, J., Fu, R., Fang, H. et al. MedSegDiff: Medical image segmentation with diffusion probabilistic model. In Medical Imaging with Deep Learning, 1623-1639 (PMLR, 2024).
15. Rissanen, S., Heinonen, M. & Solin, A. Generative modelling with inverse heat dissipation. arXiv:2206.13397 (2023).
16. Nichol, A. et al. GLIDE: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv:2112.10741 (2022).
17. Costa, P. et al. End-to-end adversarial retinal image synthesis. IEEE Trans. Med. Imaging 37, 781-791 (2017).
18. Li, Q. et al. TumorGAN: A multi-modal data augmentation framework for brain tumor segmentation. Sensors 20, 4203 (2020).
19. Sun, Y., Yuan, P. & Sun, Y.
MM-GAN: 3D MRI data augmentation for medical image segmentation via generative adversarial networks. In 2020 IEEE International Conference on Knowledge Graph (ICKG), 227-234 (IEEE, 2020).
20. Deepak, S. & Ameer, P. MSG-GAN based synthesis of brain MRI with meningioma for data augmentation. In 2020 IEEE International Conference on Electronics, Computing and Communication Technologies (CONECCT), 1-6 (IEEE, 2020).
21. Barile, B. et al. Data augmentation using generative adversarial neural networks on brain structural connectivity in multiple sclerosis. Comput. Methods Prog. Biomed. 206, 106113 (2021).
22. Jha, M., Gupta, R. & Saxena, R. A framework for in-vivo human brain tumor detection using image augmentation and hybrid features. Health Inf. Sci. Syst. 10, 23 (2022).
23. Goceri, E. Medical image data augmentation: Techniques, comparisons and interpretations. Artif. Intell. Rev. 56, 12561-12605 (2023).
24. Fidon, L., Ourselin, S. & Vercauteren, T. Generalized Wasserstein Dice score, distributionally robust deep learning, and Ranger for brain tumor segmentation: BraTS 2020 challenge. 200-214 (Springer, 2021).
25. Isensee, F., Jaeger, P. F., Full, P. M., Vollmuth, P. & Maier-Hein, K. H. nnU-Net for brain tumor segmentation. arXiv:2011.00848 (2020).
26. Qasim, A. B. et al. Red-GAN: Attacking class imbalance via conditioned generation. Yet another perspective on medical image synthesis for skin lesion dermoscopy and brain tumor MRI. arXiv:2004.10734 (2021).
27. Buda, M. & McKinney, S. Brain MRI segmentation. https://www.kaggle.com/datasets/mateuszbuda/lgg-mri-segmentation/data (2018).
28. Song, J., Meng, C. & Ermon, S. Denoising diffusion implicit models. arXiv:2010.02502 (2022).
29. Nichol, A. & Dhariwal, P. Improved denoising diffusion probabilistic models. arXiv:2102.09672 (2021).
30. Dhariwal, P. & Nichol, A. Diffusion models beat GANs on image synthesis. Adv. Neural Inf. Process. Syst. 34, 8780-8794 (2021).
31. Mirza, M. & Osindero, S. Conditional generative adversarial nets. arXiv:1411.1784 (2014).
32. Odena, A., Olah, C. & Shlens, J. Conditional image synthesis with auxiliary classifier GANs. arXiv:1610.09585 (2017).
33. Isola, P., Zhu, J.-Y., Zhou, T. & Efros, A. A. Image-to-image translation with conditional adversarial networks. arXiv:1611.07004 (2018).
34. Zhu, J.-Y., Park, T., Isola, P. & Efros, A. A. Unpaired image-to-image translation using cycle-consistent adversarial networks. arXiv:1703.10593 (2020).
35. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. arXiv:1512.03385 (2015).
36. Huang, G., Liu, Z., van der Maaten, L. & Weinberger, K. Q. Densely connected convolutional networks. arXiv:1608.06993 (2016).
37. Zhang, X., Zhou, X., Lin, M. & Sun, J. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. arXiv:1707.01083 (2017).
38. Ronneberger, O., Fischer, P. & Brox, T. U-Net: Convolutional networks for biomedical image segmentation. arXiv:1505.04597 (2015).
39. Badrinarayanan, V., Kendall, A. & Cipolla, R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. arXiv:1511.00561 (2016).
40. He, K., Gkioxari, G., Dollár, P. & Girshick, R. Mask R-CNN (2018).
arXiv:1703.06870.

Author information

Authors and Affiliations: School of Mathematics and Computer Science, Wuhan Polytechnic University, Wuhan, 430048, China. Cuihua Zuo, Junhao Xue & Cao Yuan.

Contributions

C.Z.: organized the data, designed the experiments, and drafted the original manuscript. J.X.: proposed the methodology, wrote the code, and visualized the experimental results. C.Y.: provided the necessary equipment and supervised the overall direction, quality, and progress of the research. All authors corrected the first draft of the manuscript and agreed on the final manuscript.

Corresponding author

Correspondence to Cao Yuan.

Ethics declarations

Competing interests: The authors declare no competing interests.

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access. This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.