Introduction

Medical image segmentation is a crucial process in image analysis, used to accurately delineate anatomical structures, organs, or lesions from surrounding tissue. Segmentation plays an essential role in medical diagnosis, treatment planning, and postoperative assessment, directly influencing clinical decision-making. Recently, deep learning techniques, particularly convolutional neural networks (CNNs) and Transformer-based self-attention mechanisms, have been extensively applied to medical image segmentation, leading to significant improvements in accuracy and robustness1,2,3,4,5. Despite these advancements, deep learning models rely on large, manually annotated datasets and often require training from scratch, resulting in high computational costs, especially when adapting to new tasks. Consequently, efficient adaptation of large models to new tasks has emerged as a critical approach in modern medical image segmentation6.

Parameter-efficient fine-tuning (PEFT) methods address this issue by adapting models to new tasks with minimal parameter adjustments, reducing the need for extensive training resources7. Low-rank adaptation (LoRA), one of the leading PEFT methods, facilitates efficient model adaptation by introducing linear adapters in parallel to the pre-trained model's linear layers8. LoRA approximates the weight update \(\Delta W\) as the product of two smaller matrices, \(A\) and \(B\), typically initializing \(A\) with Gaussian noise and \(B\) with zeros, effectively freezing the original matrix while updating the "noise", which slows convergence. The PiSSA method, proposed by Meng et al., accelerates convergence by initializing with the principal singular values and vectors of the original matrix9, while Liu et al.'s weight-decomposed low-rank adaptation (DoRA) method enhances LoRA's efficiency by introducing weight decomposition10.

In this paper, we propose PDoRA, a novel fine-tuning method based on Principal Weight Decomposition and Low-Rank Adaptation.
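The LoRA initialization just described can be sketched in a few lines of NumPy. This is our own minimal illustration, not the authors' code: the function names and the Gaussian scale are our choices, and a real adapter would sit inside a trained network rather than operate on a toy matrix.

```python
import numpy as np

def lora_init(d, k, r, rng=None):
    """Standard LoRA initialization: A is Gaussian "noise", B is zeros,
    so the low-rank update BA is exactly zero at the first step."""
    rng = rng or np.random.default_rng(0)
    A = rng.normal(scale=1.0 / r, size=(r, k))  # Gaussian init (scale is one common choice)
    B = np.zeros((d, r))                        # zero init => BA = 0 initially
    return A, B

def lora_forward(W0, A, B, x):
    """Effective weight is the frozen W0 plus the trainable low-rank term BA."""
    return (W0 + B @ A) @ x

d, k, r = 8, 6, 2
W0 = np.ones((d, k))          # stands in for a frozen pre-trained weight
A, B = lora_init(d, k, r)
x = np.ones(k)
# Because B is zeros, the adapted layer initially matches the frozen layer.
print(np.allclose(lora_forward(W0, A, B, x), W0 @ x))  # True
```

Only \(A\) and \(B\) (\(r(d+k)\) values) are trained, instead of the full \(d \times k\) matrix, which is where the parameter savings come from.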
PDoRA reduces the number of trainable parameters and enhances training efficiency by independently updating the direction and magnitude of principal weights via low-rank adaptation, making it highly effective for computationally intensive tasks such as medical image segmentation. Specifically, we apply singular value decomposition (SVD) to extract the principal weight \(W^{pri}\) from the original weight matrix \(W\). Next, \(W^{pri}\) is decomposed into magnitude and direction components, both of which are updated independently, with the direction fine-tuned using low-rank adaptation. Finally, the updated principal weights are combined with the frozen residual weights \(W^{res}\) to form the final weight matrix. We applied PDoRA to a pre-trained SwinUNETR model11 for downstream medical image segmentation tasks, including hippocampus segmentation and brain metastasis segmentation in lung cancer.

We evaluated the segmentation performance of various fine-tuning methods under different rank settings. The experimental results demonstrate that with a rank of r = 16, PDoRA achieves Dice score improvements of 1.09%, 0.18%, and 5.73% compared to full fine-tuning (FT) on the LPAB40, EADC, and Bra-MET datasets, respectively. These findings underscore PDoRA's effectiveness in enhancing segmentation accuracy while simultaneously reducing computational overhead, outperforming existing PEFT methods such as LoRA8, Hydra12, DoRA10, and PiSSA9. The contributions of this paper are as follows:

We present PDoRA, a novel Principal Weight Decomposition-based Low-Rank Adaptation fine-tuning method, tailored for medical image segmentation tasks. PDoRA achieves performance on par with or exceeding that of FT, demonstrating its effectiveness in both accuracy and computational efficiency.

PDoRA improves optimization efficiency by decoupling the direction and magnitude of principal weight updates, reducing the number of trainable parameters.
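The weight construction described in this paragraph (SVD split into principal and residual parts, then a magnitude/direction update of the principal part) can be sketched in NumPy. This is our own reading of the described pipeline, with hypothetical names and deliberately simplified details; it is not the authors' implementation.

```python
import numpy as np

def pdora_weights(W, r_pri, A, B):
    """Illustrative sketch of a PDoRA-style merged weight:
    1) SVD splits W into a principal part W_pri (top r_pri singular
       directions) and a residual W_res that stays frozen;
    2) W_pri is decomposed into a column-wise magnitude m and a
       direction V, and the direction is shifted by the LoRA term BA;
    3) the re-normalized, re-scaled principal part is added back to W_res."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    W_pri = (U[:, :r_pri] * S[:r_pri]) @ Vt[:r_pri]      # principal weight
    W_res = W - W_pri                                    # frozen residual
    m = np.linalg.norm(W_pri, axis=0, keepdims=True)     # magnitude, shape (1, k)
    V_new = W_pri + B @ A                                # low-rank direction update
    V_new = V_new / np.linalg.norm(V_new, axis=0, keepdims=True)
    return W_res + m * V_new

d, k, r_pri, r = 8, 6, 4, 2
rng = np.random.default_rng(0)
W = rng.normal(size=(d, k))
A = rng.normal(scale=1e-2, size=(r, k))
B = np.zeros((d, r))  # BA = 0 at init, so the merged weight equals the original W
print(np.allclose(pdora_weights(W, r_pri, A, B), W))  # True
```

Because only \(m\), \(A\), and \(B\) would be trainable in such a scheme, the parameter count stays far below that of updating \(W\) directly.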
This decomposition allows the model to better capture essential features during fine-tuning, thereby enhancing its adaptability and reducing training costs.

Our experimental results, applying PDoRA to the pre-trained SwinUNETR model for hippocampus and brain metastasis segmentation, consistently show that PDoRA outperforms state-of-the-art fine-tuning methods. This highlights its potential as a robust solution for improving medical image segmentation performance.

Related work

Deep learning-based methods for medical image segmentation

Deep learning techniques have been widely applied in the field of medical image segmentation, with convolutional neural networks (CNNs) being particularly successful. Among various CNN-based segmentation methods, U-Net and its variants are some of the most representative network architectures1,13,14,15. Jha et al. proposed the Double U-Net model, which stacks two U-Net architectures to capture more semantic information and improve segmentation accuracy16. Wu et al. introduced the Fully Convolutional Network (FCN) with a joint pyramid upsampling module, replacing traditional dilated convolutions to enhance the capture of global contextual information17. Peng et al. developed the GCN model, which incorporates larger convolutional kernels and deeper layers in skip connections to transform local semantic information into higher-level feature representations18. Zhu et al. proposed the Sparse Dynamic Volume TransUNet, which combines voxel information, inter-layer feature connections, and intra-axis information to achieve accurate segmentation19. While CNNs excel at extracting fine-grained local information, they have limitations in capturing long-range dependencies.

In recent years, the Transformer architecture, introduced by Vaswani et al., has garnered significant attention due to its ability to capture long-range dependencies through attention mechanisms.
By building the entire encoder-decoder structure using self-attention, Transformers are able to model relationships across distant regions in images2. Dosovitskiy et al.'s Vision Transformer (ViT) demonstrated the powerful capability of Transformers in visual tasks by directly classifying images, setting a precedent for their use in computer vision20. The Swin Transformer, with its local non-overlapping window attention mechanism, enhances computational efficiency while maintaining the ability to model long-range dependencies, making it well-suited for large-scale vision tasks21. SwinUNETR applies the Swin Transformer as its encoder and integrates it into a U-Net-like architecture, optimizing it for 3D volumetric medical image segmentation tasks11,22. Similarly, TransUNet combines CNNs' local feature extraction strengths with a Transformer encoder, allowing the model to capture both local and global features in medical images23. Despite these advancements, the increased computational complexity of Transformers presents challenges, particularly in resource-constrained medical image segmentation tasks.

Parameter-efficient fine-tuning methods

PEFT methods, which achieve performance comparable to FT by updating only a small subset of parameters, have become widely used in natural language processing (NLP). Hu et al. introduced LoRA, a notable PEFT approach that freezes pre-trained model weights and inserts trainable low-rank decomposition matrices into each Transformer layer, significantly reducing the number of parameters required for downstream tasks8. Building on this, Chen et al.'s SuperLoRA enhances multi-layer attention modules for improved performance while retaining parameter efficiency24. Zhang et al.'s AdaLoRA applies SVD for incremental updates, pruning less important singular values to further reduce the parameter count25. LoRA-XS, developed by Bałazy et al., offers an even more lightweight adaptation by also leveraging SVD26.
Si et al.'s FLoRA generalizes PEFT through low-rank tensor decomposition, maintaining the structure of N-dimensional parameter spaces27. Sun et al.'s SVF method similarly uses SVD, fine-tuning only the singular values while freezing other parameters28. Other techniques include Compacter, which applies the Kronecker product for efficient weight updates29, and ConvLoRA, which adapts LoRA to convolutional layers, adding low-rank matrices to reduce the number of trainable parameters in convolutional models30.

In the field of medical image segmentation, PEFT methods have seen increasing adoption. Chen et al. introduced the MA-SAM method, which incorporates 3D adapters into the Transformer modules of image encoders, allowing only partial updates to the weight increments while preserving most of the pre-trained SAM model weights31. Zhang et al. applied the LoRA fine-tuning strategy in their SAMed method for semantic segmentation of medical images32. Wu et al.'s Med-SA method combined SD-Trans and HyPAdapt's parameter-efficient adaptation techniques, effectively integrating medical domain knowledge to enhance segmentation performance33. Despite the promising results, the application of PEFT methods in medical image segmentation remains relatively limited. This paper aims to further explore and extend PEFT techniques in this domain, with the goal of reducing computational costs while improving segmentation accuracy.

Method

Our work is inspired by the DoRA and PiSSA methods, which are reviewed in the "DoRA" and "PiSSA" sections, respectively. In the "PDoRA" section, we provide a detailed explanation of our newly proposed method.

Fig. 1: Illustration of model adaptation methods, where (a) represents the DoRA method, (b) represents the PiSSA method, and (c) represents the proposed PDoRA method.

DoRA

The workflow of the DoRA method is illustrated in Fig. 1a.
The method decomposes pre-trained weights into magnitude and direction components, which are fine-tuned independently. The magnitude vector is initialized as \(m = \left\| W_0 \right\|_c \in R^{1 \times k}\), where \(W_0\) is the pre-trained weight matrix and \(\left\| \cdot \right\|_c\) denotes the column-wise norm. The direction matrix is initialized as \(V = W_0 \in R^{d \times k}\). LoRA is used to update the direction matrix by computing the direction increment \(\Delta V = BA\), where \(B \in R^{d \times r}\) and \(A \in R^{r \times k}\) are low-rank matrices, and \(r\) (with \(r \ll \min(d, k)\)) is the rank of the adaptation.
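The DoRA update just described (trainable magnitude \(m\) times the column-normalized direction \(V + \Delta V\)) can be sketched as follows. This is our own illustrative NumPy version, not the authors' code; the names are ours.

```python
import numpy as np

def dora_weight(W0, m, A, B):
    """DoRA-style merged weight: direction V = W0 shifted by the LoRA
    term BA, normalized column-wise, then rescaled by the magnitude m."""
    V = W0 + B @ A
    V = V / np.linalg.norm(V, axis=0, keepdims=True)  # column-wise ||.||_c
    return m * V

d, k, r = 8, 6, 2
rng = np.random.default_rng(1)
W0 = rng.normal(size=(d, k))
m = np.linalg.norm(W0, axis=0, keepdims=True)  # init: column-wise norms of W0
A = rng.normal(scale=1e-2, size=(r, k))
B = np.zeros((d, r))  # BA = 0, so the merged weight starts equal to W0
print(np.allclose(dora_weight(W0, m, A, B), W0))  # True
```

Note that after any direction update, each column of the merged weight still has norm \(m\): the normalization guarantees that magnitude and direction are trained independently.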