Introduction

In breast cancer, human epidermal growth factor receptor 2 (HER2) levels identify patients who may benefit from anti-HER2 therapy, so accurate evaluation of HER2 expression is required [1]. In the American Society of Clinical Oncology/College of American Pathologists (ASCO/CAP) HER2 testing guidelines, HER2 immunohistochemistry (IHC) scores of 0 and 1+ are both regarded as HER2-negative in breast cancer [2,3]. Recently, the concept of HER2-low status has gained increasing emphasis in breast cancer, and HER2 expression is classified into three levels: HER2-positive (IHC 3+, or IHC 2+ with amplification on fluorescence in situ hybridization [FISH+]), HER2-low (IHC 1+, or IHC 2+ without amplification [FISH−]), and HER2-negative (IHC 0) [4,5,6]. Antibody-drug conjugates (ADCs) have shown encouraging responses in breast cancer patients with HER2-low expression, especially the third-generation drug DS-8201 (trastuzumab deruxtecan) [7,8,9,10,11]. A recent study found that HER2-low breast cancer has distinct clinicopathological and prognostic characteristics. Therefore, accurate interpretation of HER2 IHC scores is particularly important, especially in differentiating between scores 0 and 1+ [12,13].

The HER2 testing guidelines include detailed instructions for distinguishing IHC scores 0 and 1+. However, because both scores are classified as HER2-negative in the current testing guidelines, pathologists may not strictly distinguish between them in daily practice, resulting in inconsistent or inaccurate interpretations. One study found that false-negative results due to inaccurate detection of HER2 IHC scores 0 and 1+ caused approximately 2.27% of breast cancer patients to miss their opportunity for targeted therapy every year, indicating the need to re-interpret existing IHC 0 and 1+ results or to add FISH testing [14]. One study analyzed the differences in HER2 evaluation between local and central laboratories. 
The results showed that the disagreement rate between IHC scores 0 and 1+ was the highest, approaching 50%, although the overall consistency was kappa = 0.79 [15]. A phase 1b study on HER2-low treatment reported that the consistency rates between local and central laboratories in assigning IHC scores of 1+ and 2+ were 70% and 40%, respectively [16]. Another study reported that 85% (87/102) of IHC 0 slides evaluated by local laboratories were re-evaluated as IHC 1+ or 2+ (that is, the local results were false negatives) by the central laboratory [17]. IHC scores 0 and 1+ correspond to two subtypes with different DNA, RNA, and protein levels [15,18], as well as different prognoses [19,20,21]. A recent authoritative multicenter analysis showed that 19% of cases read by laboratories generated results with less than or equal to 70% concordance for IHC 0 vs 1+. When 18 pathologists interpreted the scanned slides, only 26% concordance was noted between scores 0 and 1+, compared with 58% concordance between scores 2+ and 3+. This real-world inaccuracy may lead to misallocation of therapy with trastuzumab deruxtecan (T-DXd) for many patients [22].

In summary, IHC scores 0 and 1+ correspond to two subtypes that differ in molecular characteristics and prognosis, yet their interpretation is poorly reproducible. Scores 0 and 1+ must be accurately distinguished to ensure that patients do not miss the opportunity for combined targeted therapy. This study aims to construct an artificial intelligence (AI) microscope-assisted model for accurate interpretation and differentiation of HER2 IHC scores 0 and 1+, thus contributing to appropriate diagnosis and treatment options for breast cancer with HER2-low expression.

Materials and methods

Data collection

This study was approved by the Ethics Committee on Biomedical Research, West China Hospital of Sichuan University (No. 20220764). HER2 IHC detection was performed using the VENTANA anti-HER2/neu (4B5) rabbit monoclonal primary antibody (Roche Diagnostics GmbH). 
The data in this study came from cases of invasive breast cancer (IBC) at West China Hospital of Sichuan University, whose pathological diagnosis reports and HER2 IHC slides were collected retrospectively.

A total of 698 HER2 IHC slides with different HER2 expression levels, dated January 2017 to December 2017, were collected to develop the AI microscope for IBC region segmentation and nuclei detection. In addition, 544 HER2 IHC 0 and 1+ slides dated January 2019 to December 2019 were collected to test the interpretation performance of the AI microscope. For the test set, the inclusion criterion was breast cancer with HER2 IHC 0 or 1+, and the exclusion criteria were HER2 IHC 2+ or 3+, carcinoma in situ, and poor slide quality. A junior pathologist screened the slides first, and then three senior pathologists reinterpreted them according to the 2023 HER2 testing guidelines [2,3]. The gold standard was established as follows: each slide was initially evaluated by two senior pathologists. If their interpretations matched, this consensus became the gold standard. If they differed, a third senior pathologist made the final determination. The final test set comprised 501 IBC slides of HER2 IHC 0 and 1+; the screening process is shown in Supplementary Fig. 1. We divided the 501 cases into 5 subsets for cross-validation. The first four subsets each contained 100 randomly selected cases, and the last subset contained the remaining 101 cases. For the i-th fold of cross-validation, we used the data outside the i-th subset for threshold selection and reported the results on the i-th subset.

Data preprocessing

The 698 HER2 IHC slides used for training the IBC region segmentation model (Model I) and nucleus detection model (Model II) were labeled by a pathologist. The IBC regions were labeled using the open-source graphical image annotation tool Labelme (version v3.19.0). 
A closed polygon was drawn to outline the IBC region based on the pre-segmentation results, keeping the outline as close as possible to the outer edge of the IBC region. We used the Mark Point tool (version v1.0.0.3) and developed a method to label the nuclei with dots. Dot labeling was used to correct each nucleus in the field of view based on the pre-detection results, placing the labeled point as close as possible to the nucleus centroid.

IBC region segmentation model (Model I)

A bilateral segmentation network (BiSeNet v2) [23] was used to train an IBC region segmentation model (Model I). This network processes low-level spatial details and high-level semantic classification separately, thereby achieving high-precision, high-efficiency, real-time semantic segmentation. We iterated over the training set epoch by epoch and evaluated the model's mean intersection over union (MIoU) on the validation set until the MIoU had stopped increasing for more than 10 epochs. The weights that performed best on the validation set were retained, and the true performance was evaluated on the test set.

Nuclei detection model (Model II)

Nucleus detection is a small, dense target detection task. Conventional object detection algorithms require bounding-box annotation of each nucleus, which imposes an immense labeling workload. Traditional saliency detection algorithms rely on a priori knowledge of the data, and cell nuclei tend to overlap and are easily missed. In this study, we used a standard fully convolutional network (FCN) to develop the nucleus segmentation and detection model (Model II) [24]. 
Data division, iterations at different magnifications, and the prediction evaluation were similar to those of Model I.

Identification of the optimal thresholds 1 and 2

Interpretation of HER2 0 and 1+ involves assessing the percentage of cells with membrane staining and the staining intensity, so determining appropriate thresholds for the membrane staining percentage (threshold 1, th1) and staining intensity (threshold 2, th2) is critical. The optimal th1 and th2 were determined using the 501 cases with the gold standard from three senior pathologists. First, the tumor cells of these cases were identified using Model I and Model II. Then, the optimal th1 and th2 were identified by cross-validation over the five subsets. The search range for the mean membrane staining intensity was [0–255] with a step size of 0.1, and that for the proportion of weakly stained cells was [0–100%] with a step size of 1%. Based on all possible combinations of th1 and th2, the true positive rate (TPR), false positive rate (FPR), receiver operating characteristic (ROC) curve, and area under the ROC curve (AUC) were used to examine the interpretation performance.

HER2 0 and 1+ interpretation by the AI model

To interpret the HER2 staining intensity, we first captured a field of view of 3008 × 3008 pixels (View I) using the image acquisition device. Then, Model I was used to identify the IBC region S in View I. Model II was used to locate the nucleus centroids in the IBC region S, which produced the coordinates of the tumor cell set D. Next, Image I’ was obtained by extracting the diaminobenzidine (DAB) channel from View I in RGB color space and normalizing the channel to [0–255]. In Image I’, the regions of interest (ROIs) used to determine the HER2 IHC score consisted of the pixels surrounding each nucleus centroid in the tumor cell set D (20×: d = 30, 0.5 μm/pixel; 40×: d = 60, 0.25 μm/pixel). 
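
As a minimal sketch of the ROI step, the following assumes that d is the side length in pixels of a square window centered on each nucleus centroid (the text does not say whether d is a side length or a radius) and that the DAB channel is already available as a 2-D array scaled to [0, 255]. The function name `roi_mean_intensities`, the toy image, and the clipping of windows at the image border are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def roi_mean_intensities(dab, centroids, d):
    """Mean DAB value in a d x d window around each nucleus centroid.

    dab       -- 2-D array, DAB channel normalized to [0, 255]
    centroids -- iterable of (row, col) nucleus centroids (tumor cell set D)
    d         -- assumed window side length in pixels (30 at 20x, 60 at 40x)
    """
    dab = np.asarray(dab, dtype=float)
    height, width = dab.shape
    half = d // 2
    means = []
    for row, col in centroids:
        # Clip the window at the image border (an assumption of this sketch).
        r0, r1 = max(row - half, 0), min(row + half, height)
        c0, c1 = max(col - half, 0), min(col + half, width)
        means.append(float(dab[r0:r1, c0:c1].mean()))
    return means


# Toy example: a blank field with one uniformly stained square patch.
field = np.zeros((200, 200))
field[80:120, 80:120] = 100.0
intensities = roi_mean_intensities(field, [(100, 100), (20, 20), (0, 0)], d=30)
```

With d = 30 at 0.5 μm/pixel and d = 60 at 0.25 μm/pixel, the window covers the same physical extent (15 μm) at both magnifications under this reading of d.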
The mean value of each ROI was calculated to determine the cell membrane staining intensity.

The staining intensity of all tumor cells was recorded, and the tumor cells were then classified as unstained or weakly stained according to th2. The HER2 membrane staining of each case was interpreted from the results of different fields of view across the entire slide. The proportion of weakly stained tumor cells was calculated and compared with th1. If it was greater than th1, the case was interpreted as HER2 IHC 1+; otherwise, the case was interpreted as HER2 IHC 0.

HER2 IHC evaluation by pathologists

Following published studies [25,26,27] and clinical workflows, a pathologist first located the lesion at 4×, then selected multiple fields of view containing IBC at 10×, and finally interpreted five fields at 20× and 10 fields at 40×. For some slides, the IBC regions were not large enough to generate that many fields, so the IBC regions were collected in their entirety. The microscope was a Nikon ECLIPSE Ci-L, and the microscope camera system was built with a TOUPCAM E3ISPM09000KPB.

Statistical analysis

The performance of Model I for IBC region segmentation was evaluated using MIoU, a standard metric for semantic segmentation defined as the intersection-over-union ratio averaged over all classes in the dataset. The performance of nuclei detection and of the AI microscope classification was evaluated using the F1-score. Consistency was assessed using Cohen's kappa coefficient [28]. All analyses were performed using SPSS Statistics (version 26, IBM Corporation) and GraphPad Prism 8.0 (GraphPad Software, Inc.). The AI algorithms were implemented using Python 3.8 and Scikit-learn (version 0.23.2).

Results

Performance of Model I and Model II

The process for staining classification by the AI models is shown in Fig. 1. 
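
The decision rule at the core of this process can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `classify_case` and the default values th1 = 5% and th2 = 20.0 are placeholders chosen for the example (the study selects both thresholds by cross-validation), and a cell is assumed to count as weakly stained when its mean ROI intensity exceeds th2.

```python
def classify_case(cell_intensities, th1=5.0, th2=20.0):
    """Return "1+" or "0" for one case.

    cell_intensities -- mean ROI intensity of every detected tumor cell,
                        pooled over all fields of view of the slide
    th1 -- threshold on the percentage of weakly stained cells (placeholder)
    th2 -- threshold on per-cell staining intensity (placeholder)
    """
    if not cell_intensities:
        raise ValueError("no tumor cells detected")
    # Assumption: intensity above th2 means weakly stained, else unstained.
    weak = sum(1 for value in cell_intensities if value > th2)
    percent_weak = 100.0 * weak / len(cell_intensities)
    # Strictly greater than th1 gives IHC 1+, otherwise IHC 0 (as in the text).
    return "1+" if percent_weak > th1 else "0"


mostly_unstained = [3.0] * 97 + [40.0] * 3   # 3% weakly stained
partly_stained = [3.0] * 90 + [40.0] * 10    # 10% weakly stained
labels = [classify_case(mostly_unstained), classify_case(partly_stained)]
```

Because the comparison with th1 is strict, a case with exactly th1 percent weakly stained cells is read as IHC 0 in this sketch.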
To train the AI models, we divided the data from the 698 HER2 IHC slides into training, validation, and test sets at a ratio of 3:1:1. As shown in Supplementary Table 1, 28,199 patches (1024 × 1024 pixels) at 20× magnification and 38,086 patches (1024 × 1024 pixels) at 40× magnification were obtained. The labeling interface is shown in Supplementary Fig. 2. Representative outputs of Models I and II are shown in Supplementary Fig. 3. Model I achieved MIoU values of 0.879 and 0.880 in the test set at 20× and 40×, respectively; Model II achieved F1-scores of 0.866 and 0.878 in the test set at 20× and 40×, respectively. The detailed results are shown in Table 1. The performance of the two models in the test set showed that they can accurately identify IBC regions and detect cell nuclei.

Fig. 1 HER2 IHC scores 0 and 1+ classification system.

Table 1 Metrics for IBC region segmentation and nuclei detection.

AI microscope interpretation of HER2 IHC 0 and 1+

Representative images of the AI microscope's IBC region segmentation and cell nuclei detection are shown in Fig. 2. We evaluated the classification performance of the AI microscope model on the five subsets. The average F1-scores across the five subsets at 20× and 40× magnification were 0.879 (95% CI: 0.864–0.894) and 0.906 (95% CI: 0.886–0.926), respectively. The F1-scores and other performance metrics for each subset are provided in Supplementary Table 2. Because subset 3 achieved the highest F1-score (0.890) at 20× magnification, it was selected as the dataset for subsequent analysis. The strategy for threshold determination was to calculate the effect of changing th2 on the classification results at different th1 levels. As shown in Fig. 3, within subset 3, we fixed th1, traversed th2, and evaluated the TPR and FPR under each threshold combination. We then plotted ROC curves and calculated AUC values. 
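
A minimal sketch of this sweep, assuming each case is summarized by the list of its per-cell staining intensities and that IHC 1+ is treated as the positive class: th1 is held fixed, th2 is traversed, each setting yields one (FPR, TPR) point, and the AUC is obtained with the trapezoidal rule. The function names, the toy data, and the coarse th2 grid are illustrative assumptions; the study itself searched th2 over [0–255] in steps of 0.1.

```python
def percent_weak(cells, th2):
    """Percentage of cells whose intensity exceeds th2 (assumed direction)."""
    return 100.0 * sum(1 for v in cells if v > th2) / len(cells)


def roc_for_fixed_th1(cases, labels, th1, th2_grid):
    """Sweep th2 at fixed th1; return sorted (FPR, TPR) points and the AUC.

    cases  -- list of per-case lists of cell intensities
    labels -- gold standard per case: 1 for IHC 1+, 0 for IHC 0
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = {(0.0, 0.0), (1.0, 1.0)}  # anchor the curve at both corners
    for th2 in th2_grid:
        predictions = [percent_weak(c, th2) > th1 for c in cases]
        tp = sum(1 for p, y in zip(predictions, labels) if p and y == 1)
        fp = sum(1 for p, y in zip(predictions, labels) if p and y == 0)
        points.add((fp / negatives, tp / positives))
    curve = sorted(points)
    # Trapezoidal rule over consecutive ROC points.
    auc = sum((x1 - x0) * (y0 + y1) / 2.0
              for (x0, y0), (x1, y1) in zip(curve, curve[1:]))
    return curve, auc


# Toy data: two IHC 0 cases and two IHC 1+ cases, ten cells each.
cases = [[2.0] * 10, [2.0] * 9 + [30.0], [2.0] * 7 + [30.0] * 3, [30.0] * 10]
labels = [0, 0, 1, 1]
curve, auc = roc_for_fixed_th1(cases, labels, th1=15.0, th2_grid=range(0, 256, 5))
```

Repeating the call for each candidate th1 gives one ROC curve and one AUC per th1 level, which is the comparison the threshold-selection strategy relies on.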
The ROC curves for the remaining four subsets are shown in Supplementary Fig. 4A–H. At a fixed th1 of 5%, the average AUC across the five subsets at 20× magnification was 0.930 (95% CI: 0.926–0.934); at a fixed th1 of 2%, the average AUC across the five subsets at 40× magnification was 0.953 (95% CI: 0.951–0.955). Detailed AUC results under different thresholds are provided in Supplementary Table 3.

Fig. 2 An example of the segmentation of IBC regions and nucleus detection on HER2 IHC scores 0 and 1+ images using the AI microscope. A1, B1, C1, and D1 are images of HER2 IHC slides that were interpreted as 0 or 1+ by the pathologist at 20× and 40×; A2, B2, C2, and D2 are the corresponding predicted probability maps of the IBC area. The more the color shifts toward red, the higher the probability that the region belongs to IBC, while blue means that the probability of the region belonging to IBC is zero; A3, B3, C3, and D3 are the segmentation masks obtained by binarizing the corresponding IBC probability maps, where black is the background (non-IBC regions) and white is the IBC region; A4, B4, C4, and D4 are the corresponding nucleus detection probability maps, where white dots represent the probability that a given pixel is a nucleus centroid and black regions indicate pixels that are not nucleus centroids; A5, B5, C5, and D5 are renderings of the combined IBC region segmentation and nucleus detection results, where red marks the nucleus centroids of tumor cells on the original image.

Fig. 3 The performance of the thresholds in subset 3. Magnifications of 20× (A) and 40× (B). 
The red line shows the best AUC, and the green line shows the lowest AUC.

Interpretation analysis of HER2 IHC 0 and 1+ in the test set

The AI microscope outperformed the junior pathologist at both 20× and 40× (F1-scores: 0.878 at 20×, 0.906 at 40×, and 0.871 for the junior pathologist). The detailed results are shown in Table 2. As shown in Table 3, among the 501 IBC cases, 211 cases were diagnosed as IHC 0 and 290 cases were diagnosed as IHC 1+ by the junior pathologist. Compared with the gold standard, the junior pathologist interpreted 42 cases of IHC 1+ as IHC 0 and 34 cases of IHC 0 as IHC 1+. The AI microscope achieved better interpretation at 20× and 40× (kappa = 0.703 and 0.774, P