An ultra-wide-field fundus image dataset for intelligent diagnosis of intraocular tumors

Background & Summary

Intraocular tumors can be categorized as either benign or malignant1. Benign tumors include choroidal hemangioma (CH)2, retinal capillary hemangioma (RCH)3, and choroidal osteoma (CO)4, while malignant tumors encompass retinoblastoma (RB)5, uveal melanoma (UM)6, and intraocular metastases7. Both benign and malignant tumors can cause visual impairment, and malignant tumors pose a substantial risk of mortality due to their potential for local invasion and metastasis4,6,8. The insidious onset of intraocular tumors often leads to delayed medical treatment, as most patients seek medical care only after experiencing significant visual impairment. Early and accurate diagnosis of intraocular tumors is therefore critical for preserving vision and preventing potentially life-threatening complications.

Fundus photography, a non-invasive imaging modality, is widely regarded as an effective tool for the screening, diagnosis, and monitoring of fundus diseases9,10. Compared with the 30° to 45° field of view provided by traditional color fundus imaging, the emerging ultra-wide-field (UWF) fundus imaging system can capture a 200° panoramic retinal image11. This wide field of view is particularly crucial for detecting and monitoring intraocular tumors, which frequently extend into or present in the peripheral fundus12. It facilitates the identification of early-stage lesions and enables more accurate monitoring of tumor progression.

Recent advances in artificial intelligence (AI) have shown great promise in automating the screening and diagnosis of fundus diseases, including diabetic retinopathy13, age-related macular degeneration14, glaucoma15, and retinopathy of prematurity16. AI algorithms trained on large datasets of fundus photographs have achieved strong diagnostic sensitivity and specificity, helping to address the immense screening burden.
However, the application of AI algorithms to intraocular tumors remains limited, primarily due to the scarcity of cases and the lack of large, publicly available fundus image datasets17. Currently, most fundus image datasets for intraocular tumors are non-public, typically focus on a single disease type, and predominantly use traditional fundus photography18,19,20,21. We recently published a dataset for the intelligent diagnosis of retinopathy of prematurity (ROP); however, existing datasets are not yet capable of supporting the automatic diagnosis of multiple types of intraocular tumors16. These limitations significantly hinder the development and validation of AI models for intraocular tumors.

To address these challenges, this study provides a more comprehensive dataset of UWF fundus images, encompassing multiple intraocular tumors alongside normal fundus images. A brief overview of the study is presented in Fig. 1. This dataset is intended to enhance the training and validation of AI algorithms, facilitating improved diagnostic accuracy and ultimately contributing to better patient outcomes in the management of intraocular tumors.

Fig. 1 Workflow for dataset establishment. (a) Data collection. (b) UWF fundus image export and database formation. (c) Image classification and annotation. (d) Development of AI models for intelligent diagnosis of intraocular tumors.

Accurate diagnosis of intraocular tumors in clinical practice typically requires the integration of multimodal information, including patient history, ultrasonography, OCT, CT, MRI, and histopathological findings, in addition to fundus imaging. In this study, we focused exclusively on UWF fundus images to construct a foundational, image-based dataset intended for developing AI tools capable of automated lesion screening and early referral. Therefore, clinical metadata and additional imaging modalities were not included in the current version. We acknowledge this as a limitation.
In addition, the dataset does not yet include tumor-mimicking conditions (such as retinal detachment, choroidal nevi, or Coats' disease) that may resemble intraocular tumors in fundus photographs. Incorporating such cases, along with multimodal clinical data, will be a critical step in future work to enhance model specificity, support differential diagnosis, and improve clinical applicability.

Methods

Data collection

A total of 2,031 UWF fundus images were collected from Shenzhen Eye Hospital (SZEH) between 2019 and 2024. All images were captured with the Optomap Daytona scanning laser ophthalmoscope (SLO), an ultra-wide-field fundus camera. Fundus images that were blocked, blurry, or unfocused were excluded during quality control. All selected images were exported in JPG format. The dataset consists of 677 UWF fundus images from intraocular tumor patients and 1,354 UWF fundus images from healthy participants, collected from 332 eyes of 218 individuals, all of whom are Asian. Owing to the progressive nature of intraocular tumors and the need for ongoing treatment and follow-up, a single patient may have multiple fundus images captured from the same eye at different stages of the disease. Thus, the dataset includes multiple images from the same eye, reflecting tumor changes or treatment responses over time. The data collection was approved by the Ethics Committee of Shenzhen Eye Hospital (2024KYPJ108), and informed consent was waived owing to the retrospective design and data anonymization.

Image categorization

All UWF fundus images were divided into six categories: normal images and five types of intraocular tumors, namely Choroidal Hemangioma (CH), Retinal Capillary Hemangioma (RCH), Choroidal Osteoma (CO), Retinoblastoma (RB), and Uveal Melanoma (UM). Representative fundus images of each category are shown in Fig. 2. The images were classified by three experienced annotators from SZEH.
Specifically, each UWF image was classified independently by two junior annotators. In cases of inconsistent classification, a senior annotator reviewed and reclassified the image to ensure accuracy. Table 1 presents the distribution of images across the six categories. T-distributed Stochastic Neighbor Embedding (t-SNE) was used to visualize the distributions of the different categories in the dataset, as described previously16 (Fig. 3).

Fig. 2 Representative UWF fundus images of the six categories. (a) Normal fundus images. (b) Retinal Capillary Hemangioma (RCH). (c) Choroidal Osteoma (CO). (d) Uveal Melanoma (UM). (e) Retinoblastoma (RB). (f) Choroidal Hemangioma (CH).

Table 1 Distribution of UWF fundus images across the six categories.

Fig. 3 The t-SNE distribution of the dataset.

Data Records

The dataset, titled "UWF Fundus Images of Intraocular Tumors", is publicly available on Figshare22. It is provided as a zipped file containing six subfolders, each corresponding to a specific category: Normal, Choroidal Hemangioma (CH), Retinal Capillary Hemangioma (RCH), Choroidal Osteoma (CO), Retinoblastoma (RB), and Uveal Melanoma (UM). Each subfolder contains all fundus images belonging to its respective category. The dataset is designed to facilitate the development of AI algorithms for the automated detection of intraocular tumors.

Technical Validation

To evaluate the dataset's utility for training and testing AI models, we developed four deep learning models to automatically identify intraocular tumors from UWF fundus images. The dataset was randomly divided into training, validation, and test sets using stratified sampling with an 8:1:1 ratio. This stratification ensures that the proportional distribution of each of the six categories is preserved across all subsets, as detailed in Table 1. The models were implemented in the PyTorch framework and trained on an NVIDIA V100 GPU.
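An 8:1:1 stratified split of this kind can be sketched in a few lines of standard-library Python. This is only an illustration, not the authors' splitting code: the function name, the random seed, and the use of floor division for the 10% validation and test fractions are all assumptions.

```python
import random
from collections import defaultdict

def stratified_811_split(image_paths, labels, seed=0):
    """Split (path, label) pairs into train/val/test at roughly 8:1:1,
    preserving the per-class proportions (stratified sampling)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for path, label in zip(image_paths, labels):
        by_class[label].append(path)
    train, val, test = [], [], []
    for label, items in sorted(by_class.items()):
        rng.shuffle(items)
        n = len(items)
        n_val, n_test = n // 10, n // 10  # 10% each (floored)
        val += [(p, label) for p in items[:n_val]]
        test += [(p, label) for p in items[n_val:n_val + n_test]]
        train += [(p, label) for p in items[n_val + n_test:]]
    return train, val, test
```

Because the split is performed within each class before the subsets are merged, every category (Normal, CH, RCH, CO, RB, UM) contributes the same fraction to each subset, which is the property the paper relies on.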
Prior to training, all input images were resized to 224 × 224 pixels. To enhance generalization and prevent overfitting, we applied a series of data augmentation techniques, including random horizontal and vertical flips, random rotations, and color jitter. For this six-class classification task, we used the standard cross-entropy loss, defined as:$$L=-\,\mathop{\sum }\limits_{i=1}^{6}{y}_{i}\log ({\hat{y}}_{i}),$$where \({y}_{i}\) is the binary indicator (1 if class i is the correct classification, and 0 otherwise) and \({\hat{y}}_{i}\) is the predicted probability for class i. The models were optimized with the Stochastic Gradient Descent (SGD) optimizer. The initial learning rate was set to 0.001 and was managed by a dynamic schedule combining a linear warmup phase with cosine annealing to ensure stable and effective convergence. Training was conducted for 100 epochs with a batch size of 256. Our evaluation protocol consisted of training the model on the training set, selecting the best-performing checkpoint based on accuracy on a separate validation set, and finally reporting the model's performance on an unseen test set to ensure an unbiased assessment.

Four algorithms (ResNet5023, ResNet10124, ConvNeXt-T25, and ViT-B26) were selected for AI model development and validation (Fig. 4). To validate the robustness of our models, we conducted five independent runs for each model, using a different random seed for each run. The performance of each model was evaluated with multiple metrics, including accuracy, area under the receiver operating characteristic curve (AUC), precision, sensitivity, F1 score, specificity, and kappa (Table 2). Based on the results from these five runs, we calculated the mean, standard deviation (SD), and 95% confidence interval (CI) for every performance metric.
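The learning-rate schedule described above (linear warmup into cosine annealing from a base rate of 0.001 over 100 epochs) can be written as a small pure function. This is a sketch under assumptions: the paper does not report the warmup length or the final learning rate, so a 5-epoch warmup and decay toward zero are placeholder choices here.

```python
import math

def lr_at_epoch(epoch, total_epochs=100, warmup_epochs=5, base_lr=1e-3):
    """Linear warmup to base_lr, then cosine annealing toward zero.

    warmup_epochs=5 is an assumed value; the paper states only that a
    linear warmup phase was combined with cosine annealing.
    """
    if epoch < warmup_epochs:
        # Ramp linearly from base_lr/warmup_epochs up to base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    # Cosine decay over the remaining epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

In PyTorch, the same shape is typically obtained by chaining `LinearLR` and `CosineAnnealingLR` with `torch.optim.lr_scheduler.SequentialLR` around an SGD optimizer; the function above just makes the resulting curve explicit.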
Additionally, confusion matrices were used to clearly illustrate the correspondence between each AI model's predictions and the actual classifications, facilitating comprehensive performance evaluation and error analysis. To further understand the classification performance of the models, we employed t-SNE to visualize the high-dimensional feature embeddings learned by each model on the test set. The t-SNE visualizations were generated from the final-layer feature embeddings of each model, extracted just before the classification head. These visualizations offer qualitative insights into how the models organize the data in their learned feature space, complementing the quantitative metrics. Visual inspection of the t-SNE plots (Fig. 4c) reveals several key findings. First, all four models effectively separate the Normal and Retinoblastoma (RB) classes from the other tumor types, indicating that the features of these two categories are distinct enough to be learned reliably. However, the clusters produced by the ViT-B model for these classes exhibit greater intra-class compactness, with data points more tightly grouped, suggesting more confident and consistent feature extraction. In contrast, for the more challenging and visually similar tumor types, namely Choroidal Osteoma (CO), Uveal Melanoma (UM), Choroidal Hemangioma (CH), and Retinal Capillary Hemangioma (RCH), the plots for ResNet50, ResNet101, and ConvNeXt-T show considerable overlap between clusters. This visual confusion directly corresponds to the off-diagonal errors observed in their respective confusion matrices (Fig. 4a). The ViT-B model, however, demonstrates superior inter-class separability even among these difficult classes, with more defined boundaries between the clusters.

Fig. 4 Visualization of AI model performance on internal validation (SZEH). (a) Confusion matrices of different AI models. (b) ROC curves of different AI models.
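The confusion-matrix bookkeeping behind this error analysis is simple to reproduce. The sketch below (plain Python, not the authors' code; the integer class indexing 0–5 is an assumption) builds the matrix and derives per-class sensitivity, the metric whose off-diagonal counterpart shows up as the cluster overlap discussed above.

```python
def confusion_matrix(y_true, y_pred, n_classes=6):
    """cm[i][j] counts samples of actual class i predicted as class j."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def per_class_sensitivity(cm):
    """Sensitivity (recall) per class: diagonal count / actual class count."""
    return [row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(cm)]
```

Off-diagonal entries cm[i][j] (i ≠ j) are exactly the misclassifications that appear as overlapping clusters in the t-SNE plots, e.g. UM samples predicted as CH.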
(c) The t-SNE visualization of different AI models.

Table 2 Model classification performance on internal validation (SZEH).

The results (Table 2) show that while all models achieved high specificity (all > 94%), their performance varied considerably across other key metrics such as accuracy, sensitivity, and F1 score. Notably, the ViT-B model demonstrated the strongest and most consistent overall performance, achieving the highest mean accuracy (91.46%), AUC (96.87%), F1 score (81.42%), and kappa (84.14%). This suggests that the Vision Transformer architecture is particularly effective at learning the discriminative features required to classify a diverse set of intraocular tumors from UWF fundus images. To formally assess whether the observed superiority of the ViT-B model was statistically significant, we performed the DeLong test to compare the AUC of ViT-B against each of the other three models. The results confirmed a statistically significant advantage for the ViT-B model in all comparisons: ViT-B vs. ResNet50 (p = 0.00793), ViT-B vs. ResNet101 (p