Comparison of Foundation and Supervised Learning-Based Models for Detection of Referable Glaucoma from Fundus Photographs

Kyle Bolo; Tran Huy Nguyen; Sreenidhi Iyengar; Zhiwei Li; Van Nguyen; Brandon J. Wong; Jiun L. Do; Jose-Luis Ambite; Carl Kesselman; Lauren P. Daskivich; Benjamin Y. Xu

doi:10.1101/2025.08.21.25334170

ABSTRACT

Purpose: To compare the performance of a foundation model and a supervised learning-based model for detecting referable glaucoma from fundus photographs.

Design: Evaluation of diagnostic technology.

Participants: 6,116 participants from the Los Angeles County Department of Health Services Teleretinal Screening Program.

Methods: Fundus photographs were labeled for referable glaucoma (cup-to-disc ratio ≥ 0.6) by certified optometrists. Four deep learning models were trained on cropped and uncropped images (Training N = 8,996; Validation N = 3,002) using two architectures: a vision transformer with self-supervised pretraining on fundus photographs (RETFound) and a convolutional neural network (VGG-19). Models were evaluated on a held-out test set (N = 1,000) labeled by glaucoma specialists and an external test set (N = 300) from University of Southern California clinics. Performance was assessed while varying training set size and stratifying by demographic factors. xRAI was used for saliency mapping.
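The abstract does not say how the reduced training sets were drawn. One common approach, sketched here purely as an assumption, is to shuffle the training images once and take nested prefixes, so that every smaller subset is contained in each larger one. The function name `nested_subsets`, the seed, and the subset sizes are hypothetical; only the full training size (8,996) comes from the text.

```python
import random


def nested_subsets(image_ids, sizes, seed=0):
    """Return {size: subset} where each smaller subset is a prefix of the larger ones.

    Assumption: subsets are nested random samples; the study's actual
    sampling procedure is not described in the abstract.
    """
    rng = random.Random(seed)      # fixed seed so the draw is reproducible
    shuffled = list(image_ids)
    rng.shuffle(shuffled)          # one shuffle shared by all subset sizes
    return {n: shuffled[:n] for n in sorted(sizes) if n <= len(shuffled)}


# Hypothetical subset sizes; 8,996 is the training set size reported above.
training_ids = range(8996)
subsets = nested_subsets(training_ids, sizes=[500, 1000, 2000, 8996])
for n, ids in subsets.items():
    print(n, len(ids))
```

Nesting the subsets means that differences between training sizes reflect the added images rather than a completely different random draw.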

Main Outcome Measures: Area under the receiver operating characteristic curve (AUC-ROC) and threshold-specific metrics.
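As an illustration of these outcome measures, the sketch below computes AUC-ROC as the probability that a randomly chosen positive image scores higher than a randomly chosen negative one (ties count as one half), together with sensitivity, specificity, and F1 at a fixed threshold. The labels and scores are toy values, not study data, and the function names are hypothetical; the authors' evaluation code is not described in the abstract.

```python
def auc_roc(labels, scores):
    """AUC-ROC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5        # ties contribute half a win
    return wins / (len(pos) * len(neg))


def threshold_metrics(labels, scores, threshold):
    """Sensitivity, specificity, and F1 when score >= threshold means 'referable'."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    f1_denominator = 2 * tp + fp + fn
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "f1": 2 * tp / f1_denominator if f1_denominator else 0.0,
    }


# Toy data: 1 = referable glaucoma, 0 = not referable.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
auc = auc_roc(labels, scores)
metrics = threshold_metrics(labels, scores, threshold=0.5)
print(auc, metrics)
```

The pairwise form is quadratic in the number of images, which is acceptable for test sets of the size reported here; a rank-based implementation would be preferable for much larger sets.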

Results: The cropped image VGG-19 model achieved the highest AUC-ROC (0.924 [0.907-0.940]), which was comparable (p = 0.07) to that of the cropped image RETFound model (0.911 [0.892-0.930]). The cropped image RETFound model achieved the highest Youden-optimal performance (sensitivity 82.6%, specificity 88.2%) and F1 score (0.801). Cropped image models outperformed their uncropped counterparts within each architecture (p < 0.001 for AUC-ROC comparisons). RETFound models had a performance advantage when trained on smaller datasets (N < 2,000 images), and the uncropped image RETFound model performed best on external data (p < 0.001 for AUC-ROC comparisons). The cropped image RETFound model performed consistently across ethnic groups (p = 0.20), while the others did not (p < 0.04); performance did not vary by age or gender. Saliency maps for both architectures consistently included the optic nerve.
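For reference, the Youden index is J = sensitivity + specificity − 1, so the reported Youden-optimal operating point corresponds to J = 0.826 + 0.882 − 1 = 0.708. The sketch below shows one way such a threshold could be selected by scanning candidate thresholds. It uses toy data and a hypothetical function name, and it is not the authors' implementation.

```python
def youden_optimal_threshold(labels, scores):
    """Return (J, threshold, sensitivity, specificity) maximizing Youden's J.

    A score >= threshold is treated as a positive (referable) prediction.
    On ties in J, the lowest threshold is kept.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    best = None
    for t in sorted(set(scores)):  # each distinct score is a candidate threshold
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < t)
        sensitivity = tp / n_pos
        specificity = tn / n_neg
        j = sensitivity + specificity - 1
        if best is None or j > best[0]:
            best = (j, t, sensitivity, specificity)
    return best


# Toy data, not study data: 1 = referable glaucoma, 0 = not referable.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
j, threshold, sensitivity, specificity = youden_optimal_threshold(labels, scores)
print(j, threshold, sensitivity, specificity)
```

Because J weights sensitivity and specificity equally, a screening program that prioritizes one over the other would likely choose a different operating point.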

Conclusion: While both RETFound and VGG-19 models performed well for classification of referable glaucoma, foundation models may be preferable when training data are limited and when domain shift is expected. Training models on images cropped to the region of the optic nerve improves performance regardless of architecture but may reduce model generalizability.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This work was supported by grants R01 EY035677 and K23 EY032985 from the National Eye Institute, National Institutes of Health, Bethesda, Maryland; a DHS-USC Safety Net Innovation Award from the Southern California Clinical and Translational Science Institute; an AI4Health Award from the University of Southern California; and an unrestricted grant to the Department of Ophthalmology from Research to Prevent Blindness, New York, NY.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

This study was approved by the Institutional Review Boards of the University of Southern California.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Data Availability

All data produced in the present study are available upon reasonable request to the authors.

The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.