Skip to main content
medRxiv
  • Home
  • About
  • Submit
  • ALERTS / RSS
Advanced Search

Mammo-Bench: A Large-scale Benchmark Dataset of Mammography Images

Gaurav Bhole, S Suba, View ORCID ProfileNita Parekh
doi: https://doi.org/10.1101/2025.01.31.25321510
Gaurav Bhole
1IIIT-Hyderabad, Gachibowli, India
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • For correspondence: gauravhb2543{at}gmail.com
S Suba
1IIIT-Hyderabad, Gachibowli, India
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Nita Parekh
1IIIT-Hyderabad, Gachibowli, India
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for Nita Parekh
  • Abstract
  • Full Text
  • Info/History
  • Metrics
  • Supplementary material
  • Data/Code
  • Preview PDF
Loading

Abstract

Breast cancer remains a significant global health concern, and machine learning algorithms and computer-aided detection systems have shown great promise in enhancing the accuracy and efficiency of mammography image analysis. However, there is a critical need for large, benchmark datasets for training deep learning models for breast cancer detection. In this work we developed Mammo-Bench, a large-scale benchmark dataset of mammography images, by collating data from seven well-curated resources, viz., DDSM, INbreast, KAU-BCMD, CMMD, CDD-CESM, DMID, and RSNA Screening Dataset. To ensure consistency across images from diverse sources while preserving clinically relevant features, a preprocessing pipeline that includes breast segmentation, pectoral muscle removal, and intelligent cropping is proposed. The dataset consists of 74,436 high-quality mammographic images from 26,500 patients across 7 countries and is one of the largest open-source mammography databases to the best of our knowledge. To show the efficacy of training on the large dataset, performance of ResNet101 architecture was evaluated on Mammo-Bench and the results compared by training independently on a few member datasets and an external dataset, VinDr-Mammo. An accuracy of 78.8% (with data augmentation of the minority classes) and 77.8% (without data augmentation) was achieved on the proposed benchmark dataset, compared to the other datasets for which accuracy varied from 25 – 69%. Noticeably, improved prediction of the minority classes is observed with the Mammo-Bench dataset. These results establish baseline performance and demonstrate Mammo-Bench's utility as a comprehensive resource for developing and evaluating mammography analysis systems.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This study did not receive any funding

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

The study used ONLY openly available human data that were originally located at: University of South Florida Digital Mammography Home Page DDSM, https://www.kaggle.com/datasets/ramanathansp20/inbreast-dataset, https://www.kaggle.com/datasets/asmaasaad/king-abdulaziz-university-mammogram-dataset, https://www.cancerimagingarchive.net/collection/cmmd/, https://www.cancerimagingarchive.net/collection/cdd-cesm/, https://www.kaggle.com/competitions/rsna-breast-cancer-detection/data, https://figshare.com/articles/dataset/_b_Digital_mammography_Dataset_for_Breast_Cancer_Diagnosis_Research_DMID_b_DMID_rar/24522883

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Footnotes

  • {gaurav.bhole{at}research.iiit.ac.in, suba.s{at}research.iiit.ac.in} and nita{at}iiit.ac.in

  • * dataset with restricted access

Data Availability

All data produced are available online at: https://india-data.org/dataset-details/c86fb00c-0fb8-4e0e-85a2-4d415f9c1ada

https://india-data.org/dataset-details/c86fb00c-0fb8-4e0e-85a2-4d415f9c1ada

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.
Back to top
PreviousNext
Posted February 02, 2025.
Download PDF

Supplementary Material

Data/Code
Email

Thank you for your interest in spreading the word about medRxiv.

NOTE: Your email address is requested solely to identify you as the sender of this article.

Enter multiple addresses on separate lines or separate them with commas.
Mammo-Bench: A Large-scale Benchmark Dataset of Mammography Images
(Your Name) has forwarded a page to you from medRxiv
(Your Name) thought you would like to see this page from the medRxiv website.
CAPTCHA
This question is for testing whether or not you are a human visitor and to prevent automated spam submissions.
Share
Mammo-Bench: A Large-scale Benchmark Dataset of Mammography Images
Gaurav Bhole, S Suba, Nita Parekh
medRxiv 2025.01.31.25321510; doi: https://doi.org/10.1101/2025.01.31.25321510
Twitter logo Facebook logo LinkedIn logo Mendeley logo
Citation Tools
Mammo-Bench: A Large-scale Benchmark Dataset of Mammography Images
Gaurav Bhole, S Suba, Nita Parekh
medRxiv 2025.01.31.25321510; doi: https://doi.org/10.1101/2025.01.31.25321510

Citation Manager Formats

  • BibTeX
  • Bookends
  • EasyBib
  • EndNote (tagged)
  • EndNote 8 (xml)
  • Medlars
  • Mendeley
  • Papers
  • RefWorks Tagged
  • Ref Manager
  • RIS
  • Zotero
  • Tweet Widget
  • Facebook Like
  • Google Plus One

Subject Area

  • Radiology and Imaging
Subject Areas
All Articles
  • Addiction Medicine (576)
  • Allergy and Immunology (868)
  • Anesthesia (306)
  • Cardiovascular Medicine (4483)
  • Dentistry and Oral Medicine (449)
  • Dermatology (385)
  • Emergency Medicine (615)
  • Endocrinology (including Diabetes Mellitus and Metabolic Disease) (1528)
  • Epidemiology (15283)
  • Forensic Medicine (31)
  • Gastroenterology (1134)
  • Genetic and Genomic Medicine (6651)
  • Geriatric Medicine (671)
  • Health Economics (1006)
  • Health Informatics (4606)
  • Health Policy (1378)
  • Health Systems and Quality Improvement (1624)
  • Hematology (545)
  • HIV/AIDS (1276)
  • Infectious Diseases (except HIV/AIDS) (15965)
  • Intensive Care and Critical Care Medicine (1111)
  • Medical Education (626)
  • Medical Ethics (147)
  • Nephrology (675)
  • Neurology (6699)
  • Nursing (346)
  • Nutrition (1006)
  • Obstetrics and Gynecology (1153)
  • Occupational and Environmental Health (961)
  • Oncology (3370)
  • Ophthalmology (989)
  • Orthopedics (370)
  • Otolaryngology (421)
  • Pain Medicine (437)
  • Palliative Medicine (131)
  • Pathology (670)
  • Pediatrics (1704)
  • Pharmacology and Therapeutics (700)
  • Primary Care Research (717)
  • Psychiatry and Clinical Psychology (5497)
  • Public and Global Health (9288)
  • Radiology and Imaging (2225)
  • Rehabilitation Medicine and Physical Therapy (1375)
  • Respiratory Medicine (1202)
  • Rheumatology (598)
  • Sexual and Reproductive Health (721)
  • Sports Medicine (536)
  • Surgery (722)
  • Toxicology (100)
  • Transplantation (290)
  • Urology (267)