medRxiv

Predicting polycystic ovary syndrome (PCOS) with machine learning algorithms from electronic health records

Zahra Zad, Victoria S. Jiang, Amber T. Wolf, Taiyao Wang, J. Jojo Cheng, Ioannis Ch. Paschalidis, Shruthi Mahalingaiah
doi: https://doi.org/10.1101/2023.07.27.23293255
Zahra Zad
1Division of Systems Engineering, Center for Information and Systems Engineering (CISE), Boston University, 15 St. Mary’s Street, Brookline, MA 02446, USA
BS
Victoria S. Jiang
2Division of Reproductive Endocrinology and Infertility, Department of Obstetrics and Gynecology, Massachusetts General Hospital, 55 Fruit Street, Yawkey 10, Boston, MA 02114, USA
MD
Amber T. Wolf
3Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Place, New York, NY 10029, USA
BA
Taiyao Wang
1Division of Systems Engineering, Center for Information and Systems Engineering (CISE), Boston University, 15 St. Mary’s Street, Brookline, MA 02446, USA
PhD
J. Jojo Cheng
4Department of Biostatistics and Medical Informatics, University of Wisconsin, West Johnson Street, Madison, WI 53792, USA
BA
Ioannis Ch. Paschalidis
1Division of Systems Engineering, Center for Information and Systems Engineering (CISE), Boston University, 15 St. Mary’s Street, Brookline, MA 02446, USA
5Department of Electrical & Computer Engineering, Department of Biomedical Engineering, and Faculty for Computing & Data Sciences, Boston University, 8 St. Mary’s Street, Boston, MA 02215, USA
PhD
Shruthi Mahalingaiah
2Division of Reproductive Endocrinology and Infertility, Department of Obstetrics and Gynecology, Massachusetts General Hospital, 55 Fruit Street, Yawkey 10, Boston, MA 02114, USA
6Department of Environmental Health, Harvard T.H. Chan School of Public Health, 665 Huntington Avenue, Boston, MA 02115, USA
MD, MS
  • For correspondence: shruthi@hsph.harvard.edu

Abstract

Introduction Predictive models have been used to aid early diagnosis of PCOS, though existing models are based on small sample sizes and limited to fertility clinic populations. We built a predictive model using machine learning algorithms based on an outpatient population at risk for PCOS to predict risk and facilitate earlier diagnosis, particularly among those who meet diagnostic criteria but have not received a diagnosis.

Methods This is a retrospective cohort study using a safety-net hospital’s electronic health records (EHR) from 2003-2016. The study population included 30,601 women aged 18-45 years without concurrent endocrinopathy who had any visit to Boston Medical Center for primary care, obstetrics and gynecology, endocrinology, family medicine, or general internal medicine. Four prediction outcomes were assessed for PCOS. The first outcome was a PCOS ICD-9 diagnosis; the additional model outcomes were algorithm-defined PCOS. The latter was based on the Rotterdam criteria and merged laboratory values, radiographic imaging, and ICD data from the EHR to define irregular menstruation, hyperandrogenism, and polycystic ovarian morphology on ultrasound.
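
As a rough illustration of the algorithm-defined outcome, the Rotterdam criteria are commonly summarized as requiring at least two of three features: irregular menstruation (oligo- or anovulation), hyperandrogenism, and polycystic ovarian morphology. The sketch below encodes only that two-of-three rule. The class and function names are hypothetical, the three boolean flags are assumed to have already been derived from EHR data, and the abstract does not specify how each of the study's algorithm-defined outcomes combines these features, so this should not be read as the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class PatientFlags:
    """Hypothetical per-patient flags, assumed to be derived upstream
    from laboratory values, imaging reports, and ICD codes."""
    irregular_menstruation: bool
    hyperandrogenism: bool
    polycystic_ovarian_morphology: bool


def meets_rotterdam(flags: PatientFlags) -> bool:
    """Return True when at least two of the three Rotterdam features are present."""
    present = sum([
        flags.irregular_menstruation,
        flags.hyperandrogenism,
        flags.polycystic_ovarian_morphology,
    ])
    return present >= 2


if __name__ == "__main__":
    # Two of three features present: the rule is satisfied.
    print(meets_rotterdam(PatientFlags(True, True, False)))
    # Only one feature present: the rule is not satisfied.
    print(meets_rotterdam(PatientFlags(False, False, True)))
```

The study population already excludes concurrent endocrinopathy, so that exclusion step is not repeated in the sketch.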

Results We developed predictive models using four machine learning methods: logistic regression, support vector machine, gradient boosted trees, and random forests. Hormone values (follicle-stimulating hormone, luteinizing hormone, estradiol, and sex hormone binding globulin) were combined to create a multilayer perceptron score using a neural network classifier. Prediction of PCOS prior to clinical diagnosis in an out-of-sample test set of patients achieved an AUC of 85%, 81%, 80%, and 82% in Models I, II, III, and IV, respectively. Significant positive predictors of PCOS diagnosis across models included hormone levels and obesity; negative predictors included gravidity and positive bHCG.
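
For readers less familiar with the AUC figures reported above, the area under the ROC curve can be read as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. The following is a minimal sketch of that pairwise definition. The function name and the toy labels and scores are made up for illustration and are not study data; in practice a library routine would normally be used instead.

```python
def roc_auc(labels, scores):
    """Compute AUC as the fraction of (positive, negative) pairs in which
    the positive case has the higher score; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy example (not study data): 3 positives, 3 negatives.
    toy_labels = [1, 1, 0, 1, 0, 0]
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
    # 8 of the 9 positive-negative pairs are ranked correctly.
    print(round(roc_auc(toy_labels, toy_scores), 3))
```

Under this reading, an AUC of 85% means the model ranks a true case above a non-case in roughly 85 of 100 such pairs drawn from the test set.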

Conclusions Machine learning algorithms were used to predict PCOS based on a large at-risk population. This approach may guide early detection of PCOS within EHR-interfaced populations to facilitate counseling and interventions that may reduce long-term health consequences. Our model illustrates the potential benefits of an artificial intelligence-enabled provider assistance tool that can be integrated into the EHR to reduce delays in diagnosis. However, model validation in other hospital-based populations is necessary.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This study was funded by National Institutes of Health (R01 GM135930), National Institutes of Health (UL54 TR004130), Boston University Kilachand Fund for Integrated Life Science and Engineering, National Science Foundation (CCF-2200052), National Science Foundation (IIS-1914792), and National Science Foundation (DMS-1664644).

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

The Institutional Review Board of Boston University School of Medicine and the Harvard T.H. Chan School of Public Health (Protocol # H35708) gave ethical approval for this work.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Footnotes

  • Study funding/competing interest(s): This study was partially supported by National Science Foundation grants CCF-2200052, IIS-1914792, and DMS-1664644, by the NIH under grants R01 GM135930 and UL54 TR004130, and by the Boston University Kilachand Fund for Integrated Life Science and Engineering

  • Disclosure Summary: The authors declare no conflict of interest and nothing to disclose.

  • Introduction expanded to describe the model used; discussion substantially expanded to elaborate on the predictive model.

Data availability

The datasets generated and/or analyzed during the current study are not publicly available but are available from the corresponding author on reasonable request.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
Posted October 01, 2023.
Subject Area

  • Obstetrics and Gynecology