medRxiv

Extraction of Human Phenotype Ontology (HPO) Concepts from Clinical Notes Utilizing Large Language Models (LLM) with Model Context Protocol (MCP)

Michael Larsen, Ian M. Campbell, Lori A. Orlando, Peter Robinson, Nephi A. Walton
doi: https://doi.org/10.64898/2026.05.23.26353963
Michael Larsen
1Division of Medical Genetics, University of Utah Health, Salt Lake City, UT, USA
M.D.
Ian M. Campbell
2Division of Human Genetics, Children’s Hospital of Philadelphia, Philadelphia, PA, USA
M.D., Ph.D., FACMG
Lori A. Orlando
3Department of Medicine, Wake Forest University School of Medicine, NC, USA
M.D., MHS, MMCI
Peter Robinson
4Rahel Hirsch Center for Translational Medicine, Berlin Institute of Health at Charité, Berlin, Germany
5The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
M.D., MSc.
Nephi A. Walton
6Center for Artificial Intelligence Research, Wake Forest University School of Medicine, NC, USA
7Department of Epidemiology and Prevention, Wake Forest University School of Medicine, NC, USA
M.D., M.S., FACMG
  • For correspondence: nephi.walton{at}advocatehealth.org

ABSTRACT

Background Accurate extraction of Human Phenotype Ontology (HPO) terms from clinical notes is essential for variant prioritization and genetic diagnosis. Large language models (LLMs) often struggle to balance precision, hallucination avoidance, and ontology mapping accuracy, and prior work has shown that retrieval-based grounding can improve performance for individual models. We hypothesized that real-time ontology grounding through external tools would improve these metrics across heterogeneous LLMs, and we evaluated the Model Context Protocol (MCP), a standardized open framework for integrating external tools, as a vendor-agnostic mechanism for delivering such grounding.

Methods Five LLMs (Claude Sonnet 4.5, GPT-5.1, Gemini 2.5 Pro, Grok 4.1, and Qwen3 30B) extracted HPO terms from four synthetic clinical genetics notes under two conditions: baseline (“No Tools,” internal knowledge only) and tool-augmented (“With Tools”), with real-time HPO retrieval delivered through MCP for models with native support and through functionally equivalent native tool-calling interfaces otherwise. Each model performed ≥50 runs per note per condition (>2,000 total runs). Performance was evaluated using Precision, Recall, and F1-score. Outputs were manually adjudicated to classify mapping errors and hallucinations. Results were benchmarked against a commercial EHR-based HPO extraction tool.
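
As an illustration of the evaluation metrics named above, the following minimal sketch computes Precision, Recall, and F1 for a single run by comparing predicted HPO IDs with gold-standard IDs as sets. Exact ID matching is an assumption made here for illustration; the abstract does not specify the matching rule or how scores were aggregated across runs. The function name `score_run` and the example ID lists are hypothetical.

```python
def score_run(predicted, gold):
    """Set-based precision, recall, and F1 for one extraction run.

    Assumption: a prediction counts as correct only if its HPO ID
    exactly matches a gold-standard ID.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical example: two of three predicted terms match the gold standard.
gold = {"HP:0001250", "HP:0001263", "HP:0000252"}
predicted = {"HP:0001250", "HP:0001263", "HP:0004322"}
precision, recall, f1 = score_run(predicted, gold)
```

Treating each run as a set of IDs means duplicate predictions are counted once; a pipeline that penalizes duplicates would need a different counting rule.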

Results Tool augmentation significantly improved performance across all models. Mean aggregate F1-score increased from 0.46 (SD 0.22) in the baseline condition to 0.72 (SD 0.15) with tools (p < 0.001). Mapping Error Rate decreased from 40.9% to 7.8% (p < 0.001), and Precision increased from 56% to 90%. Performance gains were observed across all model families, including the open-weight Qwen3 model (F1 0.11→0.50). For inferred phenotypes, F1 improved from 0.20 to 0.34 (p < 0.001) without a significant increase in hallucination rate (p = 0.08). Compared with the commercial benchmark, tool-augmented LLMs achieved higher F1-scores and substantially greater recall for inferred phenotypes.

Conclusions Real-time ontology grounding substantially improves HPO extraction across diverse LLMs by reducing mapping errors and enhancing phenotype inference. The Model Context Protocol provides a standardized, interoperable mechanism for delivering such grounding, supporting reproducible, vendor-agnostic deployment of clinical LLM pipelines in genomic medicine.
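
The grounding idea can be sketched as a lookup that checks each model-proposed HPO ID and label against an authoritative table before the term is accepted. This is a hypothetical illustration, not the authors' MCP server: the in-memory `HPO_TERMS` table, the `ground_term` function, and its three status labels are assumptions made here, and a real deployment would query the full ontology through a tool interface.

```python
# Tiny stand-in for the ontology (assumption: a real tool would query all of HPO).
HPO_TERMS = {
    "HP:0001250": "Seizure",
    "HP:0001263": "Global developmental delay",
    "HP:0000252": "Microcephaly",
}
LABEL_TO_ID = {label.lower(): hpo_id for hpo_id, label in HPO_TERMS.items()}


def ground_term(proposed_id, proposed_label):
    """Check a model-proposed (ID, label) pair against the lookup table.

    Returns a (status, hpo_id) tuple:
      "ok"       - the ID exists and its label matches
      "remapped" - the label is known but the ID was wrong; the ID is corrected
      "rejected" - neither the pair nor the label could be confirmed
    """
    label_key = proposed_label.strip().lower()
    known_label = HPO_TERMS.get(proposed_id)
    if known_label is not None and known_label.lower() == label_key:
        return ("ok", proposed_id)
    if label_key in LABEL_TO_ID:
        return ("remapped", LABEL_TO_ID[label_key])
    return ("rejected", None)
```

For example, `ground_term("HP:0009999", "Microcephaly")` returns the corrected ID `HP:0000252` under this sketch, which is the kind of mapping error that grounding is meant to catch.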

40-Word Summary Real-time ontology grounding substantially improves Human Phenotype Ontology term extraction from clinical notes across diverse large language models. The Model Context Protocol provides a standardized, vendor-agnostic mechanism for delivering this grounding, supporting interoperable clinical AI deployments.

Competing Interest Statement

I.M.C. reports previous research support from Google Cloud LLC; the funder had no involvement in the current study. All other authors declare no conflict of interest.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

DATA AVAILABILITY

All software tools, synthetic clinical notes, gold-standard annotations, and evaluation scripts used in this study are publicly available via GitHub at https://github.com/clinical-mcp/pheno-extract-ai and https://github.com/clinical-mcp/hpo_mcp. The complete run-level result dataset and final curated annotations will be deposited in the Dryad Digital Repository upon publication, and a draft DOI will be reserved at that time.

Funder Information Declared

I.M.C. was supported by grant K08-HD111688 from the Eunice Kennedy Shriver National Institute of Child Health and Human Development.
P.N.R. was supported by grant 5U24HG011449 from the National Human Genome Research Institute (The Human Phenotype Ontology: Accelerating Computational Integration of Clinical Data for Genomics) and by a professorship from the Alexander von Humboldt Foundation.
Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted May 25, 2026.
medRxiv 2026.05.23.26353963; doi: https://doi.org/10.64898/2026.05.23.26353963

Subject Area

  • Health Informatics