medRxiv

GEN-KnowRD: Reframing AI for Rare Disease Recognition

Chao Yan, Wu-Chen Su, Yi Xin, Monika E. Grabowska, Vern E. Kerchberger, Victor A. Borza, Jinlian Wang, Liwei Wang, Rui Li, Jacob Lynn, Alyson L. Dickson, Cathy Shyr, QiPing Feng, Charles M. Stein, Kai Wang, Peter J. Embi, Bradley A. Malin, Hongfang Liu, Wei-Qi Wei
doi: https://doi.org/10.64898/2026.03.02.26347469
Author affiliations
  • 1 Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA (Chao Yan, Wu-Chen Su, Monika E. Grabowska, Victor A. Borza, Jacob Lynn, Cathy Shyr, Peter J. Embi, Bradley A. Malin, Wei-Qi Wei)
  • 2 Department of Computer Science, Vanderbilt University, Nashville, TN, USA (Yi Xin, Bradley A. Malin, Wei-Qi Wei)
  • 3 Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA (Vern E. Kerchberger, Alyson L. Dickson, QiPing Feng, Charles M. Stein, Peter J. Embi)
  • 4 McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA (Jinlian Wang, Liwei Wang, Rui Li, Hongfang Liu)
  • 5 Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA (Cathy Shyr, Bradley A. Malin)
  • 6 Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN, USA (Cathy Shyr)
  • 7 Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, PA, USA (Kai Wang)

For correspondence: wei-qi.wei@vumc.org (Wei-Qi Wei)

Abstract

Rare diseases affect over 300 million people worldwide, yet patients often endure years-long diagnostic delays that limit timely intervention and access to clinical trials. Computational rare disease recognition (RDR) remains constrained by knowledge resources that are often incomplete and heterogeneous and that depend on extensive multidisciplinary expert curation, which does not scale. Large language models (LLMs) applied directly to end-to-end diagnosis or disease discrimination face similar knowledge bottlenecks and also raise concerns about cost, reproducibility, and data governance. Here, we introduce GEN-KnowRD, a knowledge-layer-first framework that uses LLMs to generate schema-guided rare disease profiles, systematically assesses their quality, and constructs a computable knowledge base (PheMAP-RD) for local deployment. GEN-KnowRD integrates this knowledge into lightweight inference pipelines for both general-purpose disease screening and specialized early discrimination from longitudinal electronic health records. Across six public benchmarks for general-purpose screening (9,290 patients spanning 798 rare diseases), GEN-KnowRD significantly improves disease ranking compared with a state-of-the-art diagnostic framework centered on the Human Phenotype Ontology (HPO) (up to 345.8% improvement in top-1 success), advanced end-to-end LLM reasoning (up to 129.1% improvement), and a variant of GEN-KnowRD instantiated with expert-curated knowledge rather than LLM-generated profiles. In two real-world cohorts for the early diagnosis of idiopathic pulmonary fibrosis (511 patients), used as a case study, GEN-KnowRD also achieves robust gains in discrimination performance, supporting effective RDR during the pre-diagnostic window.
These findings demonstrate that repositioning LLMs from diagnostic reasoning to the knowledge layer, thereby decoupling knowledge construction from patient-level inference, yields stronger RDR while providing scalable, continuously updatable, and reusable infrastructure for diagnosis, screening, and clinical research across the rare disease landscape.
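The abstract reports ranking gains as "improvement in top-1 success" without defining the metric on this page. A minimal sketch, assuming that top-k success is the fraction of patients whose confirmed diagnosis appears among the first k ranked candidates, and that "improvement" is the relative gain over the comparator's score, is shown below. The function names and the toy disease identifiers are hypothetical illustrations, not the authors' evaluation code or data.

```python
from typing import Sequence


def top_k_success(rankings: Sequence[Sequence[str]], truths: Sequence[str], k: int = 1) -> float:
    """Fraction of patients whose true diagnosis appears in the top-k of their ranked list.

    Assumed definition; each rankings[i] is the ordered candidate list for patient i.
    """
    if len(rankings) != len(truths):
        raise ValueError("rankings and truths must have the same length")
    if not truths:
        return 0.0
    hits = sum(1 for ranked, truth in zip(rankings, truths) if truth in ranked[:k])
    return hits / len(truths)


def relative_improvement(new: float, baseline: float) -> float:
    """Relative gain of `new` over `baseline`, in percent (assumed definition)."""
    if baseline == 0:
        raise ValueError("relative improvement is undefined for a zero baseline")
    return (new - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Hypothetical toy cohort of four patients with made-up disease IDs.
    truths = ["D1", "D2", "D3", "D4"]
    baseline_rankings = [["D9", "D1"], ["D2", "D5"], ["D7", "D3"], ["D8", "D6"]]
    new_rankings = [["D1", "D9"], ["D2", "D5"], ["D3", "D7"], ["D6", "D4"]]

    base = top_k_success(baseline_rankings, truths, k=1)  # 1 of 4 ranked first -> 0.25
    new = top_k_success(new_rankings, truths, k=1)        # 3 of 4 ranked first -> 0.75
    print(relative_improvement(new, base))                # 200.0
```

Under this assumed definition, a 345.8% improvement would mean a top-1 success rate about 4.46 times that of the comparator; if the paper instead reports absolute percentage-point differences, the interpretation changes accordingly.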

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This work is supported in part by National Institutes of Health grants R01HG012748, K99LM014428, R00LM014429, R01HG013031, R01AG084550, R01HL171809, R01LM012806, P50HD106446, R01GM139891, and UL1TR002243.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

The Institutional Review Board (IRB) at Vanderbilt University Medical Center approved this study. The IRB granted a full waiver of written informed consent because of the retrospective, observational nature of the study.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.
Posted March 03, 2026.

medRxiv 2026.03.02.26347469; doi: https://doi.org/10.64898/2026.03.02.26347469

Subject Area

  • Health Informatics