medRxiv

Exploring Zero-Shot Cross-Lingual Biomedical Concept Normalization via Large Language Models

Hossein Rouhizadeh, Anthony Yazdani, Boya Zhang, Douglas Teodoro
doi: https://doi.org/10.1101/2025.02.27.25323007
Hossein Rouhizadeh
Department of Radiology and Medical Informatics, Faculty of Medicine, University of Geneva, Geneva, Switzerland
For correspondence: hossein.rouhizadeh{at}unige.ch
Anthony Yazdani
Department of Radiology and Medical Informatics, Faculty of Medicine, University of Geneva, Geneva, Switzerland
Boya Zhang
Department of Radiology and Medical Informatics, Faculty of Medicine, University of Geneva, Geneva, Switzerland
Douglas Teodoro
Department of Radiology and Medical Informatics, Faculty of Medicine, University of Geneva, Geneva, Switzerland

Abstract

Over the past few years, discriminative and generative large language models (LLMs) have emerged as the predominant approaches in natural language processing. Despite these advances, the two model families have rarely been compared on cross-lingual biomedical concept normalization. In this paper, we compare several LLMs on this challenging task, framed as dense retrieval. We use the XL-BEL dataset, which covers 10 languages, to evaluate each model’s capacity to generalize across linguistic contexts without further adaptation. Our experiments show that e5, a discriminative model, achieved the best performance, whereas BioMistral was the top-performing generative LLM. The code for reproducing the experiments is available at: https://github.com/hrouhizadeh/zsh_cl_bcn.
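
To make the task concrete, the following is a minimal sketch of concept normalization via dense retrieval with top-1 accuracy scoring. It is not the authors' implementation, which is in the repository linked above. Everything in it is an assumption made for illustration: the `embed` function is a toy character-trigram encoder standing in for an LLM encoder such as e5 or BioMistral, the concept identifiers are hypothetical placeholders, and the three-concept vocabulary is invented rather than taken from XL-BEL.

```python
import math
import zlib


def embed(text, dim=512):
    """Toy stand-in for an LLM encoder (assumption, not a real model).

    Hashes character trigrams into a fixed-size vector and L2-normalizes it,
    so that the dot product of two embeddings equals their cosine similarity.
    """
    padded = f" {text.lower()} "
    vec = [0.0] * dim
    for i in range(len(padded) - 2):
        bucket = zlib.crc32(padded[i:i + 3].encode("utf-8")) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(u, v):
    """Cosine similarity of two already-normalized vectors."""
    return sum(a * b for a, b in zip(u, v))


def build_index(concepts):
    """Embed every concept name once, ahead of any query."""
    return [(cid, name, embed(name)) for cid, name in concepts]


def normalize(mention, index, top_k=1):
    """Return the top_k (concept_id, name) pairs nearest to the mention."""
    query = embed(mention)
    ranked = sorted(index, key=lambda row: cosine(query, row[2]), reverse=True)
    return [(cid, name) for cid, name, _ in ranked[:top_k]]


def accuracy_at_1(pairs, index):
    """Fraction of (mention, gold_id) pairs whose top-1 prediction is correct."""
    hits = sum(1 for mention, gold in pairs if normalize(mention, index)[0][0] == gold)
    return hits / len(pairs)


# Hypothetical English concept vocabulary with placeholder identifiers.
concepts = [
    ("C0000001", "myocardial infarction"),
    ("C0000002", "diabetes mellitus"),
    ("C0000003", "hypertension"),
]
index = build_index(concepts)

# A Spanish mention is matched against English concept names with no
# language-specific training: the zero-shot cross-lingual setting.
print(normalize("infarto de miocardio", index))
print(accuracy_at_1([("myocardial infarction", "C0000001")], index))
```

Swapping `embed` for a real encoder leaves the retrieval and scoring logic unchanged, which is what allows different models to be compared under the same protocol.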

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This study was funded by Innosuisse (project no. 55441.1 IP ICT).

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Data Availability

All data produced in the present work are contained in the manuscript.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted February 27, 2025.
Exploring Zero-Shot Cross-Lingual Biomedical Concept Normalization via Large Language Models
Hossein Rouhizadeh, Anthony Yazdani, Boya Zhang, Douglas Teodoro
medRxiv 2025.02.27.25323007; doi: https://doi.org/10.1101/2025.02.27.25323007

Subject Area

  • Health Informatics