Use of large language models as a scalable approach to understanding public health discourse

Laura Espinosa, Marcel Salathé
doi: https://doi.org/10.1101/2024.02.06.24302383
Laura Espinosa
1Digital Epidemiology Lab, School of Life Sciences, School of Computer and Communication Sciences, EPFL, Switzerland
  • For correspondence: laura.espinosamontalban{at}epfl.ch
Marcel Salathé
1Digital Epidemiology Lab, School of Life Sciences, School of Computer and Communication Sciences, EPFL, Switzerland

Abstract

Online public health discourse is increasingly important in shaping public health dynamics. Large Language Models (LLMs) offer a scalable solution for analysing the vast amounts of unstructured text found on online platforms. Here, we explore the effectiveness of LLMs, including GPT models and open-source alternatives, for extracting public stances towards vaccination from social media posts. Using an expert-annotated dataset of social media posts related to vaccination, we applied various LLMs and a rule-based sentiment analysis tool to classify the stance towards vaccination. We assessed the accuracy of these methods by comparing their output with expert annotations and with annotations obtained through crowdsourcing. Our results demonstrate that few-shot prompting of best-in-class LLMs is the best-performing method, and that all alternatives carry a significant risk of substantial misclassification. The study highlights the potential of LLMs as a scalable tool for public health professionals to quickly gauge public opinion on health policies and interventions, offering an efficient alternative to traditional data analysis methods. With continued advances in LLM development, integrating these models into public health surveillance systems could substantially improve our ability to monitor and respond to changing public health attitudes.

Author summary

We examined how Large Language Models (LLMs), including GPT models and open-source versions, can analyse online discussions about vaccination on social media. Using a dataset of expert-annotated posts, we tested various LLMs and a sentiment analysis tool to identify public stance towards vaccination. Our findings suggest that using LLMs, and prompting them with labelled examples, is the most effective approach. The results show that LLMs are a valuable resource for public health experts to quickly understand the dynamics of public attitudes towards health policies and interventions, providing a faster and more efficient option than traditional methods. As LLMs continue to improve, incorporating these models into digital public health monitoring could greatly improve how we observe and react to shifts in public health discussions.
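As a rough illustration of what prompting an LLM with labelled examples and then checking its output against expert annotations might look like, the following Python sketch builds a few-shot prompt and scores predicted stances against expert labels. It is not the authors' code (that is in the repository listed under Data Availability). The label set, the prompt wording, and all function names are assumptions made for this example, and the call to an actual model is deliberately left out.

```python
from collections import Counter

# Hypothetical label set: the abstract does not list the stance classes used.
LABELS = ("positive", "neutral", "negative")


def build_few_shot_prompt(examples, post):
    """Return a prompt made of labelled example posts followed by the post to classify.

    `examples` is a list of (text, label) pairs; the wording is an assumption.
    """
    lines = [
        "Classify the stance towards vaccination of each post as one of: "
        + ", ".join(LABELS) + ".",
        "",
    ]
    for text, label in examples:
        lines.append(f"Post: {text}")
        lines.append(f"Stance: {label}")
        lines.append("")
    lines.append(f"Post: {post}")
    lines.append("Stance:")
    return "\n".join(lines)


def parse_stance(reply):
    """Map a free-text model reply to a known label, or None if none matches."""
    cleaned = reply.strip().lower()
    for label in LABELS:
        if cleaned.startswith(label):
            return label
    return None


def accuracy(predicted, expert):
    """Share of posts where the predicted stance equals the expert annotation."""
    if len(predicted) != len(expert):
        raise ValueError("predicted and expert must have the same length")
    if not expert:
        raise ValueError("at least one annotated post is required")
    matches = sum(p == e for p, e in zip(predicted, expert))
    return matches / len(expert)


def confusion_counts(predicted, expert):
    """Count (expert_label, predicted_label) pairs to show where errors fall."""
    return Counter(zip(expert, predicted))
```

In use, the string returned by `build_few_shot_prompt` would be sent to whichever model is being evaluated, the reply passed through `parse_stance`, and the resulting labels compared with the expert annotations using `accuracy` and `confusion_counts`. Unparseable replies are returned as `None` and therefore count as misclassifications, which is one possible design choice among several.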

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This work was supported by Fondation Botnar and the European Union's Horizon H2020 grant VEO (874735).

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

The study used only openly available data from X (formerly Twitter).

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Data Availability

The data (list of tweets identifiers and summarised anonymised datasets) and R and Python code used can be found in the online repository at https://github.com/digitalepidemiologylab/llm_crowd_experts_annotation.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted February 06, 2024.

Subject Area

  • Epidemiology