
OnSIDES (ON-label SIDE effectS resource) Database: Extracting Adverse Drug Events from Drug Labels using Natural Language Processing Models

Yutaro Tanaka, Hsin Yi Chen, Pietro Belloni, Undina Gisladottir, Jenna Kefeli, Jason Patterson, Apoorva Srinivasan, Michael Zietz, Gaurav Sirdeshmukh, Jacob Berkowitz, Kathleen LaRow Brown, Nicholas P. Tatonetti
doi: https://doi.org/10.1101/2024.03.22.24304724
Yutaro Tanaka
1Department of Biomedical Informatics, Columbia University Irving Medical Center, Columbia University; New York, NY 10032, USA
2Department of Applied Physics and Applied Mathematics, Fu Foundation School of Engineering and Applied Sciences, Columbia University; New York, NY 10027, USA
Hsin Yi Chen
1Department of Biomedical Informatics, Columbia University Irving Medical Center, Columbia University; New York, NY 10032, USA
Pietro Belloni
5Department of Statistical Sciences, University of Padova; Padova, Italy
Undina Gisladottir
1Department of Biomedical Informatics, Columbia University Irving Medical Center, Columbia University; New York, NY 10032, USA
Jenna Kefeli
3Department of Systems Biology, Columbia University Irving Medical Center, Columbia University; New York, NY 10032, USA
Jason Patterson
1Department of Biomedical Informatics, Columbia University Irving Medical Center, Columbia University; New York, NY 10032, USA
Apoorva Srinivasan
6Department of Computational Biomedicine, Cedars-Sinai Medical Center; Los Angeles, CA 90069, USA
7Cedars-Sinai Cancer, Cedars-Sinai Medical Center; Los Angeles, CA 90069, USA
Michael Zietz
1Department of Biomedical Informatics, Columbia University Irving Medical Center, Columbia University; New York, NY 10032, USA
Gaurav Sirdeshmukh
6Department of Computational Biomedicine, Cedars-Sinai Medical Center; Los Angeles, CA 90069, USA
7Cedars-Sinai Cancer, Cedars-Sinai Medical Center; Los Angeles, CA 90069, USA
Jacob Berkowitz
6Department of Computational Biomedicine, Cedars-Sinai Medical Center; Los Angeles, CA 90069, USA
7Cedars-Sinai Cancer, Cedars-Sinai Medical Center; Los Angeles, CA 90069, USA
Kathleen LaRow Brown
1Department of Biomedical Informatics, Columbia University Irving Medical Center, Columbia University; New York, NY 10032, USA
Nicholas P. Tatonetti
1Department of Biomedical Informatics, Columbia University Irving Medical Center, Columbia University; New York, NY 10032, USA
4Herbert Irving Comprehensive Cancer Center, New York Presbyterian/Columbia University Irving Medical Center; New York, NY 10032, USA
6Department of Computational Biomedicine, Cedars-Sinai Medical Center; Los Angeles, CA 90069, USA
7Cedars-Sinai Cancer, Cedars-Sinai Medical Center; Los Angeles, CA 90069, USA
  • For correspondence: nicholas.tatonetti{at}cshs.org

Abstract

Adverse drug events (ADEs) are the fourth leading cause of death in the US and add billions of dollars annually to healthcare costs. However, few machine-readable databases of ADEs exist, limiting the opportunity to study drug safety on a broader, systematic scale. Recent advances in Natural Language Processing methods, such as BERT models, present an opportunity to accurately extract relevant information from unstructured biomedical text. As such, we fine-tuned a PubMedBERT model to extract ADE terms from descriptive text in FDA Structured Product Labels for prescription drugs. With this model, we achieve an F1 score of 0.90, AUROC of 0.92, and AUPR of 0.95 at extracting ADEs from the labels’ “Adverse Reactions” sections. We further use this method to extract serious ADEs from the labels’ “Boxed Warnings” sections, as well as ADEs specifically noted for pediatric patients. Here, we present OnSIDES (ON-label SIDE effectS resource), a compiled, computable database of drug-ADE pairs generated with this method. OnSIDES contains more than 3.6 million drug-ADE pairs for 3,233 unique drug ingredient combinations extracted from 47,211 labels. Additionally, we extend this method to extract ADEs from drug labels of other major nations and regions (Japan, the UK, and the EU) to build a complementary OnSIDES-INTL database. To demonstrate potential applications, we used OnSIDES to predict novel drug targets and indications, analyze enrichment of ADEs across drug classes, and predict novel ADEs from chemical compound structures. We conclude that OnSIDES can be used as a comprehensive resource to study and enhance drug safety.

One Sentence Summary OnSIDES is a large, comprehensive database of adverse drug events extracted from drug labels using natural language processing methods.
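
For readers unfamiliar with the F1 score reported in the abstract, the following is a minimal sketch of how precision, recall, and F1 are computed. It assumes, purely for illustration, that each candidate term mention receives a model score and is labeled as an ADE (1) or not (0) after thresholding; the toy labels, scores, and the 0.5 threshold are hypothetical and are not taken from the OnSIDES evaluation.

```python
from typing import Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for binary labels, where 1 means 'is an ADE'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy data: gold labels and model scores for eight candidate mentions.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.1, 0.95]
threshold = 0.5  # assumed decision threshold, not the one used by the authors
y_pred = [1 if s >= threshold else 0 for s in scores]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Unlike F1, AUROC and AUPR summarize performance across all possible thresholds, which is why the abstract reports them alongside the single-threshold F1 score.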

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This work was primarily supported by the National Institutes of Health (NIH), National Institute of General Medical Sciences (NIGMS) grant R35GM131905. Additionally, U.G., M.Z., and K.L.B. are supported by NIH National Library of Medicine (NLM) grant T15LM007079, and H.Y.C. is supported by NIH NIGMS grant T32GM145440.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Data and Code Availability

All of the data, code, and models trained and generated to construct the OnSIDES database and all other complementing databases are available and maintained at https://github.com/tatonetti-lab/onsides. Any requests for additional materials can be made via email to the corresponding author.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.
Posted March 24, 2024.

Subject Area

  • Pharmacology and Therapeutics