medRxiv

Automatic Breast Cancer Cohort Detection from Social Media for Studying Factors Affecting Patient Centered Outcomes

Mohammed Ali Al-Garadi, Yuan-Chi Yang, Sahithi Lakamana, Jie Lin, Sabrina Li, Angel Xie, Whitney Hogg-Bremer, Mylin Torres, Imon Banerjee, Abeed Sarker
doi: https://doi.org/10.1101/2020.05.17.20104778
Author affiliations

1 Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta GA 30322, USA
  • Authors: Mohammed Ali Al-Garadi, Yuan-Chi Yang, Sahithi Lakamana, Whitney Hogg-Bremer, Imon Banerjee, Abeed Sarker
  • For correspondence: m.a.al-garadi@emory.edu yuan-chi.yang@emory.edu slakama@emory.edu whitney.hogg@emory.edu imon.banerjee@emory.edu abeed.sarker@emory.edu abeed@dbmi.emory.edu
2 Department of Radiation Oncology, School of Medicine, Emory University, Atlanta GA 30322, USA
  • Authors: Mylin Torres
  • For correspondence: matorre@emory.edu
3 Department of Computer Science, College of Arts and Sciences
  • Authors: Jie Lin, Sabrina Li, Angel Xie
  • For correspondence: linyi.li@emory.edu jie.lin@emory.edu angel.xie@emory.edu
4 Department of Radiology, School of Medicine, Emory University, Atlanta GA 30322, USA
  • Authors: Imon Banerjee

Abstract

Breast cancer patients often discontinue their long-term treatments, such as hormone therapy, increasing the risk of cancer recurrence. These discontinuations may be caused by adverse patient-centered outcomes (PCOs) due to hormonal drug side effects or other factors. PCOs are not detectable through laboratory tests and are sparsely documented in electronic health records. Thus, there is a need to explore complementary sources of information for PCOs associated with breast cancer treatments. Social media is a promising resource, but extracting true PCOs from it first requires the accurate detection of breast cancer patients. We describe a natural language processing (NLP) architecture for automatically detecting breast cancer patients from Twitter based on their self-reports. The architecture employs breast cancer-related keywords to collect streaming data from Twitter, applies NLP patterns to pre-filter noisy posts, and then employs a machine learning classifier trained using manually annotated data (n=5019) for distinguishing firsthand self-reports of breast cancer from other tweets. A classifier based on bidirectional encoder representations from transformers (BERT) showed human-like performance and achieved an F1-score of 0.857 for the positive class (inter-annotator agreement: 0.845; Cohen’s kappa), considerably outperforming the next best classifier, a deep neural network (F1-score: 0.665). Qualitative analyses of posts from automatically detected users revealed discussions about side effects, non-adherence and mental health conditions, illustrating the feasibility of our social media-based approach for studying breast cancer-related PCOs from a large population.
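
The first two stages of the architecture described above (keyword collection and pattern pre-filtering), together with the positive-class F1-score used for evaluation, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the keyword list, the regular expression, the function names and the example posts are all hypothetical, since the abstract does not specify them, and the BERT classifier stage is omitted.

```python
import re

# Hypothetical keywords and pattern, chosen for illustration only;
# the abstract does not list the ones the authors actually used.
KEYWORDS = ("breast cancer", "tamoxifen", "mastectomy")
SELF_REPORT = re.compile(
    r"\b(my|i have|i had|i was diagnosed with|i am|i'm)\b[^.!?]*\bbreast cancer\b",
    re.IGNORECASE,
)


def matches_keywords(text):
    """Stage 1: keep posts that mention any collection keyword."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in KEYWORDS)


def passes_prefilter(text):
    """Stage 2: keep posts whose wording looks like a personal statement."""
    return SELF_REPORT.search(text) is not None


def f1_positive(gold, predicted):
    """F1-score for the positive class (label 1) from two label lists."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up example posts, not taken from the study data.
posts = [
    "I was diagnosed with breast cancer last year and started tamoxifen.",
    "Breast cancer awareness month starts today!",
    "My sister has a new puppy.",
]
candidates = [p for p in posts if matches_keywords(p) and passes_prefilter(p)]

# Toy labels to show the metric; 1 = firsthand self-report, 0 = other.
gold = [1, 1, 0, 0, 1]
predicted = [1, 0, 0, 1, 1]
score = f1_positive(gold, predicted)
```

A pattern this loose would also pass posts such as "my mom has breast cancer", which is the kind of case the supervised classifier is meant to separate from firsthand self-reports.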

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

Effort for this work was funded by Emory University School of Medicine.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Yes

Data Availability

Data will be made available after peer review.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted May 21, 2020.
Subject Area

  • Health Informatics