
Toward Using Twitter for PrEP-Related Interventions: An Automated Natural Language Processing Pipeline for Identifying Gay or Bisexual Men in the United States

Ari Z. Klein, Steven Meanley, Karen O’Connor, José A. Bauermeister, Graciela Gonzalez-Hernandez
doi: https://doi.org/10.1101/2021.08.23.21261924
Ari Z. Klein
1Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  • For correspondence: ariklein@pennmedicine.upenn.edu
Steven Meanley
2Department of Family and Community Health, School of Nursing, University of Pennsylvania, Philadelphia, PA, USA
Karen O’Connor
1Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
José A. Bauermeister
2Department of Family and Community Health, School of Nursing, University of Pennsylvania, Philadelphia, PA, USA
Graciela Gonzalez-Hernandez
1Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA

Abstract

Background Pre-exposure prophylaxis (PrEP) is highly effective at preventing the acquisition of Human Immunodeficiency Virus (HIV). There is a substantial gap, however, between the number of people in the United States who have indications for PrEP and the number who are prescribed it. While Twitter content has been analyzed as a source of PrEP-related data (e.g., barriers), methods have not been developed to enable the use of Twitter as a platform for implementing PrEP-related interventions.

Objective Men who have sex with men (MSM) are the population most affected by HIV in the United States. Therefore, the objective of this study was to develop and assess an automated natural language processing (NLP) pipeline for identifying men in the United States who have reported on Twitter that they are gay, bisexual, or MSM.

Methods Between September 2020 and January 2021, we used the Twitter Streaming Application Programming Interface (API) to collect more than 3 million tweets containing keywords that men may include in posts reporting that they are gay, bisexual, or MSM. We applied handwritten, high-precision regular expressions, designed to filter out noise and identify actual self-reports, to the tweets and their user profile metadata. We identified 10,043 unique users geolocated in the United States, and drew upon a validated NLP tool to automatically identify their ages.
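
As a rough illustration of the regular-expression filtering step described above, the following Python sketch flags first-person self-reports in a tweet or profile description and discards one kind of noise (reported speech). The patterns and the names SELF_REPORT, REPORTED_SPEECH, is_self_report, and user_matches are hypothetical and written only for this example; the authors' actual expressions are not given in the abstract.

```python
import re

# Hypothetical pattern, for illustration only (NOT the authors' expressions):
# a first-person statement such as "I'm a gay man" or "I am a bisexual guy".
SELF_REPORT = re.compile(
    r"\b(?:i\s+am|i['’]?m)\s+(?:a\s+)?(?:gay|bi|bisexual)\s+(?:man|guy|male|dude)\b",
    re.IGNORECASE,
)

# Hypothetical noise filter: reported speech immediately before the match,
# e.g. "he thinks I'm a gay man".
REPORTED_SPEECH = re.compile(r"\b(?:thinks?|asked\s+if)\s*$", re.IGNORECASE)


def is_self_report(text: str) -> bool:
    """Return True if the text contains a first-person self-report."""
    match = SELF_REPORT.search(text)
    if match is None:
        return False
    # Reject the match when the words just before it signal reported speech.
    return REPORTED_SPEECH.search(text[: match.start()]) is None


def user_matches(tweet_text: str, profile_description: str) -> bool:
    """Check both the tweet and the user's profile description."""
    return is_self_report(tweet_text) or is_self_report(profile_description)
```

Anchoring the pattern on a first-person subject and rejecting reported speech favors precision over recall, which matches the stated design goal of high-precision expressions.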

Results Based on manually distinguishing true and false positive self-reports in the tweets or profiles of 1000 of the 10,043 users identified by our automated pipeline, our pipeline has a precision of 0.85. Among the 8756 users for whom a United States state-level geolocation was detected, 5096 (58.2%) are in the 10 states with the highest numbers of new HIV diagnoses. Among the 6240 users for whom a county-level geolocation was detected, 4252 (68.1%) are in counties or states considered priority jurisdictions by the Ending the HIV Epidemic (EHE) initiative. Furthermore, the majority of the users are in the same two age groups as the majority of MSM in the United States with new HIV diagnoses.
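
The percentages reported above can be reproduced from the stated counts with a few lines of Python. This is a minimal sketch: the helper names share and precision are introduced here for illustration, and the true/false positive counts passed to precision are hypothetical values chosen only to be consistent with a precision of 0.85 over 1000 users, since the abstract reports the ratio and not the underlying counts.

```python
def share(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


def precision(true_positives: int, false_positives: int) -> float:
    """Precision = TP / (TP + FP)."""
    return true_positives / (true_positives + false_positives)


# Counts taken from the Results paragraph.
state_share = share(5096, 8756)    # users in the 10 highest-diagnosis states
county_share = share(4252, 6240)   # users in EHE priority jurisdictions

# ASSUMPTION: illustrative counts only; the abstract does not report them.
example_precision = precision(true_positives=850, false_positives=150)

print(state_share, county_share, example_precision)
```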

Conclusions Our automated NLP pipeline can be used to identify MSM in the United States who may be at risk for acquiring HIV, laying the groundwork for using Twitter on a large scale to target PrEP-related interventions directly at this population.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This research was supported by a grant from the Penn Center for AIDS Research (CFAR), an NIH-funded program (P30 AI 045008).

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

Institutional Review Board (IRB) of the University of Pennsylvania

All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Yes

Data Availability

The annotated Twitter data used to validate our automated natural language processing pipeline will be made available by request.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted August 26, 2021.

Subject Area

  • Health Informatics