Abstract
Background The rapid growth of inherently complex and heterogeneous data in HIV/AIDS research underscores the importance of Big Data Science. Recently, there have been increasing uptakes of Big Data techniques in basic, clinical, and public health fields of HIV/AIDS research. However, no studies have systematically elaborated on the evolving applications of Big Data in HIV/AIDS research. We sought to explore the emergence and evolution of Big Data Science in HIV/AIDS-related publications that were funded by the US federal agencies.
Methods We identified HIV/AIDS and Big Data related publications that were funded by seven federal agencies from 2000 to 2019 by integrating data from National Institutes of Health (NIH) ExPORTER, MEDLINE, and MeSH. Building on bibliometrics and Natural Language Processing (NLP) methods, we constructed co-occurrence networks using bibliographic metadata (e.g., countries, institutes, MeSH terms, and keywords) of the retrieved publications. We then detected clusters among the networks as well as the temporal dynamics of clusters, followed by expert evaluation and clinical implications.
Results We harnessed nearly 600 thousand publications related to HIV/AIDS, of which 19,528 publications relating to Big Data were included in bibliometric analysis. Results showed that (1) the number of Big Data publications has been increasing since 2000, (2) US institutes have been in close collaborations with China, Canada, and Germany, (3) some institutes (e.g., University of California system, MD Anderson Cancer Center, and Harvard Medical School) are among the most productive institutes and started using Big Data in HIV/AIDS research early, (4) Big Data research was not active in public health disciplines until 2015, (5) research topics such as genomics, HIV comorbidities, population-based studies, Electronic Health Records (EHR), social media, precision medicine, and methodologies such as machine learning, Deep Learning, radiomics, and data mining emerge quickly in recent years.
Conclusions We identified a rapid growth in the cross-disciplinary research of HIV/AIDS and Big Data over the past two decades. Our findings demonstrated patterns and trends of prevailing research topics and Big Data applications in HIV/AIDS research and suggested a number of fast-evolving areas of Big Data Science in HIV/AIDS research including secondary analysis of EHR, machine learning, Deep Learning, predictive analysis, and NLP.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
Research reported in this publication was supported by the National Institute of Allergy and Infectious Diseases of the National Institutes of Health under award number R01AI127203.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The study uses publicly available data thus no IRB review is required.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
Data referred to in the manuscript are available upon request.