ABSTRACT
The COVID-19 pandemic has taught us many lessons, including the need to manage the exponential growth of knowledge, the fast-paced development or modification of AI models, limited opportunities to conduct extensive validation studies, the need to understand and mitigate bias, and implementation challenges related to AI in healthcare. While the dynamic nature of the pandemic, resource limitations, and evolving pathogens were key to some of the failures of AI to help manage the disease, the infodemic during the pandemic is a challenge that could have been managed better. We share our research on using deep learning methods to quantitatively and qualitatively evaluate AI-based COVID-19 publications, which provides a unique approach to identifying “mature” publications using a validated model and shows how this can be leveraged further through focused human-in-the-loop analysis. The study used English-language research articles involving human subjects, extracted from PubMed and published from 2020 to 2022. The findings highlight notable patterns in publication maturity over the years, with consistent and significant contributions from China and the United States. The analysis also emphasizes the prevalence of image datasets and variations in the AI model types employed. To manage an infodemic during a pandemic, we provide a specific knowledge surveillance method to identify key scientific publications in near real-time. We hope this will enable the data-driven, evidence-based, and time-sensitive decisions that clinicians, data scientists, researchers, policymakers, and public health officials need to make, while keeping humans in the loop.
1 INTRODUCTION
The COVID-19 pandemic has highlighted the urgency of strengthening literature-guided disease surveillance to help policymakers formulate evidence-driven policies [14][7][2][17][9]. The scientific community has rapidly produced many research papers and leveraged artificial intelligence (AI) technologies to understand the virus, its spread, and potential interventions. However, the abundance of AI-based COVID-19 publications necessitates a robust evaluation process to ensure the reliability and usefulness of these works in informing surveillance strategies [4][1][5][12][11][15].
This paper presents an innovative approach to enhance disease surveillance capabilities by conducting a qualitative and quantitative evaluation of AI-based COVID-19 publications using deep learning techniques. By harnessing the power of deep learning algorithms, we seek to evaluate the quality, relevance, and impact of research papers in the context of disease surveillance [16][20][13][10].
The outcomes of this study will not only enhance disease surveillance capabilities during the COVID-19 pandemic but also establish a foundation for future surveillance efforts in the face of emerging infectious diseases. By leveraging AI-based deep learning techniques, we can streamline the evaluation process and empower policymakers and public health officials to make informed decisions based on trustworthy and impactful scientific evidence.
Moreover, implementing such a maturity model can serve as a catalyst for improving paper quality and industry standards. Research that follows the quality standards established by the maturity model should improve in rigor and merit. Building on the global discourse that resulted from the COVID-19 pandemic, this maturity system can also open up more coordination and collaboration, creating a shared understanding of COVID-19 research and its impact. By both evaluating current research papers and providing a structure for upcoming COVID-19 research, the maturity model offers a step forward in the partnership and reliability of research, which is crucial to shaping research-based surveillance strategies and interventions.
2 METHODS
This paper presents a study that combines a data-driven approach with human evaluation to enhance the robustness, explainability, and real-time relevance of our findings in understanding the maturity patterns of research papers (Fig. 1). The level of maturity of a research paper is determined by the answer to the question, ‘Does the output of the proposed model have a significant impact on real-world applications?’, which is derived from the original article [23] and defined here in the context of COVID-19.
2.1 Dataset
This study used in-house data specifically focused on “Artificial Intelligence in Healthcare” reviews from 2020 to 2022. The data collection process involved conducting a PubMed search using the keywords “machine learning” or “artificial intelligence” in conjunction with the years “2020”, “2021”, and “2022”. The search was limited to English-language publications involving human subjects, published up to December 31 of each year. The initial search yielded a preliminary list of 5885, 4164, and 9974 papers for 2020, 2021, and 2022, respectively. Each paper was then examined individually, and any paper that was returned erroneously by the PubMed search or deemed irrelevant to the study was excluded. The final cohort comprised 3232, 2182, and 7916 papers for the respective years, each of which was examined and classified into one or more medical disciplines. Among these papers, 322, 134, and 504 specifically focused on COVID-19 [19][3].
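The search described above can be approximated programmatically. The following is a minimal sketch that builds one NCBI E-utilities `esearch` request URL per year; the exact query string used in the study is not given, so the search term, the field tags (`[lang]`, `[mh]`), and the helper name `build_esearch_url` are assumptions for illustration only. No network request is made.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(year: int, retmax: int = 100) -> str:
    """Build a PubMed search URL for one publication year.

    The term below is an assumed reconstruction of the described search
    (keywords, English language, human subjects), not the authors' query.
    """
    term = (
        '("machine learning" OR "artificial intelligence") '
        "AND english[lang] AND humans[mh]"
    )
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",           # filter on publication date
        "mindate": f"{year}/01/01",
        "maxdate": f"{year}/12/31",   # search up to December 31 of each year
        "retmax": retmax,
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


# One URL per study year.
urls = {year: build_esearch_url(year) for year in (2020, 2021, 2022)}
for year, url in urls.items():
    print(year, url)
```

Running one query per year mirrors the per-year cut-off described above; the returned records would still need the manual examination and exclusion step before entering the final cohort.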
2.2 Maturity Model
We used an approach (Fig. 1) developed to classify a research paper’s maturity based on its abstract [23]. The title and abstract were used as predictors of the paper’s level of maturity. A total of 2,500 manually labeled abstracts from 1998 to 2020 were used to fine-tune the hyperparameters of the BERT PubMed classifier. BERT [8] is a transformer-based deep learning model for NLP tasks. It relies on attention mechanisms that capture the contextual relationships between words in a text. The maturity classifier was validated on a test set (n=784) and prospectively on abstracts from 2021 (n=2494). On the test set, the model had an accuracy of 99% and an F1 score of 93%, while in the prospective validation it had an accuracy of 99% and an F1 score of 91%.
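Accuracy and F1 can diverge when one class is rare, which may explain why the reported F1 score is lower than the accuracy. The sketch below computes both metrics from confusion-matrix counts. The counts are hypothetical, chosen only to sum to the test-set size of 784; they are not the study's actual confusion matrix.

```python
def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# HYPOTHETICAL counts for illustration (positive class = "mature").
tp, fp, fn, tn = 40, 3, 3, 738

print(f"accuracy = {accuracy(tp, fp, fn, tn):.3f}")
print(f"F1       = {f1_score(tp, fp, fn):.3f}")
```

Because true negatives dominate when mature abstracts are rare, accuracy stays high even when a noticeable share of the positive class is misclassified; F1 ignores true negatives and is therefore the more informative metric for the mature class.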
2.3 Analysis
To characterize the spectrum of maturity in published COVID-19 articles, we conducted the following analyses:
First, we calculated the overall year-wise percentage of mature articles.
Next, we determined the geographical distribution of mature COVID-19 articles based on the country where the senior author is based.
Finally, we manually annotated the data type and AI model type used in the mature articles.
3 RESULTS
3.1 Maturity patterns by year
In 2020, a total of 322 publications were recorded, of which 15 were categorized as mature. In 2021, the overall number of publications decreased to 134, with only 2 classified as mature. In 2022, the trend reversed: the total number of publications rose to 504, with 19 identified as mature (Fig. 2A). These results highlight the fluctuating publication output over the three-year period. Although overall publications increased substantially from 2021 to 2022, the proportion of mature publications remained low throughout the study period (4.7%, 1.5%, and 3.8% in 2020, 2021, and 2022, respectively).
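The year-wise proportion of mature publications follows directly from the counts reported above. A minimal sketch of that calculation is given below; the variable and function names are illustrative.

```python
# (total COVID-19 publications, mature publications) per year, as reported above.
counts = {2020: (322, 15), 2021: (134, 2), 2022: (504, 19)}


def mature_share(total: int, mature: int) -> float:
    """Percentage of publications classified as mature."""
    return 100.0 * mature / total


# Year-wise percentage, rounded to one decimal place.
shares = {year: round(mature_share(t, m), 1) for year, (t, m) in counts.items()}

# Pooled percentage across the three years.
total_all = sum(t for t, _ in counts.values())
mature_all = sum(m for _, m in counts.values())
overall = round(mature_share(total_all, mature_all), 2)

for year, pct in shares.items():
    print(f"{year}: {pct}% mature")
print(f"2020-2022 pooled: {overall}% mature ({mature_all} of {total_all})")
```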
3.2 Frequency distribution of mature articles by the geographic location of the senior authors
Next, we calculated the distribution of mature articles across countries over the three years, based on the geographic location of the senior authors.
In 2020, 15 mature articles were identified, with senior authors from various countries. The United States, the Netherlands, Malaysia, Korea, France, China, and Austria were all represented in the dataset. Notably, China had the highest count with seven mature articles (Fig. 2B).
In 2021, the number of mature articles decreased to two, with senior authors based in Italy and Switzerland (one article each) (Fig. 2B). In 2022, the overall count of mature articles increased again. The United States (referred to as the USA in the table) and China continued to dominate, with seven and three mature articles, respectively. Japan, the UK, Saudi Arabia, Jordan, Portugal, Greece, Spain, and India were also represented, each with one mature article (Fig. 2B).
3.3 Comparison of Various Datasets and AI Models Employed in Mature Articles
Next, we analyzed the distribution of data types and model types used in mature articles across the years. Regarding data types, image datasets were the most commonly used in mature articles, with notable variation across years. In 2020, 15 articles used image datasets and three relied on tabular datasets; none focused solely on textual data. In 2021, no mature article focused solely on image datasets; instead, one used a tabular dataset and another used textual data. In 2022, the use of image datasets increased again, with 15 articles incorporating this type of data and two articles based on tabular datasets (Fig. 2C).
Regarding model types, the analysis showed variation in their use as well. In 2020, deep learning (DL) models were predominantly employed, with 14 articles using them, while one article used machine learning (ML) models (Fig. 2D, E). No mature articles that year focused on general artificial intelligence (AI), probabilistic models, natural language processing (NLP), or statistical models. In 2021, the distribution shifted, with DL models employed in only one article and one article focusing on general AI; no mature articles specifically used ML models, probabilistic models, NLP, or statistical models. In 2022, both ML and DL models were employed, with 2 articles using ML models and 17 articles using DL models. No mature articles that year employed general AI, probabilistic models, NLP, or statistical models (Fig. 2D, E).
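The model-type counts above can be turned into per-year shares to make the comparison across years explicit. The sketch below uses only the counts reported in this section; the dictionary layout and helper name are illustrative.

```python
# Mature articles per model type and year, as reported above (Fig. 2D, E).
model_counts = {
    2020: {"DL": 14, "ML": 1},
    2021: {"DL": 1, "general AI": 1},
    2022: {"DL": 17, "ML": 2},
}


def share(counts: dict, key: str) -> float:
    """Fraction of a year's mature articles that use the given model type."""
    total = sum(counts.values())
    return counts.get(key, 0) / total if total else 0.0


# Percentage of mature articles using deep learning, per year.
dl_share = {
    year: round(100 * share(c, "DL"), 1) for year, c in model_counts.items()
}

for year, pct in dl_share.items():
    n = sum(model_counts[year].values())
    print(f"{year}: DL used in {pct}% of {n} mature articles")
```

With only two mature articles in 2021, that year's share is sensitive to a single article and should be read with caution.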
4 DISCUSSION
In our study, we conducted a qualitative and quantitative evaluation of AI-based COVID-19 publications using deep learning techniques. By harnessing the power of deep learning algorithms, we aimed to evaluate the quality, relevance, and impact of research papers in the context of disease surveillance [18][22][6]. Our objective was to propose a near real-time evaluation of publications that can help guide disease management and further research in the face of emerging infectious diseases with a low level of prior knowledge but rapidly developing new knowledge.
We used an in-house dataset compiled from a PubMed search covering the years 2020-2022, with specific keywords related to machine learning and artificial intelligence. The dataset was carefully curated, and papers were classified into medical disciplines to ensure relevance. We then employed a maturity model based on the BERT PubMed classifier, a deep learning model for natural language processing (NLP) tasks. The model was fine-tuned using manually labeled abstracts and validated on a test set and on prospective abstracts from 2021.
Our analysis yielded several important results. First, we observed fluctuations in publication output and in the proportion of mature publications over the three-year period. Although overall publications increased substantially from 2021 to 2022, the proportion of mature publications remained low. This finding emphasizes the need for rigorous evaluation and selection processes to ensure the reliability and impact of AI-based COVID-19 publications. We also analyzed the mature articles by geographical location. China and the USA produced the highest numbers of mature publications, leading in 2020 and 2022, respectively. Other countries contributed at more varied levels, and the range of countries represented reflects the global nature of AI-based research. Furthermore, our findings on the data types and model types used in mature articles provide insight into prevailing trends within the field. Image data were the most commonly used, with variation across years. Deep learning models predominated in 2020 (14 of 15 mature articles) and again in 2022 (17 of 19), while the two mature articles from 2021 were split between deep learning and general AI. These findings shed light on the research approaches, data types, and model types adopted in the field.
Beyond analyzing trends in the AI-based COVID-19 literature, we also consider the underlying reasons for these differences in publication patterns. First and foremost, data accessibility matters: AI research may benefit from better access to extensive and varied datasets in some countries. Processing and analyzing massive datasets also requires funding, computational infrastructure, and human resources. For instance, the United States has made substantial investments in AI research and infrastructure, including the National Artificial Intelligence Initiative (NAII) and significant funding for cutting-edge research. Furthermore, each nation has a different capacity for utilizing its resources. Certain nations may have developed research networks, partnerships, and financing options that support AI development. These variables influence the regional and temporal heterogeneity seen in our study.
We also recognize the influence of journal specialty areas, special issues, and the accessibility of specialized funding. These elements may encourage or prioritize particular research topics or methodologies, which may shape the publication landscape and contribute to the observed variation in AI-based COVID-19 publications. According to various studies, countries such as China, the United States, Japan, the United Kingdom, and Germany are leading the way in AI research and development, while countries like Canada, South Korea, and France are also making significant strides in AI technology. Developing nations like China and India are also rapidly developing their national AI programs [21]. These variables likewise influence the regional and temporal heterogeneity seen in AI development.
Additionally, validation- and implementation-focused research is hampered by the pandemic’s rapid evolution. The need to meet urgent demands may limit the time and resources available for thorough validation and implementation studies. As a result, the published literature may contain a greater percentage of exploratory or preliminary research findings.
However, our work has certain limitations. The exclusion criteria used for selecting papers could introduce bias and limit the generalizability of the findings. The use of deep learning techniques for evaluation is also subject to limitations, including the potential for model bias and the need for large amounts of labeled data. Finally, it is important to note that these findings are specific to this study and its definition of a mature study. They are also to be expected in an emergency situation like COVID-19. Our study is also limited to research papers gathered through PubMed and does not include every AI research paper published during the study period.
In conclusion, our study provides a robust evaluation framework using deep learning techniques for AI-based COVID-19 research publications. The results demonstrate how this framework can support disease surveillance by identifying trends in publication output, geographical distribution, and research approaches. By surfacing reliable and impactful scientific evidence, our approach can empower policymakers and public health officials to make informed decisions. We believe our study lays the foundation for future surveillance efforts in the face of emerging infectious diseases, facilitating the development of effective surveillance strategies and interventions.
Data Availability
All data produced in the present study are available upon reasonable request to the authors.
5 ACKNOWLEDGMENTS
We acknowledge Graham Dodge, President of the PathCheck Foundation, and Bethany LoMonaco, Director of Strategy and Development, for their constant support. We acknowledge the funding support from the PathCheck Foundation awarded to Dr. Aditya Nagori and Mr. Raghav Awasthi. We also acknowledge BrainX for providing the well-annotated datasets.
Footnotes
raghav.awasthi{at}pathcheck.org
aditya.nagori{at}pathcheck.org
shreya.mishra693{at}gmail.com
anya2mathur{at}gmail.com
Pmathurmd{at}gmail.com
bouchra.nasri{at}umontreal.ca