A Comparative Analysis of System Features Used in the TREC-COVID Information Retrieval Challenge

Jimmy Chen, William R. Hersh
doi: https://doi.org/10.1101/2020.10.15.20213645
Jimmy Chen
School of Medicine, Oregon Health & Science University, Portland, OR, USA
  • For correspondence: chenjim@ohsu.edu
William R. Hersh
Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR, USA

Abstract

The COVID-19 pandemic has resulted in a rapidly growing quantity of scientific publications from journal articles, preprints, and other sources. The TREC-COVID Challenge was created to evaluate information retrieval (IR) methods and systems for this quickly expanding corpus. Using the COVID-19 Open Research Dataset (CORD-19), several dozen research teams participated across the 5 rounds of the TREC-COVID Challenge. While previous work has compared IR techniques used on other test collections, no studies have analyzed the methods used by participants in the TREC-COVID Challenge. We manually reviewed team run reports from Rounds 2 and 5, extracted features from the documented methodologies, and used univariate and multivariate regression analyses to identify features associated with higher retrieval performance. We observed that fine-tuning on relevance judgments, MS-MARCO, and CORD-19 document vectors was associated with improved performance in Round 2 but not in Round 5. Although the lower heterogeneity of runs in Round 5 may explain the lack of significance in that round, fine-tuning has been found to improve search performance in previous challenge evaluations by improving a system's ability to map relevant queries and phrases to documents. Furthermore, term expansion was associated with improved system performance, and use of the narrative field in the TREC-COVID topics was associated with decreased system performance in both rounds. These findings emphasize the need for clear queries in search. Although our study is limited in its generalizability and in the scope of techniques analyzed, we identified IR techniques that may be useful in building search systems for COVID-19 using the TREC-COVID test collections.
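
The univariate and multivariate regression analysis described in the abstract can be illustrated with a minimal sketch. The abstract does not state which regression model or performance metric was used, so the code below assumes ordinary least squares on binary feature indicators. The feature names, the six toy runs, and their scores are all hypothetical and noise-free; they are not data from the study.

```python
# Minimal sketch, NOT the authors' analysis: ordinary least squares is an
# assumption, and every run, feature name, and score here is synthetic.

def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(features, scores):
    """Least-squares fit with an intercept; returns [intercept, b1, ...]."""
    design = [[1.0] + [float(v) for v in row] for row in features]
    k = len(design[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, scores)) for i in range(k)]
    return solve_linear(xtx, xty)


# Hypothetical binary features extracted from run reports (1 = feature used).
FEATURE_NAMES = ["fine_tuned", "term_expansion", "used_narrative"]
RUNS = [
    [0, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
    [1, 1, 1],
]
# Synthetic scores built as 0.5 + 0.10*fine_tuned + 0.05*term_expansion
# - 0.08*used_narrative, so the multivariate fit has a known answer.
SCORES = [0.50, 0.60, 0.65, 0.47, 0.42, 0.57]

# Multivariate model: all features entered together.
multivariate = fit_ols(RUNS, SCORES)

# Univariate models: one feature at a time; keep only the slope.
univariate = {
    name: fit_ols([[row[i]] for row in RUNS], SCORES)[1]
    for i, name in enumerate(FEATURE_NAMES)
}

for i, name in enumerate(FEATURE_NAMES):
    print(f"{name}: univariate={univariate[name]:+.3f}, "
          f"multivariate={multivariate[i + 1]:+.3f}")
```

Because the toy runs are unbalanced, the univariate slope for a feature absorbs part of the effect of the features it co-occurs with, while the multivariate fit recovers the coefficients used to build the scores. A real analysis would also need standard errors and significance tests, which this sketch omits.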

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

We have no funding sources to disclose.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

No IRB approval needed for this study of publicly available data from TREC-COVID.

All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Yes

Footnotes

  • Contact Information: Jimmy Chen, BA, School of Medicine - MD Program, 3181 SW Sam Jackson Park Rd., Oregon Health & Science University, Portland, OR, 97239. Email: jimmyschen94{at}gmail.com

  • Financial Support: None.

  • Conflicts of Interest: Jimmy Chen and William Hersh have no conflicts of interest to disclose.

Data Availability

N/A

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted October 20, 2020.
A Comparative Analysis of System Features Used in the TREC-COVID Information Retrieval Challenge
Jimmy Chen, William R. Hersh
medRxiv 2020.10.15.20213645; doi: https://doi.org/10.1101/2020.10.15.20213645
Subject Area

  • Health Informatics