medRxiv

Breaching the curation bottleneck with human-machine reading symbiosis

Cliff Wong, Rajesh Rao, Taofei Yin, Cara Statz, Susan Mockus, Sara Patterson, Hoifung Poon
doi: https://doi.org/10.1101/2021.07.14.21260440
Cliff Wong, Microsoft Research, Redmond, WA, USA
Rajesh Rao, Microsoft Research, Redmond, WA, USA
Taofei Yin, The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
Cara Statz, The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
Susan Mockus, Precision Biomarker Laboratories, Cedars-Sinai, Los Angeles, CA, USA
Sara Patterson, The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
Hoifung Poon, Microsoft Research, Redmond, WA, USA
For correspondence: sara.patterson{at}jax.org, hoifung{at}microsoft.com

Abstract

Purpose The explosion of molecular biomarker and treatment information in the precision medicine era has drastically exacerbated the difficulty of identifying patient-relevant knowledge for clinical researchers and practitioners. Curated knowledgebases, such as the JAX Clinical Knowledgebase (CKB), organize knowledge and display it in a readily accessible format; however, curators face the same challenge of comprehensively identifying clinically relevant information for curation. Natural language processing (NLP) has emerged as a promising direction for accelerating manual curation, but prior applications were often conceived as stand-alone efforts to automate curation, and their scope was often limited to simple entity and relation extraction. In this paper, we study the alternative paradigm of assisted curation and identify key desiderata for scaling up knowledge curation through human-computer symbiosis.

Methods We chose precision oncology as a case study and introduced self-supervised machine reading, which can automatically generate noisy training examples from unlabeled text. We developed a curation user interface (UI) for precision oncology and, through iterative "curathons" (curation hackathons), conducted retrospective and prospective user studies for a head-to-head comparison of manual and machine-assisted curation.
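The Methods paragraph refers to self-supervised machine reading that generates noisy training examples from unlabeled text. One common way to do this is distant supervision: a drug-gene pair already recorded in a knowledgebase is matched against sentences, and every sentence where the pair co-occurs is treated as a (noisy) positive example. The sketch below is a generic illustration under stated assumptions, not the authors' system: the seed pairs, the vocabularies `DRUGS` and `GENES`, and the functions `tokenize` and `distant_label` are hypothetical, and a real pipeline would use entity recognition and normalization instead of exact token matching.

```python
from itertools import product

# Hypothetical seed knowledge: (drug, gene) pairs assumed to be already curated.
KNOWN_PAIRS = {("erlotinib", "EGFR"), ("vemurafenib", "BRAF")}

# Hypothetical entity vocabularies. Drugs are matched case-insensitively,
# genes case-sensitively; a real system would use NER and normalization.
DRUGS = {"erlotinib", "vemurafenib", "imatinib"}
GENES = {"EGFR", "BRAF", "KIT"}


def tokenize(sentence):
    """Split on whitespace and strip surrounding punctuation (deliberately simple)."""
    return [tok.strip(".,;:()") for tok in sentence.split()]


def distant_label(sentences, known_pairs=KNOWN_PAIRS):
    """Turn unlabeled sentences into noisy (sentence, drug, gene, label) examples.

    Each drug-gene pair that co-occurs in a sentence becomes a candidate,
    labeled 1 if the pair is in the seed knowledge and 0 otherwise.
    The labels are noisy: co-occurrence does not guarantee that the
    sentence actually states the relation.
    """
    examples = []
    for sentence in sentences:
        tokens = tokenize(sentence)
        drugs = sorted({tok.lower() for tok in tokens} & DRUGS)
        genes = sorted(set(tokens) & GENES)
        for drug, gene in product(drugs, genes):
            label = 1 if (drug, gene) in known_pairs else 0
            examples.append((sentence, drug, gene, label))
    return examples


if __name__ == "__main__":
    corpus = [
        "Tumors with EGFR mutations responded to erlotinib.",
        "Imatinib showed no activity in BRAF mutant cells.",
        "No entities are mentioned here.",
    ]
    for sentence, drug, gene, label in distant_label(corpus):
        print(label, drug, gene, "|", sentence)
```

A sentence that mentions a known pair without asserting the relation would still be labeled 1; this is the kind of label noise a self-supervised reader has to tolerate, in exchange for needing no hand-labeled training data.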

Results Contrary to the prevailing assumption, we showed that high recall is more important than high precision for end-to-end assisted curation. In extensive user studies, we showed that assisted curation can double the curation speed and increase the number of findings by an order of magnitude for drugs that had previously been sparsely curated.

Conclusion We demonstrated that an iterative and thoughtful collaboration between professional curators and NLP researchers can facilitate rapid advances in assisted curation for precision medicine. Human-machine reading symbiosis may be applicable to other clinical care and research scenarios in which curation is a major bottleneck.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

No external funding was received.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

This study does not require IRB approval.

All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Yes

Footnotes

  • Disclaimer: None

Data Availability

Full-text publication data used for machine reading were obtained from the PubMed Central (PMC) Open Access dataset (https://www.ncbi.nlm.nih.gov/pmc/). The full dataset for the results can be made available upon request.


Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.
Posted July 16, 2021.
Subject Area

  • Health Informatics