
Journal of Biomedical Informatics

Volume 52, December 2014, Pages 319-328

Comparing different knowledge sources for the automatic summarization of biomedical literature

https://doi.org/10.1016/j.jbi.2014.07.014
Under an Elsevier user license
open archive

Highlights

  • Automatic summarization may help biomedical researchers manage information overload.

  • Biomedical summarization systems make use of domain knowledge from external sources.

  • We show that the selection and representation of this knowledge have a significant impact on performance.

Abstract

Objective:

Automatic summarization of biomedical literature usually relies on domain knowledge from external sources to build rich semantic representations of the documents to be summarized. In this paper, we investigate the impact of the knowledge source used on the quality of the summaries that are generated.

Materials and methods:

We present a method for representing a set of documents relevant to a given biological entity or topic as a semantic graph of domain concepts and relations. Different graphs are created by retrieving domain concepts with different combinations of ontologies and vocabularies within the UMLS (including GO, SNOMED-CT, HUGO and all available vocabularies in the UMLS). The concepts in each graph are linked using different types of relationships (co-occurrence and semantic relations from the UMLS Metathesaurus and Semantic Network). The resulting graphs are then used as input to a summarization system that produces summaries composed of the most relevant sentences from the original documents.
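The general idea of graph-based extractive summarization can be illustrated with a minimal Python sketch. It is not the system evaluated in the paper: it assumes each sentence has already been annotated with concept labels (the labels below are made up for illustration and are not real UMLS identifiers), links concepts by sentence-level co-occurrence only, and ranks sentences by a simple weighted-degree score chosen here for brevity. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(sentence_concepts):
    """Return {concept: {neighbor: weight}}; weight counts shared sentences."""
    graph = defaultdict(lambda: defaultdict(int))
    for concepts in sentence_concepts:
        # Link every pair of distinct concepts found in the same sentence.
        for a, b in combinations(sorted(set(concepts)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def concept_scores(graph):
    """Score each concept by its weighted degree (an assumed salience measure)."""
    return {concept: sum(nbrs.values()) for concept, nbrs in graph.items()}


def summarize(sentences, sentence_concepts, k=2):
    """Pick the k sentences whose concepts are most salient, in document order."""
    scores = concept_scores(build_cooccurrence_graph(sentence_concepts))
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: sum(scores.get(c, 0) for c in set(sentence_concepts[i])),
        reverse=True,
    )
    return [sentences[i] for i in sorted(ranked[:k])]


# Toy input: sentences with hand-assigned, illustrative concept labels.
sentences = [
    "BRCA1 is involved in DNA repair.",
    "BRCA1 mutations raise breast cancer risk.",
    "The study was funded by a grant.",
]
concepts = [["BRCA1", "DNA repair"], ["BRCA1", "breast cancer"], []]

for sentence in summarize(sentences, concepts, k=2):
    print(sentence)
```

In this sketch, swapping the knowledge source corresponds to changing which concept labels are attached to each sentence, which in turn changes the graph and the sentences selected.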

Results and conclusions:

Our experiments demonstrate that the choice of the knowledge source used to model the text has a significant impact on the quality of the automatic summaries. In particular, we find that, when summarizing gene-related literature, using GO, SNOMED-CT and HUGO to extract domain concepts results in significantly better summaries than using all available vocabularies in the UMLS. This finding suggests that successful biomedical summarization requires selecting an appropriate knowledge source, whose coverage, specificity and relations must match the type of documents to be summarized.

Keywords

Biomedical knowledge sources
Unified Medical Language System
Semantic graph
Automatic summarization
