A corpus for research in text processing for evidence based medicine

  • Original Paper
Language Resources and Evaluation

Abstract

Evidence based medicine (EBM) urges medical doctors to incorporate the latest available clinical evidence at the point of care. A major stumbling block in the practice of EBM is the difficulty of keeping up to date with clinical advances. In this paper we describe a corpus designed for the development and testing of text processing tools for EBM, in particular for tasks related to the extraction and summarisation of answers, and their corresponding evidence, for a clinical query. The corpus is based on material from the Clinical Inquiries section of The Journal of Family Practice. It was gathered and annotated through a combination of automated information extraction, crowdsourcing tasks, and manual annotation. It has been used for the summarisation task for which it was originally designed, as well as for related tasks such as the appraisal of clinical evidence and the clustering of results. The corpus is available at SourceForge (http://sourceforge.net/projects/ebmsumcorpus/).

Notes

  1. http://www.ncbi.nlm.nih.gov/pubmed/.

  2. http://www.thecochranelibrary.com.

  3. http://www.tripdatabase.com/.

  4. http://sourceforge.net/projects/ebmsumcorpus/.

  5. http://www.jfponline.com/.

  6. http://www.cochrane.org/.

  7. http://www.parkhurstexchange.com/searchQA. Accessed on 7th January, 2015.

  8. http://rgai.inf.u-szeged.hu/bioscope. Accessed on 7th January, 2015.

  9. http://bioasq.org/.

  10. http://www.nist.gov/tac/2014/BiomedSumm/index.html.

  11. As stated in the task presentation at TAC 2014.

  12. http://www.cochrane.org/cochrane-reviews. Accessed on 26th May, 2014.

  13. http://cochraneclinicalanswers.com/.

  14. The annotation tool was designed by a research programmer. Details are provided by Mollá and Santiago-Martínez (2011).

  15. https://www.mturk.com/mturk/. Accessed on 26th May, 2014.

  16. Note that \(3.06\times 2.17\times 1.11=8.1\), which is greater than 6.57. This indicates that in some cases the same reference was used in multiple detailed justifications.

  17. SORT only has the grades A, B, and C, but some authors used the grade D to specify very poor evidence.

  18. http://sourceforge.net/projects/ebmsumcorpus/.

References

  • Afantenos, S., Karkaletsis, V., & Stamatopoulos, P. (2005). Summarization from medical documents: A survey. Artificial Intelligence in Medicine, 33(2), 157–177. doi:10.1016/j.artmed.2004.07.017.

  • Athenikos, S. J., & Han, H. (2010). Biomedical question answering: A survey. Computer Methods and Programs in Biomedicine, 99(1), 1–24. doi:10.1016/j.cmpb.2009.10.003.

  • Demner-Fushman, D., & Lin, J. J. (2007). Answering clinical questions with knowledge-based and statistical techniques. Computational Linguistics, 33(1), 63–103.

  • Ebell, M. H., Siwek, J., Weiss, B. D., Woolf, S. H., Susman, J., Ewigman, B., et al. (2004). Strength of recommendation taxonomy (SORT): A patient-centered approach to grading evidence in the medical literature. The Journal of the American Board of Family Practice, 17(1), 59–67.

  • Ekbal, A., Saha, S., Mollá, D., & Ravikumar, K. (2013). Multiobjective optimization for clustering of medical publications. In Proceedings ALTA.

  • Elhadad, N., Kan, M. Y., Klavans, J. L., & McKeown, K. R. (2005). Customization in a unified framework for summarizing medical literature. Artificial Intelligence in Medicine, 33(2), 179–198. doi:10.1016/j.artmed.2004.07.018.

  • Fiszman, M., Rindflesch, T. C., & Kilicoglu, H. (2004). Abstraction summarization for managing the biomedical research literature. In Proceedings of HLT-NAACL workshop on computational lexical semantics (pp. 76–83).

  • Kim, S. N., Martinez, D., Cavedon, L., & Yencken, L. (2011). Automatic classification of sentences to support evidence based medicine. BMC Bioinformatics, 12(Suppl 2), S5. doi:10.1186/1471-2105-12-S2-S5.

  • Mollá, D. (2010). A corpus for evidence based medicine summarisation. In Proceedings of the Australasian language technology workshop (Vol. 8, pp. 76–80).

  • Mollá, D., & Santiago-Martínez, M. E. (2011). Development of a corpus for evidence based medicine summarisation. In Proceedings of the Australasian language technology workshop.

  • Mollá, D., & Santiago-Martínez, M. E. (2012). Creation of a corpus for evidence based medicine summarisation. Australasian Medical Journal, 5(9), 503–506.

  • Mollá, D., & Sarker, A. (2011). Automatic grading of evidence: The 2011 ALTA Shared Task. In Proceedings of the 2011 Australasian language technology workshop (ALTA 2011). doi:10.13140/2.1.1706.1444.

  • Mounsey, A. L., & Henry, S. L. (2009). Clinical inquiries. Which treatments work best for hemorrhoids? The Journal of Family Practice, 58(9), 492–493.

  • Niu, Y., Hirst, G., McArthur, G., & Rodriguez-Gianolli, P. (2003). Answering clinical questions with role identification. In Proceedings of the ACL, workshop on natural language processing in biomedicine. http://citeseer.ist.psu.edu/581532.html.

  • Sackett, D. L., Rosenberg, W. M., Gray, J., Haynes, R. B., & Richardson, W. S. (1996). Evidence based medicine: What it is and what it isn’t. BMJ, 312(7023), 71–72.

  • Sarker, A., & Mollá, D. (2012). Towards two-step multi-document summarisation for evidence based medicine: A quantitative analysis. In Proceedings of the 2012 Australasian language technology workshop (ALTA 2012).

  • Sarker, A., Mollá, D., & Paris, C. (2013). An approach for query-focused text summarisation for evidence based medicine. In Proceedings of the 14th Conference on artificial intelligence in medicine (AIME 2013) (pp. 295–304), Springer, Murcia, Spain. http://link.springer.com/chapter/10.1007%2F978-3-642-38326-7_41.

  • Shash, S. F., & Mollá, D. (2013). Clustering of medical publications for evidence based medicine summarisation. In Artificial intelligence in medicine (pp. 305–309). doi:10.1007/978-3-642-38326-7_42.

  • Yu, H., & Cao, Y. G. (2008). Automatically extracting information needs from ad hoc clinical questions. In AMIA annual symposium proceedings, American medical informatics association (Vol. 2008, p. 96).

Acknowledgments

Parts of this research were funded by the Oak Ridge Institute for Science and Education (ORISE), Macquarie University, and the Commonwealth Scientific and Industrial Research Organisation (CSIRO).

Author information

Corresponding author

Correspondence to Diego Mollá.

About this article

Cite this article

Mollá, D., Santiago-Martínez, M.E., Sarker, A. et al. A corpus for research in text processing for evidence based medicine. Lang Resources & Evaluation 50, 705–727 (2016). https://doi.org/10.1007/s10579-015-9327-2

  • DOI: https://doi.org/10.1007/s10579-015-9327-2