Abstract
We built a natural language processing (NLP) method to automatically extract clinical findings from radiology reports and characterize their level of change and significance according to a radiology-specific information model. The method combines machine learning and rule-based approaches. It is unique in capturing features at several levels of abstraction (surface, entity, and discourse) in text analysis, which enabled us to recognize the underlying semantics of radiology report narratives for this task. We evaluated the method on radiology reports from four major healthcare organizations. The evaluation showed its efficacy in highlighting important changes (accuracy 99.2%, precision 96.3%, recall 93.5%, and F1 score 94.7%) and identifying significant observations (accuracy 75.8%, precision 75.2%, recall 75.7%, and F1 score 75.3%) to characterize radiology reports. This method can help clinicians quickly understand the key observations in radiology reports and can facilitate clinical decision support, review prioritization, and disease surveillance.
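For readers less familiar with the four reported measures, the following is a minimal sketch of how accuracy, precision, recall, and F1 score can be computed from gold-standard and predicted labels. It is not the authors' evaluation code. The abstract does not state how scores were averaged, so macro averaging over classes is an assumption here, and the function names and the labels `change` and `no_change` are hypothetical.

```python
from collections import Counter


def safe_div(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def evaluate(gold, predicted):
    """Compute accuracy and macro-averaged precision, recall, and F1.

    Macro averaging is an assumption of this sketch: each class is scored
    on its own and the per-class scores are averaged with equal weight.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    # Per-class true positives, false positives, and false negatives.
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[g] += 1  # gold class g was missed

    labels = sorted(set(gold) | set(predicted))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        precision = safe_div(tp[label], tp[label] + fp[label])
        recall = safe_div(tp[label], tp[label] + fn[label])
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(safe_div(2 * precision * recall, precision + recall))

    n = len(labels)
    return {
        "accuracy": safe_div(sum(tp.values()), len(gold)),
        "precision": safe_div(sum(precisions), n),
        "recall": safe_div(sum(recalls), n),
        "f1": safe_div(sum(f1s), n),
    }


if __name__ == "__main__":
    # Hypothetical labels for four findings; not data from the study.
    gold = ["change", "change", "no_change", "no_change"]
    predicted = ["change", "no_change", "no_change", "no_change"]
    for name, value in evaluate(gold, predicted).items():
        print(f"{name}: {value:.3f}")
```

Under macro averaging, the averaged F1 score generally differs from the harmonic mean of the averaged precision and recall, because the harmonic mean is taken per class before averaging.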
Cite this article
Hassanpour, S., Bay, G. & Langlotz, C.P. Characterization of Change and Significance for Clinical Findings in Radiology Reports Through Natural Language Processing. J Digit Imaging 30, 314–322 (2017). https://doi.org/10.1007/s10278-016-9931-8
DOI: https://doi.org/10.1007/s10278-016-9931-8