TY  - JOUR
T1  - medExtractR: A medication extraction algorithm for electronic health records using the R programming language
JF  - medRxiv
DO  - 10.1101/19007286
SP  - 19007286
AU  - Hannah L. Weeks
AU  - Cole Beck
AU  - Elizabeth McNeer
AU  - Cosmin A. Bejan
AU  - Joshua C. Denny
AU  - Leena Choi
Y1  - 2019/01/01
UR  - http://medrxiv.org/content/early/2019/09/23/19007286.abstract
N2  - Objective: We developed medExtractR, a natural language processing system to extract medication dose and timing information from clinical notes. Our system facilitates creation of medication-specific research datasets from electronic health records. Materials and Methods: Written using the R programming language, medExtractR combines lexicon dictionaries and regular expression patterns to identify relevant medication information (‘drug entities’). The system is designed to extract particular medications of interest, rather than all possible medications mentioned in a clinical note. MedExtractR was developed on notes from Vanderbilt University’s Synthetic Derivative, using two medications (tacrolimus and lamotrigine) prescribed with varying complexity, and with a third drug (allopurinol) used for testing generalizability of results. We evaluated medExtractR and compared it to three existing systems: MedEx, MedXN, and CLAMP. Results: On 50 test notes for each development drug and 110 test notes for the additional drug, medExtractR achieved high overall performance (F-measures > 0.95). This exceeded the performance of the three existing systems across all drugs, with the exception of a couple of specific entity-level evaluations, including dose amount for lamotrigine and allopurinol. Discussion: MedExtractR successfully extracted medication entities for medications of interest. High performance in entity-level extraction tasks provides a strong foundation for developing robust research datasets for pharmacological research.
However, its targeted approach provides a narrower scope compared with existing systems. Conclusion: MedExtractR (available as an R package) achieved high performance values in extracting specific medications from clinical text, leading to higher quality research datasets for drug-related studies than some existing general-purpose medication extraction tools. Competing Interest Statement: The authors have declared no competing interest. Funding Statement: This work was supported by the National Institutes of Health/National Institute of General Medical Sciences (R01-GM124109). Author Declarations: All relevant ethical guidelines have been followed and any necessary IRB and/or ethics committee approvals have been obtained: Yes. All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived: Not Applicable. Any clinical trials involved have been registered with an ICMJE-approved registry such as ClinicalTrials.gov and the trial ID is included in the manuscript: Not Applicable. I have followed all appropriate research reporting guidelines and uploaded the relevant Equator, ICMJE or other checklist(s) as supplementary files, if applicable: Not Applicable. Raw data may contain protected health information and are not available to be widely shared. The authors can be contacted for questions or aggregate versions of the data.
ER  -