@article {Friberg2021.05.08.21256421, author = {Julia E. Friberg and Abdul H. Qazi and Brenden Boyle and Carrie Franciscus and Mary Vaughan-Sarrazin and Dax Westerman and Olga V. Patterson and Sharidan K. Parr and Michael E. Matheny and Shipra Arya and Kim G. Smolderen and Brian C. Lund and Glenn T. Gobbel and Saket Girotra}, title = {Ankle and Toe Brachial Index Extraction from Clinical Reports For Peripheral Artery Disease Identification: Unlocking Clinical Data through Novel Methods}, elocation-id = {2021.05.08.21256421}, year = {2021}, doi = {10.1101/2021.05.08.21256421}, publisher = {Cold Spring Harbor Laboratory Press}, abstract = {Importance: Despite its high prevalence and poor outcomes, research on peripheral artery disease (PAD) remains limited due to the poor accuracy of billing codes for identifying PAD in health systems. Objective: Design a natural language processing (NLP) system that can extract ankle brachial index (ABI) and toe brachial index (TBI) values and evaluate the performance of extracted ABI/TBI values to identify patients with PAD in the Veterans Health Administration (VHA). Design, Setting, Participants: From a corpus of 392,244 ABI test reports at 94 VHA facilities during 2015-2017, we selected a random sample of 800 documents for NLP development. 
Using machine learning, we designed the NLP system to extract ABI and TBI values and laterality (right or left). Performance was optimized through sequential iterations of 10-fold cross validation and error analysis on 3 sets of 200 documents each, and tested on a final, independent set of 200 documents. Performance of NLP-extracted ABI and TBI values to identify PAD in a random sample of Veterans undergoing ABI testing was compared to structured chart review. Exposure: ABI <=0.9, or TBI <=0.7 in either right or left limb was used to define PAD at the patient level. Main Outcome: Precision (or positive predictive value), recall (or sensitivity), F1-measure (overall measure of accuracy, defined as harmonic mean of precision and recall). Results: The NLP system had an overall precision of 0.85, recall of 0.93 and F1-measure of 0.89 to correctly identify ABI/TBI values and laterality. The F1-measure was similar for both ABI and TBI (0.88 to 0.91). Recall was higher for ABI (0.95 to 0.97) while precision was higher for TBI (0.94 to 0.95). Among 261 patients with ABI testing (49\% with PAD), the NLP system achieved a positive predictive value of 92.3\%, sensitivity of 83.1\% and specificity of 93.1\% to identify PAD when compared to a structured chart review. Conclusion: We have successfully developed and validated an NLP system to extract ABI and TBI values which can be used to accurately identify PAD within the VHA. Our findings have broad implications for PAD research and quality improvement efforts in large health systems. Competing Interest Statement: The authors have declared no competing interest. Funding Statement: This study was funded by the Veterans Affairs Health Services Research \& Development Pilot Grant (I21HX002365; PI: Girotra). 
The funding organization had no role in: 1) the design and conduct of the study; 2) collection, management, analysis, and interpretation of the data; 3) preparation, review, or approval of the manuscript; or 4) decision to submit the manuscript for publication. The views expressed here are those of the authors and do not represent the Department of Veterans Affairs. Dr. Girotra and Dr. Gobbel had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. This work was approved by the Institutional Review Board and Research \& Development Committee at both the Iowa City and Tennessee Valley VA. Due to the nature of this research, participants of this study did not agree for their data to be shared publicly, so supporting data is not available.}, URL = {https://www.medrxiv.org/content/early/2021/05/10/2021.05.08.21256421}, eprint = {https://www.medrxiv.org/content/early/2021/05/10/2021.05.08.21256421.full.pdf}, journal = {medRxiv} }