<?xml version='1.0' encoding='UTF-8'?><xml><records><record><source-app name="HighWire" version="7.x">Drupal-HighWire</source-app><ref-type name="Journal Article">17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Berman, Adam N.</style></author><author><style face="normal" font="default" size="100%">Ginder, Curtis</style></author><author><style face="normal" font="default" size="100%">Sporn, Zachary A.</style></author><author><style face="normal" font="default" size="100%">Tanguturi, Varsha</style></author><author><style face="normal" font="default" size="100%">Hidrue, Michael K.</style></author><author><style face="normal" font="default" size="100%">Borden, Linnea R.</style></author><author><style face="normal" font="default" size="100%">Zhao, Yunong</style></author><author><style face="normal" font="default" size="100%">Blankstein, Ron</style></author><author><style face="normal" font="default" size="100%">Turchin, Alexander</style></author><author><style face="normal" font="default" size="100%">Wasfy, Jason H.</style></author></authors><secondary-authors></secondary-authors></contributors><titles><title><style face="normal" font="default" size="100%">Natural Language Processing for the Ascertainment and Phenotyping of Left Ventricular Hypertrophy and Hypertrophic Cardiomyopathy on Echocardiogram Reports</style></title><secondary-title><style face="normal" font="default" size="100%">medRxiv</style></secondary-title></titles><dates><year><style  face="normal" font="default" size="100%">2023</style></year><pub-dates><date><style  face="normal" font="default" size="100%">2023-01-01 00:00:00</style></date></pub-dates></dates><elocation-id><style  face="normal" font="default" size="100%">2023.05.17.23290116</style></elocation-id><doi><style  face="normal" font="default" size="100%">10.1101/2023.05.17.23290116</style></doi><volume><style face="normal" font="default" size="100%"></style></volume><issue><style face="normal" 
font="default" size="100%"></style></issue><abstract><style  face="normal" font="default" size="100%">Objective: Extracting and accurately phenotyping electronic health documentation is critical for medical research and clinical care. While a variety of techniques can accomplish this task, natural language processing (NLP) has been developed across numerous domains to transform clinical documentation into data available for computational work. Accordingly, we sought to develop a highly accurate, open-source NLP module to ascertain and phenotype left ventricular hypertrophy (LVH) and hypertrophic cardiomyopathy (HCM) diagnoses on echocardiogram reports from a diverse hospital network. Methods: A total of 700 echocardiogram reports from six hospitals were randomly selected from data repositories within the Mass General Brigham healthcare system and manually adjudicated by physicians for 10 subtypes of LVH and for diagnoses of HCM. Using an open-source NLP system, the module was developed on a training set of 300 reports and validated on the remaining 400 reports. Sensitivity, specificity, positive predictive value, and negative predictive value were calculated to assess the discriminative accuracy of the NLP module. Results: The NLP module demonstrated robust performance across the 10 LVH subtypes, with overall sensitivity and specificity exceeding 96%. Additionally, the NLP module demonstrated excellent performance in detecting HCM diagnoses, with sensitivity and specificity exceeding 93%. Conclusion: We designed a highly accurate NLP module to determine the presence of LVH and HCM on echocardiogram reports. Our work demonstrates the feasibility of using NLP to detect diagnoses on imaging reports, even when they are described in free text. 
These modules have been placed in the public domain to advance research, trial recruitment, and population health management for individuals with LVH-associated conditions. Competing Interest Statement: The authors have declared no competing interest. Funding Statement: This study did not receive any external funding. Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: The IRB of Mass General Brigham gave approval for this work. I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable. Yes. All data produced in the present study are available upon reasonable request to the authors. https://canary.bwh.harvard.edu/library/</style></abstract></record></records></xml>