TY - JOUR
T1 - Making Words Count with Computerised Identification of Hypertrophic Cardiomyopathy Patients
JF - medRxiv
DO - 10.1101/2021.04.13.21255353
SP - 2021.04.13.21255353
AU - Luke T Slater
AU - William Bradlow
AU - Trupti Desai
AU - Amir Aziz
AU - Felicity Evison
AU - Simon Ball
AU - Georgios V. Gkoutos
Y1 - 2021/01/01
UR - http://medrxiv.org/content/early/2021/04/15/2021.04.13.21255353.abstract
N2 - Background The traditional outpatient model in hypertrophic cardiomyopathy (HCM) is under pressure. Population health management based on an accurate patient record provides an efficient, cost-effective alternative. Methods To improve the accuracy of the HCM patient list in a single hospital, we developed a rule-based information extraction natural language processing (NLP) framework. The framework employed ontological expansion of vocabulary and exclusion-first annotation, and received training by an ‘expert in the loop’. The output stratified patients with atrial fibrillation (AF) and heart failure (HF), those without active cardiology care and likely screened individuals. Results The algorithm was validated against multiple data sources, including manual validation, for HCM, AF and HF and family history of the disease. Overall precision and recall were 0.854 and 0.865 respectively. The pipeline found 25,356 documents featuring HCM-related terms belonging to 11,083 patients. Excluding scanned documents resulted in 17,178 letters from 3,120 patients. Subsequent categorisation identified 1,753 real cases, of whom 357 had AF and 205 had HF. There were 696 likely screened individuals. Adjusting for 304 false-negative patients, the total HCM cohort was 2,045 patients. 214 were not under a cardiologist.
NLP uncovered 709 patients who were absent in the registry or hospital disease codes. Conclusion This novel NLP framework generated a hospital-wide record of patients with HCM and defined various cohorts, including the small set of HCM patients lacking current cardiology input. Existing data sources inadequately described this population, spotlighting NLP’s essential role for clinical teams planning to move to a population health management model of care. Competing Interest Statement The authors have declared no competing interest. Funding Statement GVG and LTS acknowledge support from the NIHR Birmingham ECMC, NIHR Birmingham SRMRC, Nanocommons H2020-EU (731032) and the NIHR Birmingham Biomedical Research Centre and the MRC HDR UK (HDRUK/CFC/01), an initiative funded by UK Research and Innovation, Department of Health and Social Care (England) and the devolved administrations, and leading medical research charities. The views expressed in this publication are those of the authors and not necessarily those of the NHS, the National Institute for Health Research, the Medical Research Council or the Department of Health. Author Declarations I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: The University of Birmingham Ethical Review granted approval (ERN_20-0338). No informed consent was required as this was a service improvement project, and the documents were not de-identified as we intend to follow up individuals lost to follow up. All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov.
I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes. Software used to run the analysis is available as open source software. Vocabulary is included as appendix. Data is not available, as it describes patients. https://github.com/reality/komenti
ER -