TY - JOUR
T1 - Combining structured and unstructured data in eMRs to create clinically-defined eMR-derived cohorts
JF - medRxiv
DO - 10.1101/2020.07.27.20163279
SP - 2020.07.27.20163279
AU - Charmaine S Tam
AU - Janice Gullick
AU - Aldo Saavedra
AU - Stephen T Vernon
AU - Gemma A Figtree
AU - Clara K Chow
AU - Michelle Cretikos
AU - Richard W Morris
AU - Maged William
AU - Jonathan Morris
AU - David Brieger
Y1 - 2020/01/01
UR - http://medrxiv.org/content/early/2020/07/29/2020.07.27.20163279.abstract
N2 - Background: There have been few studies describing how eMR systems can be systematically queried to identify clinically-defined populations, and limited studies utilising free-text in this process. The aim of this study is to provide a generalisable methodology for constructing clinically-defined eMR-derived patient cohorts using structured and unstructured data in eMRs. Methods: Patients with possible acute coronary syndrome (ACS) were used as an exemplar. Cardiologists defined clinical criteria for patients presenting with possible ACS. These were mapped to data tables within the eMR system, creating seven inclusion criteria comprising structured data fields (orders and investigations, procedures, scanned electrocardiogram (ECG) images, and diagnostic codes) and unstructured clinical documentation. Data were extracted from two local health districts (LHD) in Sydney, Australia. Outcome measures included examination of the relative contribution of individual inclusion criteria to the identification of eligible encounters, comparisons between inclusion criteria, and evaluation of consistency of data extracts across years and LHDs. Results: Among 802,742 encounters in a 5-year dataset (1/1/13 to 30/12/17), the presence of an ECG image (54.8% of encounters) and symptoms and keywords in clinical documentation (41.4-64.0%) were used most often to identify presentations of possible ACS. 
Orders and investigations (27.3%) and procedures (1.4%) were less often present for identified presentations. Relevant ICD-10/SNOMED codes were present for 3.7% of identified encounters. Similar trends were seen when the two LHDs were examined separately, and across years. Conclusions: Combining structured and unstructured data during cohort identification to create clinically-defined eMR-derived cohorts is a prerequisite for the critical validation work required for secondary use of eMR data. Competing Interest Statement: The authors have declared no competing interest. Funding Statement: Funding for the SPEED-EXTRACT study was provided by grants from the Agency for Clinical Innovation, NSW Ministry of Health and Sydney Health Partners. Author CST was supported by a National Health and Medical Research Centre Early Career Fellowship from Australia (#1037275). Author AS was supported by Sydney Health Partners (Harnessing the eMR to improve care of patients with acute chest pain within Sydney Health). Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: Ethics committee approval was provided by Northern Sydney Local Health District. All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. 
I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes. The data reported in this paper is patient data owned by the Health Organisation and is not publicly available.
ER -