PT - JOURNAL ARTICLE
AU - Wu, Honghan
AU - Wang, Minhong
AU - Zeng, Qianyi
AU - Chen, Wenjun
AU - Pan, Jeff Z.
AU - Sudlow, Cathie
AU - Robertson, Dave
TI - Knowledge Driven Phenotyping
AID - 10.1101/19013748
DP - 2019 Jan 01
TA - medRxiv
PG - 19013748
4099 - http://medrxiv.org/content/early/2019/12/06/19013748.short
4100 - http://medrxiv.org/content/early/2019/12/06/19013748.full
AB - Extracting patient phenotypes from routinely collected health data (such as Electronic Health Records) requires translating clinically-sound phenotype definitions into queries/computations executable on the underlying data sources by clinical researchers. This requires significant knowledge and skills to deal with heterogeneous and often imperfect data. Translations are time-consuming, error-prone and, most importantly, hard to share and reproduce across different settings. This paper proposes a knowledge driven framework that (1) decouples the specification of phenotype semantics from underlying data sources; (2) can automatically populate and conduct phenotype computations on heterogeneous data spaces. We report preliminary results of deploying this framework on five Scottish health datasets.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: This project is funded by MRC/ISCF HDRUK (Grant no: MC_PC_18029) and MRC/HDRUK (Grant no: MR/S004149/1).
Author Declarations:
All relevant ethical guidelines have been followed; any necessary IRB and/or ethics committee approvals have been obtained and details of the IRB/oversight body are included in the manuscript. Yes.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes.
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes.
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes.
The research is conducted on national and regional health data in Scotland. Given the privacy/sensitivity of the data, we cannot share the actual health data. However, synthetic data can be generated using BadMedicine (https://github.com/HicServices/BadMedicine). It uses the actual data schema and represents the distributions learned from the real data.