PT - JOURNAL ARTICLE
AU - Lebovitch, Dannielle S.
AU - Johnson, Jessica S.
AU - Dueñas, Hillary R.
AU - Huckins, Laura M.
TI - Phenotype Risk Scores: moving beyond ‘cases’ and ‘controls’ to classify psychiatric disease in hospital-based biobanks
AID - 10.1101/2021.01.25.21249615
DP - 2021 Jan 01
TA - medRxiv
PG - 2021.01.25.21249615
4099 - http://medrxiv.org/content/early/2021/01/26/2021.01.25.21249615.short
4100 - http://medrxiv.org/content/early/2021/01/26/2021.01.25.21249615.full
AB - Current phenotype classifiers for large biobanks with coupled electronic health records (EHR) and multi-omic data rely on ICD-10 codes for definition. However, ICD-10 codes are primarily designed for billing purposes, and may be insufficient for research. Nuanced phenotypes composed of a patient’s experience in the EHR will allow us to create precision psychiatry to predict disease risk, severity, and trajectories in EHR and clinical populations. Here, we create a phenotype risk score (PheRS) for major depressive disorder (MDD) using 2,086 cases and 31,000 individuals from Mount Sinai’s biobank BioMe™. Rather than classifying individuals as ‘cases’ and ‘controls’, PheRS provide a whole-phenome estimate of each individual’s likelihood of having a given complex trait. These quantitative scores substantially increase power in EHR analyses and may identify individuals with likely ‘missing’ diagnoses (for example, those with large numbers of comorbid diagnoses and risk factors, but who lack explicit MDD diagnoses). Our approach applied ten-fold cross-validation and elastic net regression to select comorbid ICD-10 codes for inclusion in our PheRS. We identified 158 ICD-10 codes significantly associated with Moderate MDD (F33.1). Phenotype risk scores were significantly higher among individuals with ICD-10 MDD diagnoses compared to the rest of the population (Kolmogorov-Smirnov p<2.2e-16), and were significantly correlated with MDD polygenic risk scores (R²>0.182).
Accurate classifiers are imperative for identification of genetic associations with psychiatric disease; therefore, moving forward, research should focus on algorithms that can better encompass a patient’s phenome.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: This work was supported by a Faculty Scholar Award from the Seaver Foundation: "Analytical Genomics of Vulnerable Populations".
Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes.
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: Founded in September 2007, BioMe is a biobank that links genetic and electronic medical record (EMR) data for over 30,000 individuals recruited primarily in ambulatory care settings in the Mount Sinai Health System (MSHS) in New York City. The current study was approved by the Icahn School of Medicine at Mount Sinai's Institutional Review Board (Institutional Review Board 07-0529). All study participants provided written informed consent.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes.
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes.
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes.
Summary data is made available in the supplementary tables of this article.
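
The scoring idea in the abstract (a quantitative score built from comorbid ICD-10 codes, then compared between diagnosed individuals and everyone else) can be illustrated with a minimal sketch. This is not the authors' implementation: the code weights, the comorbid codes chosen, and the four patients are hypothetical, and the elastic net fit that would normally produce the weights is not reproduced. The sketch only shows a weighted-sum score and a two-sample Kolmogorov-Smirnov statistic.

```python
from bisect import bisect_right

# Hypothetical per-code weights (assumption). In the abstract, code selection
# and weighting come from elastic net regression, which is not shown here.
WEIGHTS = {"F41.1": 0.8, "G47.00": 0.5, "R53.83": 0.3}


def phers(patient_codes, weights=WEIGHTS):
    """Sum the weights of the weighted comorbid codes present in one record."""
    return sum(w for code, w in weights.items() if code in patient_codes)


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )


# Hypothetical toy records: patient id -> set of ICD-10 codes in the EHR.
patients = {
    "p1": {"F41.1", "G47.00", "F33.1"},
    "p2": {"R53.83"},
    "p3": {"F41.1", "R53.83", "F33.1"},
    "p4": set(),
}

# F33.1 is the target diagnosis, so it is deliberately absent from WEIGHTS.
scores = {pid: phers(codes) for pid, codes in patients.items()}
cases = [scores[p] for p, codes in patients.items() if "F33.1" in codes]
others = [scores[p] for p, codes in patients.items() if "F33.1" not in codes]
d = ks_statistic(cases, others)
```

Because each individual receives a continuous score rather than a case/control label, someone with many weighted comorbid codes but no F33.1 entry would still score highly, which is the ‘missing’ diagnosis scenario the abstract describes.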