PT - JOURNAL ARTICLE
AU - Matthew Zawistowski
AU - Lars G. Fritsche
AU - Anita Pandit
AU - Brett Vanderwerff
AU - Snehal Patil
AU - Ellen M. Schmidt
AU - Peter VandeHaar
AU - Chad M. Brummett
AU - Sachin Keterpal
AU - Xiang Zhou
AU - Michael Boehnke
AU - Gonçalo R. Abecasis
AU - Sebastian Zöllner
TI - The Michigan Genomics Initiative: a biobank linking genotypes and electronic clinical records in Michigan Medicine patients
AID - 10.1101/2021.12.15.21267864
DP - 2021 Jan 01
TA - medRxiv
PG - 2021.12.15.21267864
4099 - http://medrxiv.org/content/early/2021/12/16/2021.12.15.21267864.short
4100 - http://medrxiv.org/content/early/2021/12/16/2021.12.15.21267864.full
AB - The recent wave of biobank repositories linking individual-level genetic data with dense clinical health history has introduced a dramatic paradigm shift in phenotyping for human genetic studies. The mechanism by which biobanks recruit participants can vary dramatically according to factors such as geographic catchment and sampling strategy. These enrollment differences leave an imprint on the cohort, defining the demographics and the utility of the biobank for research purposes. Here we introduce the Michigan Genomics Initiative (MGI), a rolling enrollment, single health system biobank currently consisting of >85,000 participants recruited primarily through surgical encounters at Michigan Medicine. A strong ascertainment effect is introduced by focusing recruitment on individuals in Southeast Michigan undergoing surgery. MGI participants are, on average, less healthy than the general population, which produces a biobank enriched for case counts of many disease outcomes, making it well suited for a disease genetics cohort. A comparison to the much larger UK Biobank, which uses population representative sampling, reveals that MGI has higher prevalence for nearly all diagnosis-code-based phenotypes, and larger absolute numbers of cases for many phenotypes.
GWAS of these phenotypes replicate many known findings, validating the genetic and clinical data and their proper linkage. Our results illustrate that single health-system biobanks that recruit participants through opportunistic sampling, such as surgical encounters, produce distinct patient profiles that provide an ideal resource for exploring the genetics of complex diseases.

Competing Interest Statement
Gonçalo Abecasis is currently employed by Regeneron Pharmaceuticals. Ellen M Schmidt is currently employed by Serqet Therapeutics. Both contributed to this work while employed at the University of Michigan.

Funding Statement
This study was funded by the Precision Health Initiative of the University of Michigan.

Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: MGI study participant consent forms and protocols were reviewed and approved by the University of Michigan Medical School Institutional Review Board (IRB IDs HUM00071298, HUM00148297, HUM00099197, HUM00097962, and HUM00106315).
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov.
I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes

Individual-level genetic and clinical data are not available due to patient privacy. However, summary statistics from genome-wide association studies of 1,547 clinical traits are publicly available through an interactive web tool described in the Resources section.