PT - JOURNAL ARTICLE
AU - Fritz Obermeyer
AU - Martin Jankowiak
AU - Nikolaos Barkas
AU - Stephen F. Schaffner
AU - Jesse D. Pyle
AU - Lonya Yurkovetskiy
AU - Matteo Bosso
AU - Daniel J. Park
AU - Mehrtash Babadi
AU - Bronwyn L. MacInnis
AU - Jeremy Luban
AU - Pardis C. Sabeti
AU - Jacob E. Lemieux
TI - Analysis of 6.4 million SARS-CoV-2 genomes identifies mutations associated with fitness
AID - 10.1101/2021.09.07.21263228
DP - 2022 Jan 01
TA - medRxiv
PG - 2021.09.07.21263228
4099 - http://medrxiv.org/content/early/2022/02/16/2021.09.07.21263228.short
4100 - http://medrxiv.org/content/early/2022/02/16/2021.09.07.21263228.full
AB - Repeated emergence of SARS-CoV-2 variants with increased fitness necessitates rapid detection and characterization of new lineages. To address this need, we developed PyR0, a hierarchical Bayesian multinomial logistic regression model that infers relative prevalence of all viral lineages across geographic regions, detects lineages increasing in prevalence, and identifies mutations relevant to fitness. Applying PyR0 to all publicly available SARS-CoV-2 genomes, we identify numerous substitutions that increase fitness, including previously identified spike mutations and many non-spike mutations within the nucleocapsid and nonstructural proteins. PyR0 forecasts growth of new lineages from their mutational profile, identifies viral lineages of concern as they emerge, and prioritizes mutations of biological and public health concern for functional characterization.
One Sentence Summary: A Bayesian hierarchical model of all SARS-CoV-2 viral genomes predicts lineage fitness and identifies associated mutations.
Competing Interest Statement: The authors have declared no competing interest.
Clinical Trial: Study is based on SARS-CoV-2 genetic sequences publicly available at GISAID.org.
Clinical Protocols: https://github.com/broadinstitute/pyro-cov
Funding Statement: This work was sponsored by the U.S. Centers for Disease Control and Prevention (BAA), with additional support from the Doris Duke Charitable Foundation (J.E.L.), the Howard Hughes Medical Institute (P.C.S.), and the Evergrande COVID-19 Response Fund Award from the Massachusetts Consortium on Pathogen Readiness (J.L.).
Author Declarations:
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: The study was conducted using data from a public database (GISAID). No IRB approval is necessary.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes
All data were gathered from other public resources. Data preprocessing scripts are open source. https://gisaid.org https://github.com/CSSEGISandData/COVID-19 https://cov-lineages.org/