medRxiv

Large-scale population analysis of SARS-CoV-2 whole genome sequences reveals host-mediated viral evolution with emergence of mutations in the viral Spike protein associated with elevated mortality rates

Carlos Farkas, Andy Mella, Jody J. Haigh
doi: https://doi.org/10.1101/2020.10.23.20218511
Carlos Farkas
1Research Institute in Oncology and Hematology (RIOH), CancerCare Manitoba, Winnipeg, MB, Canada
2Department of Pharmacology and Therapeutics, Rady Faculty of Health Sciences, University of Manitoba, Winnipeg, MB, Canada
  • For correspondence: Carlos.FarkasPool@umanitoba.ca jody.haigh@umanitoba.ca
Andy Mella
3Departamento de Física, Facultad de Ciencias Físicas y Matemáticas, Universidad de Chile, Blanco Encalada 2008, Santiago, Casilla 487-3, Chile
Jody J. Haigh
1Research Institute in Oncology and Hematology (RIOH), CancerCare Manitoba, Winnipeg, MB, Canada
2Department of Pharmacology and Therapeutics, Rady Faculty of Health Sciences, University of Manitoba, Winnipeg, MB, Canada
  • For correspondence: Carlos.FarkasPool@umanitoba.ca jody.haigh@umanitoba.ca

Abstract

Background We aimed to characterize in depth the intra-host variation and founder variants of SARS-CoV-2 worldwide up to August 2020 by examining more than 94,000 SARS-CoV-2 viral sequences, in order to understand how SARS-CoV-2 variants evolve and arise and to identify any increased mortality associated with them.

Methods and Findings We combined worldwide sequencing data from the GISAID and Sequence Read Archive (SRA) repositories and discovered SARS-CoV-2 hypermutation occurring in less than 2% of COVID-19 patients, likely caused by host mechanisms involving APOBEC3G complexes and intra-host microdiversity. Most of this intra-host variation is predicted to change viral proteins with defined variant signatures, demonstrating that SARS-CoV-2 can be actively shaped by the host immune system to varying degrees. At the global population level, several SARS-CoV-2 proteins such as Nsp2, 3C-like proteinase, ORF3a and ORF8 are under active evolution, as evidenced by their increased πN/πS ratios per geographical region. Importantly, two emergent variants are associated with high fatality rates and are increasingly spreading throughout the world: V1176F, which co-occurs with the D614G mutation in the viral Spike protein, and S477N, located in the Receptor Binding Domain (RBD) of the Spike protein. The S477N variant arose quickly in Australia, and experimental data support that this variant increases Spike protein fitness and its binding to ACE2.
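The πN/πS ratio mentioned above compares nucleotide diversity at nonsynonymous sites (πN) with diversity at synonymous sites (πS); a ratio above 1 is commonly read as a sign of diversifying selection. The following is a minimal sketch of that calculation, assuming that alternate-allele frequencies and the numbers of nonsynonymous and synonymous sites have already been obtained. The function names, the input layout and the example numbers are illustrative assumptions and are not taken from the authors' pipeline.

```python
def site_diversity(alt_freq: float, n_seqs: int) -> float:
    """Unbiased heterozygosity at one biallelic site.

    alt_freq: frequency of the alternate allele among the sampled genomes.
    n_seqs:   number of genomes sampled (hypothetical input).
    """
    if n_seqs < 2:
        raise ValueError("need at least two sequences")
    # 2pq with the n/(n-1) small-sample correction
    return 2.0 * alt_freq * (1.0 - alt_freq) * n_seqs / (n_seqs - 1)


def pi_n_over_pi_s(nonsyn_freqs, syn_freqs, n_nonsyn_sites, n_syn_sites, n_seqs):
    """Ratio of per-site nonsynonymous to synonymous nucleotide diversity.

    nonsyn_freqs / syn_freqs: alternate-allele frequencies of the variable
    nonsynonymous / synonymous sites in one gene (assumed precomputed).
    n_nonsyn_sites / n_syn_sites: total counts of each site class in the gene.
    """
    pi_n = sum(site_diversity(f, n_seqs) for f in nonsyn_freqs) / n_nonsyn_sites
    pi_s = sum(site_diversity(f, n_seqs) for f in syn_freqs) / n_syn_sites
    if pi_s == 0:
        raise ValueError("no synonymous diversity; ratio is undefined")
    return pi_n / pi_s


# Made-up example: one variable site of each class, 300 nonsynonymous
# and 100 synonymous sites, 101 genomes.
ratio = pi_n_over_pi_s([0.5], [0.5], 300, 100, 101)
print(ratio)
```

In this toy input both classes carry the same amount of variation, but it is spread over three times as many nonsynonymous sites, so the ratio falls below 1.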

Conclusions SARS-CoV-2 is evolving non-randomly, and human hosts shape emergent variants with positive fitness that can easily spread into the population. We propose that the V1176F and S477N variants in the Spike protein are two novel SARS-CoV-2 mutations that may pose significant public health concerns in the future.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

Powered@NLHPC: This research was partially supported by the supercomputing infrastructure of the NLHPC (ECM-02). This research was partially funded by research funding from the CIHR, Research Manitoba and the CancerCare MB Research Foundation.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

Not applicable

All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Yes

Data Availability

Data and Code Availability 76,553 FASTA genomes and associated sequencing metadata were downloaded from the GISAID database (https://www.gisaid.org/) for the period from January 1, 2019 until August 3, 2020, specifying human as source host. The associated sequencing metadata, including major variants per sample, are available in Supplementary Table 1. Aggregated variants in VCF format for these genomes and the associated consequence predictions are available here: https://usegalaxy.org/u/carlosfarkas/h/sars-cov-2-variants-gisaid-august-03-2020.

974 Brazilian FASTA sequences were downloaded from the GISAID database for the period from January 1, 2019 until September 25, 2020, specifying human as source host and South America / Brazil as location. These FASTA sequences and the associated aggregated variants are available here: https://usegalaxy.org/u/carlosfarkas/h/brazil-genome-sequences-from-gisaid-sept25-2020.

FASTA sequences from GISAID genomes with associated patient metadata until September 28, 2020, including the results from the snpFreq program (containing Deceased-Released SNP associations), are available here: https://usegalaxy.org/u/carlosfarkas/h/gisaid-patient-metadata-sept28-2020. Acknowledgements to all laboratories and consortia involved in the generation of the GISAID genomes used in this study are listed in Supplementary Table 2.

17,560 sequencing datasets were downloaded from the Sequence Read Archive (SRA, https://www.ncbi.nlm.nih.gov/sars-cov-2/) for the period from December 1, 2019 until July 28, 2020. Associated sequencing run accessions, sequencing metadata and related BioProjects are listed in Supplementary Table 3.

The code generated during this study to replicate most of the computational calculations performed in this manuscript is available in the following GitHub repository: https://github.com/cfarkas/SARS-CoV-2-freebayes.
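As an illustration of what a Deceased-Released SNP association can look like, one generic approach is a two-sided Fisher's exact test on a 2x2 table of patient status against variant presence. The sketch below implements that test with the Python standard library only. It is an assumption-laden example: the function name and the counts are made up, and it is not a description of what snpFreq computes internally.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table

                    variant   no variant
        deceased       a          b
        released       c          d

    (the table layout is a hypothetical choice for this example).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x variant carriers among the deceased
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum every table that is as likely as, or less likely than, the observed one
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


# Made-up counts: 3 deceased patients all carry the variant,
# 3 released patients do not.
p_value = fisher_exact_two_sided(3, 0, 0, 3)
print(p_value)
```

A small p-value only indicates that variant presence and outcome are not independent in the sample; it does not account for confounders such as age, sampling location or sequencing date.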


Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.
Posted October 27, 2020.
medRxiv 2020.10.23.20218511; doi: https://doi.org/10.1101/2020.10.23.20218511
Subject Area

  • Infectious Diseases (except HIV/AIDS)