A common TMPRSS2 variant protects against severe COVID-19


Summary
Infection with SARS-CoV-2 has a wide range of clinical presentations, from asymptomatic to life-threatening. Old age is the strongest factor associated with increased COVID19-related mortality, followed by sex and pre-existing conditions. The importance of genetic and immunological factors on COVID19 outcome is also starting to emerge, as demonstrated by population studies and the discovery of damaging variants in genes controlling type I IFN immunity and of autoantibodies that neutralize type I IFNs. The human protein transmembrane protease serine type 2 (TMPRSS2) plays a key role in SARS-CoV-2 infection, as it is required to activate the virus' spike protein, facilitating entry into target cells. We focused on the only common TMPRSS2 non-synonymous variant predicted to be damaging (rs12329760), which has a minor allele frequency of ~25% in the population. In a large population of SARS-CoV-2 positive patients, we show that this variant is associated with a reduced likelihood of developing severe COVID19 (OR 0.87, 95%CI:0.79-0.97, p=0.01).
This association was stronger in homozygous individuals when compared to the general population (OR 0.65, 95%CI:0.50-0.84, p=1.3×10⁻³). We demonstrate in vitro that this variant, which causes the amino acid substitution valine to methionine, impacts the catalytic activity of TMPRSS2 and is less able to support SARS-CoV-2 spike-mediated entry into cells.
TMPRSS2 rs12329760 is a common variant associated with a significantly decreased risk of severe COVID19. Further studies are needed to assess the expression of TMPRSS2 across different age groups. Moreover, our results identify TMPRSS2 as a promising drug target, with a potential role in the treatment of COVID19 for camostat mesilate, a drug approved for the treatment of chronic pancreatitis and postoperative reflux esophagitis. Clinical trials are needed to confirm this.
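As a rough consistency check on the odds ratios quoted above, the standard error of the log odds ratio can be recovered from a 95% confidence interval under a normal (Wald) approximation, which in turn gives an approximate z statistic and two-sided p-value. The sketch below is illustrative only: the function name `wald_from_ci` is hypothetical, and this is not the analysis code used in the study.

```python
import math

def wald_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover SE of log(OR), the z statistic and a two-sided p-value
    from an odds ratio and its 95% confidence interval (Wald approximation)."""
    # The CI on the log scale is log(OR) +/- z_crit * SE, so its width is 2 * z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p

# Values quoted in the Summary.
se_all, z_all, p_all = wald_from_ci(0.87, 0.79, 0.97)
se_hom, z_hom, p_hom = wald_from_ci(0.65, 0.50, 0.84)
print(f"all carriers: SE={se_all:.3f}, z={z_all:.2f}, p={p_all:.4f}")
print(f"homozygotes:  SE={se_hom:.3f}, z={z_hom:.2f}, p={p_hom:.4f}")
```

With the reported intervals this gives p-values of roughly 0.008 and 0.001, in line with the rounded values quoted in the text.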
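medRxiv preprint doi: https://doi.org/10.1101/2021.03.04.21252931; this version posted March 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.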

Main
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has infected over 107 million individuals globally and has caused more than 2.3 million deaths. SARS-CoV-2 infection has a broad clinical spectrum, ranging from asymptomatic or mildly symptomatic to a life-threatening presentation requiring admission to intensive care. Age, and to a much lesser extent male sex and underlying clinical conditions, such as cardiovascular disease, obesity and diabetes, are known risk factors associated with increased COVID19 morbidity and mortality 1,2 . The role of an individual's genetic background has recently emerged as an additional, yet not clearly understood, risk factor for COVID19 3,4,5 . Rare genetic variants in genes involved in the regulation of type I interferon (IFN) immunity, including autosomal recessive IRF7 and IFNAR1 deficiencies, have been identified in patients with life-threatening COVID19 5 . Autoantibodies to type I IFNs also account for at least 10% of cases of critical COVID19 pneumonia 6 . Moreover, genome-wide association studies (GWAS) have discovered genetic haplotypes spanning several genes that are associated with COVID19 severity 2,7,3 .
The transmembrane protease serine type 2 (TMPRSS2) protein has a key role in coronavirus infections, including SARS-CoV-2, as it is required for priming the virus' spike (S) glycoprotein through its cleavage, thus facilitating endosome-independent entry into target cells 8,9 . TMPRSS2, which is part of the type 2 transmembrane serine proteases (TTSPs) family, is characterized by androgen receptor elements located upstream of its transcription site 10 . As well as cleaving and activating viral glycoproteins of coronaviruses and influenza A and B viruses 11 , TMPRSS2 undergoes autocleavage, which results in the liberation of its soluble catalytic domain 12 . The conditions under which autocleavage of TMPRSS2 and other members of the TTSPs family occurs are yet to be elucidated.
TMPRSS2 is expressed in lung and bronchial cells 13 , but also in the colon, stomach, pancreas, salivary glands and numerous other tissues 14 . Moreover, it is co-expressed in bronchial and lung cells with the angiotensin-converting enzyme 2 (ACE2) 13 , which is the best described SARS-CoV-2 cellular receptor 15 . In the olfactory epithelium of mice, the expression of TMPRSS2, but not ACE2, appears to be age-related and greater in old compared to young animals 16 . Similarly, a recent study showed that expression of TMPRSS2 in mouse and human lung tissue is also age-related 17 . Studies in TMPRSS2 knock out (KO) mice reported reduced SARS-CoV and MERS-CoV replication in the lungs compared to wild-type mice, and a reduced proinflammatory viral response, especially cytokine and chemokine response via the Toll-like receptor 3 pathway 18,19 . We have recently shown that TMPRSS2 expression permits cell surface entry of SARS-CoV-2, allowing the virus to bypass potent endosomal restriction factors 20 . In vitro studies have shown that TMPRSS2 inhibitors prevent primary airway cell and organoid infection by SARS-CoV and SARS-CoV-2 21,20,22 . In animal studies, mice infected with SARS-CoV and treated with the serine protease inhibitor camostat mesilate had a high survival rate 23 . Recently, camostat mesilate (which, in Japan, is already approved for patients with chronic pancreatitis and postoperative reflux esophagitis) was shown to block SARS-CoV-2 lung cell infection in vitro 8,20 .
Based on the above data from animal models and cell-based studies supporting a protective effect of TMPRSS2 knockout against coronavirus infection (including SARS and MERS), we hypothesized that naturally occurring TMPRSS2 genetic variants affecting the structure and function of the TMPRSS2 protein may modulate the severity of SARS-CoV-2 infection (here defined by the presence of respiratory symptoms severe enough to require at least hospital admission).
We analysed 378 TMPRSS2 genetic variants reported in gnomAD (v2.1.1), the database of population genetic variations. We studied the evolutionary conservation of TMPRSS2 amino acids and the impact of amino acid substitution on TMPRSS2 protein structure (described in Methods). As no experimental structure of TMPRSS2 is yet available, we generated a 3D structural model using homology modelling (Figure 1). We identified the chemical and physical bonds that stabilize the TMPRSS2 structure (i.e. hydrogen bonds, cysteine and salt [...] in Finnish Europeans). This highly conserved valine occurs in the scavenger receptor cysteine-rich (SRCR) domain, whose function within TMPRSS2 is still not fully understood, although a role in ligand and/or protein interaction has been proposed 24 . Indeed, this domain [...]

[...] (n=1,668,938) and pooled individuals with a laboratory-confirmed SARS-CoV-2 infection (including hospitalized and life-threatening COVID19 cases from the meta-analyses previously described) or with a self-reported or physician-confirmed COVID diagnosis (total n=36,590 cases).
Although an overlap in the control sets used in these meta-analyses may be present, these results are consistent with our hypothesis that the TMPRSS2 rs12329760 variant has a protective effect against severe and/or life-threatening COVID19. However, studies examining the prevalence of this variant in SARS-CoV-2 infected asymptomatic or pauci- [...] between East Asia and Europe 28 ). Indeed, a recent study showed a lower T allele frequency in a small cohort of Chinese patients with life-threatening COVID19 compared to the population frequency 29 . Although the differences in the proportion of SARS-CoV-2 patients who develop severe COVID19 across different populations 28 are more likely to be explained by social behaviour, public health measures to curb outbreaks, exposure to other viruses and immunological factors, human genetic variation across different populations may also marginally contribute to the observed differences.
To investigate the phenotypic effect of the TMPRSS2 V160M variant, we co-transfected 293T cells with ACE2 and either TMPRSS2 wild type (TMPRSS2 WT ) or V160M (TMPRSS2 V160M ), as previously described 20 . We and others previously observed that co-expression of TMPRSS2 and ACE2 results in rapid cleavage of ACE2. We, therefore, used a mutant ACE2 that cannot be degraded by TMPRSS2 30 . Two additional TMPRSS2 variants were included as controls: the catalytically inactive S441A (TMPRSS2 S441A ) and the catalytically active R255Q (TMPRSS2 R255Q ), which is unable to autocleave 12 . Wild type TMPRSS2 is expressed as roughly equal amounts of full-length and fully cleaved forms, with a small amount of intermediately cleaved product. As expected, the catalytically inactive TMPRSS2 S441A and the non-autocleavable TMPRSS2 R255Q resulted in only the full-length TMPRSS2 being expressed. However, TMPRSS2 V160M resulted in a significantly higher proportion of full-length (55 kDa), and significantly lower proportion of fully cleaved protein (20 kDa) (p<0.05, Student's t-test), suggesting that the V160M substitution exerts a partial inhibitory effect on the proteolytic autocleavage of TMPRSS2 (see Figure 3A-C).
Subsequently, we investigated the effect of TMPRSS2 V160M on promoting viral entry, using a previously described SARS-CoV-2 pseudovirus entry assay 20 . Pseudovirus expressing the glycoprotein from the vesicular stomatitis virus (VSV-G) was used as a control, as this virus enters cells in a TMPRSS2-independent manner 20 . Briefly, cells co-transfected with ACE2 and TMPRSS2 wild type or variants were incubated with the pseudovirus (as described in 20,31 ) and after 48h, luminescence was measured. TMPRSS2 WT enhanced viral entry by ~5-fold compared to empty vector, while the catalytically dead TMPRSS2 S441A showed no enhancement (Figure 3D). The non-autocleavable mutant TMPRSS2 R255Q showed a similar enhancement, suggesting that autocleavage is dispensable for optimal TMPRSS2-mediated enhancement. TMPRSS2 V160M showed no significant difference in viral entry compared to the TMPRSS2 WT . Overall, expression of catalytically active TMPRSS2 proteins slightly inhibited VSV-G mediated entry (Figure 3E).
The partial inhibitory effect exerted by the V160M variant on the proteolytic autocleavage of TMPRSS2 resulted in a far greater proportion of uncleaved, surface-expressed TMPRSS2 V160M compared to TMPRSS2 WT . We, therefore, re-assessed whether TMPRSS2 V160M affects SARS-CoV-2 S-expressing pseudovirus entry by using the double mutant TMPRSS2 R255Q/V160M (which cannot autocleave) to control for protein cell-surface expression. Under these conditions, and across a range of plasmid concentrations, TMPRSS2 R255Q/V160M showed a significantly reduced ability to promote SARS-CoV-2 S-expressing pseudovirus entry compared to TMPRSS2 R255Q alone, despite equal protein expression (Figure 3F, H). Again, TMPRSS2 R255Q/V160M had no effect on VSV-G-mediated entry (Figure 3G).
Overall, our results suggest that the V160M substitution results in a moderately less catalytically active TMPRSS2, which is less able to autocleave and prime the SARS-CoV [...]

[...] that PAR2 is one of these substrates 32,33 . PAR2 is expressed in several tissues, including lung, vascular endothelial and vascular smooth muscle cells 34,35 and its protease-mediated activation promotes inflammation by inducing prostaglandin synthesis and cytokine production in the lungs and other organs 36,37,38,39,40 . An intriguing hypothesis is that, similar to other soluble serine proteases, such as the human airway trypsin-like protease HAT (also known as TMPRSS11D), the soluble wild type TMPRSS2 protease may also have a role in promoting inflammation in the lungs and other tissues.

TMPRSS2 three-dimensional structure and variant analysis
A 3D structural model of the TMPRSS2 protein was generated using our in-house Phyre homology modelling algorithm 41 . The FASTA sequence of TMPRSS2 was obtained from the UniProt protein knowledge database 42 [...]
DNA extraction, genotyping and quality control have been described in detail previously 3 .
Genetic ancestry was inferred using ADMIXTURE and reference individuals from the 1000 Genomes project. Imputation was performed using the TOPMed reference panel.
Lentiviral pseudotype production was performed as previously described 20,31 . ACE2 FLAG was used as previously described 20 [...] RRID:Addgene_53887 52 . Non-cleavable ACE2-FLAG and TMPRSS2 mutants were generated by overlap extension PCR or site-directed mutagenesis.

Phenotypic assays
293T cells were co-transfected with FLAG-tagged, non-cleavable ACE2 and TMPRSS2 as previously described 20 . Briefly, confluent 10 cm² dishes of 293T cells were co-transfected with 1 µg each of TMPRSS2 and ACE2-FLAG. 24 hours later, cells were resuspended in fresh media and either spun down for lysis and western blot or added to 96-well plates along with pseudovirus. 24 hours later, media was refreshed and, a further 24 hours later, cells were lysed with reporter lysis buffer (Promega) and luminescence (measured as relative luminescence units, RLU) was read on a FLUOstar Omega plate reader (BMG Labtech) using the Luciferase Assay System (Promega).
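To illustrate how a luminescence readout of this kind can be turned into the fold enhancement over empty vector quoted in the results, the following minimal sketch averages replicate RLU values per condition and divides by the empty-vector mean. The RLU numbers, condition labels and the function name `fold_over_baseline` are hypothetical placeholders chosen for illustration, not data or code from the study.

```python
from statistics import mean

def fold_over_baseline(rlu_by_condition, baseline="empty_vector"):
    """Return mean RLU of each condition divided by the mean RLU of the baseline."""
    baseline_mean = mean(rlu_by_condition[baseline])
    return {
        condition: mean(values) / baseline_mean
        for condition, values in rlu_by_condition.items()
    }

# Hypothetical triplicate readings (RLU); not study data.
example_rlu = {
    "empty_vector": [1000, 1200, 800],
    "TMPRSS2_WT": [5000, 5500, 4500],
    "TMPRSS2_S441A": [1000, 900, 1100],
}

folds = fold_over_baseline(example_rlu)
for condition, fold in folds.items():
    print(f"{condition}: {fold:.1f}-fold")
```

Normalising to the empty-vector wells run in the same experiment keeps the comparison independent of the absolute signal, which can differ between pseudovirus preparations.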

Statistical analysis
The association between the TMPRSS2 rs12329760 variant and COVID19 severity was assessed using logistic regression. Genetic associations in the GenOMICC/ISARIC 4C cohort were analysed as previously described 7 . Briefly, logistic regression with additive and recessive models was performed in PLINK v1.9, adjusting for sex, age, mean-centered age-squared, the top 10 principal components (from a principal component analysis [PCA] performed to adjust for population stratification) and deprivation index decile based on UK postcode. For the alternative 100,000 Genomes control group, analysis of each major ancestry group was performed with mixed model association tests in SAIGE 53 (v0.39), including age, sex, age-squared, age-sex interaction and the first 20 principal components as covariates. Trans-ethnic meta-analysis of GenOMICC data for different ancestries was performed in METAL using an inverse-variance weighted method, and the P-value for heterogeneity was calculated with Cochran's Q-test implemented in the same software 54 .
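The two ideas in this paragraph that are easiest to misread are the genotype coding behind the additive and recessive models and the inverse-variance weighting used to pool ancestry-specific estimates. The sketch below shows both in plain Python under simplifying assumptions: it is a fixed-effect calculation on made-up log odds ratios, the function names are hypothetical, and it does not reproduce PLINK, SAIGE or METAL.

```python
import math

def code_genotype(minor_allele_count, model):
    """Encode a genotype (0, 1 or 2 copies of the minor allele) as a regression predictor."""
    if minor_allele_count not in (0, 1, 2):
        raise ValueError("minor_allele_count must be 0, 1 or 2")
    if model == "additive":
        return minor_allele_count  # each extra copy shifts the log odds by the same amount
    if model == "recessive":
        return 1 if minor_allele_count == 2 else 0  # only homozygotes carry the effect
    raise ValueError(f"unknown model: {model}")

def inverse_variance_meta(betas, standard_errors):
    """Fixed-effect inverse-variance weighted pooling with Cochran's Q statistic."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    # Q is compared against a chi-square distribution with k - 1 degrees of freedom.
    cochran_q = sum(w * (b - pooled_beta) ** 2 for w, b in zip(weights, betas))
    return pooled_beta, pooled_se, cochran_q, len(betas) - 1

# Hypothetical per-ancestry log odds ratios and standard errors (not study data).
example_betas = [-0.10, -0.20]
example_ses = [0.10, 0.10]
pooled_beta, pooled_se, q_stat, q_df = inverse_variance_meta(example_betas, example_ses)
print(f"pooled OR = {math.exp(pooled_beta):.3f}, SE = {pooled_se:.3f}, Q = {q_stat:.2f} (df = {q_df})")
```

Because each estimate is weighted by the inverse of its variance, larger and more precise ancestry groups dominate the pooled estimate, while Q flags cases where the groups disagree more than sampling error would suggest.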
Data are presented as mean±standard deviation. Log-normality was assessed using the Shapiro-Wilk test and QQ plot. A two-tailed Student's t-test was used to compare the means of two groups. One-way ANOVA was used to compare the means of more than two groups.
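For readers unfamiliar with the two-group comparison, the pooled-variance Student's t statistic can be sketched with the standard library alone. The sample values below are hypothetical fractions of full-length protein invented for illustration, and `student_t` is a hypothetical helper; obtaining the p-value would additionally require the t distribution from a statistics package, which is deliberately left out here.

```python
import math
from statistics import mean, variance

def student_t(sample_a, sample_b):
    """Two-sample Student's t statistic (equal variances assumed) and degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error, df

# Hypothetical triplicates: fraction of TMPRSS2 detected as full-length protein.
wild_type = [0.48, 0.50, 0.52]
v160m = [0.68, 0.70, 0.72]

t_stat, t_df = student_t(wild_type, v160m)
print(f"t = {t_stat:.2f}, df = {t_df}")
```

For a two-tailed test at the 0.05 level with 4 degrees of freedom the critical value is about 2.776, so an absolute t statistic above that would be reported as p<0.05.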
The TMPRSS2 protein is composed of a cytoplasmic region (residues 1-84), a transmembrane region (TM, residues 85-105) and an extracellular region (residues 106-492). The latter is composed of three domains: the LDLR class A (residues 112-149), the scavenger receptor cysteine-rich domain (SRCR) (residues 150-242) and the Peptidase S1 (residues 256-489), which contains the protease active site: residues His296, Asp345 and Ser441. The 3D model of the extracellular region residues 145-491 corresponding to domains SRCR-2 (in green) and Peptidase S1 (in blue) is presented. Valine 160 (Val 160, depicted as a red sphere on the cartoon), which harbours variant p.Val160Met, occurs in the SRCR domain and is spatially far from the TMPRSS2 catalytic site (mapped onto the surface of TMPRSS2).
The transmembrane serine protease hepsin was used as a template to generate the model (PDB: 1Z8G, chain A, X-ray structure with 1.55Å resolution; model confidence 100%, sequence identity to target sequence= 35%).
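The residue ranges given in this legend can be captured in a small lookup, which makes it easy to see where a given position, such as Val160 or the Arg255 autocleavage-site mutant, falls in the protein. This is a minimal sketch using only the boundaries stated above; the function name `locate_residue` and the returned dictionary layout are hypothetical.

```python
# Inclusive residue ranges, taken from the legend text.
REGIONS = [
    ("cytoplasmic", 1, 84),
    ("transmembrane", 85, 105),
    ("extracellular", 106, 492),
]
DOMAINS = [
    ("LDLR class A", 112, 149),
    ("SRCR", 150, 242),
    ("Peptidase S1", 256, 489),
]
CATALYTIC_TRIAD = {296: "His", 345: "Asp", 441: "Ser"}

def _find(ranges, residue):
    """Return the name of the first range containing the residue, or None."""
    for name, start, end in ranges:
        if start <= residue <= end:
            return name
    return None

def locate_residue(residue):
    """Describe the region and domain of a TMPRSS2 residue number (1-492)."""
    if not 1 <= residue <= 492:
        raise ValueError("residue must be between 1 and 492")
    return {
        "region": _find(REGIONS, residue),
        "domain": _find(DOMAINS, residue),  # None for linker residues outside the named domains
        "catalytic": residue in CATALYTIC_TRIAD,
    }

for position in (160, 255, 441):
    print(position, locate_residue(position))
```

Position 160 maps to the SRCR domain and is not part of the catalytic triad, consistent with the legend's observation that the variant lies away from the active site, whereas position 255 falls between the SRCR and Peptidase S1 ranges.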