Ethnically diverse mutations in PIEZO1 associate with SARS-CoV-2 positivity

COVID-19, caused by the SARS-CoV-2 virus, carries significant risk of mortality and has spread globally with devastating societal consequences. Endothelial infection has been identified as a feature of the disease and so there is motivation to determine the relevance of endothelial membrane mechanisms affecting viral entry and response. Here, through a study of patient data in UK Biobank released on 16 April 2020, we suggest relevance of PIEZO1, a non-selective cation channel protein that both mediates endothelial responses to mechanical force and unusually indents the cell membrane. PIEZO1 notably has roles that may also be relevant in red blood cell function, pulmonary inflammation, bacterial infection and fibrotic auto-inflammation. We provide evidence that single nucleotide polymorphisms (SNPs) in the gene encoding PIEZO1 are more common in individuals who test positive for SARS-CoV-2 regardless of pre-existing hypertension, myocardial infarction, stroke, diabetes mellitus or arthritis. Some of these SNPs are more common in African and Caribbean populations, which are groups that were recently shown to have greater susceptibility to infection. One of the SNPs is a missense mutation that results in an amino acid change in an evolutionarily conserved and previously unexplored N-terminal region of PIEZO1. The data support the notion of genetic factors influencing SARS-CoV-2 infection and suggest a specific role for PIEZO1.


INTRODUCTION
Coronavirus disease 2019 (COVID-19) is caused by a new virus called SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) 1 . In 2020 this virus triggered a global health crisis, spreading rapidly with no vaccine or treatment yet available 2,3 . One way to address the problem could be to understand how the virus enters host cells, because such knowledge could inform repurposing of existing therapeutics and enable development of new therapeutics that reduce the dangers of the virus. There is already evidence that surface membrane proteins such as ACE2 and TMPRSS2 are important in SARS-CoV-2 entry 4 but there is little information on the roles of other membrane proteins, membrane lipid components or membrane structure. Research on other viruses has suggested that mechanisms of this type affect viral entry 5,6 . Particular membrane proteins of interest are ion channels 7,8 , which embed in the membrane and enable flux of ions such as Ca2+, a key intracellular signal 9 and regulator of coronavirus mechanisms 10-12 .
A recent report on blood vessels of COVID-19 patients pointed to importance of endothelial cell infection and endotheliitis 13 . The study included investigation of the mesenteric vasculature because of circulatory collapse in which there was mesenteric ischaemia requiring surgical resection to remove part of the small intestine 13 . Viral elements were found in endothelial cells and there were accumulations of inflammatory cells alongside endothelial and inflammatory cell death 13 . The authors suggested that therapeutic strategies aimed at stabilising the endothelium may be useful, especially in patients who are vulnerable to COVID-19 because of pre-existing endothelial dysfunction, including males and smokers and people who are obese or have hypertension or diabetes 13 . An independent study of lungs from patients who died from COVID-19 found severe endothelial injury and angiogenesis, contrasting with the lungs of patients who died from acute respiratory distress syndrome secondary to influenza 14 . In support of these conclusions, SARS-CoV-2 has been found to directly infect human blood vessel organoids 15 . Therefore the idea exists that vascular endothelium is particularly vulnerable to SARS-CoV-2 and important in COVID-19 severity.
For these reasons it could be helpful to learn specifically about the membrane mechanisms of endothelial cells that confer SARS-CoV-2 susceptibility and determine downstream consequences of the virus in the vasculature. In 2014, an intriguing ion channel protein called PIEZO1 was reported to be important in endothelium 16,17 . This protein forms Ca2+-permeable non-selective cation channels that have extraordinary capability to respond to membrane tension 18 and shear stress caused by fluid flow along the endothelial membrane surface 16 . Unusually for membrane proteins, PIEZO1 indents the membrane in an inverted dome-like fashion and therefore modifies the overall structure of the membrane 19,20 . The channel shows marked activity in mesenteric endothelium 21 and there is increasing evidence of its importance in many aspects of endothelial function, such as angiogenesis 16,22 and pulmonary vascular permeability 23,24 . There are also roles in cardiovascular health and disease more generally 25 , including in the regulation of interleukin-6 26 , which is a key inflammatory mediator of COVID-19 27 . Its importance in humans is evident because naturally occurring loss-of-function mutations in PIEZO1 are associated with lymphatic endothelial dysfunction 28,29 and varicose veins 30 , and gain-of-function mutations associate with anaemia and malarial protection due to the importance of PIEZO1 in red blood cells 31-33 . Preliminary genetic analysis has hinted at numerous additional conditions that may be related to or exacerbated by mutation in PIEZO1 25 .
Therefore we hypothesised that mutations in PIEZO1 might relate to COVID-19 and so we tested this hypothesis using newly available data in the UK Biobank. UK Biobank is a health research resource that recruited over 500,000 people aged 40 to 69 years between 2006 and 2010 across the UK 34 . It recently incorporated SARS-CoV-2 infection data.

RESULTS
On 16 April 2020 the UK Biobank released the data of 1409 individuals tested for SARS-CoV-2 across its 22 assessment centres. A candidate gene association study was conducted to test for a link between SARS-CoV-2 infection (COVID-19) and the PIEZO1 gene. Table 1 shows the demographic properties of the 1409 study participants who were tested. Of those, 636 (45%) tested positive. The mean age of all individuals was 69.5 years; 756 individuals (54%) were men and 653 (46%) were women. Of those tested, 48% of men and 42% of women were positive. In the downstream analysis, we focused on the individuals who self-identified as "British" (82%). We evaluated the co-morbidities of these individuals obtained from the non-cancer illness code (ID 20002). Of the total study sample of 1409 individuals, 521 (37%) self-reported that they had a diagnosis of hypertension, 77 (5%) myocardial infarction, 39 (3%) stroke, 123 (9%) diabetes mellitus and 214 (15%) arthritis.
Table 1. Descriptive statistics and demographic data for COVID-19 in UK Biobank. N = Total number of individuals tested for COVID-19 in UK Biobank; SD = standard deviation. British ethnicity refers to the individuals who identified or reported themselves as "British". Others represent individuals who did not report as "British".
Two general models of logistic regression analysis assuming an additive genetic approach were conducted without adjusting for self-reported illnesses. The first general model was conducted on all individuals released by UK Biobank (N = 1409). To account for genetic heterogeneity amongst different ethnic populations in UK Biobank, the second logistic analysis was restricted to individuals who self-reported as "British". Both analyses were adjusted for age, sex, duration of moderate physical activity and the first ten principal components. Associations with COVID-19 risks were quantified using odds ratios (ORs) derived from logistic regression. Table 2 shows the synonymous and missense variants that were detected from the general model analysis.
The first general model included all the individuals and was adjusted for 13 covariates. Twenty variants were found to be statistically significant (P-value <0.05), indicating a degree of association. Of those, one was missense (rs1803328, OR = 0.
To reduce the influence of different genetic backgrounds in the association analysis, the second general model was restricted to self-reported British among the 1409 individuals. In total, 1158 individuals were eligible for the next step of analysis using logistic regression and adjusted for 13 covariates. After adjusting for the covariates, 35 variants achieved the P-value threshold <0.05, indicating a significant association (Table 2 and Supplementary file). The missense variant rs1803382 was significant when the association analysis included the entire population (n = 1409) but became non-significant (P-value = 0.254) when the analysis was confined to British. In contrast, three synonymous variants became statistically significant when the analysis was confined to the British population (rs1061228, rs8043924, rs4782430).
To extend our analysis of the "British" cohort, the association tests were adjusted for five underlying types of disorder: hypertension, myocardial infarction, stroke, diabetes mellitus and arthritis. The purpose of adjusting the analyses for these disorders was to account for covariate effects and to test whether the SNPs reflected true associations between PIEZO1 and COVID-19 status. The COVID-19 case status was the response variable and the imputed genotype was the predictor variable in the PLINK 2.0 logistic regression model.
medRxiv preprint doi: https://doi.org/10.1101/2020.06.01.20119651; this version posted June 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
All the models were adjusted for age, sex and duration of moderate activity, along with the first ten principal components to control for population stratification. Across the five models of logistic regression in Table 3, the results revealed similar missense and synonymous variants that achieved statistical significance (P-value <0.05). Additionally, the analysis provided evidence supporting the missense variant (rs7184427), which we observed in General Model 2. The missense variant was consistently significant across the five models with a mean odds ratio of 0.76. In addition, six synonymous variants persisted across the analyses after adjusting for the covariates (rs8057031, mean OR = 0.7276; rs6500491, mean OR = 0.621; rs2290902, mean OR = 0.653; rs1061228, mean OR = 1.403; rs8043924, mean OR = 0.718; rs4782430, mean OR = 0.76). The odds ratio and P-value of each variant did not deviate significantly across the five models.
Table 3. PIEZO1 mutations (single nucleotide polymorphisms, SNPs) significantly associated with COVID-19 after adjusting for the self-reported illnesses. REF = reference allele; ALT = alternate allele; A1 = tested allele; OR = odds ratio. Data in all models were adjusted for age, sex, duration of moderate activity and principal components 1 to 10, in addition to the specific disease condition as given below. a Adjusted for hypertension status. b Adjusted for myocardial infarction status. c Adjusted for stroke status. d Adjusted for diabetes mellitus status. e Adjusted for arthritis status.
The relationships of 16 PIEZO1 orthologous protein sequences were inferred by constructing a maximum likelihood tree using RAxML (Figure 1A). Amino acid sequence alignment of PIEZO1s revealed a high degree of conservation from residue 243 to 258 (Figure 1B). The missense variant, rs7184427, encodes an amino acid located at position 250 in human PIEZO1 (highlighted using the brown box in Figure 1B). This residue is completely conserved across diverse species. It is important to note that the analysis used the reference genome, which does not always employ the major allele, as in this case for humans. Our data show that in many humans, alanine (A) is encoded by the major allele and is therefore common at this site.
Following the logistic regression analyses, we focused on the missense variant (rs7184427) that remained significant in all analyses. The major (G) allele was associated with greater susceptibility to COVID-19. The variant was located in the N-terminal region of the PIEZO1 protein (Figure 2). Although there are 3D structural data for the mouse PIEZO1 19,35 , none of these structures includes the N-terminal region of PIEZO1 in which A or V at position 250 is located. For this reason, using the available structural data we modelled the first three 4-helical bundles of PIEZO1 in one of the experimentally derived 3D structures of PIEZO1. Our model enables us to suggest that V250 (V257 in mouse PIEZO1) is located near the N-terminal region of the blades, at the cytosolic side of the protein.
The frequency of the major allele (the pro-COVID-19 G allele resulting in V250A mutation) was 84.4%, 82.0% and 82.9% in the global populations of UK Biobank, 1000 Genomes and the Genome Aggregation Database (gnomAD), respectively (Table 4). The frequency in the South Asian populations was lower than in the global population (65.0-70.8%) but higher in the African and East Asian populations (87.1-90.1%). Pro-disease allele frequencies of all of the identified synonymous mutations also varied with ethnicity (Figure 3). In two of these cases (rs8057031 and rs4782430) there was again higher frequency in African and Chinese populations (Figure 3).

DISCUSSION
Our data suggest genetic variation in PIEZO1 that increases susceptibility to SARS-CoV-2 infection. Such variation could increase the risk of other diseases, thereby indirectly affecting SARS-CoV-2 infection, but our analysis addressed major relevant diseases and pointed instead to PIEZO1 as a direct determinant of infection rate. In total, one missense SNP and six synonymous SNPs were found to be linked to SARS-CoV-2 infection rates. To the best of our knowledge these data are the first to provide evidence of genetic linkage to SARS-CoV-2 infection and the first to implicate PIEZO1 as a factor in this disease.
The prevalence of many of these SNPs varies with ethnicity and this could be a factor underlying diversity seen in SARS-CoV-2 infection rates in different populations. The missense SNP (rs7184427) associated with more infections in the White British group is the G allele. It is the major allele and most common generally in the White British population and human populations in general (82-84% of people). It encodes alanine at position 250 in PIEZO1 protein, whereas the remaining people with the minor allele have valine at this position. The major allele is more frequent in African and Chinese (East Asian) groups and less frequent in South Asian groups. Some of the synonymous SNPs also show such variability, with rs8057031 and rs4782430 being more frequent in African / Caribbean and Chinese groups. Again lower frequency is sometimes seen in South Asian groups.
Data are emerging on the relevance of ethnicity to SARS-CoV-2 infection rates and the main conclusion so far is that infection is more common in Black compared with White people, regardless of socioeconomic conditions and other confounding factors 36,37 . This suggests greater genetic susceptibility of Black people. Therefore, the greater frequency of some of the pro-disease PIEZO1 SNPs in African and Caribbean groups could be an explanation, or at least contributory factor. We are not aware of evidence that infection rates are different in Asians. South Asian groups in the UK have experienced greater lethality from COVID-19 38 but the factors determining this effect may be different from those determining infection rates.
We do not yet know if alanine in place of valine at position 250 in PIEZO1 causes less, more or no change in the properties of the PIEZO1 channels or whether the altered DNA sequence affects PIEZO1 expression. It will take time and investment to determine such matters. The valine amino acid is similar to the alanine amino acid but the chemical difference may be crucial, for example in α-helical integrity 39 . Evolutionary preservation of this residue from fish to human (Figure 1) suggests importance, otherwise variations at this site would likely have occurred and been tolerated and therefore led to divergence over millions of years.
While we have insight into how some parts of the PIEZO1 molecular machine operate 19,35,40 , the particular region of PIEZO1 incorporating V or A at position 250 is uncharted territory 20 . Progress with cryo-EM techniques has led to structural information 19,35 but the N-terminal region has not been resolved, possibly because it has highly flexible structural conformations. Nor has this region been studied by laboratory mutagenesis studies. Based on our molecular modelling we speculate that V250A is important in how the tip of the propeller blade curves round to abut the next blade (Figure 2). It may also be important in how the blades and the entire channel interact with membrane lipids and nearby proteins and therefore how the membrane structure is shaped and the sensitivity to mechanical perturbation is determined 20 . The molecular and functional studies required to unravel the roles of this blade tip and specifically the effect of V250A will be challenging. Moreover, our studies have shown the importance of understanding PIEZO1 in its native membrane environment; the native endothelial and red blood cell environments profoundly alter the gating characteristics of PIEZO1 16,25,41,42 . We do not yet know whether endothelial cell dysfunction affects the properties of PIEZO1, but any changes could be relevant to SARS-CoV-2.
There is a public database entry indicating a link of V250A to hereditary lymphedema in one patient 43 and we know that generalised lymphatic dysplasia can be caused by loss of PIEZO1 expression 28,29 . Therefore we tentatively speculate that V250A reduces PIEZO1 function and that this is the reason for increased susceptibility to SARS-CoV-2 infection. If this is the case, PIEZO1 agonists may be a route to rescuing normal channel activity and protecting against infection. This may be a safe approach because we know that PIEZO1 gain-of-function is tolerated in people and even confers protective advantage in large populations where malaria is endemic 31-33 . A screen of 3.25 million small molecules identified a PIEZO1 agonist called Yoda1 44 . It is increasingly used successfully as a tool compound in experimental studies of PIEZO1 25 and there is on-going work to elucidate the chemical structure-activity requirements and improve the physicochemical properties while retaining efficacy; some of these studies have been published 45 . To the best of our knowledge there is currently no PIEZO1 agonist that would be suitable for administration to people but the available data suggest that the principle of chemical enhancement of PIEZO1 is possible and thus that a therapeutic drug targeted to PIEZO1 is a realistic consideration.
In addition to a possible link to lymphedema, our study raises the question of whether COVID-19 is linked to varicose veins. A genome-wide association study identified mutations in PIEZO1 as determinants of varicose veins 30 and a recent study specifically linked V250A to varicose veins 46 . These observations further encourage the idea of an important relationship between vascular integrity and COVID-19.
PIEZO1 may act in this context through impact on membrane structure, thus affecting viral entry, but we should also consider the property of PIEZO1 to confer Ca2+ permeability on the membrane because transmissibility of both SARS-CoV and MERS-CoV is Ca2+-dependent and both are closely related to SARS-CoV-2 10,11,47 . The E protein of SARS-CoV forms Ca2+-permeable ion pores on the ER/Golgi membrane and its activity can drive inflammation and severe respiratory distress syndrome by enhancing IL-1 production 48 . One of the accessory proteins of SARS-CoV undergoes conformational changes upon binding to Ca2+ and significantly contributes to disease progression 12,49 . Future studies should address these possibilities.
Synonymous mutations do not change the amino acid sequence but they could still, in theory, have effects by altering expression of the PIEZO1 gene, thereby affecting the total amount of PIEZO1 available for functional consequence. We note that one of the synonymous mutations we identified (rs2290902) has also been associated with HIV-1 infection 50 but we are not aware of further investigation of this matter. It is nevertheless consistent with our general idea that PIEZO1 is relevant to viral infection.
In addition to its endothelial and red blood cell functions, PIEZO1 is expressed in other cell types and implicated in other aspects of mammalian biology 25,40 . It may be particularly important to note in this context the recent persuasive evidence for a role of PIEZO1 in pulmonary inflammation, bacterial infection and fibrotic auto-inflammation 51 . In this study it was concluded that stimulation of PIEZO1 in immune cells by cyclical mechanical force is essential for innate immunity. Therefore we speculate that any loss of PIEZO1 function would increase susceptibility to lung infection and perhaps other infections and that PIEZO1 agonism might be beneficial. Cyclical activation of PIEZO1 might also be important in the protective benefits of physical exercise where PIEZO1 has been suggested to have an important role in blood pressure elevation of exercise that increases performance 21 . PIEZO1 might be relevant to the speculation about a connection between malarial protection and SARS-CoV-2 infection 52,53 because heterozygosity for PIEZO1 gain-of-function mutation has been suggested to be a major factor mediating protection from malaria 33 .
We recognise that a potential limitation of this study is the relatively small size of the SARS-CoV-2 data set on which we could base the analysis at this time. Analysis of additional data, as they become available, will be important for deeper understanding and exploration of PIEZO1 specifically in non-White ethnic groups. Reopening of research laboratories will enable experimental investigation of underlying mechanisms and the testing of therapeutic strategies.
In summary, this early study suggests genetic association that is relevant to COVID-19 and ethnic variation in infection rates and a previously unrecognised relationship between COVID-19 and PIEZO1, a gene that encodes an important mechanically-activated ion channel.

Demographic information
The UK Biobank resource recruited over 500,000 people aged 40 to 69 years between 2006 and 2010 across the UK 34 . Participants completed a detailed clinical, demographic and lifestyle questionnaire, underwent clinical measures, provided biological samples (blood, urine and saliva) for future analysis and agreed to have their health records accessed. In July 2017, the genetic information from 501,708 samples was released to UK Biobank research collaborators. The UK Biobank has now released the first cohort of individuals tested for COVID-19 (N = 1474). These individuals were used in the present analysis to study the association between COVID-19 and PIEZO1. Baseline assessments of these individuals were recorded when they attended one of the 22 research centres located across the United Kingdom. Co-morbidity details of these subjects were taken from the self-reported non-cancer illness codes in the UK Biobank (Data-field 20002), which include self-reported hypertension, myocardial infarction, stroke, diabetes mellitus and arthritis. Arthritis status consists of self-reported rheumatoid arthritis and osteoarthritis.

Statistical analysis
Imputed genotypes of chromosome 16 for all 1474 individuals were used for the association study. Imputation was conducted by UK Biobank using IMPUTE2 54 . As this is a candidate gene association study, we focused on the coding region of the PIEZO1 gene. Variants with imputation quality (Info) score above 0.4 were retained in the analysis. The variants were further filtered to retain those with minor allele frequency (MAF) >5% and Hardy-Weinberg equilibrium (HWE) P-value >1 × 10⁻⁶, and to exclude variants with missingness >10% and samples with missingness >10%. Sixty-five individuals were removed due to missing genotype, leaving 1409 individuals. A total of 197,730 variants spanning chromosome 16 were eligible for downstream analysis. A further association study was restricted to the self-reported British population from the cohort of 1409 individuals. This yielded 1158 individuals who self-reported as British and the filtering parameters were applied as above, resulting in 196,314 variants. Given that the present study is a candidate gene association study, we analysed 334 variants spanning the coding region of PIEZO1 and the statistical significance was set at p<0.05 (0.05/1 gene).
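The variant-level filters described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function name `variant_qc` and its parameters are hypothetical, the thresholds are read as "keep MAF >5%, missingness at most 10% and HWE P-value >1 × 10⁻⁶", and the HWE test is a simple one-degree-of-freedom chi-square approximation rather than whatever test the PLINK filter applies.

```python
import math


def variant_qc(n_ref_hom, n_het, n_alt_hom, n_missing,
               maf_min=0.05, miss_max=0.10, hwe_p_min=1e-6):
    """Return (passes, maf, missing_rate, hwe_p) for one biallelic variant.

    Hypothetical helper; assumes at least one called genotype and a
    polymorphic site. Thresholds mirror the text as interpreted above.
    """
    called = n_ref_hom + n_het + n_alt_hom
    missing_rate = n_missing / (called + n_missing)
    alt_freq = (2 * n_alt_hom + n_het) / (2 * called)
    maf = min(alt_freq, 1.0 - alt_freq)
    # Expected genotype counts under Hardy-Weinberg proportions.
    p, q = 1.0 - alt_freq, alt_freq
    expected = (called * p * p, 2 * called * p * q, called * q * q)
    observed = (n_ref_hom, n_het, n_alt_hom)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    hwe_p = math.erfc(math.sqrt(chi2 / 2.0))
    passes = maf > maf_min and missing_rate <= miss_max and hwe_p > hwe_p_min
    return passes, maf, missing_rate, hwe_p


# Toy genotype counts (invented for illustration, not UK Biobank data).
common_variant = variant_qc(640, 320, 40, 0)      # MAF 0.2, in HWE
rare_variant = variant_qc(980, 20, 0, 0)          # MAF 0.01, fails MAF filter
no_heterozygotes = variant_qc(500, 0, 500, 0)     # strong HWE deviation
poorly_called = variant_qc(640, 320, 40, 200)     # about 17% missing
```

Each filter is evaluated independently, so the returned tuple shows which criterion a variant fails as well as whether it passes overall.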
Step 1 of association analysis. The first step tested the association between COVID-19 and PIEZO1 without adjusting for other underlying disorders. Two general models were fitted using logistic regression implemented in PLINK v2.0 (whole genome data analysis toolset) 55 . The first general model used all individuals in the first cohort tested for COVID-19 (n = 1409), while the second model was limited to those individuals who self-reported as British (n = 1158). The COVID-19 status was used as a categorical variable in the logistic regression analysis. To account for population structure, the principal components were calculated for the two cohorts using "--pca approx" implemented in PLINK 2.0 and used as covariates in the logistic regression. The duration of moderate activity was obtained from the UK Biobank (Field-ID 894) and included as a covariate since PIEZO1 acts as an exercise sensor, mediating optimised redistribution of blood flow to sustain activity 21 . The activity level was determined using a questionnaire on duration of exercise/activity each day. Sex, age, duration of moderate activity and the first ten principal components were used as covariates.
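As an illustration of how an odds ratio per allele arises from an additive logistic model with covariates, the following is a minimal pure-Python sketch. It is not the PLINK implementation: the helper names `solve` and `fit_logistic` are hypothetical, the fit uses plain Newton-Raphson, the toy counts are invented, and only one covariate (sex) stands in for the 13 covariates used in the study.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_logistic(rows, y, iterations=25):
    """Fit logistic regression by Newton-Raphson; each row starts with 1.0 (intercept)."""
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(iterations):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for x, yi in zip(rows, y):
            eta = sum(b * xi for b, xi in zip(beta, x))
            mu = 1.0 / (1.0 + math.exp(-eta))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * x[j]
                for k in range(p):
                    hess[j][k] += w * x[j] * x[k]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


# Invented toy data: (dosage of tested allele, sex, cases, controls).
# The odds of being a case halve with each copy of the allele in both sexes.
cells = [(0, 0, 20, 20), (1, 0, 10, 20), (2, 0, 5, 20),
         (0, 1, 20, 20), (1, 1, 10, 20), (2, 1, 5, 20)]
rows, status = [], []
for dosage, sex, cases, controls in cells:
    rows += [[1.0, float(dosage), float(sex)]] * (cases + controls)
    status += [1] * cases + [0] * controls

beta = fit_logistic(rows, status)
odds_ratio_per_allele = math.exp(beta[1])  # additive model: OR per copy of A1
```

In the additive coding each individual contributes the number of copies (0, 1 or 2) of the tested allele, so exponentiating the genotype coefficient gives the multiplicative change in odds per allele copy after accounting for the covariates.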
Step 2 of association analysis. Step 2 of the analysis focused on the British population (n = 1158) with adjustment for non-cancer illnesses. The UK Biobank self-reported non-cancer illness codes (Field-ID 20002) were used to define hypertension, myocardial infarction, stroke, diabetes mellitus and arthritis status. Five logistic regression models were used to investigate the association of COVID-19 and PIEZO1 independently, each adjusted for one of the illnesses. To control for population structure, the first 10 principal components of the 1158 individuals were included as covariates. After logistic regression analysis, ANNOVAR 56 was used to annotate the variants to predict missense variants, frameshift mutations and intronic variants.
The amino acid sequences of these 16 species were aligned using Clustal Omega 58 . The resulting alignments were manually curated and trimmed using the alignment editor in MEGA 7 59 . Next, the amino acid sequences were subjected to phylogenetic analysis using RAxML 60 based on the maximum likelihood algorithm. The phylogenetic tree was generated with 100 rapid bootstrap replicates. The tree was visualised in FigTree version 1.4.3 61 and the bootstrap support values were labelled on the nodes.

Molecular modelling of PIEZO1
This methodology has been reported previously 20 .

DATA AVAILABILITY
The data analysed are available to registered users through UK Biobank.

CONFLICTS OF INTEREST