Transferability of genetic loci and polygenic scores for cardiometabolic traits in British Pakistanis and Bangladeshis

Background: Individuals with South Asian ancestry have higher risk of heart disease than other groups in Western countries; however, most genetic research has focused on European-ancestry (EUR) individuals. It is unknown whether reported genetic loci and polygenic scores (PGSs) for cardiometabolic traits are transferable to South Asians, and whether PGSs have utility in clinical settings.
Methods: Using data from 22,000 British Pakistani and Bangladeshi individuals with linked electronic health records from the Genes & Health cohort (G&H), we conducted genome-wide association studies (GWAS) and characterised the genetic architecture of coronary artery disease (CAD), body mass index (BMI), lipid biomarkers and blood pressure. We applied a new technique to assess the extent to which loci from GWAS in EUR samples were transferable. We tested how well existing findings from EUR studies performed in genetic risk prediction and Mendelian randomisation in G&H.
Results: Trans-ancestry genetic correlations between G&H and EUR samples for the tested traits were not significantly lower than 1, except for BMI (rg=0.85, p=0.02). We found evidence for transferability for the vast majority of loci from EUR discovery studies that were sufficiently powered to replicate in G&H. PGSs showed variable transferability in G&H, with the relative accuracy compared to EUR (ratio of incremental R²/AUC) ≥0.95 for HDL-C, triglycerides, and blood pressure, but lower for BMI (0.78) and CAD (0.42). We observed significant improvement in categorical net reclassification in G&H (NRI=3.9%; 95% CI 0.9-7.0) when adding a previously developed CAD PGS to clinical risk factors (QRISK3). We used transferable loci as genetic instruments in trans-ancestry Mendelian randomisation and found evidence of an increased CAD risk for higher LDL-C and BMI, and for lower HDL-C in G&H, consistent with our findings for EUR samples.
Conclusions: The genetic loci for CAD and its risk factors are largely transferable from EUR studies to British Pakistanis and Bangladeshis, whereas the transferability of PGSs varies greatly between traits. Our analyses suggest clinical utility for addition of PGS to existing clinical risk prediction tools for this population.


Introduction
Individuals with South Asian ancestry (SAS) account for more than a fifth of the global population and experience a higher risk of coronary artery disease (CAD) than other ancestries. For example, British South Asians have three- to four-fold higher CAD risk than White British people [1]. Understanding the determinants of excess CAD burden in SAS populations and improving prediction to enable preventive interventions represent important public health priorities.

Common genetic variation is an important determinant of CAD and of upstream risk factors such as blood pressure, lipids, and body mass index (BMI). The genetic component of disease risk can be harnessed to identify underlying disease genes and pathways, to estimate the unconfounded effects of risk factors by Mendelian randomisation (MR), and to improve risk prediction through the application of polygenic scores (PGS). However, the genetic basis of CAD risk is not well characterised in SAS populations because genome-wide association studies (GWAS) have been mostly limited to European-ancestry (EUR) populations [2].

Fundamental questions remain about the extent to which the genetic determinants of cardiometabolic traits are shared by EUR and SAS populations. These have important implications for translational applications of genetic data, such as causal inference with MR, which could prioritise different prevention strategies or drug targets between ancestries, and clinical risk prediction. Whilst the predictive performance of PGSs derived from EUR populations in non-EUR individuals decreases with genetic distance [3][4][5][6], the extent to which this attenuation is due to genetic drift (differences in linkage disequilibrium and allele frequency [7]) versus heterogeneity of causal genetic effects remains unclear.
Furthermore, the potential clinical utility of a CAD PGS in a real-world healthcare system is largely unknown, since previous studies have mostly examined research cohorts composed of volunteers who are healthier and wealthier than average (e.g. UK Biobank) [8][9][10][11][12].

medRxiv preprint doi: https://doi.org/10.1101/2021.06.22.21259323; this version posted June 24, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

[...] follows: corrected LDL-C = uncorrected LDL-C + 0.2*adjusted TC. LDL-C/0.7 was used for 32 individuals for whom we could not find a TC measurement on the same date. Rank-based inverse normal transformation was applied to the lipid levels.

We extracted the latest systolic blood pressure (SBP) and diastolic blood pressure (DBP) measurements and adjusted for blood pressure medication use by adding 15 and 10 mmHg to SBP and DBP, respectively, if the measurement coincided with any prescription date [25]. Sample sizes are shown in Table 1 (all individuals) and Table S2 (unrelated).

To calculate a standard clinical risk score to compare with the PGS, we calculated the QRISK3 10-year predicted risk for CAD [26] in G&H using the R package "QRISK3" v0.3.0 [27]. QRISK3 was calculated based on the data available up until 1 January 2010, which is about 10 years prior to the latest data extraction. We excluded about one third of CAD cases whose diagnosis was made earlier than this assessment date (prevalent cases) and used incident cases who developed CAD later. Follow-up varied for cases and was fixed at 10 years for controls. We used clinical data that were extracted earlier than the assessment date (1 January 2010) to calculate QRISK3. The QRISK3 algorithm has variables that indicate whether a patient has a variety of other diseases, and these were defined using the codes shown in Table S3, following [10]. Medication use (hypertension treatment, corticosteroid, and atypical antipsychotic medication) was defined as two or more prescriptions, with the most recent one having been issued within 28 days prior to the assessment. We used the most recent measurements taken prior to the assessment date, and kept individuals with at least three non-missing measurements out of four (height, weight, SBP, and TC). The pattern of missingness is shown in Figure S3. Townsend index was not available in G&H, so we used the mean value (3.307) of the lowest two quintiles from the 2011 census data in the UK [28]. HDL-C levels were all measured later than 2010 in G&H, so for TC/HDL-C ratio, we used 3.905 and 4.882 (averages calculated using later data) for females and males, respectively. To deal with missing data, we applied multiple imputation which accounts for sex, age, and genetically-defined ancestry (Bangladeshi versus Pakistani; identified using PCA-UMAP), using the R package "mice" v3.13.0 to impute height, weight, SBP, SD of SBP measurements within 2 years, and smoking status.
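The blood pressure medication adjustment and the rank-based inverse normal transformation described earlier in this section can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, and the rank offset of 0.5 is an assumption, since the text does not state which offset was used.

```python
from statistics import NormalDist


def adjust_bp(sbp, dbp, on_bp_medication):
    """Add 15 mmHg to SBP and 10 mmHg to DBP when the reading coincides
    with a blood pressure prescription; otherwise return the raw values."""
    if on_bp_medication:
        return sbp + 15.0, dbp + 10.0
    return float(sbp), float(dbp)


def rank_inverse_normal(values, offset=0.5):
    """Rank-based inverse normal transformation.

    Tied values receive their average rank. The offset (0.5 here) is an
    assumption; other common choices such as 3/8 give similar results.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # mean of the 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    nd = NormalDist()
    return [nd.inv_cdf((r - offset) / (n - 2.0 * offset + 1.0)) for r in ranks]
```

For example, `adjust_bp(130, 80, True)` returns `(145.0, 90.0)`, and the transformed values depend only on the ranks of the input measurements.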

Phenotype definitions from electronic health-record data in eMERGE
Phenotype data in eMERGE were downloaded from dbGaP (phs001584.v1.p1, phs000888.v1.p1, phs001584.v2.p2). Individuals younger than 16 years old were excluded. BMI was provided and we took the median value from adult measurements. Lipid and blood pressure measurements were taken from dataset phs000888.v1.p1. Data on medications affecting lipid and BP measurements were not available, so the highest measurements for LDL-C, TC, SBP, and DBP were used when comparing PGSs with G&H in order to minimise the effects of medications. CAD was ascertained using ICD9/10 codes which were available in the updated eMERGE Phase III dataset (phs001584.v2.p2). Coronary artery disease (CAD) cases and controls were defined based on secondary care ICD10 codes as described above for G&H (Table S1).

Genome-wide association analyses in Genes & Health
GWAS was performed with SAIGE [29] and adjusted for age, age², sex and the first twenty principal components. For total cholesterol and LDL-C, adjustments were made for use of statins as described above. We followed the QC procedure in [30] [...]

Heritability and trans-ancestry correlations
Datasets that were used in analyses are provided in Table S4. We used GCTA to estimate SNP heritability in G&H and eMERGE [31]. We excluded one sample in each pair of 3rd-degree relatives (kinship coefficient >0.0442 calculated using KING v2.2.4 [14]). We used SNPs with
INFO >0.9 and MAF >0.01 in each cohort separately. We also calculated SNP heritability using the intersection of these SNP sets in both cohorts. For CAD, we estimated SNP heritability on the liability scale using 6.7% as the prevalence estimate in the US [32], and 3.33% for the UK background population from which G&H is sampled, defined as all people from South Asian ethnicities (N=255,066 aged ≥20 years) registered with a primary health physician/GP in four east London boroughs.

For the genetic correlation analyses, we used GWAS summary statistics generated in EUR individuals from UK Biobank (UKBB), since we needed a larger sample size of ancestrally homogeneous individuals than is available through eMERGE to obtain accurate estimates. We used Popcorn (https://github.com/brielin/Popcorn) to estimate the trans-ancestry genetic correlations between G&H and UKBB EUR individuals while accounting for differences in LD structure [33] (i.e. the correlation of causal-variant effect sizes across the genome at SNPs common to both populations). Variant LD scores were estimated for ancestry-matched 1000 Genomes v3 data for each study combination (i.e. SAS-EUR). The estimation of LD scores failed for chromosome 6 for some groups, so we left out the major histocompatibility complex (MHC) region (positions 28,477,797 to 33,448,354) [...]

Previous studies that evaluated reproducibility of GWAS loci in SAS individuals did not formally account for differences in power or LD patterns [34-36]. We assessed whether established trait-associated loci were reproducible in G&H by performing a lookup of loci identified in non-SAS ancestry GWAS (Table S4). Credible sets for established loci were generated and consisted of the lead (independent) variant plus proxy SNPs (r² ≥0.8) within a 50kb window (based on the EUR 1000 Genomes data) of the sentinel variant and with p-value <100 × p_sentinel. The locus was defined as being 'transferable' if at least one variant from the credible set was associated at p <0.05 with the relevant trait in G&H, and the direction of effect matched in both datasets. For loci harbouring multiple signals, we only kept the most strongly associated variant (i.e. smallest p-value). Expected power for replication was calculated using alpha=0.05, the effect size estimated in the EUR GWAS, and the allele frequency of the variant and sample size in G&H. The power of lead variants per locus was summed and divided by the number of loci to give an estimate of the expected proportion of significant loci per trait, which was compared with the observed proportion of such loci; to our knowledge, this is a novel approach for assessing reproducibility of GWAS findings. Loci were only deemed to be 'non-transferable' if they contained at least one variant in the credible set with >80% power and yet none of the variants in the credible set had p <0.05 and no variant within 50kb of the locus had p <1×10⁻³ in G&H. LocusZoom (http://locuszoom.org/) was used to create regional association plots.

Trans-ancestry colocalisation
We used the Trans-ethnic colocalisation method (TEColoc) (https://github.com/KarolineKuchenbaecker/TEColoc) [37], which tests whether a specific locus has the same causal variant in two groups with different ancestry, and applied it to G&H and UKBB EUR individuals. This method adopts the joint likelihood mapping (JLIM) statistic developed by Chun and colleagues [38] that estimates the posterior probabilities for colocalisation between GWAS signals and compares them to probabilities of distinct causal variants while explicitly accounting for LD structure. For this, LD scores were estimated using a subset of samples from the 1000 Genomes Project v3 that had matching ancestry to all Europeans for UK Biobank. For G&H we used raw genotype data and LD was estimated directly for these samples.
JLIM assumes only one causal variant within a region in each study. We therefore used small windows of 50kb for each known locus to minimise the risk of interference from additional association signals. Distinct causal variants were defined by separation in LD space by r² ≥0.8 from each other. We excluded loci where the overlap between UKBB and G&H was <10 SNPs and the proportion of well-imputed SNPs overlapping between cohorts (SNP coverage) was <10%; this left no loci to consider for CAD, SBP and DBP. We used a significance threshold of p <0.05 to determine evidence of sharing. LocusZoom (http://locuszoom.org/) was used to create regional association plots.

Construction of polygenic risk scores
We evaluated the performance of PGSs in G&H and eMERGE. We first assessed PGSs that were previously constructed (mostly optimised in EUR samples) from the PGS Catalog [39]. We restricted to 7,353,388 bi-allelic SNPs that had INFO ≥0.3 and MAF ≥0.1% in both eMERGE and G&H. Variant information in existing PGSs was harmonised to GRCh37 using dbSNP mappings from Ensembl Variation and liftover. We calculated PGSs as weighted sums of imputed allele dosages using the plink2.0 --score function. There were often multiple PGSs that were previously developed from different studies available for each trait, and below we report the one that had the highest accuracy in each cohort. The best PGS (defined as described in the next section of the Methods) for BMI was derived from GWAS conducted in primarily EUR samples and optimised in EUR individuals, and those for lipids and BP contained genome-wide significant variants identified in EUR GWASs. We selected different PGSs for CAD in eMERGE and G&H, with the former optimised in EUR individuals and the latter in SAS individuals; in both cases these were based on GWAS conducted in primarily EUR samples. The details of each PGS are in Table S5.

Next we calculated PGSs using the clumping and p-value thresholding method (C+T) and optimised PGSs in G&H and eMERGE separately. We used GWAS summary data from primarily EUR samples (Table S4).
We used LD estimated using EUR samples (N=503) from the 1000 Genomes project for clumping using PRSice2 v2.2.11 [40]. We calculated multiple scores using combinations of various LD r² thresholds (0.1, 0.2, 0.5, 0.8) and p-value thresholds (5×10⁻⁸, 1×10⁻⁷, 5×10⁻⁷, 1×10⁻⁶, 5×10⁻⁶, 1×10⁻⁵, 5×10⁻⁵, 1×10⁻⁴, 5×10⁻⁴, 0.001, 0.005, 0.01, 0.02, 0.05, 0.08, 0.1, 0.2, 0.3, 0.4, 0.5, 0.8, 1) for each trait, and reported the PGS with the best predictive performance within each target cohort.

Lastly, we calculated meta-PGSs proposed by [...] that incorporate GWAS summary data from the target populations. We downloaded GWAS summary data that were generated in SAS samples of the UKBB from the Pan-UK Biobank website (https://pan.ukbb.broadinstitute.org), and constructed scores (PGS_SAS) using the C+T method described above and using SAS samples from the 1000 Genomes project for the LD reference. We combined the scores derived from EUR GWASs (PGS_EUR) and PGS_SAS in linear regression to construct meta-PGSs.

Assessment of PGS accuracy and clinical performance
We excluded one sample in each pair of 2nd-degree relatives (kinship coefficient >0.0884 calculated using KING v2.2.4 [14]). Individuals with the highest number of relatives (and controls, if the trait is binary) were removed first. Sample sizes for each trait are in Table S2. Quantitative traits were inverse normal transformed. Age at recruitment was used as a covariate for analysis of disease status, and age at measurement for analysis of quantitative traits. PGSs were standardised to a mean of 0 and SD of 1. We fitted the following two models: (1) the full model, which had the PGS and covariates, namely sex, age, age², and the first 10 genetic PCs, and (2) the reference model, which accounted for the covariates only. For continuous risk factors, linear regression was fitted, and the gain in R² when adding PGS as an additional predictor, or incremental R², was calculated as the difference between the R² of the full model and the reference model. Logistic regression was used to assess the associations between PGSs and CAD.
The area under the receiver operating characteristic curve (AUC) was estimated for both models with the R package "pROC" v1.16.2 and incremental AUC was calculated similarly. We performed bootstrap resampling of individuals 1,000 times to estimate the 95% confidence intervals for incremental R² and incremental AUC. The best PGS per trait was the one with the highest incremental R² for continuous risk factors and the one with the highest incremental AUC for CAD. We estimated the effect size (or odds ratio for binary traits) per SD of PGS from the full model. Effect sizes or odds ratios for quintiles, and for the top 10% versus the middle 40-60%, were reported as well. Relative accuracy was calculated as the ratio of incremental AUC (or incremental R² for continuous traits) in G&H to that in eMERGE.

QRISK3 scores were calculated for 8,112 unrelated individuals as described above (420 CAD cases and 7,702 controls). To integrate QRISK3 scores with PGS for CAD, we followed Riveros-Mckay et al. [10] and calculated an integrated score by multiplying the odds converted from the QRISK3 score with the odds ratio given an individual's PGS, where the odds ratio per SD of PGS was estimated using a logistic regression in which QRISK3 and their interaction were accounted for. The logistic regression was performed in males and females separately. We used the most accurate PGS for CAD in SAS from the PGS Catalog, which was developed by Wang et al. [42]; this score was derived from EUR GWAS using LDpred and tuned in SAS individuals in UKBB. We regressed out 10 PCs from the PGS, and used the scaled residuals in the Cox regression analysis. Cox regression was performed using the R package "survival" v3.2-7. The concordance indices (C-indices) of the following models were compared: (1) age at assessment + gender, (2) PGS + age at assessment + gender, (3) QRISK3, and (4) the integrated score. We calculated the continuous net reclassification index (NRI) and categorical NRI (using 10% as the threshold to classify high-risk individuals) for the integrated score compared to QRISK3 alone.
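The integrated score described in this paragraph (QRISK3 odds multiplied by the PGS odds ratio) can be sketched as below. This is an illustrative reading of the description only: the function name and arguments are hypothetical, the odds ratio per SD is taken as a given input rather than estimated, and the sex-specific fitting and the QRISK3-by-PGS interaction term are not modelled.

```python
def integrated_risk(qrisk3_prob, pgs_z, or_per_sd):
    """Combine a QRISK3 10-year risk with a standardised PGS.

    qrisk3_prob : QRISK3 predicted risk as a probability in (0, 1)
    pgs_z       : PGS in SD units (mean 0, SD 1)
    or_per_sd   : odds ratio per SD of PGS (assumed to be estimated elsewhere)
    """
    if not 0.0 < qrisk3_prob < 1.0:
        raise ValueError("qrisk3_prob must lie strictly between 0 and 1")
    odds = qrisk3_prob / (1.0 - qrisk3_prob)     # probability -> odds
    combined_odds = odds * or_per_sd ** pgs_z    # scale by the PGS odds ratio
    return combined_odds / (1.0 + combined_odds)  # odds -> probability


def is_high_risk(prob, threshold=0.10):
    """Classify using the 10% threshold applied for the categorical NRI."""
    return prob >= threshold
```

Under these assumptions, an individual with a QRISK3 risk just under 10% and a PGS one SD above the mean would move across the 10% threshold whenever the odds ratio per SD exceeds 1.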
NRI was calculated as the sum of NRI for cases and NRI for controls (noncases):

\[ \mathrm{NRI} = \left[ P(\mathrm{up} \mid \mathrm{case}) - P(\mathrm{down} \mid \mathrm{case}) \right] + \left[ P(\mathrm{down} \mid \mathrm{control}) - P(\mathrm{up} \mid \mathrm{control}) \right] \]

For continuous NRI, P(up|case) and P(down|case) indicate the proportions of cases that had higher or lower risk estimates using the integrated score, respectively. For categorical NRI, P(up|case) indicates the proportion of cases that were reclassified as high-risk individuals (i.e. with <10% risk by QRISK3 but >10% by the integrated scores). We calculated NRI in two age groups (25-54 versus 55-84 years old at baseline, chosen since the average age of onset in this cohort was 55.3 years old), as well as in age-by-gender subgroups. Bootstrap resampling (1,000 times) was used to estimate confidence intervals for NRI.

Mendelian randomisation analysis
We modelled liability to CAD as our outcome within a two-sample Mendelian randomisation (MR) [43] framework using the risk factors (BMI, SBP, DBP, LDL-C, HDL-C, TG) as exposures. To identify genetic instruments for the exposure, we explored three alternative approaches: (a) established loci significant at p<5×10⁻⁸ in the original EUR GWAS; (b) transferable loci defined as described above, taking the effect size from the original EUR GWAS; and (c) loci significant at p<5×10⁻⁸ in the SAS ancestry group of the Pan-UKBB GWAS, LD-clumped to r² <0.2 with an LD window of 50kb, based on the SAS 1000 Genomes project LD reference. Where insufficient genome-wide significant instruments were identified, we used a more permissive p-value threshold of p<5×10⁻⁵ for instrument selection in UKBB SAS. The primary MR analysis was performed using, as outcome, summary association data from the G&H CAD GWAS performed as described above, using the inverse-variance weighted method under a random effect model, implemented with the TwoSampleMR R package [44]. For comparison, a two-sample MR approach was also performed using summary data for CAD from eMERGE and established loci significant at p<5×10⁻⁸ in the original EUR GWAS. We also undertook several sensitivity analyses. In brief, we evaluated the MR-Egger intercept to assess directional pleiotropy and Cochran's Q statistic [45] as an indicator of heterogeneity. MR analyses using the weighted median [46] and weighted mode [47] methods were additionally performed in the presence of heterogeneity.
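As an illustration of the inverse-variance weighted (IVW) estimator and Cochran's Q statistic mentioned above, the following sketch works directly from per-variant summary statistics. It is not the TwoSampleMR implementation: the function name is hypothetical, and the random-effects standard error is obtained by the common multiplicative approach (inflating the fixed-effect standard error when Q exceeds its degrees of freedom), which is an assumption about the model variant used.

```python
import math


def ivw_mr(beta_exp, beta_out, se_out):
    """IVW causal estimate from variant-exposure and variant-outcome effects.

    Equivalent to a weighted regression of outcome effects on exposure
    effects through the origin, with weights 1 / se_out**2.
    Returns (estimate, random-effects SE, Cochran's Q).
    """
    n = len(beta_exp)
    if not (n == len(beta_out) == len(se_out)) or n < 2:
        raise ValueError("need at least two variants with matching inputs")
    w = [1.0 / s ** 2 for s in se_out]
    sxx = sum(wi * bx * bx for wi, bx in zip(w, beta_exp))
    sxy = sum(wi * bx * by for wi, bx, by in zip(w, beta_exp, beta_out))
    estimate = sxy / sxx
    # Cochran's Q: weighted squared deviations from the fitted line.
    q = sum(wi * (by - estimate * bx) ** 2
            for wi, bx, by in zip(w, beta_exp, beta_out))
    se_fixed = math.sqrt(1.0 / sxx)
    # Multiplicative random effects: never shrink below the fixed-effect SE.
    inflation = max(1.0, q / (n - 1))
    return estimate, se_fixed * math.sqrt(inflation), q
```

A large Q relative to its degrees of freedom (number of variants minus one) suggests heterogeneity between instruments, which is the situation in which the sensitivity analyses above were applied.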

In G&H, 4.9% (N=1,110) of the individuals had coronary artery disease (CAD), with the age of onset ranging from 17 to 97 years old (median 55). A quarter of the G&H participants were on active statin prescriptions, 23% on BP medications, 29% had high TC levels (>5 mmol/L), and 30% had high LDL-C levels (>3 mmol/L; Table S6) [48]. Datasets that were used in each analysis are provided in Table S4.

Shared genetic architecture of cardiometabolic traits
We compared the genetic architecture of coronary artery disease (CAD) and upstream risk factors, namely HDL-C, LDL-C, triglycerides (TG), total cholesterol (TC), systolic and diastolic blood pressure (SBP & DBP), between British Pakistanis and Bangladeshis (BPB) from G&H, and European-ancestry populations (EUR) (Figure 1). We used EUR individuals from the EHR-based eMERGE cohort to estimate heritability, since phenotypes had been ascertained in a similar way to G&H (i.e. EHRs). All traits were found to have significant SNP heritability (h² = 0.03-0.23) in G&H, with estimates similar to those in eMERGE [...]

[...] Table S4.

High transferability of cardiometabolic loci
We assessed whether published trait-associated genomic loci identified in predominantly EUR populations were shared by the BPB population represented by G&H. To account for differences in LD patterns, our assessment of transferability was based on the credible sets of variants per locus, likely to contain the causal variant, rather than the sentinel variants alone. Low numbers of transferable loci may be due to limited statistical power rather than lack of causal variant sharing. Therefore, we compared the number of observed transferable loci with the number expected given the sample size and allele frequency in G&H if all causal variants were shared. The proportion of loci expected to be transferable varied widely between traits (e.g. we expected to be able to detect significant associations for 56% of HDL-C loci but only for 18% of SBP loci), highlighting the importance of accounting for power when assessing transferability. Across most traits examined, the observed number of transferable loci closely matched the number we expected (Table 1 and Table S8). For example, for BMI we expected to be able to find evidence for transferability for 20% of loci and we did indeed observe transferability for 21% of loci. The exception was CAD, for which the proportion of observed transferable loci (13%) was below the expected proportion (21%), although this difference was only marginally significant (binomial p-value = 0.05).
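The power-adjusted comparison described above can be illustrated with a short sketch. The power formula below is a standard approximation for a standardised quantitative trait and is an assumption, since the text specifies only the inputs (alpha=0.05, EUR effect size, G&H allele frequency and sample size); a binary trait such as CAD would need a case-control formulation. All function names are hypothetical.

```python
import math
from statistics import NormalDist

_ND = NormalDist()


def replication_power(beta, maf, n, alpha=0.05):
    """Approximate two-sided power to replicate a variant association.

    Assumes a quantitative trait with unit variance, so the non-centrality
    parameter is n * 2 * maf * (1 - maf) * beta**2.
    """
    ncp = n * 2.0 * maf * (1.0 - maf) * beta ** 2
    z_crit = _ND.inv_cdf(1.0 - alpha / 2.0)
    shift = math.sqrt(ncp)
    return _ND.cdf(shift - z_crit) + _ND.cdf(-shift - z_crit)


def expected_proportion(powers):
    """Mean power across lead variants: expected share of replicating loci."""
    return sum(powers) / len(powers)


def binomial_two_sided_p(k, n, p):
    """Exact two-sided binomial p-value for k successes in n trials, summing
    the probabilities of all outcomes no more likely than the observed one."""
    pmf = [math.comb(n, i) * p ** i * (1.0 - p) ** (n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1.0 + 1e-7)  # small tolerance for floating-point ties
    return min(1.0, sum(x for x in pmf if x <= cutoff))
```

In use, `expected_proportion` would be applied to the per-locus powers for a trait, and `binomial_two_sided_p` to the observed count of transferable loci with that expected proportion as the success probability.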

We also assessed whether there were any specific loci that were not transferable despite being well powered to observe an association (power >80%). Out of a total of 184 well-powered loci tested across all traits, only nine were non-transferable; that is, no variant in the credible set was significant at p<0.05 and no variant within 50kb of the locus was significant at p<1×10⁻³ (Figure S4). These nine loci were all associated with lipid traits: EVI5, NBEAL1, GPAM, CETP, STAB1, TTC39B, SH2B3, ACP2 and NECAP2 (Table 2). Of these loci, CETP, which was previously associated with LDL-C levels in Europeans (established variant in Europeans: rs7499892), was strongly associated with HDL-C in G&H (p=7.08×10⁻⁵⁶), but not with LDL-C levels (p=0.23) (Figure S5) [...]

Even when there are associations in the same region in two ancestry groups, it is possible that they are driven by different causal variants, as previously seen [49]. To assess the extent of sharing of causal variants between ancestries at previously reported loci with evidence of transferability, we applied trans-ancestry colocalisation for G&H with UKBB EUR samples as the reference. We found evidence for the most extensive sharing of causal variants for transferable lipid loci: total cholesterol (61% of loci had significant colocalisation), followed by TG (56%), HDL-C (48%) and LDL-C (47%) (Table 1). For BMI we found evidence for sharing of causal variants for only 26% of transferable loci assessed (Table 1 and Table S9). Causal variants in major lipid loci such as PCSK9 were among variants that were consistently not shared (pJLIM>0.05) between the two populations (Figure S6 and Table S9).

Variable transferability of polygenic scores
Polygenic scores (PGSs) for CAD have been shown to have predictive value over risk scores based on clinical factors alone [10,11,42,50-54]. To assess the transferability of PGSs for cardiometabolic traits derived from EUR populations into BPB individuals, we compared predictive performance in G&H to that in EUR individuals from eMERGE. We quantified predictive accuracy using the "incremental AUC" statistic for CAD and the "incremental R²" statistic for continuous risk factor traits; these are the gain in AUC or R² when adding the PGS to the regression of phenotype on the baseline covariates (sex, age and genetic PCs).

We first evaluated the previously published PGSs from the PGS Catalog (Table S5). The PGSs for risk factors were developed using data from primarily EUR individuals, and the CAD PGSs that proved to have the best performance in G&H and eMERGE were two different scores optimised in SAS [42] and EUR samples [53], respectively. PGSs for all traits assessed were significant predictors in G&H (Table S5, Figure 3A). For prediction in G&H, the incremental R² for BP was low (~1.8%), but it was higher for lipids and BMI, ranging from 3.9% to 6.7%.
Relative accuracy of the PGSs, defined as the ratio of incremental AUC or R² in G&H to that in eMERGE, was close to 1 for HDL-C, TG, SBP and DBP, and lower for CAD (42%, 95% CI: 30%-59%) and BMI (78%, 95% CI: 68%-88%; Figure 3B). Amongst the risk factors, prediction of LDL-C had the lowest relative accuracy (66%, 95% CI: 53%-79%), probably because we did not adjust for statin usage since medication data were not
available in eMERGE, and BPB individuals were more likely to be treated with statins [55]. Incremental R² for the PGS for LDL-C increased from 3.9% (3.3%-4.5%) to 6.2% (5.3%-7.1%) when using statin-adjusted LDL-C in G&H (Table S5, Figure 3A), although the heritability was not significantly different (Figure 2A). In a sensitivity analysis, the relative accuracy of the CAD PGS in G&H versus eMERGE was consistent when defining CAD based on diagnostic codes only, rather than with the inclusion of procedure codes in the G&H definition (Table S5).
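The incremental R² and the relative accuracy defined above can be illustrated with a minimal sketch. It uses simulated data and plain ordinary least squares; the covariates, effect sizes and function names (`r_squared`, `incremental_r2`, `simulate_cohort`) are assumptions made for illustration and are not taken from the authors' analysis code. For a binary trait such as CAD, the same pattern would apply with the AUC of a logistic model in place of R².

```python
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def r_squared(y, columns):
    """R^2 of an OLS fit of y on an intercept plus the given predictor columns."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * y[i] for i, row in enumerate(design)) for a in range(p)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def incremental_r2(y, covariates, pgs):
    """Gain in R^2 when the PGS is added to the baseline covariates."""
    return r_squared(y, covariates + [pgs]) - r_squared(y, covariates)


def simulate_cohort(n, pgs_effect, seed):
    """Hypothetical cohort: age, sex and a standardised PGS (all simulated)."""
    rng = random.Random(seed)
    age = [rng.uniform(25, 85) for _ in range(n)]
    sex = [float(rng.randint(0, 1)) for _ in range(n)]
    pgs = [rng.gauss(0, 1) for _ in range(n)]
    y = [0.02 * a + 0.3 * s + pgs_effect * g + rng.gauss(0, 1)
         for a, s, g in zip(age, sex, pgs)]
    return y, [age, sex], pgs


# Assumed effect sizes: the PGS is weaker in the target cohort than in the reference.
y_ref, cov_ref, pgs_ref = simulate_cohort(2000, pgs_effect=0.30, seed=1)
y_tgt, cov_tgt, pgs_tgt = simulate_cohort(2000, pgs_effect=0.20, seed=2)

inc_ref = incremental_r2(y_ref, cov_ref, pgs_ref)
inc_tgt = incremental_r2(y_tgt, cov_tgt, pgs_tgt)
relative_accuracy = inc_tgt / inc_ref  # ratio of target to reference incremental R^2
print(f"incremental R2: reference={inc_ref:.3f}, target={inc_tgt:.3f}, "
      f"relative accuracy={relative_accuracy:.2f}")
```

Genetic principal components are omitted from the simulated covariates for brevity; they would enter as additional columns in `covariates`.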

[Figure caption, truncated: ... calculated for coronary artery disease (CAD), and incremental R² was calculated for its continuous risk ...]

To assess whether the performance of PGS based on EUR GWAS could be improved in BPB, we next constructed PGS using the clumping and P-value thresholding (C+T) method and optimised them separately within G&H and eMERGE. The numbers of SNPs in the best C+T PGSs were similar between eMERGE and G&H, and PGSs for lipids contained fewer SNPs (194 to 454) than other traits (>20,000; Table S10, Figure S7). C+T PGSs and PGSs from the PGS Catalog showed similar performance in G&H across traits, although they were optimised in different ancestry populations (BPB and primarily EUR, respectively; Figure 3A). For BMI, triglycerides and HDL-C, we observed slightly larger differences in predictive accuracies between G&H and eMERGE for C+T PGSs than observed with the PGS Catalog scores (Figure 3B).
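The C+T construction can be sketched as follows. This is a toy illustration and not the pipeline used in the study: the SNP identifiers, effect sizes, LD values and the thresholds (`p_threshold`, `r2_threshold`, `window_bp`) are all hypothetical, and in practice the thresholds would be tuned within each target cohort as described above.

```python
def clump_and_threshold(snps, ld_r2, p_threshold, r2_threshold=0.1, window_bp=250_000):
    """Greedy C+T: walk SNPs from most to least significant and keep a SNP if it
    passes the p-value threshold and is not in LD with an already kept SNP."""
    kept = []
    for snp in sorted(snps, key=lambda s: s["p"]):
        if snp["p"] > p_threshold:
            break  # all remaining SNPs are even less significant
        in_ld_with_kept = any(
            k["chrom"] == snp["chrom"]
            and abs(k["pos"] - snp["pos"]) <= window_bp
            and ld_r2.get(frozenset((k["id"], snp["id"])), 0.0) > r2_threshold
            for k in kept
        )
        if not in_ld_with_kept:
            kept.append(snp)
    return kept


def polygenic_score(dosages, kept_snps):
    """Weighted sum of effect-allele dosages (0-2) over the retained SNPs."""
    return sum(s["beta"] * dosages.get(s["id"], 0.0) for s in kept_snps)


# Hypothetical GWAS summary statistics (identifiers and values are made up).
gwas = [
    {"id": "snpA", "chrom": 1, "pos": 1_000_000, "beta": 0.20, "p": 1e-10},
    {"id": "snpB", "chrom": 1, "pos": 1_050_000, "beta": 0.18, "p": 1e-9},
    {"id": "snpC", "chrom": 2, "pos": 500_000, "beta": -0.10, "p": 1e-8},
    {"id": "snpD", "chrom": 2, "pos": 9_000_000, "beta": 0.05, "p": 0.01},
]
# Pairwise LD (r^2) from a reference panel; pairs not listed are treated as r^2 = 0.
ld = {frozenset(("snpA", "snpB")): 0.8}

kept = clump_and_threshold(gwas, ld, p_threshold=5e-8)
person = {"snpA": 2, "snpB": 1, "snpC": 1, "snpD": 0}
print([s["id"] for s in kept], round(polygenic_score(person, kept), 3))
```

Relaxing `p_threshold` admits more SNPs into the score, which is one way the number of SNPs in the best-performing score can differ widely between traits.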

We then assessed whether PGS methods that account for ancestry differences improved predictive accuracy in G&H. PGSs were constructed using a meta-score strategy [41], combining the EUR-derived PGS (described above) and that from UKBB SAS samples. The improvement
in accuracy was modest (5-11%) (Figure S8). This may be due to the small sample sizes in the UKBB SAS GWASs.

[...] Table S5). Individuals in the top quintile of PGS were predicted to have a 2.2-fold increase (95% CI: 1.78-2.76) in disease risk relative to the middle quintile (quintiles were determined in controls; Figure 3D). We investigated the additional predictive power of PGS on top of established clinical risk factors for CAD, and the net reclassification improvement (NRI) achieved by adding the PGS to a clinical risk score.

To calculate the clinical risk score, we used the QRISK3 algorithm to estimate 10-year risk of cardiovascular disease at a baseline time point, selected so that the participants in G&H had about 10 years of follow-up. QRISK3 was a strong predictor of CAD events and had a concordance index (C-index) of 0.843 (95% CI: 0.828-0.858; Figure 4A, Table S11). Consistent with previous findings in EUR individuals [10], the CAD PGS was uncorrelated with QRISK3 (Pearson's correlation coefficient r=-0.0056 and p-value=0.62). We followed [...] to construct an integrated score combining QRISK3 and the CAD PGS. The integrated score had a non-significant improvement in the C-index (0.853, 95% CI: 0.838-0.867). However, compared with QRISK3 alone, the integrated score showed significant improvement in reclassification (categorical NRI: 3.9%; 95% CI: 0.9%-7.0%) using a 10-year risk threshold of 10% based on the threshold for preventive intervention with statin treatment recommended by the National Institute for Health and Care Excellence [56]. The integrated score
reclassified 3.2% of the population as high risk and 2.5% as low risk (Table S11). This improvement was mostly driven by the enhanced identification of CAD cases in people aged 25-54 years (NRI 7.0% in cases versus -1.2% in controls), and of controls in people aged 55-84 years (NRI 0.0% in cases versus 6.8% in controls) (Figure 4B, Table S11). QRISK3 classified most (91.4%) of the individuals aged 55-84 years as high risk. Using the integrated score, 7.6% of the individuals older than 55 years were down-classified from high to low risk (Table S11). Using continuous NRI, the integrated score showed significant improvement (27.0%; 95% CI: 17.7%-36.2%) and similar trends in age groups (Figure S9, Table S11).
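The categorical NRI reported above can be computed as in the sketch below. It assumes the standard two-category definition: the net proportion of cases moved up across the risk threshold plus the net proportion of controls moved down. The risk values and the function name `categorical_nri` are invented for illustration; only the 10% threshold comes from the text.

```python
def categorical_nri(old_risk, new_risk, is_case, threshold=0.10):
    """Two-category NRI: returns (NRI in cases, NRI in controls, total NRI).

    Individuals with predicted risk >= threshold are classed as high risk.
    """
    def net_up(indices):
        up = sum(1 for i in indices if old_risk[i] < threshold <= new_risk[i])
        down = sum(1 for i in indices if new_risk[i] < threshold <= old_risk[i])
        return (up - down) / len(indices)

    cases = [i for i, c in enumerate(is_case) if c]
    controls = [i for i, c in enumerate(is_case) if not c]
    nri_cases = net_up(cases)          # moving up is correct for cases
    nri_controls = -net_up(controls)   # moving down is correct for controls
    return nri_cases, nri_controls, nri_cases + nri_controls


# Hypothetical 10-year risks from a clinical score and from an integrated score.
clinical = [0.08, 0.12, 0.09, 0.15, 0.11, 0.05, 0.12, 0.09]
integrated = [0.11, 0.13, 0.07, 0.16, 0.08, 0.04, 0.09, 0.12]
had_event = [1, 1, 1, 1, 0, 0, 0, 0]

nri_cases, nri_controls, nri_total = categorical_nri(clinical, integrated, had_event)
print(f"NRI cases={nri_cases:.1%}, controls={nri_controls:.1%}, total={nri_total:.1%}")
```

Reporting the case and control components separately, as done in the age-stratified results, shows whether an improvement comes from up-classifying future cases or down-classifying people who remain event-free.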

Causal effects of CAD risk factors largely consistent across ancestries

We carried out two-sample Mendelian randomisation (MR) analyses to assess the causal effects of the risk factors on CAD in G&H and compared findings with EUR samples from eMERGE. For G&H, we used transferable loci as genetic instruments to benefit from the precision of large EUR discovery GWAS whilst ensuring only valid instruments are used. In eMERGE, causal effects for BMI, BP and lipids, except TG, were statistically significant (Figure 5). Consistent with this, we found that higher BMI (OR=1.73, p-value=0.01), higher LDL-C (OR=1.55, p-value=4×10⁻⁴) and lower HDL-C levels (OR=0.75, p-value=8×10⁻³) were causally associated with increased risk of CAD in G&H. The OR for LDL-C was larger than the one in eMERGE (OR=1.15) although with overlapping confidence intervals (CI: 1.03-1.29 in eMERGE, CI: 1.22-2.00 in G&H). The effects for SBP and DBP were not statistically significant in G&H. However, both had relatively small numbers of loci as instruments and confidence intervals of the effect estimates were wide.
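As a rough illustration of how odds ratios like those above arise from summary statistics, the sketch below implements a fixed-effect inverse-variance weighted (IVW) estimate together with Cochran's Q for heterogeneity. It is a generic textbook formulation under first-order weights, not the authors' implementation; the per-SNP effects and the function name `ivw_mr` are hypothetical.

```python
import math


def ivw_mr(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted MR estimate from summary statistics.

    Each instrument contributes a Wald ratio beta_outcome / beta_exposure,
    weighted by beta_exposure**2 / se_outcome**2 (first-order weights).
    """
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total_weight = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    std_error = math.sqrt(1.0 / total_weight)
    # Cochran's Q: weighted spread of the per-instrument ratios around the estimate.
    cochran_q = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))
    return estimate, std_error, cochran_q


# Hypothetical per-SNP effects: exposure in SD units, outcome as log odds of CAD.
beta_x = [0.10, 0.08, 0.12, 0.05]
beta_y = [0.045, 0.030, 0.060, 0.020]
se_y = [0.010, 0.012, 0.015, 0.011]

estimate, std_error, q = ivw_mr(beta_x, beta_y, se_y)
odds_ratio = math.exp(estimate)
ci_low = math.exp(estimate - 1.96 * std_error)
ci_high = math.exp(estimate + 1.96 * std_error)
print(f"OR per unit exposure = {odds_ratio:.2f} "
      f"(95% CI {ci_low:.2f}-{ci_high:.2f}), Q = {q:.2f}")
```

With few instruments the total weight is small and the standard error large, which is consistent with the wide confidence intervals noted for SBP and DBP.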

We also assessed different strategies for instrument selection in G&H, such as using all loci associated at genome-wide significance in EUR GWAS for the risk factors (Figure S11). When following the standard approach of using an independent ancestry-matched sample (UKBB
SAS) to derive the instruments, too few genome-wide significant instruments (p<5×10⁻⁸) were identified (Figure S10). To address this, we also tested a less stringent p-value threshold (p<5×10⁻⁵) for selecting instruments. For the lipid biomarkers, the results were consistent regardless of which loci were chosen as instruments (Figure S11). However, the association of BMI with CAD was significant only for transferable loci (Figure S11).

We found evidence of heterogeneity between causal estimates based on Cochran's Q statistic for DBP when using the established loci as instruments (p-value=0.04), LDL-C when using the UKBB SAS-ascertained loci (p-value=0.02) and HDL-C for transferable loci (p-value=1×10⁻³). However, the results of the weighted median and weighted mode models were consistent with those obtained by the inverse-variance weighted MR model (Table S12).

Discussion

We conducted the first study to systematically assess the transferability of genetic loci and PGSs for cardiometabolic traits in SAS individuals with real-world clinical data, using ~22,000 individuals from the G&H cohort. For lipids and blood pressure, we found evidence that causal genetic variants at known loci and beyond are widely shared with EUR. The prediction accuracy of PGSs derived from EUR GWASs for these traits was similar between G&H and EUR samples. However, the predictive performance of BMI and CAD PGS was reduced by 22% and 58%, respectively (for the PGS Catalog scores), in G&H, and CAD also had fewer transferable loci. A CAD PGS optimised for South Asians nonetheless yielded an appreciable improvement in risk reclassification when combined with the QRISK3 clinical risk score.

Other genetic studies of CAD and related traits that have evaluated reproducibility of established loci in SAS populations have either been limited by small sample sizes or have
restricted their comparisons to the index SNP identified in the GWAS, which does not take LD into account [34-36]. A recent study compared genetic determinants of >200 lipid metabolites in 5,000 South Asians from Pakistan and 13,000 Europeans and found high overlap in the detected associations [57]. Using a new method, our paper goes further by empirically demonstrating that, in most cases where loci do not replicate, it is due to lack of power. These findings suggest that, in large part, the genes and pathways that influence risk of CAD are shared between these ancestrally divergent populations. One surprising finding was that the major LDL-C locus at CETP was not associated with this biomarker in G&H but exhibited pleiotropic effects particularly on HDL-C. Abnormalities in CETP are linked to accelerated atherosclerosis and might play an important role in increasing risk in SAS [58].

Of those previously reported cardiometabolic loci that contained variants significantly associated in G&H, 30-74% did not show evidence of shared causal variants. This suggests that, although the genes and pathways are likely to be shared between ancestral groups, there is heterogeneity with respect to the causal alleles. BMI had the lowest proportion of transferable loci with shared causal variants as well as lower transferability of the PGS in G&H and a genetic correlation significantly lower than one. SAS individuals are known to have higher visceral fat at the same BMI compared to EUR individuals in Western countries [59,60]. Consistent with this, the causal effect of BMI was significant only when using the transferable loci as instruments in the Mendelian randomisation analysis. Visceral adiposity is a strong risk factor for cardiometabolic diseases, independent of total fat mass; these findings warrant further study and suggest that BMI may not be an optimal biomarker of adiposity in SAS [61].

Mendelian randomisation has emerged as a powerful tool to explore the causal effects of risk factors on disease outcomes. Statistical power can be the limiting factor when extending these analyses to non-EUR populations because independent ancestry-matched GWAS for risk factors of interest may not be sufficiently large. To increase power to estimate the causal effects of risk factor traits on CAD in BPB, we used genetic instruments derived from large
EUR GWAS. Some of the loci may be invalid instruments for other populations. However, restricting the established loci to the ones that were transferable in this population successfully addressed this issue for BMI and shows promise as a new approach for trans-ancestry Mendelian randomisation. An assumption that requires further study is whether the effect sizes of transferable loci are the same for each ancestry group.

We observed variable levels of PGS transferability from EUR into BPB individuals for the cardiometabolic traits that were investigated in this work, with relative accuracy in G&H versus eMERGE ranging from 131% for DBP to 42% for CAD. Consistent with previous studies [37,62], PGSs for HDL-C and triglycerides had similar predictive accuracy between the two ancestry groups. We explored the factors that may impact relative accuracy of PGSs. Based on a recently proposed theory, relative accuracy is proportional to the product of the trans-ethnic genetic correlation and the ratio of heritability estimates [7]. We considered the effect on the relative accuracy of the trans-ethnic genetic correlation, ratio of heritability estimates in G&H versus eMERGE, as well as the product of the previous two terms. However, none of them showed a significant association with the relative PGS performance (Figure S12). This may be because the theory was derived for PGSs based on genome-wide significant SNPs (whereas our PGSs include many SNPs with less significant p-values), and because the relative accuracy also depends on differences in allele frequencies and LD patterns at these SNPs between populations, which we have not factored in and may differ between traits.
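Written out, the relationship described in this paragraph takes the following form. This is a restatement of the wording above rather than the exact expression given in reference 7, and the symbols are introduced here for convenience: RA is the relative accuracy, ΔR² the incremental R² (or incremental AUC for CAD) in each cohort, r_g the trans-ethnic genetic correlation, and h² the heritability estimate in each cohort.

```latex
\mathrm{RA}
  \;=\; \frac{\Delta R^2_{\mathrm{G\&H}}}{\Delta R^2_{\mathrm{eMERGE}}}
  \;\propto\; r_g \times \frac{h^2_{\mathrm{G\&H}}}{h^2_{\mathrm{eMERGE}}}
```

The three quantities tested against relative accuracy in Figure S12 correspond to r_g, the heritability ratio, and their product on the right-hand side.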

Based on findings in lipid traits, the Global Lipids Genetics Consortium recently claimed that GWASs with high enough sample sizes could lead to PGSs with equally high accuracy across ancestry populations, even if the GWASs were conducted in predominantly EUR samples [62]. However, we do not fully agree that this claim can be generalised beyond lipid traits, since it depends on the extent to which the causal variants are shared across ancestry groups. For example, the accuracy of C+T PGS for BMI decreased by 38% in G&H, whereas that for TG decreased by only 17% and that for HDL-C did not decrease, although the sample size of the
input GWAS for BMI was much larger than that for lipids (about 700,000 versus 300,000; Table S4). This is likely due to the relatively lower fraction of shared causal variants (26%) at transferable loci for BMI and the relatively lower genetic correlation (significantly lower than 1 for BMI while close to 1 for lipids), which will not be ameliorated with larger sample sizes of Europeans. [...] there is more to be gained from more powerful EUR GWASs, even without adding samples of the target ancestry. However, increasing diversity in GWASs will greatly improve the resolution of fine-mapping and the power to identify the causal variants by leveraging the LD differences across ancestries [64].

We assessed the clinical value of the PGS for CAD on top of the traditional clinical risk factors captured in the QRISK3 algorithm. Similar work has been done previously in research cohorts [9-12]; our study represents an important addition since it captures the noise with which QRISK3 is actually measured within a real-world clinical setting (as opposed to using comprehensive measures taken for research purposes), which may affect performance of integrated risk models combining these factors with PGSs. We note that only about 4% of the ~8 million individuals used for developing QRISK3 were of South Asian ancestry [26], and the weights for each conventional risk factor might not be optimal for SAS individuals. QRISK3 was developed to predict cardiovascular disease (CVD), which is a composite outcome of CAD and stroke. However, our analysis focused on CAD, which is an important component of CVD and the
[...] [42]. The integrated score combining PGS and QRISK3 showed significant reclassification improvement against QRISK3 alone (NRI 3.9%; 95% CI: 0.9-7.0%). Previous studies in UKBB EUR samples reported similar improvement, with NRI estimates of 3.5% (95% CI: 2.4-4.5%) [10] and 3.7% (95% CI: 3.0-4.4%) [9] in two different analyses using CAD as the outcome. However, these NRI estimates are probably inflated by using UKBB samples that are healthier than the general UK population without recalibrating risk to a primary care setting [11]. In G&H, the PGS improved identification of high-risk individuals in people younger than 55 years, and correctly down-classified low-risk individuals in people older than 55 years, both of which are important in a clinical setting. We anticipate that, like EUR individuals [9-11], the British Pakistani and Bangladeshi community (and potentially other SAS populations) would also benefit from integrating PGS in primary prevention settings.

Our study has several limitations. Firstly, due to the limited sample size in each age-by-sex subgroup, we could not recalibrate risk prediction models in G&H to what would be expected in an unbiased primary care setting [11]. Secondly, while the G&H cohort has enabled us to assess the potential utility of genetics in an under-represented population using data from electronic records, each of the cohorts examined here is unique. Differences in ascertainment (including the age distribution) and clinical measurements within different cohorts and healthcare systems may have impacted the genetic associations. Ideally future studies would compare populations with different ancestries collected in the same real-world healthcare setting, but with sufficient sample sizes in each ancestry group to enable well-powered comparisons.
The BioMe biobank in New York contains individuals from multiple ancestries with linked EHR data, but the number of self-reported SAS individuals is very limited (N=622) [65].

In conclusion, our work provides the first comprehensive assessment of the transferability of cardiometabolic loci to a non-EUR population and its impact on two key applications of genetics, causal inference and risk prediction. Our protocol and our new approach for transferability can serve as methodological standards in this developing field. We have shown high transferability of GWAS loci across several cardiometabolic traits between EUR and BPB populations. The transferability of PGSs is trait-specific. Our results suggest there would be clinical value in adding PGS to conventional risk factors in the prediction of CAD in primary care settings to enable more efficient use of preventive interventions, such as lipid-lowering medications. Our investigation contributes to the increasing representation of individuals of non-European ancestry and lower socio-economic status in research studies, which we hope will help to decrease health disparities.

Acknowledgements

We thank Social Action for Health, Centre of The Cell, members of our Community Advisory Group, and staff who have recruited and collected data from volunteers. We thank the NIHR [...]

[...] Tables 1-12

[...] quantification of the accuracy of polygenic scores in ancestry divergent populations. Nat Commun. 2020;11:3865.
[...] Network, Crosslin DR. The eMERGE genotype set of 83,717 subjects imputed to ~40 million variants genome wide and association with the herpes zoster medical record phenotype. Genet Epidemiol. 2019;43:63-81.

[...] Samani NJ, Watkins H, Deloukas P. Association analyses based on false discovery rate implicate new loci for coronary artery disease. Nat Genet. 2017;49:1385-1391.

24. Liu DJ, Peloso GM, Yu H, Butterworth AS, Wang X, Mahajan A, Saleheen D, Emdin C, [...]

36. Chambers JC, Elliott P, Zabaneh D, Zhang W, Li Y, Froguel P, Balding D, Scott J, Kooner JS. Common genetic variation near MC4R is associated with waist circumference and insulin resistance. Nat Genet. 2008;40:716-718.

[...] Swaminathan K, Gupta R, Mullasari AS, Sigamani A, Kanchi M, Peterson AS, Butterworth AS, Danesh J, Di Angelantonio E, Naheed A, Inouye M, Chowdhury R, Vedam RL, Kathiresan S, Gupta R, Khera AV. Validation of a Genome-Wide Polygenic [...]

[...] South Asians be so susceptible to central obesity and its atherogenic consequences? The adipose tissue overflow hypothesis. Int J Epidemiol. 2007;36:220-225.

60. Shah AD, Kandula NR, Lin F, Allison MA, Carr J, Herrington D, Liu K, Kanaya AM. Less [...]