Abstract
Genotype imputation is crucial for GWAS, but reference panels and existing benchmarking studies prioritize European individuals. Consequently, it is unclear which publicly available reference panel should be used for Pakistani individuals, and whether the ancestry composition or the sample size of the panel matters more for imputation accuracy. We compared different reference panels for imputing genotype data in 1814 Pakistani individuals and found that meta-imputation combining the TOPMed and expanded 1000 Genomes (ex1KG) reference panels gave the best balance of accuracy and coverage. ex1KG outperformed TOPMed in imputation accuracy despite its 30-fold smaller sample size, which supports efforts to build future reference panels that include diverse populations.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
J.X. and L.M.H. are both supported by the National Institute of Mental Health grant R01MH118278. L.M.H. also acknowledges funding from NIMH (R01MH124839, RM1MH132648, R01MH125938) and National Institute of Environmental Health Sciences (R01ES033630). D.L., B.F., E.C., and A.W.C. were supported for this work by NIH R01MH109536. A.H., J.A.K., M.A., and T.B.B. are supported by NIMH grants R01MH112904 and R01MH123775. T.B.B. and G.G. were both supported by R01MH123451. G.G. was also supported for this work by NIH R01MH104964. R.E.P., T.B.B., and L.M.H. are supported by NIMH grant R01MH125938. R.E.P. also received support from the Brain & Behavior Research Foundation NARSAD grant 28632 PS Fund. The funding bodies were not involved in the study design, data collection, data analysis, interpretation of results, or writing of the manuscript.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The study was approved by the ethics committees at the University of Peshawar; the University of Health Sciences, Lahore; and the Lahore Institute of Research and Development, all in Pakistan.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
The datasets used in this study are not publicly available because they contain private patient data from Pakistan, but they are available from the corresponding author on request. The source code for the analyses is deposited on GitHub (https://github.com/xuj18/).