Abstract
Background Combining individual-level data in genetic association studies (mega-analyses) enhances statistical power for identifying gene-trait associations. However, batch effects that arise when variants genotyped on different arrays are combined pose a major limitation. Here, we developed a two-step imputation workflow to overcome this array-type bias.
Methods Genotype data of 10,647 individuals generated using five different arrays were included. Intermediate array-specific panels were generated and subsequently imputed against the 1000 Genomes Project Phase 3 reference panel. Genetic principal component (PC) analysis was used to assess batch effects in the cohort-combined imputed data. The workflow's performance was evaluated by comparing the imputation quality (r²) and allele frequency differences of the proposed two-step imputation with those of conventional array-specific imputation, and was further validated against a whole-genome sequenced subgroup. We performed a genome-wide association study (GWAS) to test for genetic associations with goiter risk and thyroid gland volume, comparing the summary statistics of both approaches.
Results The proposed workflow eliminated the batch effect from the first twenty genetic PCs. Allele frequencies from the workflow were also highly correlated with those from the conventional approach (r² > 0.99). GWAS results from the two-step imputation confirmed known associations with thyroid traits and revealed novel loci for thyroid volume (TG, PAX8, IGFBP5, NRG1) and one novel locus for goiter (XKR6), which was not statistically significant in the GWAS meta-analysis based on conventional imputation.
Conclusion Our imputation workflow provides high-quality imputation results without technical batch effects, fostering mega-analyses that involve multiple genotyping arrays across different genetic association analyses.
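The two evaluation quantities named in the Methods can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy inputs, and the choice of the between-array share of variance (eta squared) as a per-PC batch-effect summary are assumptions made purely for illustration. The first function computes the squared Pearson correlation between allele frequencies from two imputation approaches; the second reports how much of one PC's variance is attributable to array membership.

```python
from collections import defaultdict
from math import fsum


def allele_frequency_r2(af_a, af_b):
    """Squared Pearson correlation between two allele-frequency lists.

    af_a and af_b are assumed to hold frequencies for the same variants
    in the same order (e.g. two-step vs. conventional imputation).
    """
    n = len(af_a)
    mean_a = fsum(af_a) / n
    mean_b = fsum(af_b) / n
    cov = fsum((x - mean_a) * (y - mean_b) for x, y in zip(af_a, af_b))
    var_a = fsum((x - mean_a) ** 2 for x in af_a)
    var_b = fsum((y - mean_b) ** 2 for y in af_b)
    return cov * cov / (var_a * var_b)


def batch_variance_explained(pc_scores, array_labels):
    """Share of one PC's variance explained by array membership (eta squared).

    Values near 0 suggest the PC does not separate samples by array;
    values near 1 suggest a strong batch effect on that PC.
    """
    grand_mean = fsum(pc_scores) / len(pc_scores)
    ss_total = fsum((s - grand_mean) ** 2 for s in pc_scores)

    # Group PC scores by genotyping array.
    groups = defaultdict(list)
    for score, label in zip(pc_scores, array_labels):
        groups[label].append(score)

    # Between-group sum of squares, weighted by group size.
    ss_between = fsum(
        len(values) * (fsum(values) / len(values) - grand_mean) ** 2
        for values in groups.values()
    )
    return ss_between / ss_total


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the study.
    two_step = [0.12, 0.35, 0.48, 0.07, 0.91]
    conventional = [0.11, 0.36, 0.47, 0.08, 0.90]
    print("allele frequency r2:", allele_frequency_r2(two_step, conventional))

    pc1 = [0.02, -0.01, 0.03, -0.02, 0.00, 0.01]
    arrays = ["array_A", "array_A", "array_B", "array_B", "array_C", "array_C"]
    print("PC1 variance explained by array:", batch_variance_explained(pc1, arrays))
```

In practice such a check would be repeated for each of the leading PCs and combined with a formal test, but that choice lies outside what the abstract specifies.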
Competing Interest Statement
The authors have no affiliation with any organization with a direct or indirect financial interest in the subject matter discussed in the manuscript. HJG received travel grants and speaker's honoraria from Neuraxpharm, Servier, Indorsia, and Janssen Cilag, not related to the current project. HV received travel grants and speaker's honoraria from Sanofi-Aventis, not related to the current project.
Funding Statement
SHIP is part of the Community Medicine Research net of the University of Greifswald, Germany, which is funded by the Federal Ministry of Education and Research (grants no. 01ZZ9603, 01ZZ0103, and 01ZZ0403), the Ministry of Cultural Affairs as well as the Social Ministry of the Federal State of Mecklenburg-West Pomerania, and the network "Greifswald Approach to Individualized Medicine (GANI_MED)" funded by the Federal Ministry of Education and Research (grant 03IS2061A). Genome-wide data have been partly supported by the Federal Ministry of Education and Research (grant no. 03ZIK012) and a joint grant from Siemens Healthineers, Erlangen, Germany and the Federal State of Mecklenburg-West Pomerania. The project is funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - 455978266 (A.T.).
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The study followed the recommendations of the Declaration of Helsinki. The medical ethics committee of the University of Greifswald approved the study protocol, and oral and written informed consent was obtained from each study participant.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data and code availability
The scripts developed for the workflow are available on GitHub (https://github.com/GenEpi-psych-UMG/Two_Step_Imputation). The data of the SHIP study cannot be made publicly available due to the informed consent of the study participants, but they can be accessed through a data application form available at https://transfer.ship-med.uni-greifswald.de/ by researchers who meet the criteria for access to confidential data. The full GWAS summary statistics are available on the ThyroidOmics Consortium website (http://www.thyroidomics.com).