Abstract
Advances in long-read sequencing technology have accelerated the study of large structural variants (SVs). We created a curated, publicly available, multi-ancestry SV imputation panel by long-read sequencing 888 samples from the 1000 Genomes Project. We used this high-quality panel to impute SVs in approximately 500,000 UK Biobank participants. We demonstrated the feasibility of conducting genome-wide SV association studies at biobank scale using 32 disease-relevant phenotypes related to respiratory, cardiometabolic and liver diseases, in addition to 1,463 protein levels. This analysis identified thousands of genome-wide significant SV associations, including hundreds of conditionally independent signals, thereby enabling novel biological insights. Using genetic association studies of lung function as an example, we demonstrated the added value of SVs for prioritising causal genes at gene-rich loci compared with traditional GWAS using only short variants. We envision that future post-GWAS gene-prioritisation workflows will incorporate SV analyses using this SV imputation panel and framework.
Competing Interest Statement
Boehringer Ingelheim, a privately owned pharmaceutical company, funded this initiative. DD and LS are independent contractors and declared no conflicts of interest. GMB, JHL, and JKP are employees of Gencove and declared no conflicts of interest.
Funding Statement
This study was funded by Boehringer Ingelheim, a privately owned pharmaceutical company.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
This study used sequencing data from 1000 Genomes Project individuals and genetic and phenotype data from UK Biobank participants. Genetic data from 1000 Genomes Project individuals are publicly available, and access to the UK Biobank data was granted under Application Number 57952.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Footnotes
+ Joint senior authors
Revised benchmarking of SV calls against existing SV datasets. Extended the method description. Corrected typos and clarified the language.
Data availability
The long-read sequencing imputation panel is available via the OpnMe initiative of Boehringer Ingelheim GmbH (details: https://opnme.com/genomiclens). Imputed SVs of UK Biobank participants will be made available via UKB RAP. Full summary statistics for the SV- and SNV-based GWASs carried out in UK Biobank are available upon request.