TY - JOUR
T1 - Comprehensive analysis of GBA using a novel algorithm for Illumina whole-genome sequence data or targeted Nanopore sequencing
JF - medRxiv
DO - 10.1101/2021.11.12.21266253
SP - 2021.11.12.21266253
AU - Marco Toffoli
AU - Xiao Chen
AU - Fritz J Sedlazeck
AU - Chiao-Yin Lee
AU - Stephen Mullin
AU - Abigail Higgins
AU - Sofia Koletsi
AU - Monica Emili Garcia-Segura
AU - Esther Sammler
AU - Sonja W. Scholz
AU - Anthony HV Schapira
AU - Michael A. Eberle
AU - Christos Proukakis
Y1 - 2021/01/01
UR - http://medrxiv.org/content/early/2021/11/13/2021.11.12.21266253.abstract
N2 - GBA variants cause the autosomal recessive Gaucher disease, and carriers are at increased risk of Parkinson’s disease (PD) and Lewy body dementia (LBD). The presence of a highly homologous nearby pseudogene (GBAP1) predisposes to a range of structural variants arising from either gene conversion or reciprocal recombination, the latter resulting in copy number gains or losses, complicating genetic testing and analysis. To date, short-read sequencing has not been able to fully resolve these or other variants in the key homology region, and targeted long-read sequencing has not previously resolved reciprocal recombinants. We present and validate two independent methods to resolve recombinant alleles and other variants in GBA: Gauchian, a novel bioinformatics tool for short-read, whole-genome sequencing data analysis, and Oxford Nanopore long-read sequencing after enrichment with appropriate PCR. The methods were concordant for 42 samples including 30 with a range of recombinants and GBAP1-related mutations, and Gauchian outperforms the GATK Best Practices pipeline. Applying Gauchian to Illumina sequencing of over 10,000 individuals from publicly available cohorts shows that copy number variants (CNVs) spanning GBAP1 are relatively common in Africans.
CNV frequencies in PD and LBD are similar to controls, but gains may coexist with other mutations in patients, and a modifying effect cannot be excluded. Gauchian detects a higher frequency of GBA variants in LBD than PD, especially severe ones. These findings highlight the importance of accurate GBA mutation detection in these patients, which is possible by either Gauchian analysis of short-read whole genome sequencing, or targeted long-read sequencing. Competing Interest Statement: XC and MAE are employees of Illumina Inc. S.W.S. serves on the Scientific Advisory Council of the Lewy Body Dementia Association. S.W.S. is an editorial board member for the Journal of Parkinson Disease and JAMA Neurology. AHVS is supported by the UCLH NIHR BRC. Funding Statement: This study was supported in part by the Intramural Research Program of the National Institutes of Health (National Institute of Neurological Disorders and Stroke, project numbers: 1ZIANS003154) and the JPND through the MRC grant code MR/T046007/1. Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: National Research Ethics Service London-Hampstead Ethics Committee gave ethical approval for research involving RAPSODI samples. National Research Ethics Service Committee central-London gave ethical approval for research involving samples obtained from Queen Square Brain Bank.
UCL Ethics Committee gave ethical approval for research involving PPMI samples. I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes. Gauchian will be a part of Version 3.10 of the Illumina DRAGEN (Dynamic Read Analysis for GENomics) Bio-IT platform. ONT and UNCALLED scripts used will be downloadable at https://github.com/marcotoffoli. Individual-level genome sequence data for the PD patients, LBD patients, and neurologically healthy controls are available at AMP-PD (https://amp-pd.org). The datasets of DNA from QSBB brain samples and NHGRI samples generated during this study (Illumina WGS and targeted ONT sequencing) will be made available on the European Nucleotide Archive (https://www.ebi.ac.uk/ena/browser/home), accession number PRJEB48317. The datasets only include read alignments to GBA/GBAP1 regions (other regions of the genome have been removed or masked) to minimize the amount of genetic information made available for public access.
The datasets of DNA from PPMI samples generated during this study (targeted ONT sequencing) will be made available on the PPMI repository (https://www.ppmi-info.org/). ONT sequencing data on living individuals are not available due to consent/IRB restrictions. Additional data produced will be available upon reasonable request to the authors.
ER -