Abstract
Background Transcriptomic studies usually focus on gene- and exon-based annotations, and only a few studies have reported changes in reads mapping to introns. The analysis of intronic reads allows the detection of nascent transcription that is not influenced by steady-state RNA levels and provides information on actively transcribed genes.
Methods Here we describe substantial intronic transcriptional changes in Parkinson’s Disease (PD) patients compared to healthy controls (CO) at two timepoints: at the time of diagnosis (BL) and three years later (V08). We used blood RNA-seq data from subjects in the Parkinson’s Progression Markers Initiative cohort and identified significant changes in intronic transcription only in PD patients during this follow-up period.
Results In CO subjects, only nine transcripts had differentially expressed introns between BL and V08. In PD patients, however, 4,873 transcripts had differentially expressed introns between visits BL and V08. In addition, we identified 836 transcripts at the time of diagnosis (BL visit) and 2,184 transcripts at visit V08 with differential intronic expression specific to PD patients. In contrast, reads mapping to exonic regions showed little variation, indicating high specificity of intronic transcription.
Conclusion Our study shows that Parkinson’s disease is characterized by substantial changes in nascent transcription, and characterizing these changes could help to elucidate the molecular pathology underpinning the disease.
1. Introduction
The analysis of the transcriptome is usually based on reads matching exons and gene-based annotations, yet a large proportion of RNA-sequencing reads maps to intronic sequences, in both ribosomal RNA-depleted and poly(A)-selected protocols [1]. It has been reported that 38% of total RNA-seq reads but only 8% of poly(A) RNA-seq reads map to introns [1]. The significance of intronic reads has remained controversial, and this part of the transcriptome is mostly neglected in transcriptomic analyses [2]. However, recent studies have shown that reads mapping to introns are genuine and reflect immediate regulatory responses in transcription, as opposed to post-transcriptional or steady-state changes [2, 3]. Intronic reads reflect the presence of newly transcribed RNA and are therefore useful for exploring the complexity of nascent RNA transcription and co-transcriptional splicing [4-6]. Intronic reads have been used to build a detailed transcriptional model within a single sample, showing their utility for estimating the genome-wide pre-mRNA synthesis rate [7]. It has also been reported that intronic coverage is related to nascent transcription and co-transcriptional splicing [1]. Changes in intronic read levels precede changes in exonic reads by 15 minutes, making them very useful for detecting immediate responses in the transcriptome [5]. Therefore, analysis of intronic reads allows nascent transcription to be separated from post-transcriptional changes in the genome-wide transcriptome.
Parkinson’s Disease (PD) is one of the most common neurodegenerative diseases, with several clearly identified genetic mutations and variants associated with the disease [8-11]. The exact mechanisms of the disease are still unclear. Given the complexity of PD neuropathology, the genomic network leading to the pathology is likely to be similarly complex and multifactorial [12-14]. Several transcriptomic studies of PD have identified transcriptional signatures specific to the disease or involved in regulating the splicing of genes expressed in the basal ganglia [15, 16]. Analyses of peripheral tissue transcriptomes (blood and skin) overlaid on CNS PD data have identified significant differences between PD and control transcriptomic profiles [12, 17, 18]. These studies underline the importance of peripheral tissue analysis, not only for identifying biomarkers that correlate with the disease but also for gaining insight into neurodegenerative diseases. In this study we longitudinally compared the blood intronic transcriptome of PD patients and healthy controls (CO) (Table 1) at two timepoints of disease progression: at diagnosis (baseline, BL) and after three years of follow-up (V08). To capture immediate changes in the transcriptome reflected by changes in nascent RNA, we used only reads mapping to intronic sequences. Intronic data were compared with exonic reads to distinguish nascent transcription from steady-state RNA levels. We used blood whole-transcriptome data from the Parkinson’s Progression Markers Initiative (PPMI) cohort, which contains data from PD and CO subjects at these two timepoints in the trajectory of PD.
Materials and Methods
Datasets
In this study we used Parkinson’s Progression Markers Initiative (PPMI) cohort data downloaded from www.ppmi-info.org/data (16 May 2021). PPMI is a longitudinal cohort of Parkinson’s patients that aims to describe the progression and biomarkers of Parkinson’s disease. The dataset contains whole-transcriptome data from blood together with genetic and clinical data. For RNA-seq, 1 μg of RNA isolated from PAXgene tubes was used, and sequencing was performed at the HudsonAlpha Genomic Services Lab on an Illumina NovaSeq 6000. All samples underwent rRNA and globin reduction, followed by directional cDNA synthesis using the NEB kit. Following second-strand synthesis, libraries were prepared using the NEB/Kapa (NEBKAP) based library kit. FASTQ files were merged and aligned to GRCh38.p12 with STAR (v2.6.1d) using GENCODE v29 annotation.
Workflow
BAM files were imported into R, and intronic reads were quantified using the “GenomicFeatures” and “GenomicAlignments” packages. Differential expression was tested using “DESeq2”, and the differentially expressed features were functionally annotated using the “ReactomePA”, “DOSE” and “clusterProfiler” packages.
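The Workflow does not state exactly how intronic regions were defined or how reads were assigned to them. In R this would typically involve functions such as GenomicFeatures::intronsByTranscript and GenomicAlignments::summarizeOverlaps, but the calls used in the study are not given. As an illustration only, the Python sketch below shows one common convention: a gene's introns are the gaps between its merged exons, and any read overlapping an exon is counted as exonic. It is not the authors' code. It simplifies heavily by assuming a single gene on one chromosome and strand, half-open coordinates, and one aligned block per read. All function names are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome and strand


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def introns_from_exons(exons: List[Interval]) -> List[Interval]:
    """Intronic regions of one gene = gaps between its merged exons."""
    merged = merge_intervals(exons)
    return [(left_end, right_start)
            for (_, left_end), (right_start, _) in zip(merged, merged[1:])]


def overlaps(read: Interval, regions: List[Interval]) -> bool:
    """True if the read shares at least one base with any region."""
    return any(read[0] < end and start < read[1] for start, end in regions)


def count_reads(reads: List[Interval], exons: List[Interval]) -> Dict[str, int]:
    """Assign each read to 'exonic', 'intronic' or 'other'.

    Assumption: reads touching any exon are counted as exonic, so intronic
    counts only contain reads lying entirely within introns.
    """
    merged_exons = merge_intervals(exons)
    introns = introns_from_exons(exons)
    counts = {"exonic": 0, "intronic": 0, "other": 0}
    for read in reads:
        if overlaps(read, merged_exons):
            counts["exonic"] += 1
        elif overlaps(read, introns):
            counts["intronic"] += 1
        else:
            counts["other"] += 1
    return counts


# Hypothetical gene model and reads.
exons = [(100, 200), (150, 250), (400, 500), (700, 800)]
reads = [(120, 170), (300, 350), (390, 420), (550, 600), (900, 950)]
print(introns_from_exons(exons))  # [(250, 400), (500, 700)]
print(count_reads(reads, exons))  # {'exonic': 2, 'intronic': 2, 'other': 1}
```

The read at (390, 420) spans an exon-intron boundary and is counted as exonic here. The rule actually applied to such boundary reads in the study is not stated.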
Results
We analysed the intronic expression profiles of PD and CO subjects at the time of diagnosis (BL visit) and three years later (V08). Across all comparisons, exonic reads showed little differential expression, whereas intronic reads showed extensive changes. In CO subjects, comparing intronic transcripts between the BL and V08 timepoints revealed few changes in active transcription (Table 2). In PD patients, by contrast, we detected widespread and highly significant changes in active transcription in blood from BL to V08. Remarkably, these differences were found almost exclusively in introns, with very few in exons (Table 2, Figure 1). A more detailed description of our findings follows.
CO subjects
We analysed longitudinal changes in the expression of intronic reads in control subjects between the BL and V08 timepoints and identified only limited differences. Nine transcripts showed differential intronic expression at FDR < 0.05 (Supplementary Table 1). We also analysed differential expression based on exonic reads, and nine transcripts were differentially expressed (Supplementary Table 2). Longitudinal differential expression in control subjects is therefore limited at both the intronic and exonic levels, which indicates that their blood transcriptome is longitudinally very stable.
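By default, the results() function in DESeq2 reports p-values adjusted with the Benjamini-Hochberg procedure (the padj column). "FDR < 0.05" therefore presumably means padj < 0.05. The minimal Python sketch below shows how the adjustment works. The function name is hypothetical and the p-values are made up for illustration.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted values
    # never increase as the raw p-value decreases (step-up procedure).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four features.
padj = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
print([round(q, 3) for q in padj])                  # [0.04, 0.053, 0.053, 0.2]
print([i for i, q in enumerate(padj) if q < 0.05])  # [0]
```

Three of the raw p-values are below 0.05, but only the first feature passes after adjustment. With tens of thousands of intronic features tested, this correction has a large effect.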
PD subjects
Next, we analysed PD patients alone and compared their longitudinal intronic and exonic transcriptional profiles. We identified 4,873 transcripts with differentially expressed introns between the time of diagnosis and the three-year follow-up (Supplementary Table 3). These 4,873 transcripts reflect longitudinal change specific to PD patients over the three-year period. This is in stark contrast to the CO cohort, whose transcriptome was very stable. Over the same period, only 8 transcripts were differentially expressed on the basis of exonic reads, underlining the specificity of the intronic changes (Supplementary Table 4). Taken together, PD patients show extensive longitudinal changes in nascent transcription.
Baseline differences
We next compared PD and CO subjects at each timepoint to obtain two separate cross-sectional snapshots of transcriptional changes. At the time of diagnosis, PD patients had 836 transcripts with differentially expressed introns compared to the controls (Supplementary Table 5). Only one transcript was differentially expressed at the exonic level, again showing that enhanced intronic expression is a highly specific feature of PD (Supplementary Table 6). Because these differences were present at diagnosis, they indicate that changes in active transcription are already established at the time of clinical presentation.
Differences at the three-year follow up visit
A cross-sectional snapshot at the time of diagnosis is important, but the longitudinal study design also allows us to explore time-dependent alterations in the transcriptome. Three years after diagnosis, PD patients had 2,184 transcripts with differentially expressed introns (Supplementary Table 7). In contrast, only 17 transcripts were differentially expressed at the exonic level, again indicating very high specificity for intronic transcription (Supplementary Table 8). Of the 2,184 differentially expressed transcripts, 329 were also differentially expressed at the BL timepoint and the remainder were detected only at V08. This demonstrates major changes in nascent transcription in the PD group over time. In addition, one transcript, RP11-403I13.4-002, was differentially expressed in both the exonic and the intronic analyses at V08.
Taken together, differential analysis of intronic reads revealed specific and extensive changes in intronic transcription in PD patients compared to CO subjects. This difference was evident at the time of diagnosis and increased during the three-year progression of the disease.
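The overlap of 329 transcripts between the BL and V08 comparisons amounts to intersecting two sets of significant identifiers. The minimal Python sketch below makes three assumptions. Each comparison is available as a mapping from transcript ID to adjusted p-value. Features that DESeq2 left without a padj (NA, represented here as NaN) are excluded. The identifiers are hypothetical.

```python
import math
from typing import Dict, Set


def significant(padj: Dict[str, float], alpha: float = 0.05) -> Set[str]:
    """IDs with an adjusted p-value below alpha; NaN (DESeq2's NA) is skipped."""
    return {tid for tid, q in padj.items() if not math.isnan(q) and q < alpha}


def shared_hits(padj_bl: Dict[str, float], padj_v08: Dict[str, float],
                alpha: float = 0.05) -> Set[str]:
    """Transcripts significant in both the BL and the V08 comparison."""
    return significant(padj_bl, alpha) & significant(padj_v08, alpha)


# Hypothetical adjusted p-values for two PD-vs-CO comparisons.
bl = {"T1": 0.01, "T2": 0.20, "T3": 0.03, "T4": float("nan")}
v08 = {"T1": 0.04, "T3": 0.50, "T4": 0.01, "T5": 0.001}
print(sorted(shared_hits(bl, v08)))  # ['T1']
```

The intersection is only meaningful when both comparisons use the same annotation version for their identifiers, here GENCODE v29.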
Pathway analysis of longitudinal changes
We performed functional pathway analysis to identify pathways enriched among transcripts with altered intronic expression and to find common themes among these transcripts. The longitudinal changes in PD were the most pronounced, so we restricted the pathway analysis to PD data and compared the intronic transcriptome profiles between visits V08 and BL.
Although the analysis was performed on blood RNA-seq data, Reactome enrichment analysis identified nervous-system-related themes as enriched in the differential intronic transcription profile (Figure 2, Table 3). The identified pathways included the neuronal system, protein-protein interactions at synapses, transmission across chemical synapses and muscle-contraction-related pathways, among others (Table 3). The enrichment of neuronal pathways was clear. Figure 2 illustrates it as a bar plot and a dot plot showing the number of genes identified and the gene ratio. To illustrate this enrichment further, we used a heat plot and a tree plot (Figure 3), which again show a large number of genes mapping to these pathways. These results suggest that longitudinal changes in active transcription in blood may reflect changes in the nervous system, most likely related to PD progression. This interpretation is supported by the separate longitudinal analyses of the PD and CO groups. Because the CO group showed very few longitudinal transcriptional alterations compared with the PD group, these pathways and biological functions are likely to be specifically related to PD.
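ReactomePA and clusterProfiler test each pathway for over-representation with a hypergeometric test. The "gene ratio" shown in their bar and dot plots is k/n: the number of input genes annotated to the pathway divided by the number of input genes with annotation. The self-contained Python sketch below reproduces that calculation for a single pathway using made-up gene sets. The function name is hypothetical. The real packages also filter pathways by size and adjust p-values across all pathways tested, which this sketch omits.

```python
from math import comb
from typing import Set, Tuple


def ora(hits: Set[str], pathway: Set[str],
        universe: Set[str]) -> Tuple[Tuple[int, int], Tuple[int, int], float]:
    """Hypergeometric over-representation test for one pathway.

    Returns (GeneRatio as (k, n), BgRatio as (M, N), p-value), where the
    p-value is the upper tail P(X >= k) of the hypergeometric distribution.
    """
    hits = hits & universe          # n: input genes present in the background
    pathway = pathway & universe    # M: pathway genes present in the background
    N, M, n = len(universe), len(pathway), len(hits)
    k = len(hits & pathway)         # input genes that fall in the pathway
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible terms contribute nothing to the tail sum.
    tail = sum(comb(M, i) * comb(N - M, n - i) for i in range(k, min(M, n) + 1))
    return (k, n), (M, N), tail / comb(N, n)


# Made-up example: 20 background genes, a 5-gene pathway, 5 hits (4 in the pathway).
universe = {f"G{i}" for i in range(20)}
pathway = {f"G{i}" for i in range(5)}
hits = {"G0", "G1", "G2", "G3", "G10"}
gene_ratio, bg_ratio, p = ora(hits, pathway, universe)
print(gene_ratio, bg_ratio, round(p, 5))  # (4, 5) (5, 20) 0.0049
```

A pathway can rank highly in a dot plot because it has a high gene ratio or because it has a small p-value. The two measures are related but not identical.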
KEGG enrichment analysis was performed to corroborate the Reactome analysis and to identify potential PD-specific pathways. With KEGG annotation, the most significantly changed pathways were ubiquitin-mediated proteolysis, protein processing in the endoplasmic reticulum, mitophagy and autophagy. All of these pathways have been implicated in the pathogenesis of PD.
Although we analyzed blood whole-transcriptome data, using intronic reads helped identify genes with active transcriptional changes that reflect dynamic alterations in the transcriptional balance of the cells. Using blood data from clinical samples, we identified PD-specific pathogenetic networks that cellular models and biochemical experiments have previously implicated in PD. These results show that the blood transcriptome can reflect disease-specific nascent transcription.
Taken together, analysis of intronic reads in a whole-transcriptome study allows detection of active nascent transcription and provides much more detailed information than exon-centric, steady-state analysis of transcription. We showed large-scale intronic activation, which helped detect actively transcribed transcripts that we mapped to functionally and clinically relevant genetic networks. Intronic transcriptome analysis provides information about condition-specific transcripts that can help identify genomic patterns specific to the condition being studied.
Discussion
Whole-transcriptome analysis is typically based on gene- or exon-based annotation. Transcript-based annotation is used much less often [19, 20]. Gene-based annotation is an aggregated approach in which reads mapping to different exons and transcripts are merged under a single functional identifier, the gene [21]. This approach is the most widely used, so most whole-transcriptome studies report this aggregated information. Inevitably, aggregation reduces the power to detect precise changes in transcription and thus lowers the sensitivity of the analysis. The avoidance of transcript-based annotation is understandable, because bioinformatics tools for accurately calling transcripts from short-read data are limited. Kallisto and Salmon were developed only recently to overcome this problem, providing pseudoalignment and quasi-mapping approaches for accurate transcript quantification [22, 23]. Nevertheless, these approaches cannot distinguish nascent transcription from steady-state transcription.
Intronic mapping has recently been suggested as an alternative way to identify recently transcribed transcripts [2]. This approach relies on co-transcriptional splicing. Reads mapping to introns indicate transcripts that still contain their introns and have therefore not yet been fully processed. Detecting these reads provides important additional power to measure genes that are actively and newly transcribed, as opposed to the stable steady-state RNA pool in the background. Analysis of intronic transcripts has been successfully applied to human basal ganglia data, with clear evidence for the reproducibility of intronic eQTLs (i-eQTLs) and their utility for analysing the rate of transcription [15]. Interestingly, the authors of that study also identified highly specific enrichment of disease-specific transcription, supporting the suitability of intron-based transcriptional analysis. They also confirmed the mapping of intronic signals to novel transcripts and validated an annotation-independent approach to transcriptomic analysis.
We identified longitudinal changes in intronic transcription in PD patients. These changes could be specific to the disease, as only minor differences were found in the controls. Indeed, functional annotation of the intronic transcripts revealed enrichment of pathways closely related to the pathophysiology of PD, even though only blood RNA-seq data were analyzed. The functional significance of these data awaits further investigation.
Intronic and exonic signals have recently been compared systematically, confirming that intronic reads reflect nascent transcription [2]. Several experimental studies indicate that intronic reads are a reliable proxy for nascent transcription [7]. Moreover, comparing exonic with intronic signals helps to distinguish transcriptional from post-transcriptional changes. The differences we identified therefore most likely reflect changes in active transcription rather than post-transcriptional changes. Given the stability of the control transcriptome over the same period, they are also unlikely to reflect normal physiological processes.
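The exon-intron split analysis (EISA) idea illustrates how transcriptional effects can be separated from post-transcriptional ones. The change in intronic reads approximates the change in transcription. The difference between the exonic and intronic log2 fold changes approximates the post-transcriptional component. The Python sketch below is a deliberately simplified, single-gene version that assumes normalized counts and a pseudocount of 1. It is not the analysis performed in this study, and practical EISA implementations model replicates statistically. The function name is hypothetical.

```python
from math import log2
from typing import Tuple


def exon_intron_split(exon_a: float, exon_b: float, intron_a: float,
                      intron_b: float, pseudo: float = 1.0) -> Tuple[float, float, float]:
    """Return (d_intron, d_exon, d_exon - d_intron) as log2 fold changes, b vs a.

    d_intron approximates the change in transcription (nascent RNA);
    d_exon - d_intron approximates the post-transcriptional component
    (for example, a change in mRNA stability).
    """
    d_intron = log2(intron_b + pseudo) - log2(intron_a + pseudo)
    d_exon = log2(exon_b + pseudo) - log2(exon_a + pseudo)
    return d_intron, d_exon, d_exon - d_intron


# Hypothetical normalized counts: exons and introns both rise 4-fold.
print(exon_intron_split(15, 63, 7, 31))  # (2.0, 2.0, 0.0)  transcriptional change
# Exons rise 4-fold while introns stay flat.
print(exon_intron_split(15, 63, 7, 7))   # (0.0, 2.0, 2.0)  post-transcriptional change
```

In the first case, the rise in exonic signal is fully explained by increased transcription. In the second case, only the exonic signal moves, which points to a post-transcriptional effect.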
Conclusions
In conclusion, we identified a highly specific longitudinal nascent transcriptional profile in the blood of PD patients. This profile may reflect the molecular pathological processes of the disease and is relevant to improving our understanding of disease progression.
Data Availability
All data produced are available online at www.ppmi-info.org.
Supplementary Materials
Table S1: SupplementaryTable1.xls, Table S2: SupplementaryTable2.xls, Table S3: SupplementaryTable3.xls, Table S4: SupplementaryTable4.xls, Table S5: SupplementaryTable5.xls, Table S6: SupplementaryTable6.xls, Table S7: SupplementaryTable7.xls, Table S8: SupplementaryTable8.xls.
Author Contributions
Conceptualization, S.K.; methodology, A.L.P. and S.K.; formal analysis, S.K.; data interpretation, A.L.P., V.J.B., J.P.Q. and S.K.; writing—original draft preparation, S.K.; writing— review and editing, A.L.P., V.J.B., J.P.Q. and S.K.; funding acquisition, A.L.P. and S.K. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by MSWA, The Michael J. Fox Foundation, Shake It Up Australia and Perron Institute for Neurological and Translational Science.
Institutional Review Board Statement
The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Institutional Human Ethics Research Office of The University of Western Australia (protocol code RA/4/20/5308 approved on 05.08.2019).
Informed Consent Statement
Informed consent was obtained by PPMI from all subjects involved in the study.
Data Availability Statement
Raw data are available from the PPMI website (www.ppmi-info.org/data (accessed on 19 January 2021)).
Conflicts of Interest
The authors declare no conflict of interest related to this study. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.
Acknowledgments
This work was supported by resources provided by the Pawsey Supercomputing Centre with funding from the Australian Government and the Government of Western Australia. Data used in the preparation of this article were obtained from the Parkinson’s Progression Markers Initiative (PPMI) database (www.ppmi-info.org/data (accessed on 19 January 2021)). For up-to-date information on the study, visit www.ppmi-info.org. PPMI is sponsored and partially funded by The Michael J. Fox Foundation for Parkinson’s Research. PPMI, a public-private partnership, is funded by the Michael J. Fox Foundation for Parkinson’s Research and funding partners, including AbbVie, Allergan, Amathus Therapeutics, Avid Radiopharmaceuticals, Biogen Idec, BioLegend, Bristol-Myers Squibb, Celgene, Denali, GE Healthcare, Genentech, GlaxoSmithKline, Janssen Neuroscience, Lilly, Lundbeck, Merck, Meso Scale Discovery, Pfizer, Piramal, Prevail Therapeutics, Roche, Sanofi Genzyme, Servier, Takeda, Teva, UCB, Verily and Voyager Therapeutics.