Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.
Learn more at Review Commons
Reply to the reviewers
We would like to thank all reviewers for their highly valuable comments, reviewing the article, and suggesting changes to improve its overall structure, clarity, and comprehension. Please find below our point-by-point responses to each reviewer’s comments. Lines, figures, and tables we refer to in the responses correspond to the clean copy of the revised manuscript.
Reviewer #1:
- In the introduction the authors list 4 databases of ultra-conserved non-coding elements, and choose to work with the UCNEbase that detects UCNE regions by comparing human and chicken, as they state that it is one of the most comprehensive resources. Can the authors comment on the extent of overlap of UCNE regions identified in this database compared to other databases. It would also be helpful to add an explanation in the Introduction on the advantage of comparing the human genome to the chicken genome, e.g., is chicken the most distant vertebrate with a high quality genome, or another reason? And can the authors comment on the extent of conservation of the UCNEs compared to other species - in the UCNEbase paper, orthologues for the UCNEs are identified in 18 vertebrate species including reptiles, amphibians and fish.
We have addressed these comments and incorporated them in the introduction (lines 46-47; 49-52). As stated, it is not straightforward to evaluate to which extent UCNE regions overlap with those collected in other databases due to the different scopes and methods of these resources. We clarified why we selected this database, which was precisely based on the comments mentioned by the reviewer, i.e., sufficient evolutionary distance to identify functional regions confidently and high quality genome assemblies. Regarding the extent to which UCNEs are conserved in other species, the UCNEdatabase indeed provides additional information with respect to UCNE orthologues in other vertebrate species, including reptiles, amphibians, and fish. As we consider that this comparison is beyond the scope of the present study, it was not included in the main manuscript. However, to address this interesting point raised by the reviewer, we evaluated the proportion of UCNEs found in different species with respect to those annotated for human-chicken. As show in the figure below, UCNEs are conserved to a larger extent up to Xenopus tropicalis, for which most of the UCNEs annotated for human-chicken have corresponding orthologues in this species. Although to a minor extent, UCNEs are also conserved across more distant species (e.g., fish), for which approximately half of the UCNEs annotated for human-chicken have orthologues.
- The authors briefly describe how potential target genes were assigned to UCNEs in the Method section (section begins on page 21, line 407 and on lines 379-381). They note that they used the Genomic Regions Enrichment of Annotations Tool (GREAT). Can the authors provide additional details on what this method does and also provide a high level sentence in the Results section on page 8, lines 144-146, on how target genes were assigned to UCNEs, as well as in the Discussion on page 16, line 289. Based on the Methods description on lines 409-414, a UCNE was associated with any gene within 1Mb of the UNCE that was expressed in retina and whose curated regulatory domains overlapped the UNCE. Is this correct? How were the regulatory domains curated (line 411-412)? Can the authors please clarify this important point.
Additional details about the GREAT algorithm have been included in the Methods section (lines 410-424) as well as a high-level sentences in the Results (lines 140-141) and Discussion (lines 276-280). It is correct that a UCNE was associated with any gene within 1Mb of the UCNE that was expressed in retina and whose curated regulatory domains overlapped the UCNE. As stated now (lines 420-424), these regulatory domains are supported by experimental evidence demonstrating that a gene is directly regulated by an element located beyond of its putative regulatory domain. We specified these domains for the utilized version of GREAT as well.
- There are other approaches for linking putative transcriptionally active genomic regions with target genes in specific tissues or cell types through correlation analysis of ATAC-seq peaks (chromatin accessibility) and gene expression in RNA-seq data. A commonly used peak-to-gene linkage method is implemented in ArchR (Granja et al, 2021, PMID: 33633365). The authors note that they used scATAC-seq gene scores and scRNA-seq gene expression to further characterize the proposed target genes of UCNEs. It is not clear what the authors mean by "scATAC-seq gene scores". Please define "scATAC-seq gene score" and add a reference. Can the authors compare the target gene mapping to UCNEs using GREAT and gene expression filtering to that using peak-to-gene linkage based on retina tissue or single cell ATAC-seq and RNA-seq data?
We have defined explicitly what scATAC-seq gene scores are in the Methods section (lines 436-437) along with its reference. We have also addressed this important point and compared the overlap between the set of target genes predicted by GREAT and those assigned by the peak-to-gene linkage method implemented by ArchR. Details of this analysis, its results, interpretation are included in the Methods (lines 441-446), Results (lines 158-163; Supplementary Table 4), and Discussion (lines 280-282) sections.
- For gene expression filtering (line 419) the authors quantify transcript expression of retina from FASTQ samples using the Kallisto method, and then note that they did the filtering on gene expression levels (TPM<0.5). Please add details on how you went from transcript expression levels to gene expression levels for the filtering, or was the filtering performed at the transcript level?
We have added details on how transcript-level quantification estimates were summarized at gene level, for which the filtering was performed (lines 429-430).
- The authors use the words "active UCNE", first mentioned in the Results on line 144. Can the authors define what they mean by "active UCNE". What information/evidence is used to ascertain that a UCNE is indeed active. Overlap of a UCNE with a chromatin accessibility region from ATAC-seq or DNAase-Seq would only suggest that the UCNE may be active. Intersection with enhancer activity measured with in vivo enhancer reporter assays in transgenic mice from the VISTA enhancer browser provides stronger evidence of transcriptional activity. The authors might want to distinguish between putatively actively and active based on the functional support.
We thank the reviewer for this relevant comment to address the nuances of defining active UCNEs. The reviewer’s assumptions are correct and hence these terms were clarified throughout the entire text. The term functional is now only used when referring to UCNEs for which there is functional support (e.g., PAX6-associated UCNE in line 193) .
- The authors assessed the significance of depletion of common variation (MAF>1%) in the UCNE regions compared to a background of randomly selected genomic regions. In generating the random distribution of regions, did the authors match on the distribution of distances of the UNCEs to the TSS of genes in the randomly selected regions? This may be a confounder. Also, in the legend of Figure 3, lines 191-192, it is stated: "Variant population frequencies within putative retinal UCNEs normalized to a background composed of randomly selected sequences (see Methods).", but we did not find a description of this analysis in the Methods section.
Evaluating the potential confounding effect of the genomic background was indeed a very important point. We have now incorporated the details showing the suitability of such background well as a detail description of how such background was generated (lines 479-481; Figure S1). Additionally, to further support our analysis demonstrating the depletion of common variation within UCNEs, we have included an evaluation of the distribution of genome-wide residual variation intolerance score (gwRVIS) values (PMID: 33686085) compared to this background of randomly selected genomic regions in a human reference cohort (lines 173-178; 487-495; Fig. 3C).
- In regards to intersecting UCNEs with epigenetic marks that detect active or repressed enhancers in retina, the authors used data from Aldiri et al 2017 that measured epigenetic changes during retinal development. Did this dataset contain epigenetic measurements in adult retina? The authors might want to consider using the epigenetic marks/ChIP-seq data from adult human retina in Cherry et al. PNAS 2020 (PMID: 32265282)
We have incorporated the adult-stage data suggested by the reviewer to provide a more comprehensive characterization. Details about the integration of this dataset as well as the results and their corresponding interpretation have been incorporated in the Methods (lines 372-374), Results (lines 115-117; Supplementary Table 2), and Discussion (lines 271-273) sections accordingly.
- With respect to the examination of rare variants that may be associated with rare eye disease in retina active UCNEs, for the interpretation of the results, it would be helpful to get more information on the distribution of rare variants found in UCNEs associated in this study to known IRD genes in all affected individuals in families, if this information is available in the 100,000 Genomes Project.
Although it is indeed a relevant point, this information cannot be retrieved in the 100,000 Genomes Project. As it is a restricted research environment, we are only allowed to query sequencing data corresponding to participants enrolled within the framework of our specific sub-domain, namely “Hearing and Sight”, and therefore evaluating the distribution of rare variants in all affected individuals is not feasible.
- In the Methods section on lines 450-451, the authors mention that they performed variant screening of retinal disease genes, referencing the Genomics England Retinal Disorder panel and Martin et al., 2019. Can the authors add to the Methods and Results sections how many retinal disease genes were initially tested. Also, to get a sense of the specificity of the overlap of rare variants in the 100k Genome Project cohort with UNCE regions, it would be informative to show a distribution of the number of rare variants <0.5% that passed the filtering in gnomAD per eye disease gene before the overlap with UNCEs.
We specified the number of retinal genes that were tested in the Method section (line 471). In addition, as suggested by the reviewer, we generated allele frequency distributions for all variants retrieved within a selection of 25 disease-gene associated loci and their corresponding UCNEs in order to assess the specificity of the overlap between rare variants and UCNE regions (lines 181-182, 496-501; Figure S2).
- The authors found "an ultrarare SNV (chr11:31968001T>C) within a candidate cis-regulatory UCNE located ~150kb upstream of PAX6. This variant was found in four individuals of a family segregating autosomal dominant foveal abnormalities". They tested the functional effect of this element with a reporter assay in zebrafish and found that the UNCE affects expression in the eye, forebrain, and neural tube. It would add further value if the authors were to test the effect of this SNV in the UCNE on the reporter expression pattern, using CRISPR/cas9?
That is a very relevant point. We have tested the effect of this SNV in the UCNE on the reported expression pattern using the same experimental setup that we used for testing the wild-type construct, namely transgenic enhancer zebrafish assays. However, we could not obtain conclusive results, most likely due to the limitations posed by testing these regions outside their native genomic context. Therefore, additional experimental work (e.g., CRISPR-based) should be performed to address this question. This is, however, beyond the scope of the present study, for which the main focus was the identification and functional annotation of ultraconserved cCREs. We have incorporated the details, results, and interpretation of the assays performed mutant construct in the Methods (lines 525-527; Supplementary Table 12), Results (lines 235-238; Supplementary Table 10), and Discussion (lines 350-353) sections.
- The authors found rare variants in UCNEs linked to 45 IRD genes. Can the authors provide additional information on the functional genomic annotations of these UCNEs and distance to the target genes. The UCNEs were characterized with respect to their genic features in the original paper (UCNEbase, Dimitrieva et al., NAR, 2013), e.g., intergenic, intronic and 3'/5' UTR. Also, it would be useful for clinical applications to provide the start and end positions of the UCNEs that contain the rare variants associated with their 45 IRD genes in Supplementary Table 6.
Additional functional genomic annotations, genic features following those of the original UCNE paper, and the distance to the TSS of these 45 target disease-associated genes have been incorporated in (new) Supplementary Table 5. The start and end positions of the UCNEs that contain the rare variants have also been indicated in new Supplementary Table 7.
- A total of 724 target genes were assigned to 1,487 UCNEs that displayed candidate cis-regulatory activity. Given the interest in using UCNEs to help identify potential pathogenic mutations that lead to IRDs, can the authors note in the Results section how many of the 724 target genes are IRDs.
We thank the reviewer for this important point. From the total of 724, a total of 27 genes are annotated as IRD genes, of which (interestingly) 23 were kept as found to be expressed in the retina. This has been clarified in the Results section (line 166-168).
- In the Discussion on page 15 line 259, can the authors clarify if variation found in UCNEs were only associated with rare disease or also with common diseases.
We have clarified that variation found in UCNEs has only been associated with rare diseases (line 247).
Minor edits:
- In abstract, the authors might consider changing the words "rare eye diseases" on lines 20 and 22 to "rare retina degeneration diseases", and on lines 88-89.
We thank the reviewer for this comment. However, we consider that rare eye diseases is a more suitable term for our purpose as it includes diseases primarily characterized by stationary and non-progressive phenotypes such as North Carolina Macular Dystrophy and fovea hypoplasia.
- In the Introduction on line 49, there seems to be a typo in the number of UCNE regions reported. 4,135 UCNE regions is supposed to be 4,351, based on the original paper (https://academic.oup.com/nar/article/41/D1/D101/1057253).
We have corrected this typo accordingly.
- In the introduction on lines 75-76, these references: Lyu et al., Cell Reports 2021, PMID: 34788628, and Zhang et al., Trends in Genetics 2023, PMID: 37423870, could be added to the following sentence to provide additional: "This cellular complexity is the result of spatiotemporally controlled gene expression programs during retinal development”.
We have now included these relevant references.
- On lines 77 and 84, I would write IRD as plural: IRDs.
This has been amended in the new version.
- In introduction on lines 89-90, it can be further added that you provide experimental support for an ultra rare SNV in a cis regulatory UCNE affecting PAX6.
We have explicitly stated that we provided functional evidence for this UCNE.
- On line 98, the authors refer to Figure 1A when noting that the integration of UCNEs with multi-omics data in human retina allows to evaluate the regulatory capacity of UCNEs across the major developmental stages of human retina. However panel A in figure 1 does not seem to show this point. It shows the comparison of elements across species. Please make appropriate changes to the main text and figure legend.
We have made the appropriate changes and located the reference to this figure in a more relevant part of the text (line 87).
- Please explain what the names appended to the gene symbols in the first column "UCNE ID" in Supplementary Tables 1 and 2 refer to.
We have clarified what these refer to.
- On line 145, can the authors clarify what they mean by "active gene expression in the retina". Is this just another way of referring to genes found to be expressed in retina? If so, it might be clearer just to write: "We annotated the identified active UCNEs to assign them potential target genes and thus assess their association with genes expressed in the retina"
We indeed meant genes found to be expressed in retina. As this phrasing might not be completely clear, we have now changed to the wording suggested by the reviewer.
- One line 156, I would write "regulation of transcription" as listed in the gene ontology terms in Figure 2C, instead of "regulation of gene expression". The authors might want to add this to the Discussion. Can the authors include the full gene set enrichment results from Enrichr in a supplementary table at Padj<0.05 since only the top gene sets are shown in Fig. 2C (at P<1E-13)?
We changed the term to “regulation of transcription” to keep the nomenclature consistent to that of Figure 2C. We have also provided a full gene set enrichment from Enrichr as well (Supplementary Table 3).
- On page 12, line 214, what does "EH38E1530321" Stand for? It seems to specify a distal enhancer-like signatures in bipolar neurons, but I couldn't find this ID in any database.
This refers to the identifier of ENCODE:
https://screen-v2.wenglab.org/search/?q=EH38E1530321&assembly=GRCh38
Additionally, when mentioning a specific UCNE, VISTA enhancer, or ENCODE cCRE (as in this case), we have explicitly included its corresponding identifier.
- In the Methods section on lines 391-392, can the authors give some high-level explanation of the unconstrained integration method: "Single-nucleus RNA-seq of the same tissue and timepoints (GSE183684) were integrated using the unconstrained integration method". Also, can they comment on how retinal cell class identities were assigned (line 393). Was it based on known markers or on previous identification of cell classes and highly variable genes between clusters?
We have included a high-level explanation of the unconstrained method in the Methods section (lines 387-392). We also clarified that the assignment of cell class identities was based on known markers (line 394).
- In the integration of UCNEs with bulk and single cell ATAC-seq and Dnase hypersentitivity regions, can the authors note in the Methods section (lines 400-404) what peak width was used to test for overlap with the UCNEs.
We have specified the peak widths that were used for the overlap with UCNEs (lines 397 and 403).
- On line 436, the word 'and' is missing between "(SNVs, and indels < 50bp)" and "large structural variants".
This has been corrected.
- On lines 443-444, please provide references to the computational tools listed. Please note if default settings/parameters were used.
We have specified that default parameters were used in the analysis (line 464).
- In the following sentence in the Methods section on lines 447-449, it is not clear in the Results section how this was used in the flow of the analysis, and how many cases showed such a similarity in phenotype: "For each candidate variant, we compared the similarities between the participant phenotype (HPO terms) and the ones known for its target gene through literature search and clinical assessment by the recruiting clinician when possible." Can the authors add more detail to the Results section.
As the evaluation of the candidate variants was essentially performed on a case-by-case basis, we opted to include a rather general description of the workflow, which indeed included a comparison of the reported phenotypes with those associated with the putative target gene. An example of such comparison has been included in the Results (lines 186-187) section regarding the cases for which a NCMD-like phenotype was suspected.
- It would be helpful to have a table that describes the different omics datasets used in the paper, with some basic annotations (tissue type, sample size, reference).
This has now been incorporated in Supplementary Table 11.
- Can the authors add references to their sentence in the Discussion on page 17 lines 299-301: "As has been shown before, the phenotype caused by a coding mutation of a developmental gene can be different from the phenotype caused by a mutation in a CRE controlling spatiotemporal expression of this gene."
We clarified that this phrase referred to the case of PRDM13, for which bi-allelic coding pathogenic variants are linked to hypogonadotropic hypogonadism and perinatal brainstem dysfunction in combination with cerebellar hypoplasia (Whittaker et al., 2021), while non-coding variants within its regulatory regions are associated with NCMD.
Reviewer #2:
Minor discretionary suggestions for improving the presentation:
- Wherever a specific UCNE, Vista enhancer or ENCODE cCRE is mentioned, the element should be identified by name or accession code: For instance (Iine 212): _"this variant is located within a UCNE (PAX6Veronica) that is catalogued as a cCRE in ENCODE (EH38E1530321)". UCNE names are particularly important, because they are systematically used as identifiers in the supplemental Tables and thus would enable the reader to easily find additional information about the element mentioned in the main text.
We have now explicitly included all corresponding identifiers throughout the text.
- I also recommend inclusion of the UCNE, Vista and ENCODE cCRE tracks in all genome browser screen shots. The UCNE track is currently included only in Figure 1. Vista and ENCODE cCRE tracks are missing in all browser views.
We have now included UCNE, VISTA, and ENCODE cCREs tracks in the main genome browser figures (Figures 1 and 4).
- Supplementary Table 6: It would be useful to indicate for each variant, the type of ophthalmological disorder (Table S5, column C) it is associated with.
We agree this is a relevant point. However, due to limitations in the (bulk) export of clinical information from the protected Research Environment of Genomics England, inclusion of this type of information is not feasible.
- Fig S2 and supplementary Table 3 are not referred to in the main Text.
We have corrected this and updated the figure and table accordingly.
- Supplementary Table 8: The Table caption should be expanded.
_The contents of each column should be explained. For instance, column F: what means Homo_sapiens|M01298_1.94d|Zoo01|2337? Where does this information come from, what data and software resources were used?
We have expanded the caption of this table to clarify this output, which is derived from the QBiC-Pred tool, a software used for predicting quantitative TF binding changes due to nucleotide variants.
- Line 401 probable typo: 103-105 days (103-125?)
Indeed, this typo has now been corrected.
Reviewer #3:
1) Given that UCNE only accounts for a small fraction of gene regulatory elements, this study is likely with low sensitivity in terms of identifying potential regulatory mutations. Although one would expect that variants in UCNE are more likely to be pathogenic, it is hard to extrapolate from the results to assess the contribution of gene regulatory variant to the disease.
We agree that restricting our analysis to these particular regions is one of the limitations of the study, as stated in the Discussion section (line 304-311). However, one of our main aims was to provide a strategy to reduce the search space for pathogenic variants with a potential regulatory effect. Given the substantial body of literature supporting a regulatory role for these regions and, particularly, the availability of already-existing functional data, we considered that this set of regions could represent a suitable target for such analysis. Indeed, the features evaluated, and the methods presented in this study could be extrapolated and applied in other settings involving other candidate regulatory regions and/or tissues of interest, and their associated disease-phenotypes, for which, in any case, the overall contribution of regulatory pathogenic variation to disease might vary greatly.
- I am wondering how many UCNE overlaps with open chromatin regions specific to the fetal retina and how many UCNE overlaps with adult only. Are UCNEs enriched for developmental genes? If so, how many patients are due to developmental defect?
We have now integrated into our analysis epigenetic measurements in adult retina, in particular the candidate cis-regulatory elements nominated by Cherry et al. PNAS 2020 (PMID: 32265282) based on accessible chromatin and enrichment for active enhancer-related histone modifications in adult human retina. Details about the integration of this dataset as well as the results and their corresponding interpretation have been incorporated in the Methods (lines 372-374), Results (115-117; Supplementary Table 2), and Discussion (lines 271-273) sections accordingly. In particular, out of the 111 UCNEs identified to display the active enhancer mark H3K27ac, 33 were found to maintain this signature at adult stage. Regarding the specific question from the reviewer, the majority of UCNEs overlapping with open chromatin regions are specific to the fetal retina (1,346), with only 7 UCNEs overlapping with open chromatin regions exclusively in adult state. This indeed further supports the major role of the identified candidate cis-regulatory UCNEs in the regulation of developmental gene expression programs, which was already suggested by the Gene Ontology enrichment analysis performed using the set of UCNE target genes as input (Figure 2C; new Supplementary Table 3). As far as the number of patients with development defects that were included in this study, these included: corneal abnormalities (n=62), Leber congenital amaurosis (n=142), ocular coloboma (n=111), developmental foveal and macular dystrophy (n=230), developmental glaucoma (n=94), anophthalmia or microphthalmia (n=120).
- I am wondering if the 431 ultrarare variants found in the UCNEs is higher than expected. This can be tested by comparing patients without visual disorders.
Although it is indeed a relevant point, retrieving sequencing data from patients without visual disorders is not feasible for us. As it is a restricted research environment, we are only allowed to query sequencing data corresponding to participants enrolled within the framework of our specific sub-domain, namely “Hearing and Sight”, and therefore evaluating additional patients from other sub-domains is not doable. Based on previous studies and our observations, common variants are precisely the ones depleted within UCNEs, while ultrarare variation seems to occur at levels comparable to those observed elsewhere in the genome. Therefore, it is reasonable to speculate that this amount of ultrarare variants is not higher than expected as compared to patients without visual disorders. To further demonstrate the high intolerance of UCNEs to common variation, we have included an evaluation of the distribution of genome-wide residual variation intolerance score (gwRVIS) values compared to a set of randomly selected genomic regions in a human reference cohort (lines 173-178; 487-495; Fig. 3C). Additionally, to address this question further, we have also generated allele frequency distributions for all variants retrieved within a selection of 25 disease-gene associated loci and their corresponding UCNEs in order to assess the specificity of the overlap between rare variants and UCNE regions (lines 181-182, 496-501; Figure S2).
- It seems that the ultrarare variants listed in sup table 6 are more abundant in a small number of genes. Is this due to the number/size of UCNEs is larger in these genes?
Indeed, the clustering of UCNEs in genomic regions containing genes coding for transcription factors and developmental regulators (e.g., OTX2, PAX6, ZEB2) seems to be one of their intrinsic properties, hence the observed enrichment for a small number of genes. One reason can be that these neighboring UCNEs cooperate to achieve higher degrees of tissue-specific regulatory accuracy needed for these genes.
- The variant in the putative Pax6 gene regulatory element is intriguing. It would be much more informative if the enhancer with and without the variant is tested in parallel in fish.
That is a very relevant point. We have now tested the effect of this SNV in the UCNE on the reported expression pattern using the same experimental setup that we used for testing the wild-type construct, namely transgenic enhancer zebrafish assays. However, we could not obtain conclusive results, most likely due to the limitations posed by testing these regions outside their native genomic context. We have incorporated the details, results, and interpretation of the assays performed mutant construct in the Methods (lines 525-527; Supplementary Table 12), Results (lines 235-238; Supplementary Table 10), and Discussion (lines 350-353) sections.
- (optional) it would be quite interesting to check the phenotype in fish or mice with the element repressed via technique such as CRISPRi.
Indeed, we fully agree that CRISPR-based techniques would be the ideal experimental approaches to further validate the functionality of the PAX6-associated UCNE and the identified variant in their native genomic context. Conducting these detailed and focused mechanistic studies is, however, beyond the scope of the present work, for which the main focus was the identification and functional annotation of ultraconserved cCREs.