Alzheimer's disease variant portal (ADVP): a catalog of genetic findings for Alzheimer's disease

Background: Alzheimer's disease (AD) genetic findings span progressively larger genome-wide association studies (GWASs) for various outcomes and populations. These findings are obtained from single GWASs or from joint- or meta-analyses of multiple GWAS datasets. However, no single resource provides harmonized and searchable information on all AD genetic associations obtained from these analyses, or links the identified genetic variants and reported genes with other supporting functional genomic evidence.
Methods: We created the Alzheimer's Disease Variant Portal (ADVP), which provides unified access to a uniquely extensive collection of high-quality GWAS association results for AD. Records in ADVP are curated from the genome-wide significant and suggestive loci reported in the AD genetics literature. ADVP contains curated results from all AD GWAS publications by the Alzheimer's Disease Genetics Consortium (ADGC) since 2009 and AD GWAS publications identified from other public catalogs (GWAS catalog). Genetic association information was systematically extracted from these publications, harmonized, and organized into three types of tables. These tables included structured publication, variant, and association categories to ensure consistent representation of all AD genetic findings. All extracted AD genetic associations were further annotated and integrated with NIAGADS Genomics DB in order to provide extensive biological and functional genomics annotations.
Results: Currently, ADVP contains 6,990 AD-association records curated from >200 AD GWAS publications corresponding to >900 unique genomic loci and >1,800 unique genetic variants. The ADVP collection contains genetic findings from >80 cohorts and across various populations, including Caucasians, Hispanics, African-Americans, and Asians. Of all the association records, 46% are disease-risk, 13% are related to expression quantitative trait analyses, and 27% are related to AD endophenotypes and neuropathology. The ADVP web interface allows users to access AD association records by individual variants, genes, publications, genomic regions of interest, and genome-wide interactive variant views. ADVP is integrated with the NIAGADS Alzheimer's Genomics Database. Researchers can explore additional biological annotations at the genetic variant or gene level and view cross-referenced functional genomics evidence provided by other public resources.
Conclusions: ADVP is the largest, most up-to-date, and most comprehensive literature-derived collection of AD genetic associations. All records have been systematically curated, harmonized, and comprehensively annotated. ADVP is freely accessible at https://advp.niagads.org/.


Introduction
Alzheimer's disease (AD) is a devastating neurological disorder affecting millions of people worldwide and is the most common cause of dementia (Alzheimer's Association, 2019).
There are no approved drugs that can slow or treat the disease. The disease is complex and is highly heritable (Gatz et al., 1997). The strongest known genetic risk factor for AD is the ε4 allele of the Apolipoprotein E gene (APOE ε4) (Corder et al., 1993;Genin et al., 2011), but more than one-third of AD cases do not carry any APOE ε4 alleles.
Large-scale genome-wide association studies (GWASs) have been performed to find more genetic risk factors. These led to the discovery of additional common genetic loci associated with the late-onset AD (LOAD) (Harold et al., 2009;Hollingworth et al., 2011;Lambert et al., 2009;Naj et al., 2011;Seshadri et al., 2010). Yet, the identification of genetic contributors to LOAD remains a challenge as LOAD is likely caused by multiple low penetrance genetic variants (Naj & Schellenberg, 2017), with the small sample sizes further complicating the identification of these causal variants.
The Alzheimer's Disease Genetics Consortium (ADGC) was founded in 2009, with funding from the National Institute on Aging (NIA), to conduct large-sample GWASs that identify genes associated with an increased risk of developing LOAD. ADGC co-founded IGAP (International Genomics of Alzheimer's Project) with three other AD genetics consortia: the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium, the European Alzheimer's Disease Initiative (EADI), and the Genetic and Environmental Risk in Alzheimer's Disease (GERAD) Consortium. IGAP assembled large Caucasian samples for better statistical power and was able to identify 19 genome-wide significant loci in 2013 (Lambert et al., 2013), and five more loci using more than 30,000 samples in 2019 (Kunkle et al., 2019).
In addition to GWASs focused on association with disease risk, many recent genetic studies have focused on related phenotypes, e.g., neuroimaging biomarkers (Biffi et al., 2010), circulating biomarkers (Cruchaga et al., 2013; Kauwe et al., 2014), cognitive decline (Barral et al., 2012, 2014), neuropathology ([...] et al., 2014), and family history (Jansen et al., 2019). GWASs on Hispanic, African-American, Asian, and other minority populations have also led to new variants not observed in Caucasians (Cukier et al., 2016; Hirano et al., 2015; Mez et al., 2017).
To help researchers better explore the rich and diverse literature of genetic findings, it is important to have a single resource with harmonized, unified, searchable information on identified genetic variants and genes across a variety of AD studies and populations, along with supporting functional genomic evidence.
To meet this need, we have cataloged genetic association results (both genome-wide significant and suggestive) from all major GWAS studies published by ADGC from 2009 to 2019 and other AD GWAS studies identified from publicly available catalogs (GWAS catalog, Buniello et al., 2019). All the data collected from each of the association studies are made publicly available on the continuously updated and freely accessible Alzheimer's Disease Variant Portal (ADVP) (https://advp.niagads.org/). To date, ADVP provides the largest, most up-to-date, and most comprehensive collection of systematically curated, harmonized, and annotated AD-specific genetic associations. This first release contains information on 6,990 genetic associations and >900 genomic loci curated from >125 AD publications and categorized into nine harmonized phenotype categories. All AD associations in ADVP are annotated with genomic and functional information.
Comprehensive biological annotations are available via integration with the NIAGADS Alzheimer's Disease Genomics database (GenomicsDB, 2020). ADVP will serve as an invaluable resource for the research community to explore and decipher the genetic architecture of AD and other neurodegenerative diseases.

Methods
An overview of the ADVP study design is shown in Figure 1.
medRxiv preprint doi: https://doi.org/10.1101/2020.09.29.20203950; this version posted September 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Figure 1. ADVP study design. AD GWAS publications are first collected (Section "Data collection"); genetic variant and association data are then systematically extracted (Section "Data extraction"), harmonized (Section "Meta-data design"), annotated (Section "Annotation"), subjected to quality control steps (Section "Quality control steps"), and stored in ADVP.

Collection and curation of AD-related GWAS publications (Data collection)
This ADVP V1.0 release consists of curated and harmonized genetic associations from the genome-wide significant and suggestive loci collected from AD genetic studies conducted primarily by the ADGC. All AD GWAS publications by ADGC (2009-2019, http://www.adgenetics.org) and all other AD GWAS studies in the GWAS catalog (Buniello et al., 2019) (MeSH D000544, curation date: Dec 2019) were included. All publications (total N=205; ADGC: N=134; citations from ADGC: N=20; GWAS catalog: N=51) fulfilling the above search criteria were first screened to identify publications reporting GWAS findings. All reported genetic associations in the main text (table format) were systematically extracted. We curated the 125 publications that met the above criteria (https://advp.niagads.org/publications). Supplementary Table S1 provides details on all of the curated AD publications in ADVP V1.0. Note that results from the ADGC family-based analyses will be included in the next release.

Extraction of genetic variants and associations from publications (Data extraction)
We applied the following systematic data extraction and curation procedure to each publication to organize all the extracted variant and association information into a structured tabular format according to the ADVP data schema (see Section "Meta-data design" for details about the columns). In each publication, we identified all the tables in the main text with reported association p-values. All the information for these associations was then saved into a standardized template document using the corresponding meta-data schema. The completed document for all curated publications is composed of the three predefined worksheets:
1. The publication's meta-data
2. Association meta-data
Lastly, the document is parsed by customized scripts to normalize, validate, annotate, and store the publication, variant, and association data in the relational database (https://dl.acm.org/doi/book/10.5555/560480). Collected AD variants and association records are further integrated with the NIAGADS Alzheimer's Genomics Database (GenomicsDB, 2020), providing comprehensive genomic annotation and functional genomic information.

Publication meta-data
Meta-data for all curated publications in ADVP was extracted from PubMed (https://pubmed.ncbi.nlm.nih.gov/) using the NCBI EDirect interface (https://www.ncbi.nlm.nih.gov/books/NBK179288/) based on the publication's PubMed identifier (PMID). For each publication, we record its PMID, PubMed Central identifier (PMCID), first and last authors, journal, and publication year. We also store the abstract, article URLs, as well as information on the curated tables for each article in the Publication meta-data (Figure 1).

Association meta-data
ADVP association meta-data consists of 28 data fields, of which 19 are extracted directly from the paper contents and nine are additional fields that are harmonized (based on the extracted original information) and programmatically generated. Altogether, for each AD association, the association meta-data provide 1) variant information (see Section "Description of Variants"); 2) association information (see Section "Description of association records"); 3) annotation information (see Section "Annotation of genetic variants and associations"). For a detailed explanation of these curated and harmonized/derived data fields, see Supplementary Table S2.

Description of Variants
Each genetic variant in ADVP is described using its dbSNP rsID, genomic coordinates (chromosome: base pair), and genomic reference and alternative alleles. Both the values reported in the publication (if available) and the values derived from reference databases such as dbSNP (Sherry et al., 2001) and the NIAGADS Alzheimer's Genomics database (GenomicsDB, 2020) are included in the variant description. Genomic location in the current version of ADVP is stored using the GRCh37/hg19 reference genome build, as the majority of GWAS publications conducted analyses using GRCh37/hg19. For quality assurance, the reported rsID, coordinate, and allele information for each variant were cross-checked against dbSNP b151 (GenomicsDB, 2020; Sherry et al., 2001) and referenced against 1000 Genomes data (Auton et al., 2015) to help resolve reported alleles and complete any missing variant information (e.g., genomic coordinates, allele information).
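The allele cross-check described above can be illustrated with a minimal sketch. It is not the ADVP curation code: the `ReferenceVariant` type, the function names, and the three outcome labels are assumptions introduced here, and the example variant is a made-up placeholder rather than a real dbSNP record.

```python
from dataclasses import dataclass

# Base complements used to detect alleles reported on the opposite strand.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


@dataclass(frozen=True)
class ReferenceVariant:
    """Hypothetical reference record (e.g., as looked up in dbSNP)."""
    rsid: str
    chrom: str
    pos: int   # GRCh37/hg19 coordinate, 1-based
    ref: str
    alt: str


def reverse_complement(allele: str) -> str:
    """Reverse-complement an allele string such as 'A' or 'AG'."""
    return "".join(COMPLEMENT[base] for base in reversed(allele.upper()))


def check_reported_alleles(a1: str, a2: str, ref: ReferenceVariant) -> str:
    """Compare a reported allele pair with the reference alleles.

    Returns 'match', 'strand_flip', or 'mismatch'. Palindromic pairs
    (A/T, C/G) are reported as 'match' and would need a frequency-based
    check, which this sketch does not attempt.
    """
    reported = {a1.upper(), a2.upper()}
    expected = {ref.ref.upper(), ref.alt.upper()}
    if reported == expected:
        return "match"
    if {reverse_complement(a) for a in reported} == expected:
        return "strand_flip"
    return "mismatch"


# Placeholder variant, not a real dbSNP entry.
example = ReferenceVariant("rsEXAMPLE1", "19", 1000, "T", "C")
print(check_reported_alleles("G", "A", example))
```

Records flagged as `mismatch` would be candidates for manual review against the publication, in the spirit of the quality assurance step described above.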

Description of association records
Association information was systematically extracted from each source table and recorded as part of the ADVP association record. The extracted values were further recoded and categorized so that association records are described consistently across publications/studies. For each reported association, ADVP first collected a pre-defined set of data attributes commonly reported by genetic association studies (see "Extracted" columns in Supplementary Table S2). These include the p-value and statistics related to the effect of the genetic variant (regression beta coefficients and variance, odds ratios, confidence intervals), the reported effect allele, and its frequency in the studied population.
In addition to the information directly extracted from publications, each association in ADVP is described with nine meta-information data fields:
1) "Record type": the association record type is set based on whether the reported association is for a single SNP ("SNP-level"), a single gene ("Gene-level"), SNP interactions ("Interaction (SNP)"), or gene interactions ("Interaction (Gene)").
2) "Population": study population information was first copied from the publication ("Population (detailed)"). The reported population information was then mapped to a standard population vocabulary to normalize population information across studies ("Population" column).
3) "Cohort": for example, if the cohort is one of the ADGC GWAS cohorts (see Supplementary Table S3), ADVP appends "ADGC" to the data entry.
4) "Sample size": the original sample size (number of cases and number of controls, when available/applicable) was recorded. The ADVP V1.0 web server reports only the total number.
5) "Subset analyzed": description of the subset of samples used in the association analysis. When a subset of samples was used to perform the association analysis (e.g., "e2/e4", "e4 carriers only", "Female only"), this field records the subset as described in the publication.
6) "Phenotype": the outcome variable (i.e., phenotype/trait) of the association analysis. The original outcome ("Phenotype (detailed)") was assigned to one of nine categories: "AD", "ADRD", "Cognitive", "Expression", "Fluid biomarker", "Imaging", "Neuropathology", "Non-ADRD", and "Other" (representing "Age of onset" or "AD survival").
7) "Association Type": the type of association analysis. Based on the "Phenotype" column, we classified each association test into six categories: "Age at onset (AAO)/Survival", "Cross phenotype", "Disease-risk", "Endophenotype", expression quantitative trait locus ("eQTL"), and "Pleiotropy".
8) "Stage": the stage of the analysis as described in the publication, e.g., "Stage n" (n=1,2,3), "Discovery", "Validation", "Meta-analysis". The stage information is reported as given in the paper if it was available. If the stage information was not explicitly provided in the text, we derived it as follows (Supplementary Figure S1). If the association analysis was done using a single cohort, the "Stage" of the record was set to one of "Stage n" (n=1,2,3 or others), "Discovery", or "Replication/Validation". "Discovery" was used if the paper was the first to report such findings using the specific combination of cohort + phenotype information; otherwise, the stage was set to "Replication/Validation". If the association analysis was done using multiple cohorts, the stage was set to "Meta-analysis" if the analyses were performed using methods such as "inverse-variance weighting", "fixed effects", or "random effects" models or the METAL software; if not, the stage was set to "Joint-analysis".
9) "Imputation": imputation panel information. The imputation panel version, software tool, and version were mapped to broader categories such as the 1000 Genomes Project ("1000G"), the International HapMap Project ("HapMap"), or the Haplotype Reference Consortium ("HRC").
Note that Population, Cohort, and Phenotype information is displayed in ADVP using both the original (reported) and the derived, harmonized data columns.
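The rules for deriving the "Stage" field when a publication does not state it can be written as a small decision function. This is a sketch under stated assumptions rather than the ADVP implementation: the function name, its parameters, and the keyword matching on a free-text method description are hypothetical, and the `first_report` flag stands in for the cohort + phenotype lookup described in the text.

```python
from typing import Optional

# Keywords taken from the rule in the text; matching is case-insensitive.
META_ANALYSIS_KEYWORDS = (
    "inverse-variance weighting",
    "fixed effects",
    "random effects",
    "metal",
)


def derive_stage(
    reported_stage: Optional[str],
    n_cohorts: int,
    method: str = "",
    first_report: bool = True,
) -> str:
    """Derive the harmonized 'Stage' value for one association record.

    reported_stage: stage as given in the paper, or None/'' if not stated.
    n_cohorts:      number of cohorts used in the association analysis.
    method:         free-text description of the analysis method.
    first_report:   True if this cohort + phenotype combination has not
                    been reported before (assumed to be known upstream).
    """
    # Rule 1: keep the stage exactly as reported when it is available.
    if reported_stage:
        return reported_stage
    # Rule 2: single-cohort analyses are discovery or replication.
    if n_cohorts <= 1:
        return "Discovery" if first_report else "Replication/Validation"
    # Rule 3: multi-cohort analyses are meta- or joint-analyses.
    method_lc = method.lower()
    if any(keyword in method_lc for keyword in META_ANALYSIS_KEYWORDS):
        return "Meta-analysis"
    return "Joint-analysis"


print(derive_stage(None, 4, method="Fixed effects meta-analysis in METAL"))
```

Keeping the reported value untouched in the first branch mirrors the text: derivation applies only when the publication gives no stage.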

Functional genomics evidence for genetic variants and associations (Annotation)
All variants and associations in ADVP were systematically annotated with genomic context (closest upstream/downstream genes), genomic element (promoter, UTR, intron, exon, intergenic, repeat), functional information (variant most severe consequence), and cross-references to the NIAGADS Alzheimer's Genomics database (GenomicsDB, 2020).
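As an illustration of the genomic-context annotation, the sketch below finds the closest flanking genes for a variant position on one chromosome. It is a simplified stand-in for the actual annotation pipeline, which is not described in detail here: the `Gene` tuple layout and function name are assumptions, strand is ignored (so "upstream" simply means lower coordinate), and the gene names are placeholders.

```python
import bisect
from typing import List, Optional, Tuple

# (symbol, start, end) on a single chromosome; 1-based inclusive coordinates.
Gene = Tuple[str, int, int]
Flank = Optional[Tuple[str, int]]  # (symbol, distance in bp) or None


def flanking_genes(pos: int, genes: List[Gene]) -> Tuple[Flank, Flank]:
    """Return the nearest gene ending before pos and the nearest gene
    starting after pos, each with its distance to pos."""
    # Nearest gene on the lower-coordinate side: largest end < pos.
    by_end = sorted(genes, key=lambda g: g[2])
    ends = [g[2] for g in by_end]
    i = bisect.bisect_left(ends, pos)  # count of genes with end < pos
    upstream = (by_end[i - 1][0], pos - by_end[i - 1][2]) if i > 0 else None

    # Nearest gene on the higher-coordinate side: smallest start > pos.
    by_start = sorted(genes, key=lambda g: g[1])
    starts = [g[1] for g in by_start]
    j = bisect.bisect_right(starts, pos)  # index of first start > pos
    downstream = (
        (by_start[j][0], by_start[j][1] - pos) if j < len(starts) else None
    )
    return upstream, downstream


# Placeholder gene models, not real annotations.
demo_genes = [("GENE_A", 100, 200), ("GENE_B", 500, 700), ("GENE_C", 900, 1000)]
print(flanking_genes(300, demo_genes))
```

A variant that falls inside a gene still gets both neighbors from this function; deciding whether it is intronic, exonic, or in a UTR is a separate lookup against the gene model.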

Variant and association data verification (Quality control steps)
Quality control for the variant and association information in ADVP is carried out at multiple levels:
1) We ensured that records are not double-counted/re-reported across studies. Each association record in ADVP is uniquely identified by a combination of reported gene/SNP/interaction name, cohort/analyzed subset, the model used, phenotype, and association p-value and effect size.
2) We cross-checked recorded positional information (chromosome: base pair), rsID, and allele information against reference databases, including dbSNP (Sherry et al., 2001) and the NIAGADS Alzheimer's Genomics database (GenomicsDB, 2020), to ensure that variant information is correct.
3) We identified and removed records solely representing variants annotated by publicly available functional resources such as GTEx (Lonsdale et al., 2013).
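The uniqueness check in step 1 amounts to de-duplicating records on a composite key. The sketch below shows one way to do this; the field names in `KEY_FIELDS` are hypothetical stand-ins for the ADVP columns, and the string normalization (trimming and lower-casing) is an assumption, not a documented ADVP rule.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical column names for the identifying attributes listed in step 1.
KEY_FIELDS = (
    "name",         # reported gene / SNP / interaction name
    "cohort",
    "subset",       # analyzed subset
    "model",
    "phenotype",
    "p_value",
    "effect_size",
)


def record_key(record: Dict[str, object]) -> Tuple[str, ...]:
    """Build a normalized composite key for one association record."""
    return tuple(str(record.get(f, "")).strip().lower() for f in KEY_FIELDS)


def deduplicate(records: Iterable[Dict[str, object]]) -> List[Dict[str, object]]:
    """Keep the first record seen for each composite key, preserving order."""
    seen = set()
    unique = []
    for record in records:
        key = record_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Illustrative records with placeholder values.
a = {"name": "rsEXAMPLE1", "cohort": "ADGC", "subset": "All",
     "model": "Model 1", "phenotype": "AD", "p_value": "1e-8",
     "effect_size": "1.2"}
b = dict(a, cohort="adgc ")          # same record after normalization
c = dict(a, phenotype="Cognitive")   # different phenotype, so kept
print(len(deduplicate([a, b, c])))
```

Comparing p-values and effect sizes as strings is deliberately naive here; differently formatted but equal numbers (for example "1e-8" and "1E-08") would need numeric parsing before comparison.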

ADVP front-end and back-end architecture and implementation
ADVP is designed with ease of update and modularity in mind. Contents of ADVP are derived from the collection, curation, harmonization, processing, and integration of AD-related publications and reported genetic associations using a meta-table scheme (see Section Meta-data curation). The ADVP web server runs on an Amazon Web Services (AWS) cloud computing instance (m5.4xlarge), using the MySQL (Widenius et al., 2002) relational database management system as a back-end and a PHP/jQuery-based web front-end. All the publication, variant, and association information stored in the ADVP relational database is organized into multiple tables (Figure 1). The web front-end provides multiple data views for publications, genes, variants, and association records (Figure 4).
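To make the relational organization concrete, the following sketch sets up publication, variant, and association tables and joins them. It is only an illustration: ADVP uses MySQL, whereas this example uses Python's built-in `sqlite3` so that it runs without a database server, and every table name, column name, and inserted value is an assumption rather than the actual ADVP schema.

```python
import sqlite3

# Hypothetical, heavily reduced schema with one table per record type.
SCHEMA = """
CREATE TABLE publication (
    pmid         INTEGER PRIMARY KEY,
    pmcid        TEXT,
    first_author TEXT,
    last_author  TEXT,
    journal      TEXT,
    year         INTEGER
);
CREATE TABLE variant (
    variant_id INTEGER PRIMARY KEY,
    rsid       TEXT,
    chrom      TEXT NOT NULL,
    pos        INTEGER NOT NULL,  -- GRCh37/hg19 coordinate
    ref        TEXT,
    alt        TEXT
);
CREATE TABLE association (
    association_id   INTEGER PRIMARY KEY,
    pmid             INTEGER NOT NULL REFERENCES publication(pmid),
    variant_id       INTEGER REFERENCES variant(variant_id),
    record_type      TEXT,
    population       TEXT,
    cohort           TEXT,
    phenotype        TEXT,
    association_type TEXT,
    stage            TEXT,
    p_value          REAL
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding one placeholder record."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO publication (pmid, first_author, year) VALUES (?, ?, ?)",
        (1, "Example", 2019),
    )
    conn.execute(
        "INSERT INTO variant (variant_id, rsid, chrom, pos) VALUES (?, ?, ?, ?)",
        (1, "rsEXAMPLE1", "19", 1000),
    )
    conn.execute(
        "INSERT INTO association (pmid, variant_id, phenotype, p_value) "
        "VALUES (?, ?, ?, ?)",
        (1, 1, "AD", 1e-9),
    )
    return conn


def associations_for_variant(conn: sqlite3.Connection, rsid: str) -> list:
    """Return (pmid, phenotype, p_value) rows for one rsID, best p-value first."""
    query = """
        SELECT p.pmid, a.phenotype, a.p_value
        FROM association AS a
        JOIN variant AS v ON v.variant_id = a.variant_id
        JOIN publication AS p ON p.pmid = a.pmid
        WHERE v.rsid = ?
        ORDER BY a.p_value
    """
    return conn.execute(query, (rsid,)).fetchall()


print(associations_for_variant(build_demo_db(), "rsEXAMPLE1"))
```

Leaving `variant_id` nullable in the association table is one way to accommodate gene-level and interaction records, which do not map to a single variant.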

Results
ADVP is more comprehensive than the NHGRI-EBI GWAS catalog (Buniello et al., 2019) (Table 1) in terms of the number of curated AD-related associations and publications, and more recent than another major database, AlzGene (Bertram et al., 2007). To prioritize the association findings with the highest confidence, ADVP focuses on large-scale association studies at the genomic level; the majority of studies included in ADVP (65%) report associations reaching genome-wide significance, the gold standard for GWAS discoveries. Furthermore, ADVP collected extensive meta-data, including consortiums and cohorts, which were not available in the other two databases and are important for relating the results reported across publications. Finally, ADVP provides convenient links for researchers to explore biological significance via annotations in the NIAGADS Alzheimer's Genomics database (GenomicsDB, 2020) (Figure 1).
Following the ADVP curation criteria (see "Data collection", Figure 1), we first identified and screened 205 AD-related publications from 2009-2019. Out of these, we identified 125 publications with genetic associations reported in the main text tables (N=225 tables). Genetic variant and association data were then systematically extracted (Section "Data extraction"), harmonized (Section "Meta-data design"; converted into standard variant/association descriptors), annotated (Section "Annotation"), subjected to quality control steps (Section "Quality control steps"), and stored in ADVP (Figure 1).

ADVP data summary
The ADVP V1.0 release contains high-quality genome-wide significant and suggestive AD-related genetic associations extracted from GWAS publications. It contains 6,990 genetic associations for variants, genes, and SNP interactions. Figure 2 shows the distribution of ADVP genetic associations by harmonized meta-information data fields: a) nine harmonized phenotypes; b) six harmonized analysis types; c) population; and d) cohorts/consortiums.

Figure 2: Summary of genetic association records in ADVP by A) Phenotype, B) Analyses type, C) Population, and D) Cohorts/Consortiums.
All ADVP association records are uniquely standardized into different categories:
1. As shown in Figure 2A, ADVP records are associated with nine different phenotype categories, with roughly half of them related to AD diagnosis. 15% of the records are related to fluid biomarkers, 7% to imaging, and 6% to cognitive measures.
2. With respect to analysis type categories, ADVP includes 3,199 (45.8%) association records reported in disease-risk analyses, of which 1,342 and 934 associations are reported by meta- and joint-analyses, respectively. 1,887 (26.9%) of the records are related to AD endophenotypes and 924 (13.2%) are eQTL AD associations (Figure 2B).
4. ADVP records present analysis results from seven populations as well as those from transethnic analyses. ~88% of the records are for Caucasians (Figure 2C). Others include African American, Arab, Asian, Caribbean Hispanic, Hispanic, and Non-Hispanic Caucasian.
5. ADVP records span analysis results from over 80 cohorts. We summarize here data from the few largest AD consortiums (Figure 2D).
Furthermore, ADVP provides annotation information for each genetic association (Section "Annotation"). In summary, all the genetic association records in ADVP were represented by >1,800 unique variants (based on genomic position) and >900 genomic loci (based on computed normalization). ADVP associations are mostly located in noncoding regions, including intronic (52.9%), intergenic (15.2%), and promoter (5.9%) regions (Figure 3A). ADVP records are also cross-referenced to the NIAGADS Alzheimer's Genomics database (GenomicsDB, 2020). Figure 3B shows the impact of genetic variants in ADVP as determined by the ADSP functional annotation pipeline (Butkiewicz et al., 2018; Cingolani et al., 2012; GenomicsDB, 2020).

Figure 3: [...] variants is determined using the ADSP functional annotation pipeline (Butkiewicz et al., 2018; McLaren et al., 2016) and is provided by the NIAGADS Alzheimer's Genomics database (GenomicsDB, 2020).

ADVP features -search, browse and visualize
ADVP aims to provide a simple and unified resource to the scientific community, allowing researchers to search and browse AD genetic association information more easily. This is first done by displaying association records using a pre-selected set of the most important data fields described in Methods. Users can further select additional data fields via the column selector (Figure 4A). All records are integrated with the NIAGADS Alzheimer's Genomics database, allowing users to explore various kinds of biological annotations (e.g., CADD score (Rentzsch et al., 2019)) and functional genomics evidence, including overlaps with FANTOM5 (Andersson et al., 2014) and ENCODE histone modification (Dunham et al., 2012) data, and gene ontologies from KEGG (Kanehisa et al., 2016) and UniProt (Huntley et al., 2015).
The ADVP search interface was designed based on focus group use cases. ADVP provides several ways to search for genetic association records:
1) By publication: users can quickly identify and retrieve all association records curated by ADVP for a particular study using PMID/PMCID, first or last author names, year of publication, or article title (https://advp.niagads.org/publications).
2) By variant or gene of interest: investigators can search for the variant (https://advp.niagads.org/variants) or gene (https://advp.niagads.org/genes) of interest and browse all the ADVP records associated with it. Additionally, ADVP provides an interface for users to easily discover the top variants or genes with the most association records or most publications via the summary counts for association records and papers (Figure 4B, 4C).
3) By region of interest: users can search and retrieve all genetic associations within the genomic regions of interest via this interface (https://advp.niagads.org/search).
4) By integrative genome-wide plots: investigators can navigate the landscape of AD genetic associations using the graphical display of genetic association data via an interactive chromosome ideogram (https://advp.niagads.org/ideogram, Figure 4D) or an interactive population/phenotype variant viewer (https://advp.niagads.org/plot, Figure 4E).
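A region-of-interest search of the kind listed above can be sketched as a parse step followed by an interval filter. The query syntax accepted by the ADVP search page is not specified here, so the `chr:start-end` format, the `Assoc` record type, and both function names are assumptions made for illustration, and the records are placeholders.

```python
import re
from typing import Iterable, List, NamedTuple, Tuple


class Assoc(NamedTuple):
    """Hypothetical minimal association record."""
    rsid: str
    chrom: str
    pos: int        # GRCh37/hg19 coordinate
    p_value: float


# Accepts e.g. "chr19:45,000,000-46,000,000" or "19:45000000-46000000".
REGION_RE = re.compile(
    r"^(?:chr)?([0-9]{1,2}|X|Y|MT?):([0-9,]+)-([0-9,]+)$", re.IGNORECASE
)


def parse_region(text: str) -> Tuple[str, int, int]:
    """Parse a region string into (chromosome, start, end)."""
    match = REGION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"Unrecognized region: {text!r}")
    chrom = match.group(1).upper()
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start > end:
        raise ValueError(f"Region start exceeds end: {text!r}")
    return chrom, start, end


def search_region(records: Iterable[Assoc], region: str) -> List[Assoc]:
    """Return records inside the region (inclusive), sorted by position."""
    chrom, start, end = parse_region(region)
    hits = [
        r for r in records
        if r.chrom.upper() == chrom and start <= r.pos <= end
    ]
    return sorted(hits, key=lambda r: r.pos)


# Placeholder records, not real ADVP entries.
demo = [
    Assoc("rsA", "19", 45_400_000, 1e-10),
    Assoc("rsB", "19", 47_000_000, 1e-6),
    Assoc("rsC", "1", 45_400_000, 1e-7),
]
print([r.rsid for r in search_region(demo, "chr19:45,000,000-46,000,000")])
```

In a database-backed portal the same filter would normally be an indexed range query on chromosome and position rather than a scan in application code.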
Figure 4: panels A) to E) (ADVP data views and search interfaces referenced in the text).

Figure 5: Integration with NIAGADS AD Genomics database (GenomicsDB, 2020) providing additional biological information and functional evidence. Shown are the provided annotation and functional genomics data categories (red rectangles).

Discussion
Here, we present ADVP, a portal to search, browse and visualize the largest collection of systematically curated, harmonized, and annotated AD-specific genetic variants and associations (~7,000 genetic associations in the current release, V1.0 (August 2020)).
The main distinctive features of ADVP are its harmonized reporting of AD variant and association information (standardized meta-table curation schema), its integration with genomic annotation and functional information (NIAGADS Alzheimer's Genomics database (GenomicsDB, 2020)), and its extensive consortium-level information.
ADVP uniquely includes associations at SNP, gene, and interaction levels and contains curated phenotypes not limited to disease risk, but also includes endophenotypes, fluid biomarkers, imaging, neuropathology, and other phenotypes. Moreover, ADVP curates and records AD and ADRD eQTL association findings ( Figure 2B).
In addition to the standard p-values and effect sizes reported for association records, ADVP puts particular emphasis on harmonizing meta-data curated from the publications. Both the curated and derived columns are stored in the database. These include phenotype, association type, standardized gene names, study information (population, cohort, sample size, subset analyzed), and details of analyses (analyses type, imputation) ( Figure 4A). All these columns enable the researchers to interpret, compare and view these records at different levels: phenotype (Figure 2A), population ( Figure 2C), cohort ( Figure 2D), to name a few.
All ADVP records are annotated with the genomic context (upstream/downstream genes, and their distances) and their co-localized genomic element (Figure 3). They are also cross-referenced with NIAGADS Alzheimer's Genomics DB (GenomicsDB, 2020), providing other genomic annotation and functional genomic information. The standardized, structured design of ADVP association data allows systematic integration with other genetic, genomics, and molecular databases.
Lastly, we made substantial efforts to ensure the high quality of ADVP data content. Quality control is performed at multiple levels (Figure 1, Section "Quality control steps") to ensure the uniqueness of the included genetic associations (no double counting or re-reporting of associations). In addition, variant information in ADVP has been cross-checked against other reference databases such as dbSNP.
ADVP will be updated continuously, with versioned releases every six months. New publications on AD-related GWASs will be added on an ongoing basis. Each publication will first be added to our existing unified catalog ('Publications' meta-data table). Genetic association records will then be extracted from the publication, processed, quality-controlled, and imported into ADVP ('Associations' meta-data table).
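One way to picture the uniqueness check is a key-based de-duplication pass, sketched below. The choice of key columns, the `deduplicate` function, and the example rows are assumptions for illustration only; the criteria actually used by ADVP are those described in the "Quality control steps" section.

```python
# Hypothetical key: two records are treated as the same association when
# these columns match after trimming whitespace and ignoring case.
KEY_FIELDS = ("variant", "phenotype", "population", "cohort", "analysis_type")


def deduplicate(records, key_fields=KEY_FIELDS):
    """Split records into (unique, duplicates), keeping the first occurrence."""
    seen = set()
    unique = []
    duplicates = []
    for record in records:
        key = tuple(str(record[field]).strip().lower() for field in key_fields)
        if key in seen:
            duplicates.append(record)
        else:
            seen.add(key)
            unique.append(record)
    return unique, duplicates


# Made-up rows for illustration only; the second row re-reports the first.
rows = [
    {"variant": "rs0000001", "phenotype": "AD risk", "population": "Population 1",
     "cohort": "Cohort X", "analysis_type": "meta-analysis", "source": "PUB_1"},
    {"variant": "RS0000001 ", "phenotype": "AD risk", "population": "Population 1",
     "cohort": "Cohort X", "analysis_type": "meta-analysis", "source": "PUB_2"},
    {"variant": "rs0000001", "phenotype": "AD risk", "population": "Population 2",
     "cohort": "Cohort X", "analysis_type": "meta-analysis", "source": "PUB_2"},
]

unique_rows, duplicate_rows = deduplicate(rows)
```

Returning the duplicates instead of silently discarding them lets a curator review each flagged record before it is excluded.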
In the future, in addition to curating and including new AD GWAS findings, ADVP data collections will expand to cover a broader range of genetic results:
• AD whole-genome/whole-exome sequencing analyses
• AD xQTL associations, where x = protein, methylation, epigenetic marks, or other molecular traits
• Other genetic variant types, such as insertions/deletions (indels), copy number variations (CNVs), or structural variations (SVs), as they become available
• AD-related disorders (ADRD)
Last but not least, future ADVP functionality will include the further collection and addition of functional genomic evidence supporting genetic associations.
To conclude, ADVP contains the largest collection of systematically curated, harmonized, and annotated literature-derived variants for AD to the best of our knowledge. The extensive and unique features in ADVP allow researchers to easily access, interpret, compare, and visualize the vast collection of AD genetics findings.

Availability
All AD variant and association information is available through the ADVP website (https://advp.niagads.org/). The code for processing reported variant and association data is also available upon request.

This preprint (which was not certified by peer review) was posted on medRxiv on September 30, 2020 (https://doi.org/10.1101/2020.09.29.20203950). The copyright holder is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.