Abstract
Importance The identification of modifiable risk factors for Alzheimer’s disease (AD) is paramount for early prevention and the targeting of new interventions.
Objective To assess the associations between health conditions diagnosed in primary care and the risk of incident AD over time.
Design, Setting, and Participants Data for 20,214 AD patients from the United Kingdom and 19,458 AD patients from France were extracted from The Health Improvement Network (THIN) database. For each AD case, a control was randomly assigned after matching for sex and age at dementia diagnosis. We agnostically tested the associations between 123 different ICD10 diagnoses extracted from health records and AD dementia, by conditional logistic regression. We focused on two time periods: 2 to 10 years vs. 0 to 2 years before the diagnosis of AD, to separate risk factors from early symptoms/comorbidities.
Exposures We considered all health conditions that had been recorded in more than 0.1% of visits per 1000 person-years in both cohorts, corresponding to 123 potential types of exposure.
Main Outcomes and Measures Odds ratios (ORs) for the association of AD with the various health conditions were calculated after Bonferroni correction for multiple comparisons.
Results Ten health conditions were significantly associated with high odds ratios for AD when diagnosed 2 to 10 years before AD, in the British and French samples: major depressive disorder (OR 95% confidence interval (UK):1.23-1.46)), anxiety (1.25-1.47), reaction to severe stress (1.24-1.59), hearing loss (1.11-1.28), constipation (1.22-1.41), spondylosis (1.14-1.39), abnormal weight loss (1.33-1.63), malaise and fatigue (1.14-1.32), memory loss (6.65-8.76) and syncope and collapse (1.1-1.37). Depression was the first comorbid condition associated with AD, appearing at least nine years before the first clinical diagnosis of AD, followed by, anxiety, constipation and abnormal weight loss.
Conclusions and Relevance These results from two independent primary care databases provide new evidence on the temporality of risk factors and early signs of Alzheimer’s disease. These results could guide new dementia prevention strategies.
Introduction
Alzheimer’s disease (AD) is thought to account for 60-70% of dementia cases worldwide, and is, thus, one of the principal health challenges of the 21st century (WHO, 2020). Dementia affected 10.5 million Europeans in 201523, and projections suggest that 13.5 million Europeans will be affected in 203023. It is estimated that 4.5% of people over the age of 65 years currently have AD in Europe (6.4% for dementia in general)23. AD has major consequences at the individual and societal levels, as it requires a high degree of social care21.
The neurodegenerative diseases responsible for dementia (and AD in particular) are progressive and develop over decades before becoming disabling. Attempts to develop an effective disease-modifying treatment for patients with symptomatic AD have been unsuccessful. There has, therefore, been a shift towards early interventions, to maximise the therapeutic window through primary prevention measures acting on actionable risk factors, or secondary prevention measures to slow disease progression through early therapeutic interventions. The key to the implementation of such preventive measures is a data-driven understanding of the complex dynamics of the presymptomatic period at the point of care.
Review of potential risk factors for dementia
The most recent Lancet Commission on Dementia, Prevention, Intervention and Care 20 added three new modifiable risk factors (excessive alcohol consumption33, head injury36 and air pollution27) to the initial list of nine factors 21 (low level of education, hypertension, hearing impairment, smoking, obesity, depression, physical inactivity, diabetes and infrequent social contact). The report concluded that modifying these 12 risk factors might prevent or delay dementia in up to 40% of cases (7% for low level of education and 8% for hearing loss). For Alzheimer’s dementia, another recent large meta-analysis of 243 observational prospective studies and 153 randomised controlled trials 35 identified 10 modifiable risk factors: diabetes, hyperhomocysteinaemia, poor BMI management, low level of education, hypertension in midlife, orthostatic hypotension, head trauma, poor cognitive activity, stress and depression. All the studies stressed the importance of a specific feature of neurodegenerative diseases, which include a prodromal phase lasting up to 20 years before clinical diagnosis 10,34. This makes it more difficult to separate possible early symptoms from causal risk factors when describing the natural course of the disease. It also highlights the urgent need for observational prospective studies with long follow-up periods, to investigate the long-term relationship between medical conditions and dementia onset 34,35.
Investigating long-term risk factors observable in primary care: an agnostic approach
Epidemiological studies on the aetiology of AD have focused on hypothesis-driven searches for potential risk factors, in cohort studies on older people (mostly over the age of 60 years). By contrast, agnostic studies of risk factors test all possible associations exhaustively, in a similar fashion to genome-wide association studies for finding genotype-phenotype associations. They require large samples for the detection of statistically significant signals for associations. The constitution of such large samples is made possible by access to large electronic health record databases 9,26, facilitating complex longitudinal analyses and the testing of multiple hypotheses in well-powered studies.
We set up and analysed two large nested case-control studies drawn from two large independent primary care databases in the UK and France, extracted from The Health Improvement Network (THIN) database. Through this extensive study, we sought to confirm some well-known dementia risk factors and exhibit new candidates for consideration in future anti-dementia strategies.
Methods
Study design and participants
THIN database
We used The Health Improvement Network (THIN®) database3, a large standardised European database of fully anonymised and non-extrapolated electronic medical records collected from physicians by Cegedim and coded according to International Classification of Diseases, 10th Revision (ICD-10) codes. The THIN database complies with all current European data protection laws (General Data Protection Regulation (GDPR)) and adheres to the Observational Medical Outcomes Partnership (OMOP) model. The French data were collected from a pool of 2,000 general practitioners14 representative of the French population in terms of age, sex and living area. The UK data were collected from 400 general practices, representing around 6% of the UK population, selected for the THIN quality data recording scheme with Vision practice management software 32. Several reports have already shown that the electronically coded diagnoses in this database are representative of the UK general practice population in terms of demographics and type of consultation 3,17. For each patient, the diagnosis corresponding to each visit, the prescriptions made by the general practitioner (GP) and all other diagnoses associated with these prescriptions were available. Data for educational level and social status were not provided, to preserve anonymity.
Data extraction on the basis of ICD10 classification
For each patient, we had access to all the disease diagnoses made by the general practitioner, whether the main reason for the visit or associated with a prescription made by the GP. In the French cohort, we may not have had access to the full history of some patients, if they consulted several different general practitioners during the study. We defined past exposure to the health conditions considered, according to the 10th International Classification of Disease (ICD10). ICD10 codes were provided directly in the French database. For the UK database, ICD10 code was obtained by converting READ codes according to the correspondence provided by the UK Clinical Terminology Centre 7. We used the first three characters of the code, defining the category of the disease. In this exploratory approach, we assessed the association of AD with each of the health conditions defined by ICD10 codes recorded in more than 0.1% of visits per 1000 person-years in both countries (the exact list is provided in the supplementary material). Study design
We extracted two samples of patients diagnosed with AD from the French and UK electronic health records in the THIN database (dementia cases according to International Classification of Diseases, Ninth Revision, Clinical Modification [ICD-10-CM], codes = F00 and G30). In France, all patients with AD and at least two years of follow-up were extracted, whereas, in the UK, 35% of patients with AD and at least two years of follow-up (n=20654 from a total of 59912 AD patients with two years of follow-up in the database) were sampled with a Python random sampling routine. Age at AD onset was defined as age at the first record coded with an AD diagnosis. We considered data recorded from January 1996 to April 2020 in the UK and from September 1996 to February 2019 in France. For each country, a sample of control individuals, with no history of diagnosis of neurodegenerative diseases — Alzheimer’s disease (F00 and G30), Parkinson’s disease (G20), frontotemporal dementia (G31.0), dementia with Lewy bodies (G31.8), Huntington’s disease (G10), or multiple sclerosis (G35) — was drawn, matched with AD cases for age (± 1 year) at last record in the database and sex. Patients with less than two years of follow-up were excluded (eFigure 2). A summary table of the original data is provided in Table 1, and a flowchart (eFigure 1) is provided in the supplementary material. The study included 20,214 AD cases and matched controls for the UK cohort and 19,458 AD cases and matched controls for the French cohort (Table 1). Based on these initial cohorts, we designed several nested case-control analyses. For the investigation of risk factors and symptoms, we considered several time frames preceding AD diagnosis. We calculated odds ratios for three time periods between exposure to a particular health condition and dementia onset: 0 to 2 years, 2 to 10 years and 10 to 15 years. The index date was defined as the date of AD diagnosis for AD patients, and as the date of AD diagnosis of the matched AD patient for controls. For each analysis, we individually matched each AD patient with a control for sex, age at last visit (± 1 year) and years of follow-up, to ensure that cases and controls had been observed over the same lifespan, as explained in eFigure 2 provided in the supplementary material.
Statistical analysis
We first used conditional logistic regression to analyse the association between each health condition and the diagnosis of AD. Bonferroni correction for multiple comparisons was applied. All analyses were conducted with the StatsModel and ScikitLearn libraries in Python. Values of p < 0.0004 in two-tailed tests were considered significant (p=0.05 with Bonferroni correction for 123 potential exposures). We estimated effect size globally and by time between health condition exposure and AD diagnosis, as follows: (0 - 2] years, (2 – 10] years, and (10-15] years. In this last category, no health condition was found to increase the risk of AD onset in both countries. In the appendix, we provide tables including all the odds ratios calculated in the conditional logistic regression analysis, a table comparing the prevalences of the tested exposures in controls from the two countries and figures showing the odds ratios of each significantly associated variable. We also show (alpha%-(1-alpha)%, alpha=0.05/ntests=0.05/123=0.0004) confidence intervals, which take into account the uncertainty due to multiple comparisons.
We determined the independent contributions of the different health conditions to the risk of AD, by performing a conditional logistic regression analysis including all the health conditions significantly and independently associated with an increase in AD risk. Assuming causality, we calculated a combined population attributable fraction associated with health conditions displaying significant association with AD.
We tried to identify the window of exposure of greatest importance for the various health conditions, by calculating the incidence of the health conditions concerned in the AD and matched control samples and following its change over time until the index date, for all conditions found to be significantly associated with AD in the (2-10 years] time window. We merged the French and the UK nested case-control samples for this analysis. We assumed that incidence follows a Poisson distribution and calculated the associated confidence intervals. We added five additional comorbid conditions (sleep disorders, hypotension, disorders of urinary system, hypothyroidism and fracture) associated with AD when diagnosed in the (0-2] years preceding AD diagnosis, which have also been reported as possible risk factors for dementia in previous studies. We plotted the change in incidence ratio for each comorbid condition with time to AD diagnosis. We smoothed the incidence ratio curve by fitting a quadratic polynomial, to facilitate interpretation of the plot. For each comorbid condition, we show the plot from the first time point at which the 95% confidence intervals of the corresponding incidence rates are separate.
Results
Our initial full analytical cohorts included 20214 patients with a diagnosis of Alzheimer’s disease in the UK and 19458 AD patients in France; the study flowchart is provided in eFigure 1 in the supplementary materials.
Table 1 describes the age and sex distribution of patients with AD and their matched controls in the two countries. The median age at diagnosis was 81 years in the UK and 80 years in France. There were more women than men in both countries (69% in the UK and 63% in France). More follow-up data were available for the UK sample, with 13 person-years of data available before the index date in the full analytical cohorts (versus seven years in France). The median number of visits per year was also higher in the UK than in France (seven versus five). At two years before the index date, 19940 AD patients were included in the UK and 19056 in France; at 10 years, the corresponding numbers were 13064 and 5170; and at 15 years, they were 7731 and 1449. Age at diagnosis was not dependent on the number of years of data available before the index date in the UK, whereas, in France, AD patients with at least 10 years of retrospective data before diagnosis had a median age at the index date of 82 years (versus 80 years for the full analytical cohort). The mean number of visits per year was similar for the case and control groups in both countries. In the control groups, the health condition most frequently recorded in the databases of both countries was essential hypertension, with a prevalence of 53% in the UK and 65% in France, followed by cough (42%) and dorsalgia (41.6%) in the UK and by dorsalgia (37.9%) and hypercholesterolaemia (32%) in France (eTable1).
We agnostically tested 123 possible health conditions in the (0-2], (2-10] and (10-15] years before the diagnosis of Alzheimer’s disease (eTable 2). Only 10 of the 123 health conditions were significantly positively associated with the diagnosis of AD in both countries (Table 2 and Figure 1), after Bonferroni correction for the (2-10] year time window. None of these health conditions was significantly associated with the diagnosis of AD in the (10-15] years time frame. Spondylosis was the only health condition significantly associated with AD diagnosis in the (2-10] year period but not in the (0-2] period, this relationship being observed only in the UK. Cervical spondylosis accounted for 63% of the reported spondylosis cases in the UK and 60% in France. Another 12 health conditions were found to be significantly positively associated with AD in both countries in the (0-2] year time frame (eTable2 and eFigure 3). Five of the 10 health conditions identified as significantly associated with AD (anxiety, constipation, spondylosis, memory loss and abnormal weight loss) remained significant (p < 0.05) in the conditional logistic model containing all these health conditions (Table 3 and eTable 3 for the French analysis). They were also significant in the French model (eTable3). Assuming causality for depression, anxiety, spondylosis, constipation, hearing loss and reaction to severe stress, the potential attributable risk associated with these health conditions was 14% in France and 18% in the UK.
Finally, we show the change in incidence over time in the control and AD cohorts, for the health conditions significantly associated with AD risk (Figures 3 and 4). We found that depression and anxiety were the first health conditions to be associated with AD, at least nine years before the clinical diagnosis of AD, followed by constipation and abnormal weight loss seven years before the index date. These findings facilitate differentiation between risk factors associated with early stages of the disease and probable comorbid conditions occurring within a few years of diagnosis (e.g. sleep disorders, and urinary system disorders).
Discussion
This large agnostic study of subjects from two European countries shows that several health conditions observed by general practitioners are significantly associated with the diagnosis of AD several years later. In these two studies, including a total of 39672 AD patients, 10 of the 123 health conditions considered were significantly associated with the subsequent diagnosis of AD in both countries and could be considered potential prediagnostic features. Depression and anxiety were among the risk factors for dementia identified by the recent Lancet Commission 20. Nevertheless, it remains a matter of debate whether these conditions are risk factors for dementia, early symptoms of dementia, or both34. The association between depression and subsequent AD diagnosis became significant at least nine years before the first clinical diagnosis of AD, consistent with the Whitehall II cohort study report34. Only anxiety was found to be associated with AD in the UK in a multivariable analysis of all the identified risk factors. This was the first multivariable analysis to consider all these health conditions at the same time.
Hearing loss1,25 has also been identified as a potential risk factor for dementia in multiple observational and prospective studies 2,5,8,11,19. A recent genetic study 22 found a correlation between genetically determined hearing impairment and AD. However, no causal link could be demonstrated between these traits, to confirm or infirm the current hypothesis that the simple management of hearing loss can decrease the risk of AD. We also observed a trend towards an association between AD and hearing loss, but it was not quite significant (p=0.057) after adjustment for other risk factors as possible cofounders (Table 3). We also confirmed associations between AD and abnormal weight loss35, which remained significant in the multivariable model. The causality of the link between abnormal weight loss and dementia remains to be demonstrated, and it is unclear whether this relationship is bidirectional.
The agnostic approach used here made it possible to identify other potential new risk factors for AD. Constipation has already been linked to depression6 and has been identified as a symptom in some neurodegenerative diseases, such as Lewy body dementia and Parkinson’s disease32. To our knowledge, this is the first time that constipation has been shown to be associated with a risk of AD. This association was detected seven years before the first clinical diagnosis of AD, and remained significant until the time of AD diagnosis and after controlling for depression and anxiety. We also found a strong association between spondylosis and AD dementia risk (OR=1.26 in the UK and OR=1.45 in France), which remained significant after correcting for other identified risk factors. One possible reason for this association may be a decline in physical activity20 due to spondylosis. A higher prevalence of dementia and AD was reported in a retrospective longitudinal study of patients with ankylosing spondylitis16. Nevertheless, we found an association with spondylosis (ICD10 code M47) but not with ankylosing spondylitis (ICD10 code M49). This result suggests that the association may not be due to the inflammatory nature of the spondylitis. We noted that the spondylosis was cervical in most of our cases. It may therefore be worthwhile investigating whether cervical spondylosis affects the flow of blood29 or cerebrospinal fluid30 to the brain, thereby favouring or accelerating neurodegeneration. The associations of AD with syncope and collapse and hypotension were also documented in a recent study reporting an association between midlife orthostatic hypotension and future dementia28.
We found associations present only in the last two years before diagnosis. The lateness of these associations suggests that these conditions are probably prodromal symptoms of the disease rather than risk factors. For instance, a recent Mendelian randomised study 12 showed that AD was more likely to be the cause of sleep disorders rather than a consequence of these disorders. The association with “hypothyroidism” may, therefore, reflect active investigations in patients with cognitive impairment, as the evaluation of thyroid hormone levels is recommended for this indication by the World Health Organisation.
We also highlight risk factors not replicated in the analysis. Hypertension has repeatedly been presented as a potential modifiable risk factor for AD24, whereas our data suggest a protective effect in the UK study and no association in the French study (FR: 1.01 (0.87-1.18), UK: 0.82 (0.75-0.90)). As in other observational studies 4,13, this association may result from a channelling effect, in which demented patients with hypertension or other cardiovascular comorbidities are more likely to be diagnosed with vascular dementia than AD.
Other previously reported associations not replicated in this study included associations with herpes virus infections, alcohol use disorders33, diabetic disorders and obesity. We cannot rule out the possibility that these discrepancies may be due to the control cohorts consisting of patients regularly consulting a general practitioner for these chronic illnesses.
Strengths and limitations
Epidemiological approaches, such as Lancet Commissions 20,21, tend to focus on all-cause dementia, whereas other approaches in dementia research tend to argue for increasingly precise biological definitions of AD for the testing of hypothesis-driven specific pharmaceutical strategies 15. The early detection and labelling of biomarker-positive patients before the onset of symptoms has been criticised, due to the uncertainty of progression to dementia and the absence of proven risk-reducing interventions 18,31. This study has the advantage of bridging the gap between these different approaches, through a large-scale, exploratory, data-driven approach, considering all the main health conditions observable at the point of care.
Beyond its data-driven precision, another notable strength of this study is its inclusion of sufficient numbers of patients from two countries, making it possible to replicate the analysis, with consistent results obtained for the two samples, ensuring a high degree of generalisability. One question raised by this study is whether the variables identified are genuine risk factors. This could be tested in Mendelian randomisation studies on databases including genetic data12 or in new prospective research cohorts.
The study also has several limitations. We did not have access to other variables, such as education level, socio-economic status or genetic information. In addition, the recurrence of some diagnoses over time was not taken into account, as we considered only the first presentation of each health condition. Future studies could use the number of visits per diagnosis as a proxy for disease severity. The matched control samples were probably not entirely representative of the dementia-free general population, as they probably included a higher proportion of people with chronic illnesses or in poorer health generally. The odd ratios reported may therefore be an underestimate, as the real dementia-free general population includes a higher proportion of healthy ageing individuals than our control sample. The diagnosis of AD by a general practitioner rather than a trained neurologist may also have added to the background noise in the results, although this effect was probably counterbalanced by the large number of patients considered in the study. Diagnosis by a GP probably also resulted in a later diagnosis than in prospective cohorts in which each patient attends an annual visit with a neurologist. This is the main reason for which we considered comorbid conditions associated with AD at least the two years before the diagnosis of AD. We believe that the performance of a large number of studies of this type on large databases and their replication by independent research teams will lead to the emergence of a consensus for confirmation through future prospective cohorts.
Conclusion
This study provides a modern extensive data-driven survey of potential risk factors and early symptomatic expression for Alzheimer’s disease in primary care, and constitutes a first attempt to determine when these comorbid conditions appear during the course of the disease. This study is innovative in its agnostic approach, leading not only to the confirmation of known risk factors for AD, but also to the identification of new risk factors. We confirmed several associations identified in previous studies, with hearing loss, depression and anxiety, for example, and identified other associations, with spondylosis, constipation and weight loss. Our findings are robust, as the associations were explored in two independent databases from France and the UK, ensuring the generalisability of the results. Finally, our findings make it possible to model the possible trajectories of risk factors in the period preceding the diagnosis of AD, providing new insight into possible windows for prevention.
Data Availability
The data used in the preparation of the article are available from the Cegedim company upon reasonable request.
Contributors
TN, BCD, SE, CD and SD conceived and designed the study. FM, MA, LG, BL extracted the data. FM, MA and TN ensured data quality and TN, BCD, SE, CD and SD analysed the data. TN and BCD generated the figures. TN, BCD, SE, FM, MA, LG, BL, CD and SD interpreted the results and contributed to the writing of the final version of the manuscript. All authors agreed with the results and conclusions and approved the final draft.
Competing interest
SE reports personal fees from Biogen, Eisai, Roche, and GE Healthcare, for presentations or participation to advisory boards. FM, LG and BE are full time employees of Cegedim. Other authors declare no competing interests.
Data sharing statement
The data used in the preparation of the article are available from the Cegedim company upon reasonable request.
Ethics committee approval
The French database obtained approval for data collection from the French National Data Protection Authority (CNIL) in 2002. For the UK database, ethical approval was granted by the NHS South-East Multicentre Research Ethics Committee in 2003 (ref: 03/01/073) for establishment of the THIN database, and it was updated in 2011. The study was a retrospective analysis of secondary anonymised patient data only.
Patient and Public involvement
We plan to disseminate the results of our research to the public and relevant patient community through the current publication and its related press release as well as presentations at scientific conferences.
The lead authors (CD,SD) affirm that this manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned (and, if relevant, registered) have been explained.
Acknowledgement
This work was funded in part by the Agence Nationale de la Recherche, as part of the “Investissements d’avenir” program, under grants ANR-10-IAIHU-06 (IHU-ICM) and ANR-19-P3IA-0001 (PRAIRIE 3IA Institute). BCD is supported by a CJ Martin Fellowship (NHMRC, APP1161356).