ABSTRACT
Several studies have demonstrated the high agreement between routine clinical visual assessment and quantification, suggesting that quantification approaches could support the assessment of less experienced readers and/or in challenging cases. However, all studies to date have implemented a retrospective case collection and challenging cases were generally underrepresented.
Methods In this prospective study, we included all participants (N=741) from the AMYPAD Diagnostic and Patient Management Study (DPMS) with available baseline amyloid-PET quantification. Quantification was done with the PET-only AmyPype pipeline, providing global Centiloid (CL) and regional z-scores. Visual assessment was performed by local readers for the entire cohort. From the total cohort, we selected a subsample of 85 cases 1) for which the amyloid status based on the local reader’s visual assessment and CL classification (cut-off=21) was discordant and/or 2) that were assessed with a low confidence (i.e. ≤3 on a 5-point scale) by the local reader. In addition, concordant negative (N=8) and positive (N=8) scans across tracers were selected. In this sample, (N=101 cases: ([18F]flutemetamol, N=48; [18F]florbetaben, N=53) the visual assessments and corresponding confidence by 5 certified independent central readers were captured before and after disclosure of the quantification results.
Results For the AMYPAD-DPMS whole cohort, the overall assessment of local readers highly agreed with CL status (κ=0.85, 92.3% agreement). This was consistently observed within disease stages (SCD+: κ=0.82/92.3%; MCI: κ=0.80/89.8%; dementia: κ=0.87/94.6%). Across all central reader assessments in the challenging subsample, global CL and regional z-scores quantification were considered supportive of visual read in 70.3% and 49.3% of assessments, respectively. After disclosure of quantitative results, we observed an improvement in concordance between the 5 readers (κbaseline=0.65/65.3%; κpost-disclosure=0.74/73.3%) and a significant increase in reader confidence (Mbaseline=4.0 vs. Mpost-disclosure=4.34, W=101056, p<0.001).
Conclusion In this prospective study enriched for challenging amyloid-PET cases, we demonstrate the value of quantification to support visual assessment. After disclosure, both inter-reader agreement and confidence showed a significant improvement. These results are important considering the arrival of anti-amyloid therapies, which utilized the Centiloid metric for trial inclusion and target-engagement. Moreover, quantification could support determining Aβ status with high certainty, an important factor for treatment initiation.
Introduction
Recent advances in anti-amyloid immunotherapies and their availability in routine clinical praxis makes it essential to determine amyloid-β (Aβ) status of potentially eligible patients with high certainty1. Within this context, quantifying Aβ positron emission tomography (PET) for routine clinical use to support the diagnostic process of neurodegenerative disorders has received great interest over recent years2. Several studies have demonstrated the high agreement between routine clinical visual assessment and quantification, suggesting that quantification approaches could support the assessment of less experienced readers and/or in challenging cases3-7. However, all studies to date have implemented a retrospective design, which did not allow for direct assessment of the impact of quantification disclosure on visual-based classification of Aβ status and confidence of the assessment. In addition, while most previous studies have speculated on the value of quantification to support particularly challenging cases, these are generally underrepresented and hence require more detailed investigation to support this statement6-8.
The three most comprehensive retrospective studies have illustrated the high agreement (86%-96%) between amyloid-PET visual read and several quantification approaches across the three FDA and EMA approved fluorine-18 radiotracers9-11. For [18F]flutemetamol, an average agreement of 94% between visual read and standard uptake value ratio (SUVr) quantification derived from local non-harmonized quantification pipelines across 5 large clinical studies has been reported6. A very similar percentage agreement (i.e., 96.4%) has been reported for [18F]florbetaben, where visual read was compared to quantification across 15 different software packages7. Finally, in the arguably more ‘real-world’ IDEAS dataset, consisting mostly of [18F]florbetapir scans, an 86% concordance between visual read and Centiloid quantification using the robust PET-only Processing (rPOP) pipeline has been demonstrated12.
Centiloid quantification has been more widely implemented in recent years, as it brings the tracer-specific SUVr metric to a standardized scale, providing intuitive and across center/tracer generalizable cut points reflecting overall Aβ pathological burden13. Neuropathological studies have shown that the earliest detectable amyloid PET signal occurs around 12 CL, while 21-24 CL best discriminates between subjects with none-to-low Aβ plaque burden and those with intermediate-to-high deposition14, 15 and 30 CL is indicative of established Aβ burden16. Compared to visual positivity, CL cut-offs generally fall in between these values, ranging roughly between 17 CL for highly experienced readers4 and 40 CL in a routine clinical setting17, though most consistently around 25 CL4,5,8,12,14. Considering the robustness of the measure18, the Centiloid metric has been widely implemented in AD interventional trials. For example, the Lecanemab (Eisai)19 and Donanemab (Eli Lilly)20 phase III trials have implemented CL as their primary target engagement outcome and set a negativity threshold (CL<30 and 24.1, respectively) based on this quantification unit. Moreover, in the Donanemab Phase III trial, treatment was stopped if Centiloids were below 11 in a given scan or below 25 in two consecutive ones. Therefore, quantification could also be considered for the discontinuation of anti-amyloid treatment in future clinical routine. It is therefore key to familiarize routine clinical users with quantitative amyloid-PET measures during their diagnostic workup5.
We aimed to determine the value of quantification in challenging clinical amyloid-PET cases using a prospective clinical dataset and design. Here, we selected participants from the Amyloid Imaging to Prevent Alzheimer’s disease Diagnostic and Patient Management Study (AMYPAD-DPMS)21, who underwent amyloid-PET imaging as part of their diagnostic work-up22 and assessed agreement between visual reads performed at each imaging site by ‘local’ readers and quantification performed centrally. For the primary analysis, we selected a subset of challenging cases based on the local readers and assessed the agreement among 5 independent ‘central’ readers before and after disclosure of quantitative results, as well as the confidence in their assessments.
Methods
Cohort
Amyloid-PET scans were obtained from the AMYPAD-DPMS randomized controlled trial (N=840), which recruited patients across the disease continuum, including subjective cognitive decline plus (SCD+), mild cognitive impairment (MCI), or dementia from 8 memory clinics across Europe. A detailed description of the baseline characteristics has been described previously21. For the current work, the final disease stage (SCD+, MCI or dementia) and etiological diagnosis (AD, non-AD or not yet achieved [NYA]) during the DPMS observation period were used. All participants gave written informed consent. The trial was registered with EudraCT (2017-002527-21). The study was approved by the CCER (Commission Cantonale d’Ethique de la Recherche) in Geneva Switzerland (#2017-01408).
Patient selection
All participants with an available baseline amyloid-PET scan that passed quality control (see below) for quantification were included (N=741). From this cohort, we selected a subsample of 85 amyloid-PET scans, 1) for which the amyloid status based on the local reader assessment and Centiloid (CL) (cut-off=21, reflecting the lower level of the CL that best discriminates none-to-low and intermediate-to-high Aβ buren14) was discordant and/or 2) that were assessed with a low confidence (i.e. ≤3 on a 5-point scale) by the local reader. In addition, VR/CL concordant negative (N=8) and positive (N=8) scans across tracers and sites to represent real-world negative and positive cases were selected from the total cohort to balance the dataset (N=101: ([18F]flutemetamol, N=48; [18F]florbetaben, N=53).
PET acquisition and quantification
Scans were acquired according to the standard protocol for each tracer (i.e. [18F]florbetaben/ Neuraceq or [18F]flutemetamol/Vizamyl), starting at 90 min. post-injection (p.i.) of 350 MBq (±20%) for florbetaben and 185 MBq (±10%) for flutemetamol and collected in 4 frames of 5 min. each (90 to 110 minutes p.i.). PET images were processed centrally using GEHCs AmyPype PET-only pipeline providing global CL (cortical target mask) and regional z-scores (based on the AAL atlas) for 6 cortical ROIs (frontal, anterior cingulate, posterior cingulate/precuneus, lateral parietal, and lateral temporal cortex). In brief, amyloid-PET images undergo frame to frame alignment and summing and images are spatially normalized to the standard Montreal Neurological Institute (MNI152) space using an adaptive template registration method.23 The whole cerebellum was used as reference region. Of note, agreement between AMYPYPE Centiloids and those obtained with the standard Centiloid pipeline has been previously established24. Interested parties can request access to the AmyPype software through amypype.downloads{at}gehealthcare.com. For the primary analysis, CL>21 was considered the cut-point for a positive amyloid-PET scan. For illustrative purposes, scans were additionally classified as negative (CL<10), intermediate or so-called ‘gray-zone’ (10>CL<30), and positive (CL>30).
Visual assessment
Visual assessment was performed according to established reader guidelines; by the local readers for the total cohort and by 5 certified independent central readers for the selected subsample. Images were rated together with an T1-weighted MR scan or CT scan and as either positive (binding in one or more cortical brain region unilaterally, or striatum in case of [18F]flutemetamol) or negative (predominantly white matter uptake). In addition, regional\ classifications and reader confidence for both the final and regional visual classification based on a 5-point Likert scale were captured. To assess the effect of quantification disclosure on visual assessment of the 5 central readers of the challenging cases subsample, visual read (VR) and corresponding confidence were captured before and after disclosure of the quantification results. Readers also stated whether CL quantification and/or the regional z-scores supported their assessment or not. Readers were blinded to clinical information.
Importantly, all subsample cohort readers received a short training on the AmyPype processing pipeline, Centiloid quantification anchor points based on the review from Pemberton et al., (2022)2, and z-score quantification. The training material can be found in supplementary materials.
Statistical analysis
All analyses were performed in R Studio V4.2.2. Disease stage and etiological diagnostic group differences in quantitative amyloid burden were assessed using ANOVA analysis, corrected for age and sex. Agreement metrics were assessed using Cohen’s or Fleiss Kappa, when applicable. First, agreement between local readers and CL quantification status across the whole cohort and stratified by disease stage was assessed. Next, agreement between local readers and central readers was determined, where a majority VR was established based on the 5 readers (i.e., 3/5 assessments reflected majority Aβ status). Changes in reader confidence after disclosure of quantitative results were assessed using Wilcoxon rank test, as the data was left-skewed.
RESULTS
The total quantitative cohort consisted of 223 (30.1%) patients with SCD+, 258 (34.8%) with MCI, and 260 (35.1%) with dementia. The mean age was 70.8 years (7.6), 44.8% were female and the average MMSE was 25.5 (4.3). Overall, 49.5% of patients were considered VR-positive based on the local reader assessment (Table-1).
The ‘challenging’ subsample included mostly MCI patients (52 [51.5%], followed by SCD+ (29 [28.7%]), and finally dementia (20 [19.8%]). The mean age of this subpopulation was 72.5 years (7.6), 44.6% were female and the average MMSE was 26.2 (4.3). Overall, 57.4% of patients were considered VR-positive based on the local reader assessment (Table-2).
Quantitative amyloid burden across diagnostic groups
Global amyloid burden expressed in CL showed a stepwise increase with disease stage (SCD+<MCI<dementia: F=60.5, p<0.001, Table-1, Figure-1A) and was higher in AD than in non-AD or “not yet achieved” etiological diagnostic groups (F=411.9, p<0.001, Table-1, Figure-1B). However, amyloid burden did not differ across the different clinical disease stages within etiological groups (Supplementary Figure-1). Regional z-scores were highest in the AD group (pall<0.01) but did not differ between the non-AD and “not yet achieved” group.
Agreement between local readers and Centiloid quantification
For the whole quantitative cohort (N=741), the overall assessment of local readers highly agreed with CL status based on the predefined cutoff of 21 (κ=0.85, 92.3% concordance). This high agreement was consistently observed within disease stage, ranging from κ=0.87/94.6% for dementia cases, κ=0.82/92.3% for SCD+, and κ=0.80/89.8% for MCI (Figure-2).
Local vs Central readers
For the subsample enriched with challenging cases (N=101), the agreement between local and central readers was as per study design low (κ=0.21). Importantly, with majority central read as the reference standard, local readers were more inclined to classify an amyloid-PET scan as negative, resulting in 29 (28.7%) false-negative cases and 12 (11.9%) false-positive cases.
Quantification supports visual read of challenging cases
Across all 5 certified independent central reader assessments (N=505), global CL and regional z-score quantification was considered supportive of VR in 70.3% and 49.3% of assessments, respectively. Figure-3 illustrates changes in number of positive visual assessments (0-5 as per number of readers) pre- and post-disclosure of quantitative results. After disclosure of the quantitative results, we observed an improvement in concordance between the 5 readers (κbaseline=0.65/65.3%; κpost-disclosure=0.74/73.3%), which can be appreciated in the Figure-3 post-disclosure column, where relatively more cases were consistently VR- or VR+ for all readers. In line, we also observed a slight improvement in agreement between the consensus read and amyloid status based on CL (κbaseline=0.53; κpost-disclosure=0.60). Finally, a significant increase in reader confidence (Mbaseline=4.0 vs. Mpost-disclosure=4.34, W=101056, p<0.001) after disclosure of quantitative results was observed.
Nonetheless, some cases did not reach consensus between readers or showed clear discrepancy between visual assessment and Centiloid quantification. Examples are illustrated and further commented on in Figure-4.
Possible additional value of regional z-scores
As stated above, for 249/505 of the central assessments the regional z-scores were considered helpful in addition to the global CL quantification. This was more apparent for 3 out of 5 raters, who stated the regional z-scores added to their read in 74, 67, and 62 (out of 101) cases, compared to the 24 and 22 cases for the other 2 raters. For 48 cases across central readers more detailed comments were provided, which suggested that the main benefit of regional z-scores was in case of a borderline scan (28/48, 58.3%) and particularly for the frontal and PC/PCC regions, followed by quality of the image and/or atrophy (8/48, 16.7%). In 8 instances (16.7%), the regional z-score caused confusion rather than further support of the initial visual assessment.
DISCUSSION
In the prospective DPMS study, we observed excellent agreement between local visual reads and CL quantification across the clinical continuum. In a subgroup enriched for challenging cases we demonstrate that an improvement in reader agreement and confidence can be achieved by using quantification results. While overall agreement between local readers and quantification is high and consistent across cognitive stages, approximately 11% of scans in the AMYPAD-DPMS cohort representative of a typical memory clinical population was considered challenging due to a variety of causes, such as suboptimal scan quality, atrophy, or emerging Aβ pathology/borderline scan. In the case of the latter, regional z-scores might be of additional support to the global CL metric.
Our results suggest that quantification can support readers in determining Aβ status with high certainty, which is crucial considering the arrival of anti-amyloid therapies and their associated costs and potential side-effects. Similar to previous studies6,12, 25, approximately 8% of AMYPAD-DPMS cases showed discordance between local readers and CL quantification. Though the CL cut-off utilized in the current work was at the lower end of the range that best discriminates none-to-low and intermediate-to-high Aβ buren14, some considerable discrepancies were observed (Figure-2). These examples possibly reflect misdiagnosis and could explain some of the previously reported discrepancies between visual read status and final diagnosis for this cohort22. It is important to note, that even though the potential value of quantification has been previously demonstrated3-7, most studies probably underestimated its true impact in a real-world scenario, due to its limited value when visual assessment yields a clear positive or negative outcome. This study did evaluate amyloid-PET quantification performance in a wide range of amyloid load from negative to positive (local reads) and in a subset of challenging cases where quantification seems more beneficial as visual analysis alone can be insufficient or less accurate. Nonetheless, quantification should always be done in conjunction with visual assessment, to avoid misclassifications due to potential quantification errors. For example, one case had a very low CL value, but was consistently assessed as visual read positive by all readers (Figure-3&4 case B).
In addition to high certainty in Aβ status, the extent of burden as expressed in CL units also has clinical relevance considering the inclusion and discontinuation of treatment criteria implemented in the two successful anti-amyloid trials. More specifically, the Lecanemab Phase III trial defined amyloid-positivity as a CL>30, while Donanemab utilized a CL>37 cut-off. The real-world IDEAS study demonstrated that around two thirds of the discordant cases were assessed as visually positive but classified as amyloid negative based on CL12. In a future era of anti-amyloid therapies, the adjunctive use of quantification could avoid such false-positive patients being unnecessarily medicated for treatment regimens which potentially could last 1-2 years without any therapeutic value but with the risk of side effects. While quantification is already added to the label of both radiotracers used in this study by the EMA, current FDA guidelines for amyloid-PET do not mention the added value that quantitation could bring to reaching high confidence and accurate determination of Aβ-status based on visual reads. In addition, as the clearance rate for Donanemab was so high, the study also implemented a treatment discontinuation criterion, namely when the amyloid-PET quantification was CL<11 or CL<25 in two consecutive scans. To what extent these specific cut-offs will be implemented in the user-criteria for lecanemab and donanemab remains to be determined, as the current appropriate use recommendations (e.g., for lecanemab26) only elude to a ‘positive amyloid-PET or CSF result indicative of AD’. Nonetheless, some initial results suggest a steady clearance rate of Aβ, independent of baseline amyloid burden27. As such, future work should investigate whether the extent of baseline Aβ burden is predictive of necessary treatment duration to achieve full Aβ clearance. In such a setting, quantification will not only inform on Aβ status, but also optimize individual treatment plans.
A limitation of the study is that we did not repeat visual assessments by the local readers after disclosure of quantitative results. Though our central readers also had different levels of experience, this might have been ever more dispersed across 11 site local readers. Also, we had limited data to investigate whether subjects in the CL grey-zone would convert to a Aβ-positive status at follow-up.
CONCLUSION
In this prospective study and subsample enriched for challenging amyloid-PET cases, we demonstrate the value of quantification to support visual assessment. After disclosure, inter-reader agreement and confidence showed a significant improvement. These results are important considering the arrival of anti-amyloid therapies, which utilize the Centiloid metric for trial inclusion and target-engagement. Moreover, quantification could support determining Aβ status with high certainty, an important feature for treatment initiation.
DISCLOSURES
DA, IB, DVG, ILA, AP, and GBF report no relevant disclosures.
LEC has received research support from GE Healthcare and Springer Healthcare (funded by Eli Lilly), both paid to institution. Dr. Collij’s salary is supported by the MSCA postdoctoral fellowship research grant (#101108819) and the Alzheimer Association Research Fellowship (AARF) grant (#23AARF-1029663).
GNB is funded by the Deutsche Forschungsgemeinschaft (DFG) - Project-ID 431549029 - SFB 1451 and partially by DFG, DR 445/9-1.
MB is employed by GE HealthCare.
RW is employed by IXICO ltd.
RG is employed by Life Molecular Imaging
AWS is employed by Life Molecular Imaging
ZW has received research support from GE Healthcare.
PS is employed by EQT Life Sciences team.
AN has received consulting fee from H Lundbeck AB, AVVA pharmaceuticals and honoraria for lecture from Hoffman La Roche.
JDG has received research support from GE HealthCare, Roche Diagnostics and Hoffmann – La Roche, speaker/consulting fees from Roche Diagnostics, Esteve, Philips Nederlands, Biogen and Life Molecular Imaging and serves in the Molecular Neuroimaging Advisory Board of Prothena Biosciences.
AD has received research support from: Siemens Healthineers, Life Molecular Imaging, GE Healthcare, AVID Radiopharmaceuticals, Sofie, Eisai, Novartis/AAA, Ariceum Therapeutics, speaker Honorary/Advisory Boards: Siemens Healthineers, Sanofi, GE Healthcare, Biogen, Novo Nordisk, Invicro, Novartis/AAA, Bayer Vital, Lilly Stock: Siemens Healthineers, Lantheus Holding, Structured therapeutics, Lilly. Patents: Patent for 18F-JK-PSMA-7 (Patent No.: EP3765097A1; Date of patent: Jan. 20, 2021).
SM received speaker honoraria from GE Healthcare, Eli Lilly and Life Molecular Imaging.
CB is employed by GE HealthCare.
VG is supported by the Swiss national science foundation (project n.320030_185028 and 320030_169876), the Aetas Foundation, the Schmidheiny Foundation, the Velux Foundation, the Fondation privée des HUG. She received support for research and speakers’ fees from Siemens Healthineers, GE HealthCare, Janssen, Novo Nordisk, all paid to institution.
GF is employed by GE HealthCare.
FB is supported by the NIHR biomedical research centre at UCLH. Steering committee or Data Safety Monitoring Board member for Biogen, Merck, Eisai and Prothena. Advisory board member for Combinostics, Scottish Brain Sciences. Consultant for Roche, Celltrion, Rewind Therapeutics, Merck, Bracco. Research agreements with ADDI, Merck, Biogen, GE Healthcare, Roche. Co-founder and shareholder of Queen Square Analytics LTD.
ACKNOWLEDGMENTS
The project leading to this paper has also received funding from the Innovative Medicines Initiative 2 Joint Undertaking under grant agreement No 115952. This Joint Undertaking receives the support from the European Union’s Horizon 2020 research and innovation programme and EFPIA. This communication reflects the views of the authors and neither IMI nor the European Union and EFPIA are liable for any use that may be made of the information contained herein.