Remote monitoring of progression in early Parkinson’s disease: reliability and validity of the Roche PD Mobile Application v2

Digital health technologies (DHTs) enable remote and therefore frequent measurement of motor signs, potentially providing reliable and valid estimates of motor sign severity and progression in Parkinson’s disease (PD). The Roche PD Mobile Application v1 was revised to v2 to include more measures of bradykinesia, as well as bradyphrenia and speech tests, and to optimize its suitability for early-stage PD. It was studied in 316 early-stage PD participants who performed daily active tests at home, then carried a smartphone and wore a smartwatch throughout the day for passive monitoring (study NCT03100149). Adherence was excellent (96.29%). All pre-specified sensor features exhibited good-to-excellent test-retest reliability (median intraclass correlation coefficient = 0.9), and correlated with corresponding Movement Disorder Society-Unified Parkinson's Disease Rating Scale (MDS-UPDRS) items (rho: 0.12–0.71). These findings demonstrate the preliminary reliability and validity of remote at-home quantification of motor sign severity with the Roche PD Mobile Application v2 in individuals with early PD.

[...] UPDRS upper limb bradykinesia item scores, and showed convergent and divergent validity in cross-correlations with MDS-UPDRS Part III subscale scores, correlating numerically most strongly with bradykinesia compared with all other subscale scores. These findings indicate that the Roche PD Mobile Application v2 bradykinesia tests indeed reflect the neurological concept of upper limb bradykinesia. Finger tapping and pronation/supination tasks are well-established assessments of upper limb bradykinesia as evidenced by their inclusion in both the UPDRS 23 and MDS-UPDRS 7 . Over the last decade, different digitized variants of finger tapping and pronation/supination tests have been developed 19 .

Introduction
[...] unilateral bradykinesia and rigidity (first upper then lower extremities) to midline functions to bilateral bradykinesia and rigidity and finally general movement problems 8 . These findings confirm the centrality of bradykinesia in early PD 9 and suggest that bradykinesia is a critical motor progression marker in early PD 1 . However, the quantification of bradykinesia and other motor signs in early PD with rating scales such as the MDS-UPDRS remains a challenge: MDS-UPDRS Part II scores change little in early PD 10 , i.e. less than the established minimal clinically meaningful difference 11 , and RMT analyses of MDS-UPDRS Part III item scores revealed multiple measurement irregularities in scores during the first 2 years of PPMI 8,12 . These findings highlight the urgent need for alternative methods of motor sign quantification in the earliest stages of PD.
Many smartphone- and smartwatch-based DHTs have been developed to estimate bradykinesia and other motor and non-motor signs of PD 5,13-19 . Finger tapping is one of the most commonly used DHT measures of bradykinesia 19 . When sensor-based finger-tapping data are aggregated over 2-week periods, test-retest reliabilities increase 5 and the aggregated features correlate with MDS-UPDRS Part III clinician ratings of finger-tapping performance in patients with early PD 5 . Additional bradykinesia tests implemented on smartphones include measures of hand turning and leg agility (by holding the phone on the thigh and lifting and stomping the foot), for example as implemented on CloudUPDRS 14,15 . Smartwatches offer additional means to estimate bradykinesia during daily life, for example by estimating the time taken to move a utensil from a plate to the mouth while eating 13 . Most DHT solutions for PD, such as CloudUPDRS 14,15 , HopkinsPD 16 , mPower 17 and the Roche PD Mobile Application v1 20 , test not only bradykinesia, but also tremor, gait, and balance, thereby providing a profile of motor impairments for estimating and tracking PD motor severity.
DHTs may additionally benefit from measures of cognition such as information processing speed (e.g., electronic Symbol Digit Modalities Test [eSDMT]) and speech, which are also affected in the earliest stages of PD 18 .

medRxiv preprint doi: https://doi.org/10.1101/2021.10.07.21264414; this version posted October 10, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

The present report describes the reliability and validity of the Roche PD Mobile Application v2, a revision of v1 that includes multiple novel measures of bradykinesia and cognition (eSDMT and speech tests) and optimizes existing tasks for the detection of PD motor signs.
Three hundred and sixteen individuals with early-stage PD (<2 years; Hoehn and Yahr stages I or II) who are participating in the Phase II PASADENA study (NCT03100149) were provided with a smartphone and smartwatch with the Roche PD Mobile Application v2 preinstalled, and requested to perform active tests daily on the smartphone (4 or 5 out of 10 tests each day, information processing speed once per fortnight), and to carry the smartphone and wear a smartwatch throughout the day to collect passive monitoring data. Pre-specified sensor features were calculated for each active test and for passive monitoring, and aggregated over the first two 2-week periods of the study. Adherence, test-retest reliability, and clinical validity (relationship to baseline clinical scales, known-groups validity) were quantified. These metrics represent the grounds for judging the potential utility of the Roche PD Mobile Application v2 to quantify and track progression of disease severity in early PD.
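As an illustration of the aggregation step, the following minimal Python sketch computes per-participant medians over consecutive 2-week windows and, following the rule given in the Statistical analyses section, keeps a window only when at least 3 data points are available. The record layout, the function name `aggregate_two_week_medians`, and the convention that study day 1 is the baseline visit are assumptions made for this example; they are not details of the study pipeline.

```python
from collections import defaultdict
from statistics import median


def aggregate_two_week_medians(records, min_points=3, window_days=14):
    """Median of a sensor feature per participant and 2-week window.

    records: iterable of (participant_id, study_day, value) tuples, where
    study_day 1 is assumed to be the baseline visit (hypothetical layout).
    Windows with fewer than `min_points` values are dropped.
    """
    buckets = defaultdict(list)
    for participant_id, study_day, value in records:
        # Window 0 covers days 1-14 (Weeks 1 and 2), window 1 covers days 15-28.
        window = (study_day - 1) // window_days
        buckets[(participant_id, window)].append(value)
    return {
        key: median(values)
        for key, values in buckets.items()
        if len(values) >= min_points
    }


example_records = [
    ("p1", 1, 10.0), ("p1", 5, 12.0), ("p1", 14, 11.0),  # first 2-week window
    ("p1", 15, 9.0), ("p1", 20, 8.0),                    # only 2 points: dropped
]
aggregated = aggregate_two_week_medians(example_records)
```

In this toy input the second window contains only two tests, so it yields no aggregated value; a median is used rather than a mean so that a single atypical test has limited influence on the 2-week summary.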

Adherence
Daily remote active testing took a median of 5.3 (interquartile range [IQR] = 1.7) minutes on days without the SDMT, and a median of 7.32 (IQR = 1.58) minutes on days with the SDMT.
Adherence was high: a median of 96.29% per participant of all possible active tests were performed during the first 4 weeks of the study. Participants contributed a median of 8.6 hours/day of smartphone and a median of 12.79 hours/day of smartwatch passive monitoring data.

Reliability of sensor features
Reliability results are reported in Table 2. All 17 pre-specified sensor features demonstrated good-to-excellent 21 test-retest reliability between the first two 2-week study periods (ICCs ≥0.75) 22 . The median sensor feature ICC was 0.9 (range, 0.75-0.95).
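The text above does not state which form of the ICC was computed. As a hedged illustration only, the sketch below computes ICC(2,1) (two-way random effects, absolute agreement, single measurement) from a participants-by-periods matrix, where each column would hold one 2-week aggregated sensor feature. The choice of ICC form and the function name `icc_2_1` are assumptions for this example.

```python
import numpy as np


def icc_2_1(scores):
    """ICC(2,1) for an (n participants x k periods) array of scores.

    Assumed form: two-way random effects, absolute agreement, single
    measurement. The source does not specify the ICC variant used.
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand_mean = x.mean()
    ss_rows = k * np.sum((x.mean(axis=1) - grand_mean) ** 2)  # between participants
    ss_cols = n * np.sum((x.mean(axis=0) - grand_mean) ** 2)  # between periods
    ss_error = np.sum((x - grand_mean) ** 2) - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Toy data: rows are participants, columns are the two 2-week periods.
perfect_agreement = icc_2_1([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
```

When both periods give identical values for every participant, the error and period mean squares are zero and the coefficient equals 1; systematic shifts between periods lower ICC(2,1) because it measures absolute agreement rather than consistency.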

Clinical validity of sensor features
Clinical validity was assessed via Spearman's correlations between the sensor features and corresponding MDS-UPDRS subscale and item scores (Fig. 1). Correlations with MDS-UPDRS item scores revealed that all sensor features correlated with their corresponding clinical items (Fig. 2).
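For readers less familiar with rank correlations, the following minimal sketch shows one way Spearman's rho between an aggregated sensor feature and an ordinal clinical item score could be computed: both variables are converted to ranks (tied values receive their average rank, which matters for MDS-UPDRS items scored 0-4) and the Pearson correlation of the ranks is taken. The function names and the toy values are illustrative assumptions, not study data.

```python
import numpy as np


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    v = np.asarray(values, dtype=float)
    order = np.argsort(v, kind="mergesort")
    ranks = np.empty(len(v), dtype=float)
    i = 0
    while i < len(v):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(v) and v[order[j + 1]] == v[order[i]]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


# Hypothetical example: a sensor feature and an ordinal item score.
sensor_feature = [1.0, 2.0, 3.0, 4.0, 5.0]
item_score = [5, 6, 7, 8, 7]
rho = spearman_rho(sensor_feature, item_score)
```

Because only ranks enter the calculation, the coefficient is insensitive to monotone transformations of the sensor feature, which suits features with skewed distributions.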
The Postural Instability/Gait Disorders (PIGD) subscale score showed the strongest correlations with the U-turn and the Hand Turning test (most affected side). Rigidity subscale scores correlated [...]

Sensor feature sensitivity to side differences
Sensor feature values from all lateralized tests demonstrated significant differences between the most and least affected sides (Table 3). Moreover, sensor features and MDS-UPDRS scores measuring the same motor sign on the same side of the body (less affected or more affected) were more strongly correlated than sensor features and MDS-UPDRS scores measuring the same motor sign on different sides of the body (Fig. 7).

Discussion
The Roche PD Mobile Application v1 was designed to measure the core motor signs of PD, and was recently revised to v2 to primarily include two new active tests of bradykinesia (Hand Turning, Draw A Shape), as well as a test of psychomotor slowing (eSDMT) and a speech test. In addition, the original gait task was revised to a U-turn test, and a smartwatch was incorporated into the remote passive monitoring procedure. Preliminary test-retest reliability scores for the pre- [...] 19 . Despite methodological differences, studies of these DHT tasks generally showed good correspondence between finger tapping sensor features and respective clinical ratings, as well as the ability to differentiate healthy controls from individuals with early PD, and individuals with early PD from individuals with later-stage PD 16,20,24-27 , in line with the present findings. While the literature on digitized pronation/supination assessments is less rich than for finger tapping, available results also consistently demonstrate correlations with related clinical scores and the ability to differentiate healthy participants from individuals with PD 14,28-31 . Spiral drawing is traditionally used in behavioral neurology to assess fine motor impairment including bradykinesia and tremor 32-35 . DHT versions of spiral drawing demonstrated that time to completion correlated with clinician ratings of bradykinesia severity, and differentiated PD cases from controls 34 . The majority of previous DHT spiral drawing tasks used pens/digital pens to draw on regular paper or tablets, a more challenging motor task compared with the present finger drawing on smaller smartphone touch screens. In the present study, celerity, i.e. accuracy/time to complete spiral shape tracing on the smartphone screen, was pre-specified to additionally consider the accuracy of directed fine motor movements in the unsupervised at-home setting.
Spiral celerity correlated with MDS-UPDRS bradykinesia measures, and the strength of these correlations was numerically smaller than for Finger Tapping and Hand Turning. This may be due to the relative difficulty of the latter two tasks compared with spiral drawing, which may have challenged individuals more, thereby revealing greater impairment. We note that additional sensor features (e.g. variability in drawing speed, hesitation), analyzed either individually or combined within and across shapes, are expected to provide additional meaningful information, as has been shown for PD and multiple sclerosis 36,37 .
Passive monitoring with smartwatches provides a unique opportunity to explore slowing of upper limb movements during daily life. Here, sensor data segments during gesture movements were identified from the circa 90% non-walking periods in the passive monitoring sensor data stream, using the squared magnitude of the accelerometer sensor movement as the sensor feature.
This same feature has been related to decreased expressivity in patients with schizophrenia with negative symptoms 38 . Here, gesture power was specifically related to the MDS-UPDRS bradykinesia subscore and item scores, as well as the rigidity subscore, which is consistent with a slowing of hand movement in daily non-gait-related activities such as gesturing when speaking or eating.
These findings are consistent with previous research with wrist-worn wearables, which traditionally focused on arm swing during gait 39-41 , as well as multi-sensor systems used to measure the impact of bradykinesia on activities of daily living 13,42 . Thus, passively monitored motor behavior in daily life may facilitate our understanding of the effect and burden of PD on individuals' daily lives.
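The gesture power feature described above can be sketched roughly as follows. The sketch assumes that gravity has already been removed from the smartwatch accelerometer signal and that a per-sample walking mask is available from an upstream gait detector; both assumptions, the function name `gesture_power`, and the use of the mean as the summary statistic are illustrative choices rather than the study's implementation.

```python
import numpy as np


def gesture_power(acc_xyz, is_walking):
    """Mean squared accelerometer magnitude over non-walking samples.

    acc_xyz: (n_samples x 3) array of gravity-removed acceleration (assumed).
    is_walking: boolean array of length n_samples from a hypothetical
    upstream gait detector; walking samples are excluded.
    """
    acc = np.asarray(acc_xyz, dtype=float)
    non_walking = ~np.asarray(is_walking, dtype=bool)
    if not non_walking.any():
        return float("nan")  # no non-walking data to summarize
    squared_magnitude = np.sum(acc[non_walking] ** 2, axis=1)
    return float(np.mean(squared_magnitude))


# Toy example: the third sample is labeled as walking and is ignored.
power = gesture_power(
    [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [3.0, 4.0, 0.0]],
    [False, False, True],
)
```

Lower values of such a feature would correspond to less vigorous wrist movement outside of walking, which is the direction expected with upper limb bradykinesia.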
The eSDMT 43 is commonly applied to measure psychomotor slowing, or bradyphrenia, one of the earliest cognitive signs in PD, appearing up to 5 years prior to a PD dementia diagnosis 18 . However, as the test requires multiple cognitive functions, it is not surprising that it is sensitive to many forms of neurologic impairment 44 . Indeed, while SDMT performance is reduced in PD 45 , impairments are exacerbated in individuals with PD with concomitant vascular 46 and amyloid 47 imaging findings. A standard SDMT outcome measure, the number of correct responses in 90 seconds, was pre-specified for the present analyses of the eSDMT, and showed 'good' 21 test-retest reliability (ICC = 0.75). However, it correlated only weakly (rho = -0.18) with MDS-UPDRS item 1.1, which assesses global cognitive impairment. This finding is surprising given the catch-all nature of both the eSDMT and MDS-UPDRS item 1.1, but may be accounted for by the fact that individuals with cognitive impairment were excluded during the screening process in the PASADENA study, leading to a truncation of range in both scores (see Supplementary Fig. 1). We note that we attempted to minimize the effect of bradykinesia on eSDMT scores by requiring a simple tap response on a number pad displayed at the bottom half of the smartphone screen. Nevertheless, to mitigate the risk of this confound, eSDMT performance could be controlled by a non-cognitively demanding motor test using a similar response format.
Voice and speech impairments in PD are varied and generally summarized under the term dysarthria, and include resonatory, articulatory, phonatory, prosodic and respiratory components 48 .
This symptomatology and its relevance to patients' daily lives motivated the inclusion of a Sustained Phonation task in the suite of active tests, and the development of the novel Speech test.
Voice jitter was pre-selected as a proxy of disordered vocal fold function for the sustained phonation test. In line with previous research 49 [...] and subscale scores. This is consistent with similar DHT reports 5,24,50 . The novel U-turn test (which instructed individuals, if safe to do so, to walk several paces and make a U-turn at least five times) and the identification of turning while walking throughout the day in passive monitoring sensor data were motivated by findings that turning is particularly impaired in PD 5,51,52 . For example, a 360-degree walking turn and an instrumented timed-up-and-go test showed strong reliability and discriminated controls from PD participants 53,54 . Similarly, sensor-based measures of turn speed in daily life differentiated individuals with PD from controls 55 . In the present study, turn speed measured in both the active test and the passive setting correlated with MDS-UPDRS item 3.14 (body bradykinesia) scores, but was not specifically related to MDS-UPDRS PIGD relative to other subscores. While neither measure of turn speed differentiated between less and more affected [...]

[...] A machine learning approach was also used to combine different HopkinsPD baseline sensor features to predict clinically significant events (e.g. falls, functional impairment) at the 18-month follow-up 61 . In contrast to data-driven approaches to composite score development, a clinical outcomes assessment approach could be applied whereby information from individuals with PD informs the selection of sensor features such that they optimally reflect what matters most to patients 62 .
Several facets of the present study limit the generalizability of the findings. Firstly, all individuals had a disease duration of <2 years and were in Hoehn and Yahr stages I or II.
Thus, the applicability of the present findings to later-stage or prodromal PD is unknown. The reduced range of disease severities also appeared to limit the ranges of some DHT and clinical measures, which consequently limited the possibility to detect relationships between the two (Supplementary Fig. 1). Secondly, only the first two 2-week periods of DHT data were analyzed; thus, the long-term adherence to the remote monitoring procedure and ability of sensor features to detect changes over time remain to be established.
To this end, it is critical to quantify and report the test-retest reliabilities of sensor feature scores in order to assess a sensor feature's potential to detect changes over time 63 and any deviation from normal progression as a function of, e.g., pharmacological interventions.
The Roche PD Mobile Application v2 was designed to measure the severity of early PD core motor signs and to provide information complementary to established clinical outcome [...] Table 1

Roche PD Mobile Application v2
The Roche PD Mobile Application v2 consists of an application installed on a provisioned smartphone and smartwatch (see Fig. 1). The PD Mobile Application prompted participants to perform the active tests described below. All unilateral tests were performed twice, once with each side of the body.

[...]
5. Phonation: participants were instructed to make a single, continuous "aaaah" sound for as long as possible with one breath and in a steady pitch and volume while the phone was held at the ear (timeout: 30 seconds);
6. Postural tremor: participants were instructed to sit with their eyes closed, and to hold the smartphone in an outstretched hand while counting down out loud from a pre-specified number that differed for each test administration (15 seconds per hand);
7. Rest tremor: participants were instructed to sit with their eyes closed and to hold the phone in the palm of their hand, with their forearm resting on their thigh, and to count down out [...]

[...] of the motor tests were presented on alternating days, and the eSDMT every 2 weeks (Fig. 1), with a total expected testing time, including transitions and test-start countdowns between tests, of 5-10 minutes (including eSDMT).

Sensor data processing
The raw sensor data from the smartphone and smartwatch were extracted and converted into pre- [...] Table 2).
Data underwent quality control (QC) checks to ensure that the tests had been performed properly.
To this end, QC metrics were generated. For example, one QC metric quantified the amount of energy from the accelerometer during the Hand Turning test to estimate whether the smartphone was lying still (e.g. on a table) or moving during the test. In total, 0.3% (n=179/56,786) of digital active test data did not meet the pre-specified QC thresholds and were therefore excluded from the analyses.
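A minimal sketch of an energy-based QC check of the kind described above is given below. Energy is taken here as the variance of the acceleration magnitude, so that constant gravity on a stationary phone contributes nothing. The pre-specified thresholds are not reported in this text, so the threshold value, the energy definition, and the function names are hypothetical. The last line reproduces the reported exclusion percentage from the stated counts.

```python
import numpy as np


def accelerometer_energy(acc_xyz):
    """Variance of the acceleration magnitude across a test recording."""
    acc = np.asarray(acc_xyz, dtype=float)
    magnitude = np.linalg.norm(acc, axis=1)
    return float(np.mean((magnitude - magnitude.mean()) ** 2))


def passes_hand_turning_qc(acc_xyz, min_energy=0.05):
    """True if the phone appears to have moved during the test.

    min_energy is a hypothetical threshold in (m/s^2)^2; the study's
    pre-specified QC thresholds are not given in the text.
    """
    return accelerometer_energy(acc_xyz) >= min_energy


t = np.linspace(0.0, 10.0, 500)
# Phone lying on a table: only constant gravity is recorded.
still_phone = np.column_stack([np.zeros_like(t), np.zeros_like(t), np.full_like(t, 9.81)])
# Phone being moved: an oscillation is superimposed on gravity.
moving_phone = np.column_stack([np.zeros_like(t), np.zeros_like(t), 9.81 + np.sin(2 * np.pi * t)])

still_ok = passes_hand_turning_qc(still_phone)
moving_ok = passes_hand_turning_qc(moving_phone)

# Share of active test data excluded, from the counts reported above.
excluded_percent = round(100 * 179 / 56786, 1)
```

Subtracting the mean magnitude before squaring is a design choice that makes the check independent of device orientation at rest.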

Statistical analyses
Sensor features from passive monitoring and each active test performed were summarized (median) over 2-week intervals starting at the baseline visit (Weeks 1 and 2) and in the 2-week period thereafter, provided that ≥3 data points were available during each 2-week testing interval (Supplementary Table 3). For convergent validity, the averaged (median) sensor data collected during the first two study weeks were compared with clinical data collected at the baseline visit (Day 1) using Spearman's correlations. Adherence and test-retest metrics were calculated for aggregated sensor features for the first two 2-week study periods. Adherence was defined as the [...]

Financial disclosures
The authors declare that the study is funded by F. Hoffmann-La Roche Ltd and Prothena Inc. F. Hoffmann-La Roche Ltd and Prothena Inc were involved in the study design, collection, analysis, interpretation of data, the writing of this article and the decision to submit it for publication.

Ethics declarations
Participants were identified for potential recruitment using site-specific recruitment plans prior to consenting to take part in this study. Recruitment materials for participants had received [...]
