POPULATION ANALYSIS OF MORTALITY RISK: PREDICTIVE MODELS USING MOTION SENSORS FOR 100,000 PARTICIPANTS IN THE UK BIOBANK NATIONAL COHORT

ABSTRACT

[...] ubiquitous in high-income countries already and will become ubiquitous in low-income countries in the near future. Our study simulates smartphones by using accelerometers as sensor input. We analyzed 100,000 participants in UK Biobank who wore activity monitors with motion sensors for 1 week. This national cohort is demographically representative of the UK population, and this dataset represents the largest such available sensor record. We performed population analysis using walking intensity, with participants whose motion during normal activities included the daily living equivalent of timed walk tests. We extract continuous features from sensor data, for input to survival analysis for predictive models of mortality risk.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 21, 2022. ; https://doi.org/10.1101/2022.04.20.22274067 doi: medRxiv preprint

[...] with sensor records, the demographically representative UK Biobank. Death is the most definite outcome, and accurate death records are available for 100,000 participants who wore sensor devices some five years ago. We analyzed this dataset to extract walking sessions during daily living, then used these to predict mortality risk. The accuracy achieved was similar to activity monitors measuring total activity and even to physical measures such as gait speed during observed walks. Our scalable methods offer a potential pathway towards national screening for health status.

INTRODUCTION

The association of physical activity with mortality risk is well known. National cohort studies using self-reports have shown intensity is correlated with survival, so persons with lower mortality have more moderate-to-vigorous activity and less sedentary activity [1]. These studies focus upon the volume of activity at a certain level of intensity. These have been replicated with large meta-analysis studies using objective physical activity, where wearable sensors record total activity and statistical models predict mortality risk from accelerometer measures [2]. Cohort [...] corridor, is a standard evaluation for cardiopulmonary disease. This test has been shown in large meta-analysis studies to be a strong independent predictor of mortality from heart failure [7].

We analyze the largest national cohort, the UK Biobank [8], where 103,683 participants wore accelerometer devices for 1 week as wrist sensors [9]. Following our previous analysis of a national cohort for physical activity in the US Women's Health Initiative [10], we used raw sensor data during labelled walking sessions to identify characteristic motions for predictive models. This is the first population analysis of walking intensity with mobile sensors, and uses only inputs that could be accurately gathered using personal smartphones alone.

As surmised from this introduction, there are four primary methods for measuring physical activity. It will be shown later that all these methods achieve roughly the same accuracy for predictive models of mortality risk. Two methods are active, requiring persons to explicitly do some activity, such as answering a questionnaire concerning their health status (self-report) or walking a fixed distance under observation (gait-speed). These have proven feasible within cohort studies, but are problematic for population health, due to the logistic difficulty of getting large numbers of people to perform the required tasks on a routine basis. Two methods are passive, requiring persons to wear measurement devices, such as activity monitors, to measure total activity during the day or specific measures like walking pace over limited periods. These sensor-based methods have the major advantage that they can measure physical activity in daily living, without requiring persons to change their normal activity other than wearing the devices.

We show short bursts of steady walking suffice for predictive models of mortality risk, evaluated using raw sensor data for 100,000 participants in UK Biobank. Our Results evaluate the model accuracy for mortality risk using walking intensity, defined as 12 walking windows of 30 seconds each during a consecutive session, representing daily living versions of walk tests. Our accuracy is comparable to previous models using daily profiles of activity volume. Our methods are logistically easier, with 6 minutes per day (12 windows) rather than 600 minutes (10 hours) [...] Death Registry is used to determine which participants had died by that time.

As detailed in the Methods section, we choose 20 traditional predictors, from self-reports and laboratory tests. These 20 questions are listed in Table 1 as the Categorical Features. The full encoding from UK Biobank data is given in Supplementary Table S1. We also choose 76 derived [...]

Figure 1 gives the computation flowchart for predictive models. [...] Table 3. [...] practical outcomes, we also include risk factors that are easy to change (especially modifiable), such as smoking and alcohol, health (general) and obesity (BMI). Min Model values are given in Table 4. We rank order the top 10 features, after considering all 76 features.

If only selected continuous features are considered, then the cumulative effect on model accuracy is given in Table 4a for this minimum set. The top accelerometer feature is [...]

When easy risk factors are also included, acceleration magnitude features improve C-index.

As shown in Table 4b for Min Model, health and MPD beat smoking and obesity (BMI). This [...]

These mortality curves are shown in Figure 5, demonstrating sensor measures provide independent value. Higher ENMO values predict longer survival, independent of age and sex.

The other acceleration magnitude features (MPD/MAD) have similar independence graphs. So the activity intensity predictor ENMO predicts mortality risk: higher magnitude means lower risk.
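For readers unfamiliar with acceleration magnitude features, the following is a minimal sketch of ENMO (Euclidean Norm Minus One) and MAD (Mean Amplitude Deviation) under their commonly used definitions. It assumes samples are (x, y, z) tuples in units of g; the function names are illustrative and not taken from this study, whose exact feature formulas are given in its supplementary material and are not reproduced here. MPD is omitted because its formula is not stated in the text.

```python
import math


def vector_magnitudes(samples):
    """Euclidean norm of each (x, y, z) sample, in g."""
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]


def enmo(samples):
    """Window mean of max(|a| - 1 g, 0): gravity removed, negatives clipped."""
    mags = vector_magnitudes(samples)
    return sum(max(m - 1.0, 0.0) for m in mags) / len(mags)


def mad(samples):
    """Window mean of the absolute deviation of |a| from its window mean."""
    mags = vector_magnitudes(samples)
    mean_mag = sum(mags) / len(mags)
    return sum(abs(m - mean_mag) for m in mags) / len(mags)
```

A device at rest reads a magnitude close to 1 g, so both values stay near zero; more vigorous motion raises them.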

The participants can thus be relied upon to wear the devices all day, so the studies assume 10 hours per day of wear time during normal activities. For effective usage in daily living, the patients must continuously wear a medical-quality sensor device. In contrast, our methods assume a single 6MWT per day, so 6 minutes rather than 600 minutes, two orders of magnitude less sensor data. Our methods enable the usage of cheap smartphones, which are often carried while walking and have adequate accelerometers for predictive models of pulmonary function [16].
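The "two orders of magnitude" comparison can be checked with a short calculation. This sketch assumes the 100 Hz sampling rate implied by the 3000 samples per 30-second window reported in the Methods; the variable and function names are illustrative.

```python
import math

# 3000 samples per 30-second window implies 100 samples per second per axis.
SAMPLE_RATE_HZ = 3000 / 30


def samples_per_day(minutes):
    """Number of samples per axis recorded in the given minutes."""
    return int(minutes * 60 * SAMPLE_RATE_HZ)


walk_test = samples_per_day(6)    # one 6-minute walking session
full_day = samples_per_day(600)   # 10 hours of wear time
orders_of_magnitude = math.log10(full_day / walk_test)
```

Under this assumption, a single session needs 36,000 samples per axis, against 3,600,000 for a full day of wear.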

With cardiopulmonary patients, intensity is more important than duration, as shown in large [...] comparative purposes as special additions to biobank features. These concern the crossing rate, how often the acceleration changes from above the mean to below the mean and vice versa, [...]
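The crossing rate described above can be sketched as follows. This is one plausible reading, assuming the rate is the fraction of consecutive sample pairs that straddle the window mean; the normalization and the function name are assumptions, not taken from the study.

```python
def mean_crossing_rate(signal):
    """Fraction of consecutive sample pairs that cross the window mean."""
    if len(signal) < 2:
        return 0.0
    mean = sum(signal) / len(signal)
    # True where the sample lies strictly above the window mean.
    above = [value > mean for value in signal]
    # A crossing occurs whenever two neighbouring samples differ in side.
    crossings = sum(1 for a, b in zip(above, above[1:]) if a != b)
    return crossings / (len(signal) - 1)
```

A rapidly oscillating signal gives a rate near 1, while a slowly drifting signal gives a rate near 0.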

The UK Biobank accelerometer dataset has also been analyzed by a concurrent study which used cooked data to analyze the activity volume [26], rather than using the raw data as we did to analyze the walking intensity. This study took the 5-second averages from Biobank field 90004 and further averaged over 1-minute intervals. Our study used raw signals from field 90001, [...]

A walk test measures "quality" (intensity) rather than "quantity" (volume). Our previous work showed accelerometer sensors in carried smartphones can digitally model physical distance [27] and oxygen saturation [28] from the Six Minute Walk Test (6MWT). We also showed the pulmonary models similarly worked with carried smartphones during daily living [16]. The logistic advantage of 6 minutes of walking intensity is two orders of magnitude less frequent sensor input, using ENMO for quality versus RA for quantity. This makes it possible to effectively utilize smartphones instead of wearable sensors for predictive models.

We are involved in planning the physical activity study for the US Precision Medicine Initiative (All of Us Research Program). This cohort is projected to become the largest national cohort with more than 1M participants, with close to half already registered. These participants are being recruited to be representative of the national population, which is far more diverse in the US than in the UK. All agreeing participants would be longitudinally measured on their personal smartphones, in a study both larger and longer than our mortality study that also directly utilizes phone sensors for measurement.


The study is longitudinally collecting participants' information, including data from [...]

[...] due to careful derivation from a training set of representative participants who wore head-mounted cameras to visually identify activities. We included any participant with at least one session of steady walking, defined by 12 consecutive walking windows. Only windows labelled as walking were considered input data for feature extraction. We excluded 2,758 participants for insufficient walking, which, with other minor exclusions, yields a total of 100,655 participants for our analysis.
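The inclusion rule above (at least one run of 12 consecutive walking windows) can be sketched as a simple run-length scan. This is an illustrative sketch: the label string "walking" and the function names are assumptions, and the study's actual activity labels come from the UK Biobank classifier.

```python
def find_walking_sessions(labels, min_windows=12):
    """Return (start, end) index pairs, end exclusive, for each run of
    consecutive 'walking' windows that lasts at least min_windows."""
    sessions = []
    run_start = None
    for i, label in enumerate(labels):
        if label == "walking":
            if run_start is None:
                run_start = i  # a new run begins here
        else:
            if run_start is not None and i - run_start >= min_windows:
                sessions.append((run_start, i))
            run_start = None
    # Close a run that continues to the end of the recording.
    if run_start is not None and len(labels) - run_start >= min_windows:
        sessions.append((run_start, len(labels)))
    return sessions


def has_sufficient_walking(labels, min_windows=12):
    """True if the participant has at least one qualifying session."""
    return bool(find_walking_sessions(labels, min_windows))
```

With 30-second windows, `min_windows=12` corresponds to 6 minutes of steady walking; longer runs are kept whole rather than truncated.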

There is no current standard for walk tests during daily living. We chose 6 minutes as an empirical lower bound for cardiopulmonary slowdown during steady walking, with relaxed criteria to allow longer periods. During daily living, a person may walk more slowly than when they are pushing hard during a walk test, so it might take longer for them to experience shortness of breath on exertion (SOBOE).

Thus we require 12 consecutive walking windows to be the "equivalent of 6MWT", and include [...] week, although only 10% walk half an hour in 6-minute sessions, as shown in Figure 6. [...] 6MWT sensor records, so more data points are needed than simply averaging sensor data over a labelled walking window.
[...] Table 2, while Supplementary Table 2 gives their formulas. We added dimensional data, such as x-y-z, plus computing our own features from the [...]

The raw data was collected into 30-second windows over the entire week of recording; each window contained 3000 3-axis motion samples from field 90001. This comprised 25 terabytes.
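The windowing step can be sketched as below. The window size of 3000 samples comes from the text; dropping a trailing partial window is an assumption made here for simplicity, and the names are illustrative.

```python
SAMPLES_PER_WINDOW = 3000  # one 30-second window, per the text


def split_into_windows(samples, size=SAMPLES_PER_WINDOW):
    """Split a list of (x, y, z) samples into consecutive full windows.

    Any trailing samples that do not fill a whole window are dropped
    (an assumption of this sketch, not stated in the source).
    """
    n_full = len(samples) // size
    return [samples[i * size:(i + 1) * size] for i in range(n_full)]
```

For example, 7000 samples yield two full windows and 1000 discarded samples.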
We then perform 10-fold cross-validation [43]. Cross-validation procedures are more stable than pre-fixing testing data since they enable all observed data to be used in evaluation steps.

We use a stratified 10-fold cross-validation approach, since the proportion of deaths is small (about 2% of participants). For each model with maximum follow-up length (1/2/3/4/5 years), the data are randomly split into 10 equal-sized subsets. Each subset contains 1/10 of the live data (participants who are still alive or censored by the maximum follow-up time) and 1/10 of the dead data (participants who have died by the maximum follow-up time). With these 10 equal-sized subsets, a single subset is used for testing the model and the remaining 9 subsets are used as training data. The cross-validation process is repeated 10 times, with every subset used exactly once as the testing data. Finally, the 10 results from the folds are averaged to produce a single estimate for a specific model with given $\alpha$ and $\lambda$.
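The stratified split described above can be sketched in plain Python. This is a minimal illustration, assuming a boolean death indicator per participant at the chosen follow-up time; the function names and the fixed seed are hypothetical, and the study's own implementation is not shown in the text.

```python
import random


def stratified_folds(died, n_folds=10, seed=0):
    """Assign participant indices to n_folds so that deaths and
    alive/censored participants are each spread evenly across folds.

    died: list of booleans, True if the participant died by follow-up.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for status in (True, False):
        indices = [i for i, d in enumerate(died) if d == status]
        rng.shuffle(indices)
        # Deal indices round-robin so each fold gets about 1/n of the stratum.
        for k, i in enumerate(indices):
            folds[k % n_folds].append(i)
    return folds


def train_test_splits(folds):
    """Yield (train, test) index lists, using each fold once for testing."""
    for k, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test
```

Stratifying keeps roughly the same death proportion in every fold, which matters when only about 2% of participants have the event.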

For each $\alpha$, the $\lambda$ with the best performance is selected from the grid as the model parameter.
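The grid selection can be sketched as follows. Here `cv_score` is a hypothetical callable, assumed to return the fold-averaged performance (for example the mean C-index) of a model fitted with the given pair of parameters; it stands in for the model fitting, which is not specified at this level of detail in the text.

```python
def select_lambda_per_alpha(alphas, lambdas, cv_score):
    """For each alpha, return the lambda with the highest cv_score.

    cv_score(alpha, lam) is assumed to return a fold-averaged
    performance value where higher is better.
    """
    best = {}
    for alpha in alphas:
        best[alpha] = max(lambdas, key=lambda lam: cv_score(alpha, lam))
    return best
```

Usage with a toy scoring function, for illustration only: `select_lambda_per_alpha([0.5, 1.0], [0.01, 0.1], my_cv_score)` returns a dictionary mapping each alpha to its selected lambda.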

In addition to the regularized Cox proportional hazards model, we fit other models to compare their performance. We adopt stepwise selection to choose variables. With fixed variables as input, we use prediction performance as the inclusion criterion for forward stepwise selection over these variables. In every step, the variable that increases the C-index the most, given the previously selected variables, is included in the group. The selection runs until the increment is less than a specific threshold. We have tested over traditional predictors and accelerometer-derived predictors with thresholds of 0.01 and 0.001, to enable model evaluation.

Table 1 Categorical Features from UK Biobank dataset fields.
Table 2 Continuous Features from Participant Sensor Records.
Table 3 Max Model results with C-index, Feature Sets versus Risk Years.
Table 4 Min Model results sensor only, with Cumulative C-index rankings.
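The forward stepwise selection described in the Methods text can be sketched as a greedy loop. This is an illustrative sketch: `score` is a hypothetical callable assumed to return the C-index of a model fitted on the given list of variables (including the empty list), and the names are not taken from the study.

```python
def forward_stepwise(candidates, score, threshold=0.001):
    """Greedy forward selection.

    At each step, add the candidate that raises score(selected) the most;
    stop when the best available gain falls below threshold.
    Returns the selected variables and the final score.
    """
    selected = []
    remaining = list(candidates)
    current = score(selected)
    while remaining:
        gains = [(score(selected + [v]) - current, v) for v in remaining]
        best_gain, best_var = max(gains, key=lambda pair: pair[0])
        if best_gain < threshold:
            break
        selected.append(best_var)
        remaining.remove(best_var)
        current = score(selected)
    return selected, current
```

A larger threshold such as 0.01 stops earlier and yields a smaller variable set than 0.001, which is the trade-off the two tested thresholds explore.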

Figure 5
Demographic Independence Curves for sensor records.

Figure S1
Lasso Model. Hierarchy tree average with red selected features.

Figure S2
Geographic Variation of Models across Cohort sites.
