Self-supervised learning of accelerometer data provides new insights for sleep and its association with mortality

Summary

Background. Sleep is essential to life. Accurate measurement and classification of sleep/wake and sleep stages is important in clinical studies for diagnosing sleep disorders and in interpreting data from consumer devices that monitor physical and mental well-being. Existing non-polysomnography sleep classification techniques mainly rely on heuristic methods developed in relatively small cohorts. Thus, we aimed to establish the accuracy of wrist-worn accelerometers for sleep stage classification and subsequently to describe the association of sleep duration and efficiency (proportion of total time asleep when in bed) with mortality outcomes.

Methods. We developed and validated a self-supervised deep neural network for sleep stage classification using concurrent laboratory-based polysomnography and accelerometry data from three countries (Australia, the UK, and the USA). The model was validated within-cohort using subject-wise five-fold cross-validation, both for sleep-wake classification and for three-class sleep stage classification (wake, rapid-eye-movement sleep (REM), and non-rapid-eye-movement sleep (NREM)), and by external validation. We assessed the face validity of our model for population inference by applying it to the UK Biobank (100,000 participants, each of whom wore a wristband for up to seven days). The derived sleep parameters were used in a Cox regression model to study the association of sleep duration and sleep efficiency with all-cause mortality.

Findings. After exclusions, 1,448 participant nights of data were used to train the sleep classifier. On external validation, the mean difference between polysomnography and the model classifications was 34.7 minutes (95% limits of agreement (LoA): −37.8 to 107.2 minutes) for total sleep duration, 2.6 minutes (95% LoA: −68.4 to 73.4 minutes) for REM duration, and 32.1 minutes (95% LoA: −54.4 to 118.5 minutes) for NREM duration. The sleep architecture estimated in the UK Biobank sample showed good face validity. Among 66,214 UK Biobank participants, 1,642 mortality events were observed. Short sleepers (<6 hours) had a higher risk of mortality than participants with normal sleep duration (6 to 7.9 hours), regardless of whether they had low sleep efficiency (hazard ratio (HR): 1.69; 95% confidence interval (CI): 1.28 to 2.24) or high sleep efficiency (HR: 1.42; 95% CI: 1.14 to 1.77).

Interpretation. Deep-learning-based sleep classification using accelerometers has fair to moderate agreement with polysomnography. Our findings suggest that short overnight sleep confers mortality risk irrespective of sleep continuity.
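The limits of agreement reported in the Findings follow the usual Bland-Altman construction: the mean per-night difference (bias) plus or minus 1.96 standard deviations of the differences. The sketch below illustrates that calculation; the sign convention (model minus polysomnography) and the example minutes are assumptions for illustration, not values from the study.

```python
import numpy as np

def bland_altman(reference, predicted, z=1.96):
    """Bias (mean difference) and 95% limits of agreement between two methods.

    Differences are computed as predicted - reference; this sign convention is an
    assumption, so a positive bias here means the model over-estimates.
    """
    reference = np.asarray(reference, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    diff = predicted - reference
    bias = diff.mean()
    sd = diff.std(ddof=1)  # sample SD of the per-night differences
    return bias, (bias - z * sd, bias + z * sd)

# Hypothetical per-night total sleep duration in minutes (PSG vs model)
psg = [400, 420, 380, 450]
model = [430, 460, 400, 490]
bias, (lower, upper) = bland_altman(psg, model)
```

Because the limits are symmetric around the bias, the midpoint of each reported interval recovers the mean difference (for example, (−37.8 + 107.2) / 2 = 34.7 minutes for total sleep duration).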

Five-class sleep staging (wake/REM/N1/N2/N3) for internal validation:

The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license. This version was posted July 8, 2023. medRxiv preprint doi: https://doi.org/10.1101/2023.07.07.23292251

To obtain a feature extractor by leveraging a large amount of unlabelled data from the UK Biobank, we applied multi-task self-supervised learning following [8].

In self-supervised pre-training, the model was trained to discriminate whether each of a set of binary transformations had been applied to the signal. We selected reversal, permutation, and time-warping as the self-supervised tasks because they are suitable for learning spatiotemporal patterns.
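The pretext tasks described above can be sketched as follows: each transformation is applied independently to a tri-axial window, and the network is asked to predict one binary label per transformation. This is a minimal illustration, not the authors' implementation; the application probability (0.5), the number of permutation chunks, and the time-warp parameters are all hypothetical choices.

```python
import numpy as np

def reverse(x):
    """Flip a (n_samples, 3) tri-axial window along the time axis."""
    return x[::-1].copy()

def permute(x, rng, n_chunks=4):
    """Cut the window into n_chunks segments and shuffle their order.

    A production pipeline might reject the identity order so label 1 always
    means the signal actually changed; this sketch does not.
    """
    chunks = np.array_split(x, n_chunks, axis=0)
    order = rng.permutation(n_chunks)
    return np.concatenate([chunks[i] for i in order], axis=0)

def time_warp(x, rng, sigma=0.2, n_knots=4):
    """Locally stretch/compress time using a smooth random speed curve."""
    n = x.shape[0]
    knots = np.linspace(0, n - 1, n_knots + 2)
    speeds = np.clip(rng.normal(1.0, sigma, size=knots.size), 0.1, None)
    speed = np.interp(np.arange(n), knots, speeds)  # piecewise-linear speed
    warped = np.cumsum(speed)                       # strictly increasing
    warped = (warped - warped[0]) / (warped[-1] - warped[0]) * (n - 1)
    grid = np.arange(n, dtype=float)
    # sample i is placed at time warped[i], then resampled onto the uniform grid
    return np.stack(
        [np.interp(grid, warped, x[:, c]) for c in range(x.shape[1])], axis=1
    )

TASKS = {
    "reversal": lambda x, rng: reverse(x),
    "permutation": permute,
    "time_warp": time_warp,
}

def make_pretext_sample(x, rng, p=0.5):
    """Apply each transformation independently with probability p.

    Returns the transformed window and one binary label per task; a shared
    feature extractor with one classification head per task learns to predict them.
    """
    labels = {}
    for name, transform in TASKS.items():
        applied = bool(rng.random() < p)
        if applied:
            x = transform(x, rng)
        labels[name] = int(applied)
    return x, labels
```

Each task yields its own cross-entropy term, which is why the text below weights the tasks equally when combining them.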

The feature extractor was built on a ResNet-17 V2 [9] with 1D convolutions and 10M parameters in total. Each feature vector has a size of 1024. We used cross-entropy as the cost function, weighting each task equally to balance the features learned from each task. During training, we applied axis swap and rotation as data augmentation to obtain an orientation-invariant representation.
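The axis-swap and rotation augmentations mentioned above can be illustrated with a short NumPy sketch. The source does not specify the rotation range, so drawing a full random 3D rotation and applying each augmentation with probability 0.5 are assumptions made here for illustration.

```python
import numpy as np

def random_rotation_matrix(rng):
    """Draw a uniformly distributed 3x3 rotation matrix (det = +1)."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))  # fix column signs so the draw is uniform
    if np.linalg.det(q) < 0:     # turn a reflection into a proper rotation
        q[:, 0] = -q[:, 0]
    return q

def axis_swap(x, rng):
    """Randomly permute the x/y/z channels of a (n_samples, 3) window."""
    return x[:, rng.permutation(3)]

def rotate(x, rng):
    """Rotate every sample of the window by the same random rotation."""
    return x @ random_rotation_matrix(rng).T

def augment(x, rng, p=0.5):
    """Apply axis swap and rotation, each with probability p (assumed value)."""
    if rng.random() < p:
        x = axis_swap(x, rng)
    if rng.random() < p:
        x = rotate(x, rng)
    return x
```

Both operations preserve the magnitude of each acceleration sample while changing its orientation, which is what encourages the learned features to be orientation invariant.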

During training, we used a batch size of 2000, as larger batch sizes were found to produce better-quality features. Adam [10] was used for optimisation with a learning rate of 1e-3. We distributed the training across four Tesla V100-SXM2 GPUs with 32GB of memory each. Early stopping with a patience of five steps was used to avoid overfitting.
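Early stopping with a given patience can be expressed as a small, framework-agnostic helper. This sketch assumes that a "step" means one validation check and that improvement means a strictly lower validation loss; the `EarlyStopping` class and its `min_delta` parameter are hypothetical names, not part of the authors' code.

```python
class EarlyStopping:
    """Signal a stop once validation loss fails to improve `patience` times in a row."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta  # minimum decrease that counts as improvement
        self.best = float("inf")
        self.bad_checks = 0

    def step(self, val_loss):
        """Record one validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience

# Usage: after 0.9, five consecutive non-improving checks trigger the stop.
stopper = EarlyStopping(patience=5)
decisions = [stopper.step(v) for v in [1.0, 0.9, 0.95, 0.94, 0.93, 0.92, 0.91]]
```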

It took about 420 GPU hours for the model to converge. More details can be found

The learning rate was set to 1e-3. We also set gradient clipping to 1 to avoid exploding gradients in the LSTM. We used weighted cross-entropy as the objective function, weighting each class by the inverse of its frequency to account for the imbalanced dataset. We also used rotation and axis swap to augment the input data

Tesla V100-SXM2 with 32GB of memory. It took about 12 hours for the model to converge. Model performance was reported using five-fold subject-wise cross-validation. We first split the data into train/test sets with a ratio of 8:2, and then further split the training set into train/validation sets with a ratio of 8:2. In each cross-validation fold, we used early stopping with a patience of ten steps on the validation set to avoid overfitting.
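Two details in the paragraph above lend themselves to a sketch: inverse-frequency class weights for the cross-entropy loss, and subject-wise fold assignment, in which every epoch from a participant lands in the same fold so no participant contributes to both training and test data. With five folds, each held-out fold is about 20% of participants, consistent with the 8:2 split. The function names, the normalisation of weights to mean 1, and the weighted-mean loss reduction (a common convention, for example in PyTorch) are assumptions for illustration.

```python
import numpy as np

def inverse_frequency_weights(labels, n_classes):
    """Class weights proportional to 1 / class count, rescaled to average 1."""
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    w = np.zeros(n_classes)
    present = counts > 0
    w[present] = 1.0 / counts[present]
    return w * present.sum() / w.sum()

def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean negative log-likelihood.

    probs: (n, n_classes) predicted probabilities; labels: (n,) integer classes.
    """
    w = weights[labels]
    nll = -np.log(probs[np.arange(len(labels)), labels])
    return float((w * nll).sum() / w.sum())

def subject_wise_folds(subject_ids, n_folds=5, seed=0):
    """Assign whole subjects (never individual epochs) to folds."""
    rng = np.random.default_rng(seed)
    subjects = rng.permutation(np.unique(subject_ids))
    fold_of_subject = {s: i % n_folds for i, s in enumerate(subjects)}
    return np.array([fold_of_subject[s] for s in subject_ids])
```

The same subject-wise idea can be reused inside each training fold to carve out the 8:2 train/validation split used for early stopping.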

Table 4: Sleep parameter definitions: total sleep duration (TSD), rapid-eye-movement (REM), non-rapid-eye-movement (NREM), sleep onset latency (SOL), wake after sleep onset (WASO), and sleep efficiency (SE).

Parameter | Definition
Total sleep duration (TSD) | The total time spent asleep during the recording period per day.
Overnight sleep duration | The duration of the longest sleep window (allowing at most one hour of sleep discontinuity) within a noon-to-noon interval.
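The overnight sleep duration definition in Table 4 can be sketched as: find contiguous sleep bouts, merge bouts separated by wake gaps of at most one hour, and report the span of the longest merged window. This assumes 30-second epochs, that the input covers one noon-to-noon day, that a gap of exactly 60 minutes is still merged, and that the window duration includes the wake gaps inside it; none of these details are stated in the source.

```python
import numpy as np

def sleep_bouts(is_sleep):
    """Return (start, end) epoch indices (end exclusive) of contiguous sleep runs."""
    padded = np.concatenate([[0], np.asarray(is_sleep, dtype=int), [0]])
    d = np.diff(padded)
    starts = np.flatnonzero(d == 1)
    ends = np.flatnonzero(d == -1)
    return list(zip(starts, ends))

def longest_sleep_window_hours(is_sleep, epoch_sec=30, max_gap_min=60):
    """Longest sleep window (hours) in one noon-to-noon day of epoch predictions."""
    bouts = sleep_bouts(is_sleep)
    if not bouts:
        return 0.0
    max_gap = int(max_gap_min * 60 / epoch_sec)  # gap limit in epochs
    windows = [list(bouts[0])]
    for start, end in bouts[1:]:
        if start - windows[-1][1] <= max_gap:  # short wake gap: extend the window
            windows[-1][1] = end
        else:                                  # long gap: start a new window
            windows.append([start, end])
    longest = max(end - start for start, end in windows)
    return longest * epoch_sec / 3600
```

For example, 3 h of sleep, a 40-minute awakening and 2 h more sleep form a single 5 h 40 min window, whereas a 90-minute awakening would split the night into separate windows.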

The UK Biobank variable codes are shown in Table 5. We used month of birth (p52) and year of birth (p34), along with device wear time (p90010), to compute age at wear time. Participants were asked about their history of insomnia symptoms (p1200): "Do you have trouble falling asleep at night or do you wake up in the middle of the night?" Four responses were possible: "never/rarely", "sometimes", "usually", and "prefer not to answer".
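Age at wear time can be derived from month and year of birth plus the wear date. Because day of birth is not available, some convention is needed; this sketch assumes mid-month (the 15th) and expresses age in years of 365.25 days. The function name and both conventions are assumptions, not the authors' documented procedure; month is taken as 1 to 12.

```python
from datetime import datetime

def age_at_wear(year_of_birth: int, month_of_birth: int, wear_start: datetime) -> float:
    """Approximate age in years at the start of accelerometer wear."""
    # Day of birth is unknown, so assume the middle of the birth month.
    birth = datetime(year_of_birth, month_of_birth, 15)
    return (wear_start - birth).days / 365.25

age = age_at_wear(1950, 6, datetime(2014, 6, 15))
```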

In the sensitivity analysis, seven sleep groups were created using exact-hour cut-offs to capture variation among participants with shorter and longer sleep durations.

Mortality was determined using death registry data (obtained by UK Biobank

For the prospective analyses of incident mortality, in addition to the exclusions described above, we further excluded participants with a prior hospitalisation for sleep apnoea, any cardiovascular disease, or cancer (a hospital episode with a primary diagnosis of G47.3, I00-I99, or C00-C99).
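The G47.3, I00-I99 and C00-C99 exclusion rule can be applied to each participant's primary hospital diagnoses with a simple ICD-10 range check. The helper names and the assumption that codes arrive as strings such as "I21.9" or "I219" are illustrative; the source does not describe how the hospital records were parsed.

```python
def _category(code):
    """Split an ICD-10 code into (chapter letter, two-digit category number)."""
    code = code.replace(".", "").strip().upper()
    return code[0], int(code[1:3])

def in_category_range(code, lo, hi):
    """True if code falls in a same-letter category range such as I00-I99."""
    letter, num = _category(code)
    lo_letter, lo_num = _category(lo)
    hi_letter, hi_num = _category(hi)
    return letter == lo_letter == hi_letter and lo_num <= num <= hi_num

def is_excluded(primary_codes):
    """Apply the G47.3 / I00-I99 / C00-C99 prior-hospitalisation rule."""
    for code in primary_codes:
        norm = code.replace(".", "").strip().upper()
        if norm.startswith("G473"):
            return True
        if in_category_range(norm, "I00", "I99") or in_category_range(norm, "C00", "C99"):
            return True
    return False
```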

Results are presented with their 95% confidence intervals. The floating absolute risk approach was used to calculate confidence intervals for the estimate in each group, without contrast to a reference group [16, 17, 18].

In statistical testing using the Grambsch-Therneau test with the Kaplan-Meier
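Returning to the floating absolute risk approach: the idea is to assign each group, including the reference, its own "floated" variance v_i so that v_i + v_j approximates the variance of the difference between any two groups' log hazard ratios. The sketch below solves this with ordinary least squares over all pairs; it is a simple illustration of the idea, and published implementations may use different fitting criteria. With few groups or unusual covariance structures, a least-squares solution can contain negative variances, which this sketch does not guard against.

```python
import itertools
import numpy as np

def floated_variances(cov):
    """Least-squares floated variances for k groups.

    cov: (k-1, k-1) covariance matrix of the log hazard ratios of groups
    1..k-1 relative to reference group 0 (e.g. from a fitted Cox model).
    Returns k values v with v[i] + v[j] ~ Var(logHR_i - logHR_j) for all pairs.
    """
    k = cov.shape[0] + 1
    full = np.zeros((k, k))
    full[1:, 1:] = cov  # the reference group's log HR is fixed at 0
    design, target = [], []
    for i, j in itertools.combinations(range(k), 2):
        row = np.zeros(k)
        row[[i, j]] = 1.0
        design.append(row)
        target.append(full[i, i] + full[j, j] - 2.0 * full[i, j])
    v, *_ = np.linalg.lstsq(np.array(design), np.array(target), rcond=None)
    return v

def floated_ci(log_hr, cov, z=1.96):
    """Floated HRs with 95% CIs; log_hr excludes the reference group."""
    beta = np.concatenate([[0.0], log_hr])
    se = np.sqrt(floated_variances(cov))
    return np.exp(beta), np.exp(beta - z * se), np.exp(beta + z * se)
```

The areas of the squares in the mortality figures are described as the inverse of the variance of the log risk, which corresponds to 1 / v_i in this notation.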

We needed to discard participants with too much non-wear time to obtain a stable

Table 6: Subject-wise sleep stage classification for benchmark models using the internal validation datasets (the Raine Study and the Newcastle cohort). The random forest model was trained using hand-crafted features. SleepNet is the deep recurrent network without pre-training. SleepNet-SSL is the network pre-trained using self-supervision. Five-fold subject-wise performance metrics (mean ± SD) are reported using the internal validation data. REM: rapid-eye-movement sleep; NREM: non-rapid-eye-movement sleep; κ: kappa score.

Table 7: Subject-wise validation of sleep classification performance using our best-performing model. All performance metrics are reported for the period in bed. Cohort-specific and pooled performance (kappa (κ), balanced accuracy, and F1) are shown for both internal and external validation. The pooled performance is calculated by combining the participants from all datasets. REM: rapid-eye-movement sleep; NREM: non-rapid-eye-movement sleep.
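The metrics in Tables 6, 7 and 9 (kappa, balanced accuracy, F1), and the distinction between subject-wise (mean ± SD over participants) and pooled performance, can be sketched from a confusion matrix. The class encoding (0 = wake, 1 = REM, 2 = NREM) and the use of macro-averaged F1 with absent classes scored as 0 are assumptions; the source does not state which F1 averaging was used.

```python
import numpy as np

def confusion(y_true, y_pred, n_classes):
    """Confusion matrix with true classes as rows and predictions as columns."""
    cm = np.zeros((n_classes, n_classes))
    np.add.at(cm, (np.asarray(y_true, dtype=int), np.asarray(y_pred, dtype=int)), 1)
    return cm

def cohen_kappa(cm):
    n = cm.sum()
    po = np.trace(cm) / n                                # observed agreement
    pe = (cm.sum(axis=1) @ cm.sum(axis=0)) / n ** 2      # chance agreement
    return (po - pe) / (1 - pe)

def balanced_accuracy(cm):
    """Mean per-class recall over classes present in the ground truth."""
    support = cm.sum(axis=1)
    return (np.diag(cm)[support > 0] / support[support > 0]).mean()

def macro_f1(cm):
    tp = np.diag(cm)
    denom = cm.sum(axis=0) + cm.sum(axis=1)              # (tp + fp) + (tp + fn)
    return np.where(denom > 0, 2 * tp / np.where(denom > 0, denom, 1), 0.0).mean()

def _scores(cm):
    return [cohen_kappa(cm), balanced_accuracy(cm), macro_f1(cm)]

def subject_wise_summary(per_subject, n_classes=3):
    """Mean and SD of (kappa, balanced accuracy, F1) across participants."""
    scores = np.array([_scores(confusion(t, p, n_classes)) for t, p in per_subject])
    return scores.mean(axis=0), scores.std(axis=0, ddof=1)

def pooled_scores(per_subject, n_classes=3):
    """Metrics after concatenating epochs from all participants."""
    y_true = np.concatenate([np.asarray(t) for t, _ in per_subject])
    y_pred = np.concatenate([np.asarray(p) for _, p in per_subject])
    return _scores(confusion(y_true, y_pred, n_classes))
```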

Table 9: Model characteristics on the internal validation datasets (wake versus REM versus NREM): subject-wise performance metrics (mean ± SD) are reported using the internal validation data. REM: rapid-eye-movement sleep; NREM: non-rapid-eye-movement sleep; κ: kappa score.

8.2. Cohort-specific performance against polysomnography using SleepNet


Figure 16 (plot): panel title "A sample night for a participant in their 50s"; actigram axis: mean acceleration (g).
Figure 16: A sample actigram with ground-truth and predicted hypnograms for a participant whose sleep stages are well captured. The top hypnogram is the ground truth and the bottom hypnogram is the prediction generated by SleepNet from the actigram. REM: rapid-eye-movement sleep; N1, N2, N3: non-rapid-eye-movement sleep stages 1, 2, and 3.

Figure 19 (plot): axis label "Device-measured overnight sleep duration (h)".
Figure 19: Box plots showing the distributions of device-measured overnight sleep duration against self-reported total sleep duration. The whiskers extend to the lowest and highest data points within 1.5 times the inter-quartile range of the lower and upper quartiles.

Figure 22: Device-measured sleep probability trajectories throughout the day for UK Biobank participants (weekday vs weekend). Top: average overnight sleep probability for participants with self-reported "morning" and "evening" chronotypes (a), and overnight sleep distributions across thirds of device-measured physical activity level (b). Bottom: average REM (c) and NREM (d) probability in participants with versus without a history of self-reported insomnia symptoms. REM: rapid-eye-movement sleep; NREM: non-rapid-eye-movement sleep.

Areas of squares represent the inverse of the variance of the log risk. The I bars denote the 95% confidence intervals for the floated risks.

all-cause mortality. The model used 1,642 events among 62,214 participants. We used age as the timescale and adjusted for sex, ethnicity, Townsend Deprivation Index of baseline address (split into quarters within the study population), educational qualifications, smoking status, alcohol consumption (never, <3 times/week, 3+ times/week), and overall activity (measured in milli-gravity units). Areas of squares represent the inverse of the variance of the log risk. The I bars denote the 95% confidence intervals for the floated risks.
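Using age as the timescale means that each participant enters the risk set at their age at accelerometer wear and leaves it at their age at death or censoring. A standard way to write such a Cox model with delayed entry is shown below; the notation is ours, and the source states only that age was used as the timescale. Here $\mathbf{x}_i$ holds the sleep-group indicators and adjustment covariates, $\delta_i$ indicates death, and the hazard ratio of sleep group $g$ versus the reference group is $\exp(\beta_g)$.

```latex
h_i(a) = h_0(a)\,\exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}_i\right),
\qquad a > a_i^{\mathrm{entry}}

L(\boldsymbol{\beta}) = \prod_{i:\,\delta_i = 1}
  \frac{\exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}_i\right)}
       {\sum_{j \in \mathcal{R}(a_i^{\mathrm{exit}})} \exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}_j\right)},
\qquad
\mathcal{R}(a) = \{\, j : a_j^{\mathrm{entry}} < a \le a_j^{\mathrm{exit}} \,\}
```

Because the baseline hazard $h_0(a)$ is a function of age, age itself is controlled for without appearing as a covariate in $\mathbf{x}_i$.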

Figure 24: Associations of overnight sleep duration with all-cause mortality for groups with low and high sleep efficiency, additionally adjusted for body mass index. The model used 1,642 events among 62,214 participants. We used age as the timescale and adjusted for sex, ethnicity, Townsend Deprivation Index of baseline address (split into quarters within the study population), educational qualifications, smoking status, alcohol consumption (never, <3 times/week, 3+ times/week), overall activity (measured in milli-gravity units), and body mass index. Areas of squares represent the inverse of the variance of the log risk. The I bars denote the 95% confidence intervals for the floated risks.

Figure 25: Associations of overnight sleep duration (a) and sleep efficiency (b) with all-cause mortality, additionally adjusted for body mass index. The model used 1,642 events among 62,214 participants. We used age as the timescale and adjusted for sex, ethnicity, Townsend Deprivation Index of baseline address (split into quarters within the study population), educational qualifications, smoking status, alcohol consumption (never, <3 times/week, 3+ times/week), overall activity (measured in milli-gravity units), and body mass index. Areas of squares represent the inverse of the variance of the log risk. The I bars denote the 95% confidence intervals for the floated risks.

8.3.2. Sensitivity analysis for overnight sleep duration

Figure 26: Associations of device-measured overnight sleep duration with all-cause mortality at greater granularity. The model used 1,642 events among 62,214 participants. We used age as the timescale and adjusted for sex, ethnicity, Townsend Deprivation Index of baseline address (split into quarters within the study population), educational qualifications, smoking status, alcohol consumption (never, <3 times/week, 3+ times/week), and overall activity (measured in milli-gravity units). Areas of squares represent the inverse of the variance of the log risk. The I bars denote the 95% confidence intervals for the floated risks.
