The impact of a gamified intervention on physical activity in real-life conditions: a retrospective analysis of 4800 individuals

Background. Digital interventions integrating gamification features hold promise to promote physical activity (PA). However, results regarding the effectiveness of this type of intervention are heterogeneous. Objectives. This study aimed to examine the effectiveness of a gamified intervention and its potential moderators in a large sample using real-world data. Specifically, we tested (1) whether a gamified intervention enhanced daily steps during the intervention and follow-up periods compared to baseline, (2) whether this enhancement was greater in participants in the intervention than in nonparticipants, and (3) which participant characteristics or intervention parameters moderated the effect of the program. Methods. Data from 4819 individuals who registered for an mHealth Kiplin program between January 1st, 2019, and January 2nd, 2022 were retrospectively analyzed. In this intervention, participants could take part in one or several games where their daily step count was tracked, allowing individuals to play with their overall activity. Nonparticipants were people who registered for the program but did not take part in the intervention; they were considered as a control group. PA was assessed via the daily step count of participants. Exposure to the intervention, the intervention content, and participants' characteristics were included in multilevel models to test the study objectives. Results. Compared with nonparticipants, participants who benefited from the intervention had a significantly greater increase in mean daily steps from baseline during the same period (p <.0001). However, intervention effectiveness depended on participants' initial PA. 
Whereas the daily steps of participants with <7500 baseline daily steps significantly improved from baseline both during the Kiplin intervention (+3291 daily steps) and during follow-up periods (+945 daily steps), participants with a higher baseline had no improvement or significant decreases in daily steps after the intervention. Age (p <.0001) and exposure (p <.0001) positively moderated the intervention effect.


Introduction
Physically inactive individuals are at higher risk of developing non-communicable diseases (such as cardiovascular diseases, cancers, type 2 diabetes mellitus, or obesity) and mental health issues than the most active ones. 1 Yet, one-third of the world's population is insufficiently active, 2,3 and the trend is downward, with adults taking on average 1000 fewer steps than 2 decades ago. 4 Additionally, it has recently been reported that the global population step count did not return to pre-pandemic levels in the 2 years since the onset of the COVID-19 outbreak. 5 In this context, there is an urgent need to increase the physical activity (PA) of individuals in primary, secondary, and tertiary prevention.
Digital behavior change interventions, and more particularly gamified services, are promising avenues to promote PA. Gamification refers to the use of game elements in nongame contexts 6 and makes it possible to transform a routine activity into a more engaging one. A recent meta-analysis 7 revealed that digital gamified interventions, lasting on average 12 weeks, improved PA by 1600 daily steps on average. Importantly, the results showed that gamified interventions a) appear more effective than digital non-gamified interventions, b) seem appropriate for any type of user regardless of age or health status, and c) produce PA improvements that persist in the long term. As a result, gamified interventions are emerging as high-potential behavior change tools to tackle the physical inactivity pandemic.
However, the effect sizes reported in this meta-analysis were heterogeneous, ranging from 0.00 to 2.41, and the authors found high between-study heterogeneity (e.g., I² = 82%). Although this heterogeneity may be explained by differences in study quality or by the diversity of designs in the included studies, the behavior change intervention ontology proposed by Michie et al. 8 argues that heterogeneity in behavioral interventions could also be explained by other variables, such as intervention characteristics (e.g., content, delivery), the context (e.g., characteristics of the targeted population such as demographics, and the setting such as the policy environment or physical location), the exposure of participants to the program (e.g., engagement and reach), and the mechanisms of action (the processes by which interventions influence the target behavior). Considering these variables within gamification contexts could provide a useful means to better understand the conditions under which interventions are successful.
The present study investigated this question based on a retrospective analysis of real-world data collected from a large sample of participants who were offered an mHealth gamified intervention. In this intervention, participants could take part in one or several games where their daily step count was tracked, allowing individuals to play with their overall activity. In addition to offering the possibility of intervening directly on people's activity habits in a natural context, the capacity of this app to collect a large amount of objective real-world data in real time can be useful to understand the processes and outcomes of behavioral health interventions. 9 More specifically, these data can help make explicit when, where, for whom, and in what state of the participant the intervention will produce the expected effect, notably thanks to continuous data collection over time. The within-person evolution in daily steps obtained via the app, combined with between-person individual factors and intervention parameters, is of great interest in this perspective. By analyzing these data, we can therefore a) better identify for which individuals the intervention is the most effective considering their age, health status, baseline PA, or the context in which they received the intervention, b) examine the relationship between exposure and intervention effectiveness, and c) better understand which features of the app were the most effective.
Thus, the objectives of this study were (1) to examine within-individual evolutions of PA before, during, and after the intervention, (2) to test the effectiveness of a gamified program in real-life conditions on the PA of participants versus nonparticipants, and (3) to explore the variables that could explain heterogeneity in response to the intervention. Based on previous results, we first hypothesized that PA would increase both during and after the gamified program, in comparison to initial PA (H1). Second, we hypothesized that this improvement would be greater for participants than for nonparticipants (i.e., individuals who registered on the app but did not complete any game; H2). Finally, we expected that the intervention's characteristics (i.e., type and number of games), the context in which the intervention was performed (i.e., population and settings), and the exposure to the intervention (i.e., engagement of participants with the app) would moderate the intervention effect (H3).

Study design and participants
This study retrospectively analyzed data from adult participants who had registered for a Kiplin program including PA games and had given consent for their data to be collected. To be included, participants had to be 18 years old or older, have registered on the app between January 1st, 2019, and January 2nd, 2022, and have logged daily steps (measured via their smartphone or an activity monitor) over a time frame of at least 90 days with less than 20% of missing daily observations. 10 Non-wear days were defined as days with fewer than 1000 steps and considered as missing observations, as previous research suggested that daily step values less than 1000 may not represent full data capture. 11,12 Days before the first day of the first game were considered as 'baseline' (Mdn = 14 days ± 42.9), the period between the first day of the first game and the last day of the last game as 'intervention period' (Mdn = 19 days ± 31.2), and the days after the last day of the last game as 'follow-up' (Mdn = 90 days ± 22.8). We restricted the follow-up periods to 90 days post-intervention (i.e., 3 months).
Participants could receive the Kiplin intervention a) in the context of their work (i.e., primary prevention with employees), b) in a senior program (i.e., primary prevention with volunteer retirees), or c) as part of their chronic disease care (i.e., patients mainly treated for obesity or cancer). In all of these conditions, the program was paid for not by the participant but by their employer or health care center. At the beginning of the intervention, participants had to download the Kiplin app. They were given an access code by their employer or health care center, and could then create their account. Upon registration, participants agreed that their anonymized data may be stored on certified health data servers. Participants then benefited from one or several PA games (depending on the program) lasting approximately 14 days each. When several games were offered, they followed one another at intervals of fewer than 60 days. In programs with multiple games, there was always a break of some days between games to provide regular doses of gamification. Details on the games' content have been reported previously. 13 Some participants registered for the program and created their account, but did not take part in the intervention (i.e., did not complete any game). These individuals were considered "nonparticipants" and were used as a control group (as proposed in previous research 14). For these nonparticipants, the baseline period likewise corresponds to the days before the date on which they were due to start the intervention period.
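The inclusion rules and phase definitions above can be made concrete with a short sketch. It is illustrative only and is not the authors' code (their analysis scripts are in R on the OSF repository). The function names, the dictionary input format, and the choice to measure the 90-day time frame from the first to the last logged day are all assumptions made for this example; the registration-date window is omitted for brevity.

```python
from datetime import date, timedelta

NON_WEAR_THRESHOLD = 1000    # days with fewer steps are treated as missing
MIN_TIME_FRAME_DAYS = 90     # minimum logging time frame
MAX_MISSING_FRACTION = 0.20  # strictly less than 20% missing days allowed
FOLLOW_UP_DAYS = 90          # follow-up restricted to 90 days post-intervention


def is_eligible(steps_by_day, age):
    """Apply the age, time-frame, and missing-data criteria.

    steps_by_day: dict mapping a date to a step count; absent dates count
    as missing. The time frame is ASSUMED to run from the first to the
    last logged day (the paper does not spell this out).
    """
    if age < 18 or not steps_by_day:
        return False
    first, last = min(steps_by_day), max(steps_by_day)
    span = (last - first).days + 1
    if span < MIN_TIME_FRAME_DAYS:
        return False
    valid = sum(1 for s in steps_by_day.values() if s >= NON_WEAR_THRESHOLD)
    missing_fraction = 1 - valid / span
    return missing_fraction < MAX_MISSING_FRACTION


def phase(day, first_game_start, last_game_end):
    """Label a day as baseline, intervention, or follow-up (None if later)."""
    if day < first_game_start:
        return "baseline"
    if day <= last_game_end:
        return "intervention"
    if day <= last_game_end + timedelta(days=FOLLOW_UP_DAYS):
        return "follow-up"
    return None
```

For instance, a participant with 100 consecutive logged days of which 25 fall below 1000 steps has 25% missing observations and would be excluded under this reading of the criteria.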

Variables
The variables of interest were selected on the basis of the behavior change intervention ontology of Michie et al. 8 and included (1) the longitudinal evolution of daily steps, (2) the exposure of each participant to the intervention, (3) the intervention parameters, and (4) the context (participants' characteristics and settings), as these variables are likely to influence the intervention effect. Table 1 specifies the measures of interest and their operationalization.

Statistical analyses
Mixed-effects models were used to 1) analyze within-person evolution across time (i.e., changes in daily steps between baseline, intervention, and follow-up periods) and across participants and nonparticipants, and 2) examine the associations between intervention parameters, exposure to the intervention, participants' characteristics and settings, and the evolution of daily steps. This statistical approach controls for the nested structure of the data (i.e., multiple observations nested within participants), does not require an equal number of observations from all participants, 15 and separates between-person from within-person variance, providing unbiased estimates of the parameters. 16,17 First, an unconditional model (i.e., with no predictor) was estimated for each variable to calculate intra-class correlations (ICC) and estimate the amount of variance at the between- and within-individual levels, which allowed us to determine whether conducting multilevel models was relevant. Second, a model that allowed a random slope over time (i.e., a model with random intercept and random slope) was compared to the null model (i.e., with only a random intercept) using an ANOVA, to evaluate whether the less parsimonious model explained a significantly higher portion of the variance of the outcome than the unconditional model. 18,19 Third, between-level predictors and confounding variables were added to another model (Model 1)*1 and compared to the previous models. Finally, intervention characteristics as well as their interactions with the phases of the study (i.e., baseline/intervention/follow-up) were added in a final model excluding nonparticipants (Model 2)*2. Model fit was assessed via the Bayesian Information Criterion (BIC) and the −2 log-likelihood (−2LL). 20 All models were fitted using the lmerTest package in the R software. 21 An estimate of the effect size was reported using the marginal and conditional pseudo R². Models' reliability was estimated with residual analyses, performed using the performance package. 22 When interaction terms were significant, contrast analyses were computed using the emmeans package. 23 The data and code for the statistical analyses used in the present study are available on Open Science Framework (https://osf.io/scnu7/).
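To illustrate what the ICC from an unconditional model expresses, the sketch below computes the share of variance in daily steps that lies between participants. Note the substitution: the study estimated variance components with mixed models in R (lmerTest), whereas this example uses the classical one-way ANOVA estimator, ICC(1), which requires the same number of observations per participant. The function name and input format are assumptions for illustration.

```python
from statistics import mean


def icc_oneway(groups):
    """ICC(1) from a balanced one-way random-effects ANOVA.

    groups: list of equal-length lists, one list of daily step counts per
    participant. Returns the estimated proportion of total variance that
    is attributable to differences between participants.
    """
    k = len(groups[0])
    if k < 2 or len(groups) < 2 or any(len(g) != k for g in groups):
        raise ValueError("need at least 2 groups of equal size >= 2")
    n = len(groups)
    grand_mean = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]
    # Mean squares between and within participants
    ms_between = k * sum((m - grand_mean) ** 2 for m in group_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

An ICC close to 1 means that most of the variability is between participants, while a value close to 0 means that it is mostly day-to-day variation within participants; a non-negligible ICC is what justifies modeling the nested structure explicitly.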

Compliance ratio
The engagement of participants with the app was computed as the compliance ratio, that is, the number of days with a login during the game periods divided by the total duration of the game periods. This variable measures how frequently participants engaged with the service. 25
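A minimal sketch of this ratio is given below. It assumes that login days are available as calendar dates and that game periods are inclusive, non-overlapping (start, end) date pairs; the function name and input format are hypothetical and not taken from the Kiplin platform.

```python
from datetime import date, timedelta


def compliance_ratio(login_days, game_periods):
    """Days with a login during game periods / total days of game periods.

    login_days: iterable of dates on which the participant logged in.
    game_periods: list of (start, end) dates, inclusive and ASSUMED
    non-overlapping. Logins outside game periods are ignored.
    """
    logins = set(login_days)
    total_days = 0
    days_with_login = 0
    for start, end in game_periods:
        n_days = (end - start).days + 1
        total_days += n_days
        days_with_login += sum(
            1 for i in range(n_days) if start + timedelta(days=i) in logins
        )
    if total_days == 0:
        raise ValueError("game_periods must cover at least one day")
    return days_with_login / total_days
```

With a single 14-day game and logins on 7 of those days, the ratio is 0.5, regardless of any logins made before or after the game.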

Number of games played
The total number of games played during the intervention period.

Context (population and setting)

Self-reported age and gender
Filled out by participants when they registered on the app.

Population
Employees, seniors, or patients (treated for obesity or cancer).

Confounders

Season
The season (winter, spring, summer, autumn) in which the daily step count was logged was controlled for, as the season can influence PA. 26

Type of device
The type of device used to assess daily step count (i.e., Android or iOS smartphones, or Garmin, Withings, Polar, Fitbit, or Tomtom wearables) was controlled for, as smartphone apps and wearable devices differ in accuracy and precision. 27

Lockdown
The study period was marked by the COVID-19 pandemic, and 3 lockdowns were set up in France to limit the spread of the outbreak. As these periods had a strong influence on the PA of individuals, 28,29 we controlled for the lockdown periods in our analyses.

Descriptive Results
Descriptive results are presented in
β0 to β12 are the fixed-effect coefficients, θ0j and θ1j are the random effects for participant j (one random intercept and one random slope), γ0i is the random effect for Time i (random intercept), and εij is the error term.
Is the gamified program effective in promoting PA? (H1)
During the intervention period, participants increased their daily steps by 4177 steps per day on average compared to the baseline period, and by 478 steps per day on average during the follow-up period compared to baseline. In comparison, the daily step count of the control group remained more or less stable throughout the same timeframe, with a mean increase of 84 daily steps compared to baseline.
Overall, contrast analyses of the model for the intervention participants (Model 2, Table 3) revealed a negative effect of the intervention on the daily step count during the intervention phase compared to baseline activity (b = -0.09, 95% CI [-0.14; -0.05], p <.0001) and no significant effect (b = 0.01, 95% CI [-0.05; 0.06], p = 0.79) during follow-up periods compared to baseline. However, the patterns were different when participants were stratified by baseline PA. Participants with lower baseline daily steps (<5000 steps per day or 5001-7500 steps per day) showed a significant increase of their daily steps during the intervention and the follow-

Is the intervention effect greater for participants compared to nonparticipants? (H2)
In Model 1 (Table 3), participants who received the intervention from Kiplin had a significantly greater increase in mean daily steps between baseline and the intervention period, compared with nonparticipants (b = 0.54, 95% CI [0.52; 0.58], p <.0001).

What are the moderators of the intervention effect? (H3)
The Model 2 estimates are displayed in Table 2. The variables under consideration explained 39% of the variance in daily steps. In this model, we tested the hypothesized interactions to investigate predictors associated with the effectiveness of the intervention (Table 4). Contrast analyses were conducted on significant interactions and revealed that age (b = 0.05, p <.0001) and the compliance ratio (b = 0.37, p <.0001) were positively associated with the change in daily steps between baseline and intervention. Specifically, the older the individuals were and the more regularly they played, the more effective the intervention was. On the other hand, the number of games played by participants was negatively associated with this change (b = -0.02, p = 0.02). In other words, the longer the intervention and the higher the number of games, the less effective the intervention. For categorical variables, contrast analyses revealed differences in the intervention effect between the different populations. Compared to employees, cancer patients (b = -0.18, 95% CI [-0.24; -0.12], p <.0001) and seniors (b = -0.19, 95% CI [-0.25; -0.13], p <.0001) showed a significantly weaker effect of the intervention in comparison to baseline PA.
There was no significant difference between employees and patients treated for obesity (b = -0.07, 95% CI [-0.16; 0.02], p = 0.13). In sum, programs conducted with office workers or patients treated for obesity had better effects than programs with other populations (Figure 2). All the results of these analyses are available in supplementary materials.

Discussion
This observational study retrospectively analyzed the real-world data of 4800 participants who registered on the Kiplin app. We found that participants benefiting from the Kiplin games significantly increased their PA compared with nonparticipants during the same period. We also found that the intervention effect depended on the baseline PA of individuals. Participants with lower baseline steps (<5000 steps per day or 5001-7500 steps per day) significantly improved their PA both during the intervention and follow-up periods, whereas participants with more than 7500 steps showed no change or significant decreases. These results suggest that a gamified program is more effective among inactive individuals than active ones, with the existence of a plateau effect. They also confirm recent findings 7 and the ability of gamified interventions to improve PA both during and after the end of the program, at least for the more inactive individuals. This effectiveness is particularly interesting considering that current behavioral interventions struggle to change PA in the long term. 30 Once again, our results stress that older age may not be incompatible with gamified interventions. Indeed, we found that intervention effectiveness was moderated by the age of the individual and that gamification was more effective among older individuals than among younger ones.
While the literature on gamification as a whole shows cautiously positive results for its use with older people, 31 the present results are in line with a previous study 32 which found that older users made greater use of the gamification features. The authors proposed the explanation that older adults generally pay more attention to their health and thus have a stronger intention to engage in a health program. From another perspective, and in light of the Kiplin games' characteristics, these results could also be explained by the fact that these games are accessible (they are inspired by traditional board game rules and mechanics widely known in the general population, e.g., Cluedo, snakes and ladders) and thus may be more attractive for older populations. Indeed, the most engaging game mechanics may diverge between youth and other populations, 33 and we can expect that younger populations may prefer more complex game mechanics and need more novelty during the intervention to stay interested in the service.
Regarding the effects of our gamified intervention according to the characteristics of the population, we found a stronger effect among office workers and patients treated for obesity. Although the high variability observed among patients and senior participants calls for caution before drawing conclusions, these findings highlight that, beyond the attributes and health status of the participants, the setting of the intervention can be important. For example, interventions offered to employees were conducted within their company. Participants thus knew each other, which can enhance the motivational impact of some gamification features (such as social comparison with leaderboards and social connectedness with teams), whereas interventions in healthcare settings usually involve patients who do not know each other.
Our findings also revealed several insights that could help to improve future intervention design. First, exposure to the content is essential for the gamified intervention to be effective. This is noteworthy because gamification has often been treated as a self-sustaining process that automatically engages participants. These results are consistent with previous findings demonstrating that higher use of gamification features was associated with greater intervention effectiveness. 32,34 Although gamification can ultimately increase program engagement, developers first need to design their apps to be as attractive as possible and to optimize retention.
Second, the total number of games played was negatively associated with the intervention effect, suggesting that a shorter intervention could be more beneficial for behavior change. These results are in line with previous research 7,35 suggesting that users benefit more from digital interventions shorter than 3 months. They also suggest a "dose-response" relationship with an inverted U shape, with an optimal "middle" to find. Nevertheless, it is necessary to take into consideration the fact that Kiplin programs of more than one game are built so as to deliver several doses at regular intervals. Periods without games were therefore included in the intervention phases and could explain why, overall, the shorter programs were more effective. More refined analyses of the intervention effect over time will be necessary in the future.
Third, the daily step count of participants was significantly higher in the adventure and the challenge games. These two games share the characteristic of being more competitive, with a stronger emphasis on leaderboards, than the two other games, which are more focused on collaboration. Along these lines, Patel et al. 36 observed that the competitive version of their gamified intervention outperformed the collaborative and supportive arms. Moreover, various studies demonstrated that leaderboards are a particularly successful gamification mechanic. 32,37

Strengths and limitations
This study has several strengths, including the large number of participants included, the intensive objective PA measurement in real-life conditions, and the longer baseline and follow-up duration compared with most trials on gamification, which typically incorporate measurement bursts dispersed across time. 7 However, several limitations should be considered. First, this study was observational and not a randomized controlled trial. Thus, we cannot establish the causality of the intervention's effect on outcome improvement. The nonparticipants are not a true control group: the reasons why they did not receive the intervention may be underlying motivational ones that could also affect their PA. Second, intervention lengths differed between participants. Third, although multilevel models are useful for describing trends in PA behavior change over time, they are limited in their capacity to assess the precise fluctuation patterns of non-stationary behaviors such as PA 38 across time. Slightly more complex options are available to precisely describe changes and patterns over time (e.g., time series analyses) and could be used in future longitudinal studies. Finally, the compliance ratio used in this study as a proxy for engagement tends to oversimplify the exposure of participants to the service. Complementary measures of engagement (e.g., the number of logins, the time spent per login, and the number of components accessed) will be needed to characterize the longitudinal impact of participants' engagement on the intervention effect.

Conclusion
In this study, in which we retrospectively analyzed the daily step count of 4800 individuals in real-life conditions, participants who benefited from the Kiplin gamified intervention had a significantly greater increase in mean daily steps from baseline than nonparticipants. Responses to the intervention differed significantly as a function of individuals' initial PA. Whereas participants with less than 7500 baseline daily steps had significant improvements both during the intervention and follow-up periods, the intervention had no effect on participants with initial values >7500. The age of participants and their engagement with the app were positively and significantly associated with the intervention effect, while the number of games played was negatively associated with it. The results of this study suggest that gamification is effective in promoting the PA of inactive populations in the short and medium term. To our knowledge, this study is the first to examine the longitudinal effect of a gamified program outside the context of a trial, with real-world data. The results of this study are therefore highly generalizable and confirm the interest of gamification in both primary and tertiary prevention.

Figure 1. Changes in daily steps throughout the study phases for participants who received a Kiplin program, stratified by baseline daily steps.

Figure 2. Changes in daily steps throughout the study phases for the different populations who received a Kiplin program.

Table 1. Operationalization of the variables

Daily step count
PA was assessed via the daily step count, measured with the smartphone or activity monitor of the participant. The daily step count is a trusted proxy for PA. 24

Intervention (content and delivery) and mechanisms of action
Type of game
Participants could play 4 types of games (i.e., challenge, adventure, boardgame, mission). The challenge is a competitive game where participants had to walk more than other teams to win. The three other games have been introduced elsewhere with details on embedded behavior change techniques. 13

Table 4. Interactions tested between the intervention phase, participants' characteristics, and intervention parameters in Model 2.