Where should new parkrun events be located? Modelling the potential impact of 200 new events on geographical and socioeconomic inequalities in access and participation.

Background parkrun, an international movement which organises free weekly 5km running events, has been widely praised for encouraging inactive individuals to participate in physical activity. Recently, parkrun received funding to establish 200 new events across England, speciﬁcally targeted at deprived communities. This study aims to investigate the relationships between geographic access, deprivation, and participation in parkrun, and to inform the planned expansion by proposing future event locations. We conducted an ecological spatial analysis, using data on 455 parkrun events, 2,842 public green spaces, and 32,844 English census areas. Poisson regression was applied to investigate the relationships between the distances to events, deprivation, and parkrun participation rates. Model estimates were incorporated into a location-allocation analysis, to identify locations for future events that maximise deprivation-weighted parkrun participation. Results The and the Index of Multiple Deprivation were negatively associated with local


Methods
We conducted an ecological spatial analysis, using data on 455 parkrun events, 2,842 public green spaces, and 32,844 English census areas. Poisson regression was applied to investigate the relationships between the distances to events, deprivation, and parkrun participation rates. Model estimates were incorporated into a location-allocation analysis, to identify locations for future events that maximise deprivation-weighted parkrun participation.

Results
The distance to the nearest event (in km) and the Index of Multiple Deprivation (score) were both independently negatively associated with local parkrun participation rates. Rate ratios were 0.921 (95%CI = 0.921-0.922) and 0.959 (0.959-0.959), respectively. The recommended 200 new event locations were estimated to increase weekly runs by 6.9% (from 82,824 to 88,506). Of the additional runs, 4.1% (n=231) were expected to come from the 10% most deprived communities.

Conclusion
Participation in parkrun is wide spread across England. We provide recommendations for new parkrun event location, in order to increase participation from deprived communities. However, the creation of new events alone is unlikely to be an effective strategy. Further research is needed to study how barriers to participation can be reduced.

Introduction
Insufficient physical activity is a leading cause of disease and disability worldwide. [1] For the UK, around one in six deaths is attributable to insufficient physical activity. [2] Increasing the physical activity levels of the population has the potential to improve quality of life, reduce mortality rates and alleviate the strain on health and social care services. However, interventions designed to increase population physical activity risk failing to reach those from the most deprived communities, potentially worsening health inequities. [3,4] parkrun, an international movement which organises free weekly 5km running events, has been widely praised as being successful in encouraging participation in individuals who were previously inactive. [5,6] Participants report that accessibility and inclusivity are among the main factors that contributed to their involvement. [7] Sport England recently announced funding to support the creation of 200 new parkrun events across England within three years, with the aim of increasing participation of individuals from lower socioeconomic groups. [8] This study aims to provide recommendations for greenspace sites on which to establish new parkrun events. We start with an analysis of the relationship between deprivation, distance, and parkrun participation. We then use the observed relationship to propose a simple algorithm for identifying future event locations, in order to improve participation from deprived communities.

Data and Measures
This study is an ecological analysis of the geographic and socio-economic disparities in participation in parkrun events in England. The observational period covered 49 weeks in 2018 (1st January to 10th December). All analyses were conducted on the level of Lower Layer Super Output Area (LSOA). LSOAs are census areas with a population of approximately 1,700, which divide England into 32,844 geographic units.
We collected and combined location (longitude and latitude) and attribute data of LSOAs with information on parkrun events and public green spaces. An overview of the relevant measures and the data sources is provided in Table 1. 3 . CC-BY 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org /10.1101/19004143 doi: medRxiv preprint In this study, we included all 455 English parkrun events which were in operation during the observational period. Events held in prisons (n=7) were excluded. Data on 143,822 public green spaces were retrieved from an open dataset of Ordnance Survey. [9] In the absence of more detailed information (e.g. terrain), we considered all public parks, gardens, and playing fields in England with an area of 0.1 km 2 (e.g. 316m x 316m) or more potentially suitable for hosting parkrun events (n=2,842).

4
. CC-BY 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10.1101/19004143 doi: medRxiv preprint

Analysis
Mean, standard deviation, median, interquartile range, and range were used as descriptive statistics. The associations between IMD, distance to the nearest parkrun event, and parkrun participation on the LSOA level were investigated using generalised linear models with a Poisson distribution and a log link. The dependent variable was the total number of runs between 1st January and 10th December 2018. To model the outcome as a rate (runs per 1,000 population per week), the log of the person weeks was used as an offset variable.
We first fitted bivariate models to study the effect of IMD and distance on participation separately. The correlation between both predictors was assessed using Pearsons correlation coefficient. Subsequently, we used a multivariable model to investigate the independent effects of IMD and distance on participation. The Akaike (AIC) and the Bayesian Information Criterion (BIC), as well as the McFaddens pseudo R 2 were used to assess model fit.

Rationale and objective
We conducted a location-allocation analysis to solve the following problem: parkrun UK received funding to start 200 additional parkrun events. There are 2,842 public green spaces in England in which new events could be set up. Which 200 locations should be selected? parkruns stated aim of increasing participation from lower socio-economic groups [8] was operationalised in the following way: find the set of 200 green spaces, which maximise the number of additional runs at parkrun events, weighted by LSOAs squared IMD scores. By using the squared IMD scores as weights, the generation of new runs from more deprived areas is prioritised over new runs from less deprived areas. We also explored the following alternative specifications: 1. minimise deprivation-weighted distances; 2. maximise the total (i.e. unweighted) number of runs; 3. Minimise (unweighted) distances.

Location-allocation analysis
We developed a flexible algorithm to select the optimal new parkrun event locations from the set of candidate green spaces. The algorithm consists of two steps: firstly, for each green space, we evaluate how many new runs (weighted by squared LSOA IMD score) would be generated by setting up a 5 . CC-BY 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10.1101/19004143 doi: medRxiv preprint new parkrun event at the respective location. Secondly, the green space with the greatest effect is selected. This procedure is repeated 200 times.
More formally, we define that for any candidate green space location j, the objective function f (j|E) provides the sum of parkrun runs r i over all LSOA i, weighted by the squared IMD score w 2 i , given the set of established parkrun event locations E = {e 1 , e 2 , ..., e 455 }: In the absence of causal estimates, we use the Poisson regression model specified above to predict the expected number of runs r ij for LSOA i based on its IMD score w i , its (linear) distance to the nearest parkrun event d ij , and its population p i . The functional form is given below.
Filling-in the parameter coefficients (see table 3), we derive the following formula:r Note that j can have an effect on r ij through d ij : setting up a new event at location j will reduce the distance to the nearest event for some LSOA i. This means, we evaluate the distances from LSOA is location l i to all established parkrun event locations {e 1 , e 2 , ..., e 455 } ∈ E, denoted l i e 1 , l i e 2 , . . . , l i e 455 , and to the candidate green space location j, denoted l i j, and then take the minimum value, i.e. d ij = min(l i j, l i e 1 , l i e 2 , . . . , l i e 455 ).
The expected change in the objective function is computed for all candidate locations j in the set of the available green spaces C = {c 1 , c 2 , . . . , c 2842 }, and the location with the maximum value is selected. The selection function is expressed in the following formula: arg max . CC-BY 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10.1101/19004143 doi: medRxiv preprint parkrun events E, and removed from the set of available green spaces C. Thereby, the kth new parkrun event location is taken into account when selecting the kth+1 location. The full algorithm can be described in pseudo code in the following way:

Interactive map, source code and data
We mapped all relevant data points, including LSOAs, green spaces, parkrun events, and recommended new event locations, and created an interactive map, which can be accessed online [16]. An annotated version of the R source code and all data that were used to generate the results of this study are provided on a repository under a creative commons license [17].

Ethical approval
Ethical approval was obtained from the Sheffield Hallam University Ethics Committee (ER10776545). We did not collect any personal information, but only used aggregate secondary data. The parkrun Research Board approved this research project, and three of its members (AMB; EG, SSJH) were actively involved in it.

Descriptive statistics
As of 10th December 2018, approximately 7%, 69%, and 90% of the English population lived within 1, 5, and 10km of a parkrun event. Only 192,208 people (0.35% of the English population) lived more than 25km from their nearest event. Over the study period of 49 weeks, an average of 82,824 participants attended parkrun events each week. This is equivalent to a national average weekly participation rate of 1.49 per 1,000 population (this corresponds to an unweighted average of 1.52 per LSOA). About 3% of these runs were done by runners living in the 10% most deprived LSOAs, while 18% came from the 10% least deprived LSOAs. Further descriptive statistics are provided in Table 2. 7 . CC-BY 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10.1101/19004143 doi: medRxiv preprint Table 3 shows the results of the bi-and multivariable Poisson regression models. We found that the distance to the nearest event and the IMD score were both negatively associated with a LSOAs parkrun participation rate. LSOAs with a further distance to the nearest parkrun event, and those with a high IMD score (i.e. more deprived) had lower parkrun participation rates. The reported adjusted rate ratios denote the expected relative change in the average number of runs from a one unit increase in the predictor variable. After controlling for IMD, an increase in the distance to the nearest parkrun event by 1km was associated with a 7.9% lower parkrun participation rate (adjusted rate ratio = 0.921). The adjusted rate ratio for IMD was 0.959, respectively. Due to the high number of observations (n=32,844), confidence intervals were very narrow, with a width of less than 0.01 for all coefficients. The collinearity between distance and IMD was low (Pearson correlation coefficient r = -0.14), and their effect sizes were similar across the bivariate and the multivariable Poisson regression models, which suggests that both predictors explained different parts of the variance. Moreover, models goodness-of-fit values suggested that IMD scores were markedly better than 8 . CC-BY 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Relationship between deprivation, distance and participation in parkrun events
is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10.1101/19004143 doi: medRxiv preprint distance to the nearest event in predicting participation (McFadden Pseudo R 2 = 0.05 vs. 0.25). Figure 1 illustrates the results of the multivariable regression model. Each point shows the observed distance to the nearest event and the number of runs per 1,000 population per week for a single LSOA (n=32,844). We plotted the estimated relationship between distance and participation for three IMD levels: the 10th percentile (least deprived; IMD = 5.67), 50th percentile (median; 17.40) and 90th percentile (most deprived; 44.57).
As can be seen from the graphs, participation rates for the least deprived LSOAs are markedly higher, compared to the most deprived areas. At all levels of deprivation, participation falls as distance to nearest parkrun event increases. However, the most deprived LSOAs are much less responsive to changes, and their participation rates remain relatively low for any given distance. Figure 2 shows current English parkrun events (black circles) alongside recommendations for 200 additional event locations (red triangles), which maximise the total parkrun participation, weighted by the squared IMD score. The numbers correspond to the rank, where 1 is the location which would improve the weighted participation the most. The names and exact locations of the selected green spaces are provided in the appendix.

Optimal locations for new parkrun events
We estimated that the 200 new events would reduce the average distance to the nearest parkrun event by 1.15km (SD = 2.71, range = 0.00-47.55).

9
. CC-BY 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10.1101/19004143 doi: medRxiv preprint In addition to the results reported here, we also explored alternative objective specifications (1. maximising total participation, 2. minimising total distances, 3. minimising squared-IMD-weighted distances). For the location recommendations which maximise the (unweighted) total participation, we estimated that a 33% higher number of weekly runs could be achieved (n = 7,530). However, fewer runs originated from the 10% most deprived LSOAs (n=110), while 14 times more runs originated from the 10% least deprived LSOAs (n=1,493). Detailed results for alternative objective specifications are provided in the appendix. In addition, an interactive map, displaying the recommendations alongside existing parkrun events is provided online [16].

10
. CC-BY 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10.1101/19004143 doi: medRxiv preprint

11
. CC-BY 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Main findings
Our study supports parkruns planned expansion by providing recommendations for the optimal 200 new parkrun event locations. Based on our multivariate regression model, we estimate that the new events would reduce the average distance to the nearest parkrun event by 1.15km, and generate an additional 295,444 runs per year, which would be an increase of 7%, compared to the average participation in 2018.
However, our policy recommendations are expected to have only a modest effect on the participation from the most deprived LSOAs. We found that participation rates from these communities are low, even when geographical access is very good. Improving geographic access, by creating new events, is therefore unlikely to substantially increase the number of participants from lower socio-economic groups. In fact, setting up new parkrun events is likely to worsen inequalities in participation. Therefore, complementary measures should be considered.
Notwithstanding the above, it should be noted that in 2018, participation in parkrun events was widespread across England: on average, 82,824 people, that is 0.15% of the population participated in parkrun events each week, with participants coming from 95% of all LSOAs. Geographic access can be assumed to be an enabling factor for their success: about 69% of the English population live within 5km of a parkrun event. Interestingly, we observed a weak negative correlation between a LSOAs IMD and the distance to the nearest event (r = -0.14), suggesting that parkrun events tend to be located closer to more deprived communities.
Optimal locations for new parkrun events are contingent on how the objective function is specified. The recommendations reported here aim to maximise the total number of parkrun runs, weighted by the squared IMD score. This reflects a strong preference towards increased participation from deprived communities -a run from a LSOA in the 90th IMD percentile (most deprived) had 62 times the weight of a run from the 10th percentile (least deprived). However, alternative specifications may also be legitimate: without IMDweights, for example, it would be possible to generate a markedly higher number of total runs. Moreover, one could argue that parkrun should not seek to maximise actual participation, but rather focus on giving people the opportunity to participate [18], by minimising distance to nearest event corresponding location recommendations are provided in the appendix. The decision where new events should be located involve normative judgements, 12 . CC-BY 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10.1101/19004143 doi: medRxiv preprint which will depend on the values and preferences of the decision maker. This study is not meant to provide definite location recommendations, but to showcase a potentially useful decision aid. To enable parkrun and other stakeholders to adjust the objective function and repeat the analysis, we provide the source code and data that were used for this study.

Limitations
In our analysis, we had to make several simplifying assumptions that should be considered when interpreting our findings. Firstly, the use of linear (geodesic) distances is likely to underestimate true travel distance. Secondly, included candidate park locations that may not necessarily be suitable for parkrun (e.g. due to a lack of paths). Likewise, it is possible that some suitable sites, for example green spaces not categorised as public parks, were not included in the study. [17] Furthermore, we conducted an ecological analysis on the level of the LSOA. Since there might be substantial variation within communities, further studies are required to confirm whether the estimated relationships hold at the individual level. [19] Finally, it is important to note that, although we used the multivariable regression analysis (see Table 3) to predict the effect of setting up new parkrun events on LSOA participation rates, the models were not specified to investigate causal relationships [20,21]. There might be other important factors that explain participation [7] -the generally increasing popularity of parkrun across England, for example -that were not considered. However, building a causal model to reliably predict future parkrun participation would require more detailed and longitudinal data, as well as more extensive modelling.

Implications for policy
This is the first study to investigate the socio-economic and geographic disparities in participation in parkrun events in England. We aimed to support parkruns planned expansion by identifying green space sites for new events which maximise participation, while incorporating concerns about socio-economic equity. However, our findings call into question whether the creation of new events is the most effective strategy to increase participation from lower socio-economic groups. parkrun might want to consider how they could complement the current efforts to improve geographic access to parkrun events, in order to encourage people from deprived communities to participate.

13
. CC-BY 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10.1101/19004143 doi: medRxiv preprint . CC-BY 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10.1101/19004143 doi: medRxiv preprint . CC-BY 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10.1101/19004143 doi: medRxiv preprint . CC-BY 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10.1101/19004143 doi: medRxiv preprint