ABSTRACT
Background In elite football (soccer), periodic health examination (PHE) could provide prognostic factors to predict injury risk.
Objective To develop and internally validate a prognostic model to predict individual indirect (non-contact) muscle injury (IMI) risk during a season in elite footballers, only using PHE-derived candidate prognostic factors.
Methods Routinely collected preseason PHE and injury data were used from 119 players over 5 seasons (1st July 2013 to 19th May 2018). Ten candidate prognostic factors (12 parameters) were included in model development. Multiple imputation was used to handle missing values. The outcome was any time-loss, index indirect muscle injury (I-IMI) affecting the lower extremity. A full logistic regression model was fitted, and a parsimonious model developed using backward-selection to remove non-significant factors. Predictive performance was assessed through calibration, discrimination and decision-curve analysis, averaged across all imputed datasets. The model was internally validated using bootstrapping and adjusted for overfitting.
Results During 317 participant-seasons, 138 I-IMIs were recorded. The parsimonious model included only age and frequency of previous IMIs; apparent calibration was perfect but discrimination was modest (C-index = 0.641, 95% confidence interval (CI): 0.580 to 0.703), with clinical utility evident between risk thresholds of 37-71%. After validation and overfitting adjustment, performance deteriorated (C-index = 0.580; calibration-in-the-large =-0.031, calibration slope =0.663).
Conclusion The selected PHE data were insufficient prognostic factors from which to develop a useful model for predicting IMI risk in elite footballers. Further research should prioritise identifying novel prognostic factors to improve future risk prediction models in this field.
Trial registration number NCT03782389
KEY POINTS
Factors measured through preseason screening generally have weak prognostic strength for future indirect muscle injuries and further research is needed to identify novel, robust prognostic factors.
Because of sample size restrictions, and until the evidence base improves, it is likely that any further attempts at creating a prognostic model at individual club level would also suffer from poor performance.
The value of using preseason screening data to make injury predictions or to select bespoke injury prevention strategies remains to be demonstrated, so screening should only be considered as useful for detection of salient pathology or for rehabilitation/ performance monitoring purposes at this time.
1. BACKGROUND
In elite football (soccer), indirect (non-contact) muscle injuries (IMIs) predominantly affect the lower extremities and account for 30.3% to 47.9% of all injuries that result in time lost to training or competition [1-5]. Reduced player availability negatively impacts upon medical [6] and financial resources [7, 8], and has implications for team performance [9]. Therefore, injury prevention strategies are important to professional teams [9].
Periodic health examination (PHE), or screening, is a key component of injury prevention practice in elite sport [10]. PHE is used by 94% of elite football teams and consists of medical, musculoskeletal, functional and performance tests, performed during preseason and in-season periods [11]. PHE has a rehabilitation and performance monitoring function [12] and is also used to detect musculoskeletal or medical conditions that may be dangerous or performance limiting [13]. Another perceived role of PHE is to recognise and manage factors that may increase, or predict, an athlete’s future injury risk [10], although this function is currently unsubstantiated [13].
PHE-derived variables associated with particular injury outcomes (such as IMIs) are called prognostic factors [14], which can be used to identify risk differences between players within a team [12]. Single prognostic factors are unlikely to satisfactorily predict an individual’s injury risk if used independently [15]. However, several factors could be combined in a multivariable prognostic prediction model to offer more accurate personalised risk estimates for the occurrence of a future event or injury [15, 16]. Such models could be used to identify high-risk individuals who may require an intervention that is designed to reduce risk [17], thus assisting clinical decision making [18]. Despite the potential benefits of prognostic models for injury, we are unaware of any that have been developed using PHE data in elite football [19].
Therefore, the aim of this study was to develop and internally validate a prognostic model to predict individualised IMI risk during a season in elite footballers, using PHE-derived candidate prognostic factors.
2. METHODS
The methods have been described in a published protocol [20] so will only be briefly outlined. This study has been registered on ClinicalTrials.gov (Identifier: NCT03782389) and is reported according to the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) statement [21, 22].
2.1 Data Sources
This study was a retrospective cohort design. Eligible participants were identified from a population of male elite footballers, aged 16-40 years old at an English Premier League club. A dataset was created using routinely collected injury and preseason PHE data over 5 seasons (1st July 2013 to 19th May 2018). For each season (starting on 1st July), participants completed a mandatory PHE during week 1 and were followed up to the final first team game of the season. If eligible participants were injured at the time of PHE, a risk assessment was completed by medical staff. Only tests that were appropriate and safe for the participant’s condition were completed; examiners were not blinded to injury status.
2.2 Participants and eligibility criteria
During any season, participants were eligible if they: 1) were not a goalkeeper; and 2) participated in PHE for the relevant season. Participants were excluded if they were not contracted to the club for the forthcoming season at the time of PHE.
2.3 Ethics and Data Use
Informed consent was not required as data were captured from the mandatory PHE completed through the participants’ employment. The data usage was approved by the club and University of Manchester Research Ethics Service.
2.4 Outcome
The outcome was any time-loss, index lower extremity IMI (I-IMI). That is, any I-IMI sustained by a participant during matches or training, which affected lower abdominal, hip, thigh, calf or foot muscle groups and prohibited future football participation [23]. I-IMIs were graded by a club doctor or physiotherapist (not blinded to PHE data) according to the validated Munich Consensus Statement for the Classification of Muscle Injuries in Sport [24, 25], during routine assessments undertaken within 24h of injury.
2.5 Sample size
We allowed a maximum of one candidate prognostic factor parameter per 10 I-IMIs, which (at the time of protocol development) was the main recommendation to minimise overfitting (Online Resource 1) [20]. The whole dataset was used for model development and internal validation, which agrees with methodological recommendations [26].
2.6 Candidate Prognostic Factors
The dataset contained 60 candidate factors [20]. Because of the sample size considerations, before any analysis, the set of candidates was reduced based upon data quality and reliability assessment, previous evidence of prognostic value [19] and clinical reasoning [20]. This process left a final set of 10 candidate factors, represented by 12 model parameters (Table 1).
3. Statistical analysis
3.1 Data handling – outcome measures
Each participant-season was treated as independent. Participants who sustained an an I-IMI were no longer considered at risk for that season and were included for further analysis at the start of the next season if still eligible. Any upper limb IMI, trunk IMI or non-IMI injuries were ignored and participants were still considered at risk.
Eligible participants who were loaned to another club throughout that season, but had not sustained an I-IMI prior to the loan were still considered at risk. I-IMIs that occurred whilst on loan were included for analysis, as above. Permanently transferred participants (who had not sustained an I-IMI prior to leaving), were recorded as not having an I-IMI during the relevant season and exited the cohort at the season end.
3.2 Data handling –missing data
Missing values were assumed to be missing at random [20]. The continuous parameters generally demonstrated non-normal distributions, so were transformed using normal scores [34] to approximate normality before imputation, and back-transformed following imputation [35]. Multivariate normal multiple imputation was performed, using a model that included all candidates and I-IMI outcomes. Fifty imputed datasets were created in Stata 15.1 (StataCorp LLC, Texas, USA) and analysed using the ‘mim’ module.
3.3 Prognostic Model Development
Continuous parameters were retained on their original scales and their effects assumed linear [22]. A full multivariable logistic regression model was constructed, which contained all 12 parameters. Parameter estimates and results were combined across imputed datasets using Rubin’s Rules [36]. To develop a parsimonious model that would be easier to utilise in practice, backward variable selection was performed simultaneously across all imputed datasets to successively remove non-significant factors with p-values > 0.157 (approximate equivalence with Akaike’s Information Criterion) [37, 38]. Multiple parameters representing the same candidate factor were tested together so that the whole factor was either retained or removed. Candidate interactions were not examined and no terms were forced into the model. All analyses were conducted in Stata 15.1.
3.4 Assessment of model performance
The parsimonious model was used to predict I-IMI risk over a season, for every participant-season in all imputed datasets. For each performance measure, the model’s apparent performance was assessed in each imputed dataset and then averaged across all imputed datasets using Rubin’s Rules [36]. Discrimination determines a model’s ability to differentiate between participants who have experienced an outcome compared to those who have not [39], quantified using the concordance index (c-index). This is equivalent to the area under the receiver operating characteristic (ROC) curve for logistic regression, where 1 demonstrates perfect discrimination, while 0.5 indicates that discrimination is no better than chance [40].
Calibration determines the agreement between the model’s predicted outcome risks and those observed [41], evaluated using a calibration plot in each imputed dataset. All predicted risks were divided into ten groups defined by tenths of predicted risk. The mean predicted risks for the groups were plotted against the observed group outcome proportions with corresponding 95% confidence intervals (CIs). A loess smoothing algorithm showed calibration across the range of predicted values [42]. For grouped and smoothed data points, perfect predictions lie on the 45° line (i.e. a slope of 1).
The systematic (mean) error in model predictions was quantified using calibration-in-the-large (CITL), which has an ideal value of 0 [39, 41], and the expected/observed (E/O) statistic, which is the ratio of the mean predicted risk against the mean observed risk (ideal value of 1) [39, 41]. The degree of over or underfitting was determined using the calibration slope, where a value of 1 equals perfect calibration on average across the entire range of predicted risks [22]. Nagelkerke’s pseudo-R2 was also calculated, which quantifies the overall model fit, with a range of 0 (no variation explained) to 1 (all variation explained) [43].
3.5 Assessment of clinical utility
Decision-curve analysis was used to assess the model’s clinical usefulness in terms of net benefit (NB) if used to allocate possible preventative interventions. This assumed that the model’s predicted risks were classed as positive (i.e. may require a preventative intervention) if greater than a chosen risk threshold, and negative otherwise. NB is then the difference between the proportion of true positives and false positives (weighted by the odds of the chosen risk threshold), divided by the sample size [44]. Positive NB values suggest the model is beneficial compared to treating none (which has no benefit to the team but with no negative cost and efficiency implications). The maximum possible NB value is the proportion with the outcome in the dataset.
The model’s NB was also compared to the NB of delivering an intervention to all individuals (a treat-all strategy, offering maximum benefit to the team, but with maximum negative cost and efficiency implications) [17]. A model has potential clinical value if it demonstrates higher NB than the default strategies over the range of risk thresholds which could be considered as high risk in practice [45].
3.6 Internal validation and adjustment for overfitting
To examine overfitting, the model was internally validated using 200 bootstrap samples, drawn from the original dataset. In each sample, the complete model-building procedure (including multiple imputation, backward variable selection and performance assessment) was conducted as described earlier. The difference in apparent performance (of a bootstrap model in its bootstrap sample) and test performance (of the bootstrap model in the original dataset) was averaged across all samples. This generated optimism estimates for the calibration slope, CITL and C-index statistics. These were subtracted from the original apparent calibration slope, CITL and C-index statistics to obtain final optimism-adjusted performance estimates.
To produce a final model adjusted for overfitting, the regression coefficients produced in the parsimonious model were multiplied by the optimism-adjusted calibration slope (uniform shrinkage factor) to adjust (shrink) for overfitting [46]. Finally, the CITL (model intercept) was then re-estimated to give the final model, suitable for evaluation in other populations or datasets.
3.7 Complete case and sensitivity analyses
To determine the effect of multiple imputation and player transfer assumptions on model stability, the model development process was repeated: 1) as a complete case analysis; and 2) as sensitivity analyses excluding participant-seasons for participants who were loaned or transferred (performed as both multiple imputation and complete case analyses).
4. RESULTS
4.1 Participants
During the five seasons, 119 participants were included, contributing 317 participant-seasons and 138 IMIs in the primary analyses (Fig. 1). For the sensitivity analyses (excluding loans and transfers), 265 independent participant-seasons with 120 IMIs were included; 47 participants were transferred during a season, which excluded 52 participant-seasons (Fig. 1).
Table 2 shows anthropometric and all prognostic factor characteristics for participants included in the primary analyses. These were similar to those included in the sensitivity analyses (Online Resource 2).
4.2 Missing data and multiple imputation
All I-IMI, age and previous muscle injury data were complete (Table 2). For all other candidates, missing data ranged from 6.31% (for hip internal and external rotation difference) to 13.25% for countermovement jump (CMJ) power (Table 2). The distribution of imputed values approximated observed values (Online Resource 3), confirming their plausibility.
4.3 Model development
Table 3 shows the parameter estimates for the full model and parsimonious model after variable selection (averaged across imputations).
For both models, only age and frequency of previous IMIs had a statistically significant (but modest) association with increased I-IMI risk (p <0.157). No clear evidence for an association was observed for any other candidate factor.
4.4 Model performance assessment and clinical utility
Table 3 shows the apparent performance measures for the full and parsimonious models. Fig. 2 shows a calibration plot for the parsimonious model. These were identical across all imputed datasets because the retained prognostic factors contained no missing values. The parsimonious model had perfect apparent overall CITL and calibration slope by definition, but calibration was more variable around the 45° degree line between the expected risk ranges of 28% to 54%. Discrimination was similarly modest for the full (C-index= 0.670, 95% CI=0.609 to 0.731) and parsimonious models (C-index = 0.641, 95%CI = 0.580-0.703). The overall model fit after variable selection was low (Nagelkerke R2 = 0.089).
Fig. 3 displays the decision-curve analysis. The NB of the parsimonious model was comparable to the treat-all strategy at risk thresholds up to 31%, marginally greater between 32% and 36% and exceeded the NB of either default strategies between 37% and 71%.
4.5 Complete Case and Sensitivity Analyses
The full and parsimonious models were robust to complete case analyses and excluding loans and transfers, with comparable performance estimates (c-index range= 0.632-0.678; Nagelkerke R2 range= 0.102 to 0.130) (Online Resources 4-7). The same prognostic factors were selected in all parsimonious models.
4.6 Internal validation and adjustment for overfitting
Table 3 shows the optimism-adjusted performance statistics for the parsimonious model, with full internal validation results shown in Online Resource 8. After adjustment for optimism, the model’s discrimination performance deteriorated (c-index = 0.580). Furthermore, bootstrapping suggested the model would be severely overfitted in new data (calibration slope = 0.663), so a shrinkage factor of 0.663 was applied to the parsimonious parameter estimates and the model intercept re-estimated to produce our final model (Table 3).
5. DISCUSSION
We have developed and internally validated a multivariable prognostic model to predict individualised I-IMI risk during a season in elite footballers, only using retrospectively collected preseason PHE and injury data. This is the only study that we know of that has developed a prognostic model for this purpose, so the results cannot be compared to previous work.
The performance of the full and parsimonious models was similar, which means that utilising all candidate factors offered very little advantage over using two for making predictions. Indeed, variable selection eliminated 8 candidate prognostic factors that had no clear evidence for an association with I-IMIs. Our findings confirm previous suggestions that PHE tests designed to measure modifiable physical and performance characteristics typically offer poor predictive value [10]. This may be because unless particularly strong associations are observed between a PHE test and injury outcome, the overlap in scores between individuals who sustain a future injury and those who do not results in poor discrimination [10]. Additionally, after measurement at a single timepoint (i.e. pre-season), it is likely that the prognostic value of these modifiable factors may vary over time [47] due to training exposure, environmental adaptations and the occurrence of injuries [48].
The variable selection process resulted in a model which included only age and the frequency of previous IMIs within the last three years, which are simple to measure and routinely available in practice. Our findings were similar to the modest association previously observed between age and hamstring IMIs in elite players [19]. However, while a positive previous hamstring IMI history has a confirmed association with future hamstring IMIs [19], we found that for lower extremity I-IMIs, cumulative IMI frequency was preferred to the time proximity of any previous IMI as a multivariable prognostic factor. Nevertheless, the weak prognostic strength of these factors explains the parsimonious model’s poor discrimination and low potential for clinical utility.
Our study is the first to examine the clinical usefulness (net benefit) of a model to identify players at high risk of IMIs and who may benefit from preventative interventions such as training load management, strength and conditioning or physiotherapy programmes. Our parsimonious model demonstrated no clinical value at risk thresholds of less than 36%, because its NB was comparable to that of providing all players with an intervention. Indeed, the only clinically useful thresholds that would indicate a high-risk player would be 37-71%, where the model’s NB was greater than giving all players an intervention. However, because of the high baseline IMI risk in our population (approximately 44% of participant-seasons affected), the burden of IMIs [1-5] and the minimal costs [10] versus the potential benefits of such preventative interventions in an elite club setting, these thresholds are likely to be too high to be acceptable in practice. Accordingly, it would be inappropriate to allocate or withhold interventions based upon our model’s predictions.
Because of severe overfitting our parsimonious model was optimistic, which means that if used in new players, prediction performance would likely to be worse [38]. Although our model was adjusted (shrunk) to account for overfitting and hence improve its calibration performance in new datasets, given the limitations in performance and clinical value, we cannot recommend that it is validated externally or used in clinical practice.
This study has some limitations. We measured candidate factors at one timepoint each season and assumed that participant-seasons were independent. While statistically complex, future studies may improve predictive performance and external validity by harnessing longitudinal measurements and incorporating between-season correlations.
We also merged all lower extremity I-IMIs rather than using specific muscle group outcomes. Although less clinically meaningful, this was necessary to maximise statistical power. Nevertheless, our limited sample size prohibited examination of complex non-linear associations and permitted a small number of candidates to be considered. A lack of known prognostic factors [19] meant that selection was mainly guided by data quality control processes and clinical reasoning, so it is possible that important factors were not included. Risk prediction improves when multiple factors with strong prognostic value are used [15]. Therefore, future research should aim to identify novel prognostic factors, so that these can be used to develop models with greater potential clinical benefit. This may also allow updating of our model to improve its performance and clinical utility [49].
Until the evidence base improves, and because of sample size limitations, it is likely that any further attempts to create a prognostic model at individual club level would suffer similar issues. Importantly, this means that for any team, the value of using preseason PHE data to make individualised predictions or to select bespoke injury prevention strategies remains to be demonstrated. However, the pooling of individual participant data from several participating clubs may increase sample sizes sufficiently to allow further model development studies [50], where a greater number of candidate factors could be utilised.
6. CONCLUSION
Using PHE and injury data available pre-season, we have developed and internally validated a prognostic model to predict I-IMI risk in players at an elite club, using current methodological best practice. The paucity of known prognostic factors and data requirements for model building severely limited the model’s performance and clinical utility, so it cannot be recommended for external validation or use in practice. Further research should prioritise identifying novel prognostic factors to improve future risk prediction models in this field.
Data Availability
An anonymised summary of the dataset that was analysed during this study may be available from the corresponding author on reasonable request.
CONTRIBUTORS
TH was responsible for the conceptualisation of the project, study design, database construction, data extraction and cleaning, protocol development and protocol writing. TH conducted the data analysis, interpretation and wrote the main manuscript. RR provided statistical guidance and assisted with development of the study design, analysis, and edited manuscript drafts. MC assisted with the study conceptualisation and design, protocol development, clinical interpretation and editing the manuscript drafts. JS provided guidance with the study design, development of the analysis and protocol, supervision and interpretation of the analysis, as well as editing the study manuscripts. All authors read and approved this final manuscript.
COMPLIANCE WITH ETHICAL STANDARDS
Conflict of interest
Tom Hughes and Michael J. Callaghan are employed by Manchester United Football Club. Richard D. Riley, and Jamie C. Sergeant declare that they have no known conflicts of interest.
Funding
The lead researcher (TH) is receiving sponsorship from Manchester United Football Club to complete a postgraduate PhD study programme. This work was also supported by Versus Arthritis: grant number 21755.
Informed consent
Informed consent was not required as data were captured from the mandatory PHE completed through the participants’ employment.
Ethical approval
The data usage was approved by the football club and the Research Ethics Service at the University of Manchester.
DATA AVAILABILITY STATEMENT
An anonymised summary of the dataset that was analysed during this study may be available from the corresponding author on reasonable request.
PATIENT CONSENT FOR PUBLICATION
Not required.
ACKNOWLEDGEMENTS
The authors would like to thank all staff within the Medical and Sports Science Department at Manchester United for their continuing help and support with this manuscript and thank all players for their participation (without whom this study would not be possible). The authors also thank the Centre for Epidemiology Versus Arthritis for their support: Versus Arthritis grant number 21755.
Footnotes
AUTHOR AGREEMENT STATEMENT: We the authors, agree to the sharing of this preprint on SportRχiv