Abstract
Due to patient heterogeneity, the exact mechanisms of paediatric abdominal pain (AP) remain unknown. We sought to address this by identifying paediatric AP phenotypes and developing predictive models to determine their associated factors. In 13,790 children from a large birth cohort, the frequencies of paediatric and maternal demographics and comorbidities were catalogued from general practitioner records. Unsupervised machine learning clustering was used to identify phenotypes of paediatric AP with shared characteristics. Models predicting paediatric AP were then constructed from paediatric and maternal demographics and comorbidities.
In total, 1,274 children experienced AP (9.2%) (average age: 8.4 ± 1.1 years; male/female: 615/659). They clustered into 3 distinct phenotypes: phenotype 1 with an allergic predisposition (n = 137), phenotype 2 with maternal comorbidities (n = 677), and phenotype 3 with minimal other comorbidities (n = 340). The frequency of AP rose with the number of allergic diseases and the number of maternal comorbidities: 17.6% of children with ≥ 3 allergic diseases and 25.6% of children whose mothers had ≥ 3 comorbidities had AP. Among high-risk children meeting both criteria (≥ 3 allergic diseases and ≥ 3 maternal comorbidities), 30.8% had AP. Predictive models showed modest performance (best AUC 0.66), with a child’s ethnicity and paediatric/maternal comorbidities among the most influential predictors. Our findings reveal distinct phenotypes and associated factors of paediatric AP, suggesting targets for future research to elucidate the mechanisms of paediatric AP related to allergic diseases, ethnicity, and maternal comorbidities.
Introduction
Abdominal pain (AP) is one of the most common symptoms among children and adolescents, with prevalence rates across the USA and Europe ranging from 0.3% to 19.0% [1–3]. In primary care, 80% of children presenting with AP are diagnosed with functional or medically unexplained AP, whereas organic causes are considerably less frequent [4,5]. Under the internationally recognized Rome Criteria, functional AP disorders are now classified as disorders of gut-brain interaction (DGBI) [6]. Although rarely life-threatening, AP is often refractory to treatment and associated with psychiatric comorbidities such as anxiety and depressive disorders [7–9], and it substantially impairs health-related quality of life [10]. Early recognition and intervention are therefore warranted.
Organic diseases, early life events such as psychological abuse, allergy status, psychological comorbidity, and parental factors have all been proposed as risk factors for paediatric AP [5,11–16]. These diverse risk factors can also influence one another, adding to the complexity and heterogeneity of the condition. More robust stratification in large patient cohorts is therefore essential to understanding the aetiology of paediatric AP and may yield novel insights into its underlying mechanisms.
A robust approach for identifying subgroups of patients with shared characteristics is data-driven clustering by unsupervised machine learning (ML) [17–20]. Subgroups identified in this way may share an underlying mechanism associated with AP. Furthermore, supervised ML is considered a powerful tool for clinical outcome prediction [21,22] and could aid clinicians in assessing the risk of AP development in early childhood [23]. We hypothesised that an ML algorithm could phenotype children with AP according to common, routinely available characteristics and thus help unravel the complex underlying disease mechanisms. Additionally, we hypothesised that ML algorithms could predict the development of paediatric AP and help identify the factors most strongly associated with it.
In this study, we systematically evaluated factors associated with paediatric AP in a large birth cohort. Using unsupervised ML, we delineated phenotypes of paediatric AP from paediatric and maternal clinical data. In addition, ML models trained to predict the development of paediatric AP revealed factors linked to its frequency, providing a basis for future hypothesis-generating research.
1. Methods
1.1. Participants
Between 2007 and 2011, 12,453 pregnant women (recruited at 26–28 weeks’ gestation) and 13,858 children were registered in the Born in Bradford (BiB) cohort. The detailed demographics of the whole cohort are described elsewhere [24]. For this study, we included children who had linked general practitioner (GP) records and whose comorbidities could be identified using the Systematized Nomenclature of Medicine Clinical Terms (SNOMED-CT). The clinical records of their mothers were also included in the analysis. The details of the study population are illustrated in Figure 1.
1.2. Definition of AP and frequency survey of paediatric and maternal comorbidities
AP was defined as the presence of at least one of the following diagnoses in SNOMED-CT, regardless of its cause or whether it was acute or chronic: AP, abdominal wall pain, or generalized AP. The frequencies of comorbidities, such as gastrointestinal (GI), psychological, and allergic diseases (asthma, eczema, urticaria, or hay fever), as well as diseases causing somatic pain, were also retrieved using SNOMED-CT, since these have been associated with both organic and functional AP [11,13–15,25–27]. Additionally, the comorbidities of the children’s mothers were extracted to identify any associations between maternal comorbidities and paediatric AP. The diseases of the children and their mothers investigated in this study and their SNOMED-CT codes are listed in Supplemental digital content 1. Fathers were not included in our analysis pipeline owing to a high proportion of missing values (more than 74.7%).
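The case definition above amounts to a set-membership test on each child’s coded GP record. The Python sketch below illustrates it under stated assumptions: the concept identifiers are hypothetical placeholders (the real code lists are in Supplemental digital content 1), and each record is assumed to be available as a set of codes.

```python
# HYPOTHETICAL placeholder concept IDs; the study's real SNOMED-CT code lists
# are in Supplemental digital content 1 and are not reproduced here.
AP_CODES = {
    "AP_CONCEPT",                   # abdominal pain
    "ABDOMINAL_WALL_PAIN_CONCEPT",  # abdominal wall pain
    "GENERALIZED_AP_CONCEPT",       # generalized abdominal pain
}

COMORBIDITY_CODES = {
    "allergic": {"ASTHMA_CONCEPT", "ECZEMA_CONCEPT", "URTICARIA_CONCEPT", "HAY_FEVER_CONCEPT"},
    "constipation": {"CONSTIPATION_CONCEPT"},
    "gord": {"GORD_CONCEPT"},
}


def has_ap(record_codes: set) -> bool:
    """A child counts as an AP case if any AP code appears in the GP record."""
    return not AP_CODES.isdisjoint(record_codes)


def comorbidity_flags(record_codes: set) -> dict:
    """One binary (0/1) flag per comorbidity group."""
    return {
        group: int(not codes.isdisjoint(record_codes))
        for group, codes in COMORBIDITY_CODES.items()
    }


# Usage with a made-up record.
record = {"GENERALIZED_AP_CONCEPT", "ECZEMA_CONCEPT"}
is_case = has_ap(record)
flags = comorbidity_flags(record)
```

Because any one of the three codes suffices, the test is "not disjoint" rather than equality, which matches the "at least one of" wording of the definition.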
1.3. Study 1. Identifying factors associated with paediatric AP
In Study 1, we aimed to determine the frequency of AP, elucidate the clinical characteristics of children with AP, and identify its associated factors. First, we statistically compared the demographics (gender, ethnicity, and mode of delivery) and frequency of comorbidities between children with and without AP. Then, we identified factors associated with paediatric AP from paediatric and maternal demographics and comorbidities. In Study 1, all comorbidities listed in the GP records were used to assess the association between AP and comorbidities, regardless of whether they occurred before or after the onset of AP.
1.4. Study 2. Phenotyping paediatric AP using unsupervised ML clustering
To elucidate the mechanisms of paediatric AP, we deemed patient stratification crucial. Study 2 focused on children experiencing AP, using unsupervised ML clustering to delineate deep phenotypes. In Python (version 3.7.12) [28], Uniform Manifold Approximation and Projection (UMAP), a non-linear dimension reduction technique, was used to embed all variables into a three-dimensional latent space for subsequent clustering [29]. Scatterplots were generated from these latent coordinates, and phenotypic AP clusters were identified using Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN), a density-based hierarchical method [30]. Subsequently, demographics and comorbidities of children and their mothers were compared across the identified phenotypes to elucidate their distinct characteristics. Finally, for the comorbidities that differed significantly between phenotypes, we examined how the frequency of AP varied with their quantitative burden (i.e. the number of such comorbidities).
1.5. Study 3. Using supervised ML to predict the development of paediatric AP
To facilitate early intervention through the prediction of AP onset, we developed extreme gradient-boosted tree (XGB) classifiers, a highly effective and well-validated supervised ML algorithm detailed elsewhere [31]. Because children with AP were far fewer than those without, the resulting class imbalance could reduce model performance. In line with common research practice, we therefore randomly selected from the control group the same number of children without AP as there were children with AP (Fig. 1).
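The control undersampling step can be sketched as follows. This is a hypothetical helper that assumes sampling without replacement and a fixed seed for reproducibility. The counts in the usage lines mirror the cohort sizes reported in the Results (1,274 children with AP and 12,516 without), but the IDs are synthetic.

```python
import random


def balanced_sample(case_ids, control_ids, seed=42):
    """Keep all cases and draw an equal number of controls without replacement.

    Returns (id, label) pairs, with label 1 for AP and 0 for control.
    """
    if len(control_ids) < len(case_ids):
        raise ValueError("fewer controls than cases")
    rng = random.Random(seed)
    matched_controls = rng.sample(control_ids, k=len(case_ids))
    return [(cid, 1) for cid in case_ids] + [(cid, 0) for cid in matched_controls]


# Usage with synthetic IDs sized like the cohort.
cases = [f"ap_{i}" for i in range(1274)]
controls = [f"ctrl_{i}" for i in range(12516)]
dataset = balanced_sample(cases, controls)
```

Undersampling discards most controls. This is the trade-off accepted in exchange for a 1:1 class ratio during training.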
To predict AP, model inputs included demographics and comorbidities diagnosed before the onset of AP. Rare comorbidities with a frequency of less than 1% were excluded to reduce the risk of overfitting. The variables used in the predictive models and their frequencies in children with and without AP are summarised in Supplemental digital content 2.
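This paragraph sets two input-preparation rules: keep only comorbidities recorded before AP onset, and drop those with a frequency below 1%. The sketch below shows one way to apply them. Function names are hypothetical. The handling of controls, who have no onset date, is an assumption, because the paragraph does not state which reference date was used for them.

```python
from datetime import date


def pre_onset_flags(first_diagnosis, onset, conditions):
    """Binary flags for conditions first recorded strictly before AP onset.

    first_diagnosis: dict mapping condition -> date first recorded in the GP record.
    onset: date of AP onset, or None for a control child. ASSUMPTION: controls
    have no onset date, so all of their recorded diagnoses are kept.
    """
    return {
        c: int(c in first_diagnosis and (onset is None or first_diagnosis[c] < onset))
        for c in conditions
    }


def drop_rare(rows, min_frequency=0.01):
    """Remove conditions present in fewer than min_frequency of children."""
    n = len(rows)
    kept = [c for c in rows[0] if sum(r[c] for r in rows) / n >= min_frequency]
    return [{c: r[c] for c in kept} for r in rows], kept


# Usage: eczema precedes onset and is kept; GORD follows onset and is not.
conditions = ["eczema", "gord"]
child = pre_onset_flags(
    {"eczema": date(2012, 5, 1), "gord": date(2016, 3, 1)},
    onset=date(2015, 1, 1),
    conditions=conditions,
)
```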
The entire dataset was randomly partitioned into 70% for model training and 30% for out-of-sample testing. Using the training data, we assessed the incremental value of variable categories by constructing several models: Model 1 included all variables, Model 2 used only children’s variables, and Model 3 used only mothers’ variables.
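The sketch below expresses the partition and the variable groupings for Models 1–3. The variable names are hypothetical stand-ins for those in Supplemental digital content 2. The split is a simple unstratified random split, which is one reading of "randomly partitioned".

```python
import random

# HYPOTHETICAL variable names grouped by source; the study's actual variables
# are listed in Supplemental digital content 2.
CHILD_VARS = ["female", "ethnicity_pakistani", "ethnicity_white_british",
              "child_allergic", "child_gord"]
MOTHER_VARS = ["mother_ap", "mother_allergic", "mother_gord", "mother_migraine"]

FEATURE_SETS = {
    "Model 1": CHILD_VARS + MOTHER_VARS,  # all variables
    "Model 2": CHILD_VARS,                # children's variables only
    "Model 3": MOTHER_VARS,               # mothers' variables only
}


def split_70_30(ids, seed=42):
    """Simple random (unstratified) 70/30 train/test split of participant IDs."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_test = round(0.3 * len(shuffled))
    return shuffled[n_test:], shuffled[:n_test]


def select(row, model_name):
    """Keep only the variables used by the named model, in a fixed order."""
    return [row[v] for v in FEATURE_SETS[model_name]]


# Usage on a balanced dataset of 2 x 1,274 children.
train_ids, test_ids = split_70_30(range(2548))
```

Fixing the split once and reusing it for every model keeps the test set identical across Models 1–3, so their AUCs are directly comparable.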
Additionally, based on insights from the clustering analysis, we developed Model 4, focusing on counts of allergic diseases and maternal comorbidities. Further details on the methodology for constructing predictive models are provided in Supplemental digital content 3. We assessed the performance of the predictive models by creating receiver operating characteristic (ROC) curves and computing the area under the curve (AUC).
To evaluate how each variable contributed to the models, we used Shapley Additive exPlanations (SHAP) values. SHAP assigns each variable a contribution to each individual prediction, computed as the variable’s marginal effect on the model output averaged over all possible combinations of the other variables. Aggregating these values across children shows how each variable relates to the predicted outcome, thereby enhancing the interpretability of the model’s predictions [32,33].
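The sketch below makes "averaged over all possible combinations of variables" concrete by computing exact Shapley values: it enumerates every coalition of features for a single prediction. It is a didactic stand-in, not the shap library. Features outside a coalition are simply set to a baseline value, and the toy model is hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values for one prediction f(x).

    Features outside a coalition are set to their baseline value (a simplifying
    assumption; the shap library instead averages over background data or uses
    tree-specific algorithms).
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for k in range(n):  # coalition sizes 0..n-1 drawn from the other features
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi


# Toy risk score with an interaction between features 0 and 2 (illustrative only).
def toy_model(z):
    return 0.2 + 0.5 * z[0] + 0.3 * z[1] + 0.4 * z[0] * z[2]


x = [1, 1, 1]
baseline = [0, 0, 0]
phi = shapley_values(toy_model, x, baseline)
```

Here the interaction term is split equally between features 0 and 2, giving φ ≈ [0.7, 0.3, 0.2]. The values sum to f(x) − f(baseline) = 1.2. Averaging the absolute SHAP values across children is the usual way to obtain a global importance ranking.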
1.6. Statistical analysis
Continuous data were expressed as mean ± standard deviation (SD). Categorical data were expressed as numbers and percentages. Student’s t-test and the chi-squared test were used for continuous and categorical data, respectively. Univariate and multivariate logistic regression analyses were used to calculate odds ratios (OR) and 95% confidence intervals (CI) for each variable to determine variables associated with paediatric AP. Variables significant in the univariate analysis were included in the multivariate analysis. A p-value < 0.05 was considered statistically significant. EZR (Saitama Medical Center, Jichi Medical University, Saitama, Japan) was used for statistical analysis [34].
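For a single binary variable, both the chi-squared test and the univariate odds ratio come from a 2×2 table. The stdlib-only sketch below makes two assumptions. First, it uses a Pearson test without continuity correction; R’s chisq.test, on which EZR builds, applies Yates’ correction to 2×2 tables by default. Second, it uses a Woolf (log-based) 95% CI. For one binary predictor, a univariate logistic regression gives the same OR and Wald CI. The counts are hypothetical.

```python
from math import erfc, exp, log, sqrt


def two_by_two(a, b, c, d):
    """a: exposed with AP, b: exposed without AP, c: unexposed with AP, d: unexposed without AP."""
    n = a + b + c + d
    # Pearson chi-squared without continuity correction (1 degree of freedom).
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = erfc(sqrt(chi2 / 2))  # upper tail of chi-squared with 1 df
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf standard error
    ci = (exp(log(odds_ratio) - 1.96 * se_log_or),
          exp(log(odds_ratio) + 1.96 * se_log_or))
    return chi2, p, odds_ratio, ci


# Usage with hypothetical counts.
chi2, p, or_, ci = two_by_two(60, 40, 40, 60)
```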
2. Results
2.1. Study 1. Identifying factors associated with paediatric AP
A total of 13,790 children were included in the analysis, of whom 1,274 (9.2%) experienced AP (male/female = 615/659) (Table 1). The average age at diagnosis of AP was 5.6 ± 2.7 years. Compared to children without AP, those with AP were more likely to be female (51.7% vs. 48.1%, P = 0.01). Additionally, more children of Pakistani origin were in the AP group (58.6% vs. 45.8%, P < 0.01), while fewer were of white British origin (22.3% vs. 36.4%, P < 0.01).
Children with AP had higher rates of comorbidities, including allergic diseases (47.2% vs. 37.3%, P < 0.01), constipation (4.5% vs. 1.1%, P < 0.01), and gastro-oesophageal reflux disease (GORD) (5.2% vs. 2.7%, P < 0.01). Moreover, mothers of children with AP had higher frequencies of AP (49.5% vs. 31.8%, P < 0.01), allergic diseases (47.8% vs. 39.4%, P < 0.01), arthritis (8.6% vs. 4.0%, P < 0.01), depressive and/or bipolar disorder (24.3% vs. 21.0%, P = 0.01), GORD (13.4% vs. 7.0%, P < 0.01), irritable bowel syndrome (IBS) (9.9% vs. 7.0%, P < 0.01), and migraine (22.4% vs. 14.7%, P < 0.01).
Univariate and multivariate logistic regression analyses identified several significant variables associated with paediatric AP, summarized in Table 2 (non-significant results are summarized in Supplemental digital content 4). In the multivariate analysis, significant associations with paediatric AP included female gender (OR, 1.17; 95% CI, 1.04–1.33; P = 0.01), Pakistani origin (OR, 1.57; 95% CI, 1.38–1.78; P < 0.01), allergic diseases (OR, 1.20; 95% CI, 1.06–1.36; P < 0.01), migraine (OR, 2.92; 95% CI, 1.61–5.30; P < 0.01), and Ehlers-Danlos syndrome (EDS) (OR, 4.09; 95% CI, 1.17–14.30; P = 0.03), as well as various GI diseases. Among maternal comorbidities, allergic diseases (OR, 1.16; 95% CI, 1.03–1.32; P = 0.02), GORD (OR, 1.47; 95% CI, 1.21–1.77; P < 0.01), and migraine (OR, 1.23; 95% CI, 1.05–1.43; P = 0.01) were significantly associated with paediatric AP, suggesting a possible adverse influence of maternal comorbidities on paediatric AP.
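Assuming the usual Wald-type intervals, each multivariate OR and its 95% CI follows from the fitted logistic coefficients as shown below, where p = P(AP = 1 | x). Back-calculating from the Pakistani-origin estimate reproduces the reported interval.

```latex
\log\frac{p}{1-p} = \beta_0 + \sum_{j=1}^{k}\beta_j x_j,
\qquad
\mathrm{OR}_j = e^{\beta_j},
\qquad
95\%\ \mathrm{CI}_j = \left[e^{\beta_j - 1.96\,\mathrm{SE}_j},\; e^{\beta_j + 1.96\,\mathrm{SE}_j}\right]

% Worked check for Pakistani origin (OR 1.57; 95% CI 1.38--1.78):
\beta = \ln 1.57 \approx 0.451,
\qquad
\mathrm{SE} \approx \frac{\ln 1.78 - \ln 1.38}{2 \times 1.96} \approx 0.065,
\qquad
e^{0.451 \pm 1.96 \times 0.065} \approx (1.38,\ 1.78)
```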
2.2. Study 2. Phenotyping children using unsupervised ML clustering
In Study 1, multiple factors were significantly associated with paediatric AP. However, because the AP group was heterogeneous, it was unclear which factors were relevant to which cases. To address this, we used unsupervised ML clustering to phenotype the children with AP and identify specific subgroups. The unsupervised model classified children with AP into three distinct phenotypes: 137 children in Phenotype 1 (10.8%), 677 children in Phenotype 2 (53.1%), and 340 children in Phenotype 3 (26.7%) (Fig. 2). The remaining 120 children (9.4%) had miscellaneous characteristics and were sparsely distributed on the plot, so the model could not assign them to any phenotype; these children were excluded from downstream analyses.
Subsequently, to investigate the characteristics of the three phenotypes identified by ML clustering, we compared their clinical characteristics (Table 3). Age at diagnosis of AP, gender, mode of delivery, and ethnicity showed almost no significant differences among the three phenotypes. The frequency of GI disorders was low in all phenotypes and showed few significant differences between them. The other clinical characteristics of each phenotype are summarised below.
Phenotype 1 (AP with allergic predisposition)
Almost all children in Phenotype 1 had allergic diseases (99.3%), as did all of their mothers (100%); both frequencies were significantly higher than in Phenotypes 2 and 3 (P < 0.01). By contrast, maternal AP was not observed in this phenotype. This phenotype was therefore characterized as ‘AP with allergic predisposition’, suggesting that allergic mechanisms may be relevant to AP development in these children.
Phenotype 2 (AP with mothers’ comorbidities)
In this phenotype, the frequency of allergic diseases in the children was relatively high (68.7%), but other paediatric comorbidities were uncommon. In contrast, maternal comorbidities were frequent: AP (70.0%), allergic diseases (40.0%), depressive and/or bipolar disorder (37.2%), GORD (18.9%), and migraine (34.0%). Except for allergic diseases, which were more frequent in Phenotype 1, these were the highest frequencies among the three phenotypes. We termed Phenotype 2 ‘AP with mothers’ comorbidities’, suggesting that the mothers’ illnesses, rather than the children’s own comorbidities, may play a role in the development of paediatric AP.
Phenotype 3 (AP with the least comorbidities)
Maternal AP was present in 46.2% of children in Phenotype 3, the second-highest frequency among the three phenotypes. However, other comorbidities, including the children’s allergic diseases and other maternal comorbidities, were uncommon. This phenotype was termed ‘AP with the least comorbidities’, suggesting mechanisms of AP onset that could not be explained by the paediatric or maternal comorbidities investigated in this study.
Impact of allergic diseases and maternal comorbidities on paediatric AP
Since ML phenotyping suggested that some children’s AP was associated with paediatric allergies and others’ with maternal comorbidities, we investigated how the burden of allergic diseases and maternal comorbidities affected the frequency of paediatric AP. As the number of paediatric allergic diseases increased, the frequency of AP increased accordingly (Fig. 3A): 17.6% of children with ≥ 3 allergic diseases experienced AP, significantly more than those with 0–2 allergic diseases (P < 0.01). Similarly, the frequency of AP increased with the number of maternal comorbidities (Fig. 3B): when mothers had ≥ 3 comorbidities, the frequency of paediatric AP was 25.6%, significantly higher than with 0–2 comorbidities (P < 0.01). Furthermore, high-risk children (those with both ≥ 3 allergic diseases and ≥ 3 maternal comorbidities) had a significantly higher frequency of AP than all other children (controls) (30.8% vs. 9.1%, P < 0.01) (Fig. 3C). Venn diagrams of children with ≥ 3 allergic diseases and/or ≥ 3 maternal comorbidities in each phenotype showed that having ≥ 3 maternal comorbidities was significantly more frequent in Phenotype 2 than in the other phenotypes (P < 0.01) (Fig. 3D–3F).
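The burden variables behind Fig. 3 can be derived and tabulated as in the sketch below. These are the counts of allergic diseases and maternal comorbidities, plus the combined high-risk flag. The indicator names are hypothetical, and the maternal list is illustrative rather than the study’s full list. The four-child dataset exists only to show the mechanics. Counts like these are also the kind of input Model 4 uses alongside ethnicity and gender.

```python
from collections import defaultdict

# HYPOTHETICAL indicator names; the study's full lists are in Supplemental digital content 1.
ALLERGIC = ["asthma", "eczema", "urticaria", "hay_fever"]
MATERNAL = ["mother_ap", "mother_allergic", "mother_depression_bipolar",
            "mother_gord", "mother_ibs", "mother_migraine"]


def burden(child):
    """Return (number of allergic diseases, number of maternal comorbidities, high-risk flag)."""
    n_allergic = sum(child[d] for d in ALLERGIC)
    n_maternal = sum(child[d] for d in MATERNAL)
    high_risk = n_allergic >= 3 and n_maternal >= 3
    return n_allergic, n_maternal, high_risk


def ap_frequency_by(children, key):
    """Proportion of children with AP within each group defined by key(child)."""
    tally = defaultdict(lambda: [0, 0])  # group -> [children with AP, all children]
    for child in children:
        group = key(child)
        tally[group][0] += child["ap"]
        tally[group][1] += 1
    return {g: n_ap / n_all for g, (n_ap, n_all) in sorted(tally.items())}


def make_child(ap, present=()):
    """Build a record with every indicator set to 0 except those listed in present."""
    child = {name: 0 for name in ALLERGIC + MATERNAL}
    child.update({name: 1 for name in present})
    child["ap"] = ap
    return child


children = [
    make_child(1, ["asthma", "eczema", "urticaria",
                   "mother_ap", "mother_gord", "mother_migraine"]),
    make_child(0),
    make_child(1, ["eczema"]),
    make_child(1),
]
# Group allergy counts as 0, 1, 2 and "3 or more" (coded as 3).
by_allergy = ap_frequency_by(children, key=lambda c: min(burden(c)[0], 3))
by_high_risk = ap_frequency_by(children, key=lambda c: burden(c)[2])
```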
2.3. Study 3. Development of ML predictive models for paediatric AP
Finally, we constructed predictive models for the development of paediatric AP. Model 1, which used all the paediatric and maternal clinical and demographic data (detailed in Supplemental digital content 2), achieved an AUC of 0.63 (95% CI, 0.59–0.67) on the test dataset (Fig. 4A). Model 2, which included only the paediatric clinical data, performed almost identically, with an AUC of 0.64 (95% CI, 0.60–0.68) on the test dataset. Both models showed a modest ability to distinguish between children with and without AP. According to SHAP feature importance, ethnicity and gender were more influential in Models 1 and 2 than paediatric or maternal comorbidities (Figs. 4B–4E). Specifically, the models predicted that White British children were less likely to develop AP, whereas Pakistani children and girls were more likely to do so. Model 3, which used only maternal clinical data, performed worse, with an AUC of 0.54 (95% CI, 0.50–0.58) on the test dataset. In Models 1 and 3, maternal AP was the most influential maternal comorbidity, with children of mothers with AP predicted to be more likely to develop AP (Figs. 4F–4G).
Based on the results of Study 2, we further developed Model 4, which focused on the number of allergic diseases and maternal comorbidities (Figs. 4H–4I). Despite being a simpler model that used only variables such as ethnicity, gender, the number of allergic conditions, and the number of maternal comorbidities, Model 4 achieved an AUC of 0.66 (95% CI, 0.62–0.70). This was numerically the highest of all models, although its confidence interval overlapped with those of Models 1 and 2.
3. Discussion
3.1. Summary of findings
The frequency of paediatric AP in our study cohort of 13,790 children from the BiB birth cohort was 9.2%. Univariate and multivariate logistic regression analyses showed that several paediatric and maternal factors, such as paediatric allergic diseases and maternal AP, in addition to paediatric GI disorders, were associated with paediatric AP. ML-based clustering identified three paediatric AP phenotypes, indicating distinct subgroups among children with AP. The phenotype characteristics suggest that allergic diseases and maternal comorbidities are relevant to the development of paediatric AP in two of these phenotypes. Predictive models based on information from GP records performed moderately. The most influential predictors of paediatric AP were ethnicity, maternal comorbidities (especially maternal AP), paediatric gender, and paediatric allergic predisposition.
Associated factors for paediatric AP
Our study showed that maternal comorbidities, such as GORD and migraine, were significantly associated with paediatric AP. This aligns with previous research showing that parental factors are related to functional AP in children [11,14,15]. One of those studies showed that children of mothers with IBS had more GI symptoms, which was explained by parents’ solicitous behavioural responses to their children’s symptoms [15]. Moreover, IBS and migraine are more common among mothers of children with functional AP [35]. Together, these findings support an association between maternal comorbidities and paediatric AP.
Our study also revealed that paediatric and maternal allergic predispositions were important factors for paediatric AP. Some studies have reported that allergic diseases are significant risk factors for functional GI disorders such as IBS in children [12,26,36]. Furthermore, a recent study showed that injecting food antigens into the rectosigmoid colon of patients with IBS induces local oedema and mast cell activation, proposing that IBS may be a food-induced disorder mediated by mast cell activation localized to the intestine [37]. This suggests that GI neuro-immune reactions may contribute to AP in children with allergic predispositions.
3.2. Cluster analysis identifies 3 distinct phenotypes
The ML clustering revealed 3 distinct phenotypes of paediatric AP. In all phenotypes, the frequency of paediatric GI diseases was very low, suggesting that paediatric AP cannot simply be explained by GI diseases. Since Phenotype 1 was characterized by allergic diseases, the aforementioned allergic mechanism could account for AP in this phenotype [37]. In Phenotype 2, which was characterized by maternal comorbidities, maternal illness behaviour or solicitous responses to the child’s symptoms are potential mechanisms of paediatric AP [15,38]. Our results highlight areas for future research and may inform tailored therapeutic approaches, based on the identified phenotypes and risk factors, that could enhance the management of paediatric AP.
In Phenotype 3, maternal AP was relatively frequent (46.2%), so some AP in these children may be explained by maternal comorbidities. However, since maternal comorbidities were much less frequent than in Phenotype 2, this factor is unlikely to account for all cases. Children in Phenotype 3 may be characterized by other variables, and the information available from GP records may have been insufficient to phenotype them in depth. Furthermore, 9.4% of children could not be assigned to any phenotype in this study. With more detailed data, these children might be identified as a distinct phenotype with specific characteristics.
3.3. Machine prediction of paediatric AP
The predictive models’ performance was moderate. The best-performing model (Model 4) combined the child’s own factors with maternal comorbidity counts. This suggests that paediatric AP may arise from an interplay between the child’s innate factors and familial factors, whether environmental or genetic.
In the models that used paediatric clinical characteristics (Models 1, 2, and 4), ethnicity emerged as an important contributor, consistent with its significance in the logistic regression analysis. The BiB cohort is unique in that it consists mainly of children of White British (29.3%) and Pakistani (39.5%) ethnicity [24].
Socioeconomic status differed significantly across ethnic groups in the BiB cohort [39], and lower socioeconomic status is a risk factor for pain conditions in children, including AP [40]. Variation in socioeconomic status within the cohort might therefore partly explain why ethnicity was an important factor in both the logistic regression analysis and the predictive modelling.
The model focusing on the number of allergic diseases and the number of maternal comorbidities (Model 4) performed comparably to the other models despite its simplicity. As discussed in the previous section, this suggests that these factors are particularly important in the development of paediatric AP. Interestingly, although maternal AP was not significant in the multivariable logistic regression, it was the most important maternal comorbidity in the predictive models. This possibly reflects nonlinear relationships or interactions that the logistic regression model could not capture.
3.4. Limitations
This study has several limitations. First, diagnoses were based on GP records; therefore, the criteria used for each diagnosis and the exact cause of AP in each case are unknown and beyond our control. However, most children with AP are diagnosed with functional AP in primary care [5], and organic diseases related to AP were rare in this cohort, so functional AP is likely the predominant cause. Second, although we evaluated our predictive models on an out-of-sample test partition, we did not validate them externally in an independent cohort of children, which would be needed to establish their generalisability. Finally, the study relied on GP records. Incorporating more detailed data related to AP, such as socioeconomic status, lifestyle, autonomic nervous system function, and gut microbiota composition, may enable more detailed phenotyping and more accurate prediction of paediatric AP [41–43].
3.5. Conclusions
Our study identified paediatric and maternal comorbidities as significant associated factors for paediatric AP. Using data-driven clustering techniques, we uncovered three distinct phenotypes of paediatric AP. These findings provide a foundation for future research aimed at elucidating the precise mechanisms underlying paediatric AP. The predictive models developed in this study highlight the potential for early identification and intervention. Further studies incorporating more detailed socioeconomic, lifestyle, and biological data could refine the phenotyping and these models, leading to more accurate predictions and paving the way for optimised, patient-personalised treatment strategies.
Funding
This study was supported by JA Niigata Kouseiren Grant (Niigata University School of Medicine). JKR was supported by the Medical Research Council.
Data Availability
All data produced in the present study are available upon reasonable request to the authors.
Footnotes
H. Satti and Q. Aziz are joint senior authors.
Due to issues with the genetic analysis method, we have excluded the results of the genetic analysis in this version.