Key Points

Detection of drug safety signals in children has received only limited research attention. Children are a heterogeneous population in which age may be a confounder or an effect modifier.

The signal detection algorithms (SDAs) showed good performance on pediatric data and can be used for pediatric signal detection.

Age adjustment did not improve the performance of the SDAs.

Age stratification showed that some signals may be detected only in specific pediatric age groups. For routine surveillance, checking for effect modification across age strata may generate useful information.

1 Introduction

Spontaneous reports of suspected adverse drug reactions (ADRs) can yield important information on the safety of drugs [1]. Such reports are usually screened for emerging safety issues using statistical methods called signal detection algorithms (SDAs). Current SDAs compare the observed number of reports of a drug–event combination (DEC) of interest with an expected count. The expected count is calculated from the overall reporting rate of that reaction in the entire database [1, 2]. SDAs are routinely applied to reports from the general population, but their performance in the pediatric population has not yet been investigated. Drug use patterns and ADR occurrence may differ between pediatric patients and adults [3–5]. Pediatric patients form a heterogeneous group at various stages of development, with age-dependent organ maturation and hormonal changes [6]. Several studies of ADR reporting in children have found reporting patterns that differ from those in adults [3, 5, 7, 8].

Since ADRs may be age specific, adjusting for age seems a logical step when investigating pediatric ADRs, and some researchers have advocated it [4]. The main aim of stratification is to detect confounding and effect modification, either of which may otherwise mask true signals [9]. Confounding by age can be handled by stratifying by age category and pooling the stratum-specific estimates. However, if the age-specific estimates differ (i.e., in the presence of effect modification), they should not be pooled; each stratum should instead be examined individually. Some researchers have investigated stratification [10]. Adjustment is routinely implemented in some Bayesian SDAs but not in frequentist ones [11–13]. Few studies have systematically addressed the impact of age stratification or adjustment, and their results are contradictory [9, 14, 15].
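The following minimal Python sketch makes the observed-versus-expected comparison concrete. It assumes the drug and the event are reported independently across the whole database. Under that assumption, it computes the expected count of a DEC and the resulting observed-to-expected ratio, which is the quantity that Bayesian methods such as EBGM shrink. The function names and example counts are hypothetical.

```python
def expected_count(n_drug: int, n_event: int, n_total: int) -> float:
    """Expected number of reports for a drug-event combination (DEC),
    assuming drug and event are reported independently in the database."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_drug * n_event / n_total


def observed_to_expected(n_dec: int, n_drug: int, n_event: int, n_total: int) -> float:
    """Ratio of observed DEC reports to the independence-based expected count."""
    expected = expected_count(n_drug, n_event, n_total)
    if expected == 0:
        raise ValueError("expected count is zero: drug or event never reported")
    return n_dec / expected


# Hypothetical counts: 10,000 reports in total, 200 mention the drug,
# 500 mention the event, and 30 mention both.
ratio = observed_to_expected(n_dec=30, n_drug=200, n_event=500, n_total=10_000)
print(f"observed/expected = {ratio:.1f}")
```

A ratio well above 1 means the DEC is reported more often than independence would predict. SDAs differ mainly in how they define the expected count and how they handle the instability of the ratio when counts are small.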

Within the context of the Global Research in Pediatrics (GRiP) Network of Excellence [16], we aimed to evaluate the performance of two well-established SDAs in the pediatric population and determine if age stratification or adjustment impacts signal detection in this population.

2 Methods

2.1 Data Source

Data were retrieved from the publicly available version of the US FDA Adverse Event Reporting System (FAERS), which comprises spontaneous reports of suspected ADRs submitted by manufacturers, healthcare professionals, and patients. FAERS is one of the largest repositories of spontaneous reports in the world [17, 18]. In this study, we analyzed reports received from the first quarter of 2004 through to the third quarter of 2012.

For performance analysis, only reports of ADRs occurring in children and adolescents (<18 years of age) were retained. The ADRs in FAERS are coded according to the Medical Dictionary for Regulatory Activities (MedDRA®) [19].

To improve the quality of the dataset, we excluded reports with missing age, the main variable in our study. We also removed reports with a reported age of zero and a MedDRA® preferred term indicating prenatal exposure, because these imply in utero drug exposure and were therefore not relevant to our study. We minimized the number of duplicates (i.e., the same case submitted by different reporters) by applying an algorithm based on the case identifier, report identifier, and drug and event names. For follow-up reports (i.e., the same case reported again later with additional or updated information) [20], only the most recent version was retained for analysis.
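The follow-up rule above can be sketched as follows. This is a simplified illustration, not the study's deduplication algorithm, which also used the report identifier and drug and event names. The `Report` fields `case_id` and `version` are hypothetical stand-ins for the case identifier and the follow-up sequence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    case_id: str   # hypothetical: identifier shared by all versions of a case
    version: int   # hypothetical: follow-up number, higher means more recent
    drugs: tuple   # suspect drug names as reported
    events: tuple  # MedDRA preferred terms


def keep_latest_versions(reports):
    """Collapse follow-ups and exact resubmissions to one report per case,
    keeping the most recent version."""
    latest = {}
    for report in reports:
        kept = latest.get(report.case_id)
        if kept is None or report.version > kept.version:
            latest[report.case_id] = report
    return sorted(latest.values(), key=lambda r: r.case_id)


reports = [
    Report("A", 1, ("ibuprofen",), ("rash",)),
    Report("A", 2, ("ibuprofen",), ("rash", "pyrexia")),  # follow-up of case A
    Report("A", 2, ("ibuprofen",), ("rash", "pyrexia")),  # exact duplicate
    Report("B", 1, ("amoxicillin",), ("urticaria",)),
]
for r in keep_latest_versions(reports):
    print(r.case_id, r.version, r.events)
```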

Because drug names in FAERS are not standardized, a harmonization procedure was implemented. Briefly, superfluous characters were removed, and a generalized edit distance matching algorithm [21] was applied to map free-text drug names to synonyms and then to the corresponding active substance and World Health Organization–Anatomical Therapeutic Chemical (WHO-ATC) code.
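The harmonization step can be sketched as follows. The study used a generalized edit distance [21]; this sketch substitutes the plain Levenshtein distance with unit costs. The synonym dictionary, the dose-stripping pattern, and the `max_distance` cut-off are hypothetical illustrations.

```python
import re


def normalize(raw: str) -> str:
    """Lower-case, strip dose strings such as '200mg', and drop punctuation."""
    name = raw.lower()
    name = re.sub(r"\d+(?:\.\d+)?\s*(?:mg|mcg|ml|g)\b", " ", name)
    name = re.sub(r"[^a-z ]", " ", name)
    return " ".join(name.split())


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions, substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


# Hypothetical synonym dictionary: synonym -> (active substance, WHO-ATC code)
SYNONYMS = {
    "ibuprofen": ("ibuprofen", "M01AE01"),
    "advil": ("ibuprofen", "M01AE01"),
    "isoniazid": ("isoniazid", "J04AC01"),
    "clarithromycin": ("clarithromycin", "J01FA09"),
}


def map_to_substance(raw: str, synonyms=SYNONYMS, max_distance: int = 2):
    """Return (substance, ATC) for the closest synonym, or None if too distant."""
    cleaned = normalize(raw)
    best = min(synonyms, key=lambda s: levenshtein(cleaned, s))
    return synonyms[best] if levenshtein(cleaned, best) <= max_distance else None


print(map_to_substance("IBUPROFIN 200MG."))
print(map_to_substance("Advil"))
```

The distance cut-off trades recall against false matches. With very short names, even a small cut-off can map unrelated drugs onto each other, so a real pipeline would likely need manual review of borderline matches.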

In this study, only those drugs reported as the primary or secondary suspect in the FAERS database were retained for analysis. Analysis was performed at DEC level, meaning that within each report, every suspect drug was combined with all reported ADRs. Thus, one report may comprise more than one DEC.
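The construction of DECs from a single report can be sketched as below. This sketch assumes each drug arrives with a FAERS-style role code: PS for primary suspect, SS for secondary suspect, C for concomitant and I for interacting. The function name and input layout are hypothetical.

```python
from itertools import product

SUSPECT_ROLES = {"PS", "SS"}  # primary and secondary suspect


def build_decs(report_id, drugs, events):
    """Pair every suspect drug in one report with every reported event.

    drugs: iterable of (drug_name, role_code) tuples; drugs whose role is
    not PS or SS (e.g. concomitant "C" or interacting "I") are skipped.
    """
    suspects = sorted({name for name, role in drugs if role in SUSPECT_ROLES})
    return [(report_id, drug, event)
            for drug, event in product(suspects, sorted(set(events)))]


decs = build_decs(
    "r1",
    drugs=[("ibuprofen", "PS"), ("amoxicillin", "SS"), ("paracetamol", "C")],
    events=["rash", "pyrexia"],
)
for dec in decs:
    print(dec)
```

Two suspect drugs and two events yield four DECs from one report. This is why the number of DECs analyzed is much larger than the number of reports.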

2.2 Signal Detection Algorithms (SDAs)

We tested two well-established SDAs that are routinely used for signal detection by national and international regulatory and research institutions: the proportional reporting ratio (PRR) [2] and the empirical Bayes geometric mean (EBGM) [13] (see Table 1). We also tested the simple count of reports as a positive control. To define a signal of disproportionate reporting [22, 23], we selected thresholds currently applied in routine practice. The SDAs were applied at the end of the study period, when the maximum number of reports had accrued.
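For orientation, the PRR and its confidence interval can be computed from a 2×2 table of reports, as in the sketch below. It uses the usual log-normal approximation for the interval. The signalling rule `is_prr_signal` and its `min_reports` and `min_lower` parameters are hypothetical placeholders, not the thresholds listed in Table 1. The EBGM is not shown because it requires fitting a prior to the whole database.

```python
import math


def prr_with_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """PRR with an approximate 95 % confidence interval from a 2x2 table.

    a: reports with the drug and the event   b: drug, other events
    c: other drugs, the event                d: other drugs, other events
    """
    if a == 0 or c == 0:
        raise ValueError("PRR and its CI are undefined when a or c is zero")
    prr = (a / (a + b)) / (c / (c + d))
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))  # SE of ln(PRR)
    lower = math.exp(math.log(prr) - z * se)
    upper = math.exp(math.log(prr) + z * se)
    return prr, lower, upper


def is_prr_signal(a, b, c, d, min_reports=3, min_lower=1.0):
    """Hypothetical rule: enough reports and a lower CI bound above min_lower."""
    _, lower, _ = prr_with_ci(a, b, c, d)
    return a >= min_reports and lower > min_lower


prr, lo, hi = prr_with_ci(a=20, b=180, c=100, d=9700)
print(f"PRR = {prr:.2f} (95% CI {lo:.2f}-{hi:.2f}); "
      f"signal: {is_prr_signal(20, 180, 100, 9700)}")
```

Signalling on the lower bound rather than the point estimate makes the rule stricter for DECs with few reports, because their intervals are wide.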

Table 1 Signal detection algorithms and corresponding thresholds applied

2.3 Performance Assessment Measures

The performance of the SDAs was assessed using diagnostic test statistics: sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) [24, 25]. Sensitivity is the ability of a method to correctly identify true signals, while specificity is its ability to correctly exclude false signals. PPV and NPV are posterior probabilities. PPV is the proportion of DECs classified as positive (signals) that are correctly classified, and NPV is the corresponding proportion for DECs classified as negative (non-signals) [24, 25].
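These four measures follow directly from cross-tabulating SDA output against the reference set, as in the minimal sketch below. The five-DEC reference set and the SDA flags are invented for illustration, and the function names are hypothetical.

```python
def _ratio(num: int, den: int) -> float:
    """Proportion, or NaN when the denominator is zero (e.g. no DEC highlighted)."""
    return num / den if den else float("nan")


def confusion_counts(flags, truth):
    """Tally SDA output against reference-set labels.

    flags: dict mapping DEC -> True if the SDA highlighted it as a signal
    truth: dict mapping DEC -> True for a positive control, False for a negative one
    Returns (tp, fp, tn, fn).
    """
    tp = fp = tn = fn = 0
    for dec, is_positive in truth.items():
        flagged = flags[dec]
        if flagged and is_positive:
            tp += 1
        elif flagged:
            fp += 1
        elif is_positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def diagnostic_stats(tp, fp, tn, fn):
    return {
        "sensitivity": _ratio(tp, tp + fn),  # true signals correctly highlighted
        "specificity": _ratio(tn, tn + fp),  # false signals correctly not highlighted
        "ppv": _ratio(tp, tp + fp),          # highlighted DECs that are true signals
        "npv": _ratio(tn, tn + fn),          # non-highlighted DECs that are truly negative
    }


# Hypothetical five-DEC reference set and SDA output
truth = {"d1": True, "d2": True, "d3": False, "d4": False, "d5": False}
flags = {"d1": True, "d2": False, "d3": True, "d4": False, "d5": False}
counts = confusion_counts(flags, truth)
print(counts, diagnostic_stats(*counts))
```

Unlike sensitivity and specificity, PPV and NPV also depend on the ratio of positive to negative DECs in the reference set (37:90 in this study).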

Because diagnostic test statistics depend on the choice of threshold, comparing them individually has only limited, albeit practical, value. We therefore also estimated the area under the receiver operating characteristic (ROC) curve (AUC) to compare the performance of the SDAs [26]. The AUC summarizes sensitivity and specificity across all possible thresholds of a given SDA. AUCs were calculated by varying only the point estimate of each SDA and did not take into account the other components of the SDA.
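The area under the empirical ROC curve equals the probability that a randomly chosen positive DEC has a higher SDA point estimate than a randomly chosen negative DEC, with ties counted as one half. This is the Mann-Whitney formulation, so the AUC can be computed directly from the point estimates without choosing a threshold. The sketch below shows this. The scores are hypothetical, and it assumes every reference DEC has a defined point estimate.

```python
def roc_auc(scores_pos, scores_neg):
    """Empirical ROC AUC as P(score_pos > score_neg), counting ties as 0.5."""
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative DEC")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical PRR point estimates for positive and negative reference DECs
positive_scores = [3.1, 2.0, 0.8]
negative_scores = [1.2, 2.0, 0.5, 0.9]
print(roc_auc(positive_scores, negative_scores))
```

An AUC of 0.5 corresponds to chance-level ranking, and an AUC of 1.0 means every positive DEC outranks every negative DEC.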

For the purpose of performance evaluation, a previously constructed pediatric-specific GRiP reference set of positive and negative DECs was used. It consists of 37 positive and 90 negative DECs and includes drugs that are administered to children and events that are regarded as important for this population. The positive DECs are those that were confirmed to occur based on evidence from Summary of Product Characteristics (SmPC) and the published literature, while the negative DECs are those that could not be confirmed at the time of literature review by either the SmPC or the published literature. For a full description of the reference set, see Osokogu et al. [27].

2.4 Stratification and Adjustment for Age

We investigated the impact of age stratification and adjustment on the performance of the SDAs. First, we checked for possible effect modification across age strata. To do so, we stratified the data into the age categories defined by the International Conference on Harmonization (ICH) [28] and calculated stratum-specific measures for each SDA. Second, we calculated age-adjusted estimates for PRR and EBGM by combining the stratum-specific estimates into an overall measure [29]. The performance of each SDA was then reassessed after adjustment.
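Assigning reports to ICH age strata can be sketched as below. The cut-offs follow the ICH E11 categories: term newborn infants 0 to 27 days, infants and toddlers 28 days to 23 months, children 2 to 11 years, and adolescents from 12 years. ICH E11 sets the adolescent upper limit at 16 to 18 years depending on region; this sketch uses <18 years, matching this study's inclusion criterion. It also assumes age has already been converted to days (365.25 per year). The function names are hypothetical.

```python
DAYS_PER_YEAR = 365.25


def ich_age_group(age_days: float) -> str:
    """Map age in days to ICH E11-style pediatric strata (adolescents up to <18 years)."""
    if age_days < 0:
        raise ValueError("age cannot be negative")
    if age_days <= 27:
        return "neonate"      # term newborn infants: 0-27 days
    if age_days < 2 * DAYS_PER_YEAR:
        return "infant"       # infants and toddlers: 28 days to 23 months
    if age_days < 12 * DAYS_PER_YEAR:
        return "child"        # children: 2-11 years
    if age_days < 18 * DAYS_PER_YEAR:
        return "adolescent"   # adolescents: 12 to <18 years
    return "adult"


def stratify(records):
    """Group (drug, event, age_days) records by ICH age stratum."""
    strata = {}
    for drug, event, age_days in records:
        strata.setdefault(ich_age_group(age_days), []).append((drug, event))
    return strata


print(stratify([("ibuprofen", "rash", 100), ("ibuprofen", "rash", 5000)]))
```

Stratum-specific SDA values are then computed from 2×2 tables built separately within each stratum.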

2.5 Statistical Analysis

Paired chi-squared tests were used to test differences in performance (AUC) between the crude and age-adjusted versions of each SDA, and between each crude SDA and the count of reports (positive control). Stratum-specific contingency tables were tested for homogeneity using the Breslow-Day test with Tarone's correction [30]. The Mantel-Haenszel approach was used for pooling and calculating age-adjusted estimates [29]. The age-adjusted lower bound of the EBGM (EBGM05) was obtained by first calculating the stratum-specific lower bound (EB05, the lower one-sided 95 % bound, i.e., the 5th percentile of the posterior distribution). Their Mantel-Haenszel average was then computed, following Zeinoun et al. [31]. Statistical significance was defined as p < 0.05.
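The PRR is a ratio of two reporting proportions, so age-stratified 2×2 tables can be pooled with the Mantel-Haenszel risk-ratio estimator, as sketched below. The study's exact implementation [29] may differ in detail, and the EBGM05 averaging after Zeinoun et al. [31] is not reproduced. The function names and stratum counts are hypothetical.

```python
import math


def crude_prr(a, b, c, d):
    """PRR of a single 2x2 table."""
    return (a / (a + b)) / (c / (c + d))


def mantel_haenszel_prr(strata):
    """Mantel-Haenszel pooled risk ratio across age strata.

    strata: iterable of (a, b, c, d) tables, one per stratum, with
    a = drug & event, b = drug & other events,
    c = other drugs & event, d = other drugs & other events.
    """
    numerator = denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum contributes nothing
        numerator += a * (c + d) / n
        denominator += c * (a + b) / n
    return numerator / denominator if denominator > 0 else math.nan


# Hypothetical strata: a small stratum with a strong signal and a large one without
infants = (5, 45, 20, 1930)
adolescents = (10, 990, 900, 98100)
print("infants PRR:", round(crude_prr(*infants), 2))
print("adolescents PRR:", round(crude_prr(*adolescents), 2))
print("MH-pooled PRR:", round(mantel_haenszel_prr([infants, adolescents]), 2))
```

In this toy example, the larger stratum dominates the weights, and the pooled estimate lies much closer to its PRR than to the small stratum's. This illustrates how Mantel-Haenszel pooling can mask a signal confined to a sparsely reported age group.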

Analysis was performed using SAS® software version 9.2 (SAS Institute, Cary, NC, USA). Graphs were made in SAS® software version 9.2 and R version 3.1.3.

3 Results

3.1 Descriptive Analysis

For the study period (first quarter of 2004 through the third quarter of 2012), a total of 4,285,088 reports were retrieved from FAERS. We then excluded duplicates (n = 43,125), adult reports (n = 2,686,530), reports with missing age (n = 1,419,524), and reports with age equal to zero and a MedDRA® preferred term indicating prenatal exposure (n = 20,235). The remaining 115,674 reports, corresponding to 893,587 individual DECs, were retained for analysis of pediatric spontaneous reports (see Table 2).
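The exclusion flow can be checked arithmetically. The short sketch below uses only the counts given in the text.

```python
retrieved = 4_285_088
excluded = {
    "duplicates": 43_125,
    "adult reports": 2_686_530,
    "missing age": 1_419_524,
    "age 0 with prenatal-exposure term": 20_235,
}
retained = retrieved - sum(excluded.values())
n_decs = 893_587

print(f"retained reports: {retained:,}")
print(f"DECs per retained report: {n_decs / retained:.2f}")
```

The subtraction reproduces the 115,674 retained reports, corresponding to roughly 7.7 DECs per report on average.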

Table 2 Description of pediatric reports by age categories

Fig. 1 shows the total number of pediatric reports that included the investigated drugs and ADRs from the reference set, with data for adults shown for comparison. The number of children exposed to the drugs of interest for whom any of the investigated ADRs was reported ranged from 26 (praziquantel) to 7535 (ibuprofen), with a median of 781 across all drugs. The number of reports of the events of interest in FAERS ranged from 164 (ventricular arrhythmia) to 14,777 (anaphylaxis), with a median of 1004 across all events. For a more detailed description of report counts, please refer to Electronic Supplementary Material Table 1.

Fig. 1

Count of reports in the pediatric and adult population for the investigated adverse drug reactions (a) and drugs (b), cumulatively for the period quarter 1 2004 to quarter 3 2012. The number of reports in children is represented by bars and plotted on the left axis, while the number of reports in adults is represented by the red line and plotted on the right axis; reports with missing age or age = 0 were excluded. Only reports mentioning any of the drugs or events in the reference set were considered

3.2 Overall Performance of SDAs

Both SDAs showed high specificity (PRR: 83.8 %; EBGM: 91.9 %) and low sensitivity, which was lower for EBGM than for PRR (17.2 vs. 37.9 %). The NPV and PPV were similar for both SDAs. With the threshold-independent (AUC-based) approach, the two SDAs performed similarly in the pediatric population, although the AUC for EBGM (0.745) was slightly higher than that for PRR (0.731). Neither SDA performed significantly better than the simple report count (AUC = 0.634; p = 0.27 for PRR and p = 0.14 for EBGM).

3.3 Stratification and Adjustment for Age and its Impact on Performance

Upon calculating SDA values per age stratum and testing for heterogeneity across strata, we observed effect modification for some associations. Some false negatives (positive DECs that failed to be highlighted as signals when analyzing data pertaining to the entire pediatric population) were unmasked in some strata. Four DECs were unmasked in total: ibuprofen–thrombocytopenia and isoniazid–seizure (by PRR) and clarithromycin–erythema multiforme and ibuprofen–erythema multiforme (by EBGM). Conversely, ‘ibuprofen–acute liver injury’, also a positive DEC, was highlighted when we analyzed data pertaining to the entire pediatric population, but it became clear after stratifying that this DEC was highlighted only in older children (adolescents) and not in younger children (see Fig. 2). For an overview of SDA values across age strata and results of heterogeneity tests please refer to Electronic Supplementary Material Figures 1A and 1B.

Fig. 2

Variation of proportional reporting ratio and empirical Bayes geometric mean estimates across pediatric-specific age strata: selected examples. EBGM empirical Bayes geometric mean, PRR proportional reporting ratio, SDA signal detection algorithm

We evaluated the performance of the methods within individual age strata (see Table 3). On average, the performance of the SDAs was lower within age strata than in the entire pediatric population, and it improved with increasing stratum size. In infants and neonates, performance was very low and not better than chance (p > 0.5 for both SDAs). The adolescent group showed the best performance, similar to the overall performance.

Table 3 Performance of signal detection algorithms across age strata

After adjusting for age by pooling the stratum-specific estimates, the performance of the SDAs decreased, although not significantly (see Fig. 3; crude vs. adjusted AUC for PRR: 0.731 vs. 0.688, p = 0.267; crude vs. adjusted AUC for EBGM: 0.745 vs. 0.683, p = 0.216).

Fig. 3

Performance of signal detection algorithms within the entire pediatric population

4 Discussion

In this study, we have demonstrated that age stratification for the detection of drug safety signals in children may unmask some signals that appear in neither the crude nor the adjusted analysis. Adjustment for age did not improve the performance of the PRR and EBGM.

For the investigated events, similar reporting patterns were observed for children and adults, while the investigated drugs appeared to have different reporting patterns (see Fig. 1). Different drug-related reporting patterns in children versus adults have been reported previously [5]. Consequently, reported DECs for children may differ from adults [3, 5], underlining the need for pediatric-specific approaches to signal detection, especially when we consider that reported drugs may vary by age group even within the pediatric population [3, 32].

Overall, the PRR and EBGM showed good performance, although results were slightly lower than those reported on other (not pediatric-specific) reference sets [32, 33]. The similar performance of PRR and EBGM agrees with recent results from the PROTECT (Pharmacoepidemiological Research on Outcomes of Therapeutics by a European Consortium) project [23]. That neither PRR nor EBGM performed statistically significantly better (based on AUC) than the simple report count may be due to a lack of power.

Within age strata, performance seemed to correlate with stratum size. The poorest results were observed for infants and neonates (the smaller groups), and results improved slightly for children. The best performance was observed for adolescents, the age stratum with the highest number of tested DECs. A loss of power due to fewer reports, and therefore fewer DECs, may account for this observation. Our use of the lower bounds of confidence intervals, rather than point estimates, for signaling might have amplified the influence of sample size, since smaller strata have higher variability. In neonates and infants, expected counts were difficult to calculate because there were few reports. In these strata, simple report counts performed similarly to, or even better than, the SDAs and might be an alternative to commonly used SDAs. This better performance of simple report counts may reflect the fact that the reference set comprised known DECs, which may in turn have influenced reporting, rather than emerging safety issues. Norén et al. have proposed this hypothesis [34].

Inspection of SDA values across child-specific strata (age stratification) revealed some heterogeneity in estimates, pointing to effect modification. For example, ‘ibuprofen–thrombocytopenia’ was detected as a signal in adolescents but not in the entire pediatric population or in the younger age categories. This suggests that age-specific SDA calculations, rather than age-adjusted estimates, are sometimes needed. The age-adjusted estimates did not improve performance; in fact, PPV unexpectedly decreased. Simulation studies have shown that Bayesian methods such as EBGM tend to produce underestimates when adjusted across sparse strata [15], and this was also the case in our study. Previous studies in adults have yielded contradictory results: some showed a beneficial effect of adjustment [9], while others did not [15]. The reason for our finding is not entirely clear. One possible explanation is that age is not a strong confounder for the investigated DECs. The weighting method (Mantel-Haenszel approach) may also have played a role. It assigned more weight to age groups with more reports (adolescents and children), which may have masked signals in age groups with fewer reports.

The limitations of data mining in FAERS include those inherent to spontaneous reporting databases: under-reporting, lack of denominator data and control group, reporting biases, and missing and poor-quality data [35]. Missing age information substantially reduced the study sample size, since we could not determine whether these reports described patients younger than 18 years. These biases are well acknowledged and have a definite impact, but they cannot be completely avoided. Compared with adults, children have fewer reports and different reporting patterns [3, 36, 37], which may complicate signal detection in the pediatric population.

Evaluating the performance of SDAs is a constant challenge because of the lack of standard methodologies, imperfect reference standards, and uncertainty about the best thresholds (see the Electronic Supplementary Material for measures of performance using alternative thresholds). The reference set was designed to be relevant for the entire pediatric population. Even so, some of its drugs and events are specific to one pediatric age group, as is evident in Fig. 1. We acknowledge that the reference set, although constructed specifically for this purpose, does not include all ADRs that are highly specific to pediatrics. This highlights the need for pediatric-specific approaches to signal detection that account not only for the pediatric population as a whole but also for its different age strata. Still, the reference set captures various patterns of drug use and ADRs [38] and is currently the only available pediatric-specific reference set. The thresholds used to define a signal were taken from previous publications, and other cut-off points may generate better results. Further research on pediatric-specific thresholds should be encouraged.

5 Conclusion

Our study revealed that age adjustment did not improve the performance of the SDAs. However, stratification revealed some variation in the values of SDAs across strata (effect modification) and inspection of stratum-specific estimates might sometimes yield useful information during routine surveillance.