## Abstract

High Risk Neuroblastoma (HRNB) is the second most frequent solid tumor in children. Prognosis remains poor despite multimodal therapies. Mathematical models have been developed to describe metastasis, but their prognosis value has yet to be determined and none exists in neuroblastoma.

We established such a model for HRNB relying on two coefficients: *α*(growth) and *μ* (dissemination). The model was calibrated using diagnosis values of primary tumor size, lactate dehydrogenase circulating levels (LDH) and the meta-iodo-benzyl-guanidine (mIBG) SIOPEN score from nuclear imaging, using data from 49 metastatic patients treated according to the European HR_NBL1 protocol.

The model was able to accurately describe the data for both total tumor mass (LDH, *R*^{2} > 0.99) and number of visible metastasis (SIOPEN, *R*^{2} = 0.96). Statistical analysis revealed significant association of LDH with overall survival (OS, p=0.0268). However, clinical variables alone were not able to generate a Cox-based model with sufficient prognosis ability (p=0.507). The parameter *μ* was found to be independent of the clinical variables and positively significantly associated with OS (p = 0.0175 in multivariate analysis). Critically, addition of this novel computational biomarker to the clinical data drastically improved the performances of predictive algorithms, with a concordance index in cross-validation going from 0.755 to 0.827. The resulting signature had significant prognosis ability of OS (p=0.0353).

Mechanistic modeling was able to describe pathophysiological data of metastatic HRNB and outperformed the predictive value of clinical variables. The physiological substrate underlying these results has yet to be explored, and results should be confirmed in a larger cohort.

**Significance** A mechanistic mathematical model of metastasis in high risk neuroblastoma is able to describe clinical data and provides a numerical biomarker with superior predictive power of overall survival than clinical data alone.

## Introduction

Neuroblastoma is the second solid tumor in children (8-10% children cancers in USA and Europe) with a median age at diagnosis around 2 years (1,2). Neuroblastoma is responsible of almost 15% of childhood deaths by cancer (3). Neuroblastoma is a quite heterogenous disease at clinical, histological and biological levels (4). Consequently, its prognostic spectrum is also wide (5). The International Neuroblastoma Risk Group (INRG) proposed in 2009 a classification model depending of cancer data (dissemination of neuroblastoma, histology, grade of tumor differentiation, genetic abnormalities such as MYCN amplification (6) and age (3,7)). Therefore, neuroblastoma is divided into 3 risk-groups : Low, Intermediate and High Risk Neuroblastoma (HRNB), which display different survival rates. For patients treated according to the International Society for Paediatric Oncology European (SIOPEN) recommendations, 5-year overall survival is more than 90% for the first group thanks to minimal therapeutics (surgery and/or chemotherapy or simple overseeing), 60 to 80% for the second (5) and < 50% for the lasts group, representing nearly 50% of patients (3,8–11), despite intensive multimodal treatments. Furthermore, patients progressing during or after initial response to induction have a dismal 5-year event-free survival (<20% for patient with early progressive disease (12,13)). For these refractory patients, current therapeutics are unsatisfactory and new treatments or therapeutic strategies are needed.

As early as 1964 (14), efforts have been made to develop mathematical models to assist cancer research (15). Their aim was to understand multiple biological processes involved in cancer and to propose rational tools for the design of therapeutic drug regimen (16,17). Three main types of mathematical models can be distinguished. On one hand, highly complex, multiscale models try to integrate as much of the biology as possible, ranging from intra-cellular molecular processes to systemic interactions at the whole organism level (18). This approach requires many parameters and consequently the models are often impossible to reliably calibrate for clinical purpose. On the other hand, purely statistical models and artificial intelligence techniques rely on agnostic algorithms that try to learn patterns directly from the data (19,20), with applications mostly in genomics (21) and radiology (22), but rarely in clinical oncology. In between, mechanistic or semi-mechanistic models seek to describe only the main determinants of a cancer disease, for a given purpose (e.g., prediction of survival from tumor kinetics (23), understanding (24,25) or prediction (26,27) of metastatic relapse).

To our knowledge no mechanistic model has yet been established and validated for clinical neuroblastoma. In this study, we define such a model for high-risk neuroblastoma to describe the metastatic burden at diagnosis, using two coefficients: a patient specific parameter *μ* for the dissemination process and a patient nonspecific parameter *α* for the growth process. The model was built and calibrated using clinical, biological and radiological data from a monocentric cohort of 49 patients with HRNB treated according to the HRNBL1 protocol (9). We then evaluated the prognostic value of the individual parameter *μ* and tried to identify Ultra High-Risk patients.

## Material and methods

### Ethics statement

Authorization to perform the study was obtained at APHM (Public Assistance of Marseille’s Hospitals) Health Data Access Portal (number request 32PTJ5)). We respect the Informatic and Liberty Law (1978) for the use of data. All parents and patients when appropriate gave consent to participate in the study.

### Data collection

Our population is made of 49 patients with HRNB (see stratification algorithm in Table S1) treated according to the HRNBL1 protocol recommendations (9), in the paediatric hematology and oncology unit of the children hospital of the University Hospital of Marseille (AP-HM) between 11/26/2007 and 08/30/2018. Entry date was the date of diagnosis. For survival analyses, end date was either the date of patients’ death or the date of last news. Inclusion criteria were the inclusion criteria of the HRNBL1 protocol (9) (see Figure S1). Briefly, induction chemotherapy with “rapid COJEC” or “modified N7 induction” is given for 10 weeks, followed by surgery when considered possible, then myeloablative chemotherapy with hematopoietic peripheral stem cell transplantation. Treatment is then completed with radiotherapy and maintenance therapy with immunotherapy (anti-GD2 ± IL2 therapy) and retinoic acid for 6 months.

Data were gathered from Personalized Computerized Folders (PCF) by the Axigate platform used in AP-HM, which includes neuroblastoma risk factors such as age at diagnosis, lactate dehydrogenase (LDH) – correlated to the total tumor volume (6,9,28,29) – or MYCN amplification, researched by polymerase chain reaction from peripheric blood and/or from primary or metastatic tumor tissue at diagnosis. The dataset is available in the supplementary file S1.

### SIOPEN scoring (Figure S2)

The meta-iodo-benzyl-guanidine (mIBG) is known to bind to neuroblastoma cells using iodine 123 (I123) (30) and mIBG scintigraphy is consequently used to evaluate the extent of the disease, in agreement with INRG recommendations. Indeed, mIBG binds to almost 90% of neuroblastomas (31) in primary tumors, but also bone, bone marrow (32) or even soft tissues with a high sensibility (85-94%) (33). We used a semi quantitative SIOPEN score that was elaborated to predict extension and severity of the disease (34). A high score has been shown as pejorative but no reproducible cut off has not yet been found (31,34). We established the SIOPEN score with the PCF data using the Centricity imaging software, or by retrospective double scoring scintigraphy with an experiment nuclear physician (*LT*).

### Tumor characteristics

Location and size of primary tumors was evaluated using radiological reports performed at diagnosis. Forty-five patients (91.8%) had scanner, 11 patients (22.4%) magnetic resonance imaging (MRI). Primary tumor volumes were estimated by the formula: with *α* half the largest axis, *b* half the medium axis and *c* half the smallest axis of an ellipsoid tumor.

Location and number of visible metastases were retrieved from mIBG scintigraphy imaging. Metastases locations were recorded in imaging interpretation. In addition, bone marrow metastases were searched by performing myelograms and bone marrow biopsies.

Date of best treatment response was recorded according to the International Neuroblastoma Response Criteria (INRC) (35). Date of relapse was recorded as the date on which unfavorable evolution of the disease was highlighted by radiology (scanner and/or MRI) and nuclear imaging (positron emission tomography and/or mIBG I123 scintigraphy). We defined Ultra-High-Risk (UHRNB) patients as relapsing or progressing within 18 months after diagnosis.

## Mathematical model

### Definition

The mathematical model was based on a previously published framework (25,26,36), which allows to simulate a cancer disease at the organism scale, including growth of the primary tumor (PT) and birth and growth of secondary lesions (Figure 1). We assumed growth of both the primary and secondary tumors to follow an exponential law:

where *S*_{p}(*t*) and *S*(*t*) denote the sizes of a primary and a secondary tumor (expressed in number of cells), starting from one cell at time *t* = 0. The parameter *α* denotes the proliferation rate. Assuming a metastasis birth rate proportional to the PT size with parameter *μ*, the number of metastasis at time *t* is given by (25):

The parameter *μ* corresponds to the per day probability for each cell of the PT to spread and establish a distant metastasis. The total metastatic burden (total number of metastatic cells in the organism) is given by (26):

Visible metastases at time *t* (i.e. metastases with size larger than a visibility threshold *S*_{vis}) are the ones that were born early enough to have reached *S*_{vis} at *t*, that is, before *t* − τ_{vis}, where τ_{vis} is the time to reach *S*_{vis} (see Figure 1). This time is given by and the number of only visible metastases can then be computed as:

The visibility threshold *S*_{v s} is considered a patient-specific model parameter. Numerical simulations of M were performed using the fast Fourier transform algorithm as implemented in the *scipy* python package (python 3.7), exploiting the convolution structure of the equation (37).

For forward simulations of the model, a discrete version was employed with initiation time *T*_{i} and size *S*_{i} of the i-th metastasis given by:

### Calibration

Cell doubling times (CDTs) in neuroblastoma was a prerequisite to estimate *α*. We searched PubMed for studies related to CDT, as well as CDTs from a database of commercial cell population (supplementary file S2). All cell lines were obtained from human patients and CDTs were established in vitro. Of the 73 strains studied, 15 were excluded due to a lack of knowledge regarding possible exposure to chemotherapy.

Median CDT was 48h (20 - 258h). We then fixed . This further allowed us to compute an estimation of the PT age (or time of diagnosis, *T*_{d}):
where *S*_{d} is the size of the PT at diagnosis. This last quantity was derived from three diameters obtained from computed tomography imaging, which allowed computation of the PT volume assuming ellipsoidal shape. This volume was converted into a number of cells using the standard assumption of 1 mm^{3} ≃ 10^{6} cells (38).

For each patient, two quantitative measurements were used to compare the metastatic model to the data: the SIOPEN score and the LDH blood level. The former was assumed to be a surrogate of the visible number of metastasis and the latter to represent the total cancer burden in the organism (PT + metastases, Figure 1). Denoting with *i* superscript the quantities that depend on individual *i*, we thus assumed:
which expresses a proportional error model for the observations with standard deviation *σ* = 0.1, corresponding to a 10% measurement error. The parameter *ϕ* represents a patient non-specific conversion coefficient between number of cells and LDH (expressed in units per liter, UI/L). Maximization of the log-likelihood for the expression above leads to minimization of the following objective function:

With

The parameter *ϕ* was arbitrarily fixed to 10^{−9} UI/L/cell, from preliminary simulations. Minimization was implemented using the Nelder-Mead algorithm of the *minimize* function of the *scipy* python package (python 3.7).

### Statistical and predictive analysis

Due to ranges spanning several orders of magnitude, individual values of LDH levels and the mathematical parameter *μ* were log-transformed beforehand. Association between clinical variables and/or the individual mathematical parameter ln *μ* with progression-free survival or overall survival was assessed using log-rank tests for dichotomized groups, as well as univariate and multivariate proportional hazard Cox regression models for continuous covariates. The *lifelines* python package was used to fit the models. Resulting models were evaluated for their predictive power by computing the mean of Harrell’s c-index (39) during a ten-folds cross-validation procedure. It is the equivalent of the area under the ROC curve for survival analysis (proportion of correctly predicted pairs of subjects). Specifically, we first selected the variables below a p=0.2 significance threshold in univariate analysis. Then multivariate Cox models including only these variables were trained on the cross-validation learning sets, and the c-index was computed on the corresponding test sets. For construction of prognosis scores, Cox models were trained using the selected variables on the entire data set. For a patient *i* with covariates *x*, the score was defined by *β*^{T}*x* with *β* the vector of the Cox coefficients. Calibration of these Cox models was assessed using calibration plots at the median survival value. These bin patients according to predicted survival and compare survival predictions with the Kaplan-Meier estimates of the patients within the bin (*calibrate* function of the *rms R* package).

## Results

### Description of the cohort

#### Patients and tumor characteristics (Table 1)

Forty-nine patients were included in our cohort. Two girls of 26 and 11 months diagnosed with low risk neuroblastoma were included due to early progression. The MYCN status for both patients was negative at diagnosis but changed for the younger one at relapse. We excluded 4 patients for the construction of the mathematical model as the date of inclusion in the HRNBL1 protocol was delayed when compared with the initial diagnosis (pre-treatment with other off-protocol chemotherapy types and one 36 months-old girl with a metastatic, MYCN negative esthesio-neuroblastoma whose size could not be estimated).

Median age was 36 months (range 11-140). LDH levels at diagnosis were high with a median of 842 UI/L (302-22022), compared with laboratory normal values < 300 UI/L. Metastases were present for most patients (91.8%) and SIOPEN scores were overall high (median 27 (0-60)). Three patients who had a negative mIBG (no fixing primary tumor on scintigraphy) but PET-visible metastases were excluded. All patients underwent bone marrow aspirates and/or biopsies.

Location of primary tumors was adrenal for 55,1% patients (n=27) and abdominal for 34,7% (n=17). Details are given in Figure S3A. Primary tumor median volume was 272 cm^{3} (range 0.5 – 2 266 cm^{3}). Locations of metastases are detailed in Figure S3B. The most frequent metastatic site was bone marrow (n=38, 77.6%).

### Patients outcome

All patients were evaluable for response to induction chemotherapy. Twenty-three patients had complete response (46.9%), 24 partial response (49%) and only 2 had stable disease (4.1%). However only 20 patients did not ultimately progress (40.8%). Among those who progressed (59.2%), 25 died (51% of whole initial cohort). Details of patients survival are showed in Figure 2. The median survival without progression was 29 months. At 3 and 5 years, progression-free survival rates were 44.1%, and 29.1%, respectively. Median overall survival (OS) was 43 months. At 3 and 5 years, OS rates were 55.8% and 38.9%, respectively.

### Descriptive power of the mathematical model

To describe the metastatic burden of HRNB patients, we developed a semi-mechanistic modeling approach whereby the metastatic process is reduced to two main phenomena: growth and dissemination (Figure 1). Growth was assumed to be exponential and the dissemination rate to be proportional to the primary tumor size, with a proportionality factor *μ*. To rely to the data and estimate *μ*^{i} in a given patient *i*, we assumed that the LDH level was a surrogate of the total tumor mass, whereas the SIOPEN score reflected the number of visible metastases (Figure 1). We also used the primary tumor size at diagnosis to infer the age of the tumor and simulate the pre-diagnosis history of the disease. The model was able to accurately reproduce the SIOPEN score and LDH levels (Figure 3A-B, *R*^{2} = 0.96 and > 0.99, respectively). Interestingly, The parameter ln *μ* revealed no correlation with either the log(LDH) (R = 0.25) or the SIOPEN (R = 0.201, Figure 3C), suggesting independent added value of this parameter – possibly informative of progression or survival – as compared to the data alone.

### Simulations of the natural history of HRNB

Statistical inference of the model parameters *μ*^{i} and allowed to perform simulations of the predicted natural history of the disease. Representative patients are shown in Figure 4A (see Figure S4 for all patients). These highlight the high inter-individual variability, well captured by the mathematical model (Figure 3). As a general observation however, we predicted that once initiated, metastatic dissemination then experienced a “burst”, with multiple metastasis born in a short period of time (see patients 1, 16, 24, 25 and 30 in Figure 4A). Nevertheless, the time *T*_{fm} between the first cancer cell and birth of metastasis was variable, from almost simultaneous to cancer initiation (e.g. patient 24, 0.42 days) to one month (e.g. patient 25, 34.4 days), see Figure 4B. The predicted age of the PT was less variable (median 76 days, range 58.1 – 82.1 days).

## Prognostic analysis

### Analysis of classical prognosis factors

Using log-rank tests for dichotomized groups (Figure S5), no significant difference was found in progression-free survival (PFS) for gender (p = 0.219), MYCN status (p = 0.354) or age (p = 0.825 with, 18 months cut-off). For LDH and SIOPEN, separation at the median value or from literature thresholds (9,34) was not able to significantly discriminate patients and statistical significance was only reached for extreme values, respectively at the 80^{th} percentile (2,375 UI/L) and the 90^{th} percentile (46.2) (Figure S5). In univariate Cox regression analysis, only the LDH level was associated with PFS (Hazard Ratio (HR) 1.6 (95%CI: 1 – 2.56), p=0.05). For analysis of the predictive power, univariate analysis selected the variables LDH and SIOPEN at a p < 0.2 threshold. The mean c-index in cross-validation was 0.603 for PFS.

Similarly, no significant difference was found in overall survival (OS) for gender (p = 0.198), MYCN status (p = 0.181) or age (p=0.527). Again, a significant difference in OS was found only for at the 80^{th} and 90^{th} percentiles of LDH levels and SIOPEN score. Using univariate Cox regression, only the LDH rate was significantly associated with OS (HR 1.74 (95%CI: 1.07 – 2.84), p=0.0268), which was confirmed in multivariate analysis (Figure 5A). The predictive model selected the variables LDH and MYCN status as having p < 0.2, which resulted in a c-index of 0.755. However, these variables were not sufficient to yield a Cox score able to significantly discriminate patients (p=0.507, Figure 5B).

### Added value of the mechanistic model

Neither *μ* nor *S*_{v s} were significantly associated with PFS (p = 0.475) in log-rank analysis (Figure S5) or Cox analysis. On the other hand, the parameter *μ* seemed to be the quantitative parameter most associated with OS in dichotomized analysis (Figure S6), even if not significant probably due to a crossing of the survival curves. Critically, confirming this observation, *μ* was positively significantly associated with better OS in multivariate Cox regression analysis (HR=0.839 (95% CI: 0.726 – 0.97), p=0.0175, see Table 2 and Figure 5C). Moreover, adding this novel computational biomarker *μ* to the previous significant biomarkers LDH and MYCN strongly increased the predictive power with an updated c-index of 0.827 (+9.54%). In addition, this superior predictive power was further supported by the fact that a novel Cox score, derived from the coefficients of LDH, MYCN and log(*μ*), was now able to significantly separate patients between poor and good prognosis (p=0.0353, Figure 5D). This was also confirmed by better calibration plots when including *μ* vs when not including it (Figure S7).

## Discussion

We reported here about the development of a mathematical model of metastatic neuroblastoma using a mechanistic approach based on classical risk factors, easy to collect at diagnosis and routinely used by clinicians. Our model could adequately describe total cancer mass represented by LDH levels as well as visible metastases represented by the SIOPEN scores. Beyond this mere descriptive value, our analysis suggested predictive value of a new computational biomarker *μ*, able to significantly better predict outcomes for patients.

Tumor growth is a complex biological process, that includes tumor proliferation, regulation of abnormalities concerning stem cells (40), neoangiogenesis (32,33), microenvironment interactions (40,41), immune interactions with tumor cells and dysregulation immune system (4,41–43). These complex interacting processes are regulated by many genes or epigenetic regulators (44) currently still being investigated. How to model these complex properties remains an open debate. We have used a semi-mechanistic approach relying on both clinical and radiological data to model neuroblastoma growth. Such mechanistic models of metastasis have already been successfully used for different cancer types such as kidney (24), breast (26,27) or lung (25). Although not included here, these models can incorporate the effects of multiple therapies (i.e. surgery, chemotherapy) or tumor-tumor interactions to describe and predict tumor dynamics (26,45,46). Moreover, the limited number of parameters used in these models allows a quick translation to potential clinical applications.

A very limited number of studies have focused on the mathematical modeling of neuroblastoma genesis, growth, or metastatic evolution. Ciccolini et al. (47) have reported a mechanistic model of neuroblastoma growth, using a classical Gompertzian model. Their model was used to optimize gemcitabine metronomic chemotherapy administration but did not intend to model metastatic dissemination. Elsewhere, He et al. (48) coupled a complex vasculature model fitting the dynamic growth of the human neuroblastoma cell line IMR32 in mice and a pharmacokinetics/pharmacodynamics model of bevacizumab, an anti-VEGF agent, to determine the best dosing regimen for this treatment and predict its effectiveness. Kasemeier-Kulesa *and al*. (49) have developed a molecular network model of developmental genes and signaling pathways with a 6 gene inputs logic model, using the discrete Boolean logic, and based on 4 cell states (differentiation, proliferation, angiogenesis, apoptosis). The model was able to predict the stage of the human neuroblastoma SHSY5Y and then the outcome of 77 early stage patients. Recently, Hidalgo et al. (50) modeled the whole cell signaling pathways data linking analysis of different pathways to molecular mechanisms involved in cancer physiopathology and patient survival. They identified numerous pathways implicated in the activation or deactivation of several cell functions responsible of poor outcomes in patients with neuroblastoma by for instance promoting of proliferation and apoptosis inhibition (TP53), angiogenesis (FASLG), or metastasis (THBS1, PTPN11 and cAMP AFDN). All the models proposed above are nevertheless not easily translatable in the clinic.

Alternatively, although not being the mainstay yet, individual molecular profiling has been studied in neuroblastoma (3,4). Several studies explored genome wide associations to predict outcomes for HRNB patients (51,52) but they are not used yet in day-to-day clinical practice. The relevance of the identified markers can also be questioned due to the lack of evidence of a causal relationship (53).

Our mechanistic model is a simplified representation of the metastatic process, reducing it to two basic phenomena: growth (*α*) and dissemination (*μ*). Our findings suggested growth to be exponential and fast, with primary tumors reaching 10 cm tumors in less than 3 months. This contrasts with previous studies where tumor growth kinetics were shown to slow down and deviate from an exponential pattern, following rather a gompertzian pattern (25,54). This is probably explained by the embryonic nature of this cancer, which makes it particularly aggressive. Dissemination was also characterized with a very high value of *μ* (median = 0.0446 metastases/cell/day) in comparison to other adult cancers (e.g. typical *μ* = 2.30 × 10^{−l2} metastases/cell/day in breast cancer). Notably, the inferred values of metastatic sizes at diagnosis and threshold values *S*_{v s} fall within biologically realistic ranges (median *S*_{v s} = 14 mm), although no a priori metastatic size information was used when calibrating the model. Only numbers of metastases above a parametric threshold (automatically fitted by the model) were used.

The parameter *μ* was found a better prognostic tool than the validated SIOPEN score at diagnosis and can be combined with LDH and MYCN status to predict OS. Interestingly, a high *μ* value is paradoxically an independent and statistically significant factor of better OS in our cohort. This might be explained by two possible hypotheses. First, patients with high *μ* may have an aggressive neuroblastoma with high replicative potential, which may result in a better sensitivity to chemotherapy and therefore better survival. Second, patients with high *μ* have a bigger total cancer burden, but this total mass could result in systemic inhibition of proliferation (46). This would be consistent with the fact that *μ* is a good prognosis factor for OS but not for PFS. Indeed, the visible mass might progress while suppressing the growth of smaller, invisible tumors. Assuming further that death results from the total mass present in the organism and not only the visible lesions could possibly explain why patients who progress are distinct from patients with the largest number of metastases (i.e., largest *μ*). To further confirm or invalidate these hypotheses, further mechanistic insights could be gained by linking *μ* to molecular analyses of the tumor. The micro-environment and more specifically the immune system might also be implicated in slow tumor progression and a host’s tumor long-term control.

The major limitation of our study is the limited (n=45) number of patients included in our analysis, due to the monocentric nature of our cohort. This nevertheless corresponds to all HRNB patients treated at our institution during over a 10 years period. This prevented us to extract a test set from the data before any analysis, on which to evaluate the predictive power of the model a posteriori. Nevertheless, we tested the predictive abilities of *μ* using cross-validation, i.e. on independent sets independent from the learning ones.

In this cross-validation process, the model changes on each fold, thus deciding on the “best” model (i.e. the ones with highest c-index across the 10 folds) still relies on entire data set. Critically, the mean c-index was found higher when using *μ* to predict survival, reaching a very good score (> 0.8). Further research should evaluate the predictive value of this final model on independent data.

## Conclusion

We developed a mechanistic mathematical model, using human data and a limited number of standard risk factors required in the clinic. We have chosen to use prognostic factors that are available at the time of diagnosis, in order to be able to provide upfront alternative therapeutic strategies adapted to the patients most likely to be unresponsive to a conventional high-risk neuroblastoma treatment. The model can reproduce tumor spreading of high-risk neuroblastoma in our patients and also predict patient prognosis, better than clinical variables only. It also led to the creation of a new risk score based on the parameter *μ*, which is associated with better OS outcome in our population. These findings must be confirmed in a larger cohort and the physiological substrate underlying this result should be explored.

## Data Availability

The data will be made available upon acceptance of the manuscript for publication in a peer-reviewed journal

## Supplementary figures and tables

## Footnotes

**Conflict of interest:**The authors declare no potential conflicts of interest