Mandibular dose-volume predicts time-to-osteoradionecrosis in an actuarial normal-tissue complication probability (NTCP) model: External validation of right-censored clinico-dosimetric and competing risk application across international multi-institutional observational cohorts and online graphical user interface clinical support tool assessment

Background: Existing studies on osteoradionecrosis of the jaw (ORNJ) have primarily used cross-sectional data, assessing risk factors at a single time point. Determining the time-to-event profile of ORNJ has important implications to monitor oral health in head and neck cancer (HNC) long-term survivors. Methods: Demographic, clinical and dosimetric data were retrospectively obtained for a clinical observational cohort of 1129 patients with HNC treated with radiotherapy (RT) at The University of Texas MD Anderson Cancer Center. ORNJ was diagnosed in 198 patients (18%). A multivariable logistic regression analysis with forward stepwise variable selection identified significant predictors for ORNJ. These predictors were then used to train a Weibull Accelerated Failure Time (AFT) model, which was externally validated using an independent cohort of 265 patients (92 ORNJ cases and 173 controls) treated at Guy’s and St. Thomas’ Hospitals. Findings: Our model identified that each unit increase in D25% is significantly associated with a 12% shorter time to ORNJ (Adjusted Time Ratio [ATR] 0·88, p<0·005); pre-RT dental extractions was associated to a 27% faster (ATR 0·73, p=0·13) onset of ORNJ; male patients experienced a 38% shorter time to ORNJ (ATR 0·62, p = 0·11). The model demonstrated strong internal calibration (integrated Brier score of 0·133, D-calibration p-value 0.998) and optimal discrimination at 72 months (Harrell’s C-index of 0·72). The model also showed good generalization to the independent cohort, despite a slight drop in performance. Interpretation: This study is the first to demonstrate a direct relationship between radiation dose and the time to ORNJ onset, providing a novel characterization of the impact of delivered dose not only on the probability of a late effect (ORNJ), but the conditional risk during survivorship. Funding: This work was supported by various funding sources including NIH, NIDCR, NCI, NAPT, NASA, BCM, Affirmed Pharma, CRUK, KWF Dutch Cancer Society, NWO ZonMw, and the Apache Corporation.


Introduction
Osteoradionecrosis of the jaw (ORNJ) is a severe iatrogenic sequela of radiotherapy (RT) that impacts patients treated for head and neck cancer (HNC) at an estimated prevalence of 4-15% 1 .Radiationassociated devascularization of bone and normal tissue injury lead to loss of cortical bone integrity which fails to heal, resulting in a constellation of symptoms that substantially reduce quality of life and limit oral function 2,3 .Depending on severity, ORNJ management ranges from conservative therapy (i.e., topical antimicrobials) to morbid surgical extirpation of nonviable tissue.However, these treatments often target the alleviation of disease symptoms rather than the root causes of ORNJ 4 .Additionally, the lack of early detection methods and risk assessment tools for ORNJ make prevention and detection difficult for medical professionals to preemptively manage the condition, often delaying care and resulting in costly, invasive interventions.
Patients are at a lifetime risk of developing ORNJ following RT, which is of increased concern for patients who are now living much longer following RT for HNC.The explosive growth of human papilloma virusassociated (HPV+) oropharyngeal cancer (OPC) often affecting non-smokers has resulted in more longterm survivors of patients treated with radiation for HNC.For this reason, determining the long-term (>2 year) time-to-event profile of ORNJ has important implications for professionals who may be monitoring the oral health of long-term survivors for many years after intensive disease surveillance has transitioned to survivorship 5 .Numerous studies [6][7][8][9] have examined the statistical correlation between ORNJ and various dosimetric, clinical, and demographic risk factors.In a previous investigation 10 , we pioneered and externally validated the first ORNJ Normal Tissue Complication Probability (NTCP) model.While these investigations offer invaluable insights that steer clinical decision-making and treatment plan optimization, they largely rely on cross-sectional datasets, where the potential for reporting bias is extant, and do not account for right-censoring (i.e., consideration of cases without event or who are under surveillance and have not had ORNJ but remain at risk).As Van den Bosch et al. 11 note "NTCPmodels are generally developed for a single complication grade at a single time point", thus overlooking the temporal variability in toxicity risk.
Understanding the influence of treatment decisions and risk factors on the timing of ORNJ is crucial for effective prevention and management.Treister et al. 12 carried out risk factor association analysis on a longitudinal ORNJ data set with data points at 6, 12 and 24 months and identified pre-RT extractions, higher RT dose and tobacco use as significant risk factors.The present study aims to model longitudinal associations to provide patient-specific ORNJ risk predictions over time.Non-parametric and semiparametric models, such as Cox proportional hazard, have been widely used.However, these methods may not fully capture the nuanced temporal dynamic of the event due to their broader assumptions about data distribution.Accelerated Failure Time (AFT) models offer a valuable parametric alternative, which enhances the interpretability of each factor's influence on the event onset.Weibull models are especially attractive as a parametric approach for time-to-event applications for risk-prediction [13][14][15] as an interpretable alternative to the clinically familiar non-parametric proportional hazards methods.
As part of a larger effort to leverage right-censored models to inform risk-based surveillance and prophylactic management trial enrollment 16 , as well as considering challenges to traditional NTCP models 17 , we have sought to undertake the following specific aims: i) Determine relative actuarial incidence of ORNJ over time and establish the relationship between mandibular dose-volume and actuarial progression-to-ORNJ. ii) Derive actuarial NTCP models for ORNJ that incorporate right-censored data for patient risk stratification.iii) Derive and externally validate a dose-aware time-to-event model with clinico-demographic factors.iv) Develop and test an online clinical decision support tool with a graphical user interface (GUI) for clinical implementation of the risk model, including formal stakeholder usability testing.

Methods
A multivariable time-to-event prediction model was developed on an internal dataset from MD Anderson Cancer Centre (MD Anderson) and externally validated on an independent cohort of a British population treated at Guy's and St Thomas' NHS Foundation Trust (GSTT).

Patient Selection
After institutional review board approval (RCR030800), data from a philanthropically funded observational cohort (Stiefel Oropharynx Cancer Cohort, PA14-0947) were extracted for retrospective acquisition.Patients in the internal MD Anderson cohort included all consented RT cases treated with curative intent from 2005 to 2022.Patients undergoing RT for HNC are closely followed up with clinical and radiological assessments every 3 to 6, 12, 18 to 24 months then approximately annually after the end of the RT course.An external cohort was obtained retrospectively from the HNC clinical database maintained at GSTT under the Northwest -Haydock Research Ethics Committee of the NHS Health Research Authority (REC reference 18/NW/0297, IRAS project ID: 231443); patients treated between 2011 and 2022 were included.The GSTT clinical protocol for HNC patients includes clinical follow up for five years.Control subjects in the GSTT cohort were retrospectively matched with a 2:1 ratio based on primary tumor site and treatment year.Incomplete or not available datasets were excluded.

Clinical Endpoint
The primary analysis framework focused on ORNJ as the sole event of interest.The primary endpoint was defined as the development of any physician-reported grade of ORNJ following the initiation of RT (i.e., ORNJ vs. no ORNJ), with the time to event (TTE) recorded as the interval (in months) from the start date of RT to the first documented instance of ORNJ in the patient's electronic medical health record.As the current datasets pre-date recent consensus recommendations 18,19 and used then-institutional standard clinical reporting, we designated all cases that were clinically and/or radiographically deemed ORNJ in the MD Anderson and GSTT cohorts as ORNJ and deferred to listed clinical documentation; this was necessary as divergent ORNJ grading systems (Tsai/Notani) were in-use (Appendix B).Patients without confirmed ORNJ diagnosis were right censored at the date of last contact.

Data & Statistical Analysis Multivariable Logistic Regression Analysis
Clinical and dosimetric parameter were collected from annotated sources at MD Anderson and GSTT using a described methodology and process (Appendix B).Multivariable logistic regression (LR) analysis with forward stepwise variable selection was carried out in R statistical software using the clinical and dosimetric data to identify the most significant variables for prediction of ORNJ to be considered for the Weibull AFT model.The LR analysis and subsequent NTCP modelling was carried as per the methodology previously used by van Dijk et al. 10

(Appendix D). AFT Model Development
A Weibull distribution is characterized by two main parameters: a scale parameter (λ), determining the distribution spread over time, and shape parameter (ρ) which indicates whether the rate of the event increases (ρ >1), decreases (ρ <1), or remains constant (ρ =1) over time.Considering covariates Χ 1 , Χ 2 , … Χ n , the function of the scale parameter can be expressed as λሺxሻ ൌ expሺβ ԛβ ଵ X ଵ ԛ ԛβ ଶ X ଶ ‫ڮ‬ ԛβ ୬ X ୬ ሻ, where β 0 is the intercept of the transformed scale when all covariates are at their reference level, while β 1, β 2, …., β n are the coefficients of the log-linear relationship between each covariate and the time to event [20][21][22][23] .The corresponding survival function for the Weibull AFT model is articulated as ቁ ቁ, where S (t;x) represents the probability of a patient surviving beyond time t without experiencing ORNJ, given their specific covariates x.
Adjusted Time Ratios (ATRs), calculated as the exponential of regression coefficients, were used to interpret the proportional impact of the model's covariates on the time to ORNJ for one unit increase in continuous variables or in relation to reference group(s) in categorical ones.
The analysis and WAFT model development was conducted in Python programming environment (version 3.11) using the 'WeibullAFTFitter' function of the Lifelines (version 0.28) survival analysis library 24 .

AFT Model evaluation
The Weibull AFT model was internally and externally validated.For internal validation, the dataset was randomly split into training (80%) and test (20%) subsets with balanced ORNJ status representation.The Weibull AFT model was fitted to the training data subset and internally evaluated on the test subset.Model performance was assessed in terms of overall performance, predictive accuracy and model calibration on both the internal and external datasets using time-independent metrics 25 .The Brier score evaluates the mean squared difference between the observed outcomes and the predicted probabilities of event occurrence.In the context of time to event analysis, the Integrated Brier score (IBS) provides a single summary measure of the model's prediction accuracy over time.The concordance index (Harrell's C-index) was used to measure the predictive accuracy of the model in terms of its ability to correctly rank the event times.Model calibration was assessed with the Distributional calibration (D-calibration), which is a measure of the calibration of the predicted survival curves (rather than a time-specific outcome prediction).

GUI Development and Prospective Assessment
The WAFT-based time-to-ORNJ online calculator graphical user interface (GUI) is available at https://uicevl.github.io/OsteoradionecrosisVis/(Figure 1), where the user can either obtain a predicted risk of developing ORNJ at a specific time point or visually assess the time-dependency of ORNJ risk with the different covariates of the ORNJ WAFT model (D25%, gender and pre-RT dental extractions).The usability of the GUI was prospectively evaluated on a test dataset by 25 users of different degrees of expertise and clinical specialties.A Qualtrics survey was designed with eight case-specific questions, the ten questions from the Brooke et al. 26

Role of the funding source
The non-commercial federal and philanthropic funding sources supported salary and resource provision, as well as trainee time protection, germane to aspects of this research, including conceptualization, methodology, software development, data curation, and project administration.Specifically, these funds 6 y , c f e e f e r , , s enabled core infrastructure, salary support, and resources necessary to carry out the research activities as well as enabling cross-institutional data collection.No funder was given editorial nor scientific review capacity for the enclosed work through either prior permission or post-analysis review.Data deposition was funded and undertaken in compliance with NIH policy for domestic federally-funded data, while EUderived data is precluded from publication owing to GDPR regulatory guidance.C1a), test (Figure C1b) and external (Figure C1c) datasets separately.

Multivariable Forward Stepwise Logistic Regression Analysis
All DVH metrics and the following clinical variables were considered in the multivariable analysis smoking status (binary, current vs. never/former), chemotherapy (yes vs. no), gender (binary, male vs female), pre-RT dental extractions (binary, yes vs. no), post-operative RT (binary, yes vs. no), tobacco pack years (continuous), age (continuous) and mandible volume (continuous).
The forward stepwise multivariable LR process was first applied considering the pre-selected variables This identified the following dosimetric and clinical variables as the highest predictors of ORNJ: V55Gy (percentage volume of the mandible that receives at least 55 Gy), D25% (minimum dose received by the most irradiated 25% of the mandible volume), pre-RT dental extractions and gender (supplementary Table D1).However, in this first forward stepwise LR process, the model's coefficient for the V55Gy parameter was largely altered when the D25% parameter was introduced, showing a correlation between the two dosimetric parameters: V55Gy and D25%.Also, the p-value for the V55Gy showed that this parameter was not statistically significant in the prediction of ORNJ.Consequently, the V55Gy parameter was excluded from the pre-selected variables subset and the modelling process repeated on the updated dataset (supplementary Table D2).Therefore, the final ORNJ NTCP model included D25%, pre-RT dental extractions and gender.Details of the ORNJ NTCP model performance are provided in supplementary Table D3.

Weibull AFT model
The ORNJ Weibull AFT (WAFT) model was trained and tested considering the entire time-to-event range in the MDACC cohort.Details of the WAFT model are provided in Table 1 and the resulting survival curves by variable are represented in Figure 3.The shape parameter of the model (X≈ 0•81) indicates a decreasing hazard rate for ORNJ over time among the study group.For the covariates, our findings suggest that each unit increase in D25% is significantly associated with an 12% shorter time to ORNJ (ATR 0•88, p<0•005).We also observed that patients who undergo dental extractions experience ORNJ at a rate 27% faster (shorter time to ORNJ) compared to those who do not undergo pre-RT dental extractions (ATR 0•73, p=0•13).A 38% (ATR 0•62, p=0•11) shorter time to ORNJ was observed in male patients.However, statistical significance of the dental extractions and gender variables was not conclusive.Cumulative hazard function and partial effects of the D25%, pre-RT dental extraction, and gender on survival outcomes were visually analyzed to aid interpretation of the influence of individual factors within the context of the WAFT model (supplementary Figure E1).

Weibull AFT model performance
Maximum discrimination performance of the model at internal validation (Harrell's C-score of 0•723) was observed at the 72 months predictive horizon (Figure 4a), which coincides with the timepoint where both groups, ORNJ and censored, exhibit the largest difference in the test dataset (supplementary Figure D1b).Model calibration at internal validation was good to excellent, with an integrated Brier score of 0•133 (Figure 4b) and, as shown in Figure 4c, successfully d-calibrated (p-value 0•998 > 0•05).Model performance decreased slightly when tested externally on the independent dataset (Appendix F).The distributional calibration plot (supplementary Figure F1c) shows that the model's predicted probabilities were consistently low compared to the actual outcomes, i.e., the model was underconfident in its predictions.  .

Discussion
Our work demonstrates for the first time a direct relationship between radiation dose and the time to development of osteoradionecrosis of the jaw (ORNJ) and is a novel characterization of the impact of delivered dose not only on the probability of a late effect (ORNJ), but the conditional risk during survivorship.
Osteoradionecrosis is an orphan disease 27  The actual time course of development of ORNJ also remains under-described.Using a more restricted criteria of "exposed bone", the most reliable cross-sectional cohort analysis of post-radiation events 12 showed, in 572 longitudinally followed participants, a cumulative rate of exposed bone of 6•1%, with all patients presenting with disease in <18 months; however, this high-quality dataset followed patients only until 24 months post-therapy.Other studies report varying median time-to-event, with clinical features such as oropharyngeal disease site 28 or dental extractions 29 associated with faster progression to ORNJ.However, formal toxicity modeling of the conditional probability of ORNJ has not been established until now.
In this study, we have successfully developed a novel time-to-event approach for predicting orodental toxicity, thus providing a more comprehensive estimation of disease trajectory to allow effective risk stratification and surveillance strategies.Additionally, we have developed and tested a WAFT-based time-to-ORNJ online calculator graphical user interface (GUI) with overall high usability scoring (Appendix G) that will facilitate clinical implementation of our model.
In a previous study 10 , a single dose-volume histogram (DVH) parameter-based traditional NTCP model was developed, demonstrating a corollary model using the received dose to 30% or more of mandibular regions of interest (mandible D30%) and pre-radiotherapy dental extraction as predictors of ORNJ.For the current study, prior to the time-to-event model development, we repeated the NTCP modelling exercise after careful manual revision of the dataset.Reassuringly, D25% (close to D30%) and pre-RT dental extractions were also identified as predictors in the updated NTCP model.
Traditional NTCP models not only rely on binarization of the clinical endpoint (e.g., the presence or absence of ORNJ) but also a fixed-interval truncation of surveillance interval without right-censoring.Thus, while there is abundant suggestion that increased radiation dose increases the risk of ORNJ, there is scant data regarding the relative relationship between pre-therapy dose or clinical factors on time to ORNJ development.Consequently, in this study, we aimed to expand on an existing ORNJ NTCP model 10 to incorporate the temporal information of ORNJ development, by using a novel application of a prediction model incorporating parametric modeling of continuous right-censored time-to-toxicityevent prediction of ORNJ.
We have previously shown that prediction models that account for right-censoring can provide differential variable selection compared to categorical classification methods 30 .Reassuringly, the use of right-censor ware TTE models nonetheless also validated our non-right-censored traditional NTCP model, and our results underscore the effect of the risk factors considered (i.e., D25%, dental extractions and gender).Put simply, these factors not only are correlates of ORNJ but are associated with faster interval of ORNJ development.This has important implication for post-RT surveillance in addition to pre-therapy dose reduction strategies.For example, enhanced surveillance imaging methods to monitor progression towards ORNJ, or risk-stratified prevention interventions are potentiated by the proposed model.
By internally and externally validating our model, we have demonstrated its reliability and applicability across diverse patient populations.Despite the slight drop in performance, most likely introduced by differences in treatment protocols and population characteristics, the external validation results indicate that the WAFT model generalized well to unseen data from an independent cohort.As opposed to the internal dataset, the external dataset was a matched cohort, with 2:1 control to case matching based on primary tumor site and treatment year, which could have resulted in a reduced variability of clinical characteristics.Moreover, time to event distributions were very different between the training and the external datasets (supplementary Figure D1): while the model was trained across predictive horizons beyond 204 months, the external test dataset was limited to a maximum of under 100 months.Additional external validation of the WAFT model on a larger and more diverse observational cohort will allow confirmation of the model's generalizability.
While there are a number of survival functions to be considered for time to ORNJ modelling, our choice of the WAFT model was based on its flexibility and suitability to the study's objectives.In future work, we will continue to explore comparisons with alternative distribution functions such as the log-linear or exponential distributions.
The presented results focused on a binary endpoint for ORNJ (i.e., any grade of ORNJ vs. no ORNJ).While this is clinically useful, future studies will expand our work to the prediction of different stages of ORNJ to allow for more personalized intervention and management protocols based on the predicted degree of ORNJ severity risk.
Finally, as van den Bosch et al. 11 note, there is a significant unmet need for novel higher-dimensional dose-aware toxicity prediction methods to address limitations of standard RT NTCP models as part of an effort to explore non-linear dose-response considerations, and the reality that multiple DVH parameters of the same organ-at-risk (OAR) may be informative have led to applications of "whole DVH" methods 17 .
In this study we used dosimetric variables extracted from DVH data.While DVH is still a widely used surrogate of radiation distribution, it does not incorporate spatial information.As a natural next step from the present work, we aim to combine the proposed time to ORNJ approach to NTCP modeling with spatial information as the dosimetric risk factor in a spatio-temporal ORNJ prediction model.
In conclusion, our ORNJ Weibull AFT (WAFT) model offers a significant advancement in predicting mandibular ORNJ risk following RT in HNC patients.Predicting the time to ORNJ allows for early identification and proactive management of high-risk patients with potential reduction of the severity of ORNJ and improvement of patient outcomes and quality of life.
SUS scale questionnaire and three additional open questions for additional feedback.More details on the survey and the SUS scale are provided in Appendix G.

Figure 1 .
Figure 1.Screenshot of the WAFT-based time-to-ORNJ online calculator GUI.

From a population
of 1259 MD Anderson patients with HNC, a total of 1129 patients were included in the final analysis, of with ORNJ was observed in 198 cases at the end of follow-up period, with a median time to event of 20•5 months (IQR 35•1).The median follow-up time for the censored group was 71•7 months (IQR 62•7).Actuarial time-to-event (denoting diagnosis of ORNJ as an event censoring at either death or last follow-up) is shown in Figure 2. The external validation GSTT cohort consisted of 92 ORNJ subjects and 173 matched controls.The median time to ORNJ was 13•6 months (IQR 20•3) and the median follow-up time for the control group was 47•3 months (IQR 24•2).

Figure 2 .
Figure 2. Frequency plot of actuarial time-to-event in months by ORNJ status for the MD Anderson dataset, where diagnosis of ORNJ is considered an event censoring either at death or last follow-up.Appendix C includes frequency plots for the training (FigureC1a), test (FigureC1b) and external (FigureC1c) datasets separately.

Figure 4 :
Figure 4: ORNJ WAFT model performance plots at internal validation.Discrimination performance variation over time is described by the Harrell's C-index (a).Overall model performance over time is described by the Brier score and integrated Brier score (IBS) (b).Model calibration is described by the Distributional calibration curve (c), which describes the computed squared difference between the observed and predicted number of events within different time intervals.
with a currently undefined prevalence, owing to variability in disease classification, and until 2023, lack of a formal International Classification of Disease specific designation [ICD-10 FB81.5], previously denoted as "Other osteonecrosis" without attribution to radiation therapy [ICD-10-CM Diagnosis Code M87.8].Historically, ORNJ (i.e., FB81.5 Osteonecrosis due to ionizing radiation & Specific anatomy:XA51B7 Mandible) has had over 20 distinct descriptive categories, grading systems, or diagnostic criteria.This ambiguity has led to highly variable estimates of the prevalence of ORNJ, with reports designating between 4-15% 1 of HNC RT cases.