Quality Assurance Assessment of Diffusion-Weighted and T2-Weighted Magnetic Resonance Imaging Registration and Contour Propagation for Head and Neck Cancer Radiotherapy

Background/Purpose: Adequate image registration of anatomic and functional MRI scans is necessary for MR-guided head and neck cancer (HNC) adaptive radiotherapy planning. Despite the quantitative capabilities of diffusion-weighted imaging (DWI) MRI for treatment plan adaptation, geometric distortion remains a considerable limitation. Therefore, we systematically investigated various deformable image registration (DIR) algorithms to co-register DWI and T2-weighted (T2W) images. Materials/Methods: We compared post-acquisition registration algorithms from three software packages (ADMIRE, Velocity, and 3D Slicer) applied to T2W and DWI images in twenty HNC patients. In addition, we investigated implicit rigid registration (no algorithm applied) as a control comparator. Ground truth segmentations of radiotherapy structures (tumor and organs at risk) were generated by a physician expert on both image sequences. Three additional experts provided segmentations for five cases for interobserver variability studies. For each registration approach, structures were propagated from T2W to DWI images. These propagated structures were then compared with ground truth DWI structures using the Dice similarity coefficient (DSC), false-negative DSC, false-positive DSC, surface DSC, 95% Hausdorff distance, and mean surface distance. Results: In total, 19 left submandibular glands, 18 right submandibular glands, 20 left parotid glands, 20 right parotid glands, 20 spinal cords, 9 brainstems, and 12 tumors were delineated. ADMIRE's atlas-based DIR algorithms demonstrated improved performance over implicit rigid registration for most comparison metrics and structures (Bonferroni-corrected p < 0.05), whereas the Velocity and 3D Slicer algorithms did not. Moreover, the ADMIRE methods significantly outperformed all other methods in both individual and pooled analyses. Interobserver variability analysis revealed no significant differences between observers (p > 0.05). 
Conclusions: Certain deformable registration software packages, such as those provided by ADMIRE, may be favorable for registering T2W and DWI images. These results are important to ensure the appropriate selection of registration strategies for MR-guided radiotherapy.

treatment also affords the ability to capture distinct patient anatomy with varying contrasts via weighted sequence acquisitions, such as T2-weighted (T2W) images, and functional information, such as through diffusion-weighted imaging (DWI). DWI has shown particular benefit in aiding treatment adaptation through improved detection of target volumes and assessment of treatment response [8]. Therefore, combined T2W and DWI acquisitions enable the gathering of anatomic and functional information that can be used for adaptive MR-guided personalized RT.
Anatomical and functional sequences acquired in the same imaging session for MR-guided treatment often have minimal variation in patient position and geometry between sequence acquisitions. However, these multisequence acquisitions can be misaligned by motion artifacts from respiration or swallowing [4], susceptibility artifacts, chemical shift artifacts, ghosting artifacts [8], and geometric distortions [9]. Post-acquisition image registration, the process by which homologous image voxels from multi-temporal or multi-modal image sets are mapped to each other [10,11], is an important approach to align anatomical and functional sequences.
Rigid image registration applies a single global transformation (translations and rotations) to the entire image, whereas deformable image registration (DIR) optimizes a spatially varying transformation model that can capture local anatomical differences. Most implementations of DIR involve a transformation model that establishes a geometric correspondence between fixed and moving images, an objective function that quantifies image similarity, and an optimization approach that adjusts the transformation to maximize that similarity [12][13][14]. Importantly, even minor differences in patient anatomy can result in severe dosimetric errors in HNC [4,15], highlighting the need for accurate image co-registration when propagating segmentations of target volumes and OARs for radiotherapy treatment planning. Therefore, determining the impact of post-acquisition registration techniques (i.e., DIR) on multisequence MRI acquisitions is crucial for MR-guided treatment of HNC.
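To make the three components named above concrete, the following toy sketch registers two synthetic 2D images using an integer-translation transformation model, a brute-force grid-search optimizer, and either the sum of squared differences (SSD) or mutual information (MI) as the objective function. It is not how ADMIRE, Velocity, or 3D Slicer work internally; the images, the contrast inversion standing in for the T2W/DWI intensity difference, and all function names are hypothetical.

```python
import numpy as np


def mutual_information(a, b, bins=16):
    """Mutual information (nats) estimated from a joint intensity histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal distribution of a
    py = pxy.sum(axis=0, keepdims=True)  # marginal distribution of b
    nz = pxy > 0                         # skip empty bins to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px * py)[nz])))


def neg_ssd(a, b):
    """Negative sum of squared differences, so that higher means more similar."""
    return -float(np.sum((a - b) ** 2))


def register_translation(fixed, moving, similarity, max_shift=4):
    """Toy registration of `moving` onto `fixed`.

    Transformation model: circular integer translation (np.roll).
    Objective function:   the supplied `similarity` (higher is better).
    Optimizer:            exhaustive grid search over all shifts.
    """
    best_shift, best_score = (0, 0), -np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            warped = np.roll(moving, shift=(dy, dx), axis=(0, 1))
            score = similarity(fixed, warped)
            if score > best_score:
                best_shift, best_score = (dy, dx), score
    return best_shift


# Synthetic example: the moving image is the fixed image shifted by (-2, 3)
# with inverted contrast, a crude stand-in for a different intensity domain.
rng = np.random.default_rng(0)
fixed = rng.integers(0, 8, size=(32, 32)).astype(float)
moving = 7.0 - np.roll(fixed, shift=(-2, 3), axis=(0, 1))

print(register_translation(fixed, moving, mutual_information))  # (2, -3)
print(register_translation(fixed, moving, neg_ssd))             # not (2, -3)
```

With inverted contrast, SSD rewards a wrong shift while MI recovers the true correction, which illustrates why MI-type objectives (such as the dense mutual information step in ADMIRE's pipeline) are generally preferred when the two images occupy different intensity domains.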
medRxiv preprint doi: https://doi.org/10.1101/2021.12.13.21267735; this version posted December 14, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
While we have previously investigated intra-modality CT-to-CT registration [16] and inter-modality CT-to-MRI registration [17], to our knowledge, no studies have investigated registration techniques for intra-acquisition MRI in HNC. Therefore, to facilitate further development and optimization of MR-guided adaptive RT planning technologies, we systematically analyzed registration methods for T2W and DWI MRI sequences acquired during the same imaging session.

Methods:
We developed a quality assurance workflow for evaluating and benchmarking the performance of different image registration algorithms for T2W and DWI images ([...] Tesla Siemens MRI simulator. Characteristics of the imaging sequences are shown in Table 2. For each image set (T2W image and DWI image), ground truth segmentations for the left and right submandibular glands, left and right parotid glands, cervical spinal cord, brainstem, and primary gross tumor volume were manually generated by a trained physician expert (radiologist with > 5 years of experience in HNC). In addition, in a subset of five cases, segmentations for all structures in both sequences were manually generated by three additional observers (two physicians and one medical student) for interobserver variability analysis. All [...] source [18]). A total of six image registration methods were investigated in this study: ADMIRE, ADMIRE_Skiplinear, Velocity, Velocity_Skiplinear, B-spline, and Affine. From ADMIRE and Velocity AI, we used a rigid image registration followed by a DIR (ADMIRE, Velocity) and a DIR alone (ADMIRE_Skiplinear, Velocity_Skiplinear). The ADMIRE DIR algorithms utilize an atlas-based approach with head pose correction, dense mutual information, and final refinement using a deformable surface model [19]. From 3D Slicer, we used a B-spline registration (B-spline), a nonlinear parametric model commonly used for DIR [20], and an affine registration (Affine). For all cases, the DWI image was used as the fixed image, and the T2W image was used as the moving image. 
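The B-spline transformation model mentioned above can be illustrated with a one-dimensional free-form deformation: the displacement at any position is a weighted sum of the four nearest control-point coefficients, with weights given by the uniform cubic B-spline basis. This is a minimal sketch under stated assumptions (1D, uniform control-point spacing, one padding control point before position 0, hypothetical function names); it is not 3D Slicer's implementation, and real registrations optimize 3D coefficient grids against an image similarity metric.

```python
import numpy as np


def cubic_bspline_weights(t):
    """Uniform cubic B-spline basis functions B0..B3 evaluated at t in [0, 1)."""
    return np.array([
        (1 - t) ** 3 / 6,
        (3 * t**3 - 6 * t**2 + 4) / 6,
        (-3 * t**3 + 3 * t**2 + 3 * t + 1) / 6,
        t**3 / 6,
    ])


def bspline_displacement(x, coeffs, spacing):
    """1-D free-form displacement u(x) from control-point coefficients.

    Assumed convention (hypothetical): coeffs[k] belongs to the control point
    located at (k - 1) * spacing, i.e. one padding point precedes position 0
    and at least two points follow the last evaluated interval.
    """
    s = x / spacing
    i = int(np.floor(s))
    t = s - i
    # Control points at positions (i-1)..(i+2) * spacing support x,
    # which are array indices i..i+3 under the padding convention above.
    return float(cubic_bspline_weights(t) @ coeffs[i : i + 4])


spacing = 10.0                              # hypothetical control-point spacing (mm)
positions = (np.arange(12) - 1) * spacing   # control-point locations
coeffs = 0.05 * positions                   # coefficients of a linear field u(x) = 0.05 x
print(bspline_displacement(37.0, coeffs, spacing))  # approximately 1.85
```

Because the basis functions sum to one and reproduce linear functions, constant or linear coefficients yield a constant or linear displacement field, while changing a single coefficient only affects positions within two control-point spacings of it, which is what makes B-splines attractive for modelling local deformations.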
As a control comparator for all cases, we also analyzed the raw images with no post-acquisition registration applied, i.e., an implicit rigid registration (Rigid). After the registration process, the ground truth segmentations from the T2W images were propagated to the corresponding DWI images to generate propagated structures (Figure 1A-H). These propagated structures were then compared to the ground truth structures on the DWI image in the subsequent analysis. Before the analysis, all images and structure files were converted to Neuroimaging Informatics Technology Initiative (NIfTI) format in 3D Slicer. Moreover, to ensure fair comparisons between structures generated on T2W images and DWI images, all images and structures were cropped to the image with the smaller field of view (e.g., the DWI image). Finally, because there were small variations in the inferior and superior slices of the cervical spinal cord and brainstem structures between image sets, we cropped these structures so that the heights of the propagated segmentation and ground truth segmentation were equal (Figure 1I). We used a similar process for the interobserver variability analysis, in which segmentations were propagated from T2W to DWI images and then compared to the ground truth segmentations generated by each individual observer.
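The cropping of the spinal cord and brainstem to equal heights could be implemented roughly as follows, assuming both structures are non-empty boolean NumPy arrays on the same DWI grid with the superior-inferior direction along axis 0. The function name and array layout are hypothetical; the authors' exact procedure is not described beyond the text above.

```python
import numpy as np


def crop_to_common_height(a, b, axis=0):
    """Zero out slices of two binary masks outside their shared extent.

    Keeps only the slices along `axis` that lie between the later of the two
    first-occupied slices and the earlier of the two last-occupied slices,
    so both structures end up with the same height. Assumes non-empty masks.
    """
    other_axes = tuple(ax for ax in range(a.ndim) if ax != axis)
    occ_a = np.flatnonzero(a.any(axis=other_axes))  # occupied slice indices
    occ_b = np.flatnonzero(b.any(axis=other_axes))
    lo = max(occ_a[0], occ_b[0])
    hi = min(occ_a[-1], occ_b[-1])
    keep = np.zeros(a.shape[axis], dtype=bool)
    keep[lo : hi + 1] = True
    shape = [1] * a.ndim
    shape[axis] = -1
    keep = keep.reshape(shape)  # broadcastable slice mask
    return a & keep, b & keep


# Synthetic example: ground truth spans slices 1-7, propagated structure 3-9.
gt = np.zeros((10, 4, 4), dtype=bool)
gt[1:8, 1:3, 1:3] = True
prop = np.zeros_like(gt)
prop[3:10, 1:3, 1:3] = True
gt_c, prop_c = crop_to_common_height(gt, prop)
print(np.flatnonzero(gt_c.any(axis=(1, 2))))  # [3 4 5 6 7]
```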
[...] Stockholm, Sweden). After performing a Shapiro-Wilk test [23], we found that our data were not normally distributed (p<0.05). Therefore, we used nonparametric statistical tests for our analysis. For each metric and each structure, we compared each registration algorithm against the implicit rigid registration using one-sided Wilcoxon signed-rank tests with Bonferroni adjustments for multiple comparisons [24]. The alternative hypothesis was that the registration algorithm yields higher values than the implicit rigid registration for DSC and S-DSC, and lower values for FP-DSC, FN-DSC, 95% HD, and MSD. Similarly, we pooled metrics for OARs for a sub-analysis and performed pairwise comparisons using the same Bonferroni-corrected Wilcoxon signed-rank tests. For interobserver variability analysis of registration algorithms, we implemented a Kruskal-Wallis one-way analysis of variance test [25] for all four observers across all structures and evaluation metrics. For all statistical analyses, p-values less than 0.05 were considered significant. All statistical analyses were performed in Python v.3.7 [26].
[...] Slicer-based methods had results similar to those of the implicit rigid registration (Figure 2).
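The testing scheme described in the statistical analysis (paired one-sided Wilcoxon signed-rank tests against the implicit rigid registration, with the direction set by whether a metric should increase or decrease, followed by a Bonferroni adjustment) might be sketched with SciPy as follows. The per-patient scores are synthetic, applying the Shapiro-Wilk check to paired differences is an assumption, and the number of Bonferroni comparisons (6 here) is a placeholder rather than the value used in the study.

```python
import numpy as np
from scipy.stats import shapiro, wilcoxon

# Direction in which each metric improves relative to implicit rigid registration.
ALTERNATIVE = {
    "DSC": "greater", "S-DSC": "greater",
    "FP-DSC": "less", "FN-DSC": "less", "95% HD": "less", "MSD": "less",
}


def compare_to_rigid(method_scores, rigid_scores, metric, n_comparisons):
    """One-sided Wilcoxon signed-rank test of a method against implicit rigid.

    Scores are paired per patient. Returns the Bonferroni-adjusted p-value,
    i.e. the raw p-value multiplied by the number of comparisons, capped at 1.
    """
    _, p = wilcoxon(method_scores, rigid_scores, alternative=ALTERNATIVE[metric])
    return min(1.0, p * n_comparisons)


# Hypothetical per-patient DSCs for one structure (not data from this study).
rng = np.random.default_rng(1)
rigid = rng.uniform(0.6, 0.8, size=20)
method = rigid + rng.uniform(0.01, 0.05, size=20)

_, p_normal = shapiro(method - rigid)  # normality check on paired differences
p_dsc = compare_to_rigid(method, rigid, "DSC", n_comparisons=6)
print(p_normal, p_dsc)
```

Keeping the test direction in a lookup table makes it explicit that lower is better for the error-type metrics (FP-DSC, FN-DSC, 95% HD, MSD), so the same helper can be looped over all metrics and structures.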

Comparison of Registration
Specifically, the ADMIRE_Skiplinear method offered the best overall performance, with significant improvements (p<0.05, one-sided Wilcoxon signed-rank test) in 5/6 metrics for the left submandibular gland, 2/6 metrics for the right submandibular gland, 5/6 metrics for the left parotid gland, 4/6 metrics for the right parotid gland, 4/6 metrics for the spinal cord, 1/6 metrics for the brainstem, and 3/6 metrics for the tumor (Figure 3). When metrics were pooled across structures, similar trends emerged: the ADMIRE and ADMIRE_Skiplinear methods demonstrated the best performance of all methods, with DSC gains over the implicit rigid registration of up to 0.04 in the OARs and tumor (Table 3).

Interobserver Variability Analysis:
We performed an interobserver variability analysis to determine whether there were significant differences between observers for a given registration method. Metric values did not differ significantly between observers for any structure (Figure 5); therefore, we found no evidence of major interobserver variability in our study.
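For the interobserver analysis, one Kruskal-Wallis test per registration method, structure, and metric compares the metric values obtained against each observer's ground truth. The sketch below applies `scipy.stats.kruskal` to synthetic DSC values; the data layout and function name are hypothetical.

```python
from scipy.stats import kruskal


def observer_effect(metric_by_observer, alpha=0.05):
    """Kruskal-Wallis H-test across observers for one method/structure/metric.

    metric_by_observer: one sequence of metric values per observer, each
    computed against that observer's own DWI ground truth.
    Returns (significant, p_value).
    """
    _, p = kruskal(*metric_by_observer)
    return p < alpha, p


# Hypothetical DSC values for five cases and four observers (not study data).
similar = [[0.70, 0.72, 0.68, 0.75, 0.71]] * 4
shifted = [[0.10 + 0.01 * k + 0.2 * j for k in range(5)] for j in range(4)]
print(observer_effect(similar))  # not significant
print(observer_effect(shifted))  # significant
```

Because every observer contours the same five cases, the groups are related rather than independent; `scipy.stats.friedmanchisquare` is the repeated-measures counterpart if that pairing is to be modelled explicitly.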

Discussion
In this study, we systematically analyzed a variety of registration algorithms and compared them with implicitly registered images from multisequence MRI acquisitions for image-guided treatment applications. Our results show that specific registration algorithms can improve registration quality relative to implicit registration, as measured by the similarity between ground truth segmentations propagated from T2W to DWI images and ground truth segmentations delineated directly on the DWI images (Figures 2 and 3).
The best overall results were obtained using the ADMIRE software's deformable registration algorithm ADMIRE_Skiplinear (Table 3), with most metric and structure combinations showing better performance than the implicit rigid registration (Figure 3). Interestingly, applying an additional rigid registration before the ADMIRE algorithm (ADMIRE) often decreased performance (Figure 2), though not significantly (Figure 4). This slight decrease in performance may be secondary to small differences in alignment imposed by the added rigid registration step. While we tested other methods (i.e., Velocity and 3D Slicer), with the exception of a few outliers, they did not demonstrate significantly improved performance for most metric and structure combinations compared to the implicit rigid registration.
Moreover, the Velocity-based methods were often worse than the implicit rigid registration (Figure 4), which may be due to the DIR algorithmic implementation being unable to accommodate large differences between the intensity domains of the T2W and DWI images. Importantly, almost all structures of interest, individually and on pooled analysis, showed increased DSC for the ADMIRE-based methods (Table 3). This result indicates that the ADMIRE-based methods provide significantly improved volumetric overlap, which may warrant their use for intra-acquisition MRI registration in MR-guided treatment. It is worth noting that the spinal cord is especially sensitive to distortion-causing artifacts [27], making it a particularly challenging structure to co-register adequately. While the general performance for the spinal cord was lower than that for other structures, the ADMIRE-based methods still offered significantly improved performance compared to the implicit rigid registration (Figures 2 and 3); cases with lower performance tended to have a larger degree of spinal curvature than cases with higher performance (Appendix A). Therefore, while the ADMIRE-based methods should still be preferred over implicit rigid registration, special caution should be used in the quality assurance of these algorithms when they are used for spinal cord segmentations. Notably, for every registration approach, including the implicit rigid registration, none of the estimated metrics differed significantly between segmentations generated by different observers (Figure 5), indicating that our results are unlikely to be confounded by interobserver variability.
While several previous studies have investigated the relative performance of registration algorithms in various anatomical sites [28][29][30], investigations of head and neck imaging are generally lacking. However, a few recent studies have investigated registration quality in head and neck imaging using radiotherapy structure analysis similar to our current study [16,17]. For example, Mohamed et al. [16] investigated the registration quality of diagnostic CT to simulation CT in HNC, where images were acquired at different time points and with different scan settings, and found that certain DIR methods demonstrated improved performance over a control group (rigid registration) for OAR and target conformance for most comparison metrics, similar to our study. In contrast, Kiser et al. [17] showed that for CT and T2W MRI scans acquired with standard treatment immobilization techniques, MRI-to-CT DIR was not superior to rigid registration, with neither technique producing clinically satisfactory results (DSCs of 0.62-0.65). Importantly, the ADMIRE algorithms investigated in our study produce potentially clinically meaningful results, as we observed significant performance gains across various structures that may impact MR-guided treatments. [...] [32]. This result was further echoed by Eriksson et al., who confirmed that fast elastic image registration was the best technique for T1-weighted to T2W anatomic sequence registration [33]. Our results are consistent with these observations that selecting appropriate deformable techniques offers significantly improved performance for intra-acquisition registration.
There are several limitations to our study. We limited our analysis of intra-acquisition registration techniques in MRI to T2W and DWI sequences since these are the most germane to current MR-guided RT applications. However, several additional sequences can be studied to investigate these phenomena. For example, a study by Chen et al. in brain imaging investigated the best DWI-derived scalar images for T1-weighted to DWI image co-registration [34]. The authors discovered that anisotropic power images provide the most consistent registration compared to all other images, with b0 images offering average performance. In this study, we only tested b0 images, which were readily available and common for DWI workflows in HNC.
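As an example of a DWI-derived scalar image other than b0, an apparent diffusion coefficient (ADC) map can be computed from the b0 image and one diffusion-weighted image under the standard mono-exponential model S_b = S_0 exp(-b ADC). This sketch is not part of the authors' workflow; the b-value of 800 s/mm^2 and the signal values are hypothetical. Anisotropic power images, mentioned above, require multi-direction diffusion data and are not shown.

```python
import numpy as np


def adc_map(s0, sb, b_value):
    """Voxel-wise apparent diffusion coefficient from a b0 and one b>0 image.

    Mono-exponential model: S_b = S_0 * exp(-b * ADC), so ADC = ln(S_0 / S_b) / b.
    With b in s/mm^2 the result is in mm^2/s. Voxels where either signal is
    non-positive are left at 0 to avoid log/division errors.
    """
    s0 = np.asarray(s0, dtype=float)
    sb = np.asarray(sb, dtype=float)
    adc = np.zeros_like(s0)
    valid = (s0 > 0) & (sb > 0)
    adc[valid] = np.log(s0[valid] / sb[valid]) / b_value
    return adc


# Hypothetical 2x2 signals with b = 800 s/mm^2 and a true ADC of 1.0e-3 mm^2/s.
s0 = np.array([[1000.0, 800.0], [600.0, 0.0]])
sb = s0 * np.exp(-800 * 1.0e-3)
print(adc_map(s0, sb, 800))  # approximately 1.0e-3 except the zero-signal voxel
```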
Future iterations of this study should investigate other DWI-derived scalar images. Additionally, we have investigated a few select volumetric overlap and surface distance metrics since these are the most ubiquitous metrics used in evaluating segmentation quality for RT applications [35].
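The metric definitions are not reproduced in this section, so the sketch below uses common formulations as assumptions: DSC, FN-DSC, and FP-DSC normalized by the total volume of both structures (FN counting ground truth voxels missed by the propagated structure, FP counting extra propagated voxels), and voxel-based surface metrics in which boundary voxels approximate each surface, the 95% HD is the 95th percentile of the pooled symmetric surface distances, and the surface DSC uses a tolerance of 2 mm (a hypothetical value). Published variants, such as mesh-based surface DSC, may differ from what the authors used.

```python
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt


def volumetric_metrics(gt, pred):
    """DSC plus false-negative and false-positive Dice (assumed definitions)."""
    gt, pred = np.asarray(gt, dtype=bool), np.asarray(pred, dtype=bool)
    denom = gt.sum() + pred.sum()
    return {
        "DSC": 2 * np.sum(gt & pred) / denom,
        "FN-DSC": 2 * np.sum(gt & ~pred) / denom,  # ground truth voxels missed
        "FP-DSC": 2 * np.sum(~gt & pred) / denom,  # voxels added outside ground truth
    }


def surface_voxels(mask):
    """Boundary voxels: the mask minus its morphological erosion."""
    return mask & ~binary_erosion(mask)


def surface_metrics(gt, pred, spacing, tol_mm=2.0):
    """Voxel-based surface DSC, 95% Hausdorff distance and mean surface distance."""
    gt, pred = np.asarray(gt, dtype=bool), np.asarray(pred, dtype=bool)
    s_gt, s_pred = surface_voxels(gt), surface_voxels(pred)
    # Distance (mm) from every voxel to the nearest surface voxel of each structure.
    dist_to_gt = distance_transform_edt(~s_gt, sampling=spacing)
    dist_to_pred = distance_transform_edt(~s_pred, sampling=spacing)
    d_pred = dist_to_gt[s_pred]  # propagated surface -> ground truth surface
    d_gt = dist_to_pred[s_gt]    # ground truth surface -> propagated surface
    both = np.concatenate([d_pred, d_gt])
    within = np.sum(d_pred <= tol_mm) + np.sum(d_gt <= tol_mm)
    return {
        "S-DSC": float(within / both.size),
        "95% HD": float(np.percentile(both, 95)),
        "MSD": float(both.mean()),
    }


# Synthetic example: a 10-voxel cube and the same cube shifted by one slice.
gt = np.zeros((20, 20, 20), dtype=bool)
gt[5:15, 5:15, 5:15] = True
pred = np.zeros_like(gt)
pred[6:16, 5:15, 5:15] = True
spacing = (1.0, 1.0, 1.0)  # hypothetical isotropic 1 mm voxels

print(volumetric_metrics(gt, pred))  # DSC 0.9, FN-DSC 0.1, FP-DSC 0.1
print(surface_metrics(gt, pred, spacing))
```

Passing the voxel spacing to the distance transform keeps the surface metrics in millimetres, which matters for anisotropic MRI voxels where slice thickness often exceeds in-plane resolution.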
However, dosimetric studies on OARs and target volumes may be warranted to determine the ultimate impact of registration quality on an RT plan. Finally, we have limited our analysis to intra-acquisition images collected during the same imaging session. However, for MR-guided RT applications, registration techniques are also relevant for images acquired at different time points. Therefore, future studies should investigate these registration techniques applied to different imaging time points in an MR-guided RT workflow.

Conclusions
In summary, to our knowledge, this is the first study to investigate intra-acquisition MRI registration quality in HNC patients. We identified a deformable registration technique from the ADMIRE software package that offers the largest gains in registration quality for T2W to DWI image registration compared to other methods. Our results are a crucial first step towards registration quality assurance for MR-guided treatment approaches that implement multisequence acquisitions combining anatomical and functional imaging.