Reproducibility of three-dimensional landmarking and Frankfort horizontal plane construction: a comparison of conventional and novel landmarks in presurgical computed tomography scans

Objectives To assess manual landmarking repeatability and reproducibility (R&R) of a set of three-dimensional (3D) landmarks and to evaluate R&R of vertical cephalometric measurements using two Frankfort Horizontal (FH) planes as references for horizontal 3D imaging reorientation. Methods Thirty-three landmarks, divided into "conventional", "foraminal" and "dental", were manually located twice by 3 experienced operators on 20 computed tomography (CT) scans of orthognathic surgery patients. R&R of the landmark localization were computed according to the ISO 5725 standard. These landmarks were then used to construct 2 FH planes: a conventional FH plane (orbitale left, porion right and left) and a newly proposed FH plane (midinternal acoustic foramen, orbitale right and left). R&R of vertical cephalometric measurements were computed using these 2 FH planes as horizontal references for CT reorientation. Results Landmarks showing a 95% confidence interval (CI) of repeatability and/or reproducibility > 2mm were found exclusively in the "conventional" landmarks group. Vertical measurements showed excellent R&R (95% CI < 1mm) with either FH plane as horizontal reference. However, the 2 FH planes were not found to be parallel (absolute angular difference of 2.41{degrees}, SD 1.27{degrees}). The average time needed to landmark one CT scan was 14 {+/-} 3 minutes. Conclusions The "dental" and "foraminal" landmarks tended to be more reliable than the "conventional" landmarks. Despite the poor overall reliability of the landmarks orbitale and porion, the construction of the conventional FH plane using 3 landmarks provided a reliable horizontal reference for 3D craniofacial CT scan reorientation.


Introduction
Diagnosis and planning of orthodontic and maxillofacial treatments rely heavily on X-ray imaging. Twodimensional (2D) X-rays are routinely used but result in a flattening of three-dimensional (3D) craniofacial structures. In some clinical cases of dentofacial deformities -especially patients undergoing surgical orthodontic treatments (orthognathic surgery) -Computed Tomography (CT) or Cone Beam CT (CBCT) scans are useful (1). For example, 3D imaging makes it possible to assess complex asymmetry and to obtain highly accurate orthognathic surgery planning that can subsequently be used for the manufacturing of surgical guides (1)(2)(3)(4)(5). The diagnostic value of these scans would increase if they could be used to perform 3D cephalometric analysis, which would require the localization of landmarks (6). At the time being, however, no set of 3D landmarks has been deemed sufficiently reproducible and repeatable for 3D cephalometry (6,7).
Most of the time, three-dimensional cephalometric landmarks previously tested in repeatability and reproducibility studies derived from classic 2D analysis (7). Some of these landmarks have been shown to be poorly reproducible in 3D, especially orbitale (Or), porion (Po), gonion (Go), condylion (Co) and ramus (Ra) (8)(9)(10)(11)(12)(13)(14)(15)(16)(17). The localization of midsagittal landmarks has generally been shown to be reliable, mostly in datasets of patients showing no asymmetries (6,7). Several authors suggested using "new" landmarks which cannot be localized on 2D X-rays. More specifically, landmarks located on the craniofacial foramens are presumably easy to identify and should provide good reproducibility (7,12,13). However, few studies have tested the reproducibility of the new landmarks, and their reliability has not been tested yet in the context of presurgical orthodontic patients (13,18).
The main goal of cephalometric landmarking is to measure distances and angles between landmarks and planes so as to obtain a cephalometric analysis. In order to provide clinically relevant measurements that can be decomposed in the 3 planes of space (i.e. anteroposterior, vertical and transversal), 3D images need to be reoriented in a generic coordinate system (19,20). The Frankfort Horizontal (FH) plane, used for standardizing and unifying the measurements (19)(20)(21), is the most commonly used horizontal reference for this coordinate system. Its 3D clinical value has been demonstrated for assessing craniofacial morphology and evaluating soft-tissue and skeletal cants (22,23). This plane is conventionally defined in 3D by the 3 following points: left orbitale (Or-L), right porion (Po-R) and left porion (Po-L) (21). Hence, this reference plane is based on landmarks that are known to be poorly reproducible in 3D, suggesting that the conventionally defined FH plane is poorly reproducible (18). However, landmark reproducibility does not necessarily result in plane reproducibility, as the latter depends on the direction of landmark errors (12). To our knowledge, no . CC-BY 4.0 International license It is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted August 7, 2021. ; https://doi.org/10.1101/2021.08.05.21261623 doi: medRxiv preprint study has yet tested the repeatability and reproducibility of vertical cephalometric measurements using the conventional FH plane (constructed from 3 landmarks) as a reference for horizontal head reorientation.
Looking for a new plane which would remain parallel with the conventional FH plane but be based on more reliable landmarks, Pittayapat et al. suggested a novel FH plane, in which the internal acoustic foramina (IAF) would replace Po (18). Results from experiments performed on CBCT scans of dry human skulls revealed that the localization of IAF provided better reproducibility than that of Po.
Moreover, the authors suggested that another new FH plane, based on mid-IAF, Or-R and Or-L, might replace the conventional FH plane, the angular difference found between the two planes being inferior to 1 degree. These results have not been validated yet on 3D scans of living human subjects.
In this context, using a dataset of preoperative CT scans, the aims of our study were: 1-to assess landmarking repeatability and reproducibility of a set of 33 landmarks containing "conventional", "foraminal" and "dental" landmarks; 2-to assess repeatability and reproducibility of vertical cephalometric measurements using either the conventional or the newly proposed FH planes as references for horizontal head reorientation; 3-to assess the parallelism between the conventional and the newly proposed FH planes.

Dataset
We performed a random selection of 20 CT scans (7 males, 13 females, mean age 25 ± 8 years) in a database of 134 consecutive orthognathic surgery patients (49 males, 85 females, mean age 27 ± 10 years) from a single Maxillofacial Surgery Department. We used a random number generator to obtain a random sequence of 20 numbers, which was used to select the sample of CT scans included in this study. Allocation was performed by one operator (#1) and supervised by a second operator (#2) at the beginning of the study. All selected subjects showed marked skeletal deformities: 14 skeletal class II (10 short faces, 4 long faces) and 6 skeletal class III (2 short faces, 4 long faces). Six subjects exhibited mandibular asymmetry (2 severe, 4 slight) and 2 subjects showed syndromic dentofacial deformities. A set of 5 random CT scans not included in this study was used for operator training prior to landmarking. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted August 7, 2021. ; https://doi.org/10.1101/2021.08.05.21261623 doi: medRxiv preprint The 20 CT scans were acquired on a Discovery CT750 HD scanner (GE Healthcare, Chicago, USA) set at 100kVp, 50mAs, exposure time 730ms, slice thickness 0.625mm, slice increment 0.320mm. Field of view ranged from 200 to 267mm and pixel size ranged from 0.39 to 0.52mm. Scans were not reoriented after their acquisition. Segmentation of the bones (upper skull, mandible) and upper/lower teeth was performed prior to the study according to an industry-certified semi-automatic process (Materialise, Leuven, Belgium). The study was approved by an Institutional Review Board (IRB No. CRM-2001-051).

Landmark annotation
The 33 landmarks ( Figure 1) were divided into 3 groups: "conventional" ( Table 1) "foraminal" ( Table 2) and "dental" (Table 3) Manual reorientation of the CT scans was performed based on the Frankfort Horizontal plane construction obtained from the annotation process. A calibration session was organized before the study began, and the instructions were repeated to the operators once more before the second annotation session. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted August 7, 2021. ; https://doi.org/10.1101/2021.08.05.21261623 doi: medRxiv preprint (MPR) views. The operators had neither access to each other's results nor to their first session's results when performing the second session. For each session and each CT scan, results were exported as an .xml file containing the x-, y-, z-coordinates of each landmark. The time needed for each CT scan annotation was recorded by the operators and exported in a spreadsheet.  is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted August 7, 2021. ; https://doi.org/10.1101/2021.08.05.21261623 doi: medRxiv preprint Or-L). The origin was set at mid-porion; the -x axis followed the sagittal plane (from right to left); they axis followed the frontal plane (from front to back); and the -z axis followed the axial plane (from toe to head) ( Figure 1). The landmarking results were then referenced in the new coordinate system before performing the statistical analysis.

Landmark repeatability and reproducibility
For each landmark, repeatability and reproducibility standard deviations (SD) were computed according to the ISO 5725 standard of the International Organization for Standardization (24). Upon initial inspection of the results, the standard's recommendations were followed for clear outlier points, whose annotations were considered as missing data. The reliability of each landmark in the -x, -y and -z directions was then estimated, considering a 95% confidence interval (CI) of 2*SD of repeatability and reproducibility. Modified Bland-Altman plots, showing the deviations of the landmark positions from their means for the 20 CT scans, were computed for each landmark and direction (25,26). is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint

Repeatability and reproducibility of vertical measurements with the conventional FH plane and the newly proposed FH plane
The copyright holder for this this version posted August 7, 2021. ; https://doi.org/10.1101/2021.08.05.21261623 doi: medRxiv preprint

Parallelism between conventional and newly proposed FH planes
In order to assess whether the conventional FH plane and the new FH plane were parallel, the orthogonal projections of points IAF-R and IAF-L were computed on the mean conventional FH plane (as defined previously) for each subject. We then computed the absolute angular differences between the conventional FH plane and the novel FH plane, using trigonometry to calculate the angles between the normals to the planes.

Time needed for landmark localization
Mean time needed and standard deviation for landmark localization were computed.

Landmark repeatability and reproducibility
Outliers were identified for mental foramen points right/left placed by operator #3 during the first annotation session (subjects 4 to 20) and were considered as missing data (Supplementary Materials 2). Repeatability and reproducibility results for the 33 landmarks are shown in Table 4 is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted August 7, 2021. ; https://doi.org/10.1101/2021.08.05.21261623 doi: medRxiv preprint is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted August 7, 2021. ; https://doi.org/10.1101/2021.08.05.21261623 doi: medRxiv preprint is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted August 7, 2021. ; https://doi.org/10.1101/2021.08.05.21261623 doi: medRxiv preprint

Repeatability and reproducibility of conventional and newly proposed FH planes
The results of the repeatability and reproducibility analysis of vertical measurements of the landmarks using the 2 different FH planes as horizontal reference are shown in Table 5. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted August 7, 2021. ; https://doi.org/10.1101/2021.08.05.21261623 doi: medRxiv preprint

Parallelism between conventional and novel FH planes
When using the mean conventional FH plane as horizontal reference, the mean absolute vertical measurements ± SD of IAF-L and IAF-R were 2.68 ± 2.51mm and 2.78 ± 2.29mm, respectively.
Measurement results for each subject and each repetition are shown in Figure 3. The absolute angular difference between the conventional and the novel FH planes was 2.41° (SD 1.27°).

Figure 3: Vertical measurements of IAF left (on the left) and right (on the right) for each subject and
repetition, using the mean conventional FH plane as horizontal reference.

Time needed for landmark localization
The average time required to landmark one CT scan was 14:48 ± 03:45 minutes.

Discussion
The reliability of 3D cephalometric landmarking and Frankfort Horizontal plane construction is a recurrent clinical issue in orthodontics and orthognathic surgery planning. In this study, we performed a repeatability and reproducibility analysis of conventional and 3D-specific cephalometric landmarks using a database of 20 randomly selected routine presurgical CT scans.
The first aim of our study was to assess landmarking reliability in a set of 33 landmarks containing "conventional", "foraminal" and "dental" landmarks. As in previously published studies, we ranked the landmarks based on the 95% CI results: landmark with clinically acceptable error when the 95% CI was below 1mm; landmark useful in most analyses when the 95% CI was between 1 and 2mm (highlighted in orange in Table 4); landmark to be used with caution when the 95% CI was above 2mm (highlighted in red in Table 4) (12,14). Using this classification, all "dental" and "foraminal" landmarks showed a clinically acceptable error or were considered useful in most analyses (16O, 36O, IF-L, IAF-L, IAF-R). The is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted August 7, 2021. ; https://doi.org/10.1101/2021.08.05.21261623 doi: medRxiv preprint group of "conventional" landmarks showed several landmarks to use with caution (B point, gonion right and left (Go-R, Go-L), Or-R, Or-L, Po-R, Po-L). These findings are in line with previously published reproducibility studies, in which "conventional" landmarks resulting from 2D cephalometric analysis are subject to caution (8)(9)(10)(11)(12)(13)(14)(15)(16)(17). As shown in the Bland-Altman plots (Figure 2), "foraminal" landmarks IAF-L, IF-L and MF-L lead to better repeatability and reproducibility than "conventional" landmarks Or-L and Po-L. We chose not to perform statistical tests such as intraclass correlation coefficients or paired t-tests because of their proven inadequacy in measuring landmarking reliability (25,26). Outliers were only found in mental foramen landmarks. The definition of this specific "new" landmark was initially ill-understood by one of the operators, who located the landmark at the distal end of the foramen instead of the mesial end, as was agreed upon (Supplementary Materials 2). This illustrates the challenges encountered with landmark identification, which requires very precise definitions in order to be reliable (13). The fact that our dataset was made of presurgical CT scans did not appear to influence the results negatively when compared with non-surgical subjects in the literature (8,9,(11)(12)(13)(14)(15)(16).
Our second objective was to assess the repeatability and reproducibility of vertical 3D cephalometric measurements using either the conventional FH plane or the newly proposed FH plane as reference for horizontal head reorientation. The measurements showed excellent repeatability and reproducibility (95% CI<1mm), using either FH plane as reference. These results tend to prove that despite the poor reliability of the landmarks used to construct them, both planes can be used as reliable horizontal references for head reorientation. An explanation could be that the poor reproducibility of Or, Po and IAF points mainly concerns the -x and -y coordinates. Our results show that a simple method using only 3 landmarks can provide a reliable horizontal reference for 3D head reorientation. Other methods, such as manual reorientation of the 3D model along the FH plane (19) or computation of additional semi-automated landmarks (20) have been shown to be reliable, but are more complex in terms of implementation.
Our third objective was to assess whether the conventional FH plane and the "new" FH plane were parallel. Our results regarding the vertical positions of IAF-L and IAF-R to the conventional FH plane, as well as the angular differences between the 2 planes, show that the planes are not parallel for most subjects. The vertical distance of IAF-L and IAF-R to the conventional FH plane showed significant variations in our cohort (Figure 3). This tends to invalidate the relevance of this "new" is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted August 7, 2021. ; https://doi.org/10.1101/2021.08.05.21261623 doi: medRxiv preprint to facilitate the comparison between our data and Pittayapat et al.'s, the angular distances between the conventional and all newly proposed FH planes are provided in Supplementary Materials 3. The discrepancies between the two studies might be explained by a different definition of the IAF-R and IAF-L points. We tried our best to follow the instructions given by the aforementioned authors to define these points (Supplementary Materials 1), but a more precise anatomical description might be needed.
The average duration of CT scan landmarking confirms that this is a time-consuming procedure, This study has two main limitations. Firstly, it was performed on CT scans instead of CBCT scans, which are much more commonly used for 3D cephalometric studies. We made this choice because CT scanning is the imaging modality currently used for orthognathic surgery planning in our department. We chose to include only presurgical patients in this study because they display a variety of significant craniofacial deformities for which 3D cephalometry could be very beneficial for in-depth evaluation (1,28). Secondly, the placement of most of our landmarks was carried out on the CT scans' 3D surface models, using the MPR views for adjustments and verifications. As has been reported previously, while the use of 3D surface models make the annotation process easier and more robust, it implies prior segmentation of the CT scans (11). Performing the segmentation process manually is tedious and time-consuming, but full automatization of the task, an active and promising research field, could resolve this problem (29). Given that most of the annotations were made on 3D surface models, we hypothesize that using CBCT scans instead would yield similar results.
Overall, our repeatability and reproducibility study on CT scans showed that "dental" and "foraminal" 3D landmarks tended to be more reliable than "conventional" cephalometric 3D landmarks in presurgical patients. Despite the poor overall reliability of the landmarks orbitale and porion in 3D, the conventional FH plane is a reliable horizontal reference for head reorientation and vertical measurements. The new FH plane, using IAF instead of porion, provided a reliable horizontal reference but was not found to be parallel to the conventional FH plane. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted August 7, 2021. ; https://doi.org/10.1101/2021.08.05.21261623 doi: medRxiv preprint . CC-BY 4.0 International license It is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted August 7, 2021. ; https://doi.org/10.1101/2021.08.05.21261623 doi: medRxiv preprint       Table 3: Definition of "dental" landmarks localized in our study (FDI World Dental Federation notation for teeth numbering) Table 4: 95% confidence interval (2*SD) of repeatability and reproducibility of the landmarks (mm), following the ISO 5725 standard. Values between 1 and 2mm are highlighted in orange, and values superior to 2mm are highlighted in red. Repet., repeatability; Repro., reproducibility; SD, standard deviation; L/R: Left/Right.  is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted August 7, 2021. ; https://doi.org/10.1101/2021.08.05.21261623 doi: medRxiv preprint