Digital re-classification of equivocal dysplastic urothelial lesions using morphologic and immunohistologic analysis

A precise diagnostic of precursor dysplastic urothelial lesions is critical for patients but it can be a challenge for pathologists. Multiple immunohistologic markers (panel) improve ambiguous diagnostics but results are subjective, with a high degree of observational variability. Our research objective was to evaluate how a classification algorithm may help morphology diagnostic. Data coming from 45 unequivocal cases of flat urothelial lesions (the training set: 20 carcinomas in situ, 8 dysplastic and 17 reactive lesions) were used as ground truth in training a random tree classification algorithm. 50 atypia of unknown significance diagnostics (diagnostic set) were digitally re-classified based on morphological and immunohistochemical features as possible carcinoma in situ (20), dysplastic (17) and reactive atypia cases (13). The main sorting criterium was morphologic (nuclear area). A four-markers panel was used for a precise classification (74% correctly classified, 93% accuracy, 76% precision, averaged ROC=0.828). 3 cases were false negative. The performance of the immunohistologic panel was evaluated based on a stain index, calculated for CD20, p53, Ki67 and observed for CD44. Within training set, the immunohistologic performance was high. In the diagnostic set both the percentage of high stain index for each marker and the percentage of cases with 2-3 strong markers were low, explaining the initial high number of equivocal cases. In conclusion, digital analysis of morphologic and immunohistologic features may bring clarification in classification of equivocal urothelial lesions. Computational pathology supports diagnostic process as it can measure features and handle data in a precise, reproducible and objective way. In our proof of concept study, a low number of cases and the (deliberate) absence of clinical data were main limitations. Validation of the method on a high number of cases, use of genomics and clinical data are essential for improving the reliability of machine learning classification


Introduction
A precise diagnostic of flat urinary bladder lesions, critical for the treatment and prognosis of patients, can be difficult even for experienced uropathologists [1]. The most recent and widely (RH) and flat hyperplasia as part of a diagnostic continuum [2]. It is recommended to separate these diagnostic entities based on morphologic features [3] but a clear distinction is not always easy as bioptic tissue is scarce or fragmented, the normal urothelium histology is complex and important morphological features might be shared between normal, reactive and malign tissue [4]. CIS is described as a high-grade flat intraepithelial lesion composed of malignant cells with large hyper-chromatic nuclei (5-6x larger than a lymphocyte), serious pleomorphism and frequent mitoses, having a high propensity of developing muscle invasive cancers [5]. DL is tissue with recognizable pre-neoplastic cytological and architectural changes that are under the threshold of diagnostic of CIS [6]. RH shows smaller nuclei (2-3x the diameter of a lymphocyte nucleus), organized tissue structure, less nucleolar persistence [7] and is often associated with benign urologic pathology. A new diagnostic category, "Atypia of Unknown Significance" (AUS), sometimes named "urothelial proliferation of uncertain malignant potential" was recently added in an attempt to nest cases where a clear differentiation is difficult. AUS is still a subject of controversy and some authors recommend to avoid it as there is no robust evidence that patient outcomes are different from reactive atypia [8,9]. Because of overlapping morphology, a wide degree of subjectivity and inter-observational variability was reported mainly in equivocal cases [10,11]. Immunohistochemistry (IHC), a histology technique that allows a precise detection of specific proteins based on the antibody-antigen reaction, may improve precision when morphologic characteristics are not enough for a clear diagnostic, when the history of the disease is unknown or there is an unusual morphologic presentation [12]. After 20 years of experience, caution is now suggested in IHC interpretation of urothelial lesions [13] as normal tissue may share some degree of staining patterns and there is no single specific/sensitive marker that can make a difference between diagnostics. A way to circumvent the IHC ambiguities is provided by the concomitant use (panels) of several IHC markers [3]. Concomitant use of 4 markers (IHC panel composed of CK20, CD44, p53, and KI-67) demonstrated a clear adjuvant value to morphology [14][15][16]. CD20 (cytoplasmatic) is widely recognized as the most reliable CIS associated marker [17]. KI67 (nuclear) is seen as an aggressiveness and prognostic indicator [18], usually evaluat-ed in connection with CD20 [19]. p53 (nuclear) is present in more than 50% of CIS cases and is often evaluated in conjunction with CD20 as well [20]. CD44 (membranous) is considered an exclusion marker [21,22]. Even with the help of IHC panels, a precise diagnostic is not always possible and many cases are still classified as AUS.
Computational pathology (CP) help improving the histologic exam accuracy [23]. Digital image analysis (DIA) can assess any type of microscopic image, including IHC [24]. DIA consists of image data acquisition and ground truth generation, image analysis (object detection, segmentation and recognition) and finally statistical evaluation of digital data [25]. When data is complex or equivocal, machine learning (ML) techniques can use mathematical models (algorithms) for statistical analysis. In the case of flat urothelial lesions diagnostic, CP can use an algorithm to analyze DIA data and generate associations in an objective, measurable and reproducible way.
Our research objective was to evaluate how a decision tree algorithm can reclassify the AUS diagnostics after a supervised learning instruction process [26], using both morphology and IHC (4-panel) data.

Material and methods
In this retrospective non-interventional analysis, precursor urothelial diagnostics found over a period of 3 years were re-evaluated by the principal investigator. All cases had a similar histology evaluation (Hematoxylin eosin and panel IHC) and were processed and stained using the same methodology (details concerning IHC characteristics and technique are provided in annex 1). Aside demographic info (sex and age), no other clinical information was available for this study.
Based on initial diagnostic classification, 2 datasets were created. An unequivocal "training" dataset (CIS, DL and RL) and equivocal "diagnostic" dataset (AUS). Images from both datasets were digitally analyzed in 2 steps: Data gathering (microscopic image capture, image segmentation, features measurement) was followed by data processing (data analysis, algorithm training and re-classification).

Data gathering
Microscopic images were captured using standard microscopy (objective x20 and x40, camera 12MP) and stored on a desktop computer.
1. For morphology analysis, x20 images were manually segmented [27,28] [29]. Binary images were used for digital measurement of nuclear area surface and roundness (length over width ratio) (image 1). The nuclear differentiation area threshold was based on previously published data [30,31]. Structural changes of . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 6, 2020. . https://doi.org/10.1101/2020.10.04.20206524 doi: medRxiv preprint layers and nuclear atypia were evaluated using a semiquantitative scale (yes, no), by direct observation on binary images.
2. IHC markers evaluation was performed on x40 segmented images, in 2 steps. First, cytoplasmatic and nuclear markers images were segmented (3

or 4 labels). A Region of
Interest (ROI) was delineated by human pathologist in an area of maximal stain intensity.
Using a dedicated plugin (IHC profiler), the total stained surface was measured in terms of pixel intensity of DAB stain (image 3) [33]. Finally, an IHC stain index (SI) was calculated for each IHC marker (adding only high positive and positive pixels and multiplying the sum by ODI value).
3. CD44, as membranous marker, was measured semi-quantitatively (positive, absent or patchy). . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 6, 2020.

Data processing
A random decision tree classification algorithm was used for re-classification of all AUS cases.
The "training" dataset (including all cases with a clear diagnostic of CIS, RH, DL and 5 cases of normal urothelium -50 instances) was considered ground truth and was used for algorithm training. The "diagnostic" dataset included all cases of equivocal atypia discovered in the clinic in the same period of time (50 instances).
As both datasets were small (100 instances, 16 attributes each) we opted for an open source machine learning toolkit (WEKA) software solution that is widely accepted in bioinformatics, has a good graphical interface and requires no programming [34]. A random tree classification algorithm was selected over the classic "c4.5" as graphical representation of the decision is as intuitive but cross validating classification results were better. Datasets attributes, quantification methodology and cross-validation results are detailed in annex 1. The structure of the random tree is presented in image 4.
Based on intensity of stain for each marker, we recorded the percentage of positive cases in each group (SI threshold for CD20 and p53 > 50% and for Ki67 >15%). We also counted the concomitant presence of strong stain for each marker in neoplastic against non-neoplastic groups in both training and diagnostic sets. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Statistics.
All data was stored in a standard spreadsheet. Numerical data (age, nuclear diameter and nuclear area) was analyzed using MedCalc Statistical software (version 19.0.7, MedCalc Software bvba, Ostend, Belgium). Basic statistic results were expressed as mean ± standard deviation.
Differences between means were tested for significance with a p-value set at p<0.05.

Ethics.
Patients provided written, informed consent before surgery specifically agreeing with the processing and analyze of the pathology urothelium samples. The retrospective, non interventional IHC study protocol was approved by the hospital IRB (1123/2020).  Table 1 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 6, 2020. .  . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 6, 2020. .

Discussion
Urothelial carcinoma is the 10th most common and the second most frequent genitourinary malignancy, worldwide [35]. As urothelial carcinoma has a high recurrence rate, an early, precise differentiation of precursor neoplasms from benign reactive atypia is critical [36]. Because many flat lesions morphologic and IHC features may overlap or are shared by normal urothelium, the diagnostic is sometimes equivocal with a possible increase of medico-legal risk [37].
Our research objective was to evaluate how a machine learning algorithm can reclassify ambiguous precursor flat lesions diagnostics that were previously included under the AUS diagnostic category. A random decision tree classification algorithm was trained using data from all un- In the classification process, as expected, the nuclear area played a central role in differentiating precursor malign from benign diagnostics. Further, layer preservation, circularity and IHC CD44 0%y/0%pt/100%n 0%y/11%pt/89%n 0%y/46%pt/54%n . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 6, 2020. . calculated index were used at different levels of classification. Reported differences between the nuclear area measured in diagnostic set were by 9.7% smaller than in the training set (p=0. 35 For HL patients, results were negative for all inclusion markers except Ki67 in 5 cases. All cases in the diagnostic dataset were CD44 negative (13 cases were considered "patchy"). Table 4 Table 4 IHC panel performance based on staining intensity for different diagnostic groups in training and diagnostic datasets (after ML re-classification) Overall, the random tree algorithm classification had a good performance (74% correctly classified, 93% accuracy, 76% precision) when measured against the ground truth. The receiver operating characteristics (ROC) for CIS was 0.817, for DL was 0.741 and for HL 0.866. Important, only 3 cases were "false negative" (one CIS and one DL cases were classified as HL and one HL as DL) (confusion matrix . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 6, 2020. . Table 3   Table 3. Confusion matrix, Random tree classification, training data set, (*) false negative cases meaning malign cases classified as benign or benign as malign. Mis-classification within same classes were not penalized (CIS as DL or HL as normal) In conclusion, urothelial flat lesion classification can be a diagnostic challenge on bioptic material. Usually, a distinction between malign and benign cases is made using morphologic features and IHC is used mainly for confirmatory reasons. Both diagnostic methods are subjective and may have inter and intra observational variability and may generate ambiguous diagnostics. Our objective was to evaluate how CP can be used to circumvent flat lesions diagnostics uncertainty.
Digital data from 45 unequivocal urothelial flat lesions data were used for training a ML algorithm (random tree). Main criteria for algorithm classification were morphological and a calculated IHC stain index coming from a classical panel of 4 IHC markers. The performance of the classification algorithm was good: 93% accuracy and 76% precision, averaged ROC 0.828.
Only 3 cases were classified as "false negative". Based on this "ground truth", 50 equivocal diagnostics (AUS) were reclassified as 40% CIS, 34% DL and 26% HL. The IHC panel performance in classification was finally judged based on each of the 4 markers SI: in the training set, the panel performed well but in the diagnostic set, the performance was overall low. is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 6, 2020. . 37. Troxel DB., Trends in Pathology Malpractice claims, Am J Surg Path 2012 e1-e5 Annex Table 1a   Table 1a.

IHC Markers Characteristics
Histologic staining was performed on 4-µm-thick sections from formalin-fixed, paraffin-embedded tissue blocks. is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted October 6, 2020. . https://doi.org/10.1101/2020.10.04.20206524 doi: medRxiv preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted October 6, 2020. No AUS data was provided for training in order to avoid the risk of overfitting. Neoplastic cases (CIS and DL) were considered false negative only if classified as benign. 7% of neoplastic and 5.8% of benign cases were falsely classified as negative.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted October 6, 2020. . https://doi.org/10.1101/2020.10.04.20206524 doi: medRxiv preprint