Artificial Intelligence Neural Network Consistently Interprets Lung Ultrasound Artifacts in Hospitalized Patients: A Prospective Observational Study

Background: Interpretation of lung ultrasound artifacts by clinicians can be inconsistent. Artificial intelligence (AI) may perform this task more consistently.
Research Question: Can AI characterize lung ultrasound artifacts similarly to humans, and can AI interpretation be corroborated by clinical data?
Study Design and Methods: Lung sonograms (n=665) from a convenience sample of 172 subjects were prospectively obtained using a pre-specified protocol and matched to clinical and radiographic data. Three investigators scored sonograms for A-lines and B-lines. AI was trained using 142 subjects and then tested on a separate dataset of 30 patients. Three radiologists scored similar anatomic regions of contemporary radiographs for interstitial and alveolar infiltrates to corroborate sonographic findings. The ratio of oxyhemoglobin saturation to fraction of inspired oxygen (S/F) was also used for comparison. The primary outcome was the intraclass correlation coefficient (ICC) between the median investigator scoring of artifacts and AI interpretation.
Results: In the test set, the correlation between the median investigator score and the AI score was moderate to good for A-lines (ICC 0.73, 95% CI [0.53-0.89]) and moderate for B-lines (ICC 0.66, 95% CI [0.55-0.75]). The degree of variability between the AI score and the median investigator score for each video was similar to the variability between each investigator's score and the median score. The correlation among radiologists was moderate (ICC 0.59, 95% CI [0.52-0.82]) for interstitial infiltrates and poor for alveolar infiltrates (ICC 0.33, 95% CI [0.07-0.58]). There was a statistically significant correlation between AI-scored B-lines and the degree of interstitial opacities for five of six lung zones. Neither AI-scored nor human-scored artifacts were consistently associated with S/F.
Interpretation: Using a limited dataset, we showed that AI can interpret lung ultrasound A-lines and B-lines in a fashion that could be clinically useful.

We hypothesized that an AI network could be trained to decode lung ultrasound artifacts in a fashion similar to humans. To test this hypothesis, we prospectively enrolled patients in a study investigating the correlations among human and AI scoring of lung US. We corroborated AI sonographic findings with selected clinical and radiographic variables. The specific reverberation artifacts chosen were A-lines and B-lines.

Patients were scanned with a point-of-care ultrasound system (X-Porte, Fujifilm Sonosite, Bothell, WA), using a linear array probe (HFL38xp/13-6 MHz) and the following presets: depth 6 cm, near field gain 0%, far field gain 100%, mechanical index 0.5, tissue index 0.2, tissue harmonics off. Patients were

medRxiv preprint doi: https://doi.org/10.1101/2023.03.02.23286687; this version posted March 5, 2023. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

(Table 1).

were scored as being present (had probability scores greater than 50%), the bold A-line option was chosen. In another example, if "few" and "many/coalescing" B-lines were scored as being present, the clip was scored as having "many/coalescing" B-lines. This determination was made a priori and is consistent with clinician scoring. It was not anticipated prior to data interpretation that there would be clips where AI was unable to identify a B-line pattern with a probability greater than 50%. In these instances we chose the B-line pattern with the highest probability, even if that probability was less than 50%.
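The rule described above for turning per-category AI probabilities into a single clip score can be sketched as follows. This is a minimal illustration, not the study's implementation: the function name `resolve_category`, the category lists, and the "none" and "faint" labels are assumptions introduced here, and only "bold", "few", "many/coalescing", and "white lung" appear in the text.

```python
# Hypothetical category orderings, from least to most pronounced.
# "none" and "faint" are illustrative labels, not taken from the study.
A_LINE_ORDER = ["none", "faint", "bold"]
B_LINE_ORDER = ["none", "few", "many/coalescing", "white lung"]


def resolve_category(probabilities, ordered_labels, threshold=0.5):
    """Collapse per-category probabilities into one label for a clip.

    If one or more categories exceed the threshold, the most pronounced
    of those categories is chosen. If none exceeds the threshold, the
    category with the highest probability is chosen instead.
    """
    present = [
        label for label in ordered_labels
        if probabilities.get(label, 0.0) > threshold
    ]
    if present:
        # ordered_labels runs from least to most pronounced.
        return present[-1]
    return max(ordered_labels, key=lambda label: probabilities.get(label, 0.0))


# Two B-line categories scored as present: the more severe one is kept.
both_present = resolve_category(
    {"few": 0.62, "many/coalescing": 0.71}, B_LINE_ORDER
)

# No category above 50%: fall back to the most probable category.
fallback = resolve_category(
    {"none": 0.30, "few": 0.41, "many/coalescing": 0.22, "white lung": 0.07},
    B_LINE_ORDER,
)
```

Keeping the category order explicit in a list makes the tie-breaking rule easy to audit, since the a priori choice of the more pronounced artifact is expressed by list position alone.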

Clinical Data

In the test set, the following demographic and clinical descriptors were obtained at the time of each exam: diagnosis for specific conditions is included in the online supplement (S1 Table).

Statistical Analysis

The primary outcome was the intraclass correlation coefficient (ICC) between the median investigator scoring of artifacts and AI interpretation.

years; the largest group was admitted to an ICU (40%); and there was a relatively equal mix of men (53%) and women (47%). The most common diagnoses were decompensated heart failure (n=7) and COVID-19 pneumonia (n=7), with bacterial pneumonia, chronic obstructive pulmonary disease (COPD) exacerbations, pleural effusion, and interstitial lung disease making up a minority of diagnoses (Table 3).
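The primary outcome named under Statistical Analysis, the ICC between the median investigator score and the AI score, can be illustrated with a short sketch. The text does not state which ICC form was used, so the two-way random-effects, absolute-agreement, single-rater form, ICC(2,1), is assumed here. The function names and all scores below are invented for illustration.

```python
from statistics import mean, median


def median_scores(scores_by_rater):
    """Per-clip median across raters (one list of clip scores per rater)."""
    return [median(clip_scores) for clip_scores in zip(*scores_by_rater)]


def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per clip and one column per rater.

    Assumed form: two-way random effects, absolute agreement, single rater.
    """
    n = len(ratings)          # number of clips
    k = len(ratings[0])       # number of raters
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(col) for col in zip(*ratings)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Invented B-line scores (0-3) for six clips from three investigators and AI.
investigators = [
    [0, 1, 2, 3, 1, 0],
    [0, 1, 3, 3, 1, 1],
    [1, 1, 2, 2, 1, 0],
]
ai_scores = [0, 1, 2, 2, 2, 0]

medians = median_scores(investigators)
paired = [list(pair) for pair in zip(medians, ai_scores)]
icc_ai_vs_median = icc_2_1(paired)
```

Treating the median investigator score and the AI score as two "raters" of the same clips mirrors the comparison described in the text. A statistics package would also be needed to obtain the confidence intervals reported alongside each ICC.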

Table). Although many clinicians use radiology to inform patient care of respiratory diseases, the ICC among radiologist scoring of both interstitial and alveolar infiltrates was moderate to poor (Table 7).

association was found between the density of interstitial opacities and the strength of A-lines as scored by either AI or investigators using an ANOVA (S6 and S7 Tables). Box plots of B-lines versus interstitial

denser interstitial markings on chest radiographs (Figure 3).
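The ANOVA comparison mentioned above can be sketched as a one-way test that groups artifact scores by radiographic opacity grade. This is only an illustration under stated assumptions: the grouping direction (A-line scores grouped by opacity grade), the function name `one_way_anova_f`, and every value below are hypothetical and are not data from the study.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: one list of scores per factor level. The within-group sum of
    squares must be non-zero, otherwise F is undefined.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented A-line strength scores grouped by interstitial opacity grade.
a_line_by_opacity = [
    [2, 2, 1, 2],  # hypothetical grade: none
    [1, 2, 1, 1],  # hypothetical grade: mild
    [0, 1, 1, 0],  # hypothetical grade: dense
]
f_stat, df_between, df_within = one_way_anova_f(a_line_by_opacity)
```

The F statistic alone does not give a p-value. That requires the F distribution with `df_between` and `df_within` degrees of freedom, which a statistics package would normally supply.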

Human sonographers show significant variability in the scoring of lung ultrasound artifacts. In spite of this unwanted scoring heterogeneity, point-of-care ultrasound is commonly used to inform patient care decisions. We also observed variability in our human scoring of lung ultrasound artifacts; furthermore, the degree of variability was in line with existing evidence on this topic [7,18].

Unlike human scoring, a fully trained AI network holds the promise of yielding highly reproducible results. In the present study, we observed a moderate correlation between AI and investigator interpretation of A-lines, indicating that AI interpreted clips similarly to investigators for this artifact.

There was a weaker correlation between AI and investigator scoring of B-lines, although the degree of

the test set as having the highest B-line severity (3 or "white lung"), perhaps because too few clips of this severity were included in the training set. However, ultrasound clips scored by clinicians as having the highest severity B-line score had wide confidence intervals, which may indicate that this ultrasound finding is not a reliable indicator of worsening interstitial disease.

It was less clear how alveolar opacities on chest radiographs would correlate with lung ultrasound interpretation. In this dataset, the degree of alveolar opacification, as adjudicated by radiologists, was inversely correlated with the boldness of A-lines as interpreted by AI in three of six lung zones.

Somewhat surprisingly, the B-line artifact was not a reliable predictor of alveolar infiltrates on chest radiographs (S12 and S13 Tables).

These studies often do not attempt to analyze AI artifact identification beyond its similarity to human interpretation [23].

There are two novel aspects to the present study that extend previous observations. First, we tasked AI with characterizing more artifacts in more detail than previous studies. Second, we matched AI ultrasound artifact interpretations not only to human interpretation of the same sonogram, but also to that of an

represented with clips taken at the first and fourth BLUE points as opposed to clips more caudally located on the thorax, which may have contributed to the lower reliability in AI rating at these anatomic locations.

Fifth, only two artifacts were measured in this study, and some patients' pathology cannot be characterized using these artifacts alone, such as those with pleural effusions. Sixth, we recognize that in very obese patients, there may be more than three centimeters of soft tissue between the skin and the pleural line, which might have limited our ability to detect A-line artifacts. In this uncommon occurrence, we attempted to compress the probe against the skin until the skin-to-pleural distance spanned less than 3 cm.
the accuracy of the data analysis. TF, LH, VP, GG, DS, FB, RD, HT, PL, DM, KZ, and BD acquired data.

TF, GG, AK, JG, and BD contributed to the conception and design of the study. TF and GG performed statistical analyses. All authors participated in the interpretation of the data, provided critical feedback and final approval for submission, and took responsibility for the accuracy, completeness, and protocol adherence of data and analyses.