Diagnosing Influenza Infection from Pharyngeal Images using Deep Learning: Machine Learning Approach

Background: Influenza is a major global burden of disease, causing annual epidemics and occasionally, pandemics. Given that influenza primarily infects the upper respiratory system, influenza infection may be able to be diagnosed by applying deep learning to pharyngeal images. Objective: We aimed to develop a deep learning model to diagnose influenza infection using the data on pharyngeal images and clinical information. Methods: We recruited patients who visited clinics and hospitals due to influenza-like symptoms. In the training stage, we developed a diagnostic prediction artificial intelligence (AI) model based on deep learning to predict polymerase chain reaction (PCR)-confirmed influenza from pharyngeal images and clinical information. In the validation stage, we assessed the diagnostic performance of the AI model. In the additional analysis, we compared the diagnostic performance of the AI model with that of three physicians, and also interpreted the AI model using the importance heatmaps. Results: A total of 7,831 patients were enrolled at 64 hospitals between Nov 1, 2019 and Jan 21, 2020 in the training stage, and 659 patients (including 196 patients with PCR-confirmed influenza) at 11 hospitals between Jan 25, 2020 and Mar 13, 2020 in the validation stage. The area under the receiver operating characteristic curve of the AI model was 0.90 (95% confidence interval, 0.87-0.93), and its sensitivity and specificity were 76% (70-82%) and 88% (85-91%), respectively, outperforming three physicians. In the importance heatmaps, the AI model often focused on follicles on the posterior pharyngeal wall. Conclusions: We developed the first AI model that can accurately diagnose influenza from pharyngeal images, which has the potential to assist physicians make timely diagnosis.


Introduction
According to the Global Burden of Disease Study 2016, influenza is a major global burden of disease estimated to be the cause of 39.1 million acute lower respiratory infection episodes and 58,200 deaths [1]. It is estimated that influenza is responsible for 291,243 to 645,832 seasonal respiratory deaths (4.0-8.8 per 100,000 individuals) occur annually [2]. The timely and accurate diagnosis of influenza has the potential to prevent wide transmission of virus within the population and subsequent epidemic and pandemic, as well as the unnecessary prescription of antibiotics in primary care, which is a cause of emerging antibiotic-resistant bacteria. Moreover, early interventions such as hydration and antiviral drugs are expected to reduce the mortality risk among high-risk patients, including the elderly and individuals with comorbidities.
Although there are various tests currently used to diagnose influenza infection, the COVID-19 pandemic and the surge in the use of telemedicine highlighted the importance of accurately diagnosing it without increasing the risk of spreading virus through physical interactions. The gold-standard method of the diagnosis of influenza infection is the reversetranscription polymerase chain reaction (RT-PCR) of nasopharyngeal aspirates or swabs [3,4]; however, RT-PCR is not easily performed in primary care, and time to results could hamper the timely diagnosis and preventive/treatment interventions. A more commonly used test is the rapid immunochromatographic antigen detection tests, but compared with RT-PCR, their validity is modest and varies across studies [5,6]. Both of these tests cannot be performed through telemedicine, whereas the sensitivity and specificity of diagnosing influenza only with clinical information are suboptimal [7,8]. Given that more patients are diagnosed through telemedicine in recent years, an alternative test of influenza that could be conducted through telemedicine is warranted.
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 19, 2022. ; https://doi.org/10.1101/2022.07. 19.22276126 doi: medRxiv preprint To address this important knowledge gap, we developed a deep learning model to diagnose influenza infection using the data on pharyngeal images and clinical information. We tested the performance of the diagnostic prediction artificial intelligence (AI) model using the data on the real-world patient population, and also compared it with the diagnostic performance of three physicians. We also investigated the regions of the pharynx on which the AI model focused to differentiate between individuals with and without influenza infection.

Pilot study to develop a medical camera to capture standardized pharyngeal images
For our pilot study, we recruited 4,765 patients aged 6-90 years with influenza-like symptoms who visited 37 clinics or hospitals between Nov 28, 2018 and Feb 4, 2019 (registered as jRCTs032180041). We developed a pharyngeal camera with a light emitting diode light source and a disposable clear cap to hold down the tongue of patients, to capture images of the pharynx in a standardized manner (Figure 1). In this pilot study, we adjusted the size of the pharyngeal camera and tongue depressors to be suitable for many patients. The device contains a full high-definition digital camera and is connected via Wi-Fi to a cloud service for the analysis of the pharyngeal images, together with clinical information. We also improved the image quality of the camera, such as resolution, brightness, and contrast, during this pilot study. We used a rapid continuous shooting function to obtain high-quality pharyngeal images in a short time by avoiding motion blur. The camera can capture an image every 0.3 seconds, and 30 sequential images are captured per shot.

Study design and participants
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 19, 2022 The current study included a training stage (registered as jRCTs032190120) and validation stage (registered as Pharmaceuticals and Medical Devices Agency clinical trial identification code AI-02-01). We enrolled patients with influenza-like symptoms who visited clinics or hospitals and satisfied the following inclusion and exclusion criteria at 64 hospitals between Nov 1, 2019 and Jan 21, 2020 in the training stage, and 11 hospitals between Jan 25, 2020 and Mar 13, 2020 in the validation stage. The list of study sites is shown in the Supplementary Table 1.
The inclusion criteria were (i) patients with written consent to participate in the study provided by themselves or their parent (if they were younger than 18 years), (ii) those aged six years or older, and (iii) those that satisfied at least one of the following four conditions in the training stage and at least two in the validation stage: (a) body temperature of 37.0°C or higher, (b) systematic influenza-like symptoms, such as joint pain, muscle pain, headache, tiredness, and appetite loss, (c) respiratory symptoms, such as cough, sore throat, and nasal discharge or congestion, and (d) an episode of close contact with patients with influenza or influenza-like symptoms within three days, or in any other scenario in which the consulting physician suspected influenza infection. The exclusion criteria included (i) those with fluctuating teeth, (ii) those with severe oral lesions, (iii) those with severe nausea, (iv) those with difficulty in opening the mouth sufficiently for the use of the camera (e.g., small mouth, temporomandibular joint pain, incompatibility of dentures, disturbed consciousness, or respiratory failure), (v) those who had participated in another clinical trial within seven days prior, those who were scheduled to participate in another clinical trial (excluding postmarketing surveillance), or those with difficulty in follow-up for mental, family, social, geographical, or other reasons, (vi) pediatric patients who clearly did not agree to participate in the study, and (vii) those judged to be inappropriate to participate in the study by the . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 19, 2022. ; https://doi.org/10.1101/2022.07.19.22276126 doi: medRxiv preprint responsible physician at each site. Additionally, patients with only poor-quality images were excluded from the analysis.
In the training stage, we aimed to collect clinical information and pharyngeal images of PCRconfirmed influenza positive and negative patients in a ratio of roughly 1:1 for the most efficient supervised learning of the AI model. There is no consensus on the size of samples (i.e., number of patients) that should be used to train an AI model; thus, we arbitrarily set them to 7,500 patients, including 3,750 influenza PCR-positive and 3,750 PCR-negative. In the validation stage, we aimed to determine the lower bound of the 95% one-sided confidence interval (CI) of sensitivity to achieve 70% or higher, and that of specificity to achieve 85% or higher. With a one-sided p value of 5% and power of 85%, assuming an actual sensitivity of 80% and specificity of 90% as suggested by our training stage, we calculated the required sample sizes to be 137 for influenza PCR-positive cases and 323 for PCR-negative cases.
Therefore, we planned to stop the recruitment of the study participants on the day that both 150 positive cases and 350 negative cases were obtained.
In Japan, the first case of the SARS-CoV-2 infection (COVID-19) was reported on Jan 15, 2020, and the first wave of the pandemic occurred from late March 2020. During the study period, in the validation stage, we asked the participating clinics or hospitals to report any suspected cases of COVID-19 in the study participants. There were no such reports from any study site throughout the research, which suggests that our study was not affected by the COVID-19 pandemic.

Collection of pharyngeal images, clinical information, and nasopharyngeal specimens
In addition to the pharyngeal images of the study participants, the following clinical information was obtained using a standardized case report form based on electronic data . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 19, 2022. ; https://doi.org/10.1101/2022.07.19.22276126 doi: medRxiv preprint capture: age; sex; time (hours) from symptom onset; highest body temperature before study site visit; episode of close contact; status and date of the most recent influenza vaccination; use of antipyretics; subjective symptoms, including tiredness, appetite loss, chill, sweating, joint pain, muscle pain, headache, nasal discharge or congestion, cough, sore throat, and digestive symptoms; and objective findings at the study sites, including body temperature, pulse rate, and tonsillar findings (tonsillitis, white moss, and redness) according to the consulting physician.
Furthermore, nasopharyngeal swabbing was conducted to obtain nasopharyngeal specimens from the participants, which were sent to the central clinical laboratory (LSI Medience Corporation, Tokyo, Japan) for RT-PCR, which is the gold standard (reference standard) for diagnosis of influenza infection. We standardized the process of collecting the nasopharyngeal specimens among the study sites using a manual.

Development of the AI model to predict PCR-confirmed influenza
An ensemble AI model (version 2.0) was developed to predict the probability of PCRconfirmed influenza using pharyngeal images and clinical information. This model consisted of three main machine learning models: a multi-view convolutional neural network (MV-CNN), multi-modal convolutional neural network (MM-CNN), and boosting models. The three models were trained on the same training dataset from the training stage and combined using ridge regression [9].
First, the MV-CNN was trained using SE-ResNext-50 as an image feature extractor pretrained on ImageNet [10,11]. The MV-CNN architecture used several pharyngeal images that contained views from various angles [12]. In pharyngeal imaging, the tongue and uvula often overlap with the posterior pharyngeal wall. The MV-CNN addressed this issue by . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 19, 2022. ; https://doi.org/10.1101/2022.07.19.22276126 doi: medRxiv preprint gathering information from various image angles. One to five of the most appropriate images per patient were selected by an automatic image quality evaluation system using a lightweight CNN model [13]. The system was trained using image quality criteria defined by a physician among the authors (MF) in the training stage. The input images for the MV-CNN were resized and then augmented (e.g., flipped, rotated, blurred, and contrast changed) to improve accuracy and generalization performance. To prevent overfitting, well-established training strategies were used, including batch normalization, learning rate decay, and crossvalidation. To manage various pharyngeal magnification rates, MV-CNNs with multiple image sizes were trained and their scores were combined by averaging them.
Second, the MM-CNN was developed based on the MV-CNN to process both multi-view pharyngeal images and clinical information as input data [14,15]. In detail, the final classification layer of the MV-CNN was extended and connected to the neural network to manage clinical information. The image feature extractor of the MM-CNN was initialized with the trained MV-CNN weights. Then, the same training and ensemble strategy as the MV-CNN were applied.
Third, boosting models were trained based on the prediction results of the MV-CNN and clinical information. LightGBM and CatBoost were selected as boosting models [16,17].
Finally, the probability of influenza was obtained by integrating each prediction from the MV-CNN, MM-CNN, and boosting models using ridge regression. The ridge regression parameters were determined using cross-validation.

Statistical analysis
In the training stage, we compared the clinical characteristics of the study participants according to the PCR test result (positive or negative) using t-tests for continuous variables . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) 1 0 with a normal distribution (age, highest body temperature before the study site visit, body temperature at visit, and pulse rate), the Mann-Whitney U test for continuous variables with a non-normal distribution (time from symptom onset), and chi-square tests for categorical variables. We repeated these analyses in the validation stage.
In the training stage, using a five-fold cross-validation method, we conducted receiver operating characteristic (ROC) curve analysis to measure the discrimination ability of (i) the probability score of the MV-CNN, which uses only pharyngeal images in prediction, (ii) the probability score of the clinical information AI, which is an AI model that uses all the other aforementioned clinical information (except for the pharyngeal images) in prediction, and (iii) the probability score of the ensemble AI model using both the pharyngeal images and clinical information. We also measured the reclassification ability of the pharyngeal images by comparing the clinical information AI model and the ensemble AI model by calculating the continuous net reclassification improvement (NRI) and integrated discrimination improvement (IDI) [18].
In the validation stage, we also conducted ROC analysis and calculated the sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) for influenza infection, according to a selected cut-off point.
We performed statistical analysis using R (version 4.1.1) and Python (version 3.8.5). P values of < 0.05 were considered to be statistically significant. A third-party organization (Statcom Co., Ltd., Tokyo, Japan) performed the sample size estimation, and calculation of the area under the receiver operating characteristic curve (AUROC) and validity (sensitivity, specificity, PPV, and NPV) in the validation stage.

Additional analysis
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) 1 1 We conducted two types of additional analysis. First, we compared the performance of the AI-assisted diagnosis camera with that of three physicians. For this analysis, we used the existing data (pharyngeal images and clinical information) of 200 patients (100 influenza PCR-positive cases and 100 PCR-negative cases), which was randomly selected from the study participants in the training stage. Three physicians among the authors (SO, MF, and MIk), who were blinded to patients' identifiers and their PCR test results, assessed the data to give an influenza prediction score between 0 and 1 (i.e., between 0% and 100%). We applied the diagnostic prediction AI model to these existing data, and compared the AUROC of the diagnostic prediction AI model with that of each physician and the average prediction score of the three physicians. We recalculated the AUROC of the AI model for these 200 patients for a fair comparison.
Second, we attempted to interpret the mechanisms of the MV-CNN prediction to differentiate between influenza cases and non-cases from the pharyngeal images. We modified guided is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 19, 2022. ; https://doi.org/10.1101/2022.07.19.22276126 doi: medRxiv preprint CNN among the 100 PCR-positive cases and 100 PCR-negative cases, and compared the groups using chi-square tests.

The training stage
In the training stage, we obtained informed consent from 9,029 patients with influenza-like symptoms who visited one of 64 clinics or hospitals between Nov 1, 2019 and Jan 21, 2020.
Among them, 199 patients (2.2%) felt nauseous during the examination when the pharyngeal images were being captured, including one patient with severe nausea and 14 patients (0.16%) who vomited. We did not complete the image-capturing procedure for these 15 patients (0.16%). Among the remaining 9,014 patients, we selected 7,831 patients (mean age  Compared with the PCR-negative cases, the PCR-positive cases yielded the following results: the average age was slightly lower; time from symptom onset to the study site visit was shorter; the proportion of close contact, use of antipyretics, and most subjective symptoms were higher; and the temperature and pulse rate were higher, whereas the proportion of recent influenza vaccinations, digestive symptoms, and tonsillar findings were lower. There was no difference in the proportion of sex and sore throat between the groups.
Using the training dataset, we established the ensemble AI model to estimate the probability of influenza for individual patients. The feature importance of each variable in the LightGBM . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 19, 2022. ; https://doi.org/10.1101/2022.07.19.22276126 doi: medRxiv preprint model and CatBoost model is shown in the Supplementary Figures 1 and 2, which suggest that pharyngeal images were the most important variable in the diagnostic prediction AI model, followed by body temperature and cough.
In the five-fold cross-validation, the AUROC of the MV-CNN probability score for pharyngeal images was 0.76 (95% CI 0.75-0.77) and that of the AI model with clinical information (i.e., all the clinical information in Table 1) was 0.83 (95% CI 0.82-0.84) (Figure 2). Taken together, the AUROC of the diagnostic prediction AI model was 0.87 (95% CI 0.86-0.87), which means that the AUROC significantly increased as a result of adding pharyngeal images to the AI model with clinical information (p < 0.001). Regarding reclassification ability, the continuous NRI was 0.25 (95% CI 0.22-0.29) among PCRpositive cases and 0.33 (95% CI 0.30-0.36) among PCR-negative cases, and IDI was 0.08 (95% CI 0.07-0.08), which also indicates that the accuracy of the diagnostic prediction AI model significantly improved as a result of adding pharyngeal images to the AI model with clinical information.

The validation stage
In the validation stage, we obtained informed consent from 706 patients with influenza-like symptoms who visited one of 11 clinics or hospitals between Jan 25, 2020 and Mar 13, 2020, which comprised a safety analysis set. Among them, 12 patients (1.7%) felt nauseous during the examination when the pharyngeal images were being captured, including one patient (0.1%) with severe nausea for whom we did not complete the image-taking procedure.
Additionally, 33 patients (4.7%) did not satisfy the predefined criteria of the protocol for the full analysis set, mostly because of trouble in saving the pharyngeal images at the study sites.
Furthermore, 13 patients were excluded by the automated image quality evaluation system that removed low-quality pharyngeal images. Thus, we used the pharyngeal images and . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 19, 2022. women 51.7%) for the following analysis. Similar to the training stage, compared with noncases, the PCR-confirmed cases yielded the following results: the average age was slightly lower, the proportion of close contact and several subjective symptoms (tiredness, chill, nasal discharge/obstruction, and cough) was higher, and the temperature (both before the clinic/hospital visit and on site) and pulse rate were higher, whereas the proportion of tonsillar findings was lower ( Table 1).

The additional analysis
In our additional analysis, among the 200 randomly selected patients (100 influenza PCRpositive cases and 100 PCR-negative cases), the AUROC of the diagnostic prediction AI model was 0.89 (95% CI 0.84-0.93) and was higher than that of each of the three physicians (0.76, 0.73, and 0.74, respectively). It was also higher than that of the average prediction score of the three physicians (0.79, 95% CI 0.73-0.85) (Figure 3). is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Discussion
In this study, we developed an AI-assisted diagnosis camera with a diagnostic prediction model for influenza. In the training stage, we found that the pharyngeal images contributed significantly to the improvement of the diagnosis prediction AI model compared with the clinical information AI. In the validation stage, the AUROC of the diagnostic prediction AI model was 0.90 (95% CI 0.87-0.93), with a sensitivity and specificity of 76% (95% CI 70-82%) and 88% (95% CI 85-91%), respectively. In our additional analysis, the AI-assisted camera performed better than three physicians in predicting influenza. Furthermore, in the importance heatmaps, we found that the AI model often focused on follicles to differentiate between PCR-positive and negative cases.
Clinical characteristics associated with PCR-confirmed influenza infection among people with influenza-like symptoms were examined in two previous studies [7,8]. Both of these studies concluded that fever and cough were the best predictors of influenza diagnosis.
However, the sensitivity and specificity of the combination of these two factors were suboptimal, at 78% and 55% in one study [7], and 64% and 67% in another study, respectively [8]. In our study, considering the feature importance of each variable in the LightGBM model and CatBoost model (Supplementary Figures 1 and 2), body temperature and cough were highly ranked among the clinical information, whereas the feature importance of pharyngeal images was even larger.
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 19, 2022 Recently, several AI-assisted diagnostic prediction models have been proposed for influenza diagnosis [19][20][21][22]. A single-center study from Japan reported a machine learning-based infection screening system incorporating a random tree algorithm that used vital signs [19].
The researchers reported a sensitivity of 81-96% and NPV of 81-96% in their training datasets (specificity and PPV were not reported), but the performance of the model was not validated outside the center. The University of Pittsburgh Medical Center Health System reported machine learning classifiers for influenza detection from emergency department free-text reports [20][21]. Among the 31,268 emergency department reports from four hospitals, the AUROCs of the seven machine learning classifiers for influenza detection ranged from 0.88 to 0.93 [21], which was better than an expert-built Bayesian model [20].
These studies were also limited because performance outside the health care system of the University of Pittsburgh was unknown. More recently, a Korean study reported an influenza screening system based on deep learning using a combination of epidemiological and patientgenerated health data from a mobile health app [22]. However, the gold standard in the study was the clinical diagnosis of influenza at a clinic reported by app users instead of laboratorybased confirmed influenza. Notably, none of the previous studies included an assessment of pharyngeal images in their diagnostic prediction models [19][20][21][22]. The novelty of our study is that we have developed the first AI-assisted diagnosis camera for influenza and prospectively validated its performance through a Good Clinical Practice-based clinical trial process.
We showed that pharyngeal images significantly improved the discrimination and reclassification ability of the diagnostic prediction AI model. Additionally, we considered the mechanisms by which the AI model differentiated between true influenza cases and noninfluenza cases using pharyngeal images. To the best of our knowledge, there has been no established approach to quantitatively scale regions of images on which the AI focuses.
Indeed, most previous studies on AI-assisted diagnosis cameras showed only representative . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 19, 2022. ; https://doi.org/10.1101/2022.07.19.22276126 doi: medRxiv preprint images highlighted using Grad-CAM or saliency maps to speculate on possible mechanisms of AI classification [23][24][25]. In our study, we attempted to quantify these regions by calculating the proportion of patients with images highlighted by the AI model for each part of the pharynx among the influenza PCR-positive cases and negative cases. Consequently, we found that the AI model mainly focused on follicles on the posterior pharyngeal wall.
Notably, this finding is in line with previous case reports and case series that suggest that follicles on the posterior pharyngeal wall are specific to influenza infection and useful for the diagnosis of influenza [26][27][28][29]. Physical examination, including visual inspection of the pharynx, generally requires the experience of individual physicians, and physical examination skills may vary widely among physicians. Our study suggests that AI could minimize the variation and may help to standardize physical examination skills among physicians.
Additionally, when attempting to discriminate between diseases, doctors may be able to learn from AI systems where to focus in their visual examination.
Our study has limitations. First, we recruited study participants with influenza-like symptoms from a large number of clinics and hospitals in Japan to increase the generalizability of our study. However, there may be a country or cultural difference in terms of people with influenza-like symptoms seeking medical care from healthcare providers. In Japan, with its universal health care coverage, people have relatively easy and timely access to clinics or hospitals compared with those in other countries. Therefore, generalizing our findings to different clinical care settings in different countries would require caution and may require independent assessment. Second, our additional analysis of the comparison between the AIassisted diagnosis camera and three physicians was not pre-planned in the study protocols (jRCTs032190120 and Pharmaceuticals and Medical Devices Agency clinical trial identification code AI-02-01), although these physicians were blinded to patients' identifiers and their PCR results. Finally, in addition to the pharyngeal images, we collected as many . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 19, 2022. ; 1 8 relevant clinical variables (suggested by previous large studies [7,8]) as possible to establish an accurate diagnostic prediction AI model. However, there may be other useful variables for the prediction of true influenza diagnosis that were not collected in our study. For example, some studies have suggested that the population level trend of influenza outbreaks in an area is useful for predicting an individual patient's influenza infection [22]. Further improvement of the AI-assisted diagnosis camera by including additional variables, in addition to an improvement of the AI models to analyze pharyngeal images, are justified.
In conclusion, we developed the first AI-assisted diagnosis camera for influenza and prospectively validated its high performance. We found that the AI model often focused on follicles, which confirmed previous case reports and series that suggested that visual inspection of the pharynx would help to diagnose influenza infection. 1 9 and the clinical studies were led by the medical team, including Miho Nakamura. All the members of Aillis and their dedicated efforts have made this study possible, including, but not limited to, the aforementioned teams.

Funding
This study was funded by the New Energy and Industrial Technology Development Organization (NEDO), Japan (grant number 30STS713) and Aillis, Inc. The NEDO had no role in the study design, data collection, data analysis, data interpretation, or writing of the report. The role of Aillis, Inc. in the pilot, training, and validation stages of the study was to provide research equipment and testing costs. Aillis, Inc. was not involved in the clinical study operation, data collection process, auditing, and quality control and assurance of the study. Employees of Aillis, Inc. designed and conducted the study; managed, analysed, and interpreted the data; prepared, reviewed, and approved the article; and were involved in the decision to submit the article.

Conflict of Interest
SO is the CEO and HK is the CSO of Aillis, Inc. and they hold stock in the company. MF, MS, WT, MIk, and HK are employees of Aillis, Inc. YT and MIw received consultant fees from the company to supervise the study and draft the manuscript.
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 19, 2022 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)      . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 19, 2022. ; https://doi.org/10.1101/2022.07.19.22276126 doi: medRxiv preprint . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 19, 2022