Performance Analysis of Speech Recognition Models in Automated Scoring of the QuickSIN Test

Arman Hassanpour; Yan Jiang; Paula Folkeard; Ewan Macpherson; Susan D. Scollie; Vijay Parsa

doi:10.1101/2025.07.25.25332211

Abstract

Purpose Best practices in audiology recommend assessing speech understanding in noisy environments, especially for those with communication difficulties. Speech-in-noise (SiN) assessments such as the QuickSIN are used for validating signal processing in hearing aids (HAs) and are linked to HA satisfaction. This project seeks to enhance QuickSIN test efficiency by applying recent advancements in automatic speech recognition (ASR) technologies.

Method Twenty-three adults with sensorineural hearing loss were fitted bilaterally with Unitron Moxi HAs and were administered the QuickSIN test in low and high reverberation environments. Testing was performed with two different HA programs: an omnidirectional program and a fixed directional microphone program. QuickSIN sentences were presented from 0° azimuth, and competing babble was presented from 0°, from one side (90° or 270°), or simultaneously from 90°, 180°, and 270° azimuth. Participants’ verbal responses to QuickSIN stimuli were scored by an audiologist and were recorded in parallel for offline transcription and scoring by ASR models from Amazon, Microsoft, NVIDIA, and Picovoice. The ASR-derived QuickSIN scores were compared to the corresponding audiologist-derived scores.
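To illustrate how an ASR transcription might be turned into a QuickSIN score, the following is a minimal sketch rather than the authors' pipeline. It assumes the standard QuickSIN convention of six sentences with five key words each and SNR loss = 25.5 minus the total number of key words repeated correctly. The exact-match rule, the function names, and the example sentence are hypothetical; the scoring rules actually applied to the ASR output (for example, the handling of homophones or partial words) are not described here.

```python
import re

# Standard QuickSIN convention: 6 sentences x 5 key words per list,
# SNR loss = 25.5 - total key words correct.
QUICKSIN_CONSTANT = 25.5


def normalize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def count_keywords(transcript, keywords):
    """Count key words found in the transcript by exact match.

    Each transcript word can satisfy at most one key word.
    Exact matching is an assumption made for this sketch.
    """
    remaining = normalize(transcript)
    correct = 0
    for word in keywords:
        target = word.lower()
        if target in remaining:
            remaining.remove(target)
            correct += 1
    return correct


def snr_loss(transcripts, keyword_lists):
    """Compute SNR loss for one list from per-sentence transcripts."""
    total = sum(
        count_keywords(t, k) for t, k in zip(transcripts, keyword_lists)
    )
    return QUICKSIN_CONSTANT - total


# Hypothetical sentence and key words (not taken from the QuickSIN lists).
keywords = ["brown", "dog", "chased", "red", "ball"]
asr_output = "the brown dog chase a red ball"
print(count_keywords(asr_output, keywords))
```

In this sketch a single ASR substitution ("chase" for "chased") costs one key word, which shows why transcription errors translate directly into a different SNR loss than an audiologist would record.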

Results Repeated-measures ANOVA revealed that all ASR models overestimated the QuickSIN scores across most test conditions. Bland-Altman analyses showed that, relative to manual scoring by an experienced audiologist, the Amazon ASR model had the smallest bias and the narrowest limits of agreement.
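The Bland-Altman quantities named above can be sketched as follows. This is a generic illustration of the method, not the authors' analysis code: bias is the mean of the paired differences (ASR minus audiologist), and the 95% limits of agreement are taken as bias ± 1.96 standard deviations of those differences. The function name and the scores are made up for the example.

```python
from statistics import mean, stdev


def bland_altman(asr_scores, audiologist_scores):
    """Return (bias, lower_limit, upper_limit) for paired scores.

    Differences are computed as ASR minus audiologist, so a positive
    bias means the ASR-derived score is higher on average.
    """
    if len(asr_scores) != len(audiologist_scores):
        raise ValueError("score lists must be paired and equal in length")
    diffs = [a - m for a, m in zip(asr_scores, audiologist_scores)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # sample standard deviation
    return bias, bias - spread, bias + spread


# Made-up SNR loss values in dB, for illustration only.
asr = [3.5, 5.5, 1.5, 8.5]
manual = [2.5, 5.5, 0.5, 6.5]
bias, lower, upper = bland_altman(asr, manual)
print(f"bias={bias:.2f} dB, limits of agreement=({lower:.2f}, {upper:.2f}) dB")
```

A model with a bias near zero and a narrow interval between the two limits agrees more closely with manual scoring, which is the sense in which one ASR model can be ranked above another.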

Conclusions Some ASR models, such as Amazon's, demonstrated performance comparable to that of an audiologist in automatically scoring QuickSIN tests. However, further refinements are necessary to increase the robustness of the ASR models when scoring test conditions with low SNR loss.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This research was supported by Ontario Research Fund Grant RE08-072 (PI: Dr. Susan Scollie) and an NSERC Discovery Grant to Dr. Vijay Parsa.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

The Health Sciences Research Ethics Board (HSREB) of Western University gave ethical approval for this work. Approval was issued on April 23, 2024, for the study titled 'Speech in Noise Test Scoring Using Automatic Speech Recognition', Project ID 124196, Review Reference 2024-124196-91975.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Data Availability

All data produced and analyzed in the present study, including QuickSIN scores, ASR transcriptions, and related statistical analyses, are available upon reasonable request to the corresponding author. Due to ethical constraints associated with participant privacy and institutional research ethics board (REB) approval, raw audio recordings are not publicly shared but may be made available under appropriate data sharing agreements. Supplementary performance metrics and tables are included in the manuscript and supplementary material.

The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.