Solicited Cough Sound Analysis for Tuberculosis Triage Testing: The CODA TB DREAM Challenge Dataset

Cough is a common and commonly ignored symptom of lung disease. Cough is often perceived as difficult to quantify, frequently self-limiting, and non-specific. However, cough has a central role in the clinical detection of many lung diseases including tuberculosis (TB), which remains the leading infectious disease killer worldwide. TB screening currently relies on self-reported cough which fails to meet the World Health Organization (WHO) accuracy targets for a TB triage test. Artificial intelligence (AI) models based on cough sound have been developed for several respiratory conditions, with limited work being done in TB. To support the development of an accurate, point-of-care cough-based triage tool for TB, we have compiled a large multi-country database of cough sounds from individuals being evaluated for TB. The dataset includes more than 700,000 cough sounds from 2,143 individuals with detailed demographic, clinical and microbiologic diagnostic information. We aim to empower researchers in the development of cough sound analysis models to improve TB diagnosis, where innovative approaches are critically needed to end this long-standing pandemic.



BACKGROUND AND SUMMARY
Tuberculosis remains the leading infectious disease killer globally, partly due to public health systems' inability to accurately diagnose millions of infected individuals every year. 1 Insufficient access to high-quality TB screening and diagnosis is recognized as one of the most important gaps in the cascade of care. 2 Here we describe a cough sound database including detailed demographic, clinical and microbiologic information for the development of AI-based sound classification TB triage models. The "missing millions" of undiagnosed patients living with active TB disease represent a heterogeneous group including those who did not access triage or diagnostic testing or were not appropriately referred for effective treatment. Improving the accuracy, portability, point-of-care amenability and connectivity of diagnostic tools and algorithms would have significant value. Most health systems build their TB programs on a combination of complementary screening followed by diagnostic tests. The WHO's target product profile (TPP) for a community-based TB triage test suggests that it should be at least 90% sensitive and 70% specific. 6 According to the 2021 WHO TB screening guidelines, symptom-based screening with questionnaires, including cough, is 42% sensitive. 7 Besides having poor accuracy, symptom-based screening poses operational challenges that impede its sustained and uniform implementation within resource-challenged TB programs. Other tools such as digital chest X-rays combined with computer-aided detection (CAD) algorithms have also been evaluated in the context of TB triage. This approach was shown to be highly sensitive but had variable specificity and remains difficult to deploy due to limited availability of chest X-ray platforms at primary-level health facilities. 8 Whether in the context of community-based outreach screening or healthcare facility-based evaluation prior to confirmatory testing, cough classification models could complement or replace other triage strategies including symptom-based screening.
We historically have been unable to objectively monitor cough sounds and consequently reduced this data-rich symptom into subjective and dichotomous information (e.g., cough versus no cough, chronic versus acute, better versus worse). Advances in acoustics and machine learning (ML) have enabled the identification and recording of human coughs in real-world acoustic environments (cough detection) as well as differentiation of coughs from patients with distinct clinical conditions or at different stages of disease (cough classification). As part of the emerging field of Acoustic Epidemiology, this has the potential to enable novel screening or diagnostic assays with simple digital recording devices, such as a smartphone, tablet or watch. 5 Proof-of-concept studies previously showed that cough associated with TB contains a specific acoustic signature which can be recognized by ML models. A study by Pahar et al. suggests that a cough-based TB screening model can discriminate TB cough sounds from those associated with other lung conditions with 93% sensitivity and 95% specificity, exceeding the WHO TPPs. 9 In a study combining cough sound analysis and patients' clinical characteristics, Yellapu et al. report that ML can be used to detect TB with 90% sensitivity and 85% specificity. 10 Those pilot studies report on ML models which were designed on small datasets and were not validated in external populations.
Given the potential impact on performance of local disease epidemiology and population ethnicity among other confounders, large and diverse cough datasets are needed to replicate those studies.
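As an illustration of how a candidate model could be compared against the WHO TPP thresholds quoted above (at least 90% sensitivity and 70% specificity), the following minimal sketch computes both metrics from confusion-matrix counts. The class name, function name and example counts are hypothetical and are not taken from any of the cited studies.

```python
from dataclasses import dataclass

# WHO TPP minimums for a community-based TB triage test, as quoted in the text.
TPP_MIN_SENSITIVITY = 0.90
TPP_MIN_SPECIFICITY = 0.70


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts against the reference standard (hypothetical container)."""
    tp: int  # TB-positive participants flagged by the model
    fn: int  # TB-positive participants missed by the model
    tn: int  # TB-negative participants correctly cleared
    fp: int  # TB-negative participants flagged by the model

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def meets_triage_tpp(counts: ConfusionCounts) -> bool:
    """True when both metrics reach the TPP minimums."""
    return (counts.sensitivity >= TPP_MIN_SENSITIVITY
            and counts.specificity >= TPP_MIN_SPECIFICITY)


# Made-up counts for illustration only.
example = ConfusionCounts(tp=93, fn=7, tn=285, fp=15)
print(f"sensitivity={example.sensitivity:.2f}, "
      f"specificity={example.specificity:.2f}, "
      f"meets TPP={meets_triage_tpp(example)}")
```

Point estimates alone say little on small samples, so in practice confidence intervals would be reported alongside these values.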
We collected and are here releasing a dataset including 733,756 cough sounds from 2,143 patients across 7 countries with accurately annotated demographic, clinical and microbiologic diagnostic information. These data were initially used to enable and evaluate the CODA TB DREAM Challenge, which invited participants to develop algorithms for prediction of TB diagnosis. The training data are now available for general use, and researchers are invited to leverage acoustic and clinical data to further develop and evaluate sound classification models for TB screening against a held-out test partition. 11 We aim to enable the development of models which could achieve the WHO TPP performance targets for the current 'community-based TB triage test' or the forthcoming TPP for a TB screening test. 6,12 This dataset has limitations, which include some selection bias since it was collected from a symptomatic presumptive TB population. Models developed from it may hence not perform as well if used for asymptomatic screening at population level. Accordingly, more data should be collected from community screening activities.

Participants
A total of 2,143 participants were recruited from two parent studies described below. To be eligible, participants had to be 18 years or older and have a new or worsening cough for at least two weeks. Recruitment took place at outpatient clinics in India, Madagascar, the Philippines, South Africa, Tanzania, Uganda, and Vietnam. All participants provided informed consent. A summary of participant demographics and country distribution is available in Table 1.

Rapid Research in Diagnostic Development TB Network (R2D2 TB Network) study:
The R2D2 TB Network study evaluates novel TB diagnostics in various stages of development among people with presumptive TB in five low- and middle-income countries: Uganda, South Africa, Vietnam, the Philippines and India. 13 The microbiologic reference standard is considered the primary reference standard. Full details of the reference standards are described in Table 3.
Cough Recording. Cough sounds were collected using smartphones loaded with the Hyfe research app. 14 Specific phone models used in the different participating sites are presented in Supplementary Materials 1. The Hyfe research app is designed to listen for explosive sounds and record ~0.5-second sound fragments corresponding to putative cough sounds. It uses a server-based convolutional neural network (CNN) model to classify explosive sounds as coughs, and recordings of these cough sounds are saved on a protected health information (PHI)-regulated server for analysis. This model has been shown to be 96% sensitive and 96% specific for cough detection using human-labeled sounds as a reference standard. 15 Smartphones were positioned on tripods in rooms within the clinic. Participants were asked to cough five times (solicited cough) while standing 60-90 cm from the tripod; participants who managed to produce at least three coughs were retained in the dataset. Some participants produced more than five coughs due to a triggered coughing fit, and those additional coughs were also collected and included in the dataset. Solicited and triggered coughs could not be labeled distinctively and are treated the same in the dataset. After enrollment and onboarding, a subset of participants (n = 565) were also asked to carry a study phone for two weeks and collect longitudinal cough sounds in an outpatient setting. Those sounds are labeled as longitudinal and made available within the dataset. A tally of solicited and longitudinal cough sounds per data partition is available in Table 4.
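The retention rule described above (participants with at least three solicited coughs are kept) can be sketched as a simple per-participant count. This is only an illustration: the record layout, participant IDs and file names below are hypothetical, and the released dataset has already had this rule applied.

```python
from collections import Counter

# Minimum number of solicited coughs for a participant to be retained,
# as stated in the text.
MIN_SOLICITED_COUGHS = 3


def retained_participants(cough_records):
    """Return IDs of participants with enough solicited coughs.

    cough_records: iterable of (participant_id, sound_file_id) pairs
    (a hypothetical layout chosen for this example).
    """
    counts = Counter(pid for pid, _ in cough_records)
    return {pid for pid, n in counts.items() if n >= MIN_SOLICITED_COUGHS}


# Made-up records: P1 has three coughs, P2 has two, and P3 has seven
# (e.g. five solicited coughs plus a triggered coughing fit).
records = (
    [("P1", f"p1_{i}.wav") for i in range(3)]
    + [("P2", f"p2_{i}.wav") for i in range(2)]
    + [("P3", f"p3_{i}.wav") for i in range(7)]
)
print(sorted(retained_participants(records)))
```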

Data Partitioning
The dataset was split into a training set (n=1,105) and a validation set (n=1,038). Participants were randomly partitioned, roughly evenly, between the training and validation sets at the level of the participant (i.e., all of a participant's cough sounds are in either the training or validation set).
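A participant-level split of this kind can be sketched as follows. This is an illustrative example, not the procedure used by the challenge organizers: the function names, seed and toy records are assumptions, and the released partition is fixed and should be used as provided.

```python
import random


def split_by_participant(participant_ids, seed=0):
    """Randomly split unique participant IDs into two near-equal halves."""
    unique_ids = sorted(set(participant_ids))  # sorted so the seed is reproducible
    random.Random(seed).shuffle(unique_ids)
    half = (len(unique_ids) + 1) // 2
    return set(unique_ids[:half]), set(unique_ids[half:])


def partition_coughs(cough_records, train_ids):
    """Route each (participant_id, sound_file_id) record to its participant's set."""
    train, validation = [], []
    for pid, sound_id in cough_records:
        (train if pid in train_ids else validation).append((pid, sound_id))
    return train, validation


# Toy data: six hypothetical participants with three coughs each.
coughs = [(f"P{p}", f"P{p}_cough{c}.wav") for p in range(1, 7) for c in range(1, 4)]
train_ids, validation_ids = split_by_participant(pid for pid, _ in coughs)
train, validation = partition_coughs(coughs, train_ids)
print(len(train_ids), len(validation_ids), len(train), len(validation))
```

Splitting on participant IDs rather than on individual sound files prevents coughs from the same person from appearing in both partitions, which would otherwise inflate validation performance.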

Data Pre-Processing
Cough sounds: The sound recordings available in this dataset have not undergone preprocessing beyond their identification as a cough sound by the Hyfe research app CNN model.
Clinical Data: Data from all participating sites were collected with standardized questionnaires and definitions. Data formatting was harmonized in the open access database.

Dataset Description
Sage Bionetworks independently verified the variable balance between the training and validation sets as demonstrated in Table 1. A breakdown of key demographics and microbiologic reference standard results by country is shown in Table 5.

DATA RECORDS
De-identified participant demographic and clinical data, including TB reference standard results, cough sound WAV files, and a datafile linking participant IDs to sound file IDs were exported to a dedicated project in Synapse. Synapse is a general-purpose data and analysis sharing service where members can work collaboratively, analyze data, share insights, and have attributions and provenance of those insights to share with others. Synapse is developed and operated by Sage Bionetworks. 16 A total of 1,105 participants' data are made available for access and download as a training dataset.
The validation set is withheld, but models can be evaluated against the validation set via the instructions provided in the Synapse project.
All training set files are stored on the Synapse platform with associated metadata and documentation and can be accessed at the following URL: www.synapse.org/TBcough (https://doi.org/10.7303/syn31472953).
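Because the clinical data and the sound files are tied together through a linking file, a typical first step is to attach each participant's reference standard result to their cough files. The sketch below shows one way to do this with the standard library. The column names (`participant`, `tb_status`, `filename`) and the inline CSV contents are hypothetical placeholders; the actual names are documented in the Synapse project.

```python
import csv
import io

# Hypothetical stand-ins for the clinical table and the linking file.
CLINICAL_CSV = """participant,age,tb_status
P1,34,1
P2,51,0
"""

LINK_CSV = """participant,filename
P1,c1.wav
P1,c2.wav
P2,c3.wav
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def coughs_with_labels(clinical_rows, link_rows):
    """Pair each sound file with its participant's TB status."""
    status = {row["participant"]: int(row["tb_status"]) for row in clinical_rows}
    return [
        (row["filename"], status[row["participant"]])
        for row in link_rows
        if row["participant"] in status  # skip files without clinical data
    ]


labeled = coughs_with_labels(load_rows(CLINICAL_CSV), load_rows(LINK_CSV))
print(labeled)
```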

TECHNICAL VALIDATION
All cough collection periods were observed by study staff and cough sounds were spot-checked for accurate recording. Patient metadata was reviewed by study staff for accuracy. The data described in this article were collected using the Hyfe Research app, which uses a proprietary algorithm to identify cough sounds. We used a prediction score of 0.8 from this algorithm to filter potential non-cough sounds. To validate the precision of the Hyfe algorithm, a standalone computer vision and deep learning model was trained using Log Mel spectrogram images from the ESC-50 and Coswara datasets. 17,18 A pre-trained "VGG16" CNN-based model was trained for accurate classification of cough sounds and achieved a model accuracy of approximately 96% on Hyfe cough recordings. 19 Most of the recordings that were classified incorrectly had a Hyfe prediction score less than 0.8.
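The score-based filtering step can be sketched as a simple threshold on a per-recording prediction score. This is a minimal illustration: the record layout is hypothetical, and treating a score exactly equal to 0.8 as passing is an assumption, since the text does not state whether the comparison is inclusive.

```python
# Prediction score cut-off quoted in the text.
HYFE_SCORE_THRESHOLD = 0.8


def keep_likely_coughs(recordings, threshold=HYFE_SCORE_THRESHOLD):
    """Keep recordings whose cough prediction score reaches the threshold.

    recordings: list of dicts with hypothetical keys "file" and "score".
    The inclusive comparison (>=) is an assumption.
    """
    return [rec for rec in recordings if rec["score"] >= threshold]


# Made-up scores for illustration only.
recordings = [
    {"file": "a.wav", "score": 0.97},
    {"file": "b.wav", "score": 0.80},
    {"file": "c.wav", "score": 0.42},
]
kept = keep_likely_coughs(recordings)
print([rec["file"] for rec in kept])
```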

USAGE NOTES
Users can register to evaluate predictive models of TB diagnosis against the held-out test partition via the instructions on the Synapse project.
The Digital Cough Monitoring for screening, diagnosis and clinical follow-up of tuberculosis and other respiratory diseases project:
This project was designed to embed digital cough monitoring within existing health facility-based TB diagnostic cohorts in Madagascar and Tanzania. Ethical approval for this study was obtained from institutional review boards (IRB) in Canada and in each study site. In Canada, approval was obtained from the Centre de Recherche du Centre Hospitalier de l'Université de [...]

Ethical approval for the R2D2 TB Network study was obtained from institutional review boards (IRB) in the US and in each study site. In the US, approval was obtained from the University of California San Francisco IRB (# 20-32670). In Vietnam, approval was obtained from the Ministry of Health Ethical Committee for National Biological Medical Research (94/CN-HĐĐĐ), the National Lung Hospital Ethical Committee for Biological Medical Research (566/2020/NCKH) and the Hanoi Department of Health, Hanoi Lung Hospital Science and Technology Initiative Committee (22/BVPHN). In India, approval was obtained from [...]

Data Collection
Demographic and clinical data. At enrollment into the parent studies, participants underwent a baseline questionnaire, clinical examination, and sputum collection for TB testing. Study staff also recorded participants' age, gender, height, weight, smoking status and duration of cough. HIV diagnosis was made either based on participant self-report of a positive HIV diagnosis or a positive test result. A summary of the available variables is shown in Table 2.
TB Reference Standard Testing. Both Xpert MTB/RIF Ultra PCR and mycobacterial culture (Lowenstein-Jensen solid medium or MGIT liquid medium) were performed on sputum collected from all participants. Any participant whose first sputum Xpert MTB/RIF Ultra result was indeterminate or trace-positive received a second sputum Xpert MTB/RIF Ultra test. Results from those assays were combined to determine TB status according to two reference standards: a microbiologic reference standard and a sputum Xpert reference standard. The sputum Xpert reference standard is restricted to Xpert MTB/RIF Ultra results on sputum samples. The microbiologic reference standard includes culture results, allowing for more individuals to be classified as TB positive.

Table 1 - Participant demographics across training and test sets
The scoring mechanism can evaluate two different types of models: (1) those that use only cough sounds, or (2) those which also incorporate clinical metadata variables which have been provided in the training dataset (sex, age, height, weight, reported duration of cough, prior TB diagnosis and type, hemoptysis, heart rate, temperature, weight loss, smoking in the last week, fever and night sweats). Models are submitted to the scoring queues as Docker images. Full instructions and example code are available on the Synapse project website (www.synapse.org/TBcough). Given the number of files represented in the data, users should consider downloading the data via one of the programmatic Synapse clients (available in R or Python). For convenience, Python code for downloading the data is provided in the Synapse project wiki. The training dataset size is 0.43 GB for the solicited coughs and 31.6 GB for the longitudinal coughs. To access the data, individuals must become Certified and Validated users of Synapse and maintain an active account on Synapse: http://www.synapse.org. They must also submit an Intended Data Use Statement and agree to the Terms of Use of the dataset.
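As a rough sketch of what a programmatic download might look like with the Synapse Python client (`synapseclient` and `synapseutils`), the example below extracts the Synapse ID from the dataset DOI and syncs the corresponding container to a local folder. It rests on several assumptions: that the ID in the DOI (syn31472953) resolves to a project or folder containing the training files, that the caller holds a personal access token, and that the access requirements above have been met. The function names and the destination path are hypothetical; the code in the Synapse project wiki remains the authoritative reference.

```python
import re
from pathlib import Path

# DOI quoted in the Data Records section.
DATASET_DOI = "https://doi.org/10.7303/syn31472953"


def synapse_id_from_doi(doi: str) -> str:
    """Extract the trailing Synapse ID (e.g. 'syn123') from a Synapse DOI."""
    match = re.search(r"(syn\d+)$", doi)
    if match is None:
        raise ValueError(f"No Synapse ID found in {doi!r}")
    return match.group(1)


def download_training_data(auth_token: str, dest: Path) -> None:
    """Sync the dataset container to `dest` (hypothetical helper)."""
    # Imported lazily so the helper above works without the client installed.
    import synapseclient
    import synapseutils

    syn = synapseclient.Synapse()
    syn.login(authToken=auth_token)
    dest.mkdir(parents=True, exist_ok=True)
    # Assumption: the ID points at a project or folder that can be synced.
    synapseutils.syncFromSynapse(syn, synapse_id_from_doi(DATASET_DOI), path=str(dest))


print(synapse_id_from_doi(DATASET_DOI))
```

A call such as `download_training_data(token, Path("coda_tb_data"))` would then start the transfer. Given the 31.6 GB size of the longitudinal coughs, it may be worth syncing only the solicited coughs first.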