%0 Journal Article
%A Shreyash Sonthalia
%A Muhammad Aji Muharrom
%A Levana L. Sani
%A Olivia Herlinda
%A Adrianna Bella
%A Dimitri Swasthika
%A Panji Hadisoemarto
%A Diah Saminarsih
%A Nurul Luntungan
%A Astrid Irwanto
%A Akmal Taher
%A Joseph L. Greenstein
%T COVID-19 Likelihood Meter: a machine learning approach to COVID-19 screening for Indonesian health workers
%D 2021
%R 10.1101/2021.10.15.21265021
%J medRxiv
%P 2021.10.15.21265021
%X The COVID-19 pandemic poses a heightened risk to health workers, especially in low- and middle-income countries such as Indonesia. Due to the limitations to implementing mass RT-PCR testing for health workers, high-performing and cost-effective methodologies must be developed to help identify COVID-19 positive health workers and protect the spearhead of the battle against the pandemic. This study aimed to investigate the application of machine learning classifiers to predict the risk of COVID-19 positivity (by RT-PCR) using data obtained from a survey specific to health workers. Machine learning tools can enhance COVID-19 screening capacity in high-risk populations such as health workers in environments where cost is a barrier to accessibility of adequate testing and screening supplies. We built two sets of COVID-19 Likelihood Meter (CLM) models: one trained on data from a broad population of health workers in Jakarta and Semarang (full model) and tested on the same, and one trained on health workers from Jakarta only (Jakarta model) and tested on an independent population of Semarang health workers. The area under the receiver-operating-characteristic curve (AUC), average precision (AP), and the Brier score (BS) were used to assess model performance. Shapley additive explanations (SHAP) were used to analyze feature importance. The final dataset for the study included 3979 health workers. For the full model, the random forest was selected as the algorithm of choice.
It achieved a cross-validation mean AUC of 0.818 ± 0.022 and AP of 0.449 ± 0.028 and was high performing during testing, with AUC and AP of 0.831 and 0.428, respectively. The random forest model was well-calibrated, with a low mean Brier score of 0.122 ± 0.004. A random forest classifier was the best performing model during cross-validation for the Jakarta dataset, with AUC of 0.824 ± 0.008, AP of 0.397 ± 0.019, and BS of 0.102 ± 0.007, but the extra trees classifier was selected as the model of choice due to better generalizability to the test set. The performance of the extra trees model, when tested on the independent set of Semarang health workers, was AUC of 0.672 and AP of 0.508. Our models yielded high predictive performance and may have the potential to be utilized as both a COVID-19 screening tool and a method to identify health workers at greatest risk of COVID-19 positivity, and therefore most in need of testing. Competing Interest Statement: Dr. Greenstein's participation in this study was as an unpaid consultant for Nalagenetics.
All opinions expressed and implied in this manuscript do not represent or reflect the views of the Johns Hopkins University or the Johns Hopkins Health System. Funding Statement: The project is partly funded and supported by Yayasan Satriabudi Dharma Setia by providing free RT-PCR tests for hospital workers. Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Ethics Committee of the Hospital of University of Indonesia (Number: KET-644/UN2.F1/ETIK/PPM.00.02/2020, Date: 22 June 2020). I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes. All data produced in the present study are available upon reasonable request to the authors.
%U https://www.medrxiv.org/content/medrxiv/early/2021/10/20/2021.10.15.21265021.full.pdf