RT Journal Article SR Electronic T1 Identification of Key Influencers for Secondary Distribution of HIV Self-Testing among Chinese MSM: A Machine Learning Approach JF medRxiv FD Cold Spring Harbor Laboratory Press SP 2021.04.19.21255584 DO 10.1101/2021.04.19.21255584 A1 Jing, Fengshi A1 Ye, Yang A1 Zhou, Yi A1 Ni, Yuxin A1 Yan, Xumeng A1 Lu, Ying A1 Ong, Jason J A1 Tucker, Joseph D A1 Wu, Dan A1 Xiong, Yuan A1 Xu, Chen A1 He, Xi A1 Huang, Shanzi A1 Li, Xiaofeng A1 Jiang, Hongbo A1 Wang, Cheng A1 Dai, Wencan A1 Huang, Liqun A1 Mei, Wenhua A1 Cheng, Weibin A1 Zhang, Qingpeng A1 Tang, Weiming YR 2021 UL http://medrxiv.org/content/early/2021/04/20/2021.04.19.21255584.abstract AB Background HIV self-testing (HIVST) has been rapidly scaled up and additional strategies further expand testing uptake. Secondary distribution has people (indexes) apply for multiple kits and pass these kits to people (alters) in their social networks. However, identifying key influencers is difficult. This study aimed to develop an innovative ensemble machine learning approach to identify key influencers among Chinese men who have sex with men (MSM) for HIVST secondary distribution. Method We defined three types of key influencers: 1) key distributors who can distribute more kits; 2) key promoters who can contribute to finding first-time testing alters; 3) key detectors who can help to find positive alters. Four machine learning models (logistic regression, support vector machine, decision tree, random forest) were trained to identify key influencers. An ensemble learning algorithm was adopted to combine these four models. Simulation experiments were run to validate our approach. Results 309 indexes distributed kits to 269 alters. 
Our approach outperformed human identification (self-reported scales cut-off), exceeding it in average accuracy by 11·0%, and could distribute 18·2% (95%CI: 9·9%-26·5%) more kits and find 13·6% (95%CI: 1·9%-25·3%) more first-time testing alters and 12·0% (95%CI: -14·7%-38·7%) more positive-testing alters. Our approach could also increase simulated intervention efficiency by 17·7% (95%CI: -3·5%-38·8%) compared with human identification. Conclusion We built machine learning models to identify key influencers among Chinese MSM who were more likely to engage in HIVST secondary distribution. Key Findings (can also be found in Figure 2, Infographic) Our proposed ensemble machine learning approach outperformed human identification (self-reported scales cut-off) in accuracy and F1 on classification metrics and in intervention efficiency in simulation experiments. Our model could also distribute more kits and find more first-time and positive-testing alters than human identification. Competing Interest Statement The authors have declared no competing interest. Funding Statement This study is funded by National Natural Science Foundation of China [NSFC 81903371], U.S. National Institutes of Health [NIAID K24AI143471], UNC Center for AIDS Research [NIAID 5P30AI050410], and SESH Global projects. Author Declarations I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: Our dataset is from a survey, and ethical approval for this biomedical research was obtained from the Ethics Committee of Zhuhai Center for Disease Control and Prevention prior to study enrollment (Number: ZhuhaiCDC-201901). 
For the survey data collection, all participants provided online consent and signed it electronically prior to taking part in our studies. All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes. Training set data for machine learning modeling will be made available to others once the relevant data sharing agreement has been obtained and the planned quasi-experiment has been completed.