PT - JOURNAL ARTICLE
AU - Seyedeh N. Khatami
AU - Chaitra Gopalappa
TI - A reinforcement learning model to inform optimal decision paths for HIV elimination
AID - 10.1101/2021.07.11.21260328
DP - 2021 Jan 01
TA - medRxiv
PG - 2021.07.11.21260328
4099 - http://medrxiv.org/content/early/2021/07/14/2021.07.11.21260328.short
4100 - http://medrxiv.org/content/early/2021/07/14/2021.07.11.21260328.full
AB - The ‘Ending the HIV Epidemic (EHE)’ national plan aims to reduce annual HIV incidence in the United States from 38,000 in 2015 to 9,300 by 2025 and 3,300 by 2030. Diagnosis and treatment are the two most effective interventions, and thus, identifying the corresponding optimal combinations of testing and retention-in-care rates would help inform implementation of relevant programs. Considering the dynamic and stochastic complexity of the disease and the time dynamics of decision-making, solving for optimal combinations using commonly used methods of parametric optimization or exhaustive evaluation of pre-selected options is infeasible. Reinforcement learning (RL), an artificial intelligence method, is ideal; however, training RL algorithms and ensuring convergence to optimality are computationally challenging for large-scale stochastic problems. We evaluate its feasibility in the context of the EHE goal. We trained an RL algorithm to identify a ‘sequence’ of combinations of HIV-testing and retention-in-care rates at 5-year intervals over 2015-2070, which optimally leads towards HIV elimination. We defined optimality as a sequence that maximizes quality-adjusted-life-years lived and minimizes HIV-testing and care-and-treatment costs. We show that solving for testing and retention-in-care rates through appropriate reformulation using proxy decision-metrics overcomes the computational challenges of RL. We used a stochastic agent-based simulation to train the RL algorithm.
As there is variability in the support programs needed to address barriers to care access, we evaluated the sensitivity of optimal decisions to three cost-functions. The model suggests scaling up retention-in-care programs to achieve and maintain high annual retention rates, while starting with a high testing frequency and relaxing it over a 10-year period as incidence decreases. Results were mainly robust to the uncertainty in costs. However, testing and retention-in-care alone did not achieve the 2030 EHE targets, suggesting the need for additional interventions. The results from the model demonstrated convergence. RL is suitable for evaluating phased public health decisions for infectious disease control.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: Financial support for this study was provided partially by a grant from the National Institute of Allergy and Infectious Diseases of the National Institutes of Health under Award Number R01AI127236. The funding agreement ensured the authors' independence in designing the study, interpreting the data, writing, and publishing the report.
Author Declarations:
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained: Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: Not applicable
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived: Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance): Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable: Yes
Data availability: This model is a simulation-based optimization model. Data for the simulation and optimization model are gathered from different sources indicated in the manuscript.
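The abstract describes learning a sequence of testing and retention-in-care choices at 5-year intervals, scored by quality-adjusted life-years gained minus program costs. The sketch below illustrates that general idea with plain tabular Q-learning on a toy stochastic model. It is not the authors' algorithm or their agent-based simulation: the incidence buckets, the two-level actions, the transition probabilities, the reward numbers, and every function name (`step`, `train`, `greedy_policy`) are assumptions made up for illustration.

```python
import random
from itertools import product

# All values below are illustrative placeholders, not figures from the paper.
TEST_LEVELS = (0, 1)      # 0 = lower testing frequency, 1 = higher
RETAIN_LEVELS = (0, 1)    # 0 = lower retention-in-care rate, 1 = higher
ACTIONS = list(product(TEST_LEVELS, RETAIN_LEVELS))
N_EPOCHS = 11             # 5-year decision points from 2015 through 2065
N_INCIDENCE = 5           # incidence buckets: 0 = lowest, 4 = highest


def step(incidence, action, rng):
    """Toy stochastic transition; returns (next_incidence, reward)."""
    test, retain = action
    # Assumption: stronger interventions make a drop in incidence more likely.
    p_drop = 0.2 + 0.3 * test + 0.4 * retain
    if rng.random() < p_drop:
        nxt = max(incidence - 1, 0)
    else:
        nxt = min(incidence + 1, N_INCIDENCE - 1)
    qaly_value = 10.0 * (N_INCIDENCE - 1 - nxt)  # lower incidence, more QALYs
    cost = 4.0 * test + 3.0 * retain             # hypothetical program costs
    return nxt, qaly_value - cost


def train(episodes=5000, alpha=0.1, gamma=0.97, epsilon=0.1, seed=0):
    """Tabular Q-learning over (epoch, incidence bucket, action)."""
    rng = random.Random(seed)
    q = {(t, s, a): 0.0
         for t in range(N_EPOCHS)
         for s in range(N_INCIDENCE)
         for a in ACTIONS}
    for _ in range(episodes):
        s = N_INCIDENCE - 1                      # start at highest incidence
        for t in range(N_EPOCHS):
            if rng.random() < epsilon:           # explore
                a = rng.choice(ACTIONS)
            else:                                # exploit current estimate
                a = max(ACTIONS, key=lambda x: q[(t, s, x)])
            s2, r = step(s, a, rng)
            # No future value after the final decision epoch.
            future = 0.0 if t == N_EPOCHS - 1 else max(
                q[(t + 1, s2, x)] for x in ACTIONS)
            q[(t, s, a)] += alpha * (r + gamma * future - q[(t, s, a)])
            s = s2
    return q


def greedy_policy(q):
    """Best (testing, retention) action for each (epoch, incidence) pair."""
    return {(t, s): max(ACTIONS, key=lambda a: q[(t, s, a)])
            for t in range(N_EPOCHS)
            for s in range(N_INCIDENCE)}


if __name__ == "__main__":
    policy = greedy_policy(train())
    for t in range(N_EPOCHS):
        year = 2015 + 5 * t
        print(year, policy[(t, N_INCIDENCE - 1)])
```

Indexing the Q-table by decision epoch as well as by incidence bucket lets the learned choice change over time, which is the property the abstract relies on when it reports relaxing testing frequency as incidence falls. A realistic model would replace `step` with a calibrated simulation and a far larger state space.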