RT Journal Article SR Electronic T1 Generating interpretable predictions about antidepressant treatment stability using supervised topic models JF medRxiv FD Cold Spring Harbor Laboratory Press SP 2020.03.18.20038232 DO 10.1101/2020.03.18.20038232 A1 Michael C. Hughes A1 Melanie F. Pradier A1 Andrew Slavin Ross A1 Thomas H. McCoy, Jr A1 Roy H. Perlis A1 Finale Doshi-Velez YR 2020 UL http://medrxiv.org/content/early/2020/03/20/2020.03.18.20038232.abstract AB Importance: In the absence of readily-assessed and clinically-validated predictors of treatment response, pharmacologic management of major depressive disorder (MDD) often relies on trial and error. Objective: To utilize electronic health records to identify predictors of treatment response, while preserving interpretability of predictions despite large numbers of covariates. Design: Retrospective cohort study. Setting: Two academic medical centers in Boston, including outpatient primary and specialty care clinics. Participants: 81,630 adults with a coded diagnosis of MDD. Exposure: Treatment with 1 or more of 11 standard antidepressants. Main Outcomes and Methods: Stable treatment, intended as a proxy for treatment effectiveness, defined as continued prescription of an antidepressant for 90 days. We trained supervised topic models to extract 10 interpretable covariates from coded clinical data for stability prediction. Then, using data from one hospital system (Site A), we trained generalized linear models and ensembles of decision trees to predict stability outcomes from topic features that summarize patient history. We evaluated on held-out patients from Site A as well as all individuals from a second hospital system (Site B). Results: Among the 81,630 adults (31% male; age 18-80 with mean 48.46), we identified 55,303 who reached a stable treatment regimen during follow-up.
For held-out patients from Site A, mean area-under-the-receiver-operating-characteristic-curve (AUC) discrimination for general stability outcome was 0.627 (95% confidence interval (CI) 0.615-0.639) for our supervised topic model with 10 covariates. In evaluation on Site B, our approach achieved a similar AUC of 0.619 (95% CI 0.610-0.627). Building models to predict stability specific to a particular drug did not improve upon predicting general stability, even when using a harder-to-interpret ensemble classifier and 9,256 coded covariates (specific AUC = 0.647, 95% CI 0.635-0.658; general AUC = 0.661, 95% CI 0.648-0.672). Topics coherently captured clinical concepts associated with treatment response. Conclusions and Relevance: Coded clinical data available in electronic health records facilitated prediction of general treatment response, but not response to specific medications. While greater discrimination is likely required for clinical application, our results provide a simple and transparent baseline for such studies. Funding: Oracle Labs, Harvard SEAS, and National Institute of Mental Health. Links: Supplement document providing more results, links to interactive visualizations, and detailed procedures for reproducibility: https://www.michaelchughes.com/papers/HughesEtAl_medRxiv2020_Supplement.pdf STROBE checklist: https://www.michaelchughes.com/papers/HughesEtAl_medRxiv2020_STROBE_checklist.pdf Open-source code for our proposed machine learning methods: https://github.com/dtak/prediction-constrained-topic-models Question: How well can coded clinical data from electronic health records be used to predict achievement of a stable antidepressant regimen in major depressive disorder? Findings: In this in silico cohort study of 81,630 adults, we identified 55,303 who reached a stable antidepressant treatment regimen.
Predictions using generalized linear models or ensembles of decision trees applied to diagnosis, procedure, and medication codes, as well as low-dimensional summaries of these codes via supervised topic models, achieved area under receiver operating characteristic curve values of ∼0.62-0.65; treatment-specific models performed no better than general treatment outcome models. Meaning: Coded clinical data can facilitate prediction of antidepressant treatment outcomes, but medication-specific models do not outperform general response prediction models. Competing Interest Statement: Authors MCH and FDV report gifts from Oracle that supported this work. MFP reports financial support from Harvard SEAS. Author THM reports grants from Stanley Center and grants from Brain & Behavior Research Foundation during the conduct of the study. Author RHP reports grants from National Human Genome Research Institute and grants from National Institute of Mental Health during the conduct of the study; personal fees from Genomind, personal fees from Psy Therapeutics, personal fees from RID Ventures, personal fees from Outermost Therapeutics, and personal fees from Takeda, outside the submitted work. Clinical Protocols: https://www.michaelchughes.com/papers/HughesEtAl_medRxiv2020_Supplement.pdf https://www.michaelchughes.com/papers/HughesEtAl_medRxiv2020_STROBE_checklist.pdf Funding Statement: No funding source contributed to any aspect of study design, data collection, data analysis, or data interpretation. The corresponding author (MCH) had full access to all the data in the study.
All authors shared the final responsibility for the decision to submit for publication. Author Declarations: All relevant ethical guidelines have been followed; any necessary IRB and/or ethics committee approvals have been obtained and details of the IRB/oversight body are included in the manuscript. Yes. All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes. Study data consisted of deidentified electronic health records from academic medical centers, but cannot be made available in general due to IRB restrictions. Please contact the authors with questions or concerns.