PT - JOURNAL ARTICLE
AU - Lapp, Zena
AU - Han, Jennifer H
AU - Wiens, Jenna
AU - Goldstein, Ellie JC
AU - Lautenbach, Ebbing
AU - Snitkin, Evan S
TI - Machine learning models to identify patient and microbial genetic factors associated with carbapenem-resistant Klebsiella pneumoniae infection
AID - 10.1101/2020.07.06.20147306
DP - 2020 Jan 01
TA - medRxiv
PG - 2020.07.06.20147306
4099 - http://medrxiv.org/content/early/2020/11/04/2020.07.06.20147306.short
4100 - http://medrxiv.org/content/early/2020/11/04/2020.07.06.20147306.full
AB - Carbapenem-resistant Klebsiella pneumoniae (CRKP) is a critical-priority antibiotic resistance threat that has emerged over the past several decades, spread across the globe, and accumulated resistance to last-line antibiotic agents. While CRKP infections are associated with high mortality, only a small subset of patients acquiring CRKP colonization will develop clinical infection. Here, we sought to determine the relative contributions of patient characteristics and CRKP genetic background to patient risk of infection. Machine learning models classifying colonization vs. infection were built using whole-genome sequences and clinical metadata from a comprehensive set of 331 CRKP isolates collected across 21 long-term acute care hospitals over the course of a year. Model performance was evaluated based on the area under the receiver operating characteristic curve (AUROC) on held-out test data. We found that patient and genomic features were predictive of clinical CRKP infection to similar extents (AUROC IQRs: patient=0.59-0.68, genomic=0.55-0.61, combined=0.62-0.68). Patient predictors of infection included the presence of indwelling devices, kidney disease and length of stay. Genomic predictors of infection included presence of the ICEKp10 mobile genetic element carrying the yersiniabactin iron acquisition system, and disruption of an O-antigen biosynthetic gene in a sub-lineage of the epidemic ST258 clone.
Altered O-antigen biosynthesis increased association with the respiratory tract, and subsequent ICEKp10 acquisition was associated with increased virulence. These results highlight the potential of integrated models including both patient and microbial features to provide a more holistic understanding of patient clinical trajectories.
Importance: Multidrug-resistant organisms, such as carbapenem-resistant Klebsiella pneumoniae (CRKP), colonize alarmingly large fractions of patients in endemic regions, but only a subset of patients develop life-threatening infections. While patient characteristics influence risk for infection, the relative contribution of microbial genetic background to patient risk remains unclear. We used machine learning to determine whether patient and/or microbial characteristics can discriminate between CRKP colonization vs. infection across multiple healthcare facilities and found that both patient and microbial factors were predictive. Examination of informative microbial genetic features revealed some that were associated with respiratory colonization and with higher rates of infection. The methods and findings presented here provide a foundation for future epidemiological, clinical, and biological studies to better understand bacterial infections and clinical outcomes.
Competing Interest Statement: JHH was employed at the University of Pennsylvania during the conduct of this study. She is currently an employee of, and holds shares in, the GSK group of companies.
Funding Statement: This research was supported by a CDC Cooperative Agreement FOA #CK16-004-Epicenters for the Prevention of Healthcare Associated Infections, and by National Institutes of Health grants R01 AI139240-01 and 1R01 AI148259-01. ZL received support from the National Science Foundation Graduate Research Fellowship Program under Grant No. DGE 1256260.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation. The funding bodies had no role in the design of the study or collection, analysis, and interpretation of data, or in writing the manuscript.
Author Declarations:
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: The study that collected this data was reviewed and approved by the Institutional Review Board of the University of Pennsylvania with a waiver of informed consent.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes
All code and data that are not protected health information are available on GitHub: https://github.com/Snitkin-Lab-Umich/ml-crkp-infection-manuscript