TY - JOUR T1 - Patient-Level Clinical Expertise Enhances Prostate Cancer Recurrence Predictions with Machine Learning JF - medRxiv DO - 10.1101/2022.03.22.22272635 SP - 2022.03.22.22272635 AU - Jacqueline Jil Vallon AU - Neil Panjwani AU - Xi Ling AU - Sushmita Vij AU - Sandy Srinivas AU - John Leppert AU - Mohsen Bayati AU - Mark K. Buyyounouski Y1 - 2022/01/01 UR - http://medrxiv.org/content/early/2022/03/25/2022.03.22.22272635.abstract N2 - With rising access to electronic health record data, application of artificial intelligence to create clinical risk prediction models has grown. A key component in designing these models is feature generation. Methods used to generate features differ in the degree of clinical expertise they deploy (from minimal to population-level to patient-level), and subsequently the extent to which they can extract reliable signals and be automated. In this work, we develop a new process that defines how to systematically implement patient-level clinician feature generation (CFG), which leverages clinical expertise to define concepts relevant to the outcome variable, identify each concept’s associated features, and finally extract most features on a per-patient level by manual chart review. We subsequently apply this method to identifying and extracting patient-level features predictive of cancer recurrence from progress notes for a cohort of prostate cancer patients. We evaluate the performance of the CFG process against an automated feature generation (AFG) process via natural language processing techniques. The machine learning outcome prediction model leveraging the CFG process has a mean AUC-ROC of 0.80, in comparison to the AFG model that has a mean AUC-ROC of 0.74. This relationship remains qualitatively unchanged throughout extensive sensitivity analyses. Our analyses illustrate the value of in-depth specialist reasoning in generating features from progress notes and provide a proof of concept that there is a need for new research on efficient integration of in-depth clinical expertise into feature generation for clinical risk prediction.Competing Interest StatementThe authors have declared no competing interest.Funding StatementThis study was funded by National Science Foundation award CMMI: 1554140Author DeclarationsI confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.YesThe details of the IRB/oversight body that provided approval or exemption for the research described are given below:IRB of Stanford University gave ethical approval for this workI confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.YesI understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).YesI have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.YesSummary statistics of the data are available upon reasonable request to the corresponding author. The raw patient data includes PHI and is thereby protected by privacy laws and cannot be shared. ER -