TY  - JOUR
T1  - Determining Personalized Community Health Needs by Feature Selection and Clustering
JF  - medRxiv
DO  - 10.1101/2020.02.21.20024612
SP  - 2020.02.21.20024612
AU  - Matthew Agar-Johnson
Y1  - 2020/01/01
UR  - http://medrxiv.org/content/early/2020/02/23/2020.02.21.20024612.abstract
N2  - The Centers for Disease Control and Prevention, through the Community Health Data Initiative (CHDI), has released a large county-level dataset detailing overall health indicators, demographics, major risk factors, and causes of morbidity and mortality in the US. To address the heterogeneity of community healthcare in the US, k-means clustering was performed on the CHDI dataset to identify community subtypes in terms of health challenges and outcomes. The optimal number of clusters, eight, was determined by the elbow method, and the clusters were analyzed for significant differences in demographics. To determine community-specific healthcare solutions and directions, feature selection and modeling of healthcare outcomes were performed for each of the eight subtypes using LASSO regression. Different features were found to significantly impact health outcomes in the different clusters, providing information about the unique health challenges faced by these different types of communities. A LASSO regression fitted to the entire unclustered dataset performed significantly worse on the sub-clusters, further supporting the claim that modeling community-specific needs is a vital step for delivering accurate and adequate community healthcare.
These results have the potential to inform policymaking at the local/municipal level, as well as the approaches taken by primary practitioners to address community needs.
Competing Interest Statement: This study is published independently and does not carry the endorsement of Carnegie Mellon University.
Funding Statement: No funding was used for this study.
Author Declarations: All relevant ethical guidelines have been followed; any necessary IRB and/or ethics committee approvals have been obtained and details of the IRB/oversight body are included in the manuscript. Yes. All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes.
All data is publicly available from the CDC website at https://healthdata.gov/dataset/community-health-status-indicators-chsi-combat-obesity-heart-disease-and-cancer
ER  -