RT Journal Article SR Electronic T1 Integrating Machine Learning-Based Variable Selection into Heat Vulnerability Index Design JF medRxiv FD Cold Spring Harbor Laboratory Press SP 2026.03.29.26349672 DO 10.64898/2026.03.29.26349672 A1 Qu, Shuyue A1 Sillmann, Jana A1 Barrett, Benjamin W. A1 Graffy, Peter M. A1 Poschlod, Benjamin A1 Brunner, Lukas A1 Mansour, Raed A1 Szombathely, Malte von A1 Hay-Chapman, Finley A1 Horton, Teresa H. A1 Chan, Jennifer A1 Rao, Sheetal Khedkar A1 Woods, Kyra A1 Kho, Abel N A1 Horton, Daniel E. YR 2026 UL http://medrxiv.org/content/early/2026/03/31/2026.03.29.26349672.abstract AB As climate change intensifies, health risks from extreme heat are rising. Accurate assessment of heat vulnerability at high spatial resolution is crucial for developing effective adaptation strategies, particularly in socioeconomically heterogeneous urban settings. However, the identification of key indicators underlying heat vulnerability remains challenging. Using Chicago, Illinois (USA) as a case study, we systematically compare different variable selection strategies in community-level heat vulnerability assessments. We take the conventional unsupervised principal component analysis (PCA)-based Heat Vulnerability Index (HVI) as a baseline, and compare it with supervised approaches that incorporate variable selection, including machine learning algorithms (Lasso regression, Random Forest, and XGBoost) as well as traditional statistical methods (simple linear regression and polynomial regression). Using the vulnerability indicator subsets identified by each variable selection method, we construct multiple HVIs and evaluate their performance against heat-related excess mortality. Our work indicates that supervised variable selection improves the performance of HVIs in capturing heat-related health risks. 
Among all methods, the Random Forest-based variable selection algorithm achieves the best overall results, highlighting the potential of machine learning to enhance heat vulnerability assessment tools. Our results demonstrate that poverty rate, lack of air conditioning, and proportion of residents aged 65 and above are robust determinants of heat vulnerability in Chicago. Competing Interest Statement: The authors have declared no competing interest. Funding Statement: This study was funded by the School of Integrated Climate and Earth System Sciences at University of Hamburg; the Office of Global Initiatives at the McCormick School of Engineering and the Paula M. Trienens Institute for Sustainability and Energy at Northwestern University; and the Buffett Institute for Global Affairs Defusing Disasters Working Group. JS, BP, LB and MvS acknowledge funding by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy - EXC 2037 "CLICCS - Climate, Climatic Change, and Society" - Project Number 390683824. Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: Institutional Review Board of Northwestern University gave ethical approval for this work (protocol number STU00219292). I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. 
I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable. Yes. Meteorological data and socio-demographic data are publicly available from relevant data providers. Mortality data were obtained through institutional access and are not publicly available. Access to mortality data may be requested from the original data provider, subject to their approval. https://www.census.gov/programs-surveys/acs.html https://daymet.ornl.gov/ https://data.cityofchicago.org/ https://www.usgs.gov/centers/eros/science/national-land-cover-database