An ensemble-based model of PM2.5 concentration across the contiguous United States with high spatiotemporal resolution

Qian Di; Heresh Amini; Liuhua Shi; Itai Kloog; Rachel Silvern; James Kelly; M Benjamin Sabath; Christine Choirat; Petros Koutrakis; Alexei Lyapustin; Yujie Wang; Loretta J Mickley; Joel Schwartz

doi:10.1016/j.envint.2019.104909

Environ Int. 2019 Sep:130:104909. doi: 10.1016/j.envint.2019.104909. Epub 2019 Jul 1.

Affiliations

¹ Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, MA, United States; Research Center for Public Health, Tsinghua University, Beijing, China. Electronic address: qiandi@mail.harvard.edu.
² Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, MA, United States.
³ Department of Geography and Environmental Development, Ben-Gurion University of the Negev, Beer Sheva, Israel.
⁴ Department of Earth and Planetary Sciences, Harvard University, Cambridge, MA, United States.
⁵ U.S. Environmental Protection Agency, Office of Air Quality Planning & Standards, Research Triangle Park, NC, United States.
⁶ Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, United States.
⁷ NASA Goddard Space Flight Center, Greenbelt, MD, United States.
⁸ University of Maryland, Baltimore County, Baltimore, MD, United States.
⁹ John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, United States.

Abstract

Various approaches have been proposed to model PM_2.5 over the past decade, with satellite-derived aerosol optical depth, land-use variables, chemical transport model predictions, and several meteorological variables as the major predictor variables. Our study used an ensemble model that integrated multiple machine learning algorithms and predictor variables to estimate daily PM_2.5 at a resolution of 1 km × 1 km across the contiguous United States. We used a generalized additive model that accounted for geographic differences to combine PM_2.5 estimates from a neural network, a random forest, and gradient boosting. The three machine learning algorithms were based on multiple predictor variables, including satellite data, meteorological variables, land-use variables, elevation, chemical transport model predictions, several reanalysis datasets, and others. Model training results from 2000 to 2015 indicated good performance, with a 10-fold cross-validated R² of 0.86 for daily PM_2.5 predictions. For annual PM_2.5 estimates, the cross-validated R² was 0.89. Our model demonstrated good performance up to 60 μg/m³. Using the trained PM_2.5 model and the predictor variables, we predicted daily PM_2.5 from 2000 to 2015 at every 1 km × 1 km grid cell in the contiguous United States. We also used localized land-use variables within 1 km × 1 km grids to downscale PM_2.5 predictions to 100 m × 100 m grid cells. To characterize uncertainty, we used meteorological variables, land-use variables, and elevation to model the monthly standard deviation of the difference between daily monitored and predicted PM_2.5 for every 1 km × 1 km grid cell. This PM_2.5 prediction dataset, including the downscaled and uncertainty predictions, allows epidemiologists to accurately estimate the adverse health effects of PM_2.5. Compared with the individual base learners, an ensemble model can achieve better overall estimation. It is worth exploring other ensemble model formats that synthesize estimates from different models or from different groups to improve overall performance.
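
The two evaluation quantities named in the abstract, an R² between monitored and predicted PM_2.5 and the monthly standard deviation of their difference, can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the record layout (cell id, ISO date, monitored value, predicted value), the use of 1 − SS_res/SS_tot as the R² definition, and the use of the sample standard deviation are all assumptions made for illustration. The sketch computes only the raw monthly standard deviation at monitored locations; the paper goes further and models that quantity from meteorological variables, land-use variables, and elevation for every grid cell.

```python
from collections import defaultdict
from statistics import stdev


def r_squared(observed, predicted):
    """Coefficient of determination, taken here as 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def monthly_residual_sd(records):
    """Group daily residuals by (cell, month) and return their sample SD.

    records: iterable of (cell_id, 'YYYY-MM-DD', monitored, predicted);
    this layout is hypothetical, not taken from the paper.
    """
    residuals = defaultdict(list)
    for cell_id, date, monitored, predicted in records:
        # The first seven characters of an ISO date are 'YYYY-MM'.
        residuals[(cell_id, date[:7])].append(monitored - predicted)
    # A sample SD needs at least two values, so smaller groups are skipped.
    return {key: stdev(vals) for key, vals in residuals.items() if len(vals) >= 2}


# Toy values in micrograms per cubic metre, invented for the example.
daily = [
    ("A", "2010-01-01", 10.0, 9.0),
    ("A", "2010-01-02", 12.0, 13.0),
    ("A", "2010-01-03", 8.0, 8.0),
    ("A", "2010-02-01", 5.0, 4.0),
]
sd_by_month = monthly_residual_sd(daily)
r2 = r_squared([10, 12, 14, 16], [11, 12, 13, 17])
```

In a cross-validation setting, `predicted` would hold the held-out predictions collected across all folds, so that each monitored value is compared with a prediction from a model that did not see it during training.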

Keywords: Ensemble model; Fine particulate matter (PM_2.5); Gradient boosting; Neural network; Random forest.

Publication types

Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Air Pollutants / analysis*
Air Pollution / statistics & numerical data*
Algorithms
Environmental Monitoring / methods*
Machine Learning
Models, Statistical*
Particulate Matter / analysis*
United States

Substances

Air Pollutants
Particulate Matter

Grants and funding

EPA999999/ImEPA/Intramural EPA/United States