
A comparison of linear regression, regularization, and machine learning algorithms to develop Europe-wide spatial models of fine particles and nitrogen dioxide

Chen, Jie and de Hoogh, Kees and Gulliver, John and Hoffmann, Barbara and Hertel, Ole and Ketzel, Matthias and Bauwelinck, Mariska and van Donkelaar, Aaron and Hvidtfeldt, Ulla A. and Katsouyanni, Klea and Janssen, Nicole A. H. and Martin, Randall V. and Samoli, Evangelia and Schwartz, Per E. and Stafoggia, Massimo and Bellander, Tom and Strak, Maciek and Wolf, Kathrin and Vienneau, Danielle and Vermeulen, Roel and Brunekreef, Bert and Hoek, Gerard. (2019) A comparison of linear regression, regularization, and machine learning algorithms to develop Europe-wide spatial models of fine particles and nitrogen dioxide. Environment international, 130. p. 104934.

PDF - Published Version (2917Kb)
Available under License CC BY-NC-ND (Attribution-NonCommercial-NoDerivatives).

Official URL: https://edoc.unibas.ch/71312/


Abstract

Empirical spatial air pollution models have been applied extensively to assess exposure in epidemiological studies with increasingly sophisticated and complex statistical algorithms beyond ordinary linear regression. However, different algorithms have rarely been compared in terms of their predictive ability. This study compared 16 algorithms to predict annual average fine particle (PM2.5) and nitrogen dioxide (NO2) concentrations across Europe. The evaluated algorithms included linear stepwise regression, regularization techniques and machine learning methods. Air pollution models were developed based on the 2010 routine monitoring data from the AIRBASE dataset maintained by the European Environmental Agency (543 sites for PM2.5 and 2399 sites for NO2), using satellite observations, dispersion model estimates and land use variables as predictors. We compared the models by performing five-fold cross-validation (CV) and by external validation (EV) using annual average concentrations measured at 416 (PM2.5) and 1396 sites (NO2) from the ESCAPE study. We further assessed the correlations between predictions by each pair of algorithms at the ESCAPE sites. For PM2.5, the models performed similarly across algorithms with a mean CV R² of 0.59 and a mean EV R² of 0.53. Generalized boosted machine, random forest and bagging performed best (CV R² ~0.63; EV R² 0.58-0.61), while backward stepwise linear regression, support vector regression and artificial neural network performed less well (CV R² 0.48-0.57; EV R² 0.39-0.46). Most of the PM2.5 model predictions at ESCAPE sites were highly correlated (R² > 0.85, with the exception of predictions from the artificial neural network). For NO2, the models performed even more similarly across different algorithms, with CV R² values ranging from 0.57 to 0.62, and EV R² values ranging from 0.49 to 0.51. The predicted concentrations from all algorithms at ESCAPE sites were highly correlated (R² > 0.9). For both pollutants, biases were low for all models except the artificial neural network. Dispersion model estimates and satellite observations were two of the most important predictors for PM2.5 models whilst dispersion model estimates and traffic variables were most important for NO2 models in all algorithms that allow assessment of the importance of variables. Different statistical algorithms performed similarly when modelling spatial variation in annual average air pollution concentrations using a large number of training sites.
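
The two validation schemes named in the abstract, five-fold cross-validation (CV) and external validation (EV), can be illustrated with a minimal sketch. It is not the study's code: the data are synthetic, the single-predictor least-squares model stands in for any of the 16 algorithms, and the helper names (`fit_ols`, `r_squared`, `cv_r_squared`) are hypothetical. It also assumes that R² is computed as 1 - SSE/SST on hold-out predictions pooled across folds; the paper may define or aggregate R² differently.

```python
import random
from statistics import mean


def fit_ols(xs, ys):
    """Fit y = a + b*x by ordinary least squares and return (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(obs, pred):
    """R² as 1 - SSE/SST, with SST taken around the mean of the observations."""
    mo = mean(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mo) ** 2 for o in obs)
    return 1.0 - sse / sst


def cv_r_squared(xs, ys, k=5, seed=0):
    """k-fold CV: predict each site from a model fitted without its fold."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering all sites
    pred = [0.0] * len(xs)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        a, b = fit_ols([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            pred[i] = a + b * xs[i]
    return r_squared(ys, pred)


# Synthetic stand-ins (assumption): "training" sites and an independent
# "external" set, loosely mirroring monitoring data versus ESCAPE sites.
rng = random.Random(42)
x_train = [rng.uniform(0, 10) for _ in range(200)]
y_train = [5 + 1.5 * x + rng.gauss(0, 3) for x in x_train]
x_ext = [rng.uniform(0, 10) for _ in range(100)]
y_ext = [5 + 1.5 * x + rng.gauss(0, 3) for x in x_ext]

cv_r2 = cv_r_squared(x_train, y_train, k=5)

# External validation: fit on all training sites, score on the external set.
a, b = fit_ols(x_train, y_train)
ev_r2 = r_squared(y_ext, [a + b * x for x in x_ext])

print(f"CV R2 = {cv_r2:.2f}, EV R2 = {ev_r2:.2f}")
```

Comparing several algorithms in this setup would amount to swapping `fit_ols` for each candidate model while keeping the fold assignment and the external set fixed, so that differences in R² reflect the algorithm rather than the data split.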
Faculties and Departments: 09 Associated Institutions > Swiss Tropical and Public Health Institute (Swiss TPH)
09 Associated Institutions > Swiss Tropical and Public Health Institute (Swiss TPH) > Department of Epidemiology and Public Health (EPH) > Environmental Exposures and Health Systems Research > Physical Hazards and Health (Röösli)
UniBasel Contributors: de Hoogh, Kees and Vienneau, Danielle
Item Type: Article, refereed
Article Subtype: Research Article
Publisher: Elsevier
ISSN: 0160-4120
Note: Publication type according to Uni Basel Research Database: Journal article
Language: English
Identification Number:
edoc DOI:
Last Modified: 10 Jul 2019 14:25
Deposited On: 10 Jul 2019 14:25
