Visualizing count data regressions using rootograms
Date Issued
2016-01-01
Author(s)
Zeileis, Achim
DOI
10.1080/00031305.2016.1173590
Abstract
The rootogram is a graphical tool associated with the work of J. W. Tukey that was originally used for assessing goodness of fit of univariate distributions. Here we extend the rootogram to regression models and show that this is particularly useful for diagnosing and treating issues such as overdispersion and/or excess zeros in count data models. We also introduce a weighted version of the rootogram that can be applied out of sample or to (weighted) subsets of the data, e.g., in finite mixture models. An empirical illustration revisiting a well-known data set from ethology is included, for which a negative binomial hurdle model is employed. Supplementary materials providing two further illustrations are available online: the first, using data from public health, employs a two-component finite mixture of negative binomial models; the second, using data from finance, involves underdispersion. An R implementation of our tools is available in the R package countreg. It also contains the data and replication code.
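As a rough illustration of the idea summarized in the abstract, the sketch below computes the quantities behind a hanging rootogram for a count regression: observed frequencies per count, expected frequencies obtained by summing each observation's fitted probabilities, and both placed on a square-root scale. It is not the countreg implementation. The Poisson distribution stands in for whichever count model is fitted, and the function names, the `weights` argument, and the toy data are all hypothetical.

```python
import math


def poisson_pmf(k, mu):
    """Poisson probability P(Y = k) for a mean mu > 0."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))


def rootogram_table(y, mu, max_count=None, weights=None):
    """Return one row per count 0..max_count for a hanging rootogram.

    y       : observed counts, one per observation
    mu      : fitted means, one per observation (assumed Poisson here)
    weights : optional observation weights (hypothetical argument,
              mimicking the weighted variant mentioned in the abstract)
    """
    if weights is None:
        weights = [1.0] * len(y)
    if max_count is None:
        max_count = max(y)

    rows = []
    for k in range(max_count + 1):
        # Observed frequency: (weighted) number of observations equal to k.
        observed = sum(w for yi, w in zip(y, weights) if yi == k)
        # Expected frequency: (weighted) sum of fitted probabilities of k.
        expected = sum(w * poisson_pmf(k, m) for m, w in zip(mu, weights))
        rows.append({
            "count": k,
            "observed": observed,
            "expected": expected,
            # Hanging style: the bar hangs from sqrt(expected) and has
            # height sqrt(observed); a bottom far from zero signals misfit.
            "bar_top": math.sqrt(expected),
            "bar_bottom": math.sqrt(expected) - math.sqrt(observed),
        })
    return rows


# Toy data (made up for illustration), with a constant fitted mean.
y = [0, 0, 0, 0, 0, 1, 1, 2, 3, 5]
mu = [1.2] * len(y)
for row in rootogram_table(y, mu):
    print(row["count"], round(row["bar_top"], 3), round(row["bar_bottom"], 3))
```

A bar whose bottom lies below zero indicates that the count occurs more often than the model expects, and a bottom above zero indicates the opposite. Many more zeros than expected, for instance, would show up as a clearly negative `bar_bottom` at count 0.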
File(s)
Name
20171211184129_5a2ec3498b5c3.pdf
Size
382.87 KB
Format
Adobe PDF
Checksum
(MD5):8c7cedc50a9056e0208f0cdb7fa24750