Title: | Conway-Maxwell Poisson (COM-Poisson) Regression |
---|---|
Description: | Fit Conway-Maxwell Poisson (COM-Poisson or CMP) regression models to count data (Sellers & Shmueli, 2010) <doi:10.1214/09-AOAS306>. The package provides functions for model estimation, dispersion testing, and diagnostics. Zero-inflated CMP regression (Sellers & Raim, 2016) <doi:10.1016/j.csda.2016.01.007> is also supported. |
Authors: | Kimberly Sellers <[email protected]> Thomas Lotze <[email protected]> Andrew Raim <[email protected]> |
Maintainer: | Andrew Raim <[email protected]> |
License: | GPL-2 | GPL-3 |
Version: | 0.8.1 |
Built: | 2024-11-03 04:36:41 UTC |
Source: | https://github.com/lotze/compoissonreg |
This package offers the ability to compute the parameter estimates for a COM-Poisson or zero-inflated (ZI) COM-Poisson regression and associated standard errors. This package also provides a hypothesis test for determining statistically significant data dispersion, and other model diagnostics.
This package offers the ability to compute COM-Poisson parameter
estimates and associated standard errors for a regular regression
model or a zero-inflated regression model (via the glm.cmp
function).
Further, the user can perform a hypothesis test to determine the statistically significant need for using COM-Poisson regression to model the data. The test addresses the matter of statistically significant dispersion.
The main order of functions for COM-Poisson regression is as follows:
Compute Poisson estimates (using glm
for Poisson regression
or pscl
for ZIP regression).
Use Poisson estimates as starting values to determine COM-Poisson
estimates (using glm.cmp
).
Compute associated standard errors (using sdev
function).
From here, there are many ways to proceed, so order is irrelevant:
Perform a hypothesis test to assess for statistically significant
dispersion (using equitest
or parametric.bootstrap
).
Compute leverage (using leverage) and deviance (using deviance).
Predict the outcome for new examples, using predict.
The package also supports fitting of the zero-inflated COM-Poisson model (ZICMP). Most of the tools available for COM-Poisson are also available for ZICMP.
As of version 0.5.0 of this package, a hybrid method is used to compute
the normalizing constant for the COM-Poisson density.
A closed-form approximation (Shmueli et al, 2005; Gillispie & Green, 2015)
to the exact sum is used if the given
is sufficiently large
and
is sufficiently small. Otherwise, an exact summation is used,
except that the number of terms is truncated to meet a given accuracy.
Previous versions of the package used simple truncation (defaulting to 100
terms), but this was found to be inaccurate in some settings.
See the package vignette for a more comprehensive guide on package use and explanations of the computations.
Kimberly Sellers, Thomas Lotze, Andrew M. Raim
Steven B. Gillispie & Christopher G. Green (2015) Approximating the Conway-Maxwell-Poisson distribution normalization constant, Statistics, 49:5, 1062-1073.
Kimberly F. Sellers & Galit Shmueli (2010). A Flexible Regression Model for Count Data. Annals of Applied Statistics, 4(2), 943-961.
Kimberly F. Sellers and Andrew M. Raim (2016). A Flexible Zero-Inflated Model to Address Data Dispersion. Computational Statistics and Data Analysis, 99, 68-80.
Galit Shmueli, Thomas P. Minka, Joseph B. Kadane, Sharad Borle, and Peter Boatwright (2005). A useful distribution for fitting discrete data: revival of the Conway-Maxwell-Poisson distribution. Journal of Royal Statistical Society C, 54, 127-142.
Useful links:
Functions for the COM-Poisson distribution.
dcmp(x, lambda, nu, log = FALSE, control = NULL) rcmp(n, lambda, nu, control = NULL) pcmp(x, lambda, nu, control = NULL) qcmp(q, lambda, nu, log.p = FALSE, control = NULL) ecmp(lambda, nu, control = NULL) vcmp(lambda, nu, control = NULL) ncmp(lambda, nu, log = FALSE, control = NULL) tcmp(lambda, nu, control = NULL)
dcmp(x, lambda, nu, log = FALSE, control = NULL) rcmp(n, lambda, nu, control = NULL) pcmp(x, lambda, nu, control = NULL) qcmp(q, lambda, nu, log.p = FALSE, control = NULL) ecmp(lambda, nu, control = NULL) vcmp(lambda, nu, control = NULL) ncmp(lambda, nu, log = FALSE, control = NULL) tcmp(lambda, nu, control = NULL)
x |
vector of quantiles. |
lambda |
rate parameter. |
nu |
dispersion parameter. |
log |
logical; if TRUE, probabilities are returned on log-scale. |
control |
a |
n |
number of observations. |
q |
vector of probabilities. |
log.p |
logical; if TRUE, probabilities |
density,
cumulative probability,
quantiles,
generate random variates,
expected value,
variance,
value of the normalizing constant, and
upper value used to compute the normalizing constant under truncation method.
Kimberly Sellers
Kimberly F. Sellers & Galit Shmueli (2010). A Flexible Regression Model for Count Data. Annals of Applied Statistics, 4(2), 943-961.
Global options used by the COMPoissonReg package.
COMPoissonReg.control |
A default control data structure for the package. See the helper function get.control for a description of contents. |
getOption("COMPoissonReg.control")
A dataset investigating the impact of education level and level of anxious attachment on unwanted pursuit behaviors in the context of couple separation.
data(couple)
data(couple)
number of unwanted pursuit behavior perpetrations.
1 if at least bachelor's degree; 0 otherwise.
continuous measure of anxious attachment.
Loeys, T., Moerkerke, B., DeSmet, O., Buysse, A., 2012. The analysis of zero-inflated count data: Beyond zero-inflated Poisson regression. British J. Math. Statist. Psych. 65 (1), 163-180.
Likelihood ratio test for equidispersion
equitest(object, ...)
equitest(object, ...)
object |
a model object |
... |
other parameters which might be required by the model |
A generic function for the likelihood ratio test for equidispersion using the output of a fitted mode. The function invokes particular methods which depend on the class of the first argument.
Returns the test statistic and p-value determined from a
distribution with
degrees of freedom.
Thomas Lotze
A set of data on airfreight breakage (breakage of ampules filled with some biological substance are shipped in cartons).
data(freight)
data(freight)
number of ampules found broken upon arrival.
number of times carton was transferred from one aircraft to another.
Kutner MH, Nachtsheim CJ, Neter J (2003). Applied Linear Regression Models, Fourth Edition. McGraw-Hill.
Construct a control object to pass additional arguments to a number of functions in the package.
get.control( ymax = 1e+06, optim.method = "L-BFGS-B", optim.control = list(maxit = 150), hybrid.tol = 0.01, truncate.tol = 1e-06 )
get.control( ymax = 1e+06, optim.method = "L-BFGS-B", optim.control = list(maxit = 150), hybrid.tol = 0.01, truncate.tol = 1e-06 )
ymax |
Truncate counts to maximum value of |
optim.method |
Optimization method for maximum likelihood. See the
|
optim.control |
|
hybrid.tol |
Tolerance to decide when to use truncation method versus approximation method to compute quantities based on the normalizing constant. See details. |
truncate.tol |
Tolerance for truncation method. See details. |
A hybrid method is used throughout the package to compute the CMP normalizing
constant and related quantities. When is smaller than
hybrid.tol
, an asymptotic approximation is used; otherwise, infinite
series are truncated to finite summations. More information is given in the
COMPoissonReg
vignette.
The element ymax
protects against very long computations. Users
should beware when increasing this significantly beyond the default, as it
may result in a session which needs to be terminated.
List of controls.
Construct an object that specifies which indices of coefficients should remain fixed in maximum likelihood computation.
get.fixed(beta = integer(0), gamma = integer(0), zeta = integer(0))
get.fixed(beta = integer(0), gamma = integer(0), zeta = integer(0))
beta |
Vector of indices of |
gamma |
Vector of indices of |
zeta |
Vector of indices of |
Arguments are expected to be vectors of integers. These are interpreted as
the indices to keep fixed during optimization. For example,
beta = c(1L, 1L, 2L)
indicates that the first and second elements of
beta
should remain fixed. Note that duplicate indices are ignored.
The default value is the empty vector integer(0)
, which requests that
no elements of the given coefficient vector should be fixed.
List of vectors indicating fixed indices.
Construct initial values for coefficients.
get.init(beta = NULL, gamma = NULL, zeta = NULL)
get.init(beta = NULL, gamma = NULL, zeta = NULL)
beta |
Vector for |
gamma |
Vector for |
zeta |
Vector for |
The default value NULL
is interpreted as an empty vector, so that the
given component is absent from the model.
List of initial value terms.
Construct initial values for coefficients with zeros.
get.init.zero(d1 = 0, d2 = 0, d3 = 0)
get.init.zero(d1 = 0, d2 = 0, d3 = 0)
d1 |
Dimension of |
d2 |
Dimension of |
d3 |
Dimension of |
List of initial value terms containing all zeros.
Construct model matrices and offsets for CMP/ZICMP regression
get.modelmatrix(X = NULL, S = NULL, W = NULL, offset = NULL)
get.modelmatrix(X = NULL, S = NULL, W = NULL, offset = NULL)
X |
An |
S |
An |
W |
A |
offset |
An offset object. See helper function get.offset. |
List of model matrix terms.
Construct values for offsets.
get.offset(x = NULL, s = NULL, w = NULL)
get.offset(x = NULL, s = NULL, w = NULL)
x |
Vector of offsets to go with |
s |
Vector of offsets to go with |
w |
Vector of offsets to go with |
The default value NULL
is interpreted as a vector of zeros. At least
one component must be non-NULL so that the dimension can be determined.
List of offset terms.
Construct zero values for offsets.
get.offset.zero(n)
get.offset.zero(n)
n |
Number of observations. |
List of offset terms containing all zeros.
Fit COM-Poisson regression using maximum likelihood estimation. Zero-Inflated COM-Poisson can be fit by specifying a regression for the overdispersion parameter.
glm.cmp( formula.lambda, formula.nu = ~1, formula.p = NULL, data = NULL, init = NULL, fixed = NULL, control = NULL, ... )
glm.cmp( formula.lambda, formula.nu = ~1, formula.p = NULL, data = NULL, init = NULL, fixed = NULL, control = NULL, ... )
formula.lambda |
regression formula linked to |
formula.nu |
regression formula linked to |
formula.p |
regression formula linked to |
data |
An optional data.frame with variables to be used with regression formulas. Variables not found here are read from the envionment. |
init |
A data structure that specifies initial values. See the helper function get.init. |
fixed |
A data structure that specifies which coefficients should remain fixed in the maximum likelihood procedure. See the helper function get.fixed. |
control |
A control data structure. See the helper function
get.control. If |
... |
other arguments, such as |
The COM-Poisson regression model is
The Zero-Inflated COM-Poisson regression model assumes that is 0
with probability
or
with probability
,
where
glm.cmp
produces an object of either class cmpfit
or
zicmpfit
, depending on whether zero-inflation is used in the model.
From this object, coefficients and other information can be extracted.
Kimberly Sellers, Thomas Lotze, Andrew Raim
Kimberly F. Sellers & Galit Shmueli (2010). A Flexible Regression Model for Count Data. Annals of Applied Statistics, 4(2), 943-961.
Kimberly F. Sellers and Andrew M. Raim (2016). A Flexible Zero-Inflated Model to Address Data Dispersion. Computational Statistics and Data Analysis, 99, 68-80.
Fit COM-Poisson and Zero-Inflated COM-Poisson regression using a "raw"
interface which bypasses the formula-driven interface of glm.cmp
.
glm.cmp.raw(y, X, S, offset = NULL, init = NULL, fixed = NULL, control = NULL) glm.zicmp.raw( y, X, S, W, offset = NULL, init = NULL, fixed = NULL, control = NULL )
glm.cmp.raw(y, X, S, offset = NULL, init = NULL, fixed = NULL, control = NULL) glm.zicmp.raw( y, X, S, W, offset = NULL, init = NULL, fixed = NULL, control = NULL )
y |
A vector of counts which represent the response . |
X |
Design matrix for the 'lambda' regression. |
S |
Design matrix for the 'nu' regression. |
offset |
A data structure that specifies offsets. See the helper function get.offset. |
init |
A data structure that specifies initial values. See the helper function get.init. |
fixed |
A data structure that specifies which coefficients should remain fixed in the maximum likelihood procedure. See the helper function get.fixed. |
control |
A control data structure. See the helper function get.control. |
W |
Design matrix for the 'p' regression. |
See the glm.cmp.
Supporting Functions for COM-Poisson Regression
## S3 method for class 'cmpfit' summary(object, ...) ## S3 method for class 'cmpfit' print(x, ...) ## S3 method for class 'cmpfit' logLik(object, ...) ## S3 method for class 'cmpfit' AIC(object, ..., k = 2) ## S3 method for class 'cmpfit' BIC(object, ...) ## S3 method for class 'cmpfit' coef(object, type = c("vector", "list"), ...) ## S3 method for class 'cmpfit' nu(object, ...) ## S3 method for class 'cmpfit' sdev(object, type = c("vector", "list"), ...) ## S3 method for class 'cmpfit' vcov(object, ...) ## S3 method for class 'cmpfit' equitest(object, ...) ## S3 method for class 'cmpfit' leverage(object, ...) ## S3 method for class 'cmpfit' deviance(object, ...) ## S3 method for class 'cmpfit' residuals(object, type = c("raw", "quantile"), ...) ## S3 method for class 'cmpfit' predict(object, newdata = NULL, type = c("response", "link"), ...) ## S3 method for class 'cmpfit' parametric.bootstrap(object, reps = 1000, report.period = reps + 1, ...)
## S3 method for class 'cmpfit' summary(object, ...) ## S3 method for class 'cmpfit' print(x, ...) ## S3 method for class 'cmpfit' logLik(object, ...) ## S3 method for class 'cmpfit' AIC(object, ..., k = 2) ## S3 method for class 'cmpfit' BIC(object, ...) ## S3 method for class 'cmpfit' coef(object, type = c("vector", "list"), ...) ## S3 method for class 'cmpfit' nu(object, ...) ## S3 method for class 'cmpfit' sdev(object, type = c("vector", "list"), ...) ## S3 method for class 'cmpfit' vcov(object, ...) ## S3 method for class 'cmpfit' equitest(object, ...) ## S3 method for class 'cmpfit' leverage(object, ...) ## S3 method for class 'cmpfit' deviance(object, ...) ## S3 method for class 'cmpfit' residuals(object, type = c("raw", "quantile"), ...) ## S3 method for class 'cmpfit' predict(object, newdata = NULL, type = c("response", "link"), ...) ## S3 method for class 'cmpfit' parametric.bootstrap(object, reps = 1000, report.period = reps + 1, ...)
object |
object of type |
... |
other arguments, such as |
x |
object of type |
k |
Penalty per parameter to be used in AIC calculation. |
type |
Specifies quantity to be computed. See details. |
newdata |
New covariates to be used for prediction. |
reps |
Number of bootstrap repetitions. |
report.period |
Report progress every |
The function residuals
returns raw residuals when
type = "raw"
and quantile residuals when
type = "quantile"
.
The function predict
returns expected values of the outcomes,
eveluated at the computed estimates, when type = "response"
. When
type = "link"
, a data.frame
is instead returned with
columns corresponding to estimates of lambda
and nu
.
The function coef
returns a vector of coefficient estimates in
the form c(beta, gamma)
when type = "vector"
. When
type = "list"
, the estimates are returned as a list with named
elements beta
and gamma
.
The type
argument behaves the same for the sdev
function
as it does for coef
.
Supporting Functions for ZICMP Regression
## S3 method for class 'zicmpfit' summary(object, ...) ## S3 method for class 'zicmpfit' print(x, ...) ## S3 method for class 'zicmpfit' logLik(object, ...) ## S3 method for class 'zicmpfit' AIC(object, ..., k = 2) ## S3 method for class 'zicmpfit' BIC(object, ...) ## S3 method for class 'zicmpfit' coef(object, type = c("vector", "list"), ...) ## S3 method for class 'zicmpfit' nu(object, ...) ## S3 method for class 'zicmpfit' sdev(object, type = c("vector", "list"), ...) ## S3 method for class 'zicmpfit' vcov(object, ...) ## S3 method for class 'zicmpfit' equitest(object, ...) ## S3 method for class 'zicmpfit' deviance(object, ...) ## S3 method for class 'zicmpfit' residuals(object, type = c("raw", "quantile"), ...) ## S3 method for class 'zicmpfit' predict(object, newdata = NULL, type = c("response", "link"), ...) ## S3 method for class 'zicmpfit' parametric.bootstrap(object, reps = 1000, report.period = reps + 1, ...)
## S3 method for class 'zicmpfit' summary(object, ...) ## S3 method for class 'zicmpfit' print(x, ...) ## S3 method for class 'zicmpfit' logLik(object, ...) ## S3 method for class 'zicmpfit' AIC(object, ..., k = 2) ## S3 method for class 'zicmpfit' BIC(object, ...) ## S3 method for class 'zicmpfit' coef(object, type = c("vector", "list"), ...) ## S3 method for class 'zicmpfit' nu(object, ...) ## S3 method for class 'zicmpfit' sdev(object, type = c("vector", "list"), ...) ## S3 method for class 'zicmpfit' vcov(object, ...) ## S3 method for class 'zicmpfit' equitest(object, ...) ## S3 method for class 'zicmpfit' deviance(object, ...) ## S3 method for class 'zicmpfit' residuals(object, type = c("raw", "quantile"), ...) ## S3 method for class 'zicmpfit' predict(object, newdata = NULL, type = c("response", "link"), ...) ## S3 method for class 'zicmpfit' parametric.bootstrap(object, reps = 1000, report.period = reps + 1, ...)
object |
object of type |
... |
other arguments, such as |
x |
object of type |
k |
Penalty per parameter to be used in AIC calculation. |
type |
Specifies quantity to be computed. See details. |
newdata |
New covariates to be used for prediction. |
reps |
Number of bootstrap repetitions. |
report.period |
Report progress every |
The function residuals
returns raw residuals when
type = "raw"
and quantile residuals when
type = "quantile"
.
The function predict
returns expected values of the outcomes,
eveluated at the computed estimates, when type = "response"
. When
type = "link"
, a data.frame
is instead returned with
columns corresponding to estimates of lambda
, nu
, and
p
.
The function coef
returns a vector of coefficient estimates in
the form c(beta, gamma, zeta)
when type = "vector"
. When
type = "list"
, the estimates are returned as a list with named
elements beta
and gamma
, and zeta
.
The type
argument behaves the same for the sdev
function
as it does for coef
.
A generic function for the leverage of points used in various model fitting functions. The function invokes particular methods which depend on the class of the first argument.
leverage(object, ...)
leverage(object, ...)
object |
a model object |
... |
other parameters which might be required by the model |
See the documentation of the particular methods for details.
The form of the value returned depends on the class of its argument. See the documentation of the particular methods for details of what is produced by that method.
Thomas Lotze
(Deprecated) A generic function for the dispersion parameter estimate from the results of various model fitting functions. The function invokes particular methods which depend on the class of the first argument.
nu(object, ...)
nu(object, ...)
object |
a model object |
... |
other parameters which might be required by the model |
See the documentation of the particular methods for details.
The form of the value returned depends on the class of its argument. See the documentation of the particular methods for details of what is produced by that method.
predict
A generic function for the parametric bootstrap from the results of various model fitting functions. The function invokes particular methods which depend on the class of the first argument.
parametric.bootstrap(object, reps = 1000, report.period = reps + 1, ...)
parametric.bootstrap(object, reps = 1000, report.period = reps + 1, ...)
object |
a model object |
reps |
Number of bootstrap repetitions. |
report.period |
Report progress every |
... |
other parameters which might be required by the model |
See the documentation of the particular methods for details.
The form of the value returned depends on the class of its argument. See the documentation of the particular methods for details of what is produced by that method.
Thomas Lotze
A generic function for standard deviation estimates from the results of various model fitting functions. The function invokes particular methods which depend on the class of the first argument.
sdev(object, ...)
sdev(object, ...)
object |
a model object |
... |
other parameters which might be required by the model |
See the documentation of the particular methods for details.
The form of the value returned depends on the class of its argument. See the documentation of the particular methods for details of what is produced by that method.
Thomas Lotze
Computes the density, cumulative probability, quantiles, and random draws for the zero-inflated COM-Poisson distribution.
dzicmp(x, lambda, nu, p, log = FALSE, control = NULL) rzicmp(n, lambda, nu, p, control = NULL) pzicmp(x, lambda, nu, p, control = NULL) qzicmp(q, lambda, nu, p, log.p = FALSE, control = NULL) ezicmp(lambda, nu, p, control = NULL) vzicmp(lambda, nu, p, control = NULL)
dzicmp(x, lambda, nu, p, log = FALSE, control = NULL) rzicmp(n, lambda, nu, p, control = NULL) pzicmp(x, lambda, nu, p, control = NULL) qzicmp(q, lambda, nu, p, log.p = FALSE, control = NULL) ezicmp(lambda, nu, p, control = NULL) vzicmp(lambda, nu, p, control = NULL)
x |
vector of quantiles. |
lambda |
rate parameter. |
nu |
dispersion parameter. |
p |
zero-inflation probability parameter. |
log |
logical; if TRUE, probabilities are returned on log-scale. |
control |
a |
n |
number of observations. |
q |
vector of probabilities. |
log.p |
logical; if TRUE, probabilities p are given as |
density,
cumulative probability,
quantiles,
generate random variates,
expected value. and
variance.
Kimberly Sellers, Andrew Raim
Kimberly F. Sellers and Andrew M. Raim (2016). A Flexible Zero-Inflated Model to Address Data Dispersion. Computational Statistics and Data Analysis, 99, 68-80.
Functions for the COM-Poisson distribution.
dzip(x, lambda, p, log = FALSE) rzip(n, lambda, p) pzip(x, lambda, p) qzip(q, lambda, p, log.p = FALSE) ezip(lambda, p) vzip(lambda, p)
dzip(x, lambda, p, log = FALSE) rzip(n, lambda, p) pzip(x, lambda, p) qzip(q, lambda, p, log.p = FALSE) ezip(lambda, p) vzip(lambda, p)
x |
vector of quantiles. |
lambda |
rate parameter. |
p |
zero-inflation probability parameter. |
log |
logical; if TRUE, probabilities are returned on log-scale. |
n |
number of observations. |
q |
vector of probabilities. |
log.p |
logical; if TRUE, probabilities |
density,
cumulative probability,
quantiles,
generate random variates,
expected value,
variance,
Kimberly Sellers