The R package COMPoissonReg supports Conway-Maxwell
Poisson (CMP) and Zero-Inflated Conway-Maxwell Poisson (ZICMP) models
for analysis of count data in a flexible manner, to account for data
dispersion relative to a Poisson model. The package provides regression
functionality in addition to basic distribution functions. Interested
users can refer to Sellers and Shmueli
(2010) and Sellers and Raim (2016)
regarding the underlying theoretical developments for the CMP and ZICMP
regressions, respectively. A full specification of the public
COMPoissonReg interface can be found in the manual. In
addition to package prerequisites Rcpp (Eddelbuettel 2013) and numDeriv
(Gilbert and Varadhan 2019),
ggplot2 (Wickham 2016) is
also used in this vignette.
One of the challenges of working with CMP and ZICMP lies in computing
the normalizing constant and related quantities. The normalizing
constant does not have a simple closed form in general and can quickly
increase or decrease magnitude as parameters are varied.
COMPoissonReg takes a hybrid approach of either truncating
the infinite series or making use of an approximation, depending on
parameter values.
The remainder of the vignette proceeds as follows. Section \(\ref{sec:cmp}\) describes functions to
support the CMP distribution, including numerical handling of the
normalizing constant. Section \(\ref{sec:zicmp}\) describes functions for
ZICMP. Finally, Section \(\ref{sec:reg}\) demonstrates regression
functions; Sections \(\ref{sec:cmp-reg}\) and \(\ref{sec:zicmp-reg}\) give specific
examples based on CMP and ZICMP outcomes, respectively. The
COMPoissonReg package is on CRAN at https://cran.r-project.org/package=COMPoissonReg and the
source code is on Github at https://github.com/lotze/COMPoissonReg.
Let \(Y \sim \text{CMP}(\lambda, \nu)\) be a Conway-Maxwell Poisson (CMP) random variable with density \[\begin{align*} f(y \mid \lambda, \nu) = \frac{\lambda^y}{(y!)^\nu Z(\lambda, \nu)}, \quad y \in \mathbb{N}, \quad Z(\lambda, \nu) = \sum_{r=0}^\infty \frac{\lambda^r}{(r!)^\nu}, \end{align*}\] where \(\lambda > 0\), \(\nu > 0\), and \(\mathbb{N}\) represents the nonnegative integers \(\{ 0, 1, 2, \ldots\}\). Three notable special cases of \(\text{CMP}(\lambda, \nu)\) help to demonstrate its flexibility in count modeling.
The case \(\nu = 1\) corresponds to \(\text{Poisson}(\lambda)\).
When \(\lambda \in (0,1)\) and \(\nu = 0\), the \(\text{CMP}(\lambda, \nu)\) distribution is \(\text{Geometric}(1-\lambda)\) with density \(f(y \mid \lambda) = (1 - \lambda) \lambda^y\) for \(y \in \mathbb{N}\), which is overdispersed relative to Poisson.
When \(\nu \rightarrow \infty\), \(\text{CMP}(\lambda, \nu)\) converges to a \(\text{Bernoulli}(\lambda / (1 + \lambda))\) distribution which is underdispersed relative to Poisson.
The normalizing constant \(Z(\lambda, \nu)\) presents some challenges in the practical use of CMP models and has been a topic of interest in the CMP literature. In general, there is no simple closed form expression for the series \(Z(\lambda, \nu)\). Shmueli et al. (2005) give the approximation \[\begin{align} Z(\lambda, \nu) &= \frac{ \exp(\nu \lambda^{1/\nu}) }{ \lambda^{(\nu-1)/2\nu} (2\pi)^{(\nu-1)/2} \nu^{1/2} } \left\{ 1 + O(\lambda^{-1/\nu}) \right\}, \label{eqn:approx} \end{align}\] where the \(O(\cdot)\) term vanishes as \(\lambda^{-1/\nu}\) becomes small. Approximations have been further studied and refined in subsequent literature; see for example Gillispie and Green (2015), Daly and Gaunt (2016), and Gaunt et al. (2019). The expression in \(\eqref{eqn:approx}\) emphasizes that the magnitude of \(Z(\lambda, \nu)\) explodes as \(\nu \rightarrow 0\) when \(\lambda > 1\). For example, \(Z(2, 0.075) \approx e^{780.515}\) is too large to store as a double-precision floating point number, and may evaluate to infinity if care is not taken. In contrast, \(Z(\lambda, \nu) \rightarrow 1/(1 - \lambda)\) when \(\lambda < 1\) and \(\nu \rightarrow 0\).
In practice, the COMPoissonReg package does not place
constraints on \(\lambda\) and \(\nu\), except to ensure that they are
positive, so that their values are driven by the data or the user’s
selection. A hybrid strategy motivated by \(\eqref{eqn:approx}\) is taken by
COMPoissonReg.
To compute \(Z(\lambda, \nu)\), suppose we are given a small tolerance \(\delta > 0\). If \[\begin{align} \lambda^{-1/\nu} < \delta, \label{eqn:can_approx} \end{align}\] the first term of \(\eqref{eqn:approx}\) dominates the second term, and we take \[\begin{align} Z(\lambda, \nu) &\approx \frac{ \exp(\nu \lambda^{1/\nu}) }{ \lambda^{(\nu-1)/2\nu} (2\pi)^{(\nu-1)/2} \nu^{1/2} } \nonumber \\ &=\exp\left\{ \nu \lambda^{1/\nu} - \frac{\nu-1}{2\nu} \log \lambda - \frac{\nu-1}{2} \log(2\pi) - \frac{1}{2} \log \nu \right\}. \label{eqn:z-approx} \end{align}\] as an approximation. Otherwise, the series is computed by truncating to a finite number of terms, which is described next. In either case, computations are kept on the log-scale as much as possible to accommodate numbers with potentially very large and very small magnitudes.
We approximate \(Z(\lambda, \nu)\) by a finite summation \(Z^{(M)}(\lambda, \nu) = \sum_{r=0}^M \lambda^r / (r!)^\nu\) if condition \(\eqref{eqn:can_approx}\) fails, so that the remainder is smaller than a given tolerance. The general approach is described in Appendix B of Shmueli et al. (2005). Robbins (1955) gives bounds for Stirling’s approximation as \[\begin{align*} \sqrt{2\pi} n^{n + 1/2} e^{-n} e^{1 / (12n + 1)} < n! < \sqrt{2\pi} n^{n + 1/2} e^{-n} e^{1 / (12n)}. \end{align*}\] Noting that \(e^{1 / (12n + 1)} \geq 1\) for \(n \geq 1\) and \(\sqrt{2\pi} e^{1 / (12n)} \leq e\) for \(n \geq 2\), we obtain simpler bounds \[\begin{align*} \sqrt{2\pi} n^{n + 1/2} e^{-n} \leq n! \leq e n^{n + 1/2} e^{-n}, \end{align*}\] which will be convenient in the following calculations.3 We may then bound the truncation error for \(Z^{(M)}(\lambda, \nu)\) using \[\begin{align} \lvert Z(\lambda, \nu) - Z^{(M)}(\lambda, \nu) \rvert &= Z(\lambda, \nu) - Z^{(M)}(\lambda, \nu) \nonumber \\ &= \sum_{r=M+1}^\infty \frac{\lambda^r}{(r!)^\nu} \nonumber \\ &\leq \sum_{r=M+1}^\infty \frac{\lambda^r}{(2\pi)^{\nu/2} r^{\nu r + \nu/2} e^{-r \nu}} \nonumber \\ &\leq \sum_{r=M+1}^\infty \frac{\lambda^r}{(2\pi)^{\nu/2} (M+1)^{\nu r + \nu/2} e^{-r \nu}} \nonumber \\ &= (2\pi)^{-\nu/2} (M+1)^{-\nu/2} \sum_{r=M+1}^\infty \left( \frac{\lambda e^{\nu}}{(M+1)^{\nu}} \right)^r \label{eqn:geom} \\ &= (2\pi)^{-\nu/2} (M+1)^{-\nu/2} \sum_{r=0}^\infty \left( \frac{\lambda e^{\nu}}{(M+1)^{\nu}} \right)^{r+M+1} \nonumber \\ &= (2\pi)^{-\nu/2} (M+1)^{-\nu/2} \left( \frac{\lambda e^{\nu}}{(M+1)^{\nu}} \right)^{M+1} \frac{1}{1 - \frac{\lambda e^{\nu}}{(M+1)^{\nu}}} \nonumber \\ &=: \Delta_M, \nonumber \end{align}\] assuming that \(|\lambda e^\nu / (M+1)^\nu| < 1\) so that the geometric series in \(\eqref{eqn:geom}\) converges. To ensure this convergence we choose \(M\) at least large enough so that \[\begin{align*} \lambda e^\nu / (M+1)^\nu < 1 \iff M > \lambda^{1/\nu} e - 1. \end{align*}\] For a small given number \(\epsilon > 0\), we may consider bounding the relative error by \[\begin{align} &\frac{\lvert Z(\lambda, \nu) - Z^{(M)}(\lambda, \nu) \rvert}{Z^{(M)}(\lambda, \nu)} \leq \frac{\Delta_M}{Z^{(M)}(\lambda, \nu)} < \epsilon. \label{eqn:rel_error_bound} \end{align}\] The second inequality of \(\eqref{eqn:rel_error_bound}\) can be expressed on the log-scale using \[\begin{align} \log \Delta_M - \log Z^{(M)}(\lambda, \nu) < \log \epsilon, \label{eqn:adaptive-log-scale} \end{align}\] where \[\begin{align*} \log \Delta_M = -\frac{\nu}{2} \log(2\pi) - \nu \left( M+\frac{3}{2} \right) \log (M+1) + (M+1) (\nu + \log \lambda) - \log\left(1 - \frac{\lambda e^\nu}{(M+1)^\nu} \right). \end{align*}\] Therefore, we compute \(Z^{(M)}(\lambda, \nu)\) until at least \(M > \lambda^{1/\nu} e - 1\), increasing \(M\) and updating \(Z^{(M)}(\lambda, \nu)\) until \(\eqref{eqn:adaptive-log-scale}\) is satisfied. This is summarized as Algorithm \(\ref{alg:normconst-trunc}\).
The individual terms \(\lambda^r /
(r!)^\nu\) in the summation may be too large to store at their
original scale. Therefore, summation is carried out at the log-scale,
wherever possible, using the identity \[\begin{align}
\log(x + y) = \log x + \log(1 + \exp\{ \log y - \log x \});
\label{eqn:logadd}
\end{align}\] this is especially helpful when \(0 < y \ll x\), as \(\log x\) may be kept on the log-scale by
the first term of the right-hand side of \(\eqref{eqn:logadd}\), and the standard
library function log1p may be used to accurately compute
\(\log(1 + \phi)\) for very small \(\phi > 0\).
Many of the functions in the user interface of
COMPoissonReg take an optional control
argument, which can be constructed as follows.
The tolerances \(\delta\) and \(\epsilon\) are specified as
hybrid.tol and truncate.tol respectively.
Taking hybrid.tol to be a very small positive number
results in use of the truncated sum \(Z^{(M)}(\lambda, \nu)\), while
hybrid.tol = Inf uses the approximation method \(\eqref{eqn:z-approx}\), except in extreme
cases where \(\lambda^{-1/\nu}\)
evaluates to zero or \(\infty\)
numerically. The argument ymax specifies upper limit \(M\); this is a safety measure which
prevents very large computations unless the user opts to allow them.
When no control object is specified, a global default (via option
COMPoissonReg.control) is used.
> control = getOption("COMPoissonReg.control")
> control$ymax
[1] 1e+06
> control$hybrid.tol
[1] 0.01
> control$truncate.tol
[1] 1e-06The default may be replaced in your current session if desired.
The control object contains several other useful arguments to be discussed later in the vignette.
The ncmp function computes the normalizing constant
\(Z(\lambda, \nu)\) and returns its
value either on the original scale or the log-scale.
> ncmp(lambda = 1.5, nu = 1.2)
[1] 4.01341
> ncmp(lambda = 1.5, nu = 1.2, log = TRUE)
[1] 1.389641
> ncmp(lambda = 1.5, nu = 1.2, log = TRUE, control = get.control(hybrid.tol = 1e10))
[1] 1.373642
> ncmp(lambda = 1.5, nu = 1.2, log = TRUE, control = get.control(hybrid.tol = 1e-10))
[1] 1.389641Before proceeding, let us define a function to display errors and warnings which are intentionally triggered in the remainder of the vignette.
The function tcmp returns the truncation value \(M\) obtained from Algorithm \(\ref{alg:normconst-trunc}\).
> nu_seq = c(1, 0.5, 0.2, 0.1, 0.05, 0.03)
> tryCatch({ tcmp(lambda = 1.5, nu = nu_seq) }, warning = print_warning)
[1] simpleWarning in y_trunc(prep$lambda, prep$nu, tol = truncate.tol, ymax
[2] = ymax): Terms of normalizing constant CMP(1.5, 0.03) could not be
[3] bounded by a geometric series when y <= 1e+06. Consider adjusting the
[4] controls ymax, hybrid.tol, and truncate.tol Note that tcmp returns 1e6 and produces a
warning for the smallest nu value 0.03 because Algorithm
\(\ref{alg:normconst-trunc}\) has
reached ymax = 1e6 before the series could be bounded by a
geometric series. Here, it is likely that support values with
non-negligible mass are being left out. Let us increase
ymax to avoid this problem.
It is also possible to reach the second loop of Algorithm \(\ref{alg:normconst-trunc}\) where the
geometric series can be used, but ymax is not large enough
to satisfy \(\eqref{eqn:adaptive-log-scale}\). Here is
an example where this occurs.
> tcmp(lambda = 1.2, nu = 0.03, control = get.control(ymax = 1200))
Warning in y_trunc(prep$lambda, prep$nu, tol = truncate.tol, ymax = ymax):
Absolute relative error 1.13902e-05 was larger than tolerance 1e-06 with
CMP(1.2, 0.03) truncated to 1200. Consider adjusting the controls ymax,
hybrid.tol, and truncate.tol
[1] 1200Now that we have a somewhat robust computation for the normalizing constant, let us create a plot of the interesting behavior when \(\lambda > 1\) and \(\nu\) is decreasing.
library(ggplot2)
nu_seq = seq(0.03, 1.5, length.out = 20)
nc1 = ncmp(lambda = 0.5, nu = nu_seq, log = TRUE)
nc2 = ncmp(lambda = 1.05, nu = nu_seq, log = TRUE)
nc3 = ncmp(lambda = 1.20, nu = nu_seq, log = TRUE)
ggplot() +
geom_point(data = data.frame(x = nu_seq, y = nc1), aes(x = x, y = y), pch = 1) +
geom_point(data = data.frame(x = nu_seq, y = nc2), aes(x = x, y = y), pch = 2) +
geom_point(data = data.frame(x = nu_seq, y = nc3), aes(x = x, y = y), pch = 3) +
xlab("nu") +
ylab("log of normalizing constant") +
theme_bw()Log of normalizing constant for \(\lambda = 0.5\) (\(\circ\)), \(\lambda = 1.05\) (\(\Delta\)), and \(\lambda = 1.20\) (\(+\)).
We see that with \(\lambda = 1.2\) a value of \(Z(\lambda, 0.03) \approx e^{18.67}\) is obtained, which is an extremely large jump from the next value in the series \(Z(\lambda, 0.1074) \approx e^{3.2}\).
The respective functions for CMP density, variate generation, CDF,
and quantile functions are dcmp, rcmp,
pcmp , and qcmp. Their usage is similar to
distribution functions provided by the R stats package.
> dcmp(0, lambda = 10, nu = 0.9)
[1] 6.819476e-06
> dcmp(0:17, lambda = 10, nu = 0.9, log = TRUE)
[1] -11.895728 -9.593143 -7.914390 -6.600556 -5.545636 -4.691545
[7] -4.001543 -3.450278 -3.019190 -2.694107 -2.463848 -2.319369
[13] -2.253200 -2.259069 -2.331636 -2.466296 -2.659041 -2.906347
> dcmp(c(0, 1, 2), lambda = c(10, 11, 12), nu = c(0.9, 1.0, 1.1), log = TRUE)
[1] -11.895728 -8.602105 -6.071951> rcmp(50, lambda = 10, nu = 0.9)
[1] 10 13 8 18 17 12 13 15 13 10 20 13 13 14 15 5 18 14 14 17 9 20 17 10 10
[26] 16 11 13 10 12 14 12 15 14 11 14 10 11 14 8 13 18 6 13 18 18 11 13 15 9> pcmp(0:17, lambda = 10, nu = 0.9)
[1] 6.819476e-06 7.501423e-05 4.404609e-04 1.800073e-03 5.704532e-03
[6] 1.487703e-02 3.316443e-02 6.490125e-02 1.137420e-01 1.813448e-01
[11] 2.664516e-01 3.647872e-01 4.698497e-01 5.742973e-01 6.714341e-01
[16] 7.563328e-01 8.263482e-01 8.810232e-01> qq = seq(0, 0.95, length.out = 10)
> qcmp(qq, lambda = 10, nu = 0.9)
[1] 0 8 10 11 12 13 14 15 17 19The CMP distribution functions compute the normalizing constant via
Section \(\ref{sec:cmp-normconst}\).
The density dcmp uses the hybrid method, while
rcmp, pcmp, and qcmp use the
truncation method regardless of condition \(\eqref{eqn:can_approx}\). Variate
generation, CDF, and quantiles are then computed on CMP as if it were a
discrete distribution on the sample space \(\{
0, \ldots, M \}\). As with tcmp and
ncmp, a warning is emitted in cases where they may not
produce reliable results.
> tryCatch({ rcmp(1, lambda = 2, nu = 0.01) }, warning = print_warning)
[1] simpleWarning in r_cmp(n, prep$lambda, prep$nu, hybrid.tol,
[2] truncate.tol, ymax): Terms of normalizing constant CMP(2, 0.01) could
[3] not be bounded by a geometric series when y <= 1e+06. Consider
[4] adjusting the controls ymax, hybrid.tol, and truncate.tol The truncation method for qcmp can result in unreliable
quantiles when the requested probability is very close to 1. For
example, the actual quantile for probability \(1\) is \(\infty\), which can be expressed with no
computation, but the computed quantity will be the truncation value
\(M\). More generally, \(\eqref{eqn:rel_error_bound}\) implies that
\[\begin{align}
\frac{Z(\lambda, \nu) - Z^{(M)}(\lambda, \nu)}{Z(\lambda, \nu)} <
\epsilon
&\iff \Prob(Y > M) < \epsilon \nonumber \\
&\iff 1 - F(M \mid \lambda, \nu) < \epsilon \nonumber \\
&\iff F^{-}(1 - \epsilon \mid \lambda, \nu) < M.
\label{eqn:cmp-quantile-limit}
\end{align}\] Therefore, it is possible that quantiles
larger than \(1 - \epsilon\) may be
inaccurately computed with the truncated support.
COMPoissonReg gives a warning when these are requested.
> tryCatch({
+ qcmp(0.9999999, lambda = 1.5, nu = 0.5)
+ }, warning = print_warning)
[1] simpleWarning in qcmp(0.9999999, lambda = 1.5, nu = 0.5): At least one
[2] requested quantile was very close to 1. In particular, 1 of the given
[3] probabilities were greater than 1 - truncate_tol = exp(-1e-06), where
[4] truncate_tol = 1e-06. Associated results may be affected by truncation.
[5] Consider adjusting the controls ymax and truncate.tol or reducing logq.As a sanity check, let us ensure that the empirical density values,
cumulative probabilities, and quantiles of draws from rcmp
align with respective results computed via dcmp,
pcmp, and qcmp.
library(ggplot2)
n = 100000
lambda = 0.5
nu = 0.1
x = rcmp(n, lambda, nu)
xx = seq(-1, max(x)) ## Include -1 to ensure it gets probability zero
qq = seq(0, 0.99, length.out = 100)
fx = dcmp(xx, lambda, nu)
px = pcmp(xx, lambda, nu)
qx = qcmp(qq, lambda, nu)
qx_emp = quantile(x, probs = qq)ggplot() +
geom_bar(data = data.frame(x = x), aes(x = x, y = ..prop..), fill = "NA",
col = "black") +
geom_point(data = data.frame(x = xx[-1], fx = fx[-1]), aes(x, fx)) +
ylab("Density") +
theme_bw()
Warning: The dot-dot notation (`..prop..`) was deprecated in ggplot2 3.4.0.
ℹ Please use `after_stat(prop)` instead.
This warning is displayed once per session.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
generated.Empirical density of draws (histogram) versus density computed via the dcmp function (points).
ggplot() +
stat_ecdf(data = data.frame(x = x), aes(x), geom = "step") +
geom_point(data = data.frame(x = xx, px = px), aes(x, px)) +
ylab("Probability") +
theme_bw()Empirical CDF of draws (solid line) versus CDF computed via the pcmp function (points).
ggplot() +
geom_point(data = data.frame(x = qq, qx_emp = qx_emp), aes(qq, qx_emp), pch = 1) +
geom_point(data = data.frame(x = qq, qx = qx), aes(qq, qx), pch = 3) +
xlab("Probability") +
ylab("Quantile") +
theme_bw()
Empirical quantiles of draws (o) versus quantiles computed
via the qcmp function (+).
For \(Y \sim \text{CMP}(\lambda,
\nu)\), we can consider computing the expectation and variance of
\(Y\) in two ways. First, if there is a
moderately-sized \(M\) where \(\{ 0, \ldots, M \}\) contains all but a
negligible amount of the mass of \(Y\),
we can compute the moments using truncated summations \[\begin{align*}
\E(Y) = \sum_{y=0}^M y \frac{\lambda^y}{(y!)^\nu Z(\lambda, \nu)}, \quad
\E(Y^2) = \sum_{y=0}^M y^2 \frac{\lambda^y}{(y!)^\nu Z(\lambda, \nu)},
\quad
\Var(Y) = \E(Y^2) - [\E(Y)]^2.
\end{align*}\] Otherwise, a different approach is taken.
Notice that the expected value is related to the first derivative of the
log-normalizing constant via \[\begin{align*}
&\frac{\partial}{\partial \lambda} \log Z(\lambda, \nu)
= \frac{ \frac{\partial}{\partial \lambda} Z(\lambda, \nu) }{ Z(\lambda,
\nu) }
= \frac{1}{Z(\lambda, \nu)} \sum_{y=0}^\infty y
\frac{\lambda^{y-1}}{(y!)^\nu} \\
&\iff \E(Y) = \lambda \frac{\partial}{\partial \lambda} \log
Z(\lambda, \nu).
\end{align*}\] For the second derivative, \[\begin{align*}
&\frac{\partial^2}{\partial \lambda^2} \log Z(\lambda, \nu)
= \frac{ Z(\lambda, \nu) \frac{\partial^2}{\partial \lambda^2}
Z(\lambda, \nu) - [\frac{\partial}{\partial \lambda} Z(\lambda,
\nu)]^2}{ Z(\lambda, \nu)^2 }
= \frac{1}{Z(\lambda, \nu)} \sum_{y=0}^\infty y(y-1)
\frac{\lambda^{y-2}}{(y!)^\nu} - \left[ \frac{\E(Y)}{\lambda} \right]^2
\\
&\iff \lambda^2 \frac{\partial^2}{\partial \lambda^2} \log
Z(\lambda, \nu) = \E[Y(Y-1)] - [\E(Y)]^2 = \Var(Y) - \E(Y) \\
&\iff \Var(Y) = \lambda^2 \frac{\partial^2}{\partial \lambda^2} \log
Z(\lambda, \nu) + \E(Y).
\end{align*}\] Therefore, we may use first and second
derivatives of \(\log Z(\lambda,
\nu)\), taken with respect to \(\lambda\), to compute \(\E(Y)\) and \(\Var(Y)\). The ecmp and
vcmp functions implement this computation of the
expectation and variance, respectively. The control object is used to
determine whether to use the truncation or differentiation approach.
Condition \(\eqref{eqn:can_approx}\) is
used to determine whether to use the truncation approach (if false) or
differentiation approach (if true).
> ecmp(lambda = 10, nu = 1.2)
[1] 6.727397
> ecmp(lambda = 1.5, nu = 0.5)
[1] 2.815447
> ecmp(lambda = 1.5, nu = 0.05)
[1] 3334.757
> ecmp(lambda = 1.5, nu = 0.05, control = get.control(hybrid.tol = 1e-10))
[1] 3334.762
> ecmp(lambda = 1.5, nu = 0.05, control = get.control(hybrid.tol = 1e10))
[1] 3334.757> vcmp(lambda = 10, nu = 1.2)
[1] 5.679667
> vcmp(lambda = 1.5, nu = 0.5)
[1] 4.513041
> vcmp(lambda = 1.5, nu = 0.05)
[1] 66505.13
> vcmp(lambda = 1.5, nu = 0.05, control = get.control(hybrid.tol = 1e-10))
[1] 66505.03
> vcmp(lambda = 1.5, nu = 0.05, control = get.control(hybrid.tol = 1e10))
[1] 66505.13Provided that an enormously large truncation value \(M\) is not required, we may compute other
moments by truncated sums using tcmp.
Let \(S \sim \text{Bernoulli}(p)\) and \(T \sim \text{CMP}(\lambda, \nu)\) be independent random variables. Then \(Y = (1-S) T\) follows a Zero-Inflated Conway-Maxwell Poisson distribution \(\text{ZICMP}(\lambda, \nu, p)\) with density \[\begin{align*} f(y \mid \lambda, \nu, p) = (1 - p) \frac{\lambda^{y}}{(y!)^{\nu} Z(\lambda, \nu)} + p \cdot \ind(y = 0), \quad y \in \mathbb{N}. \end{align*}\] Like CMP, several interesting special cases are obtained.
Taking \(\nu = 1\) corresponds to Zero-Inflated Poisson \(\text{ZIP}(\lambda, p)\) with density \(f(y \mid \lambda, p) = (1 - p) e^{-\lambda} \lambda^{y} / y! + p \cdot \ind(y = 0)\).
When \(\lambda \in (0,1)\) and \(\nu \rightarrow 0\), \(\text{ZICMP}(\lambda, \nu)\) converges to a Zero-Inflated Geometric distribution with density \(f(y \mid \lambda, p) = (1 - p) (1 - \lambda) \lambda^y + p \cdot \ind(y = 0)\) for \(y \in \mathbb{N}\).
When \(\nu \rightarrow \infty\), \(\text{ZICMP}(\lambda, \nu, p)\) converges to a “Zero-Inflated Bernoulli” distribution with density which remains a Bernoulli distribution with an adjusted success probability. Here the \(\lambda\) and \(p\) parameters are not individually identifiable. Therefore, users may want to avoid zero-inflation in CMP data analyses with extreme underdispersion.
Finally, \(p = 0\) yields \(\text{CMP}(\lambda, \nu)\).
There is a close relationship between the CMP and ZICMP distributions, as we have seen from construction of ZICMP. The relationship between the CMP densities and variate generation mechanisms was given earlier in this section. Denote \(F(x \mid \lambda, \nu)\) and \(F(x \mid \lambda, \nu, p)\) as the CDFs of \(\text{CMP}(\lambda, \nu)\) and \(\text{ZICMP}(\lambda, \nu, p)\), respectively. We have \[\begin{align*} F(x \mid \lambda, \nu, p) = (1-p) F(x \mid \lambda, \nu) + p \cdot \ind(x \geq 0). \end{align*}\] For a given probability \(\phi \in [0,1]\), the associated CMP and ZICMP quantile functions are related via \[\begin{align} F^{-}(\phi \mid \lambda, \nu, p) &= \inf\{ x \in \mathbb{N} : F(x \mid \lambda, \nu, p) \geq \phi \} \nonumber \\ &= \inf\{ x \in \mathbb{N} : (1-p) F(x \mid \lambda, \nu) + p \cdot \ind(x \geq 0) \geq \phi \} \nonumber \\ &= \inf\{ x \in \mathbb{N} : F(x \mid \lambda, \nu) \geq (\phi - p) / (1 - p) \} \nonumber \\ &= F^{-}\left( \frac{\phi - p}{1 - p} \mid \lambda, \nu \right). \label{eqn:zicmp-quantiles} \end{align}\]
The respective functions for ZICMP density, variate generation, CDF,
and quantile functions are dzicmp, rzicmp,
pzicmp , and qzicmp. They make use of the CMP
implementation described in Section \(\ref{sec:cmp}\) such as the criteria to
either truncate or approximate the normalizing constant.
> qq = seq(0, 0.95, length.out = 20)
> rzicmp(20, lambda = 1.5, nu = 0.2, p = 0.25)
[1] 10 17 14 8 0 11 0 19 12 16 8 0 10 0 0 12 8 10 12 0
> dzicmp(c(0, 1, 2), lambda = 1.5, nu = 0.2, p = 0.25)
[1] 0.26630689 0.02446034 0.03194095
> pzicmp(c(0, 1, 2), lambda = 1.5, nu = 0.2, p = 0.25)
[1] 0.2663069 0.2907672 0.3227082
> qzicmp(qq, lambda = 1.5, nu = 0.2, p = 0.25)
[1] 0 0 0 0 0 0 2 3 4 5 6 7 8 9 11 12 13 15 17 20As with qcmp, the qzicmp function is
computed by the truncation method, and cannot accurately compute
quantiles very close to 1. More specifically, from the CMP distribution,
\(\eqref{eqn:cmp-quantile-limit}\)
gives \[\begin{align*}
M &> F^{-}(1 - \epsilon \mid \lambda, \nu) \\
&= F^{-}\left( \frac{\phi_\epsilon - p}{1 - p} \mid \lambda, \nu
\right) \\
&= F^{-}\left( \phi_\epsilon \mid \lambda, \nu, p \right)
\end{align*}\] where \(\phi_\epsilon = (1 - \epsilon)(1-p) + p\)
and the last equality uses \(\eqref{eqn:zicmp-quantiles}\). This
motivates a warning from qzicmp when the argument is larger
than \(\phi_\epsilon\).
> tryCatch({
+ qzicmp(0.9999999, lambda = 1.5, nu = 0.5, p = 0.5)
+ }, warning = print_warning)
[1] simpleWarning in qzicmp(0.9999999, lambda = 1.5, nu = 0.5, p = 0.5): At
[2] least one requested quantile was very close to 1. In particular, 1 of
[3] the given probabilities were greater than (1 - truncate.tol) * (1-p) +
[4] p, where truncate_tol = 1e-06. Associated results may be affected by
[5] truncation. Consider adjusting the controls ymax and truncate.tol or
[6] reducing logq. Let us repeat the sanity check from Section \(\ref{sec:cmp-dist}\) to ensure that the
empirical density values, cumulative probabilities, and quantiles line
up with the ones computed via dzicmp, pzicmp,
and qzicmp.
library(ggplot2)
n = 100000
lambda = 0.5
nu = 0.1
p = 0.5
x = rzicmp(n, lambda, nu, p)
xx = seq(-1, max(x)) ## Include -1 to ensure it gets probability zero
qq = seq(0, 0.99, length.out = 100)
fx = dzicmp(xx, lambda, nu, p)
px = pzicmp(xx, lambda, nu, p)
qx = qzicmp(qq, lambda, nu, p)
qx_emp = quantile(x, probs = qq)ggplot() +
geom_bar(data = data.frame(x = x), aes(x = x, y = ..prop..), fill = "NA",
col = "black") +
geom_point(data = data.frame(x = xx[-1], fx = fx[-1]), aes(x, fx)) +
ylab("Density") +
theme_bw()Empirical density of draws (histogram) versus density computed via the dzicmp function (points).
ggplot() +
stat_ecdf(data = data.frame(x = x), aes(x), geom = "step") +
geom_point(data = data.frame(x = xx, px = px), aes(x, px)) +
ylab("Probability") +
theme_bw()Empirical CDF of draws (solid line) versus CDF computed via the pzicmp function (points).
ggplot() +
geom_point(data = data.frame(x = qq, qx_emp = qx_emp), aes(qq, qx_emp), pch = 1) +
geom_point(data = data.frame(x = qq, qx = qx), aes(qq, qx), pch = 3) +
xlab("Probability") +
ylab("Quantile") +
theme_bw()
Empirical quantiles of draws (o) versus quantiles computed
via the qzicmp function (+).
The expected value and variance of \(Y \sim
\text{ZICMP}(\lambda, \nu, p)\) are apparent from the
construction \(Y = (1-S) T\) given
earlier in this section. Namely, \[\begin{align*}
\E(Y) = (1-p) \E(T)
\quad \text{and} \quad
\Var(Y) = (1-p) \left\{ \Var(T) + p[\E(T)]^2 \right\}
\end{align*}\] may be obtained using formulas for
iterated conditional expectations and variances. They are evaluated in
COMPoissonReg using the ezicmp and
vzicmp functions respectively. These functions make use of
the ecmp and vcmp functions described in
Section \(\ref{sec:cmp-ev}\) to compute
\(\E(T)\) and \(\Var(T)\).
Suppose there are \(n\) subjects
with outcomes \(y_1, \ldots, y_n \in
\mathbb{N}\) and covariates \(\vec{x}_i
\in \mathbb{R}^{d_1}\), \(\vec{s}_i \in
\mathbb{R}^{d_2}\), and \(\vec{w}_i \in
\mathbb{R}^{d_3}\) for \(i = 1, \ldots,
n\). The COMPoissonReg package fits both CMP and
ZICMP regression models.
The CMP regression model assumes that \[\begin{align*}
Y_i \indep \text{CMP}(\lambda_i, \nu_i), \quad i = 1, \ldots, n,
\end{align*}\] where \(\log
\lambda_i = \vec{x}_i^\top \vec{\beta}\) and \(\log \nu_i = \vec{s}_i^\top \vec{\gamma}\).
Writing \(\btheta = (\vec{\beta},
\vec{\gamma})\), the likelihood is \[\begin{align}
L(\btheta) =
\prod_{i=1}^n \left[
\frac{\lambda_i^{y_i}}{(y_i!)^{\nu_i} Z(\lambda_i, \nu_i)}
\right].
\label{eqn:likelihood-cmp}
\end{align}\] The ZICMP regression model assumes that
\[\begin{align*}
Y_i \indep \text{ZICMP}(\lambda_i, \nu_i, p_i), \quad i = 1, \ldots, n,
\end{align*}\] where \(\log
\lambda_i = \vec{x}_i^\top \vec{\beta}\), \(\log \nu_i = \vec{s}_i^\top \vec{\gamma}\),
and \(\logit p_i = \vec{w}_i^\top
\vec{\zeta}\). Writing \(\btheta =
(\vec{\beta}, \vec{\gamma}, \vec{\zeta})\), the likelihood is
\[\begin{align}
L(\btheta) =
\prod_{i=1}^n \left[(1 - p_i)
\frac{\lambda_i^{y_i}}{(y_i!)^{\nu_i} Z(\lambda_i, \nu_i)}
+ p_i \ind(y_i = 0)
\right].
\label{eqn:likelihood-zicmp}
\end{align}\] We will write \(d = d_1 + d_2 + d_3\) for the total
dimension of \(\btheta\). The \(n \times d_1\) design matrix whose rows
consist of \(\vec{x}_i\) will be
denoted \(\vec{X}\). Similarly, we will
write \(\vec{S}\) and \(\vec{W}\) as the \(n \times d_2\) and \(n \times d_3\) design matrices constructed
from \(\vec{s}_i\) and \(\vec{w}_i\), respectively. The
glm.cmp function provides a formula interface to fit models
of both types: \(\eqref{eqn:likelihood-cmp}\) and \(\eqref{eqn:likelihood-zicmp}\).
out = glm.cmp(formula.lambda, formula.nu = ~ 1, formula.p = NULL,
data = NULL, init = NULL, fixed = NULL, control = NULL, ...)The interface contains three formulas: formula.lambda
specifies the regression \(\vec{x}_i^\top
\vec{\beta}\) used for \(\lambda_i\), while formula.nu
and formula.p correspond to \(\vec{s}_i^\top \vec{\gamma}\) for \(\nu_i\) and \(\vec{w}_i^\top \vec{\zeta}\) for \(p_i\), respectively. ZICMP regression is
utilized when formula.p is set to something other than its
default NULL value; otherwise, CMP regression is assumed.
The data argument is used to pass a data.frame
explicitly rather than having the data be read from the local
environment. The init, fixed, and
control arguments and associated helper functions are
described below.
The init argument represents an initial value for the
optimizer. The following functions can be used to construct it.
> get.init(beta = c(1, 2, 3), gamma = c(-1, 1), zeta = c(-2, -1))
$beta
[1] 1 2 3
$gamma
[1] -1 1
$zeta
[1] -2 -1
attr(,"class")
[1] "COMPoissonReg.init"
> get.init.zero(d1 = 3, d2 = 2, d3 = 2)
$beta
[1] 0 0 0
$gamma
[1] 0 0
$zeta
[1] 0 0
attr(,"class")
[1] "COMPoissonReg.init"The fixed argument is used to specify indices of the
three coefficients which will remain fixed at their initial value during
optimization.
> get.fixed(beta = c(1L, 2L), gamma = c(1L))
$beta
[1] 1 2
$gamma
[1] 1
$zeta
integer(0)
attr(,"class")
[1] "COMPoissonReg.fixed"The specification above requests the first two elements of
beta and the first element of gamma to be
fixed. Notice that indices must be integers and that the default value
is an empty integer vector which is interpreted as “no elements are
fixed”. The fixed argument can usually be disregarded but
may be useful in some circumstances; an example is given in Section
\(\ref{sec:zicmp-reg}\).
Specifying the elements of init and fixed
may be somewhat awkward with the formula interface, as they require
knowledge of how formulas will be expanded into design matrices and
coefficients. It can be helpful to produce the design matrices using R’s
model.matrix function.
> model.matrix(formula.lambda, data = data)
> model.matrix(formula.nu, data = data)
> model.matrix(formula.p, data = data)The control argument has been introduced in Section
\(\ref{sec:cmp}\); regression modeling
makes use of several additional arguments. COMPoissonReg
uses optim to compute the maximum likelihood estimate (MLE)
\(\hat{\btheta}\) for \(\btheta\) under the specified model.
Several controls are provided to influence how
COMPoissonReg invokes optim; here are their
default values.
> control = getOption("COMPoissonReg.control")
> control$optim.method
[1] "L-BFGS-B"
> control$optim.control
$maxit
[1] 150The element optim.method is a string which is passed as
the method argument to optim, while
optim.control is a list passed as the control
argument of optim. Note that, for the latter, if an entry
is given for fnscale, it will be ignored and overwritten
internally by COMPoissonReg.
The covariance of \(\hat{\btheta}\)
is estimated by \(\hat{\vec{V}}(\hat{\btheta})
= -[\vec{H}(\hat{\btheta})]^{-1}\), where \(\vec{H}(\btheta) = \partial^2 \log L(\btheta) /
[\partial \btheta \partial \btheta^\top]\) is the Hessian of the
log-likelihood computed by optim. The standard error for
coefficient \(\theta_j\) in \(\btheta\) is then obtained as the square
root of the \(j\)th diagonal of \(\hat{\vec{V}}(\hat{\btheta})\).
We will now illustrate use of the regression tools using two examples whose data are included in the package. Note that these demonstrations are not intended to be complete regression analyses, and results may be slightly different than previously published analyses due to differences in the computations.
The freight dataset (Kutner et
al. 2003) was analyzed using CMP regression by Sellers and Shmueli (2010) and found to exhibit
underdispersion. The data describe \(n =
10\) instances where 1,000 ampules were transported via air
shipment. The outcome of interest is the variable broken
which describes the number of broken ampules in each shipment. The
covariate transfers describes the number of times the
carton was transferred from one aircraft to another during the
shipment.
Let us load and view the dataset.
> data(freight)
> print(freight)
broken transfers
1 16 1
2 9 0
3 17 2
4 12 0
5 22 3
6 13 1
7 8 0
8 15 1
9 19 2
10 11 0Before fitting a CMP regression, let us fit a Poisson regression
model \(Y_i \indep
\text{Poisson}(\lambda_i)\) with \[\begin{align*}
\log \lambda_i = \beta_0 + \beta_1 \cdot \text{transfers}_i.
\end{align*}\] This can be carried out with the standard
glm function.
> glm.out = glm(broken ~ transfers, data = freight, family = poisson)
> summary(glm.out)
Call:
glm(formula = broken ~ transfers, family = poisson, data = freight)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.35295 0.13174 17.86 < 2e-16 ***
transfers 0.26384 0.07924 3.33 0.000869 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 12.5687 on 9 degrees of freedom
Residual deviance: 1.8132 on 8 degrees of freedom
AIC: 50.395
Number of Fisher Scoring iterations: 4Next, let us fit a similar CMP regression model with \[\begin{align*} &\log \lambda_i = \beta_0 + \beta_1 \cdot \text{transfers}_i, \\ &\log \nu_i = \gamma_0, \end{align*}\] using only an intercept for \(\nu_i\).
> cmp.out = glm.cmp(broken ~ transfers, data = freight)
> print(cmp.out)
CMP coefficients
Estimate SE z-value p-value
X:(Intercept) 13.8286 6.2405 2.2159 0.0267
X:transfers 1.4843 0.6892 2.1537 0.03127
S:(Intercept) 1.7550 0.4493 3.9065 9.365e-05
--
Transformed intercept-only parameters
Estimate SE
nu 5.7835 2.5982
--
Chi-squared test for equidispersion
X^2 = 9.1048, df = 1, p-value = 2.5494e-03
--
Elapsed: 0.05 sec Sample size: 10 formula interface
LogLik: -18.6449 AIC: 43.2898 BIC: 44.1975
Optimization Method: L-BFGS-B Converged status: 0
Message: CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCHThe coefficients used in the \(\lambda_i\) formula are prefixed with an
X: label, while an S: label is used for
coefficients of the \(\nu_i\) formula.
Notice that estimates for X: coefficients from the CMP fit
are dissimilar to those from the Poisson fit; this may occur when the
estimate of \(\nu\) deviates from the
value of 1. Similarly to the glm output, the output of
glm.cmp displays several quantities for each coefficient
\(\theta_j\), \(j = 1, \ldots, d\): a point estimate \(\hat{\theta}_j\), an associated standard
error \(\widehat{\text{SE}}(\hat{\theta}_j)\), a
z-value \(z_j = \hat{\theta}_j /
\widehat{\text{SE}}(\hat{\theta}_j)\), and a p-value \(2\Phi(-|z_j|)\) for the test \(H_0: \theta_j = 0\) versus \(H_1: \theta_j \neq 0\). Here, \(\Phi\) is the CDF of the standard normal
distribution. Because an intercept-only formula was specified for \(\nu_i\), \(\hat{\nu} = \exp(\hat{\gamma})\) does not
vary with \(i\) and its estimate and
associated standard error are added to the display. Here we see evidence
of underdispersion with \(\hat{\nu} >
1\). A test for equidispersion is displayed to determine whether
there is a significant amount of over or underdispersion in the data. In
particular, a likelihood ratio test is used to decide whether \(H_0: \vec{\gamma} = \vec{0}\) versus \(H_0: \vec{\gamma} \neq \vec{0}\). The test
statistic is displayed along with the degrees of freedom and associated
p-value. Here we have fairly strong evidence to reject the null
hypothesis of equidispersion.
The AIC for the CMP model cmp.out shows some improvement
over that of glm.out. Let us also consider a slope
coefficient for the \(\nu\) component
using \[\begin{align*}
\log \nu_i = \gamma_0 + \gamma_1 \cdot \text{transfers}_i.
\end{align*}\] via the following call to
glm.cmp.
> cmp2.out = glm.cmp(broken ~ transfers, formula.nu = ~ transfers, data = freight)
> print(cmp2.out)
CMP coefficients
Estimate SE z-value p-value
X:(Intercept) 15.5959 7.1087 2.1939 0.02824
X:transfers 4.6322 2.3978 1.9319 0.05338
S:(Intercept) 1.8935 0.4525 4.1846 2.857e-05
S:transfers 0.1206 0.0505 2.3883 0.01693
--
Chi-squared test for equidispersion
X^2 = 11.6991, df = 2, p-value = 2.8812e-03
--
Elapsed: 0.09 sec Sample size: 10 formula interface
LogLik: -17.3477 AIC: 42.6954 BIC: 43.9058
Optimization Method: L-BFGS-B Converged status: 0
Message: CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCHModel cmp2.out provides a slight improvement over
cmp.out in AIC and BIC, but we may prefer
cmp.out in the interest of simplicity.
To gain some insight into the optimization, we may wish to increase the trace level, which can be done as follows.
> control = get.control(optim.control = list(maxit = 5, trace = 3, REPORT = 1))
> cmp3.out = glm.cmp(broken ~ transfers, data = freight, control = control)
N = 3, M = 5 machine precision = 2.22045e-16
This problem is unconstrained.
At X0, 0 variables are exactly at the bounds
At iterate 0 f= 273.34 |proj g|= 260.29
At iterate 1 f = 49.923 |proj g|= 113
At iterate 2 f = 46.155 |proj g|= 44.594
At iterate 3 f = 29.883 |proj g|= 22.868
At iterate 4 f = 27.657 |proj g|= 2.6174
At iterate 5 f = 27.599 |proj g|= 4.443
At iterate 6 f = 27.359 |proj g|= 7.3357
final value 27.359209
stopped after 6 iterations
N = 2, M = 5 machine precision = 2.22045e-16
This problem is unconstrained.
At X0, 0 variables are exactly at the bounds
At iterate 0 f= 148.87 |proj g|= 152.94
At iterate 1 f = 57.259 |proj g|= 146.63
At iterate 2 f = 53.033 |proj g|= 67.913
At iterate 3 f = 42.039 |proj g|= 42.645
At iterate 4 f = 36.942 |proj g|= 38.126
At iterate 5 f = 32.584 |proj g|= 43.852
At iterate 6 f = 24.06 |proj g|= 17.695
final value 24.060495
stopped after 6 iterationsData from the local environment may be passed to the
glm.cmp function without explicitly using the
data argument.
In a count regression model, it may be desirable to include offset
terms such as \[\begin{align*}
\log \lambda_i = \vec{x}_i^\top \vec{\beta} + \text{offx}_i, \quad
\log \nu_i = \vec{s}_i^\top \vec{\beta} + \text{offs}_i.
\end{align*}\] An offset term may be used
in the formula interface to accomplish this.
> freight$offx = 13
> freight$offs = 1
> glm.cmp(broken ~ transfers + offset(offx), data = freight)
> glm.cmp(broken ~ transfers + offset(offx), formula.nu = ~1 + offset(offs), data = freight)For users who wish to bypass the formula interface and prepare the \(\vec{X}\) and \(\vec{S}\) design matrices manually, a “raw” interface to the regression functionality is also provided.
Several accessors are provided to extract results from the output object.
> logLik(cmp.out) ## Log-likelihood evaluated at MLE.
[1] -18.64489
> AIC(cmp.out) ## AIC evaluated at MLE.
[1] 43.28978
> BIC(cmp.out) ## BIC evaluated at MLE.
[1] 44.19754
> coef(cmp.out) ## Estimates of theta as a flat vector
X:(Intercept) X:transfers S:(Intercept)
13.828561 1.484303 1.755002
> coef(cmp.out, type = "list") ## Estimates of theta as a named list
$beta
X:(Intercept) X:transfers
13.828561 1.484303
$gamma
S:(Intercept)
1.755002
> vcov(cmp.out) ## Estimated covariance matrix of theta hat
X:(Intercept) X:transfers S:(Intercept)
X:(Intercept) 38.944034 4.0877545 2.8000943
X:transfers 4.087754 0.4749988 0.2978880
S:(Intercept) 2.800094 0.2978880 0.2018301
> sdev(cmp.out) ## Standard deviations from vcov(...) diagonals
X:(Intercept) X:transfers S:(Intercept)
6.2405155 0.6892016 0.4492551
> sdev(cmp.out, type = "list") ## Standard deviations as a named list
$beta
[1] 6.2405155 0.6892016
$gamma
[1] 0.4492551The predict function computes several useful quantities
evaluated at the estimate \(\hat{\btheta}\). The argument default
type = "response" produces a vector of estimates of the
response \(\hat{y}_i = \E(Y_i)\) for
\(i = 1, \ldots, n\) using the method
described in Section \(\ref{sec:cmp-ev}\). The argument
type = "link" produces a data.frame with
columns for the linked parameters \(\lambda_i\) and \(\nu_i\).
> predict(cmp.out)
[1] 13.70507 10.50769 17.83752 10.50769 23.17871 13.70507 10.50769 13.70507
[9] 17.83752 10.50769
> predict(cmp.out, type = "link")
lambda nu
1 4469845 5.78346
2 1013136 5.78346
3 19720454 5.78346
4 1013136 5.78346
5 87004438 5.78346
6 4469845 5.78346
7 1013136 5.78346
8 4469845 5.78346
9 19720454 5.78346
10 1013136 5.78346Note that the estimated nu values are equal for all
observations because the model assumed only an intercept term for the
dispersion component.
We can also use predict on new covariate values. Note
that models fit with the formula interface expect the new data to be
provided as a data.frame which is interpreted using the
formula used to fit the model. If the raw interface was used to fit the
model, use the get.modelmatrix function to specify design
matrices to use for prediction.
# Prepare new data to fit by formula interface
new.df = data.frame(transfers = 0:10)
# Prepare new data to fit by raw interface
X = model.matrix(~ transfers, data = new.df)
S = model.matrix(~ 1, data = new.df)
new.data = get.modelmatrix(X = X, S = S)
# Pass new data to model from by formula interface
y.hat.new = predict(cmp.out, newdata = new.df)
# Pass new data to model from by raw interface
y.hat.new = predict(cmp.raw.out, newdata = new.data)
# Compute predictions for links
predict.out = predict(cmp.out, newdata = new.df, type = "link")
# Plot predictions
ggplot() +
geom_point(data = new.df, aes(transfers, y.hat.new)) +
xlab("Number of transfers") +
ylab("Predicted number broken") +
theme_bw()> print(y.hat.new)
[1] 1.176542 2.690927 5.694026 11.715378 23.807032 48.096062
[7] 96.889869 194.912671 391.832565 787.430083 1582.156271
> print(predict.out)
lambda nu
1 1.013136e+06 5.78346
2 4.469845e+06 5.78346
3 1.972045e+07 5.78346
4 8.700444e+07 5.78346
5 3.838538e+08 5.78346
6 1.693520e+09 5.78346
7 7.471622e+09 5.78346
8 3.296396e+10 5.78346
9 1.454333e+11 5.78346
10 6.416355e+11 5.78346
11 2.830824e+12 5.78346The leverage function computes the diagonal entries of a
“hat” matrix which can be formulated in CMP regression. These can be
used to diagnose influential observations. For details, see Section 3.6
of Sellers and Shmueli (2010).
> leverage(cmp.out)
[1] 0.1541752 0.1943925 0.2733327 0.2259140 0.6002368 0.3795226 0.2379250
[8] 0.1117655 0.3653074 0.4574283The residuals function provides either raw (the default)
or quantile-based residuals (Dunn and Smyth
1996). In a CMP regression setting, raw residuals \(y_i - \hat{y}_i\) generally do not work
well with traditional regression diagnostics, such as Q-Q plots.
Quantile-based residuals often produce interpretable diagnostics;
however, a random element is used in the computation of quantile
residuals for discrete distributions. This aids interpretability but
gives slightly different residual values each time they are computed.
See Dunn and Smyth (1996) for details.
Pearson residuals may be preferred over raw residuals for diagnostics; these can be obtained by standardizing raw residuals using leverage values and variance estimates.
> link.hat = predict(cmp.out, type = "link")
> vv = vcmp(link.hat$lambda, link.hat$nu)
> hh = leverage(cmp.out)
> res.pearson = res.raw / sqrt(vv*(1-hh))For each type of residual—raw, Pearson, and quantile—we now plot fitted values versus residuals and Q-Q plots.
plot.fit.res = function(y.hat, res) {
ggplot(data.frame(y = y.hat, res = res)) +
geom_point(aes(y, res)) +
xlab("Fitted Value") +
ylab("Residual Value") +
theme_bw() +
theme(plot.title = element_text(size = 10))
}
plot.qq.res = function(res) {
ggplot(data.frame(res = res), aes(sample = res)) +
stat_qq() +
stat_qq_line() +
theme_bw() +
theme(plot.title = element_text(size = 10))
}
y.hat = predict(cmp.out)
plot.fit.res(y.hat, res.raw) +
ggtitle("Fitted Values vs. Raw Residuals")
plot.qq.res(res.raw) +
ggtitle("Q-Q Plot of Raw Residuals")
plot.fit.res(y.hat, res.pearson) +
ggtitle("Fitted Values vs. Pearson Residuals")
plot.qq.res(res.pearson) +
ggtitle("Q-Q Plot of Pearson Residuals")
plot.fit.res(y.hat, res.qtl) +
ggtitle("Fitted Values vs. Quantile Residuals")
plot.qq.res(res.qtl) +
ggtitle("Q-Q Plot of Quantile Residuals")In this example, with only 10 observations, it is difficult to see an advantage of using quantile residuals; the benefit will be more apparent in Section \(\ref{sec:zicmp-reg}\). One benefit of raw residuals is that they may be used to compute a mean-squared error.
To access the results of the equidispersion test shown in the output
of cmp.out, we may use the equitest accessor
function.
The deviance function computes the deviance quantities
\(D_i = -2 [\log L_i(\hat{\btheta}) - \log
L_i(\tilde{\btheta}_i)]\) for \(i = 1,
\ldots, n\), where \(L_i(\btheta)\) is the term of the
likelihood corresponding to the \(i\)th
observation, \(\hat{\btheta}\) is the
MLE computed under the full likelihood \(L(\btheta) = \prod_{i=1}^n L_i(\btheta)\),
and \(\tilde{\btheta}_i\) is the
maximizer of \(L_i(\btheta)\).
> deviance(cmp.out)
[1] 0.3376988 -0.5150035 -1.7750671 -0.6863613 -2.2081582 -1.9383878
[7] 2.1710154 -1.1354954 -1.6584238 -2.1772063The parametric.bootstrap function carries out a
parametric bootstrap with \(R\)
repetitions. Using the fitted MLE \(\hat{\btheta}\), bootstrap samples \(\vec{y}^{(r)} = (y_1^{(r)}, \ldots,
y_n^{(r)})\) are drawn from the likelihood \(L(\hat{\btheta})\) for \(r = 1, \ldots, R\). Estimate \(\hat{\btheta}^{(r)}\) is fitted from
bootstrap sample \(\vec{y}^{(r)}\). An
\(R \times d\) matrix is returned whose
\(r\)th row is \(\hat{\btheta}^{(r)}\).
> cmp.boot = parametric.bootstrap(cmp.out, reps = 100)
> head(cmp.boot)
(Intercept) transfers (Intercept)
[1,] 12.435045 1.066322 1.620051
[2,] 22.895677 2.768967 2.297103
[3,] 21.989757 2.518793 2.223626
[4,] 26.788708 2.663686 2.416597
[5,] 15.550523 2.011527 1.873008
[6,] 8.892033 1.240461 1.348294We used \(R = 100\) in the display above to keep vignette computations small, but a larger number may be desired in practice. Bootstrap repetitions can be used, for example, to compute 95% confidence intervals for each of the coefficients.
Large covariates can present numerical difficulties in fitting CMP regression. We will briefly demonstrate the difficulties and some possible workarounds. First let us generate a new dataset based on a large covariate in the regression for \(\lambda_i\).
set.seed(1234)
n = 200
x = rnorm(n, 500, 10)
X = cbind(intercept = 1, slope = x)
S = matrix(1, n, 1)
beta_true = c(-0.05, 0.05)
gamma_true = 2
lambda_true = exp(X %*% beta_true)
nu_true = exp(S %*% gamma_true)
y = rcmp(n, lambda_true, nu_true)Notice that the generated counts \(y_1, \ldots, y_n\) are relatively small compared to the covariate \(x_1, \ldots, x_n\).
> summary(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
471.4 492.3 498.3 499.4 505.5 530.4
> summary(y)
Min. 1st Qu. Median Mean 3rd Qu. Max.
22.0 27.0 29.0 29.1 31.0 37.0 An initial attempt to fit the true data-generating model fails.
> tryCatch({
+ glm.cmp(y ~ x, formula.nu = ~ 1)
+ }, error = print_warning)
[1] Error in optim(par.init, loglik, method = optim.method, control =
[2] optim.control, : L-BFGS-B needs finite values of 'fn' Internally, the linked rate parameter \(\lambda_i = \exp(\beta_0 + \beta_1 x_i)\)
may evaluate to Inf or become very close to zero as the
optimizer moves \(\beta_1\) away from
zero in a positive or negative direction, respectively. Some possible
ways to address this are as follows.
Standardize the covariate to have mean zero and variance one.
> glm.cmp(y ~ scale(x), formula.nu = ~ 1)
CMP coefficients
Estimate SE z-value p-value
X:(Intercept) 26.0617 2.5565 10.1944 2.1e-24
X:scale(x) 0.5253 0.0627 8.3848 5.082e-17
S:(Intercept) 2.0416 0.0980 20.8290 2.365e-96
--
Transformed intercept-only parameters
Estimate SE
nu 7.7031 0.755
--
Chi-squared test for equidispersion
X^2 = 232.4625, df = 1, p-value = 1.7311e-52
--
Elapsed: 0.81 sec Sample size: 200 formula interface
LogLik: -417.9344 AIC: 841.8689 BIC: 851.7638
Optimization Method: L-BFGS-B Converged status: 0
Message: CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCHUse a logarithmic transformation on the covariate.
> glm.cmp(y ~ log(x), formula.nu = ~ 1)
CMP coefficients
Estimate SE z-value p-value
X:(Intercept) -134.3015 17.4934 -7.6773 1.625e-14
X:log(x) 25.8123 3.1597 8.1693 3.103e-16
S:(Intercept) 2.0422 0.1017 20.0764 1.188e-89
--
Transformed intercept-only parameters
Estimate SE
nu 7.7074 0.784
--
Chi-squared test for equidispersion
X^2 = 232.3911, df = 1, p-value = 1.7943e-52
--
Elapsed: 0.95 sec Sample size: 200 formula interface
LogLik: -417.9869 AIC: 841.9738 BIC: 851.8688
Optimization Method: L-BFGS-B Converged status: 0
Message: CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCHChange optimization method or other optim arguments.
> control = get.control(optim.method = "BFGS", optim.control = list(maxit = 200))
> suppressWarnings({
+ cmp.out = glm.cmp(y ~ x, formula.nu = ~ 1, control = control)
+ print(cmp.out)
+ })
CMP coefficients
Estimate SE z-value p-value
X:(Intercept) 12.1759 0.0467 260.6333 0
X:x -0.0210 NaN NaN NaN
S:(Intercept) -0.6747 NaN NaN NaN
--
Transformed intercept-only parameters
Estimate SE
nu 0.5093 NaN
--
Chi-squared test for equidispersion
X^2 = 254.5417, df = 1, p-value = 2.6567e-57
--
Elapsed: 0.15 sec Sample size: 200 formula interface
LogLik: -937.3960 AIC: 1880.7921 BIC: 1890.6870
Optimization Method: BFGS Converged status: 0In this case, standardization and logarithmic transformation produce
a usable fit. Changing the optimization method to BFGS
allows the optimization to finish, but there are further numerical
problems in computing the Hessian for standard errors.
Now consider a generated dataset with large outcomes but a relatively small covariate. This situation can also present numerical difficulties.
set.seed(1234)
n = 200
x = runif(n, 1, 2)
X = cbind(intercept = 1, slope = x)
S = matrix(1, n, 1)
beta_true = c(1, 1)
gamma_true = -0.95
lambda_true = exp(X %*% beta_true)
nu_true = exp(S %*% gamma_true)
y = rcmp(n, lambda_true, nu_true)> summary(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.009 1.238 1.486 1.483 1.735 1.999
> summary(y)
Min. 1st Qu. Median Mean 3rd Qu. Max.
156.0 332.0 606.5 810.5 1161.2 2391.0 An initial attempt to fit the data-generating model fails.
> tryCatch({
+ glm.cmp(y ~ x, formula.nu = ~ 1)
+ }, error = print_warning)
[1] Error in optim(par.init, loglik, method = optim.method, control =
[2] optim.control, : L-BFGS-B needs finite values of 'fn' Informative starting values help the optimizer to initially make progress. True data-generating parameters will not be available in a real data analysis situation but help to illustrate the idea.
> init = get.init(beta = beta_true, gamma = gamma_true)
> glm.cmp(y ~ x, formula.nu = ~ 1, init = init)
CMP coefficients
Estimate SE z-value p-value
X:(Intercept) 0.9970 0.1050 9.4933 2.238e-21
X:x 1.0029 0.1047 9.5757 1.012e-21
S:(Intercept) -0.9498 0.1044 -9.0936 9.584e-20
--
Transformed intercept-only parameters
Estimate SE
nu 0.3868 0.0404
--
Chi-squared test for equidispersion
X^2 = 133.5665, df = 1, p-value = 6.7968e-31
--
Elapsed: 0.01 sec Sample size: 200 formula interface
LogLik: -1023.2186 AIC: 2052.4373 BIC: 2062.3322
Optimization Method: L-BFGS-B Converged status: 0
Message: CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCHWithout choosing an initial value, changing the optimization method
to Nelder-Mead and increasing the maximum number of
iterations also helps the optimizer find a solution.
> control = get.control(optim.method = "Nelder-Mead", optim.control = list(maxit = 1000))
> glm.cmp(y ~ x, formula.nu = ~ 1, control = control)
CMP coefficients
Estimate SE z-value p-value
X:(Intercept) 0.8460 0.0424 19.9591 1.25e-88
X:x 0.8518 0.0417 20.4230 1.043e-92
S:(Intercept) -1.1133 0.0486 -22.8895 5.915e-116
--
Transformed intercept-only parameters
Estimate SE
nu 0.3285 0.016
--
Chi-squared test for equidispersion
X^2 = 5048.8383, df = 1, p-value = 0.0000e+00
--
Elapsed: 0.04 sec Sample size: 200 formula interface
LogLik: -1024.1845 AIC: 2054.3691 BIC: 2064.2640
Optimization Method: Nelder-Mead Converged status: 0Note that this solution is different from the previous one; the log-likelihood of the previous one is slightly better.
The couple dataset (Loeys et al.
2012) was analyzed with ZICMP regression in Sellers and Raim (2016) and found to exhibit
overdispersion. The data concern separation trajectories of \(n = 387\) couples. The variable
UPB records the number of unwanted pursuit behavior
perpetrations and is considered the outcome of interest. Included
covariates are the binary variable EDUCATION, which is 1 if
at least a bachelor’s degree was attained, and a continuous variable
ANXIETY which measures anxious attachment. A zero-inflated
count model is considered for these data because 246 of the 387 records
have an outcome of zero.
Let us load and view the first few records in the dataset.
> data(couple)
> head(couple)
UPB EDUCATION ANXIETY
1 15 0 1.0053
2 0 0 -0.7034
3 0 1 -0.7034
4 3 1 0.6110
5 12 0 0.2167
6 0 1 2.0569As a preliminary model, let us fit a standard Poisson model \(Y_i \indep \text{Poisson}(\lambda_i)\) with
\[\begin{align*}
\log \lambda_i = \beta_0 + \beta_1 \cdot \text{EDUCATION}_i + \beta_2
\cdot \text{ANXIETY}_i.
\end{align*}\] We may use the standard glm
function.
> glm.out = glm(UPB ~ EDUCATION + ANXIETY, data = couple, family = poisson)
> summary(glm.out)
Call:
glm(formula = UPB ~ EDUCATION + ANXIETY, family = poisson, data = couple)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.81695 0.04386 18.628 <2e-16 ***
EDUCATION -0.21579 0.07047 -3.062 0.0022 **
ANXIETY 0.42169 0.03333 12.651 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 2478.3 on 386 degrees of freedom
Residual deviance: 2310.8 on 384 degrees of freedom
AIC: 2782.4
Number of Fisher Scoring iterations: 6Now consider a ZICMP regression with \[\begin{align*}
&\log \lambda_i = \beta_0 + \beta_1 \cdot \text{EDUCATION}_i +
\beta_2 \cdot \text{ANXIETY}_i, \\
&\log \nu_i = \gamma_0, \\
&\logit p_i = \zeta_0 + \zeta_1 \cdot \text{EDUCATION}_i + \zeta_2
\cdot \text{ANXIETY}_i.
\end{align*}\] We use the glm.cmp function
as follows.
> zicmp0.out = glm.cmp(UPB ~ EDUCATION + ANXIETY,
+ formula.nu = ~ 1,
+ formula.p = ~ EDUCATION + ANXIETY,
+ data = couple)
> print(zicmp0.out)
ZICMP coefficients
Estimate SE z-value p-value
X:(Intercept) -0.1604 0.0189 -8.4843 2.171e-17
X:EDUCATION -0.0678 0.0325 -2.0849 0.03708
X:ANXIETY 0.0226 0.0142 1.5923 0.1113
S:(Intercept) -11.0963 59.4424 -0.1867 0.8519
W:(Intercept) 0.4176 0.1599 2.6119 0.009005
W:EDUCATION -0.3863 0.2677 -1.4433 0.1489
W:ANXIETY -0.5234 0.1335 -3.9200 8.856e-05
--
Transformed intercept-only parameters
Estimate SE
nu 0 9e-04
--
Chi-squared test for equidispersion
X^2 = 350.5662, df = 1, p-value = 3.1903e-78
--
Elapsed: 6.21 sec Sample size: 387 formula interface
LogLik: -627.1675 AIC: 1268.3350 BIC: 1296.0440
Optimization Method: L-BFGS-B Converged status: 0
Message: CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCHThere are now three sets of coefficients reported in the output: the
X:, S:, and W: prefixes label
estimates for the \(\lambda_i\), \(\nu_i\), and \(p_i\) formulas respectively.
Our attempt to fit the previous model strongly tended to the Zero-Inflated Geometric special case of ZICMP, but SEs computed via the Hessian become large in this region. In this section, we fix \(\gamma_0\) at the extreme \(-\infty\) and fit the remaining coefficients. Let us use the raw interface to do this.
init = coef(zicmp0.out, type = "list")
y = couple$UPB
X = model.matrix(~ EDUCATION + ANXIETY, data = couple)
S = model.matrix(~ 1, data = couple)
W = model.matrix(~ EDUCATION + ANXIETY, data = couple)
control = get.control(optim.method = "BFGS")
zicmp.out = glm.zicmp.raw(y, X, S, W,
init = get.init(beta = c(-1,0,0), gamma = -Inf, zeta = c(-1,0,0)),
fixed = get.fixed(gamma = 1L), control = control)> print(zicmp.out)
ZICMP coefficients
Estimate SE z-value p-value Fixed
X:(Intercept) -0.1605 0.0188 -8.5432 1.305e-17 F
X:EDUCATION -0.0678 0.0325 -2.0837 0.03718 F
X:ANXIETY 0.0226 0.0142 1.5896 0.1119 F
S:(Intercept) -Inf NA NA NA T
W:(Intercept) 0.4172 0.1599 2.6090 0.009082 F
W:EDUCATION -0.3863 0.2678 -1.4428 0.1491 F
W:ANXIETY -0.5245 0.1336 -3.9261 8.632e-05 F
--
Transformed intercept-only parameters
Estimate SE
nu 0 0
--
Some elements of gamma were fixed. Chi-squared test for equidispersion not defined.
--
Elapsed: 1.24 sec Sample size: 387 raw interface
LogLik: -627.1672 AIC: 1266.3345 BIC: 1290.0850
Optimization Method: BFGS Converged status: 0Notice that an additional Fixed column has been added to
the display, indicating that the coefficient gamma is
fixed. Furthermore, its Estimate column is set to the
initial value and the columns SE, z-value, and
p-value are set to NA. This model achieves a
similar log-likelihood value as our first attempt using
L-BFGS-B but does not exhibit signs of numerical
issues.
Here are several of the accessors provided to extract model outputs.
> logLik(zicmp.out) ## Log-likelihood evaluated at MLE
[1] -627.1672
> AIC(zicmp.out) ## AIC evaluated at MLE
[1] 1266.334
> BIC(zicmp.out) ## BIC evaluated at MLE
[1] 1290.085
> coef(zicmp.out) ## Estimates of theta as a flat vector
X:(Intercept) X:EDUCATION X:ANXIETY S:(Intercept) W:(Intercept)
-0.16047147 -0.06776507 0.02257241 -Inf 0.41722279
W:EDUCATION W:ANXIETY
-0.38633736 -0.52450590
> coef(zicmp.out, type = "list") ## Estimates of theta as a named list
$beta
X:(Intercept) X:EDUCATION X:ANXIETY
-0.16047147 -0.06776507 0.02257241
$gamma
S:(Intercept)
-Inf
$zeta
W:(Intercept) W:EDUCATION W:ANXIETY
0.4172228 -0.3863374 -0.5245059
> vcov(zicmp.out) ## Estimated covariance matrix of theta hat
X:(Intercept) X:EDUCATION X:ANXIETY W:(Intercept)
X:(Intercept) 0.0003528180 -2.909813e-04 -1.246030e-04 0.0005767772
X:EDUCATION -0.0002909813 1.057615e-03 2.583975e-05 -0.0005084108
X:ANXIETY -0.0001246030 2.583975e-05 2.016361e-04 -0.0001489969
W:(Intercept) 0.0005767772 -5.084108e-04 -1.489969e-04 0.0255741783
W:EDUCATION -0.0005266101 2.146314e-03 7.074582e-05 -0.0255267345
W:ANXIETY -0.0001695705 1.830298e-04 3.848049e-04 -0.0006753149
W:EDUCATION W:ANXIETY
X:(Intercept) -5.266101e-04 -0.0001695705
X:EDUCATION 2.146314e-03 0.0001830298
X:ANXIETY 7.074582e-05 0.0003848049
W:(Intercept) -2.552673e-02 -0.0006753149
W:EDUCATION 7.169741e-02 0.0009802763
W:ANXIETY 9.802763e-04 0.0178471340
> sdev(zicmp.out) ## Standard deviations from vcov(...) diagonals
X:(Intercept) X:EDUCATION X:ANXIETY W:(Intercept) W:EDUCATION
0.01878345 0.03252099 0.01419986 0.15991929 0.26776372
W:ANXIETY
0.13359317
> sdev(zicmp.out, type = "list") ## Standard deviations as a named list
$beta
[1] 0.01878345 0.03252099 0.01419986
$gamma
[1] NA
$zeta
[1] 0.1599193 0.2677637 0.1335932
> equitest(zicmp0.out) ## Likelihood ratio test for H_0: gamma = 0
$teststat
[1] 350.5662
$pvalue
[1] 3.190326e-78
$df
[1] 1
> tryCatch({ ## An error is thrown for model with fixed gamma
+ equitest(zicmp.out)
+ }, error = print_warning)
[1] Error in equitest.zicmpfit(zicmp.out): Some elements of gamma were
[2] fixed, chi-squared test for equidispersion not defined Because we fixed \(\gamma =
-\infty\) to obtain zicmp0.out, the
equitest function throws an error instead of proceeding
with an equidispersion test.
The predict function behaves similarly as in CMP
regression; however, the link type here also includes a
column with the estimated \(p_i\).
> y.hat = predict(zicmp.out) ## Fitted values based on ecmp
> link.hat = predict(zicmp.out, type = "link")
> head(y.hat)
[1] 3.570765 1.622937 1.451592 2.391102 2.522830 3.713332
> head(link.hat)
lambda nu p
1 0.8712909 0 0.4725120
2 0.8383254 0 0.6870063
3 0.7833983 0 0.5986451
4 0.8069894 0 0.4281048
5 0.8559186 0 0.5753131
6 0.8337620 0 0.2596150In this example, we can see the benefit of using quantile residuals
rather than raw residuals for diagnostic plots. The functions
plot.fit.res and plot.qq.res have been defined
in Section \(\ref{sec:cmp-reg}\).
res.raw = residuals(zicmp.out, type = "raw")
res.qtl = residuals(zicmp.out, type = "quantile")
plot.fit.res(y.hat, res.raw) +
ggtitle("Fitted Values vs. Raw Residuals")
plot.qq.res(res.raw) +
ggtitle("Q-Q Plot of Raw Residuals")
plot.fit.res(y.hat, res.qtl) +
ggtitle("Fitted Values vs. Quantile Residuals")
plot.qq.res(res.qtl) +
ggtitle("Q-Q Plot of Quantile Residuals")Here is an example of computing fitted values for new covariate data.
new.df = data.frame(EDUCATION = round(1:20 / 20), ANXIETY = seq(-3,3, length.out = 20))
X.new = model.matrix(~ EDUCATION + ANXIETY, data = new.df)
S.new = model.matrix(~ 1, data = new.df)
W.new = model.matrix(~ EDUCATION + ANXIETY, data = new.df)
new.data = get.modelmatrix(X.new, S.new, W.new)
# For model fit using raw interface, use get.modelmatrix to prepare new design
# matrices, offsets, etc
y.hat.new = predict(zicmp.out, newdata = new.data)
# For models fit with the formula interface, pass a data.frame with the same
# structure as used in the fit.
y.hat.new = predict(zicmp0.out, newdata = new.df)> print(y.hat.new)
[1] 0.4699009 0.5622034 0.6711343 0.7991174 0.9487439 1.1227245 1.3238518
[8] 1.5549320 1.8187433 2.1180077 2.0359259 2.2797300 2.5382951 2.8108724
[15] 3.0968671 3.3959397 3.7081289 4.0339804 4.3745604 4.7316162As with CMP regression, a parametric.bootstrap function
is provided for convenience to obtain a bootstrap sample \(\hat{\btheta}^{(r)}\) of \(\btheta\) based on the estimate \(\hat{\btheta}\). Because it is too time
consuming to run this example within the vignette, we show the code
without output below. As in Section \(\ref{sec:cmp-reg}\), we consider using the
bootstrap samples to construct a 95% confidence interval for each of the
coefficients.
We acknowledge Thomas Lotze for significant contributions to the
initial development of the COMPoissonReg package and for
service as initial maintainer on CRAN. We thank Darcy Steeg Morris, Eric
Slud, and Tommy Wright for reviewing the manuscript. We are grateful to
the users of COMPoissonReg for their interest and for
bringing to light some of the issues which have been considered in this
work.
[email protected], Center for Statistical Research
& Methodology, U.S. Census Bureau, Washington, DC, 20233, U.S.A.
Disclaimer: This document is released to
inform interested parties of ongoing research and to encourage
discussion of work in progress. Any views expressed are those of the
authors and not those of the U.S. Census Bureau. ↩︎
[email protected], Center for Statistical Research & Methodology, U.S. Census Bureau and Department of Mathematics and Statistics, Georgetown University, Washington, DC, 20057, U.S.A. ↩︎
These bounds are also stated at https://en.wikipedia.org/wiki/Stirling\%27s_approximation, last accessed 2022-10-09.↩︎
Comments about Results
The AIC of the ZICMP model is drastically smaller than the Poisson model, indicating a greatly improved fit. However, there are signs of possible numerical issues. The estimate for \(\gamma_0\) is a large negative number, but with an extremely large associated SE, which suggests that the effect may not be statistically significant. On the other hand, the estimate of \(\nu\) is nearly zero with a small SE, which suggests that the dispersion parameter is indeed statistically significant. On the surface, this seems to be a contradiction.
The issue is that the Hessian of the log-likelihood becomes insensitive to small changes in \(\gamma_0\) when \(\lambda_i < 1\) and \(\gamma_0\) is a large negative number. Let us first verify that the estimates for \(\lambda_i\) are indeed smaller than 1.
To show the insensitivity of the Hessian, let us consider a simpler setting with \(Y \sim \text{CMP}(\lambda, \nu)\), \(\lambda = \exp\{-0.25\} \approx 0.7788\) fixed, and \(\nu = \exp\{\gamma_0\}\). We then have log-density \[\begin{align*} \log f(y \mid \gamma_0) = y \log \lambda - e^{\gamma_0} \log(y!) - \log Z(\lambda, e^{\gamma_0}), \end{align*}\] with first derivative and second derivatives, respectively, \[\begin{align*} &\frac{\partial}{\partial \gamma_0} \log f(y \mid \gamma_0) = -e^{\gamma_0} \log(y!) - \frac{\partial}{\partial \gamma_0} \log Z(\lambda, e^{\gamma_0}), \\ &\frac{\partial^2}{\partial \gamma_0^2} \log f(y \mid \gamma_0) = -e^{\gamma_0} \log(y!) - \frac{\partial^2}{\partial \gamma_0^2} \log Z(\lambda, e^{\gamma_0}). \end{align*}\] For a given value of \(y\), \(-e^{\gamma_0} \log(y!)\) approaches zero as \(\gamma_0\) decreases. Therefore, let us focus on the function \(g(\gamma_0) = -\log Z(\lambda, e^{\gamma_0})\) and its first and second derivatives. The following code illustrates their behavior.
Here is the result.
Notice that \(g(\gamma_0)\) approaches a limit as \(\gamma_0 \rightarrow -\infty\), which coincides with the CMP distribution approaching a Geometric distribution. It may not be surprising that the first and second derivatives approach zero accordingly. This explains the large SE for \(\gamma_0\) in the model
zicmp0.out. With estimates tending to this region of the parameter space, it may be preferable to fix fix \(\gamma_0\) at a value such as \(-\infty\), which will be done in the next section.