Models for count data









Count data is a special type of statistical data that can only take non-negative integer values $\{0, 1, 2,\ldots\}$ that come from counting something, e.g., the number of seizures, hemorrhages or lesions in each given time period. More precisely, data from individual $i$ is the sequence $y_i=(y_{ij},1\leq j \leq n_i)$ where $y_{ij}$ is the number of events observed in the $j$th time interval $I_{ij}$.

For the moment, let us assume that all the intervals have the same length. This is the case, for instance, if data are daily seizure counts: $I_{ij}$ is the $j$th day after the start of the experiment and $y_{ij}$ the number of seizures observed during that day.

We will then model the sequence $y_i=(y_{ij},1\leq j \leq n_i)$ as a sequence of random variables that take their values in $\{ 0, 1, 2,\ldots\}$.

If we assume that these random variables are independent, then the model is completely defined by the probability mass functions $\prob{y_{ij}=k}$, for $k \geq 0$ and $1 \leq j \leq n_i$. Common distributions used to model count data include Poisson, binomial and negative binomial.

Here we only consider parametric distributions. In this context, building a model means defining:


    • the parameter function (or "intensity") $\lambda_{ij} = \lambda(t_{ij},\psi_i)$ of each individual $i$, which depends on the individual parameters $\psi_i$ and possibly on the time $t_{ij}$;
    • the probability mass function $\prob{y_{ij}=k; \lambda_{ij}}$.


The conditional distribution of the observations is therefore written:

\( \prob{y_{ij}=k | \psi_i} = \prob{y_{ij}=k ; \lambda_{ij} }. \)
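
Under the independence assumption made above, the conditional distribution of the whole sequence $y_i$ factorizes over the time intervals. As a worked step (the observed counts are written $k_1,\ldots,k_{n_i}$, a notation introduced here only for illustration):

\( \prob{y_{i1}=k_1,\ldots,y_{in_i}=k_{n_i} | \psi_i} = \displaystyle{ \prod_{j=1}^{n_i} } \prob{y_{ij}=k_j ; \lambda_{ij} }. \)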


Example


Let us illustrate this approach for the Poisson distribution.

A Poisson distribution with intensity $\lambda$ is defined by its probability mass function:

\( \prob{y=k ; \lambda} = \displaystyle{\frac{\lambda^{k} \, e^{-\lambda} }{k!} }. \)




One of the main properties of the Poisson distribution is that $\lambda$ is both the mean and the variance of the distribution:

\(\esp{y} = \var{y} = \lambda \)
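
The equality of mean and variance can be checked numerically. The following is a minimal Python sketch, not part of the original material: the function names `poisson_pmf` and `truncated_moments` and the value $\lambda = 3.5$ are chosen here for illustration, and the infinite sums are truncated at a large $k$.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(y = k; lambda) = lambda^k exp(-lambda) / k!, evaluated in log-space.

    Assumes lam > 0. math.lgamma(k + 1) equals log(k!).
    """
    if k < 0:
        return 0.0
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def truncated_moments(pmf, k_max: int = 200):
    """Mean and variance of a count distribution, summing the pmf over 0..k_max."""
    mean = sum(k * pmf(k) for k in range(k_max + 1))
    second_moment = sum(k * k * pmf(k) for k in range(k_max + 1))
    return mean, second_moment - mean ** 2


lam = 3.5  # illustrative intensity
mean, var = truncated_moments(lambda k: poisson_pmf(k, lam))
print(mean, var)  # both should be numerically close to lam
```

Working in log-space avoids overflow of $\lambda^k$ and $k!$ for large $k$.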

All that remains is to define the Poisson intensity function $ \lambda_{ij} = \lambda(t_{ij},\psi_i)$. Then,

\(\prob{y_{ij}=k | \psi_i} = \displaystyle{\frac{\lambda_{ij}^{k}\, e^{-\lambda_{ij} } } {k!} }. \)


There are many variations of the Poisson model:


    • Homogeneous Poisson distribution: this assumes a constant intensity $\lambda_i$ for each individual $i$. Here, $\psi_i = \lambda_i$ and $\lambda(t_{ij},\psi_i)=\lambda_i$.


    • Non-homogeneous Poisson distribution: this assumes that the Poisson intensity is a function of time. For example, suppose that we believe that a disease-related event is increasing linearly in frequency each month. We could then model this using $\lambda(t_{ij},\psi_i) = \lambda_{i} + a_i t_{ij}$, where $t_{ij} = j$ (months). Here, $\psi_i=(\lambda_{i},a_i)$.


    • Additional regression variables: the Poisson intensity may depend on regression variables other than time. For example, assume that taking a drug tends to reduce the number of events. We can then link the time-varying drug concentration $C$ to the value of $\lambda$ at time $t_{ij}$ using for instance an "Imax" model:

    \( \lambda(t_{ij},\psi_i) = \lambda_{i}\left(1-\Imax_i\displaystyle{\frac{ \ C_i(t_{ij})}{IC_{50,i} + C_i(t_{ij})} }\right) , \)

    where $\lambda_{i}$ is the baseline intensity and where $0\leq \Imax_i\leq 1$. Here, $\psi_{i} = (\lambda_{i}, \Imax_i, IC_{50,i})$.
    This model can even be combined with the previous non-homogeneous model by assuming a time-varying baseline $\lambda_{i}(t)$ in order to combine a drug effect model with a disease model for instance.


    • Markovian dependency: instead of assuming independent count data, we can introduce Markovian dependency into the model by assuming, for example, that $\lambda_{ij}$ is a function of $y_{i,j-1}$. Then, $\prob{y_{ij}=k\, |\, y_{i,j-1}, t_{ij},\psi_i}$ is the probability mass function of a Poisson random variable with parameter $\lambda_{ij} =\lambda(y_{i,j-1}, t_{ij},\psi_i)$.



    • Link with time-to-event models: if $y_{ij}$ is the number of events of a given type (seizures, hemorrhages, etc.) in a given time interval $I_{ij}$, and if $h_i(t)=h(t,\psi_i)$ is the hazard function associated with this sequence of events for individual $i$, then the events form a non-homogeneous Poisson process and $y_{ij}$ is a Poisson random variable with intensity $\lambda_{ij}=\displaystyle{ \int_{I_{ij}}} h(t,\psi_i)\,dt$ in interval $I_{ij}$ (see the Models for time-to-event data section).
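
The intensity functions listed above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a reference implementation: all function names are hypothetical, the additive form used for the Markovian case is only one possible choice (the text does not specify one), and the integral of the hazard is approximated with a simple midpoint rule.

```python
def lambda_homogeneous(t, lam):
    """Constant intensity: lambda(t, psi) = lam."""
    return lam


def lambda_linear(t, lam0, a):
    """Intensity increasing linearly in time: lam0 + a * t."""
    return lam0 + a * t


def lambda_imax(conc, lam0, imax, ic50):
    """Imax drug-effect model: lam0 * (1 - imax * C / (IC50 + C)), 0 <= imax <= 1."""
    return lam0 * (1.0 - imax * conc / (ic50 + conc))


def lambda_markov(y_prev, lam0, gamma):
    """Hypothetical Markovian intensity: baseline plus a term in the previous count."""
    return lam0 + gamma * y_prev


def lambda_from_hazard(hazard, t_start, t_end, n_steps=1000):
    """Integral of the hazard over [t_start, t_end], midpoint rule."""
    width = (t_end - t_start) / n_steps
    midpoints = (t_start + (i + 0.5) * width for i in range(n_steps))
    return sum(hazard(t) for t in midpoints) * width


# When the concentration equals IC50, the drug removes half of imax:
print(lambda_imax(conc=2.0, lam0=4.0, imax=0.8, ic50=2.0))
# Hazard h(t) = 0.5 + 0.1 t integrated over the interval [0, 2]:
print(lambda_from_hazard(lambda t: 0.5 + 0.1 * t, 0.0, 2.0))
```

Any of these functions returns the value $\lambda_{ij}$ to plug into the Poisson probability mass function; the intensity must remain positive for the distribution to be defined.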


Let us now look at some other distributions for count data:


    • Zero-inflated Poisson (ZIP) distribution:

    \( \prob{y=k ; \lambda,p_0} = \left\{ \begin{array}{cc} p_0 + (1-p_0)e^{-\lambda} & {\rm if } \ k=0 \\ (1-p_0) \displaystyle {\frac{e^{-\lambda} \lambda^{k} }{k!} } & {\rm if } \ k>0 . \end{array} \right. \)

    where $0\leq p_0 <1$. This is useful when the data generally seem to follow a Poisson distribution, except for an excess of zeros (too many cases with $k=0$):




    • Negative binomial (NB) distribution:

    \( \prob{y=k ; p,r} = \displaystyle{ \frac{\Gamma(k+r)}{k!\, \Gamma(r)} }(1-p)^r p^k , \)

    with $0\leq p \leq 1$ and $r>0$. If $r$ is an integer, then the negative binomial (NB) distribution with parameters $(p,r)$ is the probability distribution of the number of successes in a sequence of Bernoulli trials with probability of success $p$ before $r$ failures occur.




    • Generalized Poisson (GP) distribution:

    \( \prob{y=k ; \lambda,\delta} = \displaystyle {\frac{\lambda (\lambda+k\delta)^{k-1} e^{-\lambda-k\delta} }{k!} }, \)

    with $\lambda>0$ and $0\leq \delta <1$.
    The generalized Poisson (GP) distribution includes the Poisson distribution as a special case $(\delta=0)$ and is over-dispersed relative to the Poisson when $\delta>0$. Indeed, the variance-to-mean ratio $1/(1-\delta)^2$ then exceeds 1:


    \( \begin{eqnarray} \esp{y} &=& \frac{\lambda}{1-\delta} \\ \var{y} &=& \frac{\lambda}{(1-\delta)^3}. \end{eqnarray}\)
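
The three probability mass functions above can be evaluated directly. The sketch below is only illustrative: the function names and parameter values are assumptions made here, each pmf is computed in log-space, and sums over $k$ are truncated. It checks numerically that each distribution sums to 1 and that the mean and variance of the generalized Poisson distribution match $\lambda/(1-\delta)$ and $\lambda/(1-\delta)^3$.

```python
import math


def poisson_logpmf(k, lam):
    """log of lambda^k exp(-lambda) / k!  (lam > 0)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def zip_pmf(k, lam, p0):
    """Zero-inflated Poisson: extra mass p0 at k = 0, with 0 <= p0 < 1."""
    base = (1.0 - p0) * math.exp(poisson_logpmf(k, lam))
    return p0 + base if k == 0 else base


def negbin_pmf(k, p, r):
    """Negative binomial: Gamma(k+r) / (k! Gamma(r)) (1-p)^r p^k, with 0 < p < 1, r > 0."""
    log_coef = math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
    return math.exp(log_coef + r * math.log(1.0 - p) + k * math.log(p))


def genpoisson_pmf(k, lam, delta):
    """Generalized Poisson, with lam > 0 and 0 <= delta < 1."""
    log_p = (math.log(lam) + (k - 1) * math.log(lam + k * delta)
             - lam - k * delta - math.lgamma(k + 1))
    return math.exp(log_p)


def moments(pmf, k_max=400):
    """Total mass, mean and variance, summing the pmf over 0..k_max."""
    total = sum(pmf(k) for k in range(k_max + 1))
    mean = sum(k * pmf(k) for k in range(k_max + 1))
    second = sum(k * k * pmf(k) for k in range(k_max + 1))
    return total, mean, second - mean ** 2


lam, delta = 2.0, 0.3  # illustrative values
print(moments(lambda k: zip_pmf(k, lam, 0.3)))
print(moments(lambda k: negbin_pmf(k, 0.4, 2.5)))
print(moments(lambda k: genpoisson_pmf(k, lam, delta)))
print(lam / (1 - delta), lam / (1 - delta) ** 3)  # theoretical GP mean and variance
```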







      Summary


      For a given design $\bx_{i}$ and a given vector of parameters $\psi_i$, a parametric model for count data is completely defined by:


        - the probability mass function used to represent the distribution of the data in a given time interval

        - a model which defines how the distribution's parameter function (i.e., intensity) varies over time.



      $\mlxtran$ for count data models

      Example 1: Poisson model with time varying intensity


      \( \begin{array}{rcl} \psi_i &=& (\alpha_i,\beta_i) \\[0.3cm] \lambda(t,\psi_i) &=& \alpha_i + \beta_i\,t \\[0.3cm] \prob{y_{ij}=k} &=& \displaystyle{ \frac{\lambda(t_{ij} , \psi_i)^k}{k!} } e^{-\lambda(t_{ij} , \psi_i)} \end{array}\)
      MLXTran
       
      INPUT:
      input = {alpha, beta}
      
      EQUATION:
      lambda = alpha + beta*t
      
      DEFINITION:
      y ~ poisson(lambda)
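
      For readers without access to Monolix, the same model can be mimicked in plain Python. The following is a minimal sketch, not the Mlxtran implementation: the names `sample_poisson` and `log_likelihood`, the parameter values and the observation times are all assumptions made for illustration. It simulates counts with intensity $\alpha + \beta t$ and evaluates the corresponding log-likelihood.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) count with Knuth's multiplication method.

    Adequate for moderate lam; the number of uniform draws grows with lam.
    """
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def log_likelihood(times, counts, alpha, beta):
    """Sum over j of log P(y_j = k_j) with lambda(t_j) = alpha + beta * t_j.

    Assumes alpha + beta * t > 0 at every observation time.
    """
    total = 0.0
    for t, k in zip(times, counts):
        lam = alpha + beta * t
        total += k * math.log(lam) - lam - math.lgamma(k + 1)
    return total


rng = random.Random(0)              # fixed seed for reproducibility
alpha, beta = 2.0, 0.5              # illustrative individual parameters
times = [float(j) for j in range(1, 31)]
counts = [sample_poisson(alpha + beta * t, rng) for t in times]
print(counts)
print(log_likelihood(times, counts, alpha, beta))
```

      Maximizing `log_likelihood` over `alpha` and `beta` would give the individual maximum likelihood estimate for one subject; population estimation as performed by Monolix is beyond this sketch.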
      



      Example 2: generalized Poisson model


      \( \begin{array}{rcl} \psi_i &=& (\lambda_i,\delta_i) \\ \log\left( \prob{y_{ij}=k} \right) &=& \log(\lambda_i) + (k-1)\log(\lambda_i+k\delta_i) \\ && -\lambda_i-k\delta_i - \log(k!) \end{array}\)
      MLXTran
       
      INPUT:
      parameter = {lambda, delta}
      
      DEFINITION:
      Y = {
        type = count,
        log(P(Y=k)) = log(lambda)
        + (k-1)*log(lambda+k*delta)
        - lambda -k*delta - factln(k)
      } 




