Model for categorical data
Revision as of 7 June 2013, 14:45
Overview
Assume now that the observed data takes its values in a fixed and finite set of nominal categories $\{c_1, c_2,\ldots , c_K\}$. Considering the observations $(y_{ij}, 1 \leq j \leq n_i)$ of any individual $i$ as a sequence of independent random variables, the model is completely defined by the probability mass functions $\prob{y_{ij}=c_k | \psi_i}$, for $k=1,\ldots, K$ and $1 \leq j \leq n_i$.
For a given $(i,j)$, the sum of the $K$ probabilities is 1, so in fact only $K-1$ of them need to be defined.
In the most general way possible, any model can be considered so long as it defines a probability distribution, i.e., for each $k$, $\prob{y_{ij}=c_k | \psi_i} \in [0,1]$, and $\sum_{k=1}^{K} \prob{y_{ij}=c_k | \psi_i} = 1$. For instance, we could define $K$ time-dependent parametric functions $a_1$, $a_2$, ..., $a_K$ and set for any individual $i$, time $t_{ij}$ and $k \in \{1,\ldots,K\}$,
\(
\prob{y_{ij}=c_k | \psi_i} = \displaystyle{\frac{e^{a_k(t_{ij},\psi_i)} }{\sum_{m=1}^K e^{a_m(t_{ij},\psi_i)} } }. \)

(1) 
Such parametrizations are extremely flexible and easy to interpret in simple situations. In the previous example for instance, $\prob{y_{ij}=c_k | \psi_i}$ and $a_k(t_{ij},\psi_i)$ move in the same direction as time increases, provided the other functions $a_m$, $m \neq k$, remain constant.
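As a minimal sketch of the parametrization in (1), the Python snippet below evaluates the $K$ category probabilities from the values of $K$ functions $a_k$. The linear-in-time form chosen for the $a_k$, the function names and the parameter values are hypothetical and serve only as an illustration; they are not part of the model described above.

```python
import math

def category_probabilities(a_values):
    """Return P(y = c_k) for k = 1..K from the values a_k(t, psi), as in (1)."""
    shift = max(a_values)  # subtracting the max avoids overflow; the ratios are unchanged
    weights = [math.exp(a - shift) for a in a_values]
    total = sum(weights)
    return [w / total for w in weights]

def a_functions(t, psi):
    """Hypothetical time-dependent functions a_k(t, psi) = intercept_k + slope_k * t."""
    return [intercept + slope * t for intercept, slope in psi]

# Hypothetical individual parameters psi_i for K = 3 categories.
psi_i = [(0.0, 0.0), (1.0, -0.5), (-1.0, 0.3)]
probs = category_probabilities(a_functions(2.0, psi_i))
```

Whatever the functions $a_k$ are, the returned values lie in $[0,1]$ and sum to 1, which is the only requirement stated above.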
Ordinal data
Ordinal data further assumes that the categories are ordered, i.e., there exists an order $\prec$ such that
\( c_1 \prec c_2 \prec \ldots \prec c_K . \)
We can think for instance of levels of pain (low, moderate, severe), or any scores on a discrete scale, e.g., from 1 to 10.
Instead of defining the probabilities of each category, it may be convenient to define the cumulative probabilities $\prob{y_{ij} \preceq c_k | \psi_i}$ for $k=1,\ldots ,K-1$, or in the other direction: $\prob{y_{ij} \succeq c_k | \psi_i}$ for $k=2,\ldots, K$. Any model is possible as long as it defines a probability distribution, i.e., satisfies:
\( 0 \leq \prob{y_{ij} \preceq c_1 | \psi_i} \leq \prob{y_{ij} \preceq c_2 | \psi_i} \leq \ldots \leq \prob{y_{ij} \preceq c_{K-1} | \psi_i} \leq 1 . \)
Without any loss of generality, we will consider numerical categories in what follows. The order $\prec$ then reduces to the usual order $<$ on $\Rset$. Currently, the most popular model for ordinal data is the proportional odds model which uses logits of these cumulative probabilities, also called cumulative logits. We assume that there exist $\alpha_{i,1}\geq0$, $\alpha_{i,2}\geq 0, \ldots , \alpha_{i,K-1}\geq 0$ such that for $k=1,2,\ldots,K-1$,
\( \logit \left(\prob{y_{ij} \leq c_k | \psi_i} \right) = \left( \sum_{m=1}^k \alpha_{im}\right) + \beta_i \, x(t_{ij}) ,
\)

(2) 
where $x(t_{ij})$ is a vector of regression variables and $\beta_i$ a vector of coefficients. Here, $\bpsi_i=(\alpha_{i1},\alpha_{i2},\ldots,\alpha_{i,K-1},\beta_i)$.
Recall that $\logit(p) = \log\left(p/(1-p)\right)$. Then, the probability defined in (2) can also be expressed as
\( \prob{y_{ij} \leq c_k | \psi_i} = \displaystyle{\frac{1}{1+e^{-\sum_{m=1}^k \alpha_{im} - \beta_i \, x(t_{ij})} } }. \)
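The proportional odds model (2) can be sketched in a few lines of Python: the cumulative probabilities are obtained by applying the inverse logit to the partial sums of the $\alpha_{im}$, and the probability of each category is the difference of two consecutive cumulative probabilities. For simplicity, this sketch assumes a scalar regression variable and a scalar coefficient; the function names and numerical values are hypothetical.

```python
import math

def inv_logit(x):
    """Inverse of logit(p) = log(p / (1 - p))."""
    return 1.0 / (1.0 + math.exp(-x))

def ordinal_probabilities(alphas, beta, x):
    """Cumulative and category probabilities for K = len(alphas) + 1 ordered categories.

    alphas: (alpha_1, ..., alpha_{K-1}); beta and x are scalars in this sketch.
    """
    cumulative = []
    partial_sum = 0.0
    for alpha in alphas:
        partial_sum += alpha
        cumulative.append(inv_logit(partial_sum + beta * x))  # P(y <= c_k)
    cumulative.append(1.0)  # P(y <= c_K) = 1
    # Differences of consecutive cumulative probabilities give P(y = c_k).
    probs = [cumulative[0]]
    probs += [cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))]
    return cumulative, probs

# Hypothetical values: K = 4 categories, scalar regression variable x = 1.5.
cumulative, probs = ordinal_probabilities([0.5, 1.0, 2.0], beta=-0.8, x=1.5)
```

Since the partial sums of nonnegative $\alpha_{im}$ are nondecreasing in $k$, the cumulative probabilities are ordered and every category probability is nonnegative.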
Markovian dependence
For the sake of simplicity, we will assume here that the observations $(y_{ij})$ take their values in $\{1, 2, \ldots, K\}$.
We have so far assumed that the categorical observations $(y_{ij},\,j=1,2,\ldots,n_i)$ for individual $i$ are independent. It is however possible to introduce dependency between observations from the same individual by assuming that $(y_{ij},\,j=1,2,\ldots,n_i)$ forms a Markov chain. For instance, a Markov chain with memory 1 assumes that all that is required from the past to determine the distribution of $y_{i,j}$ is the value of the previous observation $y_{i,j-1}$, i.e., for all $k=1,2,\ldots ,K$,
\( \prob{y_{i,j} = k | y_{i,j-1}, y_{i,j-2}, y_{i,j-3}, \ldots, \psi_i} = \prob{y_{i,j} = k | y_{i,j-1}, \psi_i} . \)
Discrete time Markov chains
If the observation times are regularly spaced (constant length of time between successive observations), we can consider the observations $(y_{ij},\,j=1,2,\ldots,n_i)$ to be a discrete time Markov chain. Here, for each individual $i$, the probability distribution of the sequence $(y_{ij},\,j=1,2,\ldots,n_i)$ is defined by:
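 the distribution $\pi_{i,1} = (\pi_{i,1}^{k} , k=1,2,\ldots,K)$ of the first observation $y_{i,1}$:
\( \pi_{i,1}^{k} = \prob{y_{i,1} = k | \psi_i} , \)
 the sequence of transition matrices $(Q_{i,j}, j=2,3,\ldots)$, where for each $j$, $Q_{i,j} = (q_{i,j}^{\ell,k}, 1\leq \ell,k \leq K)$ is a matrix of size $K \times K$ such that
\( q_{i,j}^{\ell,k} = \prob{y_{i,j} = k | y_{i,j-1}=\ell , \psi_i} , \quad \sum_{k=1}^K q_{i,j}^{\ell,k} = 1 \ \text{for each } \ell . \)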
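The conditional distribution of $y_i=(y_{i,j}, j=1,2,\ldots, n_i)$ is then well-defined:
\( \prob{y_{i,1}, y_{i,2}, \ldots, y_{i,n_i} | \psi_i} = \pi_{i,1}^{y_{i,1}} \prod_{j=2}^{n_i} q_{i,j}^{y_{i,j-1},y_{i,j}} . \)
For a given individual $i$, $Q_{i,j}$ defines the transition probabilities between states at a given time $t_{ij}$: its entry $q_{i,j}^{\ell,k}$ is the probability of moving from state $\ell$ at time $t_{i,j-1}$ to state $k$ at time $t_{ij}$.
Our model must therefore give, for each individual $i$, the distribution of the first observation $y_{i,1}$ and a description of how the transition probabilities evolve with time.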
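To illustrate how the initial distribution and the transition matrices determine the distribution of a whole sequence, here is a minimal Python sketch that computes the log-probability of an observed sequence of states as the log of the initial probability plus the sum of the logs of the successive transition probabilities. The function name, the transition matrix and the sequence are hypothetical.

```python
import math

def markov_log_likelihood(y, pi1, transition_matrices):
    """Log-probability of the state sequence y = (y_1, ..., y_n), states coded 1..K.

    pi1[k-1] is P(y_1 = k); transition_matrices[j-2][l-1][k-1] is
    q_j^{l,k} = P(y_j = k | y_{j-1} = l) for j = 2, ..., n.
    """
    log_lik = math.log(pi1[y[0] - 1])
    for j in range(1, len(y)):
        q = transition_matrices[j - 1]  # matrix governing the move from y[j-1] to y[j]
        log_lik += math.log(q[y[j - 1] - 1][y[j] - 1])
    return log_lik

# Hypothetical 2-state example with the same transition matrix at every step.
Q = [[0.9, 0.1],
     [0.2, 0.8]]
y = [1, 1, 2, 2, 1]
log_lik = markov_log_likelihood(y, pi1=[0.5, 0.5],
                                transition_matrices=[Q] * (len(y) - 1))
```

Passing a list of matrices, one per transition, leaves room for transition probabilities that change with time, as in the examples that follow.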
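The figure below shows several examples of simulated sequences coming from a model with 2 states defined by:
\( \logit\left(q_{i,j}^{1,2}\right) = a_i + b_i \, t_j , \qquad \logit\left(q_{i,j}^{2,1}\right) = c_i + d_i \, t_j , \)
where $t_j = j$.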
In the first example (left), the logits of the transition probabilities are constant ($b_i = d_i = 0$), so the transition probabilities are constant over time. Here, $q^{1,2}=1/(1+\exp(2.5))=0.0759$ and $q^{2,1}=1/(1+\exp(2))=0.1192$. As $q^{1,2}$ and $q^{2,1}$ are small with $q^{1,2}<q^{2,1}$, transitions between the two states are rare, and a larger amount of time (on average) is spent in state 1. Indeed, the stationary distribution is the left eigenvector of the transition matrix associated with the eigenvalue 1, normalized to sum to 1: $\prob{y_{ij}=1}=q^{2,1}/(q^{1,2}+q^{2,1})=0.611$ and $\prob{y_{ij}=2}=0.389$. The figure (left) displays the transition probabilities $q^{1,2}$ and $q^{2,1}$ as functions of time (top left) and two simulated sequences of states (centre and bottom left).
In the second example (center), $b_i$ and $d_i$ are negative. This means that as time progresses, transitions from state 1 to 2 become rarer, and the same is true from 2 to 1.
In the third example (right), $b_i$ and $d_i$ are positive. This means that as time progresses, transitions from state 1 to 2 become more and more frequent, and likewise from 2 to 1. Note that the value of $a_i$ (resp. $c_i$) can be seen as the logit of the transition probability from state 1 to 2 (resp. 2 to 1) at time $t=0$.
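Sequences of this kind can be simulated with a short Python sketch, assuming that the logits of the two transition probabilities are linear in time, $\logit(q_j^{1,2}) = a + b\,t_j$ and $\logit(q_j^{2,1}) = c + d\,t_j$ with $t_j = j$. The values $a=-2.5$ and $c=-2$ match the first example; the sequence length, the seed, the initial state and the function names are hypothetical.

```python
import math
import random

def inv_logit(x):
    """Inverse of logit(p) = log(p / (1 - p))."""
    return 1.0 / (1.0 + math.exp(-x))

def simulate_two_state_chain(a, b, c, d, n, initial_state=1, seed=0):
    """Simulate y_1..y_n with logit(q_j^{1,2}) = a + b*t_j and logit(q_j^{2,1}) = c + d*t_j, t_j = j."""
    rng = random.Random(seed)
    states = [initial_state]
    for j in range(2, n + 1):
        current = states[-1]
        if current == 1:
            p_switch = inv_logit(a + b * j)  # probability of moving from 1 to 2
        else:
            p_switch = inv_logit(c + d * j)  # probability of moving from 2 to 1
        if rng.random() < p_switch:
            current = 3 - current  # switch between states 1 and 2
        states.append(current)
    return states

# First example: constant transition probabilities (b = d = 0).
q12 = inv_logit(-2.5)  # equals 1 / (1 + exp(2.5))
q21 = inv_logit(-2.0)  # equals 1 / (1 + exp(2))
sequence = simulate_two_state_chain(a=-2.5, b=0.0, c=-2.0, d=0.0, n=100)
```

Negative values of `b` and `d` make transitions rarer as `j` grows, and positive values make them more frequent, which corresponds to the second and third examples.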
Different choices can be made for defining an initial distribution $\pi_{i,1}$:
 The initial state can be defined arbitrarily: $y_{i,1}=k_0$. This means that $\pi_{i,1}^{k_0} = 1$ and $\pi_{i,1}^{k} = 0$ for $k\neq k_0$.
 More generally, any simple probability distribution can be put on the choice of the initial state, e.g., the uniform distribution $\pi_{i,1}^{k} = 1/K$ for $ k=1,2,\ldots , K$.
 If a transition matrix $Q_{i,1}$ has been defined at time $t_1$, we might consider using its stationary distribution, i.e., taking for $\pi_{i,1}$ the solution to:
\( \pi_{i,1} = \pi_{i,1} \, Q_{i,1} , \quad \sum_{k=1}^K \pi_{i,1}^{k} = 1 . \)
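For the last choice, a stationary distribution can be approximated numerically. The sketch below repeatedly multiplies a row vector by the transition matrix until it stops changing; this simple scheme assumes the chain is irreducible and aperiodic, and it is only one possible approach among others (an eigenvector solver would be an alternative). The function name and the tolerance are hypothetical; the matrix uses the rounded transition probabilities 0.0759 and 0.1192 of the first simulated example.

```python
def stationary_distribution(Q, n_iter=10_000, tol=1e-12):
    """Approximate the row vector pi solving pi = pi Q by repeated multiplication."""
    K = len(Q)
    pi = [1.0 / K] * K  # start from the uniform distribution
    for _ in range(n_iter):
        new_pi = [sum(pi[l] * Q[l][k] for l in range(K)) for k in range(K)]
        if max(abs(new_pi[k] - pi[k]) for k in range(K)) < tol:
            return new_pi
        pi = new_pi
    return pi

# Two-state chain with constant transition probabilities.
q12, q21 = 0.0759, 0.1192
Q = [[1.0 - q12, q12],
     [q21, 1.0 - q21]]
pi = stationary_distribution(Q)
```

For a two-state chain the result can be checked against the closed form $q^{2,1}/(q^{1,2}+q^{2,1})$ for state 1.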
Continuous time Markov chains
The previous situation can be extended to the case where observation times are irregular, by modeling the sequence of states as a continuous-time Markov process. The difference is that rather than transitioning to a new (possibly the same) state at each time step, the system remains in the current state for some random amount of time before transitioning. This process is now characterized by transition rates instead of transition probabilities:
The probability that no transition happens between $t$ and $t+h$ is
$\mlxtran$ for categorical data models