The SAEM algorithm for estimating population parameters (2016-04-29) <div>==Introduction ==<br /> <br /> <br /> The SAEM (Stochastic Approximation of EM) algorithm is a stochastic algorithm for calculating the maximum likelihood estimator (MLE) in the quite general setting of incomplete data models. SAEM has proven to be a very powerful tool for nonlinear mixed-effects models (NLMEM): it accurately estimates population parameters, has good theoretical properties and converges to the MLE under very general hypotheses.<br /> <br /> SAEM was first implemented in the $\monolix$ software. It has also been implemented in NONMEM, the {{Verbatim|R}} package {{Verbatim|saemix}} and the Matlab Statistics Toolbox as the function {{Verbatim|nlmefitsa.m}}.<br /> <br /> Here, we consider a model that includes observations $\by=(y_i , 1\leq i \leq N)$, unobserved individual parameters $\bpsi=(\psi_i , 1\leq i \leq N)$ and a vector of parameters $\theta$. By definition, the maximum likelihood estimator of $\theta$ maximizes<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; {\like}(\theta ; \by) = \py(\by ; \theta) = \displaystyle{ \int \pypsi(\by,\bpsi ; \theta) \, d \bpsi}.<br /> &lt;/math&gt; }}<br /> <br /> <br /> SAEM is an iterative algorithm that essentially consists of constructing $N$ [http://en.wikipedia.org/wiki/Markov_chain Markov chains] $(\psi_1^{(k)})$, ..., $(\psi_N^{(k)})$ that converge to the conditional distributions $\pmacro(\psi_1|y_1),\ldots , \pmacro(\psi_N|y_N)$, using at each step the complete data $(\by,\bpsi^{(k)})$ to calculate a new parameter vector $\theta_k$.
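To make the integral above concrete, here is a minimal Monte Carlo approximation of ${\like}(\theta ; \by)$ for a toy Gaussian model (the one-observation-per-individual model used later in this chapter, $\psi_i \sim {\cal N}(\theta,\omega^2)$, $y_i | \psi_i \sim {\cal N}(\psi_i,\sigma^2)$). This is an illustrative sketch only; the data values are made up, and the closed form is available here precisely because the toy model is linear and Gaussian:

```python
import math
import random

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def mc_likelihood(y, theta, omega2, sigma2, n_draws=50_000, seed=0):
    """Monte Carlo approximation of L(theta; y) = prod_i E_psi[ p(y_i | psi_i) ],
    drawing each psi_i from its prior N(theta, omega2)."""
    rng = random.Random(seed)
    like = 1.0
    for yi in y:
        acc = 0.0
        for _ in range(n_draws):
            psi = rng.gauss(theta, math.sqrt(omega2))
            acc += normal_pdf(yi, psi, sigma2)
        like *= acc / n_draws
    return like

def exact_likelihood(y, theta, omega2, sigma2):
    """Closed form for this toy model: marginally, y_i ~ N(theta, omega2 + sigma2)."""
    like = 1.0
    for yi in y:
        like *= normal_pdf(yi, theta, omega2 + sigma2)
    return like

y = [0.8, 1.3, -0.2]  # made-up observations
print(mc_likelihood(y, 0.5, 0.09, 0.04))
print(exact_likelihood(y, 0.5, 0.09, 0.04))
```

For nonlinear mixed-effects models no such closed form exists, which is what motivates the simulation-based algorithms described in this chapter.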
We will present a general description of the algorithm, highlighting the connection with the EM algorithm, and show by way of a simple example how to implement SAEM and use it in practice.<br /> <br /> We will also give some extensions of the base algorithm that improve its convergence properties. For instance, it is possible to stabilize the algorithm's convergence by using several [http://en.wikipedia.org/wiki/Markov_chain Markov chains] per individual. Also, a simulated annealing version of SAEM allows us to improve the chances of converging to the global maximum of the likelihood rather than to a local maximum.<br /> <br /> <br /> &lt;br&gt;<br /> ==The EM algorithm==<br /> <br /> <br /> We first remark that if the individual parameters $\bpsi=(\psi_i)$ were observed, estimation would pose no particular problem: an estimator could be found by directly maximizing the joint distribution $\pypsi(\by,\bpsi ; \theta)$.<br /> <br /> However, since the $\psi_i$ are not observed, the EM algorithm replaces $\bpsi$ by its conditional expectation.
Then, given some initial value $\theta_0$, iteration $k$ updates ${\theta}_{k-1}$ to ${\theta}_{k}$ with the two following steps:<br /> <br /> <br /> * $\textbf{E-step:}$ evaluate the quantity<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; Q_k(\theta)=\esp{\log \pmacro(\by,\bpsi;\theta){{!}} \by;\theta_{k-1} } .&lt;/math&gt; }}<br /> <br /> <br /> * $\textbf{M-step:}$ update the estimation of $\theta$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \theta_{k} = \argmax{\theta} \, Q_k(\theta) .<br /> &lt;/math&gt; }}<br /> <br /> <br /> It can be proved that each EM iteration increases the likelihood of the observations and that the EM sequence $(\theta_k)$ converges to a<br /> stationary point of the observed likelihood under mild regularity conditions.<br /> <br /> Unfortunately, in the framework of nonlinear mixed-effects models, there is no explicit expression for the E-step since the relationship between the observations $\by$ and the individual parameters $\bpsi$ is nonlinear. However, even though this expectation cannot be computed in closed form, it can be approximated by simulation.
For instance,<br /> <br /> <br /> * The Monte Carlo EM (MCEM) algorithm replaces the E-step by a Monte Carlo approximation based on a large number of independent simulations of the non-observed individual parameters $\bpsi$.<br /> <br /> * The SAEM algorithm replaces the E-step by a stochastic approximation based on a single simulation of $\bpsi$.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==The SAEM algorithm==<br /> <br /> At iteration $k$ of SAEM:<br /> <br /> <br /> * $\textbf{Simulation step}$: for $i=1,2,\ldots, N$, draw $\psi_i^{(k)}$ from the conditional distribution $\pmacro(\psi_i |y_i ;\theta_{k-1})$.<br /> <br /> <br /> * $\textbf{Stochastic approximation}$: update $Q_k(\theta)$ according to<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; Q_k(\theta) = Q_{k-1}(\theta) + \gamma_k ( \log \pmacro(\by,\bpsi^{(k)};\theta) - Q_{k-1}(\theta) ),<br /> &lt;/math&gt; }}<br /> <br /> where $(\gamma_k)$ is a decreasing sequence of positive numbers such that $\gamma_1=1$, $\sum_{k=1}^{\infty} \gamma_k = \infty$ and $\sum_{k=1}^{\infty} \gamma_k^2 &lt; \infty$.<br /> <br /> <br /> * $\textbf{Maximization step}$: update $\theta_{k-1}$ according to<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \theta_{k} = \argmax{\theta} \, Q_k(\theta) .&lt;/math&gt; }}<br /> <br /> <br /> {{Remarks <br /> |title=Remarks<br /> |text= &amp;#32;<br /> * Setting $\gamma_k=1$ for all $k$ means that there is no memory in the stochastic approximation:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; Q_k(\theta) = \log \pmacro(\by,\bpsi^{(k)};\theta) . 
&lt;/math&gt; }}<br /> <br /> : This algorithm, known as Stochastic EM (SEM), thus consists of successively simulating $\bpsi^{(k)}$ with the conditional distribution $\pmacro(\bpsi^{(k)} {{!}} \by;\theta_{k-1})$, then computing $\theta_k$ by maximizing the joint distribution $\pmacro(\by,\bpsi^{(k)};\theta)$.<br /> <br /> <br /> * When the number $N$ of subjects is small, convergence of SAEM can be improved by running $L$ [http://en.wikipedia.org/wiki/Markov_chain Markov chains] for each individual instead of one. The simulation step at iteration $k$ then requires us to draw $L$ sequences $\psi_i^{(k,1)} ,\ldots , \psi_i^{(k,L)}$ for each individual $i$ and to combine stochastic approximation and Monte Carlo in the approximation step:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; Q_k(\theta) = Q_{k-1}(\theta) + \gamma_k \left( \frac{1}{L}\sum_{\ell=1}^{L} \log \pmacro(\by,\bpsi^{(k,\ell)};\theta) - Q_{k-1}(\theta) \right) .<br /> &lt;/math&gt; }}<br /> <br /> : By default, $\monolix$ selects $L$ so that $N\times L \geq 50$.<br /> }}<br /> <br /> <br /> Implementation of SAEM is simplified when the complete model $\pmacro(\by,\bpsi;\theta)$ belongs to a regular (curved) exponential family:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \pmacro(\by,\bpsi ;\theta) = \exp\left\{ - \zeta(\theta) + \langle \tilde{S}(\by,\bpsi) , \varphi(\theta) \rangle \right\} , &lt;/math&gt; }}<br /> <br /> where $\tilde{S}(\by,\bpsi)$ is a sufficient statistic of the complete model (i.e., whose value contains all the information needed to compute any estimate of $\theta$) which takes its values in an open subset ${\cal S}$ of $\Rset^m$.
Then, there exists a function $\tilde{\theta}$ such that for any $s\in {\cal S}$,<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:saem_stat&quot;&gt;&lt;math&gt;<br /> \tilde{\theta}(s) = \argmax{\theta} \left\{ - \zeta(\theta) + \langle s , \varphi(\theta) \rangle \right\} .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> The approximation step of SAEM simplifies to a general Robbins-Monro-type scheme for approximating this conditional expectation:<br /> <br /> <br /> * $\textbf{Stochastic approximation}$: update $s_k$ according to<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> s_k = s_{k-1} + \gamma_k ( \tilde{S}(\by,\bpsi^{(k)}) - s_{k-1} ) . &lt;/math&gt; }}<br /> <br /> <br /> Note that the E-step of EM simplifies to computing $s_k=\esp{\tilde{S}(\by,\bpsi) | \by ; \theta_{k-1}}$.<br /> <br /> Then, both EM and SAEM algorithms use [[#eq:saem_stat|(1)]] for the M-step: $\theta_k = \tilde{\theta}(s_k)$.<br /> <br /> Precise results for convergence of SAEM were obtained in the [[Estimation of the observed Fisher information matrix#Estimation using linearization of the model|Estimation of the F.I.M. using a linearization of the model]] chapter in the case where $\pmacro(\by,\bpsi;\theta)$ belongs to a regular curved exponential family. This first version of [[The SAEM algorithm for estimating population parameters|SAEM]] and these first results assume that the individual parameters are simulated exactly under the conditional distribution at each iteration. Unfortunately, for most nonlinear models or non-Gaussian models, the unobserved data cannot be simulated exactly under this conditional distribution. 
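The stochastic approximation update introduced above has a familiar special case that helps build intuition: with $\gamma_k = 1/k$, the iterate $s_k$ is exactly the running average of the simulated statistics. The sketch below illustrates this with made-up scalar draws standing in for $\tilde{S}(\by,\bpsi^{(k)})$; it is not tied to any particular model:

```python
import random

def robbins_monro(samples):
    """s_k = s_{k-1} + gamma_k * (S_k - s_{k-1}) with step-size gamma_k = 1/k.
    With this particular schedule, s_k is exactly the running average of the samples."""
    s = 0.0
    for k, x in enumerate(samples, start=1):
        s += (x - s) / k
    return s

rng = random.Random(42)
# Noisy draws of a "sufficient statistic" whose true mean is 2.0 (illustrative values).
draws = [rng.gauss(2.0, 1.0) for _ in range(20_000)]
print(robbins_monro(draws))
```

The conditions $\sum_k \gamma_k = \infty$ and $\sum_k \gamma_k^2 < \infty$ generalize this averaging effect: the first lets the iterates reach any target, the second damps the simulation noise.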
A well-known alternative consists in using the Metropolis-Hastings algorithm: introduce a transition probability whose unique invariant distribution is the conditional distribution we want to simulate from.<br /> <br /> In other words, the procedure consists of replacing the Simulation step of SAEM at iteration $k$ by $m$ iterations of the<br /> Metropolis-Hastings (MH) algorithm described in the [[The Metropolis-Hastings algorithm for simulating the individual parameters|The Metropolis-Hastings algorithm]] section. It was shown in the [[Estimation of the observed Fisher information matrix#Estimation using linearization of the model|Estimation of the F.I.M. using a linearization of the model]] section that [[The SAEM algorithm for estimating population parameters|SAEM]] still converges under general conditions when coupled with a [http://en.wikipedia.org/wiki/Markov_chain Markov chain] Monte Carlo procedure.<br /> <br /> <br /> {{Remarks<br /> |title= Remark<br /> |text= Convergence of the [http://en.wikipedia.org/wiki/Markov_chain Markov chains] $(\psi_i^{(k)})$ is not necessary at each SAEM iteration. It suffices to run a few MH iterations with various transition kernels before updating $\theta_{k-1}$. In $\monolix$ by default, three transition kernels are used twice each, successively, in each SAEM iteration.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Implementing SAEM ==<br /> <br /> Implementation of SAEM can be difficult to describe for complex statistical models such as mixture models, models with inter-occasion variability, etc. We therefore limit ourselves to some basic models in order to illustrate how SAEM can be implemented.<br /> <br /> &lt;br&gt;<br /> ===SAEM for general hierarchical models===<br /> <br /> Consider first a very general model for any type (continuous, categorical, survival, etc.)
of data $(y_i)$:<br /> <br /> {{Equation1<br /> |equation= &lt;math&gt;\begin{eqnarray} y_i {{!}} \psi_i &amp;\sim&amp; \pcyipsii(y_i {{!}} \psi_i) \\<br /> h(\psi_i) &amp;\sim&amp; {\cal N}( \mu , \Omega),<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> where $h(\psi_i)=(h_1(\psi_{i,1}), h_2(\psi_{i,2}), \ldots , h_d(\psi_{i,d}) )^\transpose$ is a $d$-vector of (transformed) individual parameters, $\mu$ a $d$-vector of fixed effects and $\Omega$ a $d\times d$ variance-covariance matrix.<br /> <br /> We assume here that $\Omega$ is positive-definite. Then, a sufficient statistic for the complete model $\pmacro(\by,\bpsi;\theta)$ is<br /> $\tilde{S}(\bpsi) = (\tilde{S}_1(\bpsi),\tilde{S}_2(\bpsi))$, where<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \tilde{S}_1(\bpsi) &amp;= &amp; \sum_{i=1}^N h(\psi_i) \\<br /> \tilde{S}_2(\bpsi) &amp;= &amp; \sum_{i=1}^N h(\psi_i) h(\psi_i)^\transpose .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> At iteration $k$ of SAEM, we have:<br /> <br /> <br /> * $\textbf{Simulation step}$: for $i=1,2,\ldots, N$, draw $\psi_i^{(k)}$ from $m$ iterations of the MH algorithm described in [[The Metropolis-Hastings algorithm for simulating the individual parameters|The Metropolis-Hastings algorithm]] with $\pmacro(\psi_i |y_i ;\mu_{k-1},\Omega_{k-1})$ as limiting distribution.<br /> <br /> * $\textbf{Stochastic approximation}$: update $s_k=(s_{k,1},s_{k,2})$ according to<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> s_{k,1} &amp;=&amp; s_{k-1,1} + \gamma_k \left( \sum_{i=1}^N h(\psi_i^{(k)}) - s_{k-1,1} \right) \\<br /> s_{k,2} &amp;=&amp; s_{k-1,2} + \gamma_k \left( \sum_{i=1}^N h(\psi_i^{(k)})h(\psi_i^{(k)})^\transpose - s_{k-1,2} \right) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> * $\textbf{Maximization step}$: update $(\mu_{k-1},\Omega_{k-1})$ according to<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \mu_{k} &amp;=&amp; 
\frac{1}{N} s_{k,1} \\<br /> \Omega_k &amp;=&amp; \frac{1}{N}\left( s_{k,2} - s_{k,1}s_{k,1}^\transpose \right) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> What is remarkable is that it suffices to be able to calculate $\pcyipsii(y_i | \psi_i)$ for all $\psi_i$ and $y_i$ in order to be able to run SAEM. In effect, this allows the simulation step to be run using MH, since the acceptance probabilities can be calculated.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===SAEM for continuous data models===<br /> Consider now a continuous data model in which the residual error variance is constant:<br /> <br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> y_{ij} &amp;=&amp; f(t_{ij},\phi_i) + a \teps_{ij} \\<br /> h(\psi_i) &amp;\sim&amp; {\cal N}( \mu , \Omega) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> Here, the individual parameters are $\psi_i=(\phi_i,a)$. The variance-covariance matrix for $\psi_i$ is not positive-definite in this case because $a$ has no variability. If we suppose that the variance matrix $\Omega$ of $\phi_i$ is positive-definite, then, writing $\theta=(\mu,\Omega,a)$, a natural decomposition of the model is:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pmacro(\by,\bpsi;\theta) = \pmacro(\by {{!}} \bpsi;a)\pmacro(\bpsi;\mu,\Omega) .<br /> &lt;/math&gt; }}<br /> <br /> The previous statistic $\tilde{S}(\bpsi) = (\tilde{S}_1(\bpsi),\tilde{S}_2(\bpsi))$ is not sufficient for estimating $a$. Indeed, we need an additional component which is a function of both $\by$ and $\bpsi$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \tilde{S}_3(\by, \bpsi) =\sum_{i=1}^N \sum_{j=1}^{n_i}(y_{ij} - f(t_{ij},\psi_i))^2.
&lt;/math&gt; }}<br /> <br /> Then,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \begin{eqnarray}<br /> s_{k,3} &amp;=&amp; s_{k-1,3} + \gamma_k ( \tilde{S}_3(\by, \bpsi^{(k)}) - s_{k-1,3} ) \\<br /> a_k^2 &amp;=&amp; \displaystyle{ \frac{1}{\sum_{i=1}^N n_i} s_{k,3} }\ .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The choice of the step-size sequence $(\gamma_k)$ is extremely important for ensuring convergence of SAEM. The sequence $(\gamma_k)$ used in $\monolix$ decreases like $k^{-\alpha}$. We recommend using $\alpha=0$ (that is, $\gamma_k=1$) during the first $K_1$ iterations, in order to converge quickly to a neighborhood of a maximum of the likelihood, and $\alpha=1$ during the next $K_2$ iterations.<br /> Indeed, the initial guess $\theta_0$ may be far from the maximum likelihood value we are looking for, and the first iterations with $\gamma_k=1$ allow SAEM to converge quickly to a neighborhood of this value. Following this, smaller step-sizes ensure the<br /> almost sure convergence of the algorithm to the maximum likelihood estimator.<br /> <br /> <br /> <br /> {{Example<br /> |title=Example<br /> |text= Consider a simple model for continuous data:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> y_{ij} &amp;\sim&amp; {\cal N}(A_i\,e^{-k_i \, t_{ij} } , a^2) \\<br /> \log(A_i)&amp;\sim&amp;{\cal N}(\log(A_{\rm pop}) , \omega_A^2) \\<br /> \log(k_i)&amp;\sim&amp;{\cal N}(\log(k_{\rm pop}) , \omega_k^2) ,<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> where $A_{\rm pop}=6$, $k_{\rm pop}=0.25$, $\omega_A=0.3$, $\omega_k=0.3$ and $a=0.2$.<br /> Let us look at the effect of different settings for $(\gamma_k)$ (and $L$) for estimating the population parameters of the model with SAEM.<br /> <br /> <br /> 1. For all $k$, $\gamma_k = 1$: the sequence $(\theta_{k})$ converges very quickly to a neighborhood of the &quot;solution&quot;.
The sequence $(\theta_{k})$ is a homogeneous Markov chain that converges in distribution but does not converge almost surely. <br /> <br /> [[File:saem1.png|link=]]<br /> <br /> <br /> 2. For all $k$, $\gamma_k = 1/k$: the sequence $(\theta_{k})$ converges almost surely to the maximum likelihood estimate of $\theta$, but very slowly. <br /> <br /> [[File:saem2.png|link=]]<br /> <br /> <br /> 3. $\gamma_k = 1$ for $k=1,\ldots,40$ and $\gamma_k = 1/(k-40)$ for $k \geq 41$: the sequence $(\theta_{k})$ converges almost surely to the maximum likelihood estimate of $\theta$, and quickly.<br /> <br /> [[File:saem3.png|link=]]<br /> <br /> <br /> 4. $L=10$ and $\gamma_k = 1$ for $k \geq 1$: the sequence $(\theta_{k})$ is a homogeneous Markov chain that converges in distribution, as in Example 1, but with variance reduced by a factor of $10$; in this case, SAEM behaves like EM. <br /> <br /> [[File:saem4.png|link=]]<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==A simple example to understand why SAEM converges in practice==<br /> <br /> <br /> Let us look at a very simple Gaussian model, with only one observation per individual:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \psi_i &amp;\sim&amp; {\cal N}(\theta,\omega^2) , \ \ \ 1 \leq i \leq N \\<br /> y_i &amp;\sim&amp; {\cal N}(\psi_i,\sigma^2).<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> We will furthermore assume that both $\omega^2$ and $\sigma^2$ are known.<br /> <br /> Here, the maximum likelihood estimator $\hat{\theta}$ of $\theta$ is easy to compute since $y_i \sim_{i.i.d.} {\cal N}(\theta,\omega^2+\sigma^2)$. We find that<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \hat{\theta} = \displaystyle{\frac{1}{N} }\sum_{i=1}^{N} y_i .<br /> &lt;/math&gt;}}<br /> <br /> We now propose to try to compute $\hat{\theta}$ using SAEM instead.
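As a quick numerical sanity check of this closed-form MLE, the sketch below simulates data from the model and scans the marginal log-likelihood on a grid around the sample mean. The parameter values are purely illustrative:

```python
import math
import random

# Toy model of this section: psi_i ~ N(theta, omega^2), y_i | psi_i ~ N(psi_i, sigma^2),
# so marginally y_i ~ N(theta, omega^2 + sigma^2). Values below are made up.
rng = random.Random(1)
theta_true, omega2, sigma2, N = 1.5, 0.3 ** 2, 0.2 ** 2, 500
psi = [rng.gauss(theta_true, math.sqrt(omega2)) for _ in range(N)]
y = [rng.gauss(p, math.sqrt(sigma2)) for p in psi]

theta_hat = sum(y) / N  # the closed-form MLE derived above

def log_like(theta):
    v = omega2 + sigma2
    return sum(-(yi - theta) ** 2 / (2 * v) - 0.5 * math.log(2 * math.pi * v)
               for yi in y)

# The marginal log-likelihood, scanned on a grid, peaks at the sample mean.
grid = [theta_hat + 0.01 * j for j in range(-50, 51)]
best = max(grid, key=log_like)
print(theta_hat, best)
```

The grid search peaks at the empirical mean, up to the grid resolution, as the closed-form argument predicts.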
The simulation step is straightforward since the conditional distribution of $\psi_i$ is a normal distribution:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \psi_i {{!}} y_i \sim {\cal N}(a \theta + (1-a)y_i , \gamma^2) ,<br /> &lt;/math&gt; }}<br /> <br /> where<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> a &amp;= &amp; \displaystyle{ \frac{1}{\omega^2} } \left(\displaystyle{ \frac{1}{\sigma^2} }+ \displaystyle{\frac{1}{\omega^2} }\right)^{-1} \\<br /> \gamma^2 &amp;= &amp;\left(\displaystyle{ \frac{1}{\sigma^2} }+ \displaystyle{\frac{1}{\omega^2} }\right)^{-1}.<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The maximization step is also straightforward. Indeed, a sufficient statistic for estimating $\theta$ is<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; {\cal S}(\bpsi) = \sum_{i=1}^{N} \psi_i. &lt;/math&gt; }}<br /> <br /> Then,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \tilde{\theta}({\cal S(\bpsi)} ) &amp;=&amp; \argmax{\theta} \pmacro(y_1,\ldots,y_N,\psi_1,\ldots,\psi_N;\theta) \\<br /> &amp;=&amp; \argmax{\theta} \pmacro(\psi_1,\ldots,\psi_N;\theta) \\<br /> &amp;=&amp; \frac{ {\cal S}(\bpsi)}{N}.<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> Let us first look at the behavior of SAEM when $\gamma_k=1$. At iteration $k$,<br /> <br /> <br /> * Simulation step: $\psi_i^{(k)} \sim {\cal N}( a \theta_{k-1} + (1-a)y_i , \gamma^2).$<br /> <br /> * Maximization step: $\theta_k = \displaystyle{ \frac{ {\cal S}(\bpsi^{(k)})}{N} } = \displaystyle{ \frac{1}{N} }\sum_{i=1}^{N} \psi_i^{(k)}$.<br /> <br /> <br /> It can be shown that:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \theta_k - \hat{\theta} = a(\theta_{k-1} - \hat{\theta}) + e_k ,<br /> &lt;/math&gt; }}<br /> <br /> where $e_k \sim {\cal N}(0, \gamma^2 /N)$. 
Then, the sequence $(\theta_k)$ is an autoregressive process of order 1 (AR(1)) which converges in distribution to a normal distribution when $k\to \infty$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\theta_k \limite{}{\cal D} {\cal N}\left(\hat{\theta} , \displaystyle{ \frac{\gamma^2}{N(1-a^2)} }\right) .<br /> &lt;/math&gt; }}<br /> <br /> <br /> {{ImageWithCaption|image=saemb1.png|caption=10 sequences $(\theta_k)$ obtained with different initial values and $\gamma_k=1$ for $1\leq k \leq 50$ }} <br /> <br /> <br /> Now, let us see what happens instead when $\gamma_k$ decreases like $1/k$. At iteration $k$,<br /> <br /> <br /> * Simulation step: $\psi_i^{(k)} \sim {\cal N}( a \theta_{k-1} + (1-a)y_i , \gamma^2)$<br /> <br /> * Maximization step:<br /> <br /> {{Equation1<br /> |equation= &lt;math&gt;\theta_k = \theta_{k-1} + \displaystyle{ \frac{1}{k} }\left( \displaystyle{ \frac{1}{N} }\sum_{i=1}^{N} \psi_i^{(k)} -\theta_{k-1} \right). <br /> &lt;/math&gt; }}<br /> <br /> <br /> : Here, we can show that:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \theta_k - \hat{\theta} = \displaystyle{ \frac{k-1+a}{k} }(\theta_{k-1} - \hat{\theta}) + \displaystyle{\frac{e_k}{k} }, <br /> &lt;/math&gt; }}<br /> <br /> : where $e_k \sim {\cal N}(0, \gamma^2 /N)$.
Then, the sequence $(\theta_k)$ converges almost surely to $\hat{\theta}$.<br /> <br /> <br /> {{ImageWithCaption|image=saemb2.png|caption=10 sequences $(\theta_k)$ obtained with different initial values and $\gamma_k=1/k$ for $1\leq k \leq 50$ }}<br /> <br /> <br /> Thus, we see that by combining the two strategies, the sequence $(\theta_k)$ is a Markov chain that fluctuates in a neighborhood of $\hat{\theta}$ during the first $K_1$ iterations, then converges almost surely to $\hat{\theta}$ during the next $K_2$ iterations.<br /> <br /> <br /> {{ImageWithCaption|image=saemb3.png|caption=10 sequences $(\theta_k)$ obtained with different initial values, $\gamma_k=1$ for $1\leq k \leq 20$ and $\gamma_k=1/(k-20)$ for $21\leq k \leq 50$ }}<br /> <br /> <br /> {{ShowVideo|image=saem5b.png|video=http://wiki.webpopix.org/images/2/20/saem.mp4|caption=The SAEM algorithm in practice. }}<br /> <br /> &lt;!-- {{ImageWithCaptionL|image=saem5.png|size=750px|caption= The SAEM algorithm in practice. (a) the observations and the initialization $p_0(\psi_i)$, (b) the initialization $p_0(\psi_i)$ and the conditional distributions of the observations $p(y_i{{!}}\psi_i)$, (c) the conditional distributions $p_0(\psi_i{{!}}y_i)$ and the simulated individual parameters $(\psi_i^{(1)})$, (d) the updated distribution $p_1(\psi_i)$. }} --&gt;<br /> <br /> ==A simulated annealing version of SAEM==<br /> <br /> <br /> Convergence of SAEM can strongly depend on the initial guess when the likelihood ${\like}$ has several local maxima.
A simulated annealing version of SAEM can improve convergence of the algorithm toward the global maximum of ${\like}$.<br /> <br /> To detail this, we first rewrite the joint pdf of $(\by,\bpsi)$ as follows:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pmacro(\by,\bpsi;\theta) = C(\theta)\, \exp \left\{-U(\by,\bpsi;\theta)\right\} ,<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> where $C(\theta)$ is a normalizing constant that only depends on $\theta$. Then, for any &quot;temperature&quot; $T&gt;0$, we consider the complete model<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pmacro_T(\by,\bpsi;\theta) = C_T(\theta)\, \exp \left\{-\displaystyle{\frac{1}{T} }U(\by,\bpsi;\theta) \right\} ,<br /> &lt;/math&gt; }}<br /> <br /> where $C_T(\theta)$ is again a normalizing constant.<br /> <br /> We then introduce a decreasing temperature sequence $(T_k, 1\leq k \leq K)$ and use the SAEM algorithm on the complete model $\pmacro_{T_k}(\by,\bpsi;\theta)$ at iteration $k$ (the usual version of SAEM uses $T_k=1$ at each iteration).
The sequence $(T_k)$ is chosen to have large positive values during the first iterations, then to decrease at an exponential rate to 1: $T_k = \max(1, \tau \ T_{k-1})$.<br /> <br /> Consider for example the following model for continuous data:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> y_{ij} &amp;\sim&amp; {\cal N}(f(t_{ij};\psi_i) , a^2) \\<br /> h(\psi_i) &amp;\sim&amp; {\cal N}(\mu , \Omega) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> Here, $\theta = (\mu,\Omega,a^2)$ and<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pmacro(\by,\bpsi;\theta) = C(\theta)\, \exp \left\{- \displaystyle{ \frac{1}{2 a^2} }\sum_{i=1}^N \sum_{j=1}^{n_i} (y_{ij} - f(t_{ij};\psi_i))^2 - \displaystyle{ \frac{1}{2} } \sum_{i=1}^N (h(\psi_i)-\mu)^\transpose \Omega^{-1} (h(\psi_i)-\mu) \right\},<br /> &lt;/math&gt; }}<br /> <br /> where $C(\theta)$ is a normalizing constant that only depends on $a$ and $\Omega$.<br /> <br /> <br /> We see that $\pmacro_T(\by,\bpsi;\theta)$ is also a normal model, in which the residual error variance $a^2$ is replaced by $T a^2$ and the variance matrix $\Omega$ of the random effects by $T\Omega$.<br /> In other words, a model with a &quot;large temperature&quot; is a model with large variances.<br /> <br /> The algorithm therefore consists in choosing large initial variances $\Omega_0$ and $a^2_0$ (which include the initial temperature $T_0$ implicitly) and setting $a^2_k = \max(\tau \ a^2_{k-1} , \hat{a}^2(\by,\bpsi^{(k)}))$ and $\Omega_k = \max(\tau \ \Omega_{k-1} , \hat{\Omega}(\bpsi^{(k)}))$ during the first iterations.
Here, $0\leq\tau\leq 1$.<br /> <br /> These large values of the variance make the conditional distributions $\pmacro_T(\psi_i | y_i;\theta)$ less concentrated around their modes, and thus allow the sequence $(\theta_k)$ to &quot;escape&quot; from local maxima of the likelihood during the first iterations of SAEM and converge to a neighborhood of the global maximum of ${\like}$.<br /> After these initial iterations, the usual SAEM algorithm is used to estimate these variances at each iteration.<br /> <br /> <br /> {{Remarks<br /> |title= Remark<br /> |text= We can use two different coefficients $\tau_1$ and $\tau_2$ for $\Omega$ and $a^2$ in $\monolix$. It is possible, for example, to choose $\tau_1&lt;1$ and $\tau_2&gt;1$, with large initial inter-subject variances $\Omega_0$ and a small initial residual variance $a^2_0$. In this case, SAEM tries to obtain the best possible fit during the first iterations, allowing for a large inter-subject variability. During the next iterations, this variability is reduced and the residual variance increases until the best possible trade-off between the two criteria is reached.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=A PK example<br /> |text= <br /> <br /> Consider a simple one-compartment model for oral administration:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:saem_sa&quot;&gt;&lt;math&gt;<br /> f(t;ka,V,ke) = \displaystyle{ \frac{D\, ka}{V(ka-ke)} }\left( e^{-ke \, t} - e^{-ka \, t} \right) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> We then simulate PK data from 80 patients using the following population PK parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; ka_{\rm pop} = 1, \quad V_{\rm pop}=8, \quad ke_{\rm pop}=0.25 .&lt;/math&gt; }}<br /> <br /> We can see that the following parametrization gives the same prediction as the one given in [[#eq:saem_sa|(2)]]:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \tilde{ka} = ke, \quad \tilde{V}=V \times
ke/ka, \quad \tilde{ke}=ka . &lt;/math&gt; }}<br /> <br /> We can then expect a (global) maximum of the likelihood around $(ka,V,ke) = (1, \ 8, \ 0.25)$ and a (local) maximum around $(ka,V,ke) = (0.25, \ 2, \ 1).$<br /> <br /> The figure below displays the convergence of SAEM without simulated annealing to a local maximum of the likelihood (deviance = $-2\,\log {\like} = 816$). The initial values of the population parameters we chose were $(ka_0,V_0,ke_0) = (1,1,1)$.<br /> <br /> :{{ImageWithCaption_special|image=recuit1.png|caption=Convergence of SAEM to a local maximum of the likelihood}} <br /> <br /> Using the same initial guess, the simulated annealing version of SAEM converges to the global maximum of the likelihood (deviance = 734).<br /> <br /> :{{ImageWithCaption_special|image=recuit2.png|caption=Convergence of SAEM to the global maximum of the likelihood }}<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> == Bibliography ==<br /> <br /> <br /> &lt;bibtex&gt;<br /> @article{allassonniere2010construction,<br /> title={Construction of Bayesian deformable models via a stochastic approximation algorithm: a convergence study},<br /> author={Allassonnière, S. and Kuhn, E. and Trouvé, A.},<br /> journal={Bernoulli},<br /> volume={16},<br /> number={3},<br /> pages={641--678},<br /> year={2010},<br /> publisher={Bernoulli Society for Mathematical Statistics and Probability}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{delattre2012maximum,<br /> title={Maximum likelihood estimation in discrete mixed hidden Markov models using the SAEM algorithm},<br /> author={Delattre, M.
and Lavielle, M.},<br /> journal={Computational Statistics &amp; Data Analysis},<br /> year={2012},<br /> volume={56},<br /> pages={2073-2085}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{delattre2013sde,<br /> title={Coupling the SAEM algorithm and the extended Kalman filter for maximum likelihood estimation in mixed-effects diffusion models},<br /> author={Delattre, M. and Lavielle, M.},<br /> journal={Statistics and its interfaces},<br /> year={2013},<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{delyon1999convergence,<br /> title={Convergence of a stochastic approximation version of the EM algorithm},<br /> author={Delyon, B. and Lavielle, M. and Moulines, E.},<br /> journal={Annals of Statistics},<br /> pages={94-128},<br /> year={1999},<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{dempster1977maximum,<br /> title={Maximum likelihood from incomplete data via the EM algorithm},<br /> author={Dempster, A.P. and Laird, N.M. and Rubin, D.B.},<br /> journal={Journal of the Royal Statistical Society. Series B (Methodological)},<br /> pages={1-38},<br /> year={1977},<br /> publisher={JSTOR}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{kuhn2004coupling,<br /> title={Coupling a stochastic approximation version of EM with an MCMC procedure},<br /> author={Kuhn, E. and Lavielle, M.},<br /> journal={ESAIM: Probability and Statistics},<br /> volume={8},<br /> pages={115-131},<br /> year={2004},<br /> publisher={EDP Sciences, 17 Avenue du Hoggar Les Ulis Cedex A BP 112 91944 France}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{lavielle2013improved,<br /> title={An improved SAEM algorithm for maximum likelihood estimation in mixtures of non linear mixed effects models},<br /> author={Lavielle, M. 
and Mbogning, C.},<br /> journal={Statistics and Computing},<br /> year={2013},<br /> publisher={Springer}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{mclachlan2007algorithm,<br /> title={The EM algorithm and extensions},<br /> author={McLachlan, G.J. and Krishnan, T.},<br /> volume={382},<br /> year={2007},<br /> publisher={Wiley-Interscience}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{samson2006extension,<br /> title={Extension of the SAEM algorithm to left-censored data in nonlinear mixed-effects model: Application to HIV dynamics model},<br /> author={Samson, A. and Lavielle, M. and Mentr&amp;eacute;, F.},<br /> journal={Computational statistics &amp; data analysis},<br /> volume={51},<br /> number={3},<br /> pages={1562-1574},<br /> year={2006},<br /> publisher={Elsevier}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{wei1990monte,<br /> title={A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms},<br /> author={Wei, G. 
and Tanner, M.},<br /> journal={Journal of the American Statistical Association},<br /> volume={85},<br /> number={411},<br /> pages={699-704},<br /> year={1990},<br /> publisher={Taylor &amp; Francis}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{wu1983convergence,<br /> title={On the convergence properties of the EM algorithm},<br /> author={Wu, C.F.},<br /> journal={The Annals of Statistics},<br /> volume={11},<br /> number={1},<br /> pages={95-103},<br /> year={1983},<br /> publisher={Institute of Mathematical Statistics}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> <br /> {{Back&amp;Next<br /> |linkBack=Introduction and notation<br /> |linkNext=The Metropolis-Hastings algorithm for simulating the individual parameters }}</div> Admin http://wiki.webpopix.org/index.php/Animations_%26_Videos Animations & Videos 2016-04-29T12:52:14Z <p>Admin : </p> <hr /> <div><br /> == Introduction to the population approach == <br /> <br /> The goal of this animation is to show that the population approach is relevant to many fields of application (biology, agronomy, toxicology, pharmacology, etc.) 
and to present a PK modeling application in a bit more detail.<br /> <br /> {{ShowVideo_NoCaption|image=IntroductionPA.png|size=500px|video=http://team.inria.fr/popix/files/2012/01/Populations.swf}}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Introduction to PK modeling ==<br /> <br /> This animation describes how complex biological phenomena can be approximated by simplified models represented by mathematical equations.<br /> <br /> {{ShowVideo_NoCaption|image=IntroductionPK.png |size=500px|video=https://team.inria.fr/popix/files/2013/02/PKmodelling.swf }}<br /> <br /> <br /> Examples of PK modeling using $\mlxplore$ can be viewed [[Introduction_to_PK_modeling_using_MLXPlore_-_Part_I|here]].<br /> &lt;br&gt;&lt;br&gt;<br /> <br /> == The SAEM algorithm in practice ==<br /> <br /> This video shows how the SAEM algorithm works, as described in the section [[The SAEM algorithm for estimating population parameters]]. <br /> <br /> <br /> {{ShowVideo_NoCaption|image=saem5b.png|video=http://wiki.webpopix.org/images/2/20/saem.mp4}}</div> Admin http://wiki.webpopix.org/index.php/Visualization Visualization 2013-06-21T09:46:39Z <p>Admin : </p> <hr /> <div>== Introduction ==<br /> <br /> Before deciding to model data, it is very important to be able to visualize it. This is especially the case for longitudinal data, when we want to see how an outcome varies with time or as a function of another outcome. We may also want to visualize how the individual covariates are distributed, visually detect whether there are relationships between variables, visually compare data from different groups, etc. Developing such visual exploration tools poses no particular methodological problems: it is simple to write Matlab or R code for one's own needs. 
To illustrate the data visualization part of this chapter, we have created a small Matlab toolbox called $\popixplore$ ({{filepath:popixplore 1.1.zip}}), which can be freely downloaded and used.<br /> <br /> It may also be useful to visualize the model itself by undertaking a sensitivity analysis, i.e., looking at how the structural model changes when we vary one or several parameters. This is important for truly understanding the structural model, i.e., what lies behind the given mathematical equations. In the modeling context, we may also want to visually calibrate parameters in order to obtain predictions as close as possible to the observations. Developing such a tool is a difficult task because it needs to easily input a model written in some coding language, perform complex calculations, and provide a decent graphical interface (e.g., one that lets you easily modify the model parameters).<br /> <br /> Various model visualization tools exist, such as [http://www.berkeleymadonna.com/index.html Berkeley Madonna], which specializes in the analysis of dynamical systems and the numerical solution of ordinary differential equations. Here, we use [http://www.lixoft.eu/products/mlxplore/mlxplore-overview/ $\mlxplore$] for several reasons:<br /> <br /> <br /> &lt;ul&gt;<br /> * [http://www.lixoft.eu/products/mlxplore/mlxplore-overview/ $\mlxplore$] uses the $\mlxtran$ language, which is extremely flexible and well-adapted to implementing complex mixed-effects models. Indeed, with $\mlxtran$ we can implement pharmacokinetic models with complex administration schedules, include inter-individual variability in parameters, define a statistical model for the covariates, etc. Another extremely important aspect of $\mlxtran$ is that it rigorously adopts the model representation formalisms proposed in $\wikipopix$. 
In other words, model implementation is completely in sync with its mathematical representation.<br /> &lt;br&gt;<br /> <br /> * [http://www.lixoft.eu/products/mlxplore/mlxplore-overview/ $\mlxplore$] provides a clear graphical interface that of course allows us to visualize the structural model, but also the statistical model, which is of fundamental importance in the population approach. We can thus visualize the impact of covariates and inter-individual variability of model parameters on predictions.<br /> &lt;/ul&gt;<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Data exploration ==<br /> <br /> <br /> The following example involves 80 individuals who receive a single dose of an anticoagulant at time $t=0$. For each patient we then measure the plasma concentration of the drug at various times. This drug can cause undesirable side effects such as nose bleeds; if any occur, we also record the times at which they happen. The data is recorded in columns of a single text file {{Verbatim|pkrtte_data.csv}}. 
In this example, the columns are:<br /> <br /> <br /> &lt;ul&gt;<br /> '''id''' the ID number of the patient<br /> &lt;br&gt;&lt;br&gt;<br /> '''time''' dose administration and observation times<br /> &lt;br&gt;&lt;br&gt;<br /> '''amt''' the amount of drug administered<br /> &lt;br&gt;&lt;br&gt;<br /> '''y''' the observations (concentrations and events)<br /> &lt;br&gt;&lt;br&gt;<br /> '''ytype''' the type of observation: 1=concentration, 2=event<br /> &lt;br&gt;&lt;br&gt;<br /> '''weight''' a continuous individual covariate<br /> &lt;br&gt;&lt;br&gt;<br /> '''gender''' a categorical individual covariate (F or M)<br /> &lt;br&gt;&lt;br&gt;<br /> '''group''' four different groups receive different doses: A=40mg, B=60mg, C=80mg, D=100mg.<br /> &lt;/ul&gt;<br /> <br /> <br /> {{ImageWithCaption|image=exploredata0.png|caption=The datafile {{Verbatim|pkrtte_data.csv}} }} <br /> <br /> <br /> We can read this datafile with the function {{Verbatim|readdatapx}} and add additional information about the data:<br /> <br /> <br /> {{MATLABcode<br /> |name=<br /> |code=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> datafile.name='pkrtte_data.csv';<br /> datafile.format='csv'; % can be &quot;csv&quot;, &quot;space&quot;, &quot;tab&quot; or &quot;;&quot;<br /> <br /> info.header = {'ID','TIME','AMT','Y','YTYPE','COV','CAT','CAT'};<br /> info.observation.name={'concentration','hemorrhaging'};<br /> info.observation.type={'continuous','event'};<br /> info.observation.unit={'mg/l',''};<br /> info.covariate.unit={'kg',''};<br /> info.time.unit='h';<br /> <br /> data=readdatapx(datafile,info);<br /> &lt;/pre&gt; }}<br /> <br /> <br /> How we graphically represent data depends on the type of data. Often for continuous data we use &quot;spaghetti plots&quot;, where all of the observations are given on the same plot, and those for each individual are joined up using line segments. 
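Producing such a spaghetti plot amounts to grouping the long-format records by individual and joining each individual's observations in time order. As a rough sketch of this step (in Python rather than the Matlab toolbox, with a few hypothetical rows standing in for {{Verbatim|pkrtte_data.csv}}):

```python
# Sketch (assumption): hypothetical long-format records (id, time, y, ytype),
# mirroring the columns of pkrtte_data.csv; ytype 1 = concentration, 2 = event.
from collections import defaultdict

rows = [
    (1, 1.0, 5.2, 1), (1, 4.0, 3.1, 1), (1, 12.0, 0.9, 1),
    (2, 1.0, 6.0, 1), (2, 4.0, 4.4, 1),
    (2, 6.5, None, 2),   # an event record: excluded from the spaghetti plot
]

def spaghetti_series(records):
    """Return {id: (times, concentrations)} with each individual's points in time order."""
    series = defaultdict(lambda: ([], []))
    for ident, t, y, ytype in sorted(records, key=lambda r: (r[0], r[1])):
        if ytype == 1:                  # keep only continuous observations
            series[ident][0].append(t)
            series[ident][1].append(y)
    return dict(series)

series = spaghetti_series(rows)
```

Each (times, concentrations) pair can then be passed to a line-plotting command to draw one strand of the plot; the event records would instead feed the event plots described next.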
Time-to-event data are usually represented using [https://en.wikipedia.org/wiki/Kaplan-Meier_survival_curve Kaplan-Meier plots], i.e., an estimate of the survival function for the first event. In the case of repeated events, we can instead represent the average cumulative number of events per individual.<br /> <br /> <br /> {{MATLABcode<br /> |name=<br /> |code=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> &gt;&gt;exploredatapx(data)<br /> &lt;/pre&gt; }}<br /> <br /> <br /> {{ImageWithCaption|image=exploredata1.png|caption=Graphical representation of the data. Left: concentrations, right: average cumulative number of events per individual}}<br /> <br /> <br /> When different groups receive different treatments, it can be useful to separately visualize the data from each group. Here for instance we can separate the patients into groups depending on the initial dose given.<br /> <br /> <br /> {{ImageWithCaption|image=exploredata2.png|caption=Concentration profiles per dose group}}<br /> <br /> <br /> {| cellpadding=&quot;10&quot; cellspacing=&quot;0&quot;<br /> |style = &quot;width:50%&quot;| [[File:exploredata3a.png]] <br /> |style = &quot;width:50%&quot;| [[File:exploredata3b.png]]<br /> |-<br /> |colspan=&quot;2&quot; align=&quot;center&quot; style=&quot;text-align:center&quot;| ''Distribution of weight and gender per dose group'' <br /> |}<br /> <br /> <br /> <br /> {{Remarks<br /> |title=Remark<br /> |text=The data file {{Verbatim|pkrtte_data.csv}} and the Matlab script {{Verbatim|pkrtte_demo.m}} are available in the folder {{Verbatim|demos}} of $\popixplore$: {{filepath:popixplore 1.1.zip}}.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Model exploration==<br /> <br /> ===Exploring the structural model===<br /> <br /> Suppose that we now want to visualize the following joint model, which can be used to simultaneously model PK and time-to-event data:<br /> <br /> {{Equation1<br /> 
|equation=&lt;math&gt;\begin{eqnarray}<br /> k&amp;=&amp;Cl/V \\<br /> \deriv{A_d} &amp;=&amp; - k_a \, A_d(t) \\<br /> \deriv{A_c} &amp;=&amp; k_a \, A_d(t) - k \, A_c(t) \\<br /> Cc(t) &amp;=&amp; {A_c(t)}/{V} \\<br /> h(t) &amp;=&amp; h_0 \, \exp(\gamma\, Cc(t)) .<br /> \end{eqnarray} &lt;/math&gt; }}<br /> <br /> Here, $A_d$ and $A_c$ are the amounts of drug in the depot and central compartments, $Cc$ the concentration in the central compartment and $h$ the hazard function for the event of interest (hemorrhaging, for instance). The parameters of the model are the absorption rate constant $k_a$, the volume of distribution $V$, the clearance $Cl$, the baseline hazard $h_0$ and the coefficient $\gamma$.<br /> We assume that the drug can be administered both intravenously and orally, i.e., doses can be administered to either the depot or the central compartment.<br /> <br /> We first need to implement this model using $\mlxtran$:<br /> <br /> <br /> {{MLXTran<br /> |name=joint1_model.txt<br /> |text=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> [PREDICTION]<br /> input={ka, V, Cl, h0, gamma}<br /> <br /> PK:<br /> depot(type=1,target=Ad)<br /> depot(type=2,target=Ac)<br /> <br /> EQUATION:<br /> k = Cl/V<br /> ddt_Ad = -ka*Ad<br /> ddt_Ac = ka*Ad - k*Ac<br /> Cc = Ac/V<br /> h = h0*exp(gamma*Cc)<br /> &lt;/pre&gt;}}<br /> <br /> <br /> Here, an administration of type 1 (resp. 
iv) administration.<br /> <br /> The tasks, i.e., how the model is to be used, are then coded as an [http://www.lixoft.eu/products/mlxplore/mlxplore-overview/ $\mlxplore$] project:<br /> <br /> <br /> {{MLXPlore<br /> |name=joint1_project.txt<br /> |text=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> &lt;MODEL&gt;<br /> file='joint1_model.txt'<br /> <br /> &lt;DESIGN&gt;<br /> [ADMINISTRATION]<br /> adm1={time=0, amount=50,type=1}<br /> <br /> &lt;PARAMETER&gt;<br /> ka = 0.5<br /> V = 10<br /> Cl = 0.5<br /> h0 = 0.01<br /> gamma = 0.5<br /> <br /> &lt;OUTPUT&gt;<br /> list={Cc, h}<br /> grid=0:0.1:100<br /> &lt;/pre&gt; }}<br /> <br /> <br /> In this example, a single dose of 50 mg is administered orally ({{Verbatim|target{{-}}Ad}} when {{Verbatim|type{{-}}1}}) at time 0. We have asked [http://www.lixoft.eu/products/mlxplore/mlxplore-overview/ $\mlxplore$] to display the predicted concentration $Cc$ and the hazard function $h$ between $t=0$ and $t=100$ every $0.1\,h$ for a given set of parameters. We can then change the values of these parameters with the sliders to see their impact on the two functions.<br /> <br /> <br /> {{ImageWithCaption|image=exploremodel1.png|caption=Exploring the model using $\mlxplore$ }}<br /> <br /> <br /> We can easily modify the dose regimen without changing anything in the model itself. Suppose for instance that we now want to compare a treatment with repeated doses of 50 mg every 24 hours and a treatment with repeated doses of 25 mg every 12 hours. 
Only the section {{Verbatim|&lt;DESIGN&gt;}} needs to be modified:<br /> <br /> <br /> {{ExampleWithCode&amp;Image<br /> |title=<br /> |text=<br /> |code={{MLXPloreForTable<br /> |name=joint2_project.txt<br /> |text=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> <br /> &lt;DESIGN&gt;<br /> [ADMINISTRATION]<br /> adm1={time=0:24:144, amount=50,type=1}<br /> adm2={time=0:12:144, amount=25,type=1}<br /> &lt;/pre&gt; }}<br /> |image=[[File:exploremodel2.png]] }}<br /> <br /> <br /> We can combine different administrations (oral and intravenous for instance) into one global treatment:<br /> <br /> <br /> {{ExampleWithCode&amp;Image<br /> |title=<br /> |text=<br /> |code={{MLXPloreForTable<br /> |name=joint3_project.txt<br /> |text=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> &lt;DESIGN&gt;<br /> [ADMINISTRATION]<br /> adm1={time=0:24:144, amount=50,type=1}<br /> adm2={time=6:48:150, amount=25,type=2}<br /> <br /> [TREATMENT]<br /> trt1={adm1, adm2}<br /> &lt;/pre&gt; }}<br /> |image= [[File:exploremodel3.png]]<br /> }}<br /> <br /> ===Exploring the statistical model===<br /> <br /> One of the main advantages of [http://www.lixoft.eu/products/mlxplore/mlxplore-overview/ $\mlxplore$] is its ability to graphically display the predicted distribution of the functions of interest $Cc$ and $h$ when certain parameters of the model are assumed to be random variables. Assume for instance that $V$, $Cl$ and $h_0$ are log-normally distributed. 
To take this into account, we simply need to insert a section {{Verbatim|[INDIVIDUAL]}} into the project file:<br /> <br /> <br /> {{MLXTran<br /> |name=joint2_model.txt<br /> |text=&lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> [INDIVIDUAL]<br /> input={V_pop,Cl_pop,h0_pop,omega_V,omega_Cl,omega_h0}<br /> <br /> DEFINITION:<br /> V = {distribution=lognormal, reference=V_pop, sd=omega_V}<br /> Cl = {distribution=lognormal, reference=Cl_pop, sd=omega_Cl}<br /> h0 = {distribution=lognormal, reference=h0_pop, sd=omega_h0}<br /> <br /> [PREDICTION]<br /> input={ka, V, Cl, h0, gamma}<br /> .<br /> .<br /> .<br /> &lt;/pre&gt; }}<br /> <br /> <br /> The parameters of the model are now the population parameters $V_{\rm pop}$, $Cl_{\rm pop}$, $h0_{\rm pop}$, $\omega_V$, $\omega_{Cl}$ and $\omega_{h_0}$ and the parameters $k_a$ and $\gamma$ which have no inter-individual variability.<br /> <br /> <br /> {{MLXTran<br /> |name=joint4_project.txt<br /> |text=&lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> &lt;MODEL&gt;<br /> file='joint2_model.txt'<br /> <br /> &lt;DESIGN&gt;<br /> [ADMINISTRATION]<br /> adm1={time=0, amount=50,type=1}<br /> <br /> &lt;PARAMETER&gt;<br /> V_pop = 10<br /> Cl_pop = 0.5<br /> h0_pop=0.01<br /> omega_V = 0.2<br /> omega_Cl = 0.3<br /> omega_h0 = 0.2<br /> ka = 0.5<br /> gamma = 0.5<br /> <br /> &lt;OUTPUT&gt;<br /> list={Cc, h}<br /> grid=0:0.1:100<br /> &lt;/pre&gt; }}<br /> <br /> <br /> When some parameters of the model are random variables, $\mlxplore$ displays the median of the predicted distribution and several prediction intervals (the default is to use different shaded areas for the 10%, 20%, ..., 90% quantiles).<br /> <br /> <br /> {{ImageWithCaption|image=exploremodel4b.png|caption=Exploring the statistical model using $\mlxplore$}}<br /> <br /> <br /> It is possible to introduce covariates into the statistical model by considering for example that the volume depends on the weight, 
and that these covariates are themselves random variables. This may be important if, for example, we want to visualize how much of the variation in concentration is due to variation in weight, and how much remains unexplained, i.e., attributable to random effects.<br /> <br /> <br /> {{ImageWithCaption|image=exploremodel5.png|caption=Exploring the statistical model using $\mlxplore$ }}<br /> <br /> <br /> The $\mlxtran$ model files and the $\mlxplore$ scripts can be downloaded here: {{filepath:pk mlxplore.zip}}.<br /> <br /> <br /> &lt;br&gt;<br /> == Bibliography ==<br /> <br /> <br /> &lt;bibtex&gt;<br /> @article{popixplore,<br /> author = {POPIX Inria team},<br /> title = {Popixplore 1.0},<br /> url = {https://wiki.inria.fr/wikis/popix/images/7/71/Popixplore_1.1.zip},<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{MLXplore,<br /> author = {Lixoft},<br /> title = {MLXPlore 1.0},<br /> url = {http://www.lixoft.eu/products/mlxplore/mlxplore-overview},<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{macey2000berkeley,<br /> title={Berkeley Madonna user’s guide},<br /> author={Macey, R. and Oster, G. and Zahnley, T.},<br /> journal={Berkeley (CA): University of California},<br /> year={2000}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{chatterjee2009sensitivity,<br /> title={Sensitivity analysis in linear regression},<br /> author={Chatterjee, S. and Hadi, A. S.},<br /> volume={327},<br /> year={2009},<br /> publisher={Wiley}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{sensibilité2013,<br /> title={Analyse de sensibilité et exploration de modèles},<br /> author={Faivre, R. and Iooss, B. and Mah&amp;eacute;vas, S. and Makowski, D. and Monod, H.},<br /> year={2013},<br /> publisher={Editions Quae}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{saltelli2000sensitivity,<br /> title={Sensitivity analysis},<br /> author={Saltelli, A. and Chan, K. 
and Scott, E. M. and others},<br /> volume={134},<br /> year={2000},<br /> publisher={Wiley New York}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{saltelli2008global,<br /> title={Global sensitivity analysis: the primer},<br /> author={Saltelli, A. and Ratto, M. and Andres, T. and Campolongo, F. and Cariboni, J. and Gatelli, D. and Saisana, M. and Tarantola, S.},<br /> year={2008},<br /> publisher={Wiley-Interscience}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{saltelli2004sensitivity,<br /> title={Sensitivity analysis in practice: a guide to assessing scientific models},<br /> author={Saltelli, A. and Tarantola, S. and Campolongo, F. and Ratto, M.},<br /> year={2004},<br /> publisher={Wiley}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> <br /> {{Next<br /> |link=Modeling}}</div> Admin http://wiki.webpopix.org/index.php/Visualization Visualization 2013-06-21T09:42:17Z <p>Admin : </p> <hr /> <div><br /> == Introduction ==<br /> <br /> Before deciding to model data, it is very important to be able to visualize it. This is especially the case for longitudinal data when we want to see how an outcome varies with time or as a function of another outcome. We may also want to visualize how the individual covariates are distributed, visually detect if there are relationships between variables, visually compare data from different groups, etc. Development of such visual exploration tools poses no methodological problems. It is simple to write a Matlab or R code for one's own needs. To<br /> illustrate the data visualization part of this chapter, we have created a little Matlab toolbox called $\popixplore$ ({{filepath:popixplore 1.1.zip}}) which can be freely downloaded and used.<br /> <br /> It may also be useful to be able to visualize the model itself by undertaking a sensitivity analysis to look at how the structural model changes when we vary one or several parameters. 
This is important for truly understanding the structural model, i.e., what is behind the given mathematical equations. In the modeling context, we may also want to visually calibrate parameters in order to obtain predictions as close as possible to the observations. Developing such a tool is a difficult task because the tool needs to be able to easily input a model using some coding language, perform complex calculations, and provide a decent graphical interface (e.g., one that lets you easily modify the model parameters).<br /> <br /> Various model visualization tools exist, such as [http://www.berkeleymadonna.com/index.html Berkeley Madonna], specialized in the analysis of dynamical systems and the resolution of ordinary differential equations. Here, we use [http://www.lixoft.eu/products/mlxplore/mlxplore-overview/ $\mlxplore$] for some different reasons:<br /> <br /> <br /> &lt;ul&gt;<br /> * $\mlxplore$ uses the $\mlxtran$ language which is extremely flexible and well-adapted to implementing complex mixed-effects models. Indeed, with $\mlxtran$ we can implement pharmacokinetic models with complex administration schedules, include inter-individual variability in parameters, define a statistical model for the covariates, etc. Another extremely important aspect of $\mlxtran$ is that it rigorously adopts the model representation formalisms proposed in $\wikipopix$. In other words, model implementation is completely in sync with its mathematical representation.<br /> &lt;br&gt;<br /> <br /> * $\mlxplore$ provides a clear graphical interface that of course allows us to visualize the structural model, but also the statistical model, which is of fundamental importance in the population approach. 
We can thus visualize the impact of covariates and inter-individual variability of model parameters on predictions.<br /> &lt;/ul&gt;<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Data exploration ==<br /> <br /> <br /> The following example involves 80 individuals that receive a unique dose of an anticoagulant at time $t=0$. For each patient we then measure the plasmatic concentration of the drug at various times. This drug can cause undesirable side effects such as nose bleeds. If this happens, we also record the times at which this happens. The data is recorded in columns of a single text file {{Verbatim|pkrtte_data.csv}}. In this example, the columns are:<br /> <br /> <br /> &lt;ul&gt;<br /> '''id''' the ID number of the patient<br /> &lt;br&gt;&lt;br&gt;<br /> '''time''' dose administration and observation times<br /> &lt;br&gt;&lt;br&gt;<br /> '''amt''' the amount of drug administered<br /> &lt;br&gt;&lt;br&gt;<br /> '''y''' the observations (concentrations and events)<br /> &lt;br&gt;&lt;br&gt;<br /> '''ytype''' the type of observation: 1=concentration, 2=event<br /> &lt;br&gt;&lt;br&gt;<br /> '''weight''' a continuous individual covariate<br /> &lt;br&gt;&lt;br&gt;<br /> '''gender''' a categorical individual covariate (F or M)<br /> &lt;br&gt;&lt;br&gt;<br /> '''group''' four different groups receive different doses: A=40mg, B=60mg, C=80mg, D=100mg.<br /> &lt;/ul&gt;<br /> <br /> <br /> {{ImageWithCaption|image=exploredata0.png|caption=The datafile {{Verbatim|pkrtte_data.csv}} }} <br /> <br /> <br /> We can read this datafile with the function {{Verbatim|readdatapx}} and add additional information about the data:<br /> <br /> <br /> {{MATLABcode<br /> |name=<br /> |code=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> datafile.name='pkrtte_data.csv';<br /> datafile.format='csv'; % can be &quot;csv&quot;, &quot;space&quot;, &quot;tab&quot; or &quot;;&quot;<br /> <br /> info.header = 
{'ID','TIME','AMT','Y','YTYPE','COV','CAT','CAT'};<br /> info.observation.name={'concentration','hemorrhaging'};<br /> info.observation.type={'continuous','event'};<br /> info.observation.unit={'mg/l',''};<br /> info.covariate.unit={'kg',''};<br /> info.time.unit='h';<br /> <br /> data=readdatapx(datafile,info);<br /> &lt;/pre&gt; }}<br /> <br /> <br /> How we graphically represent data depends on the type of data. Often for continuous data we use &quot;spaghetti plots&quot;, where all of the observations are given on the same plot, and those for each individual are joined up using line segments. Time-to-event data are usually represented using [https://en.wikipedia.org/wiki/Kaplan-Meier_survival_curve Kaplan-Meier plots], i.e., an estimate of the survival function for the first event. In the case of repeated events, we can instead represent the average cumulative number of events per individual.<br /> <br /> <br /> {{MATLABcode<br /> |name=<br /> |code=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> &gt;&gt;exploredatapx(data)<br /> &lt;/pre&gt; }}<br /> <br /> <br /> {{ImageWithCaption|image=exploredata1.png|caption=Graphical representation of the data. Left: concentrations, right: average cumulative number of events per individual}}<br /> <br /> <br /> When different groups receive different treatments, it can be useful to separately visualize the data from each group. 
Here for instance we can separate the patients into groups depending on the initial dose given.<br /> <br /> <br /> {{ImageWithCaption|image=exploredata2.png|caption=Concentration profiles per dose group}}<br /> <br /> <br /> {| cellpadding=&quot;10&quot; cellspacing=&quot;0&quot;<br /> |style = &quot;width:50%&quot;| [[File:exploredata3a.png]] <br /> |style = &quot;width:50%&quot;| [[File:exploredata3b.png]]<br /> |-<br /> |cellspan=&quot;2&quot; align=&quot;center&quot; style=&quot;text-align:center&quot;| ''Distribution of weight and gender per dose group'' <br /> |}<br /> <br /> <br /> <br /> {{Remarks<br /> |title=Remark<br /> |text=The data file {{Verbatim|pkrtte_data.csv}} and the matlab script {{Verbatim|pkrtte_demo.m}} are available in the folder {{Verbatim|demos}} of $\popixplore$: {{filepath:popixplore 1.1.zip}}.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Model exploration==<br /> <br /> ===Exploring the structural model===<br /> <br /> Suppose that we now want to visualize the following joint model which is one that can be used for simultaneously modeling PK and time-to-event data:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> k&amp;=&amp;Cl/V \\<br /> \deriv{A_d} &amp;=&amp; - k_a \, A_d(t) \\<br /> \deriv{A_c} &amp;=&amp; k_a \, A_d(t) - k \, A_c(t) \\<br /> Cc(t) &amp;=&amp; {Ac(t)}/{V} \\<br /> h(t) &amp;=&amp; h_0 \, \exp(\gamma\, Cc(t)) .<br /> \end{eqnarray} &lt;/math&gt; }}<br /> <br /> Here, $A_d$ and $A_c$ are the amounts of drug in the depot and central compartments, $Cc$ the concentration in the central compartment and $h$ the hazard function for the event of interest (hemorrhaging for instance). 
The parameters of the model are the absorption rate constant $ka$, the volume of distribution $V$, the clearance $Cl$, the baseline hazard $h_0$ and the coefficient $\gamma$.<br /> We assume that the drug can be administered both intravenously and orally, meaning that the drug can be administered to both the depot and the central compartment.<br /> <br /> We first need to implement this model using $\mlxtran$:<br /> <br /> <br /> {{MLXTran<br /> |name=joint1_model.txt<br /> |text=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> [PREDICTION]<br /> input={ka, V, Cl, h0, gamma}<br /> <br /> PK:<br /> depot(type=1,target=Ad)<br /> depot(type=2,target=Ac)<br /> <br /> EQUATION:<br /> k = Cl/V<br /> ddt_Ad = -ka*Ad<br /> ddt_Ac = ka*Ad - k*Ac<br /> Cc = Ac/V<br /> h = h0*exp(gamma*Cc)<br /> &lt;/pre&gt;}}<br /> <br /> <br /> Here, an administration of type 1 (resp. 2) is an oral (resp. iv) administration.<br /> <br /> The tasks, i.e., how the model is to be used, are then coded as an $\mlxplore$ project:<br /> <br /> <br /> {{MLXPlore<br /> |name=joint1_project.txt<br /> |text=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> &lt;MODEL&gt;<br /> file='joint1_model.txt'<br /> <br /> &lt;DESIGN&gt;<br /> [ADMINISTRATION]<br /> adm1={time=0, amount=50,type=1}<br /> <br /> &lt;PARAMETER&gt;<br /> ka = 0.5<br /> V = 10<br /> Cl = 0.5<br /> h0 = 0.01<br /> gamma = 0.5<br /> <br /> &lt;OUTPUT&gt;<br /> list={Cc, h}<br /> grid=0:0.1:100<br /> &lt;/pre&gt; }}<br /> <br /> <br /> In this example, a single dose of 50 mg is administered orally ({{Verbatim|target{{-}}Ad}} when {{Verbatim|type{{-}}1}}) at time 0. We have asked $\mlxplore$ to display the predicted concentration $Cc$ and the hazard function $h$ between $t=0$ and $t=100$ every $0.1\,h$ for a given set of parameters. 
We can then change the values of these parameters with the sliders to see what the impact on the two functions is.<br /> <br /> <br /> {{ImageWithCaption|image=exploremodel1.png|caption=Exploring the model using $\mlxplore$ }}<br /> <br /> <br /> We can easily modify the dose regimen without changing anything in the model itself. Suppose for instance that we want now to compare a treatment with repeated doses of 50mg every 24 hours and a treatment with repeated doses of 25mg every 12 hours. Only the section {{Verbatim|&lt;DESIGN&gt;}} needs to be modified:<br /> <br /> <br /> {{ExampleWithCode&amp;Image<br /> |title=<br /> |text=<br /> |code={{MLXPloreForTable<br /> |name=joint2_project.txt<br /> |text=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> <br /> &lt;DESIGN&gt;<br /> [ADMINISTRATION]<br /> adm1={time=0:24:144, amount=50,type=1}<br /> adm2={time=0:12:144, amount=25,type=1}<br /> &lt;/pre&gt; }}<br /> |image=[[File:exploremodel2.png]] }}<br /> <br /> <br /> We can combine different administrations (oral and intravenous for instance) into one global treatment:<br /> <br /> <br /> {{ExampleWithCode&amp;Image<br /> |title=<br /> |text=<br /> |code={{MLXPloreForTable<br /> |name=joint3_project.txt<br /> |text=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> &lt;DESIGN&gt;<br /> [ADMINISTRATION]<br /> adm1={time=0:24:144, amount=50,type=1}<br /> adm2={time=6:48:150, amount=25,type=2}<br /> <br /> [TREATMENT]<br /> trt1={adm1, adm2}<br /> &lt;/pre&gt; }}<br /> |image= [[File:exploremodel3.png]]<br /> }}<br /> <br /> ===Exploring the statistical model===<br /> <br /> One of the main advantages of $\mlxplore$ is its ability to graphically display the predicted distribution of the functions of interest $Cc$ and $h$ when certain parameters of the model are assumed to be random variables. Assume for instance that $V$, $Cl$ and $h_0$ are log-normally distributed. 
To take this into account, we simply need to insert a section {{Verbatim|[INDIVIDUAL]}} into the project file:<br /> <br /> <br /> {{MLXTran<br /> |name=joint2_model.txt<br /> |text=&lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> [INDIVIDUAL]<br /> input={V_pop,Cl_pop,h0_pop,omega_V,omega_Cl,omega_h0}<br /> <br /> DEFINITION:<br /> V = {distribution=lognormal, reference=V_pop, sd=omega_V}<br /> Cl = {distribution=lognormal, reference=Cl_pop, sd=omega_Cl}<br /> h0 = {distribution=lognormal, reference=h0_pop, sd=omega_h0}<br /> <br /> [PREDICTION]<br /> input={ka, V, Cl, h0, gamma}<br /> .<br /> .<br /> .<br /> &lt;/pre&gt; }}<br /> <br /> <br /> The parameters of the model are now the population parameters $V_{\rm pop}$, $Cl_{\rm pop}$, $h0_{\rm pop}$, $\omega_V$, $\omega_{Cl}$ and $\omega_{h_0}$ and the parameters $k_a$ and $\gamma$ which have no inter-individual variability.<br /> <br /> <br /> {{MLXTran<br /> |name=joint4_project.txt<br /> |text=&lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> &lt;MODEL&gt;<br /> file='joint2_model.txt'<br /> <br /> &lt;DESIGN&gt;<br /> [ADMINISTRATION]<br /> adm1={time=0, amount=50,type=1}<br /> <br /> &lt;PARAMETER&gt;<br /> V_pop = 10<br /> Cl_pop = 0.5<br /> h0_pop=0.01<br /> omega_V = 0.2<br /> omega_Cl = 0.3<br /> omega_h0 = 0.2<br /> ka = 0.5<br /> gamma = 0.5<br /> <br /> &lt;OUTPUT&gt;<br /> list={Cc, h}<br /> grid=0:0.1:100<br /> &lt;/pre&gt; }}<br /> <br /> <br /> When some parameters of the model are random variables, $\mlxplore$ displays the median of the predicted distribution and several prediction intervals (the default is to use different shaded areas for the 10%, 20%, ..., 90% quantiles).<br /> <br /> <br /> {{ImageWithCaption|image=exploremodel4b.png|caption=Exploring the statistical model using $\mlxplore$}}<br /> <br /> <br /> It is possible to introduce covariates into the statistical model by considering for example that the volume depends on the weight, 
and treating these covariates themselves as random variables. This can be important if, for example, we want to visualize how much of the variation in concentration is due to variation in weight, and how much remains unexplained and must be attributed to random effects.<br /> <br /> <br /> {{ImageWithCaption|image=exploremodel5.png|caption=Exploring the statistical model using $\mlxplore$ }}<br /> <br /> <br /> The $\mlxtran$ model files and the $\mlxplore$ scripts can be downloaded here: {{filepath:pk mlxplore.zip}}.<br /> <br /> <br /> &lt;br&gt;<br /> == Bibliography ==<br /> <br /> <br /> &lt;bibtex&gt;<br /> @article{popixplore,<br /> author = {POPIX Inria team},<br /> title = {Popixplore 1.0},<br /> url = {https://wiki.inria.fr/wikis/popix/images/7/71/Popixplore_1.1.zip},<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{MLXplore,<br /> author = {Lixoft},<br /> title = {MLXPlore 1.0},<br /> url = {http://www.lixoft.eu/products/mlxplore/mlxplore-overview},<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{macey2000berkeley,<br /> title={Berkeley Madonna user’s guide},<br /> author={Macey, R. and Oster, G. and Zahnley, T.},<br /> journal={Berkeley (CA): University of California},<br /> year={2000}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{chatterjee2009sensitivity,<br /> title={Sensitivity analysis in linear regression},<br /> author={Chatterjee, S. and Hadi, A. S.},<br /> volume={327},<br /> year={2009},<br /> publisher={Wiley}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{sensibilité2013,<br /> title={Analyse de sensibilité et exploration de modèles},<br /> author={Faivre, R. and Iooss, B. and Mahévas, S. and Makowski, D. and Monod, H.},<br /> year={2013},<br /> publisher={Editions Quae}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{saltelli2000sensitivity,<br /> title={Sensitivity analysis},<br /> author={Saltelli, A. and Chan, K.
and Scott, E. M. and others},<br /> volume={134},<br /> year={2000},<br /> publisher={Wiley New York}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{saltelli2008global,<br /> title={Global sensitivity analysis: the primer},<br /> author={Saltelli, A. and Ratto, M. and Andres, T. and Campolongo, F. and Cariboni, J. and Gatelli, D. and Saisana, M. and Tarantola, S.},<br /> year={2008},<br /> publisher={Wiley-Interscience}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{saltelli2004sensitivity,<br /> title={Sensitivity analysis in practice: a guide to assessing scientific models},<br /> author={Saltelli, A. and Tarantola, S. and Campolongo, F. and Ratto, M.},<br /> year={2004},<br /> publisher={Wiley}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> <br /> {{Next<br /> |link=Modeling}}</div> Admin http://wiki.webpopix.org/index.php/Visualization Visualization 2013-06-21T09:37:47Z <p>Admin : </p> <hr /> <div>== Introduction ==<br /> <br /> Before deciding to model data, it is very important to be able to visualize it. This is especially the case for longitudinal data, when we want to see how an outcome varies with time or as a function of another outcome. We may also want to visualize how the individual covariates are distributed, visually detect whether there are relationships between variables, visually compare data from different groups, etc. Developing such visual exploration tools poses no particular methodological problems: it is simple to write Matlab or R code for one's own needs. To illustrate the data visualization part of this chapter, we have created a little Matlab toolbox called $\popixplore$ ({{filepath:popixplore 1.1.zip}}), which can be freely downloaded and used.<br /> <br /> It may also be useful to visualize the model itself by undertaking a sensitivity analysis, i.e., looking at how the structural model changes when we vary one or several parameters.
This is important for truly understanding the structural model, i.e., what is behind the given mathematical equations. In the modeling context, we may also want to visually calibrate parameters in order to obtain predictions as close as possible to the observations. Developing such a tool is a difficult task because the tool needs to be able to easily input a model using some coding language, perform complex calculations, and provide a decent graphical interface (e.g., one that lets you easily modify the model parameters).<br /> <br /> Various model visualization tools exist, such as [http://www.berkeleymadonna.com/index.html Berkeley Madonna], which specializes in the analysis of dynamical systems and the numerical solution of ordinary differential equations. Here, we use [http://www.lixoft.eu/products/mlxplore/mlxplore-overview/ $\mlxplore$] for several reasons:<br /> <br /> <br /> &lt;ul&gt;<br /> * $\mlxplore$ uses the $\mlxtran$ language, which is extremely flexible and well-adapted to implementing complex mixed-effects models. Indeed, with $\mlxtran$ we can implement pharmacokinetic models with complex administration schedules, include inter-individual variability in parameters, define a statistical model for the covariates, etc. Another extremely important aspect of $\mlxtran$ is that it rigorously adopts the model representation formalisms proposed in $\wikipopix$. In other words, model implementation is completely in sync with its mathematical representation.<br /> &lt;br&gt;<br /> <br /> * $\mlxplore$ provides a clear graphical interface that allows us to visualize not only the structural model but also the statistical model, which is of fundamental importance in the population approach.
We can thus visualize the impact of covariates and of inter-individual variability of the model parameters on predictions.<br /> &lt;/ul&gt;<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Data exploration ==<br /> <br /> <br /> The following example involves 80 individuals who receive a single dose of an anticoagulant at time $t=0$. For each patient we then measure the plasma concentration of the drug at various times. This drug can cause undesirable side effects such as nosebleeds; if they occur, we also record the times at which they do. The data is recorded in columns of a single text file {{Verbatim|pkrtte_data.csv}}. In this example, the columns are:<br /> <br /> <br /> &lt;ul&gt;<br /> '''id''' the ID number of the patient<br /> &lt;br&gt;&lt;br&gt;<br /> '''time''' dose administration and observation times<br /> &lt;br&gt;&lt;br&gt;<br /> '''amt''' the amount of drug administered<br /> &lt;br&gt;&lt;br&gt;<br /> '''y''' the observations (concentrations and events)<br /> &lt;br&gt;&lt;br&gt;<br /> '''ytype''' the type of observation: 1=concentration, 2=event<br /> &lt;br&gt;&lt;br&gt;<br /> '''weight''' a continuous individual covariate<br /> &lt;br&gt;&lt;br&gt;<br /> '''gender''' a categorical individual covariate (F or M)<br /> &lt;br&gt;&lt;br&gt;<br /> '''group''' four different groups receive different doses: A=40mg, B=60mg, C=80mg, D=100mg.<br /> &lt;/ul&gt;<br /> <br /> <br /> {{ImageWithCaption|image=exploredata0.png|caption=The datafile {{Verbatim|pkrtte_data.csv}} }} <br /> <br /> <br /> We can read this datafile with the function {{Verbatim|readdatapx}} and supply additional information about the data:<br /> <br /> <br /> {{MATLABcode<br /> |name=<br /> |code=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> datafile.name='pkrtte_data.csv';<br /> datafile.format='csv'; % can be &quot;csv&quot;, &quot;space&quot;, &quot;tab&quot; or &quot;;&quot;<br /> <br /> info.header =
{'ID','TIME','AMT','Y','YTYPE','COV','CAT','CAT'};<br /> info.observation.name={'concentration','hemorrhaging'};<br /> info.observation.type={'continuous','event'};<br /> info.observation.unit={'mg/l',''};<br /> info.covariate.unit={'kg',''};<br /> info.time.unit='h';<br /> <br /> data=readdatapx(datafile,info);<br /> &lt;/pre&gt; }}<br /> <br /> <br /> How we graphically represent data depends on the type of data. Often for continuous data we use &quot;spaghetti plots&quot;, where all of the observations are given on the same plot, and those for each individual are joined up using line segments. Time-to-event data are usually represented using [https://en.wikipedia.org/wiki/Kaplan-Meier_survival_curve Kaplan-Meier plots], i.e., an estimate of the survival function for the first event. In the case of repeated events, we can instead represent the average cumulative number of events per individual.<br /> <br /> <br /> {{MATLABcode<br /> |name=<br /> |code=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> &gt;&gt;exploredatapx(data)<br /> &lt;/pre&gt; }}<br /> <br /> <br /> {{ImageWithCaption|image=exploredata1.png|caption=Graphical representation of the data. Left: concentrations, right: average cumulative number of events per individual}}<br /> <br /> <br /> When different groups receive different treatments, it can be useful to separately visualize the data from each group. 
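<br /> <br /> To make the Kaplan-Meier estimate mentioned above concrete, here is a minimal illustrative sketch in Python (not part of $\popixplore$; the data are invented):

```python
# Minimal Kaplan-Meier estimate of the survival function for the first
# event. Each individual contributes a time and a flag: event=1 if the
# event was observed at that time, event=0 if the individual was censored.
def kaplan_meier(times, events):
    """Return the estimated survival curve as a list of (t, S(t)) steps."""
    n_at_risk = len(times)
    survival = 1.0
    steps = [(0.0, 1.0)]
    for t, observed in sorted(zip(times, events)):
        if observed:
            # an observed event multiplies S by (n_at_risk - 1)/n_at_risk
            survival *= (n_at_risk - 1) / n_at_risk
            steps.append((t, survival))
        n_at_risk -= 1  # events and censorings both leave the risk set
    return steps

# Invented data: five individuals, three observed events, two censored
curve = kaplan_meier([12.0, 30.0, 45.0, 60.0, 80.0], [1, 1, 0, 1, 0])
for t, s in curve:
    print(f"t = {t:5.1f} h   S(t) = {s:.3f}")
```

With tied event times the standard estimator would group events per distinct time; this one-at-a-time version is enough to read such plots.<br /> <br />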
Here, for instance, we can separate the patients into groups depending on the initial dose given.<br /> <br /> <br /> {{ImageWithCaption|image=exploredata2.png|caption=Concentration profiles per dose group}}<br /> <br /> <br /> {| cellpadding=&quot;10&quot; cellspacing=&quot;0&quot;<br /> |style = &quot;width:50%&quot;| [[File:exploredata3a.png]] <br /> |style = &quot;width:50%&quot;| [[File:exploredata3b.png]]<br /> |-<br /> |colspan=&quot;2&quot; align=&quot;center&quot; style=&quot;text-align:center&quot;| ''Distribution of weight and gender per dose group'' <br /> |}<br /> <br /> <br /> <br /> {{Remarks<br /> |title=Remark<br /> |text=The data file {{Verbatim|pkrtte_data.csv}} and the Matlab script {{Verbatim|pkrtte_demo.m}} are available in the folder {{Verbatim|demos}} of $\popixplore$: {{filepath:popixplore 1.1.zip}}.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Model exploration==<br /> <br /> ===Exploring the structural model===<br /> <br /> Suppose that we now want to visualize the following joint model, which can be used for simultaneously modeling PK and time-to-event data:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> k &amp;=&amp; Cl/V \\<br /> \deriv{A_d} &amp;=&amp; - k_a \, A_d(t) \\<br /> \deriv{A_c} &amp;=&amp; k_a \, A_d(t) - k \, A_c(t) \\<br /> Cc(t) &amp;=&amp; {A_c(t)}/{V} \\<br /> h(t) &amp;=&amp; h_0 \, \exp(\gamma\, Cc(t)) .<br /> \end{eqnarray} &lt;/math&gt; }}<br /> <br /> Here, $A_d$ and $A_c$ are the amounts of drug in the depot and central compartments, $Cc$ is the concentration in the central compartment and $h$ is the hazard function for the event of interest (hemorrhaging, for instance).
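<br /> <br /> For readers less familiar with hazard functions, it may help to recall how $h$ generates the event times: the probability that no event has occurred by time $t$ is the survival function<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; S(t) = {\rm P}(T &gt; t) = \exp\left(-\int_0^t h(u) \, du \right). &lt;/math&gt; }}<br /> <br /> Thus, here, higher concentrations $Cc$ inflate the hazard and make an early hemorrhage more likely.<br /> <br />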
The parameters of the model are the absorption rate constant $k_a$, the volume of distribution $V$, the clearance $Cl$, the baseline hazard $h_0$ and the coefficient $\gamma$.<br /> We assume that the drug can be administered both intravenously and orally, i.e., that doses can be targeted to either the depot or the central compartment.<br /> <br /> We first need to implement this model using $\mlxtran$:<br /> <br /> <br /> {{MLXTran<br /> |name=joint1_model.txt<br /> |text=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> [PREDICTION]<br /> input={ka, V, Cl, h0, gamma}<br /> <br /> PK:<br /> depot(type=1,target=Ad)<br /> depot(type=2,target=Ac)<br /> <br /> EQUATION:<br /> k = Cl/V<br /> ddt_Ad = -ka*Ad<br /> ddt_Ac = ka*Ad - k*Ac<br /> Cc = Ac/V<br /> h = h0*exp(gamma*Cc)<br /> &lt;/pre&gt;}}<br /> <br /> <br /> Here, an administration of type 1 (resp. type 2) is an oral (resp. intravenous) administration.<br /> <br /> The tasks, i.e., how the model is to be used, are then coded as an $\mlxplore$ project:<br /> <br /> <br /> {{MLXPlore<br /> |name=joint1_project.txt<br /> |text=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> &lt;MODEL&gt;<br /> file='joint1_model.txt'<br /> <br /> &lt;DESIGN&gt;<br /> [ADMINISTRATION]<br /> adm1={time=0, amount=50,type=1}<br /> <br /> &lt;PARAMETER&gt;<br /> ka = 0.5<br /> V = 10<br /> Cl = 0.5<br /> h0 = 0.01<br /> gamma = 0.5<br /> <br /> &lt;OUTPUT&gt;<br /> list={Cc, h}<br /> grid=0:0.1:100<br /> &lt;/pre&gt; }}<br /> <br /> <br /> In this example, a single dose of 50 mg is administered orally ({{Verbatim|target{{-}}Ad}} when {{Verbatim|type{{-}}1}}) at time 0. We have asked $\mlxplore$ to display the predicted concentration $Cc$ and the hazard function $h$ between $t=0$ and $t=100$, every $0.1\,h$, for a given set of parameters.
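<br /> <br /> As a quick sanity check of what $\mlxplore$ computes, the two ODEs of {{Verbatim|joint1_model.txt}} can also be integrated by hand. The following is an illustrative pure-Python sketch (not $\mlxplore$ code; in practice the tool's own solver does this) using the parameter values of {{Verbatim|joint1_project.txt}}:

```python
import math

# Parameter values from joint1_project.txt
ka, V, Cl, h0, gamma = 0.5, 10.0, 0.5, 0.01, 0.5
k = Cl / V

def rhs(Ad, Ac):
    """Right-hand side of the depot/central ODE system."""
    return -ka * Ad, ka * Ad - k * Ac

def simulate(dose=50.0, t_end=100.0, dt=0.01):
    """Integrate with a classical 4th-order Runge-Kutta scheme and
    return a list of (t, Cc, h) samples."""
    Ad, Ac = dose, 0.0   # the oral dose goes into the depot at t=0
    out = [(0.0, Ac / V, h0 * math.exp(gamma * Ac / V))]
    n_steps = round(t_end / dt)
    for i in range(1, n_steps + 1):
        a1 = rhs(Ad, Ac)
        a2 = rhs(Ad + dt / 2 * a1[0], Ac + dt / 2 * a1[1])
        a3 = rhs(Ad + dt / 2 * a2[0], Ac + dt / 2 * a2[1])
        a4 = rhs(Ad + dt * a3[0], Ac + dt * a3[1])
        Ad += dt / 6 * (a1[0] + 2 * a2[0] + 2 * a3[0] + a4[0])
        Ac += dt / 6 * (a1[1] + 2 * a2[1] + 2 * a3[1] + a4[1])
        Cc = Ac / V
        out.append((i * dt, Cc, h0 * math.exp(gamma * Cc)))
    return out

curve = simulate()
t_max, cc_max, h_max = max(curve, key=lambda s: s[1])
print(f"peak concentration {cc_max:.2f} mg/l at t = {t_max:.2f} h")
```

For these parameter values, the concentration peaks at about 3.9 mg/l around $t \approx 5.1$ h, which is what the $Cc$ curve displayed by $\mlxplore$ shows.<br /> <br />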
We can then change the values of these parameters with the sliders to see their impact on the two functions.<br /> <br /> <br /> {{ImageWithCaption|image=exploremodel1.png|caption=Exploring the model using $\mlxplore$ }}<br /> <br /> <br /> We can easily modify the dose regimen without changing anything in the model itself. Suppose for instance that we now want to compare a treatment with repeated doses of 50 mg every 24 hours to one with repeated doses of 25 mg every 12 hours. Only the section {{Verbatim|&lt;DESIGN&gt;}} needs to be modified:<br /> <br /> <br /> {{ExampleWithCode&amp;Image<br /> |title=<br /> |text=<br /> |code={{MLXPloreForTable<br /> |name=joint2_project.txt<br /> |text=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> <br /> &lt;DESIGN&gt;<br /> [ADMINISTRATION]<br /> adm1={time=0:24:144, amount=50,type=1}<br /> adm2={time=0:12:144, amount=25,type=1}<br /> &lt;/pre&gt; }}<br /> |image=[[File:exploremodel2.png]] }}<br /> <br /> <br /> We can combine different administrations (oral and intravenous, for instance) into one global treatment:<br /> <br /> <br /> {{ExampleWithCode&amp;Image<br /> |title=<br /> |text=<br /> |code={{MLXPloreForTable<br /> |name=joint3_project.txt<br /> |text=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> &lt;DESIGN&gt;<br /> [ADMINISTRATION]<br /> adm1={time=0:24:144, amount=50,type=1}<br /> adm2={time=6:48:150, amount=25,type=2}<br /> <br /> [TREATMENT]<br /> trt1={adm1, adm2}<br /> &lt;/pre&gt; }}<br /> |image= [[File:exploremodel3.png]]<br /> }}<br /> <br /> ===Exploring the statistical model===<br /> <br /> One of the main advantages of $\mlxplore$ is its ability to graphically display the predicted distribution of the functions of interest $Cc$ and $h$ when certain parameters of the model are assumed to be random variables. Assume for instance that $V$, $Cl$ and $h_0$ are log-normally distributed.
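<br /> <br /> Behind the shaded quantile bands, the computation is conceptually simple. The following illustrative Python sketch (again, not $\mlxplore$ code) samples individual parameters from such lognormal distributions, using the population values given earlier, and computes empirical quantiles of the predicted concentration; it assumes the standard closed-form solution of the one-compartment model with first-order absorption:

```python
import math
import random

random.seed(0)

# Population values and standard deviations as used earlier
ka, gamma = 0.5, 0.5
V_pop, Cl_pop = 10.0, 0.5
omega_V, omega_Cl = 0.2, 0.3
dose = 50.0

def cc(t, V, Cl):
    """Closed-form concentration after a single oral dose at t=0."""
    k = Cl / V
    return dose * ka / (V * (ka - k)) * (math.exp(-k * t) - math.exp(-ka * t))

def sample_individual():
    """log V ~ N(log V_pop, omega_V^2), and similarly for Cl
    (h0 would be sampled the same way)."""
    V = V_pop * math.exp(random.gauss(0.0, omega_V))
    Cl = Cl_pop * math.exp(random.gauss(0.0, omega_Cl))
    return V, Cl

t = 24.0
concs = sorted(cc(t, *sample_individual()) for _ in range(2000))
median = concs[len(concs) // 2]
q10 = concs[len(concs) // 10]
q90 = concs[9 * len(concs) // 10]
print(f"Cc({t:g} h): median {median:.2f} mg/l, "
      f"80% prediction interval [{q10:.2f}, {q90:.2f}] mg/l")
```

Sweeping $t$ over the output grid and shading between the quantile curves reproduces the kind of fan chart that $\mlxplore$ displays.<br /> <br />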
To take this into account, we simply need to insert a section {{Verbatim|[INDIVIDUAL]}} into the project file:<br /> <br /> <br /> {{MLXTran<br /> |name=joint2_model.txt<br /> |text=&lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> [INDIVIDUAL]<br /> input={V_pop,Cl_pop,h0_pop,omega_V,omega_Cl,omega_h0}<br /> <br /> DEFINITION:<br /> V = {distribution=lognormal, reference=V_pop, sd=omega_V}<br /> Cl = {distribution=lognormal, reference=Cl_pop, sd=omega_Cl}<br /> h0 = {distribution=lognormal, reference=h0_pop, sd=omega_h0}<br /> <br /> [PREDICTION]<br /> input={ka, V, Cl, h0, gamma}<br /> .<br /> .<br /> .<br /> &lt;/pre&gt; }}<br /> <br /> <br /> The parameters of the model are now the population parameters $V_{\rm pop}$, $Cl_{\rm pop}$, $h0_{\rm pop}$, $\omega_V$, $\omega_{Cl}$ and $\omega_{h_0}$ and the parameters $k_a$ and $\gamma$ which have no inter-individual variability.<br /> <br /> <br /> {{MLXTran<br /> |name=joint4_project.txt<br /> |text=&lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> &lt;MODEL&gt;<br /> file='joint2_model.txt'<br /> <br /> &lt;DESIGN&gt;<br /> [ADMINISTRATION]<br /> adm1={time=0, amount=50,type=1}<br /> <br /> &lt;PARAMETER&gt;<br /> V_pop = 10<br /> Cl_pop = 0.5<br /> h0_pop=0.01<br /> omega_V = 0.2<br /> omega_Cl = 0.3<br /> omega_h0 = 0.2<br /> ka = 0.5<br /> gamma = 0.5<br /> <br /> &lt;OUTPUT&gt;<br /> list={Cc, h}<br /> grid=0:0.1:100<br /> &lt;/pre&gt; }}<br /> <br /> <br /> When some parameters of the model are random variables, $\mlxplore$ displays the median of the predicted distribution and several prediction intervals (the default is to use different shaded areas for the 10%, 20%, ..., 90% quantiles).<br /> <br /> <br /> {{ImageWithCaption|image=exploremodel4b.png|caption=Exploring the statistical model using $\mlxplore$}}<br /> <br /> <br /> It is possible to introduce covariates into the statistical model by considering for example that the volume depends on the weight, 
and considering that these covariates are themselves random variables. This may be important if we are for example looking to visualize the amount of variation in concentration due to variation in weight, and the variation in concentration which remains unaccounted for, caused by random effects.<br /> <br /> <br /> {{ImageWithCaption|image=exploremodel5.png|caption=Exploring the statistical model using $\mlxplore$ }}<br /> <br /> <br /> The $\mlxtran$ model files and the $\mlxplore$ scripts can be downloaded here: {{filepath:pk mlxplore.zip}}.<br /> <br /> <br /> &lt;br&gt;<br /> == Bibliography ==<br /> <br /> <br /> &lt;bibtex&gt;<br /> @book{chatterjee2009sensitivity,<br /> title={Sensitivity analysis in linear regression},<br /> author={Chatterjee, S. and Hadi, A. S.},<br /> volume={327},<br /> year={2009},<br /> publisher={Wiley}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{sensibilité2013,<br /> title={Analyse de sensibilité et exploration de modèles},<br /> author={Faivre R. and Looss B. and Mah&amp;eacute;vas, S. and Makowski, D. and Monod, H.},<br /> year={2013},<br /> publisher={Editions Quae}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @ARTICLE{MLXplore,<br /> author = {Lixoft},<br /> title = {MLXPlore 1.0},<br /> url = {http://www.lixoft.eu/products/mlxplore/mlxplore-overview},<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{macey2000berkeley,<br /> title={Berkeley Madonna user’s guide},<br /> author={Macey, R. and Oster, G. and Zahnley, T.},<br /> journal={Berkeley (CA): University of California},<br /> year={2000}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @ARTICLE{popixplore,<br /> author = {POPIX Inria team},<br /> title = {Popixplore 1.0},<br /> url = {https://wiki.inria.fr/wikis/popix/images/7/71/Popixplore_1.1.zip},<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{saltelli2000sensitivity,<br /> title={Sensitivity analysis},<br /> author={Saltelli, A. and Chan, K. 
and Scott, E. M. and others},<br /> volume={134},<br /> year={2000},<br /> publisher={Wiley New York}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{saltelli2008global,<br /> title={Global sensitivity analysis: the primer},<br /> author={Saltelli, A. and Ratto, M. and Andres, T. and Campolongo, F. and Cariboni, J. and Gatelli, D. and Saisana, M. and Tarantola, S.},<br /> year={2008},<br /> publisher={Wiley-Interscience}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{saltelli2004sensitivity,<br /> title={Sensitivity analysis in practice: a guide to assessing scientific models},<br /> author={Saltelli, A. and Tarantola, S. and Campolongo, F. and Ratto, M.},<br /> year={2004},<br /> publisher={Wiley}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> <br /> <br /> {{Next<br /> |link=Modeling}}</div> Admin http://wiki.webpopix.org/index.php/Visualization Visualization 2013-06-21T09:19:38Z <p>Admin : /* Introduction */</p> <hr /> <div>== Introduction ==<br /> <br /> Before deciding to model data, it is very important to be able to visualize it. This is especially the case for longitudinal data when we want to see how an outcome varies with time or as a function of another outcome. We may also want to visualize how the individual covariates are distributed, visually detect if there are relationships between variables, visually compare data from different groups, etc. Development of such visual exploration tools poses no methodological problems. It is simple to write a Matlab or R code for one's own needs. To<br /> illustrate the data visualization part of this chapter, we have created a little Matlab toolbox called $\popixplore$ ({{filepath:popixplore 1.1.zip}}) which can be freely downloaded and used.<br /> <br /> It may also be useful to be able to visualize the model itself by undertaking a sensitivity analysis to look at how the structural model changes when we vary one or several parameters. 
This is important for truly understanding the structural model, i.e., what is behind the given mathematical equations. In the modeling context, we may also want to visually calibrate parameters in order to obtain predictions as close as possible to the observations. Developing such a tool is a difficult task because the tool needs to be able to easily input a model using some coding language, perform complex calculations, and provide a decent graphical interface (e.g., one that lets you easily modify the model parameters).<br /> <br /> Various model visualization tools exist, such as [http://www.berkeleymadonna.com/index.html Berkeley Madonna], specialized in the analysis of dynamical systems and the resolution of ordinary differential equations. Here, we use [[http://www.lixoft.eu/products/mlxplore/mlxplore-overview/ $\mlxplore$]] for some different reasons:<br /> <br /> <br /> &lt;ul&gt;<br /> * $\mlxplore$ uses the $\mlxtran$ language which is extremely flexible and well-adapted to implementing complex mixed-effects models. Indeed, with $\mlxtran$ we can implement pharmacokinetic models with complex administration schedules, include inter-individual variability in parameters, define a statistical model for the covariates, etc. Another extremely important aspect of $\mlxtran$ is that it rigorously adopts the model representation formalisms proposed in $\wikipopix$. In other words, model implementation is completely in sync with its mathematical representation.<br /> &lt;br&gt;<br /> <br /> * $\mlxplore$ provides a clear graphical interface that of course allows us to visualize the structural model, but also the statistical model, which is of fundamental importance in the population approach. 
We can thus visualize the impact of covariates and inter-individual variability of model parameters on predictions.<br /> &lt;/ul&gt;<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Data exploration ==<br /> <br /> <br /> The following example involves 80 individuals that receive a unique dose of an anticoagulant at time $t=0$. For each patient we then measure the plasmatic concentration of the drug at various times. This drug can cause undesirable side effects such as nose bleeds. If this happens, we also record the times at which this happens. The data is recorded in columns of a single text file {{Verbatim|pkrtte_data.csv}}. In this example, the columns are:<br /> <br /> <br /> &lt;ul&gt;<br /> '''id''' the ID number of the patient<br /> &lt;br&gt;&lt;br&gt;<br /> '''time''' dose administration and observation times<br /> &lt;br&gt;&lt;br&gt;<br /> '''amt''' the amount of drug administered<br /> &lt;br&gt;&lt;br&gt;<br /> '''y''' the observations (concentrations and events)<br /> &lt;br&gt;&lt;br&gt;<br /> '''ytype''' the type of observation: 1=concentration, 2=event<br /> &lt;br&gt;&lt;br&gt;<br /> '''weight''' a continuous individual covariate<br /> &lt;br&gt;&lt;br&gt;<br /> '''gender''' a categorical individual covariate (F or M)<br /> &lt;br&gt;&lt;br&gt;<br /> '''group''' four different groups receive different doses: A=40mg, B=60mg, C=80mg, D=100mg.<br /> &lt;/ul&gt;<br /> <br /> <br /> {{ImageWithCaption|image=exploredata0.png|caption=The datafile {{Verbatim|pkrtte_data.csv}} }} <br /> <br /> <br /> We can read this datafile with the function {{Verbatim|readdatapx}} and add additional information about the data:<br /> <br /> <br /> {{MATLABcode<br /> |name=<br /> |code=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> datafile.name='pkrtte_data.csv';<br /> datafile.format='csv'; % can be &quot;csv&quot;, &quot;space&quot;, &quot;tab&quot; or &quot;;&quot;<br /> <br /> info.header = 
{'ID','TIME','AMT','Y','YTYPE','COV','CAT','CAT'};<br /> info.observation.name={'concentration','hemorrhaging'};<br /> info.observation.type={'continuous','event'};<br /> info.observation.unit={'mg/l',''};<br /> info.covariate.unit={'kg',''};<br /> info.time.unit='h';<br /> <br /> data=readdatapx(datafile,info);<br /> &lt;/pre&gt; }}<br /> <br /> <br /> How we graphically represent data depends on the type of data. Often for continuous data we use &quot;spaghetti plots&quot;, where all of the observations are given on the same plot, and those for each individual are joined up using line segments. Time-to-event data are usually represented using Kaplan-Meyer plots, i.e., an estimate of the survival function for the first event. In the case of repeated events, we can instead represent the average cumulative number of events per individual.<br /> <br /> <br /> {{MATLABcode<br /> |name=<br /> |code=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> &gt;&gt;exploredatapx(data)<br /> &lt;/pre&gt; }}<br /> <br /> <br /> {{ImageWithCaption|image=exploredata1.png|caption=Graphical representation of the data. Left: concentrations, right: average cumulative number of events per individual}}<br /> <br /> <br /> When different groups receive different treatments, it can be useful to separately visualize the data from each group. 
Here for instance we can separate the patients into groups depending on the initial dose given.<br /> <br /> <br /> {{ImageWithCaption|image=exploredata2.png|caption=Concentration profiles per dose group}}<br /> <br /> <br /> {| cellpadding=&quot;10&quot; cellspacing=&quot;0&quot;<br /> |style = &quot;width:50%&quot;| [[File:exploredata3a.png]] <br /> |style = &quot;width:50%&quot;| [[File:exploredata3b.png]]<br /> |-<br /> |cellspan=&quot;2&quot; align=&quot;center&quot; style=&quot;text-align:center&quot;| ''Distribution of weight and gender per dose group'' <br /> |}<br /> <br /> <br /> <br /> {{Remarks<br /> |title=Remark<br /> |text=The data file {{Verbatim|pkrtte_data.csv}} and the matlab script {{Verbatim|pkrtte_demo.m}} are available in the folder {{Verbatim|demos}} of $\popixplore$: {{filepath:popixplore 1.1.zip}}.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Model exploration==<br /> <br /> ===Exploring the structural model===<br /> <br /> Suppose that we now want to visualize the following joint model which is one that can be used for simultaneously modeling PK and time-to-event data:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> k&amp;=&amp;Cl/V \\<br /> \deriv{A_d} &amp;=&amp; - k_a \, A_d(t) \\<br /> \deriv{A_c} &amp;=&amp; k_a \, A_d(t) - k \, A_c(t) \\<br /> Cc(t) &amp;=&amp; {Ac(t)}/{V} \\<br /> h(t) &amp;=&amp; h_0 \, \exp(\gamma\, Cc(t)) .<br /> \end{eqnarray} &lt;/math&gt; }}<br /> <br /> Here, $A_d$ and $A_c$ are the amounts of drug in the depot and central compartments, $Cc$ the concentration in the central compartment and $h$ the hazard function for the event of interest (hemorrhaging for instance). 
The parameters of the model are the absorption rate constant $ka$, the volume of distribution $V$, the clearance $Cl$, the baseline hazard $h_0$ and the coefficient $\gamma$.<br /> We assume that the drug can be administered both intravenously and orally, meaning that the drug can be administered to both the depot and the central compartment.<br /> <br /> We first need to implement this model using $\mlxtran$:<br /> <br /> <br /> {{MLXTran<br /> |name=joint1_model.txt<br /> |text=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> [PREDICTION]<br /> input={ka, V, Cl, h0, gamma}<br /> <br /> PK:<br /> depot(type=1,target=Ad)<br /> depot(type=2,target=Ac)<br /> <br /> EQUATION:<br /> k = Cl/V<br /> ddt_Ad = -ka*Ad<br /> ddt_Ac = ka*Ad - k*Ac<br /> Cc = Ac/V<br /> h = h0*exp(gamma*Cc)<br /> &lt;/pre&gt;}}<br /> <br /> <br /> Here, an administration of type 1 (resp. 2) is an oral (resp. iv) administration.<br /> <br /> The tasks, i.e., how the model is to be used, are then coded as an $\mlxplore$ project:<br /> <br /> <br /> {{MLXPlore<br /> |name=joint1_project.txt<br /> |text=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> &lt;MODEL&gt;<br /> file='joint1_model.txt'<br /> <br /> &lt;DESIGN&gt;<br /> [ADMINISTRATION]<br /> adm1={time=0, amount=50,type=1}<br /> <br /> &lt;PARAMETER&gt;<br /> ka = 0.5<br /> V = 10<br /> Cl = 0.5<br /> h0 = 0.01<br /> gamma = 0.5<br /> <br /> &lt;OUTPUT&gt;<br /> list={Cc, h}<br /> grid=0:0.1:100<br /> &lt;/pre&gt; }}<br /> <br /> <br /> In this example, a single dose of 50 mg is administered orally ({{Verbatim|target{{-}}Ad}} when {{Verbatim|type{{-}}1}}) at time 0. We have asked $\mlxplore$ to display the predicted concentration $Cc$ and the hazard function $h$ between $t=0$ and $t=100$ every $0.1\,h$ for a given set of parameters. 
We can then change the values of these parameters with the sliders to see what the impact on the two functions is.<br /> <br /> <br /> {{ImageWithCaption|image=exploremodel1.png|caption=Exploring the model using $\mlxplore$ }}<br /> <br /> <br /> We can easily modify the dose regimen without changing anything in the model itself. Suppose for instance that we want now to compare a treatment with repeated doses of 50mg every 24 hours and a treatment with repeated doses of 25mg every 12 hours. Only the section {{Verbatim|&lt;DESIGN&gt;}} needs to be modified:<br /> <br /> <br /> {{ExampleWithCode&amp;Image<br /> |title=<br /> |text=<br /> |code={{MLXPloreForTable<br /> |name=joint2_project.txt<br /> |text=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> <br /> &lt;DESIGN&gt;<br /> [ADMINISTRATION]<br /> adm1={time=0:24:144, amount=50,type=1}<br /> adm2={time=0:12:144, amount=25,type=1}<br /> &lt;/pre&gt; }}<br /> |image=[[File:exploremodel2.png]] }}<br /> <br /> <br /> We can combine different administrations (oral and intravenous for instance) into one global treatment:<br /> <br /> <br /> {{ExampleWithCode&amp;Image<br /> |title=<br /> |text=<br /> |code={{MLXPloreForTable<br /> |name=joint3_project.txt<br /> |text=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> &lt;DESIGN&gt;<br /> [ADMINISTRATION]<br /> adm1={time=0:24:144, amount=50,type=1}<br /> adm2={time=6:48:150, amount=25,type=2}<br /> <br /> [TREATMENT]<br /> trt1={adm1, adm2}<br /> &lt;/pre&gt; }}<br /> |image= [[File:exploremodel3.png]]<br /> }}<br /> <br /> ===Exploring the statistical model===<br /> <br /> One of the main advantages of $\mlxplore$ is its ability to graphically display the predicted distribution of the functions of interest $Cc$ and $h$ when certain parameters of the model are assumed to be random variables. Assume for instance that $V$, $Cl$ and $h_0$ are log-normally distributed. 
To take this into account, we simply need to insert a section {{Verbatim|[INDIVIDUAL]}} into the project file:<br /> <br /> <br /> {{MLXTran<br /> |name=joint2_model.txt<br /> |text=&lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> [INDIVIDUAL]<br /> input={V_pop,Cl_pop,h0_pop,omega_V,omega_Cl,omega_h0}<br /> <br /> DEFINITION:<br /> V = {distribution=lognormal, reference=V_pop, sd=omega_V}<br /> Cl = {distribution=lognormal, reference=Cl_pop, sd=omega_Cl}<br /> h0 = {distribution=lognormal, reference=h0_pop, sd=omega_h0}<br /> <br /> [PREDICTION]<br /> input={ka, V, Cl, h0, gamma}<br /> .<br /> .<br /> .<br /> &lt;/pre&gt; }}<br /> <br /> <br /> The parameters of the model are now the population parameters $V_{\rm pop}$, $Cl_{\rm pop}$, $h0_{\rm pop}$, $\omega_V$, $\omega_{Cl}$ and $\omega_{h_0}$ and the parameters $k_a$ and $\gamma$ which have no inter-individual variability.<br /> <br /> <br /> {{MLXTran<br /> |name=joint4_project.txt<br /> |text=&lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> &lt;MODEL&gt;<br /> file='joint2_model.txt'<br /> <br /> &lt;DESIGN&gt;<br /> [ADMINISTRATION]<br /> adm1={time=0, amount=50,type=1}<br /> <br /> &lt;PARAMETER&gt;<br /> V_pop = 10<br /> Cl_pop = 0.5<br /> h0_pop=0.01<br /> omega_V = 0.2<br /> omega_Cl = 0.3<br /> omega_h0 = 0.2<br /> ka = 0.5<br /> gamma = 0.5<br /> <br /> &lt;OUTPUT&gt;<br /> list={Cc, h}<br /> grid=0:0.1:100<br /> &lt;/pre&gt; }}<br /> <br /> <br /> When some parameters of the model are random variables, $\mlxplore$ displays the median of the predicted distribution and several prediction intervals (the default is to use different shaded areas for the 10%, 20%, ..., 90% quantiles).<br /> <br /> <br /> {{ImageWithCaption|image=exploremodel4b.png|caption=Exploring the statistical model using $\mlxplore$}}<br /> <br /> <br /> It is possible to introduce covariates into the statistical model by considering for example that the volume depends on the weight, 
and considering that these covariates are themselves random variables. This may be important if we are for example looking to visualize the amount of variation in concentration due to variation in weight, and the variation in concentration which remains unaccounted for, caused by random effects.<br /> <br /> <br /> {{ImageWithCaption|image=exploremodel5.png|caption=Exploring the statistical model using $\mlxplore$ }}<br /> <br /> <br /> The $\mlxtran$ model files and the $\mlxplore$ scripts can be downloaded here: {{filepath:pk mlxplore.zip}}.<br /> <br /> <br /> <br /> {{Next<br /> |link=Modeling}}</div> Admin http://wiki.webpopix.org/index.php/The_SAEM_algorithm_for_estimating_population_parameters The SAEM algorithm for estimating population parameters 2013-06-21T09:17:20Z <p>Admin : </p> <hr /> <div>==The EM algorithm==<br /> <br /> <br /> We first remark that if the individual parameters $\bpsi=(\psi_i)$ are observed, estimation is not thwarted by any particular problem because an estimator could be found by directly maximizing the joint distribution $\pypsi(\by,\bpsi ; \theta)$.<br /> <br /> However, since the $\psi_i$ are not observed, the EM algorithm replaces $\bpsi$ by its conditional expectation. 
Then, given some initial value $\theta_0$, iteration $k$ updates ${\theta}_{k-1}$ to ${\theta}_{k}$ with the two following steps:<br /> <br /> <br /> * $\textbf{E-step:}$ evaluate the quantity<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; Q_k(\theta)=\esp{\log \pmacro(\by,\bpsi;\theta){{!}} \by;\theta_{k-1} } .&lt;/math&gt; }}<br /> <br /> <br /> * $\textbf{M-step:}$ update the estimation of $\theta$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \theta_{k} = \argmax{\theta} \, Q_k(\theta) .<br /> &lt;/math&gt; }}<br /> <br /> <br /> It can be proved that each EM iteration increases the likelihood of the observations and that the EM sequence $(\theta_k)$ converges to a<br /> stationary point of the observed likelihood under mild regularity conditions.<br /> <br /> Unfortunately, in the framework of nonlinear mixed-effects models, there is no explicit expression for the E-step since the relationship between observations $\by$ and individual parameters $\bpsi$ is nonlinear. However, even though this expectation cannot be computed in a closed form, it can be approximated by simulation. 
For instance,<br /> <br /> <br /> * The Monte Carlo EM (MCEM) algorithm replaces the E-step by a Monte Carlo approximation based on a large number of independent simulations of the non-observed individual parameters $\bpsi$.<br /> <br /> * The SAEM algorithm replaces the E-step by a stochastic approximation based on a single simulation of $\bpsi$.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==The SAEM algorithm==<br /> <br /> At iteration $k$ of SAEM:<br /> <br /> <br /> * $\textbf{Simulation step}$: for $i=1,2,\ldots, N$, draw $\psi_i^{(k)}$ from the conditional distribution $\pmacro(\psi_i |y_i ;\theta_{k-1})$.<br /> <br /> <br /> * $\textbf{Stochastic approximation}$: update $Q_k(\theta)$ according to<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; Q_k(\theta) = Q_{k-1}(\theta) + \gamma_k ( \log \pmacro(\by,\bpsi^{(k)};\theta) - Q_{k-1}(\theta) ),<br /> &lt;/math&gt; }}<br /> <br /> where $(\gamma_k)$ is a decreasing sequence of positive numbers such that $\gamma_1=1$, $\sum_{k=1}^{\infty} \gamma_k = \infty$ and $\sum_{k=1}^{\infty} \gamma_k^2 &lt; \infty$.<br /> <br /> <br /> * $\textbf{Maximization step}$: update $\theta_{k-1}$ according to<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \theta_{k} = \argmax{\theta} \, Q_k(\theta) .&lt;/math&gt; }}<br /> <br /> <br /> {{Remarks <br /> |title=Remarks<br /> |text= &amp;#32;<br /> * Setting $\gamma_k=1$ for all $k$ means that there is no memory in the stochastic approximation:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; Q_k(\theta) = \log \pmacro(\by,\bpsi^{(k)};\theta) . 
&lt;/math&gt; }}<br /> <br /> : This algorithm, known as Stochastic EM (SEM), thus consists of successively simulating $\bpsi^{(k)}$ with the conditional distribution $\pmacro(\bpsi^{(k)} {{!}} \by;\theta_{k-1})$, then computing $\theta_k$ by maximizing the joint distribution $\pmacro(\by,\bpsi^{(k)};\theta)$.<br /> <br /> <br /> * When the number $N$ of subjects is small, convergence of SAEM can be improved by running $L$ [http://en.wikipedia.org/wiki/Markov_chain Markov chains] for each individual instead of one. The simulation step at iteration $k$ then requires us to draw $L$ sequences $\psi_i^{(k,1)} ,\ldots , \psi_i^{(k,L)}$ for each individual $i$ and to combine stochastic approximation and Monte Carlo in the approximation step:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; Q_k(\theta) = Q_{k-1}(\theta) + \gamma_k \left( \frac{1}{L}\sum_{\ell=1}^{L} \log \pmacro(\by,\bpsi^{(k,\ell)};\theta) - Q_{k-1}(\theta) \right) .<br /> &lt;/math&gt; }}<br /> <br /> : By default, $\monolix$ selects $L$ so that $N\times L \geq 50$.<br /> }}<br /> <br /> <br /> Implementation of SAEM is simplified when the complete model $\pmacro(\by,\bpsi;\theta)$ belongs to a regular (curved) exponential family:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \pmacro(\by,\bpsi ;\theta) = \exp\left\{ - \zeta(\theta) + \langle \tilde{S}(\by,\bpsi) , \varphi(\theta) \rangle \right\} , &lt;/math&gt; }}<br /> <br /> where $\tilde{S}(\by,\bpsi)$ is a sufficient statistic of the complete model (i.e., whose value contains all the information needed to compute any estimate of $\theta$) which takes its values in an open subset ${\cal S}$ of $\Rset^m$. 
Then, there exists a function $\tilde{\theta}$ such that for any $s\in {\cal S}$,<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:saem_stat&quot;&gt;&lt;math&gt;<br /> \tilde{\theta}(s) = \argmax{\theta} \left\{ - \zeta(\theta) + \langle s , \varphi(\theta) \rangle \right\} .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> The approximation step of SAEM then simplifies to a general Robbins-Monro-type scheme for approximating the conditional expectation $\esp{\tilde{S}(\by,\bpsi) | \by ; \theta_{k-1}}$:<br /> <br /> <br /> * $\textbf{Stochastic approximation}$: update $s_k$ according to<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> s_k = s_{k-1} + \gamma_k ( \tilde{S}(\by,\bpsi^{(k)}) - s_{k-1} ) . &lt;/math&gt; }}<br /> <br /> <br /> Note that the E-step of EM simplifies to computing $s_k=\esp{\tilde{S}(\by,\bpsi) | \by ; \theta_{k-1}}$.<br /> <br /> Then, both EM and SAEM algorithms use [[#eq:saem_stat|(1)]] for the M-step: $\theta_k = \tilde{\theta}(s_k)$.<br /> <br /> Precise results for convergence of SAEM were obtained in the [[Estimation of the observed Fisher information matrix#Estimation using linearization of the model|Estimation of the F.I.M. using a linearization of the model]] chapter in the case where $\pmacro(\by,\bpsi;\theta)$ belongs to a regular curved exponential family. This first version of [[The SAEM algorithm for estimating population parameters|SAEM]] and these first results assume that the individual parameters are simulated exactly under the conditional distribution at each iteration. Unfortunately, for most nonlinear models or non-Gaussian models, the unobserved data cannot be simulated exactly under this conditional distribution. 
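The sufficient-statistic recursion above is easy to experiment with numerically. Here is a minimal Python sketch (illustrative only, not $\monolix$ code) of the Robbins-Monro update $s_k = s_{k-1} + \gamma_k ( \tilde{S}(\by,\bpsi^{(k)}) - s_{k-1} )$ in a toy setting where each $\psi_i$ can be drawn exactly from its conditional distribution; with $\gamma_k=1/k$, the sequence $(s_k)$ settles near the conditional expectation that the E-step of EM would compute:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting (illustrative values): the conditional distribution of each
# psi_i given y_i is N(m_i, v), with known means m_i and variance v.
N = 100
m = rng.normal(0.0, 1.0, size=N)   # conditional means, one per individual
v = 0.5                            # conditional variance

# Complete-data sufficient statistic: S(psi) = sum_i psi_i.
# Its conditional expectation (what the E-step computes) is sum_i m_i.
target = m.sum()

# Robbins-Monro recursion with gamma_k = 1/k
# (so gamma_1 = 1, sum gamma_k = infinity, sum gamma_k^2 < infinity).
s = 0.0
for k in range(1, 2001):
    psi = rng.normal(m, np.sqrt(v))   # simulation step (exact draw here)
    s += (1.0 / k) * (psi.sum() - s)  # stochastic approximation step

print(abs(s - target))  # small compared to the spread of S(psi)
```

With $\gamma_k=1/k$, $s_k$ is simply the running average of the simulated statistics, which is why it converges to the conditional expectation.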
A well-known alternative consists in using the Metropolis-Hastings algorithm: introduce a transition probability whose unique invariant distribution is the conditional distribution we want to simulate.<br /> <br /> In other words, the procedure consists of replacing the Simulation step of SAEM at iteration $k$ by $m$ iterations of the<br /> Metropolis-Hastings (MH) algorithm described in the [[The Metropolis-Hastings algorithm for simulating the individual parameters|The Metropolis-Hastings algorithm]] section. It was shown in the [[Estimation of the observed Fisher information matrix#Estimation using linearization of the model|Estimation of the F.I.M. using a linearization of the model]] section that [[The SAEM algorithm for estimating population parameters|SAEM]] still converges under general conditions when coupled with a [http://en.wikipedia.org/wiki/Markov_chain Markov chain] Monte Carlo procedure.<br /> <br /> <br /> {{Remarks<br /> |title= Remark<br /> |text= Convergence of the [http://en.wikipedia.org/wiki/Markov_chain Markov chains] $(\psi_i^{(k)})$ is not necessary at each SAEM iteration. It suffices to run a few MH iterations with various transition kernels before updating $\theta_{k-1}$ to $\theta_k$. In $\monolix$ by default, three transition kernels are used twice each, successively, in each SAEM iteration.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Implementing SAEM ==<br /> <br /> Implementation of SAEM can be difficult to describe when looking at complex statistical models such as mixture models, models with inter-occasion variability, etc. We are therefore going to limit ourselves to looking at some basic models in order to illustrate how SAEM can be implemented.<br /> <br /> &lt;br&gt;<br /> ===SAEM for general hierarchical models===<br /> <br /> Consider first a very general model for any type (continuous, categorical, survival, etc.) 
of data $(y_i)$:<br /> <br /> {{Equation1<br /> |equation= &lt;math&gt;\begin{eqnarray} y_i {{!}} \psi_i &amp;\sim&amp; \pcyipsii(y_i {{!}} \psi_i) \\<br /> h(\psi_i) &amp;\sim&amp; {\cal N}( \mu , \Omega),<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> where $h(\psi_i)=(h_1(\psi_{i,1}), h_2(\psi_{i,2}), \ldots , h_d(\psi_{i,d}) )^\transpose$ is a $d$-vector of (transformed) individual parameters, $\mu$ a $d$-vector of fixed effects and $\Omega$ a $d\times d$ variance-covariance matrix.<br /> <br /> We assume here that $\Omega$ is positive-definite. Then, a sufficient statistic for the complete model $\pmacro(\by,\bpsi;\theta)$ is<br /> $\tilde{S}(\bpsi) = (\tilde{S}_1(\bpsi),\tilde{S}_2(\bpsi))$, where<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \tilde{S}_1(\bpsi) &amp;= &amp; \sum_{i=1}^N h(\psi_i) \\<br /> \tilde{S}_2(\bpsi) &amp;= &amp; \sum_{i=1}^N h(\psi_i) h(\psi_i)^\transpose .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> At iteration $k$ of SAEM, we have:<br /> <br /> <br /> * $\textbf{Simulation step}$: for $i=1,2,\ldots, N$, draw $\psi_i^{(k)}$ from $m$ iterations of the MH algorithm described in [[The Metropolis-Hastings algorithm for simulating the individual parameters|The Metropolis-Hastings algorithm]] with $\pmacro(\psi_i |y_i ;\mu_{k-1},\Omega_{k-1})$ as limiting distribution.<br /> <br /> * $\textbf{Stochastic approximation}$: update $s_k=(s_{k,1},s_{k,2})$ according to<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> s_{k,1} &amp;=&amp; s_{k-1,1} + \gamma_k \left( \sum_{i=1}^N h(\psi_i^{(k)}) - s_{k-1,1} \right) \\<br /> s_{k,2} &amp;=&amp; s_{k-1,2} + \gamma_k \left( \sum_{i=1}^N h(\psi_i^{(k)})h(\psi_i^{(k)})^\transpose - s_{k-1,2} \right) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> * $\textbf{Maximization step}$: update $(\mu_{k-1},\Omega_{k-1})$ according to<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \mu_{k} &amp;=&amp; 
\frac{1}{N} s_{k,1} \\<br /> \Omega_k &amp;=&amp; \frac{1}{N}\left( s_{k,2} - \frac{1}{N}\, s_{k,1}s_{k,1}^\transpose \right) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> What is remarkable is that it suffices to be able to calculate $\pcyipsii(y_i | \psi_i)$ for all $\psi_i$ and $y_i$ in order to be able to run SAEM. In effect, this allows the simulation step to be run using MH since the acceptance probabilities can be calculated.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===SAEM for continuous data models===<br /> Consider now a continuous data model in which the residual error variance is constant:<br /> <br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> y_{ij} &amp;=&amp; f(t_{ij},\phi_i) + a \teps_{ij} \\<br /> h(\phi_i) &amp;\sim&amp; {\cal N}( \mu , \Omega) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> Here, the individual parameters are $\psi_i=(\phi_i,a)$. The variance-covariance matrix of $\psi_i$ is not positive-definite in this case because $a$ has no variability. If we suppose that the variance matrix $\Omega$ of the random effects is positive-definite, then noting $\theta=(\mu,\Omega,a)$, a natural decomposition of the model is:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pmacro(\by,\bpsi;\theta) = \pmacro(\by {{!}} \bpsi;a)\pmacro(\bpsi;\mu,\Omega) .<br /> &lt;/math&gt; }}<br /> <br /> The previous statistic $\tilde{S}(\bpsi) = (\tilde{S}_1(\bpsi),\tilde{S}_2(\bpsi))$ is not sufficient for estimating $a$. Indeed, we need an additional component which is a function of both $\by$ and $\bpsi$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \tilde{S}_3(\by, \bpsi) =\sum_{i=1}^N \sum_{j=1}^{n_i}(y_{ij} - f(t_{ij},\phi_i))^2. 
&lt;/math&gt; }}<br /> <br /> Then,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \begin{eqnarray}<br /> s_{k,3} &amp;=&amp; s_{k-1,3} + \gamma_k ( \tilde{S}_3(\by, \bpsi^{(k)}) - s_{k-1,3} ) \\<br /> a_k^2 &amp;=&amp; \displaystyle{ \frac{1}{\sum_{i=1}^N n_i} s_{k,3} }\ .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The choice of step-size $(\gamma_k)$ is extremely important for ensuring convergence of SAEM. The sequence $(\gamma_k)$ used in $\monolix$ decreases like $k^{-\alpha}$. We recommend using $\alpha=0$ (that is, $\gamma_k=1$) during the first $K_1$ iterations, in order to converge quickly to a neighborhood of a maximum of the likelihood, and $\alpha=1$ during the next $K_2$ iterations.<br /> Indeed, the initial guess $\theta_0$ may be far from the maximum likelihood value we are looking for, and the first iterations with $\gamma_k=1$ allow SAEM to converge quickly to a neighborhood of this value. Following this, smaller step-sizes ensure the<br /> almost sure convergence of the algorithm to the maximum likelihood estimator.<br /> <br /> <br /> <br /> {{Example<br /> |title=Example<br /> |text= Consider a simple model for continuous data:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> y_{ij} &amp;\sim&amp; {\cal N}(A_i\,e^{-k_i \, t_{ij} } , a^2) \\<br /> \log(A_i)&amp;\sim&amp;{\cal N}(\log(A_{\rm pop}) , \omega_A^2) \\<br /> \log(k_i)&amp;\sim&amp;{\cal N}(\log(k_{\rm pop}) , \omega_k^2) ,<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> where $A_{\rm pop}=6$, $k_{\rm pop}=0.25$, $\omega_A=0.3$, $\omega_k=0.3$ and $a=0.2$.<br /> Let us look at the effect of different settings for $(\gamma_k)$ (and $L$) for estimating the population parameters of the model with SAEM.<br /> <br /> <br /> 1. For all $k$, $\gamma_k = 1$: the sequence $(\theta_{k})$ converges very quickly to a neighborhood of the &quot;solution&quot;. 
The sequence $(\theta_{k})$ is a homogeneous Markov chain that converges in distribution but does not converge almost surely. <br /> <br /> [[File:saem1.png|link=]]<br /> <br /> <br /> 2. For all $k$, $\gamma_k = 1/k$: the sequence $(\theta_{k})$ converges almost surely to the maximum likelihood estimate of $\theta$, but very slowly. <br /> <br /> [[File:saem2.png|link=]]<br /> <br /> <br /> 3. $\gamma_k = 1$, $k=1$, ...,$40$, $\gamma_k = 1/(k-40)$, $k \geq 41$: the sequence $(\theta_{k})$ converges almost surely to the maximum likelihood estimate of $\theta$, and quickly.<br /> <br /> [[File:saem3.png|link=]]<br /> <br /> <br /> 4. $L=10$, $\gamma_k = 1$, $k \geq 1$: the sequence $(\theta_{k})$ is a homogeneous Markov chain that converges in distribution, as in case 1, but the variance is reduced by a factor of $10$; in this case, SAEM behaves like EM. <br /> <br /> [[File:saem4.png|link=]]<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==A simple example to understand why SAEM converges in practice==<br /> <br /> <br /> Let us look at a very simple Gaussian model, with only one observation per individual:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \psi_i &amp;\sim&amp; {\cal N}(\theta,\omega^2) , \ \ \ 1 \leq i \leq N \\<br /> y_i &amp;\sim&amp; {\cal N}(\psi_i,\sigma^2).<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> We will furthermore assume that both $\omega^2$ and $\sigma^2$ are known.<br /> <br /> Here, the maximum likelihood estimator $\hat{\theta}$ of $\theta$ is easy to compute since $y_i \sim_{i.i.d.} {\cal N}(\theta,\omega^2+\sigma^2)$. We find that<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \hat{\theta} = \displaystyle{\frac{1}{N} }\sum_{i=1}^{N} y_i .<br /> &lt;/math&gt;}}<br /> <br /> We now propose to try and compute $\hat{\theta}$ using SAEM instead. 
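Before working through the algebra, here is how this computation looks in code: a minimal Python sketch of SAEM for this toy model (illustrative, not the $\monolix$ implementation), using the exact Gaussian conditional distribution of $\psi_i$ given $y_i$ in the simulation step and the two-phase step-size discussed above:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy model (illustrative values): psi_i ~ N(theta, omega^2),
# y_i ~ N(psi_i, sigma^2), with omega and sigma known.
N, theta_true, omega, sigma = 200, 3.0, 1.0, 1.0
y = rng.normal(rng.normal(theta_true, omega, size=N), sigma)

theta_hat = y.mean()  # exact MLE, since y_i ~ N(theta, omega^2 + sigma^2)

# Exact conditional distribution: psi_i | y_i ~ N(a*theta + (1-a)*y_i, g2).
a = (1 / omega**2) / (1 / sigma**2 + 1 / omega**2)
g2 = 1 / (1 / sigma**2 + 1 / omega**2)

# SAEM with the two-phase step-size: gamma_k = 1 during the first K1
# iterations, then gamma_k = 1/(k - K1).
K1, K2 = 40, 200
theta = -5.0   # deliberately poor initial guess
s = 0.0        # running value of the sufficient statistic S(psi) = sum_i psi_i
for k in range(1, K1 + K2 + 1):
    gamma_k = 1.0 if k <= K1 else 1.0 / (k - K1)
    psi = rng.normal(a * theta + (1 - a) * y, np.sqrt(g2))  # simulation step
    s += gamma_k * (psi.sum() - s)   # stochastic approximation step
    theta = s / N                    # maximization step
print(abs(theta - theta_hat))  # close to 0
```

During the first $K_1$ iterations $\theta_k$ fluctuates around $\hat{\theta}$; the decreasing step-sizes of the second phase then freeze it onto $\hat{\theta}$, which is exactly the behavior analyzed in the rest of this section.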
The simulation step is straightforward since the conditional distribution of $\psi_i$ is a normal distribution:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \psi_i {{!}} y_i \sim {\cal N}(a \theta + (1-a)y_i , \gamma^2) ,<br /> &lt;/math&gt; }}<br /> <br /> where<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> a &amp;= &amp; \displaystyle{ \frac{1}{\omega^2} } \left(\displaystyle{ \frac{1}{\sigma^2} }+ \displaystyle{\frac{1}{\omega^2} }\right)^{-1} \\<br /> \gamma^2 &amp;= &amp;\left(\displaystyle{ \frac{1}{\sigma^2} }+ \displaystyle{\frac{1}{\omega^2} }\right)^{-1}.<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The maximization step is also straightforward. Indeed, a sufficient statistic for estimating $\theta$ is<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; {\cal S}(\bpsi) = \sum_{i=1}^{N} \psi_i. &lt;/math&gt; }}<br /> <br /> Then,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \tilde{\theta}({\cal S(\bpsi)} ) &amp;=&amp; \argmax{\theta} \pmacro(y_1,\ldots,y_N,\psi_1,\ldots,\psi_N;\theta) \\<br /> &amp;=&amp; \argmax{\theta} \pmacro(\psi_1,\ldots,\psi_N;\theta) \\<br /> &amp;=&amp; \frac{ {\cal S}(\bpsi)}{N}.<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> Let us first look at the behavior of SAEM when $\gamma_k=1$. At iteration $k$,<br /> <br /> <br /> * Simulation step: $\psi_i^{(k)} \sim {\cal N}( a \theta_{k-1} + (1-a)y_i , \gamma^2).$<br /> <br /> * Maximization step: $\theta_k = \displaystyle{ \frac{ {\cal S}(\bpsi^{(k)})}{N} } = \displaystyle{ \frac{1}{N} }\sum_{i=1}^{N} \psi_i^{(k)}$.<br /> <br /> <br /> It can be shown that:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \theta_k - \hat{\theta} = a(\theta_{k-1} - \hat{\theta}) + e_k ,<br /> &lt;/math&gt; }}<br /> <br /> where $e_k \sim {\cal N}(0, \gamma^2 /N)$. 
Then, the sequence $(\theta_k)$ is an autoregressive process of order 1 (AR(1)) which converges in distribution to a normal distribution when $k\to \infty$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\theta_k \limite{}{\cal D} {\cal N}\left(\hat{\theta} , \displaystyle{ \frac{\gamma^2}{N(1-a^2)} }\right) .<br /> &lt;/math&gt; }}<br /> <br /> <br /> {{ImageWithCaption|image=saemb1.png|caption=10 sequences $(\theta_k)$ obtained with different initial values and $\gamma_k=1$ for $1\leq k \leq 50$ }} <br /> <br /> <br /> Now, let us see what happens instead when $\gamma_k$ decreases like $1/k$. At iteration $k$,<br /> <br /> <br /> * Simulation step: $\psi_i^{(k)} \sim {\cal N}( a \theta_{k-1} + (1-a)y_i , \gamma^2)$<br /> <br /> * Maximization step:<br /> <br /> {{Equation1<br /> |equation= &lt;math&gt;\theta_k = \theta_{k-1} + \displaystyle{ \frac{1}{k} }\left( \displaystyle{ \frac{1}{N} }\sum_{i=1}^{N} \psi_i^{(k)} -\theta_{k-1} \right). <br /> &lt;/math&gt; }}<br /> <br /> <br /> : Here, we can show that:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \theta_k - \hat{\theta} = \displaystyle{ \frac{k-1+a}{k} }(\theta_{k-1} - \hat{\theta}) + \displaystyle{\frac{e_k}{k} }, <br /> &lt;/math&gt; }}<br /> <br /> : where $e_k \sim {\cal N}(0, \gamma^2 /N)$. 
Then, the sequence $(\theta_k)$ converges almost surely to $\hat{\theta}$.<br /> <br /> <br /> {{ImageWithCaption|image=saemb2.png|caption=10 sequences $(\theta_k)$ obtained with different initial values and $\gamma_k=1/k$ for $1\leq k \leq 50$ }}<br /> <br /> <br /> Thus, we see that by combining the two strategies, the sequence $(\theta_k)$ is a Markov chain that quickly reaches a neighborhood of $\hat{\theta}$ and fluctuates around it during the first $K_1$ iterations, then converges almost surely to $\hat{\theta}$ during the next $K_2$ iterations.<br /> <br /> <br /> {{ImageWithCaption|image=saemb3.png|caption=10 sequences $(\theta_k)$ obtained with different initial values, $\gamma_k=1$ for $1\leq k \leq 20$ and $\gamma_k=1/(k-20)$ for $21\leq k \leq 50$ }}<br /> <br /> <br /> {{ShowVideo|image=saem5b.png|video=http://popix.lixoft.net/images/2/20/saem.mp4|caption=The SAEM algorithm in practice. }}<br /> <br /> &lt;!-- {{ImageWithCaptionL|image=saem5.png|size=750px|caption= The SAEM algorithm in practice. (a) the observations and the initialization $p_0(\psi_i)$, (b) the initialization $p_0(\psi_i)$ and the conditional distributions of the observations $p(y_i{{!}}\psi_i)$, (c) the conditional distributions $p_0(\psi_i{{!}}y_i)$ and the simulated individual parameters $(\psi_i^{(1)})$, (d) the updated distribution $p_1(\psi_i)$. }} --&gt;<br /> <br /> ==A simulated annealing version of SAEM==<br /> <br /> <br /> Convergence of SAEM can strongly depend on the initial guess when the likelihood ${\like}$ has several local maxima. 
A simulated annealing version of SAEM can improve convergence of the algorithm toward the global maximum of ${\like}$.<br /> <br /> To detail this, we can first rewrite the joint pdf of $(\by,\bpsi)$ as follows:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pmacro(\by,\bpsi;\theta) = C(\theta)\, \exp \left\{-U(\by,\bpsi;\theta)\right\} ,<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> where $C(\theta)$ is a normalizing constant that only depends on $\theta$. Then, for any &quot;temperature&quot; $T&gt;0$, we consider the complete model<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pmacro_T(\by,\bpsi;\theta) = C_T(\theta)\, \exp \left\{-\displaystyle{\frac{1}{T} }U(\by,\bpsi;\theta) \right\} ,<br /> &lt;/math&gt; }}<br /> <br /> where $C_T(\theta)$ is still a normalizing constant.<br /> <br /> We then introduce a decreasing temperature sequence $(T_k, 1\leq k \leq K)$ and use the SAEM algorithm on the complete model $\pmacro_{T_k}(\by,\bpsi;\theta)$ at iteration $k$ (the usual version of SAEM uses $T_k=1$ at each iteration). 
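To see concretely what tempering does to a Gaussian model, the following small Python check (illustrative values, not tied to any particular software) verifies numerically that dividing $U$ by $T$, i.e., raising a Gaussian density to the power $1/T$, yields another Gaussian whose variance is multiplied by $T$:

```python
import numpy as np

# Tempering a Gaussian: p_T(x) ∝ exp(-U(x)/T) with U(x) = (x - mu)^2 / (2 sigma^2).
# Completing the square shows p_T is N(mu, T * sigma^2); we check this numerically.
mu, sigma, T = 1.0, 0.8, 5.0
x = np.linspace(-20.0, 20.0, 20001)
dx = x[1] - x[0]

p_T = np.exp(-((x - mu) ** 2) / (2.0 * sigma**2 * T))
p_T /= p_T.sum() * dx                       # normalize on the grid

var_T = np.sum((x - mu) ** 2 * p_T) * dx    # variance of the tempered density
print(var_T, T * sigma**2)                  # both close to 3.2
```

The same computation applies term by term to the complete models considered below, which is why a large temperature simply inflates the model variances.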
The sequence $(T_k)$ is chosen to have large positive values during the first iterations, then to decrease with an exponential rate to 1: $T_k = \max(1, \tau \ T_{k-1})$.<br /> <br /> Consider for example the following model for continuous data:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> y_{ij} &amp;\sim&amp; {\cal N}(f(t_{ij};\psi_i) , a^2) \\<br /> h(\psi_i) &amp;\sim&amp; {\cal N}(\mu , \Omega) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> Here, $\theta = (\mu,\Omega,a^2)$ and<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pmacro(\by,\bpsi;\theta) = C(\theta)\, \exp \left\{- \displaystyle{ \frac{1}{2 a^2} }\sum_{i=1}^N \sum_{j=1}^{n_i} (y_{ij} - f(t_{ij};\psi_i))^2 - \displaystyle{ \frac{1}{2} } \sum_{i=1}^N (h(\psi_i)-\mu)^\transpose \Omega^{-1} (h(\psi_i)-\mu) \right\},<br /> &lt;/math&gt; }}<br /> <br /> where $C(\theta)$ is a normalizing constant that only depends on $a$ and $\Omega$.<br /> <br /> <br /> We see that $\pmacro_T(\by,\bpsi;\theta)$ will also be a normal distribution whose residual error variance $a^2$ is replaced by $T a^2$ and variance matrix $\Omega$ for the random effects by $T\Omega$.<br /> In other words, a model with a &quot;large temperature&quot; is a model with large variances.<br /> <br /> The algorithm therefore consists in choosing large initial variances $\Omega_0$ and $a^2_0$ (that include the initial temperature $T_0$ implicitly) and setting $a^2_k = \max(\tau \ a^2_{k-1} , \hat{a}^2(\by,\bpsi^{(k)}))$ and $\Omega_k = \max(\tau \ \Omega_{k-1} , \hat{\Omega}(\bpsi^{(k)}))$ during the first iterations. 
Here, $0\leq\tau\leq 1$.<br /> <br /> These large values of the variance make the conditional distributions $\pmacro_T(\psi_i | y_i;\theta)$ less concentrated around their modes, and thus allow the sequence $(\theta_k)$ to &quot;escape&quot; from local maxima of the likelihood during the first iterations of SAEM and converge to a neighborhood of the global maximum of ${\like}$.<br /> After these initial iterations, the usual SAEM algorithm is used to estimate these variances at each iteration.<br /> <br /> <br /> {{Remarks<br /> |title= Remark<br /> |text= We can use two different coefficients $\tau_1$ and $\tau_2$ for $\Omega$ and $a^2$ in $\monolix$. It is possible, for example, to choose $\tau_1&lt;1$ and $\tau_2&gt;1$, with large initial inter-subject variances $\Omega_0$ and small initial residual variance $a^2_0$. In this case, SAEM tries to obtain the best possible fit during the first iterations, allowing for a large inter-subject variability. During the next iterations, this variability is reduced and the residual variance increases until reaching the best possible trade-off between the two criteria.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=A PK example<br /> |text= <br /> <br /> Consider a simple one-compartment model for oral administration:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:saem_sa&quot;&gt;&lt;math&gt;<br /> f(t;ka,V,ke) = \displaystyle{ \frac{D\, ka}{V(ka-ke)} }\left( e^{-ke \, t} - e^{-ka \, t} \right) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> We then simulate PK data from 80 patients using the following population PK parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; ka_{\rm pop} = 1, \quad V_{\rm pop}=8, \quad ke_{\rm pop}=0.25 .&lt;/math&gt; }}<br /> <br /> We can see that the following parametrization gives the same prediction as the one given in [[#eq:saem_sa|(2)]]:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \tilde{ka} = ke, \quad \tilde{V}=V \times 
ke/ka, \quad \tilde{ke}=ka . &lt;/math&gt; }}<br /> <br /> We can then expect a (global) maximum around $(ka,V,ke) = (1, \ 8, \ 0.25)$ and a (local) maximum of the likelihood around $(ka,V,ke) = (0.25, \ 2, \ 1).$<br /> <br /> The figure below displays the convergence of SAEM without simulated annealing to a local maximum of the likelihood (deviance = $-2\,\log {\like} =816$). The initial values of the population parameters we chose were $(ka_0,V_0,ke_0) = (1,1,1)$.<br /> <br /> :{{ImageWithCaption_special|image=recuit1.png|caption=Convergence of SAEM to a local maximum of the likelihood}} <br /> <br /> Using the same initial guess, the simulated annealing version of SAEM converges to the global maximum of the likelihood (deviance = 734).<br /> <br /> :{{ImageWithCaption_special|image=recuit2.png|caption=Convergence of SAEM to the global maximum of the likelihood }}<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> == Bibliography ==<br /> <br /> <br /> &lt;bibtex&gt;<br /> @article{allassonniere2010construction,<br /> title={Construction of Bayesian deformable models via a stochastic approximation algorithm: a convergence study},<br /> author={Allassonnière, S. and Kuhn, E. and Trouvé, A.},<br /> journal={Bernoulli},<br /> volume={16},<br /> number={3},<br /> pages={641--678},<br /> year={2010},<br /> publisher={Bernoulli Society for Mathematical Statistics and Probability}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{delattre2012maximum,<br /> title={Maximum likelihood estimation in discrete mixed hidden Markov models using the SAEM algorithm},<br /> author={Delattre, M. 
and Lavielle, M.},<br /> journal={Computational Statistics &amp; Data Analysis},<br /> year={2012},<br /> volume={56},<br /> pages={2073-2085}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{delattre2013sde,<br /> title={Coupling the SAEM algorithm and the extended Kalman filter for maximum likelihood estimation in mixed-effects diffusion models},<br /> author={Delattre, M. and Lavielle, M.},<br /> journal={Statistics and its interfaces},<br /> year={2013},<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{delyon1999convergence,<br /> title={Convergence of a stochastic approximation version of the EM algorithm},<br /> author={Delyon, B. and Lavielle, M. and Moulines, E.},<br /> journal={Annals of Statistics},<br /> pages={94-128},<br /> year={1999},<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{dempster1977maximum,<br /> title={Maximum likelihood from incomplete data via the EM algorithm},<br /> author={Dempster, A.P. and Laird, N.M. and Rubin, D.B.},<br /> journal={Journal of the Royal Statistical Society. Series B (Methodological)},<br /> pages={1-38},<br /> year={1977},<br /> publisher={JSTOR}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{kuhn2004coupling,<br /> title={Coupling a stochastic approximation version of EM with an MCMC procedure},<br /> author={Kuhn, E. and Lavielle, M.},<br /> journal={ESAIM: Probability and Statistics},<br /> volume={8},<br /> pages={115-131},<br /> year={2004},<br /> publisher={EDP Sciences, 17 Avenue du Hoggar Les Ulis Cedex A BP 112 91944 France}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{lavielle2013improved,<br /> title={An improved SAEM algorithm for maximum likelihood estimation in mixtures of non linear mixed effects models},<br /> author={Lavielle, M. 
and Mbogning, C.},<br /> journal={Statistics and Computing},<br /> year={2013},<br /> publisher={Springer}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{mclachlan2007algorithm,<br /> title={The EM algorithm and extensions},<br /> author={McLachlan, G.J. and Krishnan, T.},<br /> volume={382},<br /> year={2007},<br /> publisher={Wiley-Interscience}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{samson2006extension,<br /> title={Extension of the SAEM algorithm to left-censored data in nonlinear mixed-effects model: Application to HIV dynamics model},<br /> author={Samson, A. and Lavielle, M. and Mentr&amp;eacute;, F.},<br /> journal={Computational statistics &amp; data analysis},<br /> volume={51},<br /> number={3},<br /> pages={1562-1574},<br /> year={2006},<br /> publisher={Elsevier}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{wei1990monte,<br /> title={A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms},<br /> author={Wei, G. 
and Tanner, M.},<br /> journal={Journal of the American Statistical Association},<br /> volume={85},<br /> number={411},<br /> pages={699-704},<br /> year={1990},<br /> publisher={Taylor &amp; Francis}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{wu1983convergence,<br /> title={On the convergence properties of the EM algorithm},<br /> author={Wu, C.F.},<br /> journal={The Annals of Statistics},<br /> volume={11},<br /> number={1},<br /> pages={95-103},<br /> year={1983},<br /> publisher={Institute of Mathematical Statistics}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> <br /> {{Back&amp;Next<br /> |linkBack=Introduction and notation<br /> |linkNext=The Metropolis-Hastings algorithm for simulating the individual parameters }}</div> Admin http://wiki.webpopix.org/index.php/Hidden_Markov_models Hidden Markov models 2013-06-21T09:13:17Z <p>Admin : </p> <hr /> <div>&lt;!-- Menu for the Extensions chapter --&gt;<br /> &lt;sidebarmenu&gt;<br /> +[[Extensions]]<br /> *[[Extensions| Introduction ]] | [[ Mixture models ]] | [[Hidden Markov models]] | [[Stochastic differential equations based models]] <br /> &lt;/sidebarmenu&gt;<br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> ==Introduction==<br /> <br /> <br /> [http://en.wikipedia.org/wiki/Markov_chain Markov chains] are a useful tool for analyzing categorical longitudinal data. However, sometimes the [https://en.wikipedia.org/wiki/Markov_process Markov process] cannot be directly observed, though some output, dependent on the<br /> (hidden) state, is visible. More precisely, we assume that the distribution of this observable output depends on the underlying hidden state. Such models are called hidden Markov models (HMMs).<br /> HMMs can be applied in many contexts and have turned out to be particularly pertinent in several biological contexts. 
For example, they are useful when characterizing diseases for which the existence of several discrete stages of illness is a realistic assumption, e.g., epilepsy and migraines.<br /> <br /> Here, we will consider a parametric framework with [http://en.wikipedia.org/wiki/Markov_chain Markov chains] in a discrete and finite state space $\mathbf{K} = \{1,\ldots,K\}$.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Mixed hidden Markov models==<br /> <br /> <br /> HMMs have been developed to describe how a given system moves from one state to another over time, in situations where the successive visited states are unknown and a set of observations is the only available information to describe the dynamics of the system. HMMs can be seen as a variant of mixture models that allow for possible memory in the sequence of hidden states. An HMM is thus defined as a pair of processes $(z_j,y_j, j=1,2,\ldots)$, where the latent sequence $(z_j)$ is a [http://en.wikipedia.org/wiki/Markov_chain Markov chain] and where the distribution of the observation $y_j$ at time $t_j$ depends on the state $z_j$.<br /> <br /> <br /> {{ImageWithCaption|image=hmm0.png|caption=Dynamics of a hidden Markov model}}<br /> <br /> <br /> In a population approach, HMMs from several individuals can be described simultaneously by considering ''mixed'' HMMs.<br /> Let $y_i=\left(y_{i,1},\ldots,y_{i,n_i}\right)$ and $z_i= \left(z_{i,1}, \ldots,z_{i,n_i}\right)$ denote respectively the sequences of observations and hidden states for individual $i$.<br /> <br /> We suppose that the joint distribution of $(z_i,y_i)$ is a parametric distribution that depends on a vector of parameters $\psi_i$ and can be decomposed as<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:hmm1&quot;&gt;&lt;math&gt;<br /> \pcyzipsii(z_i,y_i {{!}} \psi_i) = \pczipsii(z_i {{!}}\psi_i) \, \pcyizpsii(y_i {{!}} z_i,\psi_i) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> For each individual $i$, $z_i$ is 
a [http://en.wikipedia.org/wiki/Markov_chain Markov chain] whose probability distribution is defined by<br /> <br /> <br /> &lt;ul&gt;<br /> * the distribution $\pi_{i,1} = (\pi_{i,1}^{k},\ k=1,2,\ldots,K)$ of the first state $z_{i,1}$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \pi_{i,1}^{k} = \prob{z_{i,1} = k {{!}} \psi_i} . &lt;/math&gt; }}<br /> <br /> <br /> * the sequence of ''transition matrices'' $(Q_{i,j} \ ; \, j=2,3,\ldots)$, where for each $j$, $Q_{i,j} = (q_{i,j}^{\ell,k} \ ; \, 1\leq \ell,k \leq K)$ is a matrix of size $K \times K$ such that $q_{i,j}^{\ell,k} = \prob{z_{i,j} = k | z_{i,j-1}=\ell , \psi_i}$.<br /> &lt;/ul&gt;<br /> <br /> <br /> {{ImageWithCaption|image=markov_1.png|caption=Transitions of a Markov chain with 3 states}}<br /> <br /> <br /> The conditional distribution $\qcyizpsii$ depends on the model for the observations: for each state, observation $y_{ij}$ has a certain distribution. Let us see some examples:<br /> <br /> <br /> &lt;br&gt;<br /> === Examples ===<br /> <br /> <br /> 1. In a continuous data model, one possibility is that the residual error model is a hidden Markov model that can randomly switch between $K$ possible residual error models.<br /> <br /> <br /> {{Example<br /> |title=Example 1<br /> |text=In this example, we consider a 2-state Markov chain. A constant error model is assumed in each state:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> y_{ij} &amp;=&amp; \sin(\alpha \, t_{ij}) + a_{i,1} \teps_{ij} \quad \text{if } z_{ij}=1 \\<br /> y_{ij} &amp;=&amp; \sin(\alpha \, t_{ij}) + a_{i,2} \teps_{ij} \quad \text{if } z_{ij}=2.<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The figure below displays simulated data from this model for 4 individuals. Observations drawn from state 1 (resp. state 2) are displayed in magenta (resp. black). 
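Data of this kind can be simulated directly. The following minimal Python sketch draws the hidden 2-state chain and the corresponding observations for one individual; the transition matrix, $\alpha$ and the two noise levels are illustrative values chosen for this sketch, not the ones used to produce the figure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative values (not those of the figure): alpha, the two
# state-dependent noise levels (a_1, a_2), and a 2-state transition matrix Q,
# where Q[l, k] = P(z_j = k | z_{j-1} = l).
alpha = 0.5
a = np.array([0.05, 0.5])
Q = np.array([[0.95, 0.05],
              [0.10, 0.90]])

def simulate(t):
    """Simulate hidden states z and observations y = sin(alpha*t) + a[z]*eps."""
    n = len(t)
    z = np.empty(n, dtype=int)
    z[0] = rng.choice(2)                     # uniform initial distribution
    for j in range(1, n):
        z[j] = rng.choice(2, p=Q[z[j - 1]])  # Markov transition
    y = np.sin(alpha * t) + a[z] * rng.standard_normal(n)
    return z, y

t = np.linspace(0, 25, 200)
z, y = simulate(t)
```

Plotting $y$ against $t$, colored by $z$, reproduces the kind of picture shown in the figure.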
Of course, the states are unknown in the case of hidden Markov models, i.e., only the values are observed in practice, not the colors.<br /> <br /> <br /> ::[[File:hmm1bis.png|link=]]<br /> <br /> }}<br /> <br /> <br /> <br /> 2. In a Poisson model for count data, the Poisson parameter might randomly switch between $K$ intensities. Such models have been used for describing the evolution of seizures in epileptic patients:<br /> <br /> <br /> {{Example<br /> |title=Example 2<br /> |text= Instead of assuming a single Poisson distribution for the observed numbers of seizures, this model assumes that patients go through alternating periods of low and high epileptic susceptibility. Therefore we consider what is called a 2-state Poisson mixed-HMM:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> y_{ij} &amp;\sim&amp; {\rm Poisson}(\lambda_{i,1}) \quad \text{if } z_{ij}=1 \\<br /> y_{ij} &amp;\sim&amp; {\rm Poisson}(\lambda_{i,2}) \quad \text{if } z_{ij}=2.<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> :: [[File:hmm2bis.png|link=]]<br /> <br /> }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Distributions of observations==<br /> <br /> <br /> Assuming that the $N$ individuals are independent, the joint pdf is given by:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:sdepdf&quot;&gt;&lt;math&gt;<br /> \pcypsi(y_1,\ldots,y_N {{!}} \psi_1,\ldots,\psi_N ) = \prod_{i=1}^{N}\pcyipsii(y_i {{!}} \psi_i).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> Then, computing the conditional distribution of the observations $\qcyipsii$ for any individual $i$ requires summing the joint conditional distribution $\qcyzipsii$ over all possible sequences of hidden states $z_i \in \mathbf{K}^{n_i}$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcyipsii(y_i {{!}} \psi_i) &amp;=&amp; \sum_{z_i \in \mathbf{K}^{n_i} } \pcyzipsii(z_i, y_i {{!}} \psi_i) \\<br /> &amp;=&amp; \sum_{z_i \in \mathbf{K}^{n_i} } \pczipsii(z_i {{!}} \psi_i) \,
\pcyizpsii(y_i {{!}} z_i,\psi_i) \\<br /> &amp;=&amp; \sum_{z_i \in \mathbf{K}^{n_i} } \left\{ \pi_{i,1}^{z_{i,1} } \pcyiONEzpsii(y_{i,1} {{!}} z_{i,1},\psi_i)\prod_{j=2}^{n_i} \left( q_{i,j}^{z_{i,j-1},z_{i,j} } \, \pcyijzpsii(y_{i,j} {{!}} z_{i,j},\psi_i) \right) \right\} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> Though this sum contains $K^{n_i}$ terms, the forward recursion of the [http://en.wikipedia.org/wiki/Baum-Welch_algorithm Baum-Welch algorithm] provides a quick way to compute it numerically.<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Bibliography==<br /> <br /> <br /> &lt;bibtex&gt;<br /> @article{Albert1991,<br /> title = &quot;A two state Markov mixture model for a time series of epileptic seizure counts&quot;,<br /> author = &quot;Albert, P. S.&quot;,<br /> journal = &quot;Biometrics&quot;,<br /> volume = &quot;47&quot;,<br /> year = &quot;1991&quot;,<br /> pages = &quot;1371-1381&quot;}<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{Altman2007,<br /> title = &quot;Mixed hidden Markov models: an extension of the hidden Markov model to the longitudinal data setting&quot;,<br /> author = &quot;Altman, R. M.&quot;,<br /> journal = &quot;Journal of the American Statistical Association&quot;,<br /> volume = &quot;102&quot;,<br /> year = &quot;2007&quot;,<br /> pages = &quot;201-210&quot;}<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{Anisimov2007,<br /> title = &quot;Analysis of responses in migraine modelling using hidden Markov models&quot;,<br /> author = &quot;Anisimov, W. and Maas, H. J. and Danhof, M. and Della Pasqua, O.&quot;,<br /> journal = &quot;Statistics in Medicine&quot;,<br /> volume = &quot;26&quot;,<br /> year = &quot;2007&quot;,<br /> pages = &quot;4163-4178&quot;}<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{Cappe2005,<br /> author = &quot;Capp&amp;eacute;, O. and Moulines, E.
and Ryd&amp;eacute;n, T.&quot;,<br /> title = &quot;Inference in hidden Markov models&quot;,<br /> year = &quot;2005&quot;,<br /> publisher = &quot;Springer Series in Statistics&quot;}<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{ChaubertPereira2011,<br /> title = &quot;Markov and Semi-Markov Switching Linear Mixed Models Used to Identify Forest Tree Growth Components&quot;,<br /> author = &quot;Chaubert-Pereira, F. and Gu&amp;eacute;don, Y. and Lavergne, C. and Trottier, C.&quot;,<br /> journal = &quot;Biometrics&quot;,<br /> volume = &quot;66&quot;,<br /> year = &quot;2011&quot;,<br /> pages = &quot;753-762&quot;}<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{delattre2012maximum,<br /> title={Maximum likelihood estimation in discrete mixed hidden Markov models using the SAEM algorithm},<br /> author={Delattre, M. and Lavielle, M.},<br /> journal={Computational Statistics &amp; Data Analysis},<br /> year={2012},<br /> publisher={Elsevier}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{delattre2012analysis,<br /> title={Analysis of exposure-response of CI-945 in patients with epilepsy: application of novel mixed hidden Markov modeling methodology},<br /> author={Delattre, M. and Savic, R. M. and Miller, R. and Karlsson, M. O. and Lavielle, M.},<br /> journal={Journal of pharmacokinetics and pharmacodynamics},<br /> pages={1-9},<br /> year={2012},<br /> publisher={Springer}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{Maruotti2009,<br /> title = &quot;A semiparametric approach to hidden Markov models under longitudinal observations&quot;,<br /> author = &quot;Maruotti, A.
and Ryd&amp;eacute;n, T.&quot;,<br /> journal = &quot;Statistics and Computing&quot;,<br /> volume = &quot;19&quot;,<br /> year = &quot;2009&quot;,<br /> pages = &quot;381-393&quot;}<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{Rabiner1989,<br /> title = &quot;A tutorial on Hidden Markov Models and selected applications in speech recognition&quot;,<br /> author = &quot;Rabiner, L. R.&quot;,<br /> journal = &quot;Proceedings of the IEEE&quot;,<br /> volume = &quot;77&quot;,<br /> year = &quot;1989&quot;,<br /> pages = &quot;257-286&quot;}<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{Rijmen2008,<br /> title = &quot;Qualitative longitudinal analysis of symptoms in patients with primary and metastatic brain tumours&quot;,<br /> author = &quot;Rijmen, F. and Ip, E. H. and Rapp, S. and Shaw, E. G.&quot;,<br /> journal = &quot;Journal of the Royal Statistical Society, Series A&quot;,<br /> volume = &quot;171, Part 3&quot;,<br /> year = &quot;2008&quot;,<br /> pages = &quot;739-753&quot;}<br /> &lt;/bibtex&gt;<br /> <br /> <br /> {{Back&amp;Next<br /> |linkBack= Mixture models<br /> |linkNext= Stochastic differential equations based models }}</div> Admin http://wiki.webpopix.org/index.php/Models_for_count_data Models for count data 2013-06-21T09:05:11Z <p>Admin : </p> <hr /> <div>&lt;!-- Menu for the Observations chapter --&gt;<br /> &lt;sidebarmenu&gt;<br /> +[[Modeling the observations]]<br /> *[[Modeling the observations| Introduction ]] | [[ Continuous data models ]] | [[Models for count data]] | [[Model for categorical data]] | [[Models for time-to-event data ]] | [[Joint models]] <br /> &lt;/sidebarmenu&gt;<br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> Count data is a special type of statistical data that takes only the non-negative integer values $\{0, 1, 2,\ldots\}$ arising from counting something, e.g., the number of seizures, hemorrhages or lesions in each given time
period. More precisely, data from individual $i$ is the sequence $y_i=(y_{ij},1\leq j \leq n_i)$ where $y_{ij}$ is the number of events observed in the $j$th time interval $I_{ij}$.<br /> <br /> For the moment, let us assume that all the intervals have the same length. This is the case, for instance, if data are daily seizure counts: $I_{ij}$ is the $j$th day after the start of the experiment and $y_{ij}$ the number of seizures observed during that day.<br /> <br /> We will then model the sequence $y_i=(y_{ij},1\leq j \leq n_i)$ as a sequence of random variables that take their values in $\{ 0, 1, 2,\ldots\}$.<br /> <br /> If we assume that these random variables are independent, then the model is completely defined by the probability mass functions $\prob{y_{ij}=k}$, for $k \geq 0$ and $1 \leq j \leq n_i$. Common distributions used to model count data include [http://en.wikipedia.org/wiki/Poisson_distribution Poisson], [http://en.wikipedia.org/wiki/Binomial_distribution binomial] and [http://en.wikipedia.org/wiki/Negative_binomial_distribution negative binomial].<br /> <br /> Here, we will only consider parametric distributions. In this context, building a model means defining:<br /> <br /> <br /> &lt;ul&gt;<br /> * the parameter function (or &quot;intensity&quot;) $\lambda_{ij} = \lambda(t_{ij},\psi_i)$ for any individual $i$ that depends on individual parameters $\psi_i$ and possibly the time $t_{ij}$.&lt;br&gt;<br /> <br /> * the probability mass function $\prob{y_{ij}=k; \lambda_{ij}}$.<br /> &lt;/ul&gt;<br /> <br /> <br /> The conditional distribution of the observations is therefore written:<br /> <br /> {{Equation1<br /> |equation = &lt;math&gt; \prob{y_{ij}=k {{!}} \psi_i} = \prob{y_{ij}=k ; \lambda_{ij} }.
&lt;/math&gt; }} <br /> <br /> <br /> {{Example<br /> |title=Example<br /> <br /> |text= Let us illustrate this approach for the [http://en.wikipedia.org/wiki/Poisson_distribution Poisson distribution].<br /> A [http://en.wikipedia.org/wiki/Poisson_distribution Poisson distribution] with intensity $\lambda$ is defined by its probability mass function:<br /> <br /> {{Equation1|equation=&lt;math&gt; \prob{y=k ; \lambda} = \displaystyle{\frac{\lambda^{k} \, e^{-\lambda} }{k!} }. &lt;/math&gt;}}<br /> <br /> <br /> ::[[File:poisson1.png|link=]]<br /> <br /> <br /> One of the main properties of the [http://en.wikipedia.org/wiki/Poisson_distribution Poisson distribution] is that $\lambda$ is both the mean and the variance of the distribution:<br /> <br /> {{Equation1|equation=&lt;math&gt;\esp{y} = \var{y} = \lambda &lt;/math&gt;}}<br /> <br /> All that remains is to define the Poisson intensity function $\lambda_{ij} = \lambda(t_{ij},\psi_i)$. Then,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\prob{y_{ij}=k {{!}} \psi_i} = \displaystyle{\frac{\lambda_{ij}^{k}\, e^{-\lambda_{ij} } } {k!} }. &lt;/math&gt;}}<br /> }}<br /> <br /> <br /> There are many variations of the Poisson model:<br /> <br /> <br /> &lt;ul&gt;<br /> * ''Homogeneous [http://en.wikipedia.org/wiki/Poisson_distribution Poisson distribution]:'' this assumes a constant intensity $\lambda_i$ for each individual $i$. Here, $\psi_i = \lambda_i$ and $\lambda(t_{ij},\psi_i)=\lambda_i$. <br /> &lt;br&gt;&lt;br&gt;<br /> * ''Non-homogeneous [http://en.wikipedia.org/wiki/Poisson_distribution Poisson distribution]:'' this assumes that the Poisson intensity is a function of time. For example, suppose that we believe that a disease-related event is increasing linearly in frequency each month. We could then model this using $\lambda(t_{ij},\psi_i) = \lambda_{i} + a_i t_{ij}$, where $t_{ij} = j$ (months).
Here, $\psi_i=(\lambda_{i},a_i)$.<br /> &lt;br&gt;&lt;br&gt;<br /> * ''Additional regression variables:'' the Poisson intensity may depend on regression variables other than time. For example, assume that taking a drug tends to reduce the number of events. We can then link the time-varying drug concentration $C$ to the value of $\lambda$ at time $t_{ij}$ using for instance an &quot;Imax&quot; model:<br /> <br /> {{Equation1|equation=&lt;math&gt; <br /> \lambda(t_{ij},\psi_i) = \lambda_{i}\left(1-\Imax_i\displaystyle{\frac{ \ C_i(t_{ij})}{IC_{50,i} + C_i(t_{ij})} }\right) ,<br /> &lt;/math&gt; }}<br /> <br /> : where $\lambda_{i}$ is the baseline intensity and where $0\leq \Imax_i\leq 1$. Here, $\psi_{i} = (\lambda_{i}, \Imax_i, IC_{50,i})$.<br /> <br /> : This model can even be combined with the previous non-homogeneous model by assuming a time-varying baseline $\lambda_{i}(t)$, allowing us to describe, for instance, both a drug effect and the underlying disease dynamics.&lt;br&gt;<br /> <br /> <br /> * Instead of assuming independent count data, we can introduce Markovian dependency into the model by assuming for example that $\lambda_{ij}$ is a function of $y_{i,j-1}$. Then, $\prob{y_{ij}=k\, |\, y_{i\,j-1}, t_{ij},\psi_i}$ is the probability function of a Poisson random variable with parameter $\lambda_{ij} =\lambda(y_{i,j-1}, t_{ij},\psi_i)$.<br /> &lt;br&gt;&lt;br&gt;<br /> <br /> * If $y_{ij}$ is the number of a given type of events (seizures, hemorrhages, etc.)
in a given time interval $I_{ij}$, and if $h_i(t)=h(t,\psi_i)$ is the hazard function associated with this sequence of events for individual $i$, then the events follow a non-homogeneous Poisson process and $y_{ij}$ has a Poisson distribution with intensity $\lambda_{ij}=\displaystyle{ \int_{I_{ij}}} h(t,\psi_i)dt$ in interval $I_{ij}$ (see the [[Models for time-to-event data]] section).<br /> &lt;/ul&gt;<br /> <br /> <br /> Let us now look at some other examples of distributions for count data:<br /> <br /> <br /> &lt;ul&gt;<br /> * The zero-inflated [http://en.wikipedia.org/wiki/Poisson_distribution Poisson distribution]:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \prob{y=k ; \lambda,p_0} = \left\{ \begin{array}{cc}<br /> p_0 + (1-p_0)e^{-\lambda} &amp; {\rm if } \ k=0 \\<br /> (1-p_0) \displaystyle {\frac{e^{-\lambda} \lambda^{k} }{k!} } &amp; {\rm if } \ k&gt;0 .<br /> \end{array}<br /> \right.<br /> &lt;/math&gt;}}<br /> <br /> :where $0\leq p_0 &lt;1$. This is useful when the data seem generally to follow a [http://en.wikipedia.org/wiki/Poisson_distribution Poisson distribution] except for an excess of zeros ($k=0$):<br /> <br /> <br /> ::[[File:poisson2.png|link=]]<br /> <br /> <br /> * The [http://en.wikipedia.org/wiki/Negative_binomial_distribution negative binomial distribution] is:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \prob{y=k ; p,r} = \displaystyle{ \frac{\Gamma(k+r)}{k!\, \Gamma(r)} }(1-p)^r p^k ,<br /> &lt;/math&gt;}}<br /> <br /> :with $0\leq p \leq 1$ and $r&gt;0$.
If $r$ is an integer, then the [http://en.wikipedia.org/wiki/Negative_binomial_distribution negative binomial (NB) distribution] with parameters $(p,r)$ is the probability distribution of the number of successes in a sequence of [http://en.wikipedia.org/wiki/Bernoulli_trial Bernoulli trials] with probability of success $p$ before $r$ failures occur.<br /> <br /> <br /> ::[[File:poisson3.png|link=]]<br /> <br /> <br /> * The generalized [http://en.wikipedia.org/wiki/Poisson_distribution Poisson distribution] is: <br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \prob{y=k ; \lambda,\delta} = \displaystyle {\frac{\lambda (\lambda+k\delta)^{k-1} e^{-\lambda-k\delta} }{k!} },<br /> &lt;/math&gt; }}<br /> <br /> :with $\lambda&gt;0$ and $0\leq \delta &lt;1$.<br /> :The generalized Poisson (GP) distribution includes the [http://en.wikipedia.org/wiki/Poisson_distribution Poisson distribution] as a special case $(\delta=0)$, and is over-dispersed relative to the Poisson. Indeed, the variance-to-mean ratio is $1/(1-\delta)^2$, which exceeds 1 when $\delta&gt;0$:<br /> <br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \begin{eqnarray} \esp{y} &amp;=&amp; \frac{\lambda}{1-\delta} \\<br /> \var{y} &amp;=&amp; \frac{\lambda}{(1-\delta)^3}.<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> ::[[File:poisson4.png|link=]]<br /> &lt;/ul&gt;<br /> <br /> &lt;br&gt;&lt;br&gt;<br /> -----------------<br /> &lt;br&gt;&lt;br&gt;<br /> <br /> {{Summary<br /> |title=Summary<br /> |text=<br /> For a given design $\bx_{i}$ and a given vector of parameters $\psi_i$, a parametric model for count data is completely defined by:<br /> <br /> <br /> &lt;ul&gt;<br /> - the probability mass function used to represent the distribution of the data in a given time interval<br /> &lt;br&gt;&lt;br&gt;<br /> - a model which defines how the distribution's parameter function (i.e., intensity) varies over time.<br /> &lt;/ul&gt;<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> == $\mlxtran$ for count data models == <br /> <br /> <br />
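Before turning to the $\mlxtran$ encodings, the generalized Poisson distribution above can be checked numerically. The following Python sketch (the parameter values $\lambda=2$, $\delta=0.3$ are illustrative) evaluates the pmf on the log scale over a truncated support and verifies that it sums to 1, that the mean equals $\lambda/(1-\delta)$, and that the distribution is over-dispersed:

```python
import math

def gp_logpmf(k, lam, delta):
    """Log of the generalized Poisson pmf:
    P(y=k) = lam * (lam + k*delta)**(k-1) * exp(-lam - k*delta) / k!"""
    return (math.log(lam) + (k - 1) * math.log(lam + k * delta)
            - lam - k * delta - math.lgamma(k + 1))

lam, delta = 2.0, 0.3    # illustrative parameter values
ks = range(200)          # truncated support; the tail mass here is negligible
probs = [math.exp(gp_logpmf(k, lam, delta)) for k in ks]

mean = sum(k * p for k, p in zip(ks, probs))
var = sum((k - mean) ** 2 * p for k, p in zip(ks, probs))

assert abs(sum(probs) - 1.0) < 1e-8           # pmf sums to 1
assert abs(mean - lam / (1 - delta)) < 1e-8   # E[y] = lambda / (1 - delta)
assert var / mean > 1                         # over-dispersed relative to Poisson
```

Working on the log scale avoids overflow of $(\lambda+k\delta)^{k-1}$ and $k!$ for large $k$; it is also the same form of the pmf used in the generalized Poisson $\mlxtran$ example of this section.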
<br /> {{ExampleWithCode<br /> |title1= Example 1: <br /> |title2= Poisson model with time-varying intensity<br /> |text=<br /> <br /> |equation=&lt;math&gt; \begin{array}{c}<br /> \psi_i &amp;=&amp; (\alpha_i,\beta_i) \\[0.3cm]<br /> \lambda(t,\psi_i) &amp;=&amp; \alpha_i + \beta_i\,t \\[0.3cm]<br /> \prob{y_{ij}=k} &amp;=&amp; \displaystyle{ \frac{\lambda(t_{ij} , \psi_i)^k}{k!} } e^{-\lambda(t_{ij} , \psi_i)}\\<br /> \end{array}&lt;/math&gt;<br /> |code=<br /> {{MLXTranForTable<br /> |name=<br /> |text=<br /> &lt;pre style=&quot; background-color:#EFEFEF; border: none;&quot;&gt; <br /> INPUT:<br /> input = {alpha, beta}<br /> <br /> EQUATION:<br /> lambda = alpha + beta*t<br /> <br /> DEFINITION:<br /> y ~ poisson(lambda)<br /> &lt;/pre&gt; }}<br /> }}<br /> <br /> <br /> <br /> {{ExampleWithCode<br /> |title1= Example 2: <br /> |title2= Generalized Poisson model<br /> |text=<br /> <br /> |equation=&lt;math&gt; \begin{array}{c}<br /> \psi_i &amp;=&amp; (\lambda_i,\delta_i) \\<br /> \log\left( \prob{y_{ij}=k} \right) &amp;=&amp; \log(\lambda_i) + (k-1)\log(\lambda_i+k\delta_i) \\<br /> &amp;&amp; -\lambda_i-k\delta_i - \log(k!)\\[1cm]<br /> \end{array}&lt;/math&gt;<br /> |code=<br /> {{MLXTranForTable<br /> |name=<br /> |text=<br /> &lt;pre style=&quot; background-color:#EFEFEF; border:none;&quot;&gt; <br /> INPUT:<br /> input = {lambda, delta}<br /> <br /> DEFINITION:<br /> Y = {<br /> type = count,<br /> log(P(Y=k)) = log(lambda)<br /> + (k-1)*log(lambda+k*delta)<br /> - lambda -k*delta - factln(k)<br /> } &lt;/pre&gt; }}<br /> }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Bibliography==<br /> <br /> <br /> &lt;bibtex&gt;<br /> @article{blundell2002individual,<br /> title={Individual effects and dynamics in count data models},<br /> author={Blundell, R. and Griffith, R.
and Windmeijer, F.},<br /> journal={Journal of Econometrics},<br /> volume={108},<br /> number={1},<br /> pages={113-131},<br /> year={2002},<br /> publisher={Elsevier}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{bolker2009generalized,<br /> title={Generalized linear mixed models: a practical guide for ecology and evolution},<br /> author={Bolker, B. M. and Brooks, M. E. and Clark, C. J. and Geange, S. W. and Poulsen, J. R. and Stevens, M. H. and White, J.-S. S. and others},<br /> journal={Trends in ecology &amp; evolution},<br /> volume={24},<br /> number={3},<br /> pages={127-135},<br /> year={2009},<br /> publisher={Elsevier Science}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{cameron1998regression,<br /> title={Regression analysis of count data},<br /> author={Cameron, A. C. and Trivedi, P. K.},<br /> volume={30},<br /> year={1998},<br /> publisher={Cambridge University Press}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{christensen2002bayesian,<br /> title={Bayesian prediction of spatial count data using generalized linear mixed models},<br /> author={Christensen, O. F. and Waagepetersen, R.},<br /> journal={Biometrics},<br /> volume={58},<br /> number={2},<br /> pages={280-286},<br /> year={2002},<br /> publisher={Wiley Online Library}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{fahrmeir1994multivariate,<br /> title={Multivariate statistical modelling based on generalized linear models},<br /> author={Fahrmeir, L. and Tutz, G. and Hennevogl, W.},<br /> volume={2},<br /> year={1994},<br /> publisher={Springer New York}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{hall2004zero,<br /> title={Zero-inflated Poisson and binomial regression with random effects: a case study},<br /> author={Hall, D. 
B.},<br /> journal={Biometrics},<br /> volume={56},<br /> number={4},<br /> pages={1030-1039},<br /> year={2004},<br /> publisher={Wiley Online Library}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{heilbron2007zero,<br /> title={Zero-Altered and other Regression Models for Count Data with Added Zeros},<br /> author={Heilbron, D. C.},<br /> journal={Biometrical Journal},<br /> volume={36},<br /> number={5},<br /> pages={531-547},<br /> year={2007},<br /> publisher={Wiley Online Library}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{lawless1987negative,<br /> title={Negative binomial and mixed Poisson regression},<br /> author={Lawless, J. F.},<br /> journal={Canadian Journal of Statistics},<br /> volume={15},<br /> number={3},<br /> pages={209-225},<br /> year={1987},<br /> publisher={Wiley Online Library}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{lee2006multi,<br /> title={Multi-level zero-inflated Poisson regression modelling of correlated count data with excess zeros},<br /> author={Lee, A. H. and Wang, K. and Scott, J. A. and Yau, K. K. W. and McLachlan, G. J.},<br /> journal={Statistical Methods in Medical Research},<br /> volume={15},<br /> number={1},<br /> pages={47-61},<br /> year={2006},<br /> publisher={SAGE Publications}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{mcculloch2011generalized,<br /> title={Generalized, Linear, and Mixed Models},<br /> author={McCulloch, C. E. and Searle, S. R. and Neuhaus, J. M.},<br /> isbn={9781118209967},<br /> series={Wiley Series in Probability and Statistics},<br /> url={http://books.google.fr/books?id=kyvgyK\_sBlkC},<br /> year={2011},<br /> publisher={Wiley}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{min2005random,<br /> title={Random effect models for repeated measures of zero-inflated count data},<br /> author={Min, Y.
and Agresti, A.},<br /> journal={Statistical Modelling},<br /> volume={5},<br /> number={1},<br /> pages={1-19},<br /> year={2005},<br /> publisher={SAGE Publications}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{molenberghs2005models,<br /> title={Models for discrete longitudinal data},<br /> author={Molenberghs, G. and Verbeke, G.},<br /> year={2005},<br /> publisher={Springer}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{mullahy1998heterogeneity,<br /> title={Heterogeneity, excess zeros, and the structure of count data models},<br /> author={Mullahy, J.},<br /> journal={Journal of Applied Econometrics},<br /> volume={12},<br /> number={3},<br /> pages={337-350},<br /> year={1998},<br /> publisher={Wiley Online Library}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{savic2009performance,<br /> title={Performance in population models for count data, part II: a new SAEM algorithm},<br /> author={Savic, R. and Lavielle, M.},<br /> journal={Journal of pharmacokinetics and pharmacodynamics},<br /> volume={36},<br /> number={4},<br /> pages={367-379},<br /> year={2009},<br /> publisher={Springer}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{thall1988mixed,<br /> title={Mixed Poisson likelihood regression models for longitudinal interval count data},<br /> author={Thall, P. F.},<br /> journal={Biometrics},<br /> pages={197-209},<br /> year={1988},<br /> publisher={JSTOR}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{thall1990some,<br /> title={Some covariance models for longitudinal count data with overdispersion},<br /> author={Thall, P. F. and Vail, S. C.},<br /> journal={Biometrics},<br /> pages={657-671},<br /> year={1990},<br /> publisher={JSTOR}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{tempelman1996mixed,<br /> title={A mixed effects model for overdispersed count data in animal breeding},<br /> author={Tempelman, R. J.
and Gianola, D.},<br /> journal={Biometrics},<br /> pages={265-279},<br /> year={1996},<br /> publisher={JSTOR}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{winkelmann2008econometric,<br /> title={Econometric analysis of count data},<br /> author={Winkelmann, R.},<br /> year={2008},<br /> publisher={Springer}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{wolfinger1993generalized,<br /> title={Generalized linear mixed models a pseudo-likelihood approach},<br /> author={Wolfinger, R. and O'Connell, M.},<br /> journal={Journal of statistical Computation and Simulation},<br /> volume={48},<br /> number={3-4},<br /> pages={233-243},<br /> year={1993},<br /> publisher={Taylor &amp; Francis}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{yau2003zero,<br /> title={Zero-Inflated Negative Binomial Mixed Regression Modeling of Over-Dispersed Count Data with Extra Zeros},<br /> author={Yau, K. K. W. and Wang, K. and Lee, A. H.},<br /> journal={Biometrical Journal},<br /> volume={45},<br /> number={4},<br /> pages={437-452},<br /> year={2003},<br /> publisher={Wiley Online Library}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{zeileis2008regression,<br /> title={Regression models for count data in R},<br /> author={Zeileis, A. and Kleiber, C. 
and Jackman, S.},<br /> journal={Journal of Statistical Software},<br /> volume={27},<br /> number={8},<br /> pages={1-25},<br /> year={2008}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> {{Back&amp;Next<br /> |linkBack=Continuous data models<br /> |linkNext=Model for categorical data }}</div> Admin http://wiki.webpopix.org/index.php/Continuous_data_models Continuous data models 2013-06-21T09:02:40Z <p>Admin : /* Distribution of the standardized residual errors */</p> <hr /> <div>&lt;!-- Menu for the Observations chapter --&gt;<br /> &lt;sidebarmenu&gt;<br /> +[[Modeling the observations]]<br /> *[[Modeling the observations| Introduction ]] | [[ Continuous data models ]] | [[Models for count data]] | [[Model for categorical data]] | [[Models for time-to-event data ]] | [[Joint models]] <br /> &lt;/sidebarmenu&gt;<br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> == The data ==<br /> <br /> Continuous data is data that can take any real value within a given range. For instance, a concentration takes its values in $\Rset^+$, the log of the viral load in $\Rset$, an effect expressed as a percentage in $[0,100]$.<br /> <br /> The data can be stored in a table and represented graphically. 
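Stored in this "long" format (one row per measurement, with an individual identifier), the data is also straightforward to manipulate in code. As a purely illustrative Python sketch (not part of the modeling itself), the pharmacokinetic measurements tabulated below can be grouped by individual, which is exactly the grouping needed to draw individual curves or a spaghetti plot:

```python
from collections import defaultdict

# Long-format records (ID, TIME, CONCENTRATION), taken from the table below
rows = [
    (1, 1.0, 9.84), (1, 2.0, 8.19), (1, 4.0, 6.91), (1, 8.0, 3.71), (1, 12.0, 1.25),
    (2, 1.0, 17.23), (2, 3.0, 11.14), (2, 5.0, 4.35), (2, 10.0, 2.92),
    (3, 2.0, 9.78), (3, 3.0, 10.40), (3, 4.0, 7.67), (3, 6.0, 6.84), (3, 11.0, 1.10),
    (4, 4.0, 8.78), (4, 6.0, 3.87), (4, 12.0, 1.85),
]

# One (time, concentration) series per individual
series = defaultdict(list)
for ident, t, conc in rows:
    series[ident].append((t, conc))

# A spaghetti plot then simply draws one line per individual, e.g. with matplotlib:
#   for ident, pts in series.items():
#       plt.plot([t for t, _ in pts], [c for _, c in pts], label=str(ident))
```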
Here is some simple pharmacokinetics data involving four individuals.<br /> <br /> <br /> {| cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; <br /> | style=&quot;width:60%&quot; align=&quot;center&quot;| <br /> :[[File:continuous_graf0a_1.png|link=]]<br /> | style=&quot;width: 40%&quot; align=&quot;left&quot;| <br /> {| class=&quot;wikitable&quot; style=&quot;width: 70%;font-size:7pt; &quot;<br /> !| ID || TIME ||CONCENTRATION<br /> |- <br /> |1 || 1.0 || 9.84 <br /> |-<br /> |1 || 2.0 || 8.19 <br /> |-<br /> |1 || 4.0 || 6.91 <br /> |-<br /> |1 || 8.0 || 3.71 <br /> |-<br /> |1 || 12.0 || 1.25 <br /> |-<br /> |2 || 1.0 || 17.23 <br /> |-<br /> |2 || 3.0 || 11.14 <br /> |-<br /> |2 || 5.0 || 4.35 <br /> |-<br /> |2 || 10.0 || 2.92 <br /> |-<br /> |3 || 2.0 || 9.78 <br /> |-<br /> |3 || 3.0 || 10.40 <br /> |-<br /> |3 || 4.0 || 7.67 <br /> |-<br /> |3 || 6.0 || 6.84 <br /> |-<br /> |3 || 11.0 || 1.10 <br /> |-<br /> |4 || 4.0 || 8.78 <br /> |-<br /> |4 || 6.0 || 3.87 <br /> |-<br /> |4 || 12.0 || 1.85 <br /> |}<br /> |}<br /> <br /> <br /> Instead of individual plots, we can plot them all together. Such a figure is usually called a ''spaghetti plot'':<br /> <br /> <br /> ::[[File:continuous_graf0b_1.png|link=]]<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == The model ==<br /> <br /> <br /> For continuous data, we are going to consider scalar outcomes ($y_{ij}\in \Yr \subset \Rset$) and assume the following general model:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;nlme&quot; &gt;&lt;math&gt;y_{ij}=f(t_{ij},\psi_i)+ g(t_{ij},\psi_i)\teps_{ij}, \quad\ \quad 1\leq i \leq N, \quad \ 1 \leq j \leq n_i. 
&lt;/math&gt;&lt;/div&gt;<br /> |reference=(1)<br /> }}<br /> <br /> where $g(t_{ij},\psi_i)\geq 0$.<br /> <br /> Here, the residual errors $(\teps_{ij})$ are standardized random variables (mean zero and standard deviation 1).<br /> In this case, it is clear that $f(t_{ij},\psi_i)$ and $g(t_{ij},\psi_i)$ are the mean and standard deviation of $y_{ij}$, i.e.,<br /> <br /> {{Equation1<br /> |equation= &lt;math&gt;\begin{eqnarray} \esp{y_{ij} {{!}} \psi_i} &amp;=&amp; f(t_{ij},\psi_i) \\ <br /> \std{y_{ij} {{!}} \psi_i} &amp;=&amp; g(t_{ij},\psi_i).<br /> \end{eqnarray}&lt;/math&gt;}}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == The structural model == <br /> <br /> <br /> $f$ is known as the ''structural model'' and aims to describe the time evolution of the phenomena under study. For a given subject $i$ and vector of individual parameters $\psi_i$, $f(t_{ij},\psi_i)$ is the prediction of the observed variable at time $t_{ij}$. In other words, it is the value that would be measured at time $t_{ij}$ if there was no error ($\teps_{ij}=0$).<br /> <br /> In the current example, we decide to model with the structural model $f=A\exp\left(-\alpha t \right)$.<br /> Here are some example curves for various combinations of $A$ and $\alpha$:<br /> <br /> <br /> ::[[File:continuous_graf1bis.png|link=]]<br /> <br /> <br /> Other models involving more complicated dynamical systems can be imagined, such as those defined as solutions of systems of ordinary or partial differential equations. Real-life examples are found in the study of HIV, pharmacokinetics and tumor growth.<br /> <br /> <br /> <br /> &lt;br&gt;<br /> == The residual error model ==<br /> <br /> <br /> For a given structural model $f$, the conditional probability distribution of the observations $(y_{ij})$ is completely defined by the residual error model, i.e., the probability distribution of the residual errors $(\teps_{ij})$ and the standard deviation $g(x_{ij},\psi_i)$. 
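Model (1), combined with the exponential structural model above, is easy to simulate. The following Python sketch is illustrative only: the parameter values are hypothetical, and the standardized residual errors are drawn from a normal distribution, which is just one of the possible choices discussed in this section:

```python
import math
import random

def f(t, psi):
    """Structural model f(t, psi) = A * exp(-alpha * t), with psi = (A, alpha)."""
    A, alpha = psi
    return A * math.exp(-alpha * t)

def simulate(times, psi, g, rng=random):
    """Simulate y_j = f(t_j, psi) + g(t_j, psi) * eps_j  (model (1)),
    with standardized residual errors eps_j drawn here from N(0, 1)."""
    return [f(t, psi) + g(t, psi) * rng.gauss(0.0, 1.0) for t in times]

# Hypothetical example: A = 10, alpha = 0.3, and a constant error model g = 0.5
psi = (10.0, 0.3)
times = [1.0, 2.0, 4.0, 8.0, 12.0]
y = simulate(times, psi, lambda t, p: 0.5)
```

Passing a function for $g$ lets the same sketch accommodate the different residual error models introduced below; setting $g$ identically to zero recovers the noise-free predictions $f(t_{ij},\psi_i)$.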
The residual error model can take many forms. For example,<br /> <br /> <br /> &lt;ul&gt;<br /> * A constant error model assumes that $g(t_{ij},\psi_i)=a_i$. Model [[#nlme|(1)]] then reduces to<br /> <br /> {{EquationWithRef <br /> |equation=&lt;div id=&quot;nlme1&quot; &gt;&lt;math&gt;y_{ij}=f(t_{ij},\psi_i)+ a_i\teps_{ij}, \quad \quad \ 1\leq i \leq N,<br /> \quad \ 1 \leq j \leq n_i. &lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> :The figure below shows four simulated sequences of observations $(y_{ij}, 1\leq i \leq 4, 1\leq j \leq 10)$ with their respective structural model $f(t,\psi_i)$ in blue. Here, $a_i=2$ is the standard deviation of $y_{ij}$ for all $(i,j)$.<br /> <br /> <br /> ::[[File: continuous_graf2a1.png|link=]]<br /> <br /> <br /> :Let $\hat{y}_{ij}=f(t_{ij},\psi_i)$ be the prediction of $y_{ij}$ given by the model [[#nlme1|(2)]]. The figure below shows, for 50 individuals:<br /> <br /> <br /> &lt;ul&gt;<br /> ::'''-left''': prediction errors $e_{ij}=y_{ij}-\hat{y}_{ij}$ vs. predictions $(\hat{y}_{ij})$. The pink line is the mean $\esp{e_{ij}}=0$; the green lines are at $\pm 1$ standard deviation: $[-\std{e_{ij}} , +\std{e_{ij}}]$, where $\std{e_{ij}}=a_i=0.5$. <br /> &lt;br&gt;<br /> ::'''-right''': observations $(y_{ij})$ vs. predictions $(\hat{y}_{ij})$. The pink line is the identity $y=\hat{y}$; the green lines represent an interval of $\pm 1$ standard deviation around $\hat{y}$: $[\hat{y}-\std{e_{ij}} , \hat{y}+\std{e_{ij}}]$.<br /> &lt;/ul&gt;<br /> <br /> <br /> ::[[File:continuous_graf2a2.png|link=]]<br /> <br /> <br /> :These figures are typical for constant error models. The standard deviation of the prediction errors does not depend on the value of the predictions $(\hat{y}_{ij})$, so both intervals have constant amplitude.<br /> <br /> <br /> * A proportional error model assumes that $g(t_{ij},\psi_i) =b_i f(t_{ij},\psi_i)$.
Model [[#nlme|(1)]] then becomes<br /> <br /> <br /> {{EquationWithRef <br /> |equation=&lt;div id=&quot;nlme2&quot;&gt;&lt;math&gt; y_{ij}=f(t_{ij},\psi_i)(1 + b_i\teps_{ij}), \quad\ \quad 1\leq i \leq N,<br /> \quad \ 1 \leq j \leq n_i . &lt;/math&gt;&lt;/div&gt;<br /> |reference=(3) }}<br /> <br /> :The standard deviation of the prediction error $e_{ij}=y_{ij}-\hat{y}_{ij}$ is proportional to the prediction $\hat{y}_{ij}$. Therefore, the amplitude of the $\pm 1$ standard deviation intervals increases linearly with $f$:<br /> <br /> <br /> ::[[File:continuous_graf2b.png|link=]]<br /> <br /> <br /> * A combined error model combines a constant and a proportional error model by assuming $g(t_{ij},\psi_i) =a_i + b_i f(t_{ij},\psi_i)$, where $a_i&gt;0$ and $b_i&gt;0$. The standard deviation of the prediction error $e_{ij}$ and thus the amplitude of the intervals are now affine functions of the prediction $\hat{y}_{ij}$:<br /> <br /> <br /> ::[[File:continuous_graf2c.png|link=]]<br /> <br /> <br /> * An alternative combined error model is $g(t_{ij},\psi_i) =\sqrt{a_i^2 + b_i^2 f^2(t_{ij},\psi_i)}$. This gives intervals that look fairly similar to the previous ones, though they are no longer affine.<br /> <br /> <br /> ::[[File:continuous_graf2d.png|link=]]<br /> &lt;/ul&gt;<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Extension to autocorrelated errors == <br /> <br /> <br /> For any subject $i$, the residual errors $(\teps_{ij},1\leq j \leq n_i)$ are usually assumed to be independent random variables. Extension to autocorrelated errors is possible by assuming, for instance, that $(\teps_{ij})$ is a stationary ARMA (Autoregressive Moving Average) process.<br /> For example, an autoregressive process of order 1, AR(1), assumes that the autocorrelation decreases exponentially:<br /> <br /> {{EquationWithRef <br /> |equation=&lt;div id=&quot;autocorr1&quot;&gt;&lt;math&gt; {\rm corr}(\teps_{ij},\teps_{i\,{j+1} }) = \rho_i^{(t_{i\,j+1}-t_{ij})}.
&lt;/math&gt;&lt;/div&gt;<br /> |reference=(4) }}<br /> <br /> where $0\leq \rho_i &lt;1$ for each individual $i$.<br /> If we assume that $t_{ij}=j$ for any $(i,j)$, then $t_{i,j+1}-t_{ij}=1$ and the autocorrelation function $\gamma$ is given by:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \begin{eqnarray}<br /> \gamma(\tau) &amp;=&amp; {\rm corr}(\teps_{ij},\teps_{i\,j+\tau}) \\ &amp;= &amp;\rho_i^{\tau} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The figure below displays three sequences of residual errors simulated with three different autocorrelations, $\rho_1=0.1$, $\rho_2=0.6$ and $\rho_3=0.95$. The autocorrelation functions $\gamma(\tau)$ are also displayed.<br /> <br /> <br /> ::[[File:continuousGraf3.png|link=]]<br /> <br /> <br /> <br /> &lt;br&gt;<br /> == Distribution of the standardized residual errors ==<br /> <br /> <br /> The distribution of the standardized residual errors $(\teps_{ij})$ is usually assumed to be the same for each individual $i$ and any observation time $t_{ij}$.<br /> Furthermore, for identifiability reasons it is also assumed to be symmetrical around 0, i.e., $\prob{\teps_{ij}&lt;-u}=\prob{\teps_{ij}&gt;u}$ for all $u\in \Rset$.<br /> Thus, for any $(i,j)$ the distribution of the observation $y_{ij}$ is also symmetrical around its prediction $f(t_{ij},\psi_i)$. This $f(t_{ij},\psi_i)$ is therefore both the mean and the median of the distribution of $y_{ij}$: $\esp{y_{ij}|\psi_i}=f(t_{ij},\psi_i)$ and $\prob{y_{ij}&gt;f(t_{ij},\psi_i)} = \prob{y_{ij}&lt;f(t_{ij},\psi_i)} = 1/2$. If we make the additional hypothesis that 0 is the mode of the distribution of $\teps_{ij}$, then $f(t_{ij},\psi_i)$ is also the mode of the distribution of $y_{ij}$.<br /> <br /> A widely used bell-shaped distribution for modeling residual errors is the normal distribution.
If we assume that $\teps_{ij}\sim {\cal N}(0,1)$, then $y_{ij}$ is also normally distributed, with mean $f(t_{ij},\psi_i)$ and standard deviation $g(t_{ij},\psi_i)$: $y_{ij}\sim {\cal N}(f(t_{ij},\psi_i),\, g^2(t_{ij},\psi_i))$.<br /> <br /> Other distributions can be used, such as [http://en.wikipedia.org/wiki/Student's_t-distribution Student's $t$-distribution] (also known simply as the $t$-distribution), which is also symmetric and bell-shaped but has heavier tails, meaning that it is more prone to producing values that fall far from the prediction.<br /> <br /> <br /> ::[[File:continuous_graf4_bis.png|link=]]<br /> <br /> <br /> If we assume that $\teps_{ij}\sim t(\nu)$, then $y_{ij}$ has a non-standardized [http://en.wikipedia.org/wiki/Student's_t-distribution Student's $t$-distribution].<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> == The conditional likelihood ==<br /> <br /> <br /> The conditional likelihood for given observations $\by$ is defined as<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; {\like}(\bpsi; \by) \ \ \eqdef \ \ \pcypsi(\by {{!}} \bpsi), &lt;/math&gt; }}<br /> <br /> where $\pcypsi(\by | \bpsi)$ is the conditional density function of the observations. <br /> If we assume that the residual errors $(\teps_{ij},\ 1\leq i \leq N,\ 1\leq j \leq n_i)$ are i.i.d., then this conditional density is straightforward to compute:<br /> <br /> {{EquationWithRef <br /> |equation=&lt;div id=&quot;likeN_model1&quot;&gt;&lt;math&gt; \begin{eqnarray}\pcypsi(\by {{!}} \bpsi ) &amp; = &amp; \prod_{i=1}^N \pcyipsii(\by_i {{!}} \psi_i ) \\<br /> &amp; = &amp; \prod_{i=1}^N \prod_{j=1}^{n_i} \pypsiij(y_{ij} {{!}} \psi_i ) \\<br /> &amp; = &amp; \prod_{i=1}^N \prod_{j=1}^{n_i} \displaystyle{\frac{1}{g(t_{ij},\psi_i)} } \, \qeps\left(\frac{y_{ij} - f(t_{ij},\psi_i)}{g(t_{ij},\psi_i)}\right) ,<br /> \end{eqnarray} &lt;/math&gt;&lt;/div&gt;<br /> |reference=(5) }}<br /> <br /> where $\qeps$ is the pdf of the i.i.d.
residual errors ($\teps_{ij}$).<br /> <br /> For example, if we assume that the residual errors $\teps_{ij}$ are Gaussian random variables with mean 0 and variance 1, then $\qeps(x) = e^{-{x^2}/{2}}/\sqrt{2 \pi}$, and<br /> <br /> {{EquationWithRef <br /> |equation=&lt;div id=&quot;likeN_model2&quot; &gt;&lt;math&gt; \begin{eqnarray}<br /> \pcypsi(\by {{!}} \psi ) &amp; = &amp;<br /> \prod_{i=1}^N \prod_{j=1}^{n_i} \displaystyle{ \frac{1}{\sqrt{2 \pi} g(t_{ij},\psi_i)} }\, \exp\left\{-\frac{1}{2}\left(\frac{y_{ij} - f(t_{ij},\psi_i)}{g(t_{ij},\psi_i)}\right)^2\right\} .<br /> \end{eqnarray} &lt;/math&gt;&lt;/div&gt;<br /> |reference=(6) }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Transforming the data==<br /> <br /> <br /> The assumption that the distribution of any observation $y_{ij}$ is symmetrical around its predicted value is a very strong one. If this assumption does not hold, we may decide to transform the data to make it more symmetric around its (transformed) predicted value. In other cases, constraints on the values that observations can take may also lead us to want to transform the data.<br /> <br /> Model [[#nlme|(1)]] can be extended to include a transformation of the data:<br /> <br /> {{EquationWithRef <br /> |equation=&lt;div id=&quot;def_t&quot; &gt;&lt;math&gt; \transy(y_{ij})=\transy(f(t_{ij},\psi_i))+ g(t_{ij},\psi_i)\teps_{ij} &lt;/math&gt;&lt;/div&gt;<br /> |reference=(7) }}<br /> <br /> where $\transy$ is a monotonic transformation (a strictly increasing or decreasing function).<br /> As you can see, both the data $y_{ij}$ and the structural model $f$ are transformed by the function $\transy$ so that $f(t_{ij},\psi_i)$ remains the prediction of $y_{ij}$.<br /> <br /> <br /> <br /> {{Example<br /> |title=Examples: <br /> | text=<br /> 1. If $y$ takes non-negative values, a log transformation can be used: $\transy(y) = \log(y)$. 
We can then present the model with one of two equivalent representations:<br /> <br /> &lt;!-- Therefore, $y=f e^{g\teps}$. --&gt;<br /> <br /> {{Equation1<br /> |equation= &lt;math&gt; \begin{eqnarray}<br /> \log(y_{ij})&amp;=&amp;\log(f(t_{ij},\psi_i))+ g(t_{ij},\psi_i)\teps_{ij}, \\<br /> y_{ij}&amp;=&amp;f(t_{ij},\psi_i)\, e^{ \displaystyle{ g(t_{ij},\psi_i)\teps_{ij} } }.<br /> \end{eqnarray}&lt;/math&gt;<br /> }}<br /> <br /> <br /> ::[[File: continuous_graf5a.png|link=]]<br /> <br /> <br /> 2. If $y$ takes its values between 0 and 1, a logit transformation can be used:<br /> &lt;!-- %\begin{eqnarray*}<br /> %\transy(y)&amp;=&amp;\log(y/(1-y)) \\<br /> % y&amp;=&amp;\frac{f}{f+(1-f) e^{-g\teps}} .<br /> %\end{eqnarray*} --&gt;<br /> <br /> {{Equation1<br /> |equation= &lt;math&gt; \begin{eqnarray}<br /> \logit(y_{ij})&amp;=&amp;\logit(f(t_{ij},\psi_i))+ g(t_{ij},\psi_i)\teps_{ij} , \\<br /> y_{ij}&amp;=&amp; \displaystyle{\frac{ f(t_{ij},\psi_i) }{ f(t_{ij},\psi_i) + (1- f(t_{ij},\psi_i)) \, e^{ -g(t_{ij},\psi_i)\teps_{ij} } } }.<br /> \end{eqnarray}&lt;/math&gt;<br /> }}<br /> <br /> <br /> ::[[File:continuous_graf5b.png|link=]]<br /> <br /> <br /> 3.
The logit error model can be extended if the $y_{ij}$ are known to take their values in an interval $[A,B]$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \begin{eqnarray}<br /> \transy(y_{ij})&amp;=&amp;\log((y_{ij}-A)/(B-y_{ij})), \\<br /> y_{ij}&amp;=&amp;A+(B-A)\displaystyle{\frac{f(t_{ij},\psi_i)-A}{f(t_{ij},\psi_i)-A+(B-f(t_{ij},\psi_i)) e^{-g(t_{ij},\psi_i)\teps_{ij} } } }\, .<br /> \end{eqnarray}&lt;/math&gt;<br /> }}<br /> &lt;!-- [[File:continuous_graf5c.png]] --&gt;<br /> }}<br /> <br /> <br /> Using the transformation proposed in [[#def_t|(7)]], the conditional density $\pcypsi$ becomes<br /> <br /> {{EquationWithRef<br /> |equation= &lt;div id=&quot;likeN_model3&quot; &gt;&lt;math&gt; \begin{eqnarray}<br /> \pcypsi(\by {{!}} \bpsi ) &amp; = &amp; \prod_{i=1}^N \prod_{j=1}^{n_i} \pypsiij(y_{ij} {{!}} \psi_i ) \\<br /> &amp; = &amp; \prod_{i=1}^N \prod_{j=1}^{n_i} \transy^\prime(y_{ij}) \, \ptypsiij(\transy(y_{ij}) {{!}} \psi_i ) \\<br /> &amp; = &amp; \prod_{i=1}^N \prod_{j=1}^{n_i} \displaystyle{ \frac{\transy^\prime(y_{ij})}{g(t_{ij},\psi_i)} } \, \qeps\left(\frac{\transy(y_{ij}) - \transy(f(t_{ij},\psi_i))}{g(t_{ij},\psi_i)}\right)<br /> \end{eqnarray}<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(8) }}<br /> <br /> For example, if the observations are log-normally distributed given the individual parameters ($\transy(y) = \log(y)$), with a constant error model ($g(t;\psi_i)=a$), then<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \pcypsi(\by {{!}} \bpsi ) = \prod_{i=1}^N \prod_{j=1}^{n_i} \displaystyle{ \frac{1}{\sqrt{2 \pi a^2} \, y_{ij} } }\, \exp\left\{-\frac{1}{2 \, a^2}\left(\log(y_{ij}) - \log(f(t_{ij},\psi_i))\right)^2\right\}.<br /> &lt;/math&gt; }} <br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Censored data ==<br /> <br /> <br /> Censoring occurs when the value of a measurement or observation is only partially known.<br /> For continuous data measurements in the longitudinal context, censoring refers to the values of 
the measurements, not the times at which they were taken.<br /> <br /> For example, in analytical chemistry, the lower limit of detection (LLOD) is the lowest quantity of a substance that can be distinguished from the absence of that substance. Therefore, any time the quantity is below the LLOD, the &quot;measurement&quot; is not a number but the information that the quantity is less than the LLOD.<br /> <br /> Similarly, in pharmacokinetic studies, measurements of the concentration below a certain limit, referred to as the lower limit of quantification (LLOQ), are so low that their reliability is considered suspect. A measuring device can also have an upper limit of quantification (ULOQ) such that any value above this limit cannot be measured and reported.<br /> <br /> As hinted above, censored values are not typically reported as a number, but their existence is known, as well as the type of censoring. Thus, the observation $\repy_{ij}$ (i.e., what is reported) is the measurement $y_{ij}$ if not censored, and the type of censoring otherwise.<br /> <br /> We usually distinguish three types of censoring: left, right and interval. We now introduce these, along with illustrative data sets.<br /> <br /> <br /> * '''Left censoring''': a data point is below a certain value $L$ but it is not known by how much:<br /> <br /> {{Equation1<br /> |equation = &lt;math&gt; <br /> \repy_{ij} = \left\{ \begin{array}{cc}<br /> y_{ij} &amp; {\rm if } \ y_{ij} \geq L \\<br /> y_{ij} &lt; L &amp; {\rm otherwise.}<br /> \end{array} \right. &lt;/math&gt; }} <br /> <br /> &lt;blockquote&gt;In the figures below, the &quot;data&quot; below the limit $L=-0.30$, shown in gray, is not observed. The values are therefore not reported in the dataset. An additional column {{Verbatim|cens}} can be used to indicate if an observation is left-censored ({{Verbatim|cens{{-}}1}}) or not ({{Verbatim|cens{{-}}0}}).
The column of observations {{Verbatim|log-VL}} displays the observed log-viral load when it is above the limit $L=-0.30$, and the limit $L=-0.30$ otherwise.&lt;/blockquote&gt;<br /> <br /> <br /> {| cellspacing=&quot;0&quot; cellpadding=&quot;0&quot; <br /> | style=&quot;width=60%&quot; |<br /> [[File:continuous_graf6a.png|link=]]<br /> | style=&quot;width=40%&quot; align=&quot;right&quot;|<br /> {| class=&quot;wikitable&quot; style=&quot;width: 150%&quot;<br /> !| ID || TIME ||log-VL || cens<br /> |- <br /> | 1 || 1.0 || 0.26 || 0<br /> |-<br /> | 1 || 2.0 || 0.02 || 0<br /> |-<br /> | 1 || 3.0 || -0.13 || 0<br /> |-<br /> | 1 || 4.0 || -0.13 || 0<br /> |-<br /> | 1 || 5.0 || -0.30 || 1<br /> |-<br /> | 1 || 6.0 || -0.30 || 1<br /> |-<br /> | 1 || 7.0 || -0.25 || 0<br /> |-<br /> | 1 || 8.0 || -0.30 || 1<br /> |-<br /> | 1 || 9.0 || -0.29 || 0<br /> |-<br /> | 1 || 10.0 || -0.30 || 1<br /> |}<br /> |}<br /> <br /> <br /> * '''Interval censoring:''' if a data point is in interval $I$, its exact value is not known:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \repy_{ij} = \left\{ \begin{array}{cc}<br /> y_{ij} &amp; {\rm if } \ y_{ij}\notin I \\<br /> y_{ij} \in I &amp; {\rm otherwise.}<br /> \end{array} \right. &lt;/math&gt; }}<br /> <br /> &lt;blockquote&gt;For example, suppose we are measuring a concentration which naturally only takes non-negative values, but again we cannot measure it below the level $L = 1$. Therefore, any data point $y_{ij}$ below $1$ will be recorded only as &quot;$y_{ij} \in [0,1)$&quot;. 
In the table, an additional column {{Verbatim|llimit}} is required to indicate the lower bound of the censoring interval.&lt;/blockquote&gt;<br /> <br /> <br /> {| cellspacing=&quot;0&quot; cellpadding=&quot;0&quot; <br /> | style=&quot;width=60%&quot; |<br /> [[File:continuous_graf6b.png|link=]]<br /> | style=&quot;width=40%&quot; align=&quot;right&quot;|<br /> {| class=&quot;wikitable&quot; style=&quot;width: 150%&quot;<br /> !| ID || TIME ||CONC. || llimit || cens<br /> |-<br /> | 1 || 0.3 || 1.20 || . || 0<br /> |-<br /> | 1 || 0.5 || 1.93 || . || 0<br /> |-<br /> | 1 || 1.0 || 3.38 || . || 0<br /> |-<br /> | 1 || 2.0 || 3.88 || . || 0<br /> |-<br /> | 1 || 4.0 || 3.24 || . || 0<br /> |-<br /> | 1 || 6.0 || 1.82 || . || 0<br /> |-<br /> | 1 || 8.0 || 1.07 || . || 0<br /> |-<br /> | 1 || 12.0 || 1.00 || 0.00 || 1<br /> |-<br /> | 1 || 16.0 || 1.00 || 0.00 || 1<br /> |-<br /> | 1 || 20.0 || 1.00 || 0.00 || 1<br /> |}<br /> |}<br /> <br /> <br /> <br /> * '''Right censoring:''' when a data point is above a certain value $U$, it is not known by how much:<br /> <br /> {{Equation1<br /> |equation= &lt;math&gt; \repy_{ij} = \left\{ \begin{array}{cc}<br /> y_{ij} &amp; {\rm if } \ y_{ij}\leq U \\<br /> y_{ij} &gt; U &amp; {\rm otherwise.}<br /> \end{array} \right. 
<br /> &lt;/math&gt; }}<br /> <br /> &lt;blockquote&gt;Column {{Verbatim|cens}} is used to indicate if an observation is right-censored ({{Verbatim|cens{{-}}-1}}) or not ({{Verbatim|cens{{-}}0}}).<br /> &lt;/blockquote&gt;<br /> <br /> {| cellspacing=&quot;0&quot; cellpadding=&quot;0&quot; <br /> | style=&quot;width=60%&quot; |<br /> [[File:continuous_graf6c.png|link=]]<br /> | style=&quot;width=40%&quot; align=&quot;right&quot; |<br /> {| class=&quot;wikitable&quot; style=&quot;width: 150%&quot;<br /> !| ID || TIME ||VOLUME || CENS<br /> |-<br /> | 1 || 2.0 || 1.85 || 0<br /> |-<br /> | 1 || 7.0 || 2.40 || 0<br /> |-<br /> | 1 || 12.0 || 3.27 || 0<br /> |-<br /> | 1 || 17.0 || 3.28 || 0<br /> |-<br /> | 1 || 22.0 || 3.62 || 0<br /> |- <br /> | 1 || 27.0 || 3.02 || 0<br /> |-<br /> | 1 || 32.0 || 3.80 || -1<br /> |-<br /> | 1 || 37.0 || 3.80 || -1<br /> |-<br /> | 1 || 42.0 || 3.80 || -1<br /> |-<br /> | 1 || 47.0 || 3.80 || -1<br /> |}<br /> |}<br /> <br /> <br /> <br /> {{Remarks<br /> |title=Remarks<br /> <br /> |text= &amp;#32;<br /> * Different censoring limits and intervals can be in play at different times and for different individuals.<br /> * Interval censoring covers any type of censoring, i.e., setting $I=(-\infty,L]$ for left censoring and $I=[U,+\infty)$ for right censoring.<br /> }}<br /> <br /> <br /> The likelihood needs to be computed carefully in the presence of censored data. To cover all three types of censoring in one go, let $I_{ij}$ be the (finite or infinite) censoring interval existing for individual $i$ at time $t_{ij}$. 
Then,<br /> <br /> {{EquationWithRef<br /> |equation = &lt;div id=&quot;likeN_model4&quot;&gt;&lt;math&gt; <br /> \begin{eqnarray} \pcypsi(\brepy {{!}} \bpsi ) &amp; = &amp; \prod_{i=1}^N \prod_{j=1}^{n_i} \pypsiij(y_{ij} {{!}} \psi_i )^{\mathbf{1}_{y_{ij} \notin I_{ij} } } \, \prob{y_{ij} \in I_{ij} {{!}} \psi_i}^{\mathbf{1}_{y_{ij} \in I_{ij} } },<br /> \end{eqnarray}<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(9) }}<br /> <br /> where<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \prob{y_{ij} \in I_{ij} {{!}} \psi_i} = \int_{I_{ij} } \qypsiij(u {{!}} \psi_i )\, du. &lt;/math&gt; }}<br /> <br /> We see that if $y_{ij}$ is not censored (i.e., $\mathbf{1}_{y_{ij} \notin I_{ij}} = 1$), the contribution to the likelihood is the usual $\pypsiij(y_{ij} | \psi_i )$, whereas if it is censored, the contribution is $\prob{y_{ij} \in I_{ij}|\psi_i}$.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Extensions to multidimensional continuous observations == <br /> <br /> <br /> &lt;ul&gt;<br /> * Extension to multidimensional observations is straightforward. If $d$ outcomes are simultaneously measured at $t_{ij}$, then $y_{ij}$ is now a vector in $\Rset^d$ and we can suppose that equation [[#nlme|(1)]] still holds for each component of $y_{ij}$. Thus, for $1\leq m \leq d$,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> y_{ijm}=f_m(t_{ij},\psi_i)+ g_m(t_{ij},\psi_i)\teps_{ijm} , \ \ 1\leq i \leq N,<br /> \ \ 1 \leq j \leq n_i.<br /> &lt;/math&gt;}}<br /> <br /> : It is then possible to introduce correlation between the components of each observation by assuming that $\teps_{ij} = (\teps_{ijm} , 1\leq m \leq d)$ is a random vector with mean 0 and correlation matrix $R_{\teps_{ij}}$.<br /> <br /> <br /> * Suppose instead that $K$ replicates of the same measurement are taken at time $t_{ij}$.
Then, the model becomes, for $1 \leq k \leq K$,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> y_{ijk}=f(t_{ij},\psi_i)+ g(t_{ij},\bpsi_i)\teps_{ijk} ,\ \ 1\leq i \leq N,<br /> \ \ 1 \leq j \leq n_i .<br /> &lt;/math&gt; }}<br /> <br /> : Following what can be done for decomposing random effects into inter-individual and inter-occasion components, we can decompose the residual error into inter-measurement and inter-replicate components:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> y_{ijk}=f(t_{ij},\psi_i)+ g_{I\!M}(t_{ij},\psi_i)\vari{\teps}{ij}{I\!M} + g_{I\!R}(x_{ij},\psi_i)\vari{\teps}{ijk}{I\!R} .<br /> &lt;/math&gt; }}<br /> &lt;/ul&gt;<br /> &lt;br&gt;&lt;br&gt;<br /> -----------------------------------------------<br /> &lt;br&gt;&lt;br&gt;<br /> <br /> {{Summary<br /> |title=Summary <br /> |text= <br /> A model for continuous data is completely defined by:<br /> <br /> *The structural model $f$<br /> *The residual error model $g$<br /> *The probability distribution of the residual errors $(\teps_{ij})$<br /> *Possibly a transformation $\transy$ of the data<br /> <br /> <br /> The model is associated with a design which includes:<br /> <br /> <br /> - the observation times $(t_{ij})$<br /> <br /> - possibly some additional regression variables $(x_{ij})$<br /> <br /> - possibly the inputs $(u_i)$ (e.g., the dosing regimen for a PK model)<br /> <br /> - possibly a censoring process $(I_{ij})$<br /> <br /> }}<br /> <br /> <br /> == $\mlxtran$ for continuous data models == <br /> <br /> <br /> <br /> {{ExampleWithCode<br /> |title1=Example 1:<br /> |title2=<br /> <br /> |text= <br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \psi &amp;=&amp; (A,\alpha,B,\beta, a) \\<br /> f(t,\psi) &amp;=&amp; A\, e^{- \alpha \, t} + B\, e^{- \beta \, t} \\<br /> y_{ij} &amp;=&amp; f(t_{ij} , \psi_i) + a\, \teps_{ij}<br /> \end{eqnarray}&lt;/math&gt;<br /> |code=<br /> {{MLXTranForTable<br /> |name=<br /> |text=<br /> &lt;pre 
style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> INPUT:<br /> input = {A, B, alpha, beta, a}<br /> <br /> EQUATION:<br /> f = A*exp(-alpha*t) + B*exp(-beta*t)<br /> <br /> DEFINITION:<br /> y = {distribution=normal, prediction=f, std=a}&lt;/pre&gt;<br /> }}<br /> <br /> }}<br /> <br /> <br /> {{ExampleWithCode<br /> |title1=Example 2:<br /> |title2=<br /> <br /> |text=<br /> |equation= &lt;math&gt; \begin{eqnarray}<br /> \psi &amp;=&amp; (\delta, c , \beta, p, s, d, \nu,\rho, a) \\<br /> t_0 &amp;=&amp;0 \\[0.2cm]<br /> {\rm if \quad t&lt;t_0} \\[0.2cm]<br /> \quad \nitc &amp;=&amp; \delta \, c/( \beta \, p) \\<br /> \quad \itc &amp;=&amp; (s - d\,\nitc) / \delta \\<br /> \quad \vl &amp;=&amp; p \, \itc / c. \\[0.2cm] <br /> {\rm else \quad \quad }\\[0.2cm] <br /> \quad \dA{\nitc}{} &amp; =&amp; s - \beta(1-\nu) \, \nitc(t) \, \vl(t) - d\,\nitc(t) \\<br /> \quad \dA{\itc}{} &amp; = &amp;\beta(1-\nu) \, \nitc(t) \, \vl(t) - \delta \, \itc(t) \\<br /> \quad \dA{\vl}{} &amp; = &amp;p(1-\rho) \, \itc(t) - c \, \vl(t) \\<br /> \quad \log(y_{ij}) &amp;= &amp;\log(V(t_{ij} , \psi_i)) + a\, \teps_{ij} <br /> \end{eqnarray}&lt;/math&gt;<br /> |code=<br /> {{MLXTranForTable<br /> |name=<br /> |text=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> INPUT:<br /> input = {delta, c, beta, p, s, d, nu, rho, a}<br /> <br /> EQUATION:<br /> t0=0<br /> N_0 = delta*c/(beta*p)<br /> I_0 = (s - d*N_0)/delta<br /> V_0 = p*I_0/c<br /> ddt_N = s - beta*(1-nu)*N*V - d*N<br /> ddt_I = beta*(1-nu)*N*V - delta*I<br /> ddt_V = p*(1-rho)*I - c*V<br /> <br /> DEFINITION:<br /> y = {distribution=logNormal, prediction=V, std=a}<br /> &lt;/pre&gt; }} <br /> }}<br /> <br /> &lt;br&gt;&lt;br&gt;<br /> <br /> <br /> ==Bibliography==<br /> <br /> <br /> &lt;bibtex&gt;<br /> @book{davidian1995,<br /> author = {Davidian, M. and Giltinan, D.M. 
},<br /> title = {Nonlinear Models for Repeated Measurements Data },<br /> publisher = {Chapman &amp; Hall.},<br /> address = {London},<br /> edition = {},<br /> year = {1995}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{demidenko2005mixed,<br /> title={Mixed Models: Theory and Applications},<br /> author={Demidenko, E.},<br /> isbn={9780471726135},<br /> series={Wiley Series in Probability and Statistics}, url={http://books.google.fr/books/about/Mixed_Models.html?id=IWQR8d_UZHoC&amp;redir_esc=y}, <br /> year={2005}, publisher={Wiley}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{fitzmaurice2008longitudinal,<br /> title={Longitudinal Data Analysis},<br /> author={Fitzmaurice, G. and Davidian, M. and Verbeke, G. and Molenberghs, G.},<br /> isbn={9781420011579},<br /> lccn={2008020681},<br /> series={Chapman &amp; Hall/CRC Handbooks of Modern Statistical Methods},url={http://books.google.fr/books?id=zVBjCvQCoGQC},<br /> year={2008},publisher={Taylor &amp; Francis}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{jiang2007,<br /> author = {Jiang, J.},<br /> title = {Linear and Generalized Linear Mixed Models and Their Applications},<br /> publisher = {Springer Series in Statistics},<br /> year = {2007},<br /> address = {New York}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{laird1982,<br /> author = {Laird, N.M. and Ware, J.H.},<br /> title = {Random-Effects Models for Longitudinal Data},<br /> journal = {Biometrics},<br /> volume = {38},<br /> pages = {963-974},<br /> year = {1982}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{lindstrom1990Nonlinear,<br /> author = {Lindstrom, M.J. and Bates, D.M. 
},<br /> title = {Nonlinear mixed-effects models for repeated measures},<br /> journal = {Biometrics},<br /> volume = {46},<br /> pages = {673-687},<br /> year = {1990}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{littell2006sas,<br /> title={SAS for mixed models},<br /> author={Littell, R.C.},<br /> year={2006},<br /> publisher={SAS institute}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{mcculloch2011generalized,<br /> title={Generalized, Linear, and Mixed Models},<br /> author={McCulloch, C.E. and Searle, S.R.},<br /> isbn={9781118209967},<br /> series={Wiley Series in Probability and Statistics}, url={http://books.google.fr/books/about/Generalized_Linear_and_Mixed_Models.html?id=bWDPukohugQC&amp;redir_esc=y}, year={2004}, publisher={Wiley &amp; Sons} <br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{verbeke2009linear,<br /> title={Linear Mixed Models for Longitudinal Data},<br /> author={Verbeke, G. and Molenberghs, G.},<br /> isbn={9781441902993},<br /> lccn={2010483807},<br /> series={Springer Series in Statistics},<br /> url={http://books.google.fr/books?id=jmPkX4VU7h0C},<br /> year={2009},<br /> publisher={Springer}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{west2006linear,<br /> title={Linear Mixed Models: A Practical Guide Using Statistical Software},<br /> author={West, B. and Welch, K.B. and Galecki, A.T.},<br /> isbn={9781584884804},<br /> lccn={2006045440},year={2006},publisher={Taylor &amp; Francis}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> <br /> {{Back&amp;Next<br /> |linkBack=Modeling the observations <br /> |linkNext=Models for count data }}</div> Admin http://wiki.webpopix.org/index.php/Description,_representation_and_implementation_of_a_model Description, representation and implementation of a model 2013-06-21T08:43:22Z <p>Admin : </p> <hr /> <div>==Introduction==<br /> <br /> A &quot;model&quot; can be implemented in the real world if it can be programmed using software. 
To do this, we need a language that can be understood by the software. Before even arriving at this point, it is important to be very clear and systematic about what a model is and how we want to use it.<br /> <br /> It is fundamental to distinguish between the description, representation and implementation of a model. Each of these three concepts uses a specific language.<br /> {| cellpadding=&quot;15&quot; cellspacing=&quot;15&quot;<br /> |style=&quot;width:500px&quot;| 1. First, we describe a model with words, i.e., a human language: || &lt;span style=&quot;font-family:comic sans ms;font-size:11pt&quot;&gt;&quot;The weight is a linear function of the height&quot;&lt;/span&gt;<br /> |-<br /> |2. Then we represent the model using a mathematical or schematic language: || &lt;math&gt; W=a\,H + b &lt;/math&gt;<br /> |-<br /> |3. Lastly, we implement the model via a language understood by the software: || {{Verbatim|WEIGHT {{-}} a*HEIGHT + b}}<br /> |} <br /> <br /> <br /> The representation of a model is not unique. The choice of representation should be driven by the tasks to be executed: if the model is only used to perform computations, a system of equations contains all the information required. If properties of the model need to be tested (linearity, [http://en.wikipedia.org/wiki/Homoscedasticity homoscedasticity], etc.), then they need to be represented via explicit definitions.<br /> <br /> In the context of mixed-effects models, the models that we want to implement can be decomposed into two components: the structural model and the statistical model.
Both components have to be described, represented and implemented with precision.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Description, representation and implementation of the structural model==<br /> <br /> Let us now look at these three steps in more detail.<br /> <br /> &lt;br&gt;<br /> ===Description of the structural model===<br /> <br /> The first step consists of describing a model with high precision, using terminology and vocabulary well-adapted to the application. For instance, let us consider a [http://en.wikipedia.org/wiki/Pharmacokinetics PK] model that describes drug concentration as a function of time. We can describe the model with the sentence:<br /> <br /> &lt;blockquote&gt;<br /> &quot;''The PK model is a two-compartment model with first-order absorption (from the depot compartment - the gut, to the central compartment - the bloodstream), linear transfers between the central and the peripheral compartment, and linear elimination from the central compartment''&quot;.<br /> &lt;/blockquote&gt;<br /> &lt;br&gt;<br /> <br /> ===Representation of the structural model===<br /> <br /> <br /> 1. ''Using a diagram''<br /> <br /> This PK model can be represented by a diagram like the one shown in the following figure. Such diagrams offer both a descriptive and explicit representation (because the properties of the PK model are clearly shown).<br /> <br /> <br /> :::[[File:intro41.png|400px|link=]]<br /> <br /> <br /> 2.
''Using mathematical equations''<br /> <br /> Alternatively, a mathematical representation can be used to translate the description of the model into a system of equations:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \deriv{A_d} &amp; =&amp; -k_a A_d(t) \\<br /> &lt;!--%\deriv{A_c} &amp; =&amp; k_a A_d(t) -k_{12}A_c(t) + k_{21}A_p(t) - \frac{V_m}{V\,K_m + A_c(t)} A_c(t) \\--&gt;<br /> \deriv{A_c} &amp; =&amp; k_a A_d(t) -k_{12}A_c(t) + k_{21}A_p(t) - k_e A_c(t) \\<br /> \deriv{A_p} &amp; =&amp; k_{12}A_c(t) - k_{21}A_p(t) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> This representation allows us to calculate the amount of drug in each compartment at any point in time. On the other hand, this description of the model is implicit: even if a modeler is able to recognize the model described by the equations, i.e., to identify the processes of [http://en.wikipedia.org/wiki/Absorption_%28pharmacokinetics%29 absorption], [http://en.wikipedia.org/wiki/Distribution_%28pharmacology%29 distribution] and elimination, these are not explicitly represented as they are in the diagram.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Implementation of the structural model===<br /> <br /> <br /> <br /> 1. ''Using macros''<br /> <br /> The $\mlxtran$ language allows us to implement the model represented in the previous diagram using a simple script and a system of macros:<br /> <br /> <br /> {{MLXTran<br /> |name= - version 1<br /> |text=&lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt; <br /> PK:<br /> compartment(amount=Ac)<br /> oral(ka)<br /> peripheral(k12,k21)<br /> elimination(ke)<br /> &lt;/pre&gt; }}<br /> <br /> As you can see, there is a one-to-one mapping between the diagram and the code: each element of the diagram (and therefore of the model) is implemented as a macro.<br /> <br /> <br /> 2.
''Using equations''<br /> <br /> Alternatively, implementation of the model using the mathematical representation requires entering the system of equations into $\mlxtran$. The syntax used should be as close as possible to the original mathematical language in order to make development simple and the code easy to parse. Here is the $\mlxtran$ syntax in this case:<br /> <br /> <br /> {{MLXTran<br /> |name= - version 2<br /> |text=&lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt; <br /> EQUATION:<br /> ddt_Ad = -ka*Ad <br /> ddt_Ac = ka*Ad - k12*Ac + k21*Ap - ke*Ac<br /> ddt_Ap = k12*Ac - k21*Ap<br /> &lt;/pre&gt; }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Which representation for which task?===<br /> <br /> <br /> It is fundamental to have the possibility of using several representations, and therefore several implementations, depending on the task at hand. One reason is that each kind of implementation has its pros and cons.<br /> <br /> The use of equations has the big advantage of being able to represent ''any'' complex model. This is not possible when using macros, which are fixed in number by default. For instance, the PK macros in $\mlxtran$ allow us to code linear and nonlinear ([http://en.wikipedia.org/wiki/Michaelis%E2%80%93Menten_kinetics Michaelis-Menten]) elimination, but no macro exists that can combine the two types of elimination. In contrast, such processes can be easily input using equations:<br /> <br /> ::{{Verbatim |ddt_Ac{{-}} ka*Ad - k12*Ac + k21*Ap - k*Ac - Vm*Ac/(Km*V + Ac) }}<br /> <br /> In a similar vein, models that are well-defined mathematically may be horribly complex to implement using equations, but easy using macros. This is true for instance for [http://en.wikipedia.org/wiki/Dynamical_system dynamical systems] with source terms such as PK models with repeated oral doses and zero-order absorption.
In that example, the absorption rate is a piecewise-constant function.<br /> It is not easy to code this model using equations, and not worth it when we can quickly use the $\mlxtran$ macro {{Verbatim|oral(Tk0)}}, which completely characterizes the model for any dose design. The C++ code generated from an $\mlxtran$ script that uses this macro is the same as the one (that would be) generated by a script using a system of equations.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Description, representation and implementation of the statistical model==<br /> <br /> The statistical component of the model can be decomposed into two sub-models: a model that describes the variability of the parameters and a model that describes the variability of the observations. Each sub-model needs to be described, represented and implemented. Let us illustrate this approach with a very simple statistical model used for modeling the variability of a single individual parameter.<br /> <br /> &lt;br&gt;<br /> ===Description of the statistical model===<br /> In this example we want to describe the distribution of the volume in the population, using weight as a covariate. The first step consists of describing with extreme precision the statistical model that we want to use:<br /> <br /> <br /> &lt;ol&gt;<br /> &lt;li&gt;''Individuals in the population are mutually independent''&lt;/li&gt;<br /> &lt;li&gt;''The volume is log-normally distributed''&lt;/li&gt;<br /> &lt;li&gt;''The log-volume predicted by the model is a linear function of the log-weight''&lt;/li&gt;<br /> &lt;li&gt;''The reference weight in the population is 70 kg''&lt;/li&gt;<br /> &lt;li&gt;''The variance of the log-volume is constant.''&lt;/li&gt;<br /> &lt;/ol&gt;<br /> <br /> &lt;br&gt;<br /> <br /> ===Representation of the statistical model===<br /> <br /> <br /> Since this model involves probability distributions, we will use a probabilistic model to represent it.
Let $V_i$ and $w_i$ be the volume and weight of individual $i$. Statement 1 implies that only the conditional distribution $p(V_i | w_i)$ for individual $i$ needs to be represented. A probability distribution can be mathematically represented by a series of definitions and equations. This mathematical representation is not unique. We can use for instance any of these three representations:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:ex2a&quot;&gt;&lt;math&gt;\begin{eqnarray} <br /> V_i &amp;=&amp; \Vpop \left(\displaystyle{ \frac{w_i}{70} }\right)^\beta \, e^{\eta_i} \quad \text{where} \quad \eta_i \sim {\cal N}(0, \omega^2)<br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:ex2b&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \hat{V}_i &amp;=&amp; \Vpop \left(\displaystyle{ \frac{w_i}{70} }\right)^\beta \quad \text{and} \quad<br /> \log(V_i) \sim {\cal N}(\log(\hat{V}_i) , \omega^2)<br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:ex2c&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \tilde{w}_i&amp; =&amp; \log\left(\displaystyle{ \frac{w_i}{70} }\right) \quad \text{and} \quad \log(V_i) \sim {\cal N}(\log(\Vpop)+\beta \, \tilde{w}_i , \omega^2) .<br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt; <br /> |reference=(3) }}<br /> Here, $\omega$ is the standard deviation of the log-volume, $\Vpop$ is a reference value of volume in the population for a reference individual of 70kg and $\Vpop (w_i/70)^\beta$ the predicted volume for an individual with weight $w_i$.<br /> <br /> These three representations combine equations and definitions. 
The equations allow us to define the variables via algebraic equations, while the definitions characterize the random variables via probability distributions.<br /> <br /> <br /> &lt;br&gt;<br /> ===Implementation of the statistical model===<br /> <br /> <br /> The implementation of such models with $\mlxtran$ allows us to use the same definitions and equations directly, with a language very close to the mathematical one. The model in [[#eq:ex2a|(1)]] can be implemented in the following way:<br /> <br /> <br /> {{MLXTran<br /> |name=<br /> |text= &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;DEFINITION:<br /> eta = {distribution=normal, mean=0, standardDeviation=omega}<br /> <br /> EQUATION:<br /> V = Vpop*((w/70)^beta)*exp(eta)<br /> &lt;/pre&gt; }}<br /> <br /> <br /> The model in [[#eq:ex2b|(2)]] can be implemented this way:<br /> <br /> <br /> {{MLXTran<br /> |name=<br /> |text= &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;EQUATION:<br /> Vpred = Vpop*(w/70)^beta<br /> <br /> DEFINITION:<br /> V = {distribution=logNormal, prediction=Vpred, standardDeviation=omega}<br /> &lt;/pre&gt; }}<br /> <br /> <br /> The model in [[#eq:ex2c|(3)]] can be implemented like this:<br /> <br /> <br /> {{MLXTran<br /> |name=<br /> |text= &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;EQUATION:<br /> lw70 = log(w/70)<br /> <br /> DEFINITION[model=linear]:<br /> V = {distribution=logNormal, reference=Vpop, covariate=lw70,<br /> covariateCoefficient=beta, standardDeviation=omega}<br /> &lt;/pre&gt; }}<br /> <br /> Note that the linearity of the model is information that is explicitly entered.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Which representation for which task?===<br /> <br /> Representations [[#eq:ex2a|(1)]], [[#eq:ex2b|(2)]] and [[#eq:ex2c|(3)]] provide three different mathematical representations of the same probabilistic model.
This means that when any of them are written in text or on a slide, anyone with some basic knowledge in statistics and mathematics will be able to derive the same information from any of the representations.<br /> <br /> However, if we want to use the model to perform tasks using specific software, the information passed to the software needs to be of a form that the software can understand with respect to each given task. It is not always true that any representation paired with any implementation can be used to perform any task. Let us illustrate this with our example for three basic tasks: simulation, likelihood computation and covariate model assessment.<br /> <br /> <br /> '''Simulation.''' If we assume that the software we use is able to simulate normal random variables with any given mean and standard deviation, then any representation of the model can be used for simulation:<br /> <br /> <br /> &lt;ul&gt;<br /> * Using [[#eq:ex2a|(1)]], $\eta_i$ is first simulated as a normal random variable with mean 0 and variance $\omega^2$. Then the volume $V_i$ is calculated as a function of $\eta_i$.<br /> <br /> * Using [[#eq:ex2b|(2)]] or [[#eq:ex2c|(3)]], $\log(V_i)$ can be directly simulated as a normal random variable with mean $\log(\Vpop)+\beta \log\left(w_i/70\right)$, or equivalently $\log(\Vpop(w_i/70)^\beta)$, and standard deviation $\omega$. Then $V_i = \exp\left(\log(V_i)\right)$.<br /> &lt;/ul&gt;<br /> <br /> <br /> In summary, what is required for simulation is the capacity to express the variable to be simulated as a function of some random variable that can be directly simulated by the software. In conclusion, any of the three $\mlxtran$ implementations proposed above can be used for simulation.<br /> <br /> <br /> '''Likelihood computation.''' By definition, the likelihood of a set of parameter values given some continuous observed outcomes is equal to the probability density function (pdf) of those observed outcomes given those parameters.
In other words, deriving the likelihood of $\theta=(\Vpop,\beta,\omega^2)$ requires computation of the pdf of $V_i$ or a certain function of it. Here, it is straightforward to derive the likelihood from the pdf of $V_i$, which is log-normally distributed:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;likelihood1&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> L_1(\theta ; V_1,\ldots,V_N) &amp;=&amp; \py(V_1,V_2,\ldots,V_N ;\theta) \\<br /> &amp; = &amp; \prod_{i=1}^N \py( V_i ;\theta) \\<br /> &amp; = &amp; \prod_{i=1}^N \displaystyle{ \frac{1}{\sqrt{2\pi \omega^2}\,V_i} } \exp\left\{-\displaystyle{ \frac{1}{2\omega^2} } \left( \log(V_i) - \log\left(\Vpop \left({w_i}/{70}\right)^\beta\right) \right)^2\right\} .<br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(4) }}<br /> <br /> It is also straightforward to derive the likelihood from the pdf of $\log(V_i)$, which is normally distributed:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;likelihood2&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> L_2(\theta ; \log(V_1),\ldots,\log(V_N)) &amp;=&amp; \py(\log(V_1),\ldots,\log(V_N) ;\theta) \\<br /> &amp; = &amp; \prod_{i=1}^N \displaystyle{ \frac{1}{\sqrt{2\pi \omega^2} } }\exp\left\{-\frac{1}{2\omega^2} \left( \log(V_i) - \log\left(\Vpop \left({w_i}/{70}\right)^\beta\right) \right)^2 \right\} .<br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(5) }}<br /> <br /> These two likelihoods $L_1$ and $L_2$ differ only by the factor $\prod_i V_i$, which does not depend on $\theta$. Whichever definition of the likelihood is used, some information about the pdf of $V_i$ must nonetheless be provided in order to compute it.
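The equality of these two likelihoods up to the factor $\prod_i V_i$ is easy to check numerically. Below is a minimal Python sketch (all weights, volumes and parameter values are illustrative assumptions, not taken from the text), with $\omega$ the standard deviation of the log-volume:

```python
import math

# Illustrative (assumed) data and parameters
weights = [55.0, 70.0, 90.0]          # w_i, in kg
volumes = [8.5, 10.2, 12.1]           # observed V_i
V_pop, beta, omega = 10.0, 0.75, 0.2  # theta = (V_pop, beta, omega^2)

def mu(w):
    """Mean of log(V_i) for an individual of weight w."""
    return math.log(V_pop) + beta * math.log(w / 70.0)

def log_L1():
    """Log-likelihood based on the log-normal pdf of V_i."""
    return sum(-math.log(v) - 0.5 * math.log(2 * math.pi * omega ** 2)
               - (math.log(v) - mu(w)) ** 2 / (2 * omega ** 2)
               for w, v in zip(weights, volumes))

def log_L2():
    """Log-likelihood based on the normal pdf of log(V_i)."""
    return sum(-0.5 * math.log(2 * math.pi * omega ** 2)
               - (math.log(v) - mu(w)) ** 2 / (2 * omega ** 2)
               for w, v in zip(weights, volumes))
```

On the log scale the two quantities differ exactly by $\sum_i \log V_i$, a constant that does not depend on $\theta$, so maximizing either likelihood gives the same estimate.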
In this very basic example, the minimal information about the model that needs to be passed to the software via code to be able to compute the likelihood is:<br /> <br /> <br /> &lt;ul&gt;<br /> * The log-volume is normally distributed<br /> * The mean of $\log(V_i)$ is $\log\left(\Vpop \left({w_i}/{70}\right)^\beta\right)$<br /> * The standard deviation of $\log(V_i)$ is $\omega$.<br /> &lt;/ul&gt;<br /> <br /> <br /> Then, the likelihood can be easily computed if the software is able to compute a normal pdf for a given mean and standard deviation.<br /> <br /> In our example, only the representations of the model given in [[#eq:ex2b|(2)]] and [[#eq:ex2c|(3)]] (and therefore versions 2 and 3 of the $\mlxtran$ implementation) can be used for computing the likelihood in closed form. Indeed, both representations explicitly describe the probability distribution of $\log(V_i)$ and provide all the required information. On the other hand, the representation given in [[#eq:ex2a|(1)]] does not provide any explicit information about the distribution of $V_i$. Deriving the pdf of $V_i$ from [[#eq:ex2a|(1)]] would therefore require an interpreter to &quot;understand&quot; the formula, and a tool that can perform symbolic computation.<br /> <br /> <br /> '''Covariate model assessment.''' Our model hypothesizes a linear relationship between the log-weight and the log-volume.<br /> To assess whether this is valid, we might visually inspect a diagnostic plot of the (predicted or simulated) log-volume against the log-weight, to see whether the linear relationship seems plausible. Specific statistical procedures can also be used for testing the linearity hypothesis.<br /> <br /> Thus, both displaying an appropriate goodness-of-fit plot and using an appropriate statistical test require knowledge of the explicit relationship between the covariate and the parameter, i.e., the software needs to &quot;know&quot; this relationship.
Neither of the representations of the model based on equations [[#eq:ex2a|(1)]] and [[#eq:ex2b|(2)]] explicitly spells out this relationship to the software. Of course, we can rewrite [[#eq:ex2b|(2)]] as:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \mu_i &amp;=&amp; \log(\Vpop)+\beta \log\left(\displaystyle{ \frac{w_i}{70} }\right) \\<br /> \log(V_i) &amp;\sim&amp; {\cal N}(\mu_i , \omega^2),<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> and clearly &quot;see&quot; that the predicted log-volume is a linear function of the log-weight. The issue is that, without a powerful interpreter, this information is not available to the software, so it cannot automatically run these tasks. Therefore, we must explicitly &quot;tell&quot; the software that the model is linear, as can be done with $\mlxtran$.<br /> <br /> <br /> {{Back<br /> |link=What is a model? A joint probability distribution! }}</div> Admin http://wiki.webpopix.org/index.php/What_is_a_model%3F_A_joint_probability_distribution! What is a model? A joint probability distribution! 2013-06-21T08:40:58Z <p>Admin : </p> <hr /> <div>==Introduction==<br /> <br /> A model built for real-world applications can involve various types of variables, such as measurements, individual and population parameters, covariates, design, etc. The model allows us to represent relationships between these variables.<br /> <br /> If we consider things from a probabilistic point of view, some of the variables will be random, so the model becomes a probabilistic one, representing the [http://en.wikipedia.org/wiki/Joint_probability_distribution joint distribution] of these random variables.<br /> <br /> Defining a model therefore means defining a joint distribution.
The hierarchical structure of the model will then allow it to be decomposed into submodels, i.e., the joint distribution can be decomposed into a product of [http://en.wikipedia.org/wiki/Conditional_probability_distribution conditional distributions].<br /> <br /> Tasks such as estimation, model selection, simulation and optimization can then be expressed as specific ways of using this probability distribution.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - A model is a joint probability distribution. <br /> <br /> - A submodel is a conditional distribution derived from this joint distribution. <br /> <br /> - A task is a specific use of this distribution. <br /> }}<br /> <br /> We will illustrate this approach starting with a very simple example that we will gradually make more sophisticated. Then we will see in various situations what can be defined as the model and what its inputs are.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==An illustrative example==<br /> <br /> &lt;br&gt;<br /> ===A model for the observations of a single individual===<br /> Let $y=(y_j, 1\leq j \leq n)$ be a vector of observations obtained at times $\vt=(t_j, 1\leq j \leq n)$. We consider that the $y_j$ are random variables and we denote $\qy$ the distribution (or [http://en.wikipedia.org/wiki/Probability_density_function pdf]) of $y$. If we assume a [http://en.wikipedia.org/wiki/Parametric_model parametric model], then there exists a vector of parameters $\psi$ that completely defines the distribution of $y$.<br /> <br /> We can then explicitly represent this dependency with respect to $\psi$ by writing $\qy( \, \cdot \, ; \psi)$ for the pdf of $y$.<br /> <br /> If we wish to be even more precise, we can even make it clear that this distribution is defined for a given design, i.e., a given vector of times $\vt$, and write $\qy(\, \cdot \, ; \psi,\vt)$ instead.<br /> <br /> By convention, the variables which are before the symbol &quot;;&quot; are random variables.
Those that are after the &quot;;&quot; are non-random parameters or variables.<br /> When there is no risk of confusion, the non-random terms can be left out of the notation.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - In this context, the model is the distribution of the observations $\qy(\, \cdot \, ; \psi,\vt)$. &lt;br&gt;<br /> - The inputs of the model are the parameters $\psi$ and the design $\vt$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example<br /> |text= 500 mg of a drug is given by [http://en.wikipedia.org/wiki/Intravenous_therapy intravenous] [http://en.wikipedia.org/wiki/Bolus_%28medicine%29 bolus] to a patient at time 0. We assume that the evolution of the [http://en.wikipedia.org/wiki/Blood_plasma plasma] concentration of the drug over time is described by the [http://en.wikipedia.org/wiki/Pharmacokinetics pharmacokinetic] (PK) model<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; f(t;V,k) = \displaystyle{ \frac{500}{V} }e^{-k \, t} , &lt;/math&gt; }}<br /> <br /> where $V$ is the [http://en.wikipedia.org/wiki/Volume_of_distribution volume of distribution] and $k$ the [http://en.wikipedia.org/wiki/Elimination_rate_constant elimination rate constant]. The concentration is measured at times $(t_j, 1\leq j \leq n)$ with additive residual errors:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; y_j = f(t_j;V,k) + e_j , \quad 1 \leq j \leq n . &lt;/math&gt; }}<br /> <br /> Assuming that the residual errors $(e_j)$ are [http://en.wikipedia.org/wiki/Independence_%28probability_theory%29 independent] and [http://en.wikipedia.org/wiki/Normal_distribution normally distributed] with constant variance $a^2$, the observed values $(y_j)$ are also independent random variables and<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba1&quot; &gt;&lt;math&gt;<br /> y_j \sim {\cal N} \left( f(t_j ; V,k) , a^2 \right), \quad 1 \leq j \leq n.
&lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> Here, the vector of parameters $\psi$ is $(V,k,a)$. $V$ and $k$ are the PK parameters for the structural PK model and $a$ the residual error parameter.<br /> As the $y_j$ are independent, the joint distribution of $y$ is the product of their marginal distributions:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \py(y ; \psi,\vt) = \prod_{j=1}^n \pyj(y_j ; \psi,t_j) ,<br /> &lt;/math&gt; }}<br /> <br /> where $\qyj$ is the normal distribution defined in [[#ex_proba1|(1)]]. <br /> }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> === A model for several individuals ===<br /> <br /> Now let us move to $N$ individuals. It is natural to suppose that each is represented by the same basic parametric model, but not necessarily the exact same parameter values. Thus, individual $i$ has parameters $\psi_i$. If we consider that individuals are randomly selected from the [http://en.wikipedia.org/wiki/Statistical_population population], then we can treat the $\psi_i$ as if they were random vectors. As both $\by=(y_i , 1\leq i \leq N)$ and $\bpsi=(\psi_i , 1\leq i \leq N)$ are random, the model is now a joint distribution: $\qypsi$. 
Using basic probability, this can be written as:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \pypsi(\by,\bpsi) = \pcypsi(\by {{!}} \bpsi) \, \ppsi(\bpsi) .&lt;/math&gt; }}<br /> <br /> If $\qpsi$ is a parametric distribution that depends on a vector $\theta$ of ''population parameters'' and a set of ''individual covariates'' $\bc=(c_i , 1\leq i \leq N)$, this dependence can be made explicit by writing $\qpsi(\, \cdot \,;\theta,\bc)$ for the pdf of $\bpsi$.<br /> Each $i$ has a potentially unique set of times $t_i=(t_{i1},\ldots,t_{i \ \!\!n_i})$ in the design, and $n_i$ can be different for each individual.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - In this context, the model is the joint distribution of the observations and the individual parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \pypsi(\by , \bpsi; \theta, \bc,\bt)=\pcypsi(\by {{!}} \bpsi;\bt) \, \ppsi(\bpsi;\theta,\bc) . &lt;/math&gt;}}<br /> <br /> - The inputs of the model are the population parameters $\theta$, the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times <br /> :$\bt=(t_{ij} ,\ 1\leq i \leq N ,\ 1\leq j \leq n_i)$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example<br /> |text= Let us suppose $N$ patients received the same treatment as the single patient did. We now have the same PK model [[#ex_proba1|(1)]] for each patient, except that each has its own individual PK parameters $V_i$ and $k_i$ and potentially its own residual error parameter $a_i$:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba2a&quot;&gt;&lt;math&gt; <br /> y_{ij} \sim {\cal N} \left( \displaystyle{\frac{500}{V_i}e^{-k_i \, t_{ij} } } , a_i^2 \right).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> Here, $\psi_i = (V_i,k_i,a_i)$. 
One possible model is then to assume the same residual error model for all patients, and log-normal distributions for $V$ and $k$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> a_i &amp;=&amp; a \end{eqnarray}&lt;/math&gt; }}<br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba2b&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \log(V_i) &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\log(V_{\rm pop})+\beta\,\log(w_i/70),\ \omega_V^2\right) \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(3) }}<br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \log(k_i) &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\log(k_{\rm pop}),\ \omega_k^2\right), \end{eqnarray}&lt;/math&gt; }}<br /> <br /> where the only covariate we choose to consider, $w_i$, is the weight (in kg) of patient $i$. The model therefore consists of the conditional distribution of the concentrations defined in [[#ex_proba2a|(2)]] and the distribution of the individual PK parameters defined in [[#ex_proba2b|(3)]]. The inputs of the model are the population parameters $\theta = (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$, the covariates (here, the weight) $(w_i, 1\leq i \leq N)$, and the design $\bt$.<br /> }}<br /> <br /> &lt;br&gt;<br /> ===A model for the population parameters===<br /> <br /> In some cases, it may turn out that it is useful or important to consider that the population parameter $\theta$ is itself random, rather than fixed. There are various reasons for this, such as if we want to model uncertainty in its value, introduce a priori information in an estimation context, or model an inter-population variability if the model is not looking at only one given population.<br /> <br /> If so, let us denote $\qth$ the distribution of $\theta$. 
As the status of $\theta$ has therefore changed, the model now becomes the joint distribution of its random variables, i.e., of $\by$, $\bpsi$ and $\theta$, and can be decomposed as follows:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba3a&quot;&gt;&lt;math&gt;<br /> \pypsith(\by,\bpsi,\theta;\bt,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(4) }}<br /> <br /> <br /> {{Remarks<br /> |title=Remarks<br /> |text= &lt;ol&gt;<br /> &lt;li&gt; The formula is identical for $\ppsi(\bpsi; \theta)$ and $\pcpsith(\bpsi{{!}}\theta)$. What has changed is the status of $\theta$. It is not random in $\ppsi(\bpsi; \theta)$, the distribution of $\bpsi$ for any given value of $\theta$, whereas it is random in $\pcpsith(\bpsi {{!}} \theta)$, the conditional distribution of $\bpsi$, i.e., the distribution of $\bpsi$ obtained after observing randomly generated $\theta$. &lt;/li&gt;&lt;br&gt;<br /> <br /> &lt;li&gt;If $\qth$ is a parametric distribution with parameter $\varphi$, this dependence can be made explicit by writing $\qth(\, \cdot \,;\varphi)$ for the distribution of $\theta$.&lt;/li&gt;&lt;br&gt;<br /> <br /> &lt;li&gt;Not necessarily all of the components of $\theta$ need be random. 
If it is possible to decompose $\theta$ into $(\theta_F,\theta_R)$, where $\theta_F$ is fixed and $\theta_R$ random, then the decomposition [[#proba3a{{!}}(4)]] becomes &lt;/li&gt;<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba3b&quot;&gt;&lt;math&gt;<br /> \pypsith(\by,\bpsi,\theta_R;\bt,\theta_F,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta_R;\theta_F,\bc) \, \pth(\theta_R).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(5) }} <br /> &lt;/ol&gt;}}<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the population parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsith(\by,\bpsi,\theta;\bc,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta). &lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times $\bt=(t_{ij} , 1\leq i \leq N , 1\leq j \leq n_i)$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= We can introduce [http://en.wikipedia.org/wiki/Prior_probability prior distributions] in order to model the inter-population variability of the population parameters $V_{\rm pop}$ and $k_{\rm pop}$: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba3&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> V_{\rm pop} &amp;\sim&amp; {\cal N}\left(30,3^2\right) <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(6) }}<br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> k_{\rm pop} &amp;\sim&amp; {\cal N}\left(0.1,0.01^2\right). \end{eqnarray}&lt;/math&gt; }}<br /> <br /> As before, the conditional distribution of the concentration is given by [[#ex_proba2a|(2)]]. Now, [[#ex_proba2b|(3)]] is the ''conditional distribution'' of the individual PK parameters, given $\theta_R=(V_{\rm pop},k_{\rm pop})$. 
The distribution of $\theta_R$ is defined in [[#ex_proba3|(6)]]. Here, the inputs of the model are the fixed population parameters $\theta_F = (\omega_V,\omega_k,\beta,a)$, the weights $(w_i)$ and the design $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the covariates===<br /> <br /> <br /> Another scenario is to suppose that it is in fact the covariates $\bc$ that are random, not the population parameters. This may be the case either when we want to simulate individuals, or when we want to take into account uncertainty in the covariate values during modeling. If we denote by $\qc$ the distribution of the covariates, the joint distribution $\qpsic$ of the individual parameters and the covariates decomposes naturally as:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba4&quot;&gt;&lt;math&gt;<br /> \ppsic(\bpsi,\bc;\theta) = \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) \, ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(7) }}<br /> <br /> where $\qcpsic$ is the conditional distribution of $\bpsi$ given $\bc$.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> &lt;li&gt;In this context, the model is the joint distribution of the observations, the individual parameters and the covariates:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \pypsic(\by,\bpsi,\bc;\theta,\bt) = \pcypsi(\by {{!}} \bpsi;\bt) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) . &lt;/math&gt; }}<br /> <br /> &lt;li&gt;The inputs of the model are the population parameters $\theta$ and the measurement times $\bt$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= We could assume a normal distribution as a prior for the weights: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba4&quot; &gt;&lt;math&gt; <br /> w_i \sim_{i.i.d.} {\cal N}\left(70,10^2\right).
&lt;/math&gt;&lt;/div&gt;<br /> |reference=(8) }}<br /> <br /> Once more, [[#ex_proba2a|(2)]] defines the conditional distribution of the concentrations. Now, [[#ex_proba2b|(3)]] is the ''conditional distribution'' of the individual PK parameters, given the weight $\bw$, which is now a random variable whose distribution is defined in [[#ex_proba4|(8)]]. The inputs of the model are now the population parameters $\theta= (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$ and the design $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the measurement times===<br /> <br /> Another scenario is to suppose that there is uncertainty in the measurement times $\bt=(t_{ij})$, and not in the population parameters or covariates. If we denote by $\nominal{\bt}=(\nominal{t}_{ij}, 1\leq i \leq N, 1\leq j \leq n_i)$ the nominal measurement times (i.e., those presented in a data set), then the &quot;true&quot; measurement times $\bt$ at which the measurements were made can be considered random fluctuations around $\nominal{\bt}$ following some distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br /> <br /> Randomness with respect to time can also appear in the presence of dropout, i.e., individuals who prematurely leave a clinical trial. For such an individual $i$ who leaves at the random time $T_i$, the measurement times are the nominal times before $T_i$: $t_{i} = (\nominal{t}_{ij} \ \ {\rm s.t. }\ \ \nominal{t}_{ij}\leq T_i)$.<br /> In such situations, the measurement times are therefore random and can be thought of as coming from a distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br /> <br /> <br /> {{Remarks<br /> |title=Remark<br /> |text= If there are also other regression variables $\bx=(x_{ij})$, it is of course possible to use the same approach and consider $\bx$ as a random variable fluctuating around $\nominal{\bx}$.
}}<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the measurement times:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsit(\by , \bpsi,\bt; \theta,\bc,\nominal{\bt})=\pcypsit(\by {{!}}\bpsi,\bt) \, \ppsi(\bpsi;\theta,\bc) \, \pt(\bt ; \nominal{\bt}) .<br /> &lt;/math&gt; }}<br /> <br /> <br /> &lt;li&gt; The inputs of the model are the population parameters $\theta$, the individual covariates $\bc$ and the nominal design $\nominal{\bt}$.<br /> }}<br /> <br /> {{Example<br /> |title=Example:<br /> |text= Let us assume as prior a normal distribution around the nominal times: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba5&quot; &gt;&lt;math&gt; <br /> t_{ij} \sim_{i.i.d.} {\cal N}\left(\nominal{t}_{ij},0.03^2\right). &lt;/math&gt;&lt;/div&gt;<br /> |reference=(9) }}<br /> <br /> Here, [[#ex_proba5|(9)]] defines the distribution of the now random variable $\bt$. The other components of the model defined in [[#ex_proba2a|(2)]] and [[#ex_proba2b|(3)]] remain unchanged. <br /> The inputs of the model are the population parameters $\theta$, the weights $(w_i)$ and the nominal measurement times $\nominal{\bt}$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the dose regimen===<br /> <br /> If the structural model is a dynamical system (e.g., defined by a system of [http://en.wikipedia.org/wiki/Ordinary_differential_equation ordinary differential equations]), the ''source terms'' $\bu = (u_i, 1\leq i \leq N)$, i.e., the inputs of the dynamical system, are usually considered fixed and known. This is the case for example for doses administered to patients for a given treatment. 
Here, the source term $u_i$ is made up of the dose(s) given to patient $i$, the time(s) of administration, and their type (intravenous bolus, infusion, oral, etc.).<br /> <br /> Here again, there may be differences between the nominal dose regimen stated in the protocol and given in the data set, and the dose regimen that was in reality administered. For example, it might be that the times of administration and/or the dosage were not exactly respected or recorded. Also, there may have been non-compliance, i.e., certain doses that were not taken by the patient.<br /> <br /> If we denote $\nominal{\bu}=(\nominal{u}_{i}, 1\leq i \leq N)$ the nominal dose regimens (reported in the dataset), then in this context the &quot;real&quot; dose regimens $\bu$ can be considered to randomly fluctuate around $\nominal{\bu}$ with some distribution $\qu(\, \cdot \, ; \nominal{\bu})$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the dose regimens:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsiu(\by , \bpsi,\bu; \theta,\bc,\bt,\nominal{\bu})=\pcypsiu(\by {{!}} \bpsi,\bu;\bt) \, \pu(\bu ; \nominal{\bu}) \, \ppsi(\bpsi;\theta,\bc) .<br /> &lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the population parameters $\theta$, the individual covariates $\bc$, the nominal design $\bt$ and the nominal dose regimens $\nominal{\bu}$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text=Suppose that instead of the one dose given in the example up to now, there are now repeated doses $(d_{ik}, k \geq 1)$ administered to patient $i$ at times $(\tau_{ik} , k \geq 1)$. 
Then, it is easy to see that<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6b&quot; &gt;&lt;math&gt; <br /> y_{ij} \sim {\cal N}\left(f(t_{ij};V_i,k_i) , a_i^2 \right), &lt;/math&gt;&lt;/div&gt;<br /> |reference=(10) }}<br /> <br /> where<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6a&quot; &gt;&lt;math&gt; <br /> f(t;V_i,k_i) = \sum_{k, \tau_{ik}&lt;t}\displaystyle {\frac{d_{ik} }{V_i} }\, e^{-k_i \, (t- \tau_{ik})} .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(11) }}<br /> <br /> The &quot;real&quot; dose regimen administered to patient $i$ can be written $u_i=(d_{ik},\tau_{ik}, k\geq 1)$, and the prescribed dose regimen $\nominal{u}_i=(\nominal{d}_{ik},\nominal{\tau}_{ik}, k\geq 1)$.<br /> <br /> We can model the random fluctuations of the administration times $\tau_{ik}$ around the nominal times $(\nominal{\tau}_{ik})$:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6c&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \tau_{ik} &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\nominal{\tau}_{ik},0.02^2\right)\ , <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(12) }}<br /> <br /> and non-compliance (here meaning that a dose is not taken):<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6d&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \pi &amp;=&amp; \prob{d_{ik} = 0} \nonumber \\ &amp;=&amp; 1 - \prob{d_{ik}= \nominal{d}_{ik} }. <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(13) }}<br /> <br /> Here, [[#ex_proba6b{{!}}(10)]] and [[#ex_proba6a{{!}}(11)]] define the conditional distributions of the concentrations $(y_{ij})$, [[#ex_proba2b{{!}}(3)]] defines the distribution of $\bpsi$ and [[#ex_proba6c{{!}}(12)]] and [[#ex_proba6d{{!}}(13)]] define the distribution of $\bu$.
The inputs are the population parameters $\theta$, the weights $(w_i)$, the measurement times $\bt$ and the nominal dose regimens $\nominal{\bu}$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A complete model===<br /> <br /> We have now seen the variety of ways in which the variables of a model can play either the role of random variables, whose distribution is defined by the model, or that of nonrandom variables and parameters. Any combination is possible, depending on the context. For instance, the population parameters $\theta$ and covariates $\bc$ could be random with parametric probability distributions $\qth(\, \cdot \,;\varphi)$ and $\qc(\, \cdot \,;\gamma)$, and the dose regimen $\bu$ and measurement times $\bt$ could be reported with uncertainty and therefore modeled as random variables with distributions $\qu$ and $\qt$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters, the population parameters, the dose regimens, the covariates and the measurement times:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pypsithcut(\by , \bpsi, \theta, \bu, \bc,\bt; \nominal{\bu},\nominal{\bt},\varphi,\gamma)=\pcypsiut(\by {{!}}\bpsi,\bu,\bt) \, \pcpsithc(\bpsi{{!}}\theta,\bc) \, \pth(\theta;\varphi) \, \pc(\bc;\gamma) \, \pu(\bu ; \nominal{\bu}) \, \pt(\bt ; \nominal{\bt}).
&lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the nominal dose regimens $\nominal{\bu}$, the nominal measurement times $\nominal{\bt}$ and the &quot;hyper-parameters&quot; $\varphi$ and $\gamma$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Using the model for executing tasks==<br /> <br /> <br /> In the modeling and simulation context, the tasks to execute make specific use of the various probability distributions associated with a model.<br /> <br /> &lt;br&gt;<br /> ===Simulation===<br /> <br /> <br /> By definition, simulation makes direct use of the probability distribution that defines the model. Simulation of the global model is straightforward as soon as the joint distribution can be decomposed into a product of easily simulated conditional distributions.<br /> <br /> Consider for example that the variables involved in the model are those introduced in [[#An illustrative example|the previous section]]:<br /> <br /> <br /> # The population parameters $\theta$ can either be given, or simulated from the distribution $\qth$.<br /> # The individual covariates $\bc$ can either be given, or simulated from the distribution $\qc$.<br /> # The individual parameters $\bpsi$ can be simulated from the distribution $\qcpsithc$ using the values of $\theta$ and $\bc$ obtained in steps 1 and 2.<br /> # The dose regimen $\bu$ can either be given, or simulated from the distribution $\qu$.<br /> # The measurement times $\bt$ (resp. regression variables $\bx$) can either be given, or simulated from the distribution $\qt$ (resp. 
$\qx$).<br /> # Lastly, observations $\by$ can be simulated from the distribution $\qcypsiut$ using the values of $\bpsi$, $\bu$ and $\bt$ obtained at steps 3, 4 and 5.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> Simulation of a set of variables $w$ using another given set of variables $z$ requires:<br /> <br /> <br /> &lt;ul&gt;<br /> * a model, i.e., a distribution $\qw$ if $z$ is treated as a nonrandom variable, or a conditional distribution $\qcwz$ if $z$ is treated as a random variable.<br /> * the input $z$, i.e., a value of $z$ which allows the distribution $\qw(\, \cdot \, ; z)$ or the conditional distribution $\qcwz(\, \cdot \, {{!}} z)$ to be defined.<br /> * an algorithm which allows us to generate $w$ from $\qw$ or $\qcwz$.<br /> &lt;/ul&gt;<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text=<br /> - Imagine first that the population parameter $\theta$ and the design $(\bu,\bt)$ are given, and we want to simulate the individual covariates $\bc$, the individual parameters $\bpsi$ and the observations $\by$. Here, the variables to simulate are $w=(\bc,\bpsi,\by)$ and the variables which are given are $z=(\theta,\bu,\bt)$. If the components of $z$ are taken to be nonrandom variables, then:<br /> <br /> <br /> &lt;ul&gt;<br /> * The model is the joint distribution $\qypsic( \, \cdot \, ;\theta,\bu,\bt)$ of $(\by,\bpsi,\bc)$.<br /> * The inputs required for the simulation are the values of $(\theta,\bu,\bt)$.<br /> * The algorithm should be able to generate $(\by,\bpsi,\bc)$ from the joint distribution $\qypsic(\, \cdot \, ;\theta,\bu,\bt)$. Decomposing the model into three submodels leads to decomposing the joint distribution as<br /> &lt;/ul&gt;<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pypsic(\by,\bpsi,\bc ;\theta,\bu,\bt) = \pc(\bc) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pcypsi(\by {{!}} \bpsi;\bu,\bt) .
&lt;/math&gt; }}<br /> <br /> The algorithm therefore reduces to successively drawing $\bc$, $\bpsi$ and $\by$ from $\qc$, $\qcpsic(\, \cdot \, {{!}} \bc;\theta)$ and $\qcypsi(\, \cdot \, {{!}} \bpsi;\bu,\bt)$. <br /> <br /> <br /> - Imagine instead that the individual covariates $\bc$, the observations $\by$, the design $(\bu,\bt)$ and the population parameter $\theta$ are given (in a modeling context for instance, $\theta$ may have been estimated), and we want to simulate the individual parameters $\bpsi$. The only variable to simulate is $w=\bpsi$ and the variables which are given are $z=(\by,\bc,\theta,\bu,\bt)$. Here, we will treat $\by$ as a random variable; the other components of $z$ can be treated as nonrandom variables. Then:<br /> <br /> <br /> &lt;ul&gt;<br /> * The model is the conditional distribution $\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$ of $\bpsi$.<br /> * The inputs required for the simulation are the values of $(\by,\bc,\theta,\bu,\bt)$.<br /> * The algorithm should be able to sample $\bpsi$ from the conditional distribution $\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$. [http://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo Markov Chain Monte Carlo] (MCMC) algorithms can be used for sampling from such complex conditional distributions.<br /> &lt;/ul&gt;<br /> }}<br /> <br /> &lt;br&gt;<br /> <br /> ===Estimation of the population parameters===<br /> <br /> <br /> In a modeling context, we usually assume that we have data that includes the observations $\by$ and the measurement times $\bt$. There may also be individual covariates $\bc$, and in pharmacological applications the dose regimen $\bu$. For clarity, let us consider the most general case where all are present.<br /> <br /> Any statistical method for estimating the population parameters $\theta$ will be based on some specific probability distribution.
Let us illustrate this with two common statistical methods: maximum likelihood and Bayesian estimation.<br /> <br /> <br /> ''Maximum likelihood estimation'' consists in maximizing with respect to $\theta$ the ''observed likelihood'', defined by:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> {\like}(\theta ; \by,\bc,\bu,\bt) &amp;\eqdef&amp; \py(\by ; \bc,\bu,\bt,\theta) \\<br /> &amp;=&amp; \int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The variance of the estimator $\thmle$ and therefore confidence intervals can be derived from the observed Fisher information matrix, which itself is calculated using the observed likelihood (i.e., the pdf of the observations $\by$):<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ofim_intro3&quot;&gt;&lt;math&gt;<br /> \ofim(\thmle ; \by,\bc,\bu,\bt) \ \ \eqdef \ \ - \displaystyle{ \frac{\partial^2}{\partial \theta^2} } \log({\like}(\thmle ; \by,\bc,\bu,\bt)) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(14) }}<br /> <br /> <br /> {{OutlineText<br /> |text=Maximum likelihood estimation of the population parameter $\theta$ requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi$.<br /> * inputs $\by$, $\bc$, $\bu$ and $\bt$.<br /> * an algorithm which allows us to maximize $\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi$ with respect to $\theta$ and to compute $\displaystyle{ \frac{\partial^2}{\partial \theta^2} }\left\{\log\left(\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\thmle) \, d \bpsi \right)\right\}$.<br /> }}<br /> <br /> <br /> ''Bayesian estimation'' consists in estimating and/or maximizing the conditional distribution<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcthy(\theta {{!}} \by ;\bc,\bu,\bt) &amp;=&amp; \displaystyle{ \frac{\pyth(\by , \theta ; \bc,\bu,\bt)}{\py(\by ; \bc,\bu,\bt)} } \\<br /> &amp;=&amp; \frac{\displaystyle{ \int 
\pypsith(\by,\bpsi,\theta ; \bc,\bu,\bt) \, d \bpsi} }{\py(\by ; \bc,\bu,\bt)} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> {{outlineText<br /> |text= Bayesian estimation of the population parameter $\theta$ requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsith(\, \cdot \, ; \bc, \bu, \bt)$ for $(\by,\bpsi,\theta)$.<br /> * inputs $\by$, $\bc$, $\bu$ and $\bt$.<br /> * algorithms able to estimate and maximize $\pcthy(\theta {{!}} \by ;\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode, i.e., finding its maximum.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Estimation of the individual parameters===<br /> <br /> <br /> When $\theta$ is given (or estimated), various estimators of the individual parameters $\bpsi$ are available. They are all based on a probability distribution:<br /> <br /> ''Maximum likelihood estimation'' consists of maximizing with respect to $\bpsi$ the ''conditional likelihood''<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> {\like}(\bpsi ; \by,\bu,\bt) &amp;\eqdef&amp; \pcypsi(\by {{!}} \bpsi ;\bu,\bt) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The ''maximum a posteriori'' (MAP) estimator is obtained by maximizing with respect to $\bpsi$ the ''conditional distribution''<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt) &amp;=&amp; \displaystyle{ \frac{\pypsi(\by , \bpsi;\theta,\bc,\bu,\bt)}{\py(\by ; \theta,\bc,\bu,\bt)} } .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The ''conditional mean'' of $\bpsi$ is defined as the mean of the conditional distribution $\qcpsiy(\, \cdot \, | \by ; \theta,\bc,\bu,\bt)$ of $\psi$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> Estimation of the individual parameters $\bpsi$ requires:<br /> <br /> <br /> * a model, i.e., a joint 
distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br /> * algorithms able to estimate and maximize $\pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode (i.e., the MAP estimate).<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> ===Model selection===<br /> <br /> <br /> Likelihood ratio tests and statistical information criteria ([http://en.wikipedia.org/wiki/Bayesian_information_criterion BIC], [http://en.wikipedia.org/wiki/Akaike_information_criterion AIC]) compare the ''observed likelihoods'' computed under different models, i.e., the probability distribution functions $\py^{(1)}(\by ; \bc,\bu,\bt,\thmle_1)$, $\py^{(2)}(\by ; \bc,\bu,\bt,\thmle_2)$, ..., $\py^{(K)}(\by ; \bc,\bu,\bt,\thmle_K)$ computed under models ${\cal M}_1, {\cal M}_2$, ..., ${\cal M}_K$, where $\thmle_k$ maximizes the observed likelihood of model ${\cal M}_k$, i.e., maximizes $\py^{(k)}(\by ; \bc,\bu,\bt,\theta)$.<br /> <br /> <br /> {{outlineText<br /> |text=<br /> Computing the observed likelihood and information criteria requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br /> * an algorithm able to compute $\int \pypsi( \by ,\bpsi ;\theta,\bc,\bu,\bt) \, d\bpsi$. For nonlinear models, linearization methods or Monte Carlo methods can be used.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Optimal design===<br /> <br /> In the design of experiments for estimating statistical models, optimal designs allow parameters to be estimated with minimum variance by optimizing some statistical criteria.
Common optimality criteria are [http://en.wikipedia.org/wiki/Functional_%28mathematics%29 functionals] of the [http://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors eigenvalues] of the expected Fisher information matrix<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;efim_intro3&quot;&gt;&lt;math&gt;<br /> \efim(\theta ; \bu,\bt) \ \ \eqdef \ \ \esps{y}{\ofim(\theta ; \by,\bu,\bt)} ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(15) }}<br /> <br /> where $\ofim$ is the observed Fisher information matrix defined in [[#ofim_intro3|(14)]]. For the sake of simplicity, we consider models without covariates $\bc$.<br /> <br /> <br /> {{OutlineText<br /> |text=Optimal design for minimum variance estimation requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * a vector of population parameters $\theta$.<br /> * a criterion ${\cal D}(\bu,\bt)$ derived from the expected Fisher information matrix $\efim(\theta ; \bu,\bt)$.<br /> * an algorithm able to estimate $\efim(\theta ; \bu,\bt)$ for any design $(\bu,\bt)$ and to maximize ${\cal D}(\bu,\bt)$ with respect to $\bu$ and $\bt$.<br /> }}<br /> <br /> <br /> In a [http://en.wikipedia.org/wiki/Clinical_trial clinical trial] context, studies are designed to optimize the probability of reaching some predefined target ${\cal A}$, i.e., $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$.
This may include criteria related to safety and efficacy, such as the probability of reaching a [http://en.wikipedia.org/wiki/Sustained_viral_response sustained virologic response].<br /> <br /> <br /> {{OutlineText<br /> |text=Optimal design for clinical trials requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsic(\, \cdot \, ; \theta, \bu, \bt)$ for $(\by,\bpsi,\bc)$.<br /> * a vector of population parameters $\theta$.<br /> * a target ${\cal A}$.<br /> * an algorithm able to estimate $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$ and to maximize it with respect to $\bu$ and $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Implementing models and running tasks==<br /> <br /> <br /> ===Example 1 ===<br /> <br /> Consider first the model defined by the joint distribution <br /> <br /> {{Equation1<br /> |equation= &lt;math&gt;\pypsi(\by,\bpsi ; \theta ,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \pcpsic(\bpsi ; \theta),&lt;/math&gt;}}<br /> <br /> where as in our running example, <br /> <br /> <br /> &lt;ul&gt;<br /> * $\by = (y_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are concentrations <br /> <br /> * $\bpsi= (\psi_i, 1\leq i \leq N)$ are individual parameters; here $\psi_i=(V_i,k_i)$<br /> <br /> * $\theta=(V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,a)$ are population parameters <br /> <br /> * $\bt = (t_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are the measurement times. <br /> &lt;/ul&gt;<br /> <br /> <br /> We aim to define a joint model for $\by$ and $\bpsi$.
To do this, we will characterize each component of the model and show how this can be implemented with $\mlxtran$.<br /> <br /> {| cellspacing=&quot;10&quot; cellpadding=&quot;10&quot;<br /> |style=&quot;width:50%&quot;|<br /> {{Equation2 <br /> |name=&lt;math&gt; \pypsi(\by,\bpsi ; \theta, \bt) &lt;/math&gt; <br /> |equation= }}<br /> {{Equation2<br /> |name= &lt;math&gt; \pcpsic(\bpsi {{!}}\theta)&lt;/math&gt;<br /> |equation=<br /> &lt;math&gt;\begin{eqnarray}<br /> \log(V_i) &amp;\sim&amp; {\cal N}\left(\log(V_{\rm pop}), \, \omega_V^2\right) \\<br /> \log(k_i) &amp;\sim&amp; {\cal N}\left(\log(k_{\rm pop}),\, \omega_k^2\right)<br /> \end{eqnarray}&lt;/math&gt; }}<br /> {{Equation2<br /> |name= &lt;math&gt;\pcypsi(y{{!}}\bpsi; \bt) &lt;/math&gt;<br /> |equation=<br /> &lt;math&gt;\begin{eqnarray}<br /> f(t;V_i,k_i) &amp;=&amp; \frac{500}{V_i}e^{-k_i \, t} \\[0.2cm]<br /> y_{ij} &amp;\sim&amp; {\cal N} \left(f(t_{ij};V_i,k_i) , a^2 \right)<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> |style = &quot;width:50%&quot; |<br /> {{MLXTranForTable<br /> |name=Example 1<br /> |text=&lt;pre style=&quot;background-color:#EFEFEF; border:none;&quot;&gt; <br /> [INDIVIDUAL PARAMETER]<br /> input={V_pop,k_pop,omega_V,omega_k}<br /> <br /> DEFINITION:<br /> V = {distribution=logNormal, prediction=V_pop,sd=omega_V}<br /> k = {distribution=logNormal, prediction=k_pop,sd=omega_k}<br /> <br /> <br /> [OBSERVATION]<br /> input={V,k,a}<br /> <br /> EQUATION:<br /> f = 500/V*exp(-k*t)<br /> <br /> DEFINITION:<br /> y = {distribution=normal, prediction=f, sd=a}<br /> &lt;/pre&gt; }}<br /> |}<br /> <br /> <br /> We can then use this model with different tools for executing different tasks: it can be used for example with $\mlxplore$ for model exploration, with $\monolix$ for modeling, with R or Matlab for simulation, etc.<br /> <br /> It is important to remember that $\mlxtran$ is not a &quot;function&quot; that calculates an output.
It is not an imperative but rather a declarative language, one that allows us to describe a model. It is then the tasks we choose to do which use $\mlxtran$ like a function, &quot;requesting&quot; it to give predictions, simulate random variables, compute a pdf, maximize a likelihood, etc.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Example 2===<br /> <br /> Consider now a model defined by the joint distribution<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pypsithc(\by,\bpsi, \theta, \bc ; \bt) = \pcypsi(\by{{!}}\bpsi;\bt) \pcpsic(\bpsi{{!}}\bc ; \theta) \, \pth(\theta) \pc(\bc) ,<br /> &lt;/math&gt; }}<br /> <br /> where the covariates $\bc$ are the weights of the individuals: $\bc = (w_i, 1\leq i \leq N)$. The other variables and parameters are those already defined in the previous example.<br /> <br /> We now aim to define a joint model for $\by$, $\bpsi$, $\bc$ and $\theta_R=(V_{\rm pop},k_{\rm pop})$.<br /> <br /> <br /> {| cellspacing=&quot;10&quot; cellpadding=&quot;10&quot;<br /> |style=&quot;width:50%&quot; |<br /> {{Equation2 <br /> |name= &lt;math&gt;\pypsithc(\by,\bpsi, \theta, \bc ; \bt)&lt;/math&gt;<br /> |equation= }}<br /> {{Equation2<br /> |name=&lt;math&gt;\pth(\theta)&lt;/math&gt;<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> V_{\rm pop} &amp;\sim&amp; {\cal N}\left(30,3^2\right) \\<br /> k_{\rm pop} &amp;\sim&amp; {\cal N}\left(0.1,0.01^2\right)<br /> \end{eqnarray}&lt;/math&gt; }}<br /> {{Equation2<br /> |name=&lt;math&gt;\pc(\bc)&lt;/math&gt;<br /> |equation=<br /> &lt;math&gt;\begin{eqnarray}<br /> w_i &amp;\sim&amp; {\cal N}\left(70,10^2\right)<br /> \end{eqnarray}&lt;/math&gt; }}<br /> {{Equation2<br /> |name=&lt;math&gt;\pcpsic(\bpsi {{!}}\bc;\theta)&lt;/math&gt;<br /> |equation=&lt;math&gt;<br /> \begin{eqnarray}<br /> \hat{V}_i &amp;=&amp; V_{\rm pop}\left(\frac{w_i}{70}\right)^\beta \\[0.4cm]<br /> \log(V_i) &amp;\sim&amp; {\cal N}\left(\log(\hat{V}_i), \, \omega_V^2\right) \\<br /> \log(k_i) &amp;\sim&amp;
{\cal N}\left(\log(k_{\rm pop}),\, \omega_k^2\right)<br /> \end{eqnarray}&lt;/math&gt; }}<br /> {{Equation2<br /> |name=&lt;math&gt;\pcypsi(y{{!}}\bpsi; \bt) &lt;/math&gt;<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> f(t;V_i,k_i) &amp;=&amp; \frac{500}{V_i}e^{-k_i \, t} \\[0.2cm]<br /> y_{ij} &amp;\sim&amp; {\cal N} \left(f(t_{ij};V_i,k_i) , a^2 \right)<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> |style=&quot;width:50%&quot;|<br /> {{MLXTranForTable<br /> |name=jointModel2.txt<br /> |text=&lt;pre style=&quot;background-color:#EFEFEF; border:none&quot;&gt;<br /> [POPULATION PARAMETER]<br /> <br /> DEFINITION:<br /> V_pop = {distribution=normal, mean=30, sd=3}<br /> k_pop = {distribution=normal, mean=0.1, sd=0.01}<br /> <br /> <br /> [COVARIATE]<br /> <br /> DEFINITION:<br /> weight = {distribution=normal, mean=70, sd=10}<br /> <br /> <br /> <br /> [INDIVIDUAL PARAMETER]<br /> input={V_pop,k_pop,omega_V,omega_k,beta,weight}<br /> <br /> EQUATION:<br /> V_pred = V_pop*(weight/70)^beta<br /> <br /> DEFINITION:<br /> V = {distribution=logNormal, prediction=V_pred,sd=omega_V}<br /> k = {distribution=logNormal, prediction=k_pop,sd=omega_k}<br /> <br /> <br /> [OBSERVATION]<br /> input={V,k,a}<br /> <br /> EQUATION:<br /> f = 500/V*exp(-k*t)<br /> <br /> DEFINITION:<br /> y = {distribution=normal, prediction=f, sd=a}<br /> &lt;/pre&gt; }}<br /> |}<br /> <br /> We can use the approach described above for various tasks, e.g., simulating $(\by,\bpsi, \bc, \theta_R)$ for a given input $(\theta_F, \bt)$, simulating the population parameters $(V_{\rm pop},k_{\rm pop})$ with the conditional distribution $p_{\theta_R|\by, \bc}( \, \cdot \, | \by, \bc ; \theta_F,\bt)$, estimating the log-likelihood, maximizing the observed likelihood and computing the MAP.<br /> <br /> &lt;!--<br /> ==Bibliography==<br /> <br /> --&gt;<br /> <br /> {{Back&amp;Next<br /> |linkBack=The individual approach<br /> |linkNext=Description, representation and implementation of a model }}</div> 
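The simulation tasks listed at the end of Example 2 above can also be sketched outside $\mlxtran$. The following is a minimal Python sketch (the function name, seed and numerical design are our own choices, not part of the model definition) that draws $(\by,\bpsi,\bc,\theta_R)$ for a given input $(\theta_F,\bt)$, following the decomposition of the joint distribution into its submodels:

```python
import numpy as np

rng = np.random.default_rng(123)

def simulate_joint_model(N, t, theta_F=(0.1, 0.1, 1.0, 0.5)):
    """Draw (y, psi, c, theta_R) given theta_F = (omega_V, omega_k, beta, a) and
    the design t, following p(theta_R) p(c) p(psi | theta_R, c) p(y | psi)."""
    omega_V, omega_k, beta, a = theta_F
    # 1. population parameters theta_R = (V_pop, k_pop), drawn from their priors (6)
    V_pop = rng.normal(30.0, 3.0)
    k_pop = rng.normal(0.1, 0.01)
    # 2. covariates: weights w_i ~ N(70, 10^2), cf. (8)
    w = rng.normal(70.0, 10.0, size=N)
    # 3. individual parameters: log-normal around the covariate-adjusted predictions
    V = np.exp(rng.normal(np.log(V_pop * (w / 70.0) ** beta), omega_V))
    k = np.exp(rng.normal(np.log(k_pop), omega_k, size=N))
    # 4. observations: y_ij ~ N(f(t_ij; V_i, k_i), a^2) with f(t) = (500/V) e^{-k t}
    f = (500.0 / V[:, None]) * np.exp(-k[:, None] * t[None, :])
    y = rng.normal(f, a)
    return y, (V, k), w, (V_pop, k_pop)

t = np.array([0.5, 1.0, 2.0, 4.0, 8.0])  # hypothetical measurement times
y, (V, k), w, theta_R = simulate_joint_model(N=100, t=t)
```

This is only a sketch of the generative hierarchy; in practice the same model would be declared once in $\mlxtran$ and the simulation performed by the software.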
Admin http://wiki.webpopix.org/index.php/What_is_a_model%3F_A_joint_probability_distribution! What is a model? A joint probability distribution! 2013-06-21T08:39:44Z <p>Admin : </p> <hr /> <div>==Introduction==<br /> <br /> A model built for real-world applications can involve various types of variables, such as measurements, individual and population parameters, covariates, design, etc. The model allows us to represent relationships between these variables.<br /> <br /> If we consider things from a probabilistic point of view, some of the variables will be random, so the model becomes a probabilistic one, representing the [http://en.wikipedia.org/wiki/Joint_probability_distribution joint distribution] of these random variables.<br /> <br /> Defining a model therefore means defining a joint distribution. The hierarchical structure of the model will then allow it to be decomposed into submodels, i.e., the joint distribution decomposed into a product of [http://en.wikipedia.org/wiki/Conditional_probability_distribution conditional distributions].<br /> <br /> Tasks such as estimation, model selection, simulation and optimization can then be expressed as specific ways of using this probability distribution.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - A model is a joint probability distribution. <br /> <br /> - A submodel is a conditional distribution derived from this joint distribution. <br /> <br /> - A task is a specific use of this distribution. <br /> }}<br /> <br /> We will illustrate this approach starting with a very simple example that we will gradually make more sophisticated. Then we will see in various situations what can be defined as the model and what its inputs are.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==An illustrative example==<br /> <br /> &lt;br&gt;<br /> ===A model for the observations of a single individual===<br /> Let $y=(y_j, 1\leq j \leq n)$ be a vector of observations obtained at times $\vt=(t_j, 1\leq j \leq n)$. 
We consider that the $y_j$ are random variables and we denote by $\qy$ the distribution (or [http://en.wikipedia.org/wiki/Probability_density_function pdf]) of $y$. If we assume a [http://en.wikipedia.org/wiki/Parametric_model parametric model], then there exists a vector of parameters $\psi$ that completely defines the distribution of $y$.<br /> <br /> We can then explicitly represent this dependency with respect to $\psi$ by writing $\qy( \, \cdot \, ; \psi)$ for the pdf of $y$.<br /> <br /> If we wish to be even more precise, we can make it clear that this distribution is defined for a given design, i.e., a given vector of times $\vt$, and write $\qy(\, \cdot \, ; \psi,\vt)$ instead.<br /> <br /> By convention, the variables which appear before the symbol &quot;;&quot; are random variables. Those that appear after the &quot;;&quot; are non-random parameters or variables.<br /> When there is no risk of confusion, the non-random terms can be left out of the notation.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - In this context, the model is the distribution of the observations $\qy(\, \cdot \, ; \psi,\vt)$. &lt;br&gt;<br /> - The inputs of the model are the parameters $\psi$ and the design $\vt$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example<br /> |text= 500 mg of a drug is given by [http://en.wikipedia.org/wiki/Intravenous_therapy intravenous] [http://en.wikipedia.org/wiki/Bolus_%28medicine%29 bolus] to a patient at time 0. We assume that the evolution of the [http://en.wikipedia.org/wiki/Blood_plasma plasma] concentration of the drug over time is described by the [http://en.wikipedia.org/wiki/Pharmacokinetics pharmacokinetic] (PK) model<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; f(t;V,k) = \displaystyle{ \frac{500}{V} }e^{-k \, t} , &lt;/math&gt; }}<br /> <br /> where $V$ is the [http://en.wikipedia.org/wiki/Volume_of_distribution volume of distribution] and $k$ the [http://en.wikipedia.org/wiki/Elimination_rate_constant elimination rate constant].
The concentration is measured at times $(t_j, 1\leq j \leq n)$ with additive residual errors:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; y_j = f(t_j;V,k) + e_j , \quad 1 \leq j \leq n . &lt;/math&gt; }}<br /> <br /> Assuming that the residual errors $(e_j)$ are [http://en.wikipedia.org/wiki/Independence_%28probability_theory%29 independent] and [http://en.wikipedia.org/wiki/Normal_distribution normally distributed] with constant variance $a^2$, the observed values $(y_j)$ are also independent random variables and<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba1&quot; &gt;&lt;math&gt;<br /> y_j \sim {\cal N} \left( f(t_j ; V,k) , a^2 \right), \quad 1 \leq j \leq n. &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> Here, the vector of parameters $\psi$ is $(V,k,a)$: $V$ and $k$ are the parameters of the structural PK model and $a$ is the residual error parameter.<br /> As the $y_j$ are independent, the joint distribution of $y$ is the product of their marginal distributions:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \py(y ; \psi,\vt) = \prod_{j=1}^n \pyj(y_j ; \psi,t_j) ,<br /> &lt;/math&gt; }}<br /> <br /> where $\qyj$ is the normal distribution defined in [[#ex_proba1|(1)]]. <br /> }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> === A model for several individuals ===<br /> <br /> Now let us move to $N$ individuals. It is natural to suppose that each is represented by the same basic parametric model, though not necessarily with the exact same parameter values. Thus, individual $i$ has parameters $\psi_i$. If we consider that individuals are randomly selected from the [http://en.wikipedia.org/wiki/Statistical_population population], then we can treat the $\psi_i$ as if they were random vectors. As both $\by=(y_i , 1\leq i \leq N)$ and $\bpsi=(\psi_i , 1\leq i \leq N)$ are random, the model is now a joint distribution: $\qypsi$.
Using basic probability, this can be written as:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \pypsi(\by,\bpsi) = \pcypsi(\by {{!}} \bpsi) \, \ppsi(\bpsi) .&lt;/math&gt; }}<br /> <br /> If $\qpsi$ is a parametric distribution that depends on a vector $\theta$ of ''population parameters'' and a set of ''individual covariates'' $\bc=(c_i , 1\leq i \leq N)$, this dependence can be made explicit by writing $\qpsi(\, \cdot \,;\theta,\bc)$ for the pdf of $\bpsi$.<br /> Each $i$ has a potentially unique set of times $t_i=(t_{i1},\ldots,t_{i \ \!\!n_i})$ in the design, and $n_i$ can be different for each individual.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> - In this context, the model is the joint distribution of the observations and the individual parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \pypsi(\by , \bpsi; \theta, \bc,\bt)=\pcypsi(\by {{!}} \bpsi;\bt) \, \ppsi(\bpsi;\theta,\bc) . &lt;/math&gt;}}<br /> <br /> - The inputs of the model are the population parameters $\theta$, the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times <br /> :$\bt=(t_{ij} ,\ 1\leq i \leq N ,\ 1\leq j \leq n_i)$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example<br /> |text= Let us suppose $N$ patients received the same treatment as the single patient did. We now have the same PK model [[#ex_proba1|(1)]] for each patient, except that each has its own individual PK parameters $V_i$ and $k_i$ and potentially its own residual error parameter $a_i$:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba2a&quot;&gt;&lt;math&gt; <br /> y_{ij} \sim {\cal N} \left( \displaystyle{\frac{500}{V_i}e^{-k_i \, t_{ij} } } , a_i^2 \right).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> Here, $\psi_i = (V_i,k_i,a_i)$. 
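To make the conditional model (2) concrete, here is a minimal simulation sketch (assuming NumPy; the function name, parameter values and measurement times are illustrative and not part of the model):

```python
import numpy as np

def simulate_concentrations(V_i, k_i, a_i, t_i, rng):
    """Draw y_ij ~ N(500/V_i * exp(-k_i * t_ij), a_i^2) for one individual,
    i.e., the conditional model (2)."""
    t_i = np.asarray(t_i, dtype=float)
    f_i = 500.0 / V_i * np.exp(-k_i * t_i)             # structural PK model
    return f_i + a_i * rng.standard_normal(t_i.shape)  # additive residual error

rng = np.random.default_rng(0)
y_i = simulate_concentrations(V_i=30.0, k_i=0.1, a_i=0.5,
                              t_i=[0.5, 1.0, 2.0, 4.0, 8.0], rng=rng)
```

Setting $a_i=0$ makes the draw collapse to the structural model $f$ itself, which is a convenient sanity check.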
One possible model is then to assume the same residual error model for all patients, and log-normal distributions for $V$ and $k$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> a_i &amp;=&amp; a \end{eqnarray}&lt;/math&gt; }}<br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba2b&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \log(V_i) &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\log(V_{\rm pop})+\beta\,\log(w_i/70),\ \omega_V^2\right) \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(3) }}<br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \log(k_i) &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\log(k_{\rm pop}),\ \omega_k^2\right), \end{eqnarray}&lt;/math&gt; }}<br /> <br /> where the only covariate we choose to consider, $w_i$, is the weight (in kg) of patient $i$. The model therefore consists of the conditional distribution of the concentrations defined in [[#ex_proba2a|(2)]] and the distribution of the individual PK parameters defined in [[#ex_proba2b|(3)]]. The inputs of the model are the population parameters $\theta = (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$, the covariates (here, the weight) $(w_i, 1\leq i \leq N)$, and the design $\bt$.<br /> }}<br /> <br /> &lt;br&gt;<br /> ===A model for the population parameters===<br /> <br /> In some cases, it may turn out that it is useful or important to consider that the population parameter $\theta$ is itself random, rather than fixed. There are various reasons for this, such as if we want to model uncertainty in its value, introduce a priori information in an estimation context, or model an inter-population variability if the model is not looking at only one given population.<br /> <br /> If so, let us denote $\qth$ the distribution of $\theta$. 
As the status of $\theta$ has therefore changed, the model now becomes the joint distribution of its random variables, i.e., of $\by$, $\bpsi$ and $\theta$, and can be decomposed as follows:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba3a&quot;&gt;&lt;math&gt;<br /> \pypsith(\by,\bpsi,\theta;\bt,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(4) }}<br /> <br /> <br /> {{Remarks<br /> |title=Remarks<br /> |text= &lt;ol&gt;<br /> &lt;li&gt; The formula is identical for $\ppsi(\bpsi; \theta)$ and $\pcpsith(\bpsi{{!}}\theta)$. What has changed is the status of $\theta$. It is not random in $\ppsi(\bpsi; \theta)$, the distribution of $\bpsi$ for any given value of $\theta$, whereas it is random in $\pcpsith(\bpsi {{!}} \theta)$, the conditional distribution of $\bpsi$, i.e., the distribution of $\bpsi$ obtained after observing randomly generated $\theta$. &lt;/li&gt;&lt;br&gt;<br /> <br /> &lt;li&gt;If $\qth$ is a parametric distribution with parameter $\varphi$, this dependence can be made explicit by writing $\qth(\, \cdot \,;\varphi)$ for the distribution of $\theta$.&lt;/li&gt;&lt;br&gt;<br /> <br /> &lt;li&gt;Not necessarily all of the components of $\theta$ need be random. 
If it is possible to decompose $\theta$ into $(\theta_F,\theta_R)$, where $\theta_F$ is fixed and $\theta_R$ random, then the decomposition [[#proba3a{{!}}(4)]] becomes &lt;/li&gt;<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba3b&quot;&gt;&lt;math&gt;<br /> \pypsith(\by,\bpsi,\theta_R;\bt,\theta_F,\bc) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta_R;\theta_F,\bc) \, \pth(\theta_R).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(5) }} <br /> &lt;/ol&gt;}}<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the population parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsith(\by,\bpsi,\theta;\bc,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \, \pcpsith(\bpsi{{!}}\theta;\bc) \, \pth(\theta). &lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the individual covariates $\bc=(c_i , 1\leq i \leq N)$ and the measurement times $\bt=(t_{ij} , 1\leq i \leq N , 1\leq j \leq n_i)$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= We can introduce [http://en.wikipedia.org/wiki/Prior_probability prior distributions] in order to model the inter-population variability of the population parameters $V_{\rm pop}$ and $k_{\rm pop}$: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba3&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> V_{\rm pop} &amp;\sim&amp; {\cal N}\left(30,3^2\right) <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(6) }}<br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> k_{\rm pop} &amp;\sim&amp; {\cal N}\left(0.1,0.01^2\right). \end{eqnarray}&lt;/math&gt; }}<br /> <br /> As before, the conditional distribution of the concentration is given by [[#ex_proba2a|(2)]]. Now, [[#ex_proba2b|(3)]] is the ''conditional distribution'' of the individual PK parameters, given $\theta_R=(V_{\rm pop},k_{\rm pop})$. 
The distribution of $\theta_R$ is defined in [[#ex_proba3|(6)]]. Here, the inputs of the model are the fixed population parameters $\theta_F = (\omega_V,\omega_k,\beta,a)$, the weights $(w_i)$ and the design $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the covariates===<br /> <br /> <br /> Another scenario is to suppose that it is in fact the covariates $\bc$ that are random, not the population parameters. This may be because we want to simulate individuals, or because we want to take into account uncertainty in the covariate values when modeling. If we denote by $\qc$ the distribution of the covariates, the joint distribution $\qpsic$ of the individual parameters and the covariates decomposes naturally as:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;proba4&quot;&gt;&lt;math&gt;<br /> \ppsic(\bpsi,\bc;\theta) = \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) \, ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(7) }}<br /> <br /> where $\qcpsic$ is the conditional distribution of $\bpsi$ given $\bc$.<br /> <br /> <br /> {{OutlineText<br /> |text= <br /> &lt;li&gt;In this context, the model is the joint distribution of the observations, the individual parameters and the covariates:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \pypsic(\by,\bpsi,\bc;\theta,\bt) = \pcypsi(\by {{!}} \bpsi;\bt) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pc(\bc) . &lt;/math&gt; }}<br /> <br /> &lt;li&gt;The inputs of the model are the population parameters $\theta$ and the measurement times $\bt$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= We could assume a normal distribution as a prior for the weights: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba4&quot; &gt;&lt;math&gt; <br /> w_i \sim_{i.i.d.} {\cal N}\left(70,10^2\right).
&lt;/math&gt;&lt;/div&gt;<br /> |reference=(8) }}<br /> <br /> Once more, [[#ex_proba2a|(2)]] defines the conditional distribution of the concentrations. Now, [[#ex_proba2b|(3)]] is the ''conditional distribution'' of the individual PK parameters, given the weight $\bw$, which is now a random variable whose distribution is defined in [[#ex_proba4|(8)]]. Here, the inputs of the model are the population parameters $\theta= (V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,\beta,a)$ and the design $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the measurement times===<br /> <br /> Another scenario is to suppose that there is uncertainty in the measurement times $\bt=(t_{ij})$, rather than in the population parameters or covariates. If we denote by $\nominal{\bt}=(\nominal{t}_{ij}, 1\leq i \leq N, 1\leq j \leq n_i)$ the nominal measurement times (i.e., those presented in a data set), then the &quot;true&quot; measurement times $\bt$ at which the measurements were made can be considered random fluctuations around $\nominal{\bt}$ following some distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br /> <br /> Randomness with respect to time can also appear in the presence of dropout, i.e., individuals who prematurely leave a clinical trial. For such an individual $i$ who leaves at the random time $T_i$, the measurement times are the nominal times before $T_i$: $t_{i} = (\nominal{t}_{ij} \ \ {\rm s.t. }\ \ \nominal{t}_{ij}\leq T_i)$.<br /> In such situations, the measurement times are therefore random and can be thought of as coming from a distribution $\qt(\, \cdot \, ; \nominal{\bt})$.<br /> <br /> <br /> {{Remarks<br /> |title=Remark<br /> |text= If there are also other regression variables $\bx=(x_{ij})$, it is of course possible to use the same approach and consider $\bx$ as a random variable fluctuating around $\nominal{\bx}$.
}}<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the measurement times:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsit(\by , \bpsi,\bt; \theta,\bc,\nominal{\bt})=\pcypsit(\by {{!}}\bpsi,\bt) \, \ppsi(\bpsi;\theta,\bc) \, \pt(\bt ; \nominal{\bt}) .<br /> &lt;/math&gt; }}<br /> <br /> <br /> &lt;li&gt; The inputs of the model are the population parameters $\theta$, the individual covariates $\bc$ and the nominal design $\nominal{\bt}$.<br /> }}<br /> <br /> {{Example<br /> |title=Example:<br /> |text= Let us assume as prior a normal distribution around the nominal times: <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba5&quot; &gt;&lt;math&gt; <br /> t_{ij} \sim_{i.i.d.} {\cal N}\left(\nominal{t}_{ij},0.03^2\right). &lt;/math&gt;&lt;/div&gt;<br /> |reference=(9) }}<br /> <br /> Here, [[#ex_proba5|(9)]] defines the distribution of the now random variable $\bt$. The other components of the model defined in [[#ex_proba2a|(2)]] and [[#ex_proba2b|(3)]] remain unchanged. <br /> The inputs of the model are the population parameters $\theta$, the weights $(w_i)$ and the nominal measurement times $\nominal{\bt}$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A model for the dose regimen===<br /> <br /> If the structural model is a dynamical system (e.g., defined by a system of [http://en.wikipedia.org/wiki/Ordinary_differential_equation ordinary differential equations]), the ''source terms'' $\bu = (u_i, 1\leq i \leq N)$, i.e., the inputs of the dynamical system, are usually considered fixed and known. This is the case for example for doses administered to patients for a given treatment. 
Here, the source term $u_i$ is made up of the dose(s) given to patient $i$, the time(s) of administration, and their type (intravenous bolus, infusion, oral, etc.).<br /> <br /> Here again, there may be differences between the nominal dose regimen stated in the protocol and given in the data set, and the dose regimen that was in reality administered. For example, it might be that the times of administration and/or the dosage were not exactly respected or recorded. Also, there may have been non-compliance, i.e., certain doses that were not taken by the patient.<br /> <br /> If we denote $\nominal{\bu}=(\nominal{u}_{i}, 1\leq i \leq N)$ the nominal dose regimens (reported in the dataset), then in this context the &quot;real&quot; dose regimens $\bu$ can be considered to randomly fluctuate around $\nominal{\bu}$ with some distribution $\qu(\, \cdot \, ; \nominal{\bu})$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters and the dose regimens:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\pypsiu(\by , \bpsi,\bu; \theta,\bc,\bt,\nominal{\bu})=\pcypsiu(\by {{!}} \bpsi,\bu;\bt) \, \pu(\bu ; \nominal{\bu}) \, \ppsi(\bpsi;\theta,\bc) .<br /> &lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the population parameters $\theta$, the individual covariates $\bc$, the nominal design $\bt$ and the nominal dose regimens $\nominal{\bu}$.<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text=Suppose that instead of the one dose given in the example up to now, there are now repeated doses $(d_{ik}, k \geq 1)$ administered to patient $i$ at times $(\tau_{ik} , k \geq 1)$. 
Then, it is easy to see that<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6b&quot; &gt;&lt;math&gt; <br /> y_{ij} \sim {\cal N}\left(f(t_{ij};V_i,k_i) , a_i^2 \right), &lt;/math&gt;&lt;/div&gt;<br /> |reference=(10) }}<br /> <br /> where<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6a&quot; &gt;&lt;math&gt; <br /> f(t;V_i,k_i) = \sum_{k, \tau_{ik}&lt;t}\displaystyle {\frac{d_{ik} }{V_i} }\, e^{-k_i \, (t- \tau_{ik})} .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(11) }}<br /> <br /> The &quot;real&quot; dose regimen administered to patient $i$ can be written $u_i=(d_{ik},\tau_{ik}, k\geq 1)$, and the prescribed dose regimen $\nominal{u}_i=(\nominal{d}_{ik},\nominal{\tau}_{ik}, k\geq 1)$.<br /> <br /> We can model the random fluctuations of the administration times $\tau_{ik}$ around the nominal times $(\nominal{\tau}_{ik})$:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6c&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \tau_{ik} &amp;\sim_{i.i.d.}&amp; {\cal N}\left(\nominal{\tau}_{ik},0.02^2\right)\ , <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(12) }}<br /> <br /> and non-compliance (here meaning that a dose is not taken):<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ex_proba6d&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \pi &amp;=&amp; \prob{d_{ik} = 0} \nonumber \\ &amp;=&amp; 1 - \prob{d_{ik}= \nominal{d}_{ik} }. <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(13) }}<br /> <br /> Here, [[#ex_proba6b{{!}}(10)]] and [[#ex_proba6a{{!}}(11)]] define the conditional distributions of the concentrations $(y_{ij})$, [[#ex_proba2b{{!}}(3)]] defines the distribution of $\bpsi$ and [[#ex_proba6c{{!}}(12)]] and [[#ex_proba6d{{!}}(13)]] define the distribution of $\bu$.
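A minimal sketch of drawing such a perturbed dose regimen under (12) and (13) might look as follows (Python with NumPy assumed; the function name and numerical values are illustrative only):

```python
import numpy as np

def simulate_dose_regimen(d_nominal, tau_nominal, pi, rng, sd=0.02):
    """Draw a 'real' dose regimen u_i around the nominal one:
    administration times fluctuate as tau_ik ~ N(nominal tau_ik, sd^2),
    and each dose is omitted (set to 0) with probability pi."""
    d_nominal = np.asarray(d_nominal, dtype=float)
    tau = np.asarray(tau_nominal, dtype=float) + sd * rng.standard_normal(len(tau_nominal))
    taken = rng.random(d_nominal.shape) >= pi   # False with probability pi
    d = np.where(taken, d_nominal, 0.0)         # omitted doses become 0
    return d, tau

rng = np.random.default_rng(1)
d, tau = simulate_dose_regimen([100.0] * 4, [0.0, 12.0, 24.0, 36.0], pi=0.1, rng=rng)
```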
The inputs are the population parameters $\theta$, the weights $(w_i)$, the measurement times $\bt$ and the nominal dose regimens $\nominal{\bu}$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===A complete model===<br /> <br /> We have now seen the variety of ways in which the variables in a model can play the role either of random variables, whose distribution is defined by the model, or of nonrandom variables and parameters. Any combination is possible, depending on the context. For instance, the population parameters $\theta$ and covariates $\bc$ could be random with parametric probability distributions $\qth(\, \cdot \,;\varphi)$ and $\qc(\, \cdot \,;\gamma)$, and the dose regimen $\bu$ and measurement times $\bt$ reported with uncertainty and therefore modeled as random variables with distributions $\qu$ and $\qt$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> &lt;li&gt; In this context, the model is the joint distribution of the observations, the individual parameters, the population parameters, the dose regimens, the covariates and the measurement times:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pypsithcut(\by , \bpsi, \theta, \bu, \bc,\bt; \nominal{\bu},\nominal{\bt},\varphi,\gamma)=\pcypsiut(\by {{!}}\bpsi,\bu,\bt) \, \pcpsithc(\bpsi{{!}}\theta,\bc) \, \pth(\theta;\varphi) \, \pc(\bc;\gamma) \, \pu(\bu ; \nominal{\bu}) \, \pt(\bt ; \nominal{\bt}).
&lt;/math&gt; }}<br /> <br /> &lt;li&gt; The inputs of the model are the nominal dose regimens $\nominal{\bu}$, the nominal measurement times $\nominal{\bt}$ and the &quot;hyper-parameters&quot; $\varphi$ and $\gamma$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Using the model for executing tasks==<br /> <br /> <br /> In the modeling and simulation context, the tasks to execute make specific use of the various probability distributions associated with a model.<br /> <br /> &lt;br&gt;<br /> ===Simulation===<br /> <br /> <br /> By definition, simulation makes direct use of the probability distribution that defines the model. Simulation of the global model is straightforward as soon as the joint distribution can be decomposed into a product of easily simulated conditional distributions.<br /> <br /> Consider for example that the variables involved in the model are those introduced in [[#An illustrative example|the previous section]]:<br /> <br /> <br /> # The population parameters $\theta$ can either be given, or simulated from the distribution $\qth$.<br /> # The individual covariates $\bc$ can either be given, or simulated from the distribution $\qc$.<br /> # The individual parameters $\bpsi$ can be simulated from the distribution $\qcpsithc$ using the values of $\theta$ and $\bc$ obtained in steps 1 and 2.<br /> # The dose regimen $\bu$ can either be given, or simulated from the distribution $\qu$.<br /> # The measurement times $\bt$ (resp. regression variables $\bx$) can either be given, or simulated from the distribution $\qt$ (resp. 
$\qx$).<br /> # Lastly, observations $\by$ can be simulated from the distribution $\qcypsiut$ using the values of $\bpsi$, $\bu$ and $\bt$ obtained at steps 3, 4 and 5.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> Simulation of a set of variables $w$ using another given set of variables $z$ requires:<br /> <br /> <br /> &lt;ul&gt;<br /> * a model, i.e., a distribution $\qw$ if $z$ is treated as a nonrandom variable, or a conditional distribution $\qcwz$ if $z$ is treated as a random variable.<br /> * the input $z$, i.e., a value of $z$ which allows the distribution $\qw(\, \cdot \, ; z)$ or the conditional distribution $\qcwz(\, \cdot \, {{!}} z)$ to be defined.<br /> * an algorithm which allows us to generate $w$ from $\qw$ or $\qcwz$.<br /> &lt;/ul&gt;<br /> }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text=<br /> - Imagine instead that the population parameter $\theta$ and the design $(\bu,\bt)$ are given, and we want to simulate the individual covariates $\bc$, the individual parameters $\bpsi$ and the observations $\by$. Here, the variables to simulate are $w=(\bc,\bpsi,\by)$ and the variables which are given are $z=(\theta,\bu,\bt)$. If the components of $z$ are taken to be nonrandom variables, then:<br /> <br /> <br /> &lt;ul&gt;<br /> * The model is the joint distribution $\qypsic( \, \cdot \, ;\theta,\bu,\bt)$ of $(\by,\bpsi,\bc)$.<br /> * The inputs required for the simulation are the values of $(\theta,\bu,\bt)$.<br /> * The algorithm should be able to generate $(\by,\bpsi,\bc)$ from the joint distribution $\qypsic(\, \cdot \, ;\theta,\bu,\bt)$. Decomposing the model into three submodels leads to decomposing the joint distribution as<br /> &lt;/ul&gt;<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pypsic(\by,\bpsi,\bc ;\theta,\bu,\bt) = \pc(\bc) \, \pcpsic(\bpsi {{!}} \bc;\theta) \, \pcypsi(\by {{!}} \bpsi;\bu,\bt) . 
&lt;/math&gt; }}<br /> <br /> The algorithm therefore reduces to successively drawing $\bc$, $\bpsi$ and $\by$ from $\qc$, $\qcpsic(\, \cdot \, {{!}} \bc;\theta)$ and $\qcypsi(\, \cdot \, {{!}} \bpsi;\bu,\bt)$. <br /> <br /> <br /> - Imagine instead that the individual covariates $\bc$, the observations $\by$, the design $(\bu,\bt)$ and the population parameter $\theta$ are given (in a modeling context for instance, $\theta$ may have been estimated), and we want to simulate the individual parameters $\bpsi$. The only variable to simulate is $w=\bpsi$ and the variables which are given are $z=(\by,\bc,\theta,\bu,\bt)$. Here, we will treat $\by$ as if it were a random variable. The other components of $z$ can be treated as nonrandom variables. In this case:<br /> <br /> <br /> &lt;ul&gt;<br /> * The model is the conditional distribution $\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$ of $\bpsi$.<br /> * The inputs required for the simulation are the values of $(\by,\bc,\theta,\bu,\bt)$.<br /> * The algorithm should be able to sample $\bpsi$ from the conditional distribution $\qcpsiy(\, \cdot \, {{!}} \by ;\bc,\theta,\bu,\bt)$. [http://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo Markov Chain Monte Carlo] (MCMC) algorithms can be used for sampling from such complex conditional distributions.<br /> &lt;/ul&gt;<br /> }}<br /> <br /> &lt;br&gt;<br /> <br /> ===Estimation of the population parameters===<br /> <br /> <br /> In a modeling context, we usually assume that we have data that includes the observations $\by$ and the measurement times $\bt$. There may also be individual covariates $\bc$, and in pharmacological applications the dose regimen $\bu$. For clarity, let us consider the most general case where all are present.<br /> <br /> Any statistical method for estimating the population parameters $\theta$ will be based on some specific probability distribution.
Let us illustrate this with two common statistical methods: maximum likelihood and Bayesian estimation.<br /> <br /> <br /> ''Maximum likelihood estimation'' consists in maximizing with respect to $\theta$ the ''observed likelihood'', defined by:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> {\like}(\theta ; \by,\bc,\bu,\bt) &amp;\eqdef&amp; \py(\by ; \bc,\bu,\bt,\theta) \\<br /> &amp;=&amp; \int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The variance of the estimator $\thmle$ and therefore confidence intervals can be derived from the observed Fisher information matrix, which itself is calculated using the observed likelihood (i.e., the pdf of the observations $\by$):<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ofim_intro3&quot;&gt;&lt;math&gt;<br /> \ofim(\thmle ; \by,\bc,\bu,\bt) \ \ \eqdef \ \ - \displaystyle{ \frac{\partial^2}{\partial \theta^2} } \log({\like}(\thmle ; \by,\bc,\bu,\bt)) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(14) }}<br /> <br /> <br /> {{OutlineText<br /> |text=Maximum likelihood estimation of the population parameter $\theta$ requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi$.<br /> * inputs $\by$, $\bc$, $\bu$ and $\bt$.<br /> * an algorithm which allows us to maximize $\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\theta) \, d \bpsi$ with respect to $\theta$ and to compute $\displaystyle{ \frac{\partial^2}{\partial \theta^2} }\left\{\log\left(\int \pypsi(\by,\bpsi ; \bc,\bu,\bt,\thmle) \, d \bpsi \right)\right\}$.<br /> }}<br /> <br /> <br /> ''Bayesian estimation'' consists in estimating and/or maximizing the conditional distribution<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcthy(\theta {{!}} \by ;\bc,\bu,\bt) &amp;=&amp; \displaystyle{ \frac{\pyth(\by , \theta ; \bc,\bu,\bt)}{\py(\by ; \bc,\bu,\bt)} } \\<br /> &amp;=&amp; \frac{\displaystyle{ \int 
\pypsith(\by,\bpsi,\theta ; \bc,\bu,\bt) \, d \bpsi} }{\py(\by ; \bc,\bu,\bt)} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> {{OutlineText<br /> |text= Bayesian estimation of the population parameter $\theta$ requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsith(\, \cdot \, ; \bc, \bu, \bt)$ for $(\by,\bpsi,\theta)$.<br /> * inputs $\by$, $\bc$, $\bu$ and $\bt$.<br /> * algorithms able to estimate and maximize $\pcthy(\theta {{!}} \by ;\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode, i.e., finding its maximum.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Estimation of the individual parameters===<br /> <br /> <br /> When $\theta$ is given (or estimated), various estimators of the individual parameters $\bpsi$ are available. They are all based on a probability distribution:<br /> <br /> ''Maximum likelihood estimation'' consists of maximizing with respect to $\bpsi$ the ''conditional likelihood''<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> {\like}(\bpsi ; \by,\bu,\bt) &amp;\eqdef&amp; \pcypsi(\by {{!}} \bpsi ;\bu,\bt) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The ''maximum a posteriori'' (MAP) estimator is obtained by maximizing with respect to $\bpsi$ the ''conditional distribution''<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt) &amp;=&amp; \displaystyle{ \frac{\pypsi(\by , \bpsi;\theta,\bc,\bu,\bt)}{\py(\by ; \theta,\bc,\bu,\bt)} } .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The ''conditional mean'' of $\bpsi$ is defined as the mean of the conditional distribution $\qcpsiy(\, \cdot \, | \by ; \theta,\bc,\bu,\bt)$ of $\bpsi$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> Estimation of the individual parameters $\bpsi$ requires:<br /> <br /> <br /> * a model, i.e., a joint
distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br /> * algorithms able to estimate and maximize $\pcpsiy(\bpsi {{!}} \by ; \theta,\bc,\bu,\bt)$. MCMC methods can be used for estimating this conditional distribution. For nonlinear models, optimization tools are required for computing its mode (i.e., the MAP estimate).<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> ===Model selection===<br /> <br /> <br /> Likelihood ratio tests and statistical information criteria ([http://en.wikipedia.org/wiki/Bayesian_information_criterion BIC], [http://en.wikipedia.org/wiki/Akaike_information_criterion AIC]) compare the ''observed likelihoods'' computed under different models, i.e., the probability distribution functions $\py^{(1)}(\by ; \bc,\bu,\bt,\thmle_1)$, $\py^{(2)}(\by ; \bc,\bu,\bt,\thmle_2)$, ..., $\py^{(K)}(\by ; \bc,\bu,\bt,\thmle_K)$ computed under models ${\cal M}_1, {\cal M}_2$, ..., ${\cal M}_K$, where $\thmle_k$ maximizes the observed likelihood of model ${\cal M}_k$, i.e., maximizes $\py^{(k)}(\by ; \bc,\bu,\bt,\theta)$.<br /> <br /> <br /> {{OutlineText<br /> |text=<br /> Computing the observed likelihood and information criteria requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * inputs $\by$, $\theta$, $\bc$, $\bu$ and $\bt$.<br /> * an algorithm able to compute $\int \pypsi( \by ,\bpsi ;\theta,\bc,\bu,\bt) \, d\bpsi$. For nonlinear models, linearization methods or Monte Carlo methods can be used.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Optimal design===<br /> <br /> In the design of experiments for estimating statistical models, optimal designs allow parameters to be estimated with minimum variance by optimizing some statistical criterion.
Common optimality criteria are [http://en.wikipedia.org/wiki/Functional_%28mathematics%29 functionals] of the [http://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors eigenvalues] of the expected Fisher information matrix<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;efim_intro3&quot;&gt;&lt;math&gt;<br /> \efim(\theta ; \bu,\bt) \ \ \eqdef \ \ \esps{y}{\ofim(\theta ; \by,\bu,\bt)} ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(15) }}<br /> <br /> where $\ofim$ is the observed Fisher information matrix defined [[#ofim_intro3|above]]. For the sake of simplicity, we consider models without covariates $\bc$.<br /> <br /> <br /> {{OutlineText<br /> |text=Optimal design for minimum variance estimation requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsi(\, \cdot \, ; \theta, \bc, \bu, \bt)$ for $(\by,\bpsi)$.<br /> * a vector of population parameters $\theta$.<br /> * a criterion ${\cal D}(\bu,\bt)$ derived from the expected Fisher information matrix $\efim(\theta ; \bu,\bt)$.<br /> * an algorithm able to estimate $\efim(\theta ; \bu,\bt)$ for any design $(\bu,\bt)$ and to maximize ${\cal D}(\bu,\bt)$ with respect to $\bu$ and $\bt$.<br /> }}<br /> <br /> <br /> In a [http://en.wikipedia.org/wiki/Clinical_trial clinical trial] context, studies are designed to optimize the probability of reaching some predefined target ${\cal A}$, i.e., $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$.
This may include optimizing safety and efficacy, or the probability of reaching a [http://en.wikipedia.org/wiki/Sustained_viral_response sustained virologic response], for example.<br /> <br /> <br /> {{OutlineText<br /> |text=Optimal design for clinical trials requires:<br /> <br /> <br /> * a model, i.e., a joint distribution $\qypsic(\, \cdot \, ; \theta, \bu, \bt)$ for $(\by,\bpsi,\bc)$.<br /> * a vector of population parameters $\theta$.<br /> * a target ${\cal A}$.<br /> * an algorithm able to estimate $\prob{(\by, \bpsi,\bc) \in {\cal A} ; \bu,\bt,\theta}$ and to maximize it with respect to $\bu$ and $\bt$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Implementing models and running tasks==<br /> <br /> <br /> ===Example 1 ===<br /> <br /> Consider first the model defined by the joint distribution <br /> <br /> {{Equation1<br /> |equation= &lt;math&gt;\pypsi(\by,\bpsi ; \theta ,\bt) = \pcypsi(\by {{!}}\bpsi;\bt) \pcpsic(\bpsi ; \theta),&lt;/math&gt;}}<br /> <br /> where, as in our running example, <br /> <br /> <br /> &lt;ul&gt;<br /> * $\by = (y_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are concentrations <br /> <br /> * $\bpsi= (\psi_i, 1\leq i \leq N)$ are individual parameters; here $\psi_i=(V_i,k_i,a_i)$<br /> <br /> * $\theta=(V_{\rm pop},k_{\rm pop},\omega_V,\omega_k,a)$ are population parameters <br /> <br /> * $\bt = (t_{ij}, 1\leq i \leq N , 1 \leq j \leq n_i)$ are the measurement times. <br /> &lt;/ul&gt;<br /> <br /> <br /> We aim to define a joint model for $\by$ and $\bpsi$.
To do this, we will characterize each component of the model and show how this can be implemented with $\mlxtran$.<br /> <br /> {| cellspacing=&quot;10&quot; cellpadding=&quot;10&quot;<br /> |style=&quot;width:50%&quot;|<br /> {{Equation2 <br /> |name=&lt;math&gt; \pypsi(\by,\bpsi ; \theta, \bt) &lt;/math&gt; <br /> |equation= }}<br /> {{Equation2<br /> |name= &lt;math&gt; \pcpsic(\bpsi {{!}}\theta)&lt;/math&gt;<br /> |equation=<br /> &lt;math&gt;\begin{eqnarray}<br /> \log(V_i) &amp;\sim&amp; {\cal N}\left(\log(V_{\rm pop}), \, \omega_V^2\right) \\<br /> \log(k_i) &amp;\sim&amp; {\cal N}\left(\log(k_{\rm pop}),\, \omega_k^2\right)<br /> \end{eqnarray}&lt;/math&gt; }}<br /> {{Equation2<br /> |name= &lt;math&gt;\pcypsi(y{{!}}\bpsi; \bt) &lt;/math&gt;<br /> |equation=<br /> &lt;math&gt;\begin{eqnarray}<br /> f(t;V_i,k_i) &amp;=&amp; \frac{500}{V_i}e^{-k_i \, t} \\[0.2cm]<br /> y_{ij} &amp;\sim&amp; {\cal N} \left(f(t_{ij};V_i,k_i) , a^2 \right)<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> |style = &quot;width:50%&quot; |<br /> {{MLXTranForTable<br /> |name=Example 1<br /> |text=&lt;pre style=&quot;background-color:#EFEFEF; border:none;&quot;&gt; <br /> [INDIVIDUAL PARAMETER]<br /> input={V_pop,k_pop,omega_V,omega_k}<br /> <br /> DEFINITION:<br /> V = {distribution=logNormal, prediction=V_pop,sd=omega_V}<br /> k = {distribution=logNormal, prediction=k_pop,sd=omega_k}<br /> <br /> <br /> [OBSERVATION]<br /> input={V,k,a}<br /> <br /> EQUATION:<br /> f = 500/V*exp(-k*t)<br /> <br /> DEFINITION:<br /> y = {distribution=normal, prediction=f, sd=a}<br /> &lt;/pre&gt; }}<br /> |}<br /> <br /> <br /> We can then use this model with different tools for executing different tasks: it can be used for example with $\mlxplore$ for model exploration, with $\monolix$ for modeling, with R or Matlab for simulation, etc.<br /> <br /> It is important to remember that $\mlxtran$ is not a &quot;function&quot; that calculates an output.
It is not an imperative but rather a declarative language: it allows us to describe a model. It is then the tasks we choose to perform that use $\mlxtran$ like a function, &quot;requesting&quot; it to give predictions, simulate random variables, compute a pdf, maximize a likelihood, etc.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Example 2===<br /> <br /> Consider now a model defined by the joint distribution<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pypsithc(\by,\bpsi, \theta, \bc ; \bt) = \pcypsi(\by{{!}}\bpsi;\bt) \pcpsic(\bpsi{{!}}\bc ; \theta) \, \pth(\theta) \pc(\bc) ,<br /> &lt;/math&gt; }}<br /> <br /> where the covariates $\bc$ are the weights of the individuals: $\bc = (w_i, 1\leq i \leq N)$. The other variables and parameters are those already defined in the previous example.<br /> <br /> We now aim to define a joint model for $\by$, $\bpsi$, $\bc$ and $\theta_R=(V_{\rm pop},k_{\rm pop})$.<br /> <br /> <br /> {| cellspacing=&quot;10&quot; cellpadding=&quot;10&quot;<br /> |style=&quot;width:50%&quot; |<br /> {{Equation2 <br /> |name= &lt;math&gt;\pypsithc(\by,\bpsi, \theta, \bc ; \bt)&lt;/math&gt;<br /> |equation= }}<br /> {{Equation2<br /> |name=&lt;math&gt;\pth(\theta)&lt;/math&gt;<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> V_{\rm pop} &amp;\sim&amp; {\cal N}\left(30,3^2\right) \\<br /> k_{\rm pop} &amp;\sim&amp; {\cal N}\left(0.1,0.01^2\right)<br /> \end{eqnarray}&lt;/math&gt; }}<br /> {{Equation2<br /> |name=&lt;math&gt;\pc(\bc)&lt;/math&gt;<br /> |equation=<br /> &lt;math&gt;\begin{eqnarray}<br /> w_i &amp;\sim&amp; {\cal N}\left(70,10^2\right)<br /> \end{eqnarray}&lt;/math&gt; }}<br /> {{Equation2<br /> |name=&lt;math&gt;\pcpsic(\bpsi {{!}}\bc;\theta)&lt;/math&gt;<br /> |equation=&lt;math&gt;<br /> \begin{eqnarray}<br /> \hat{V}_i &amp;=&amp; V_{\rm pop}\left(\frac{w_i}{70}\right)^\beta \\[0.4cm]<br /> \log(V_i) &amp;\sim&amp; {\cal N}\left(\log(\hat{V}_i), \, \omega_V^2\right) \\<br /> \log(k_i) &amp;\sim&amp;
{\cal N}\left(\log(k_{\rm pop}),\, \omega_k^2\right)<br /> \end{eqnarray}&lt;/math&gt; }}<br /> {{Equation2<br /> |name=&lt;math&gt;\pcypsi(y{{!}}\bpsi; \bt) &lt;/math&gt;<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> f(t;V_i,k_i) &amp;=&amp; \frac{500}{V_i}e^{-k_i \, t} \\[0.2cm]<br /> y_{ij} &amp;\sim&amp; {\cal N} \left(f(t_{ij};V_i,k_i) , a^2 \right)<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> |style=&quot;width:50%&quot;|<br /> {{MLXTranForTable<br /> |name=jointModel2.txt<br /> |text=&lt;pre style=&quot;background-color:#EFEFEF; border:none&quot;&gt;<br /> [POPULATION PARAMETER]<br /> <br /> DEFINITION:<br /> V_pop = {distribution=normal, mean=30, sd=3}<br /> k_pop = {distribution=normal, mean=0.1, sd=0.01}<br /> <br /> <br /> [COVARIATE]<br /> <br /> DEFINITION:<br /> weight = {distribution=normal, mean=70, sd=10}<br /> <br /> <br /> <br /> [INDIVIDUAL PARAMETER]<br /> input={V_pop,k_pop,omega_V,omega_k,beta,weight}<br /> <br /> EQUATION:<br /> V_pred = V_pop*(weight/70)^beta<br /> <br /> DEFINITION:<br /> V = {distribution=logNormal, prediction=V_pred,sd=omega_V}<br /> k = {distribution=logNormal, prediction=k_pop,sd=omega_k}<br /> <br /> <br /> [OBSERVATION]<br /> input={V,k,a}<br /> <br /> EQUATION:<br /> f = 500/V*exp(-k*t)<br /> <br /> DEFINITION:<br /> y = {distribution=normal, prediction=f, sd=a}<br /> &lt;/pre&gt; }}<br /> |}<br /> <br /> We can use the approach described above for various tasks, e.g., simulating $(\by,\bpsi, \bc, \theta_R)$ for a given input $(\theta_F, \bt)$, simulating the population parameters $(V_{\rm pop},k_{\rm pop})$ with the conditional distribution $p_{\theta_R|\by, \bc}( \, \cdot \, | \by, \bc ; \theta_F,\bt)$, estimating the log-likelihood, maximizing the observed likelihood and computing the MAP.<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> &lt;!--<br /> ==Bibliography==<br /> TO DO<br /> --&gt;<br /> <br /> <br /> {{Back&amp;Next<br /> |linkBack=The individual approach<br /> |linkNext=Description, 
representation and implementation of a model }}</div> Admin http://wiki.webpopix.org/index.php/The_individual_approach The individual approach 2013-06-21T08:35:45Z <p>Admin : </p> <hr /> <div><br /> == Overview ==<br /> <br /> Before we start looking at modeling a whole population at the same time, we are going to consider only one individual from that population. Much of the basic methodology for modeling one individual follows through to population modeling. We will see that when stepping up from one individual to a population, the difference is that some parameters shared by individuals are considered to be drawn from a [http://en.wikipedia.org/wiki/Probability_distribution probability distribution].<br /> <br /> Let us begin with a simple example.<br /> An individual receives 100mg of a drug at time $t=0$. At that time and then every hour for fifteen hours, the<br /> concentration of a marker in the bloodstream is measured and plotted against time:<br /> <br /> ::[[File:New_Individual1.png|link=]]<br /> <br /> We aim to find a mathematical model to describe what we see in the figure. The eventual goal is then to extend this approach to the ''simultaneous modeling'' of a whole population.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Model and methods for the individual approach ==<br /> <br /> &lt;br&gt;<br /> ===Defining a model===<br /> <br /> In our example, the concentration is a ''continuous'' variable, so we will try to use continuous functions to model it.<br /> Different types of data (e.g., [http://en.wikipedia.org/wiki/Count_data count data], [http://en.wikipedia.org/wiki/Categorical_data categorical data], [http://en.wikipedia.org/wiki/Survival_analysis time-to-event data], etc.) require different types of models. 
All of these data types will be considered in due time, but for now let us concentrate on a continuous data model.<br /> <br /> A model for continuous data can be represented mathematically as follows:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> y_{j} = f(t_j ; \psi) + e_j, \quad \quad 1\leq j \leq n, &lt;/math&gt; }}<br /> <br /> where:<br /> <br /> <br /> * $f$ is called the ''structural model''. It corresponds to the basic type of curve we suspect the data is following, e.g., linear, logarithmic, exponential, etc. Sometimes, a model of the associated biological processes leads to equations that define the curve's shape.<br /> <br /> * $(t_1,t_2,\ldots , t_n)$ is the vector of observation times. Here, $t_1 = 0$ hours and $t_n = t_{16} = 15$ hours.<br /> <br /> * $\psi=(\psi_1, \psi_2, \ldots, \psi_d)$ is a vector of $d$ parameters that influences the value of $f$.<br /> <br /> * $(e_1, e_2, \ldots, e_n)$ are called the ''residual errors''. Usually, we suppose that they come from some centered probability distribution: $\esp{e_j} =0$. <br /> <br /> <br /> In fact, we usually state a continuous data model in a slightly more flexible way:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;cont&quot;&gt;&lt;math&gt;<br /> y_{j} = f(t_j ; \psi) + g(t_j ; \psi)\teps_j , \quad \quad 1\leq j \leq n,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> where now:<br /> <br /> <br /> &lt;ul&gt;<br /> * $g$ is called the ''residual error model''. It may be a function of the time $t_j$ and parameters $\psi$.<br /> <br /> * $(\teps_1, \teps_2, \ldots, \teps_n)$ are the ''normalized'' residual errors. 
We suppose that these come from a probability distribution which is centered and has unit variance: $\esp{\teps_j} = 0$ and $\var{\teps_j} =1$.<br /> &lt;/ul&gt;<br /> <br /> &lt;br&gt;<br /> <br /> ===Choosing a residual error model===<br /> <br /> <br /> The choice of a residual error model $g$ is very flexible, and allows us to account for many different hypotheses we may have on the error's distribution. Let $f_j=f(t_j;\psi)$. Here are some simple error models.<br /> <br /> <br /> &lt;ul&gt;<br /> * ''Constant error model'': $g=a$. That is, $y_j=f_j+a\teps_j$.<br /> <br /> <br /> * ''Proportional error model'': $g=b\,f$. That is, $y_j=f_j+bf_j\teps_j$. This is for when we think the magnitude of the error is proportional to the predicted value $f$.<br /> <br /> <br /> * ''Combined error model'': $g=a+b f$. Here, $y_j=f_j+(a+bf_j)\teps_j$.<br /> <br /> <br /> * ''Alternative combined error model'': $g^2=a^2+b^2f^2$. Here, $y_j=f_j+\sqrt{a^2+b^2f_j^2}\teps_j$.<br /> <br /> <br /> * ''Exponential error model'': here, the model is instead $\log(y_j)=\log(f_j) + a\teps_j$, that is, $g=a$. It is exponential in the sense that if we exponentiate, we end up with $y_j = f_j e^{a\teps_j}$.<br /> &lt;/ul&gt;<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Tasks===<br /> <br /> To model a vector of observations $y = (y_j,\, 1\leq j \leq n)$, we must perform several tasks:<br /> <br /> &lt;ul&gt;<br /> * Select a structural model $f$ and a residual error model $g$.<br /> <br /> <br /> * Estimate the model's parameters $\psi$.<br /> <br /> <br /> * ''Assess and validate'' the selected model.<br /> &lt;/ul&gt;<br /> <br /> <br /> <br /> &lt;br&gt;<br /> === Selecting structural and residual error models ===<br /> <br /> As we are interested in [http://en.wikipedia.org/wiki/Parametric_model parametric modeling], we must choose parametric structural and residual error models.
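The residual error models listed above are straightforward to simulate from, which helps build intuition before choosing one. Here is a minimal sketch in {{Verbatim|R}}; the structural model $f$ and the values of $a$ and $b$ are arbitrary choices for illustration:

```r
# simulate y_j = f_j + g_j * teps_j under three of the residual error models above
set.seed(1234)
t   <- 0:15
f   <- 10*exp(-0.3*t)          # illustrative structural model (arbitrary choice)
eps <- rnorm(length(t))        # normalized residual errors, N(0,1)
a   <- 0.5
b   <- 0.1

y.const <- f + a*eps           # constant error model:     g = a
y.prop  <- f + b*f*eps         # proportional error model: g = b*f
y.comb  <- f + (a + b*f)*eps   # combined error model:     g = a + b*f
```

Plotting the three simulated series against $t$ shows the characteristic patterns: constant scatter, scatter that shrinks with $f$, and a mixture of the two.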
In the absence of biological (or other) information, we might suggest possible structural models simply by looking at plots of the data against time. For example, if $y_j$ is increasing with time, we might suggest an affine, quadratic or logarithmic model, depending on the approximate trend of the data. If $y_j$ instead decreases ever more slowly toward zero, an exponential model might be appropriate.<br /> <br /> However, often we have biological (or other) information to help us make our choice. For instance, if we have a system of [http://en.wikipedia.org/wiki/Differential_equation differential equations] describing how the drug is eliminated from the body, its solution may provide the formula (i.e., structural model) we are looking for.<br /> <br /> As for the residual error model, if it is not immediately obvious which one to choose, several can be tested in conjunction with one or several possible structural models. After parameter estimation, each structural and residual error model pair can be assessed, compared against the others, and/or validated in various ways.<br /> <br /> Now we can have a first look at parameter estimation, and further on, model assessment and validation.<br /> <br /> <br /> &lt;br&gt;<br /> ===Parameter estimation===<br /> <br /> <br /> Given the observed data and the choice of a parametric model to describe it, our goal becomes to find the &quot;best&quot; parameters for the model.
A traditional framework to solve this kind of problem is called [http://en.wikipedia.org/wiki/Maximum_likelihood maximum likelihood estimation] or MLE, in which the &quot;most likely&quot; parameters are found, given the data that was observed.<br /> <br /> The likelihood $L$ is a function defined as:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; L(\psi ; y_1,y_2,\ldots,y_n) \ \ \eqdef \ \ \py( y_1,y_2,\ldots,y_n; \psi) , &lt;/math&gt; }}<br /> <br /> i.e., the conditional [http://en.wikipedia.org/wiki/Joint_probability_distribution joint density function] of $(y_j)$ given the parameters $\psi$, but looked at as if the data are known and the parameters not. The $\hat{\psi}$ which maximizes $L$ is known as the ''maximum likelihood estimator''.<br /> <br /> Suppose that we have chosen a structural model $f$ and residual error model $g$. If we assume for instance that $\teps_j \sim_{i.i.d} {\cal N}(0,1)$, then the $y_j$ are independent of each other and [[#cont|(1)]] means that:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; y_{j} \sim {\cal N}\left(f(t_j ; \psi) , g(t_j ; \psi)^2\right), \quad \quad 1\leq j \leq n .&lt;/math&gt; }}<br /> <br /> Due to this independence, the pdf of $y = (y_1, y_2, \ldots, y_n)$ is the product of the pdfs of each $y_j$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \py(y_1, y_2, \ldots y_n ; \psi) &amp;=&amp; \prod_{j=1}^n \pyj(y_j ; \psi) \\ \\<br /> &amp; = &amp; \frac{1}{\prod_{j=1}^n \sqrt{2\pi} g(t_j ; \psi)} \ {\rm exp}\left\{-\frac{1}{2} \sum_{j=1}^n \left( \displaystyle{ \frac{y_j - f(t_j ; \psi)}{g(t_j ; \psi)} }\right)^2\right\} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> This is the same thing as the likelihood function $L$ when seen as a function of $\psi$. 
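This factorization is easy to verify numerically. In the following {{Verbatim|R}} sketch, the structural model, error model and parameter values are arbitrary illustrations; the explicit Gaussian product is compared with the same likelihood computed via {{Verbatim|dnorm}}:

```r
# check the closed-form Gaussian product against prod(dnorm(...))
set.seed(123)
t <- 1:10
f <- 5*exp(-0.4*t)             # illustrative structural model f(t; psi)
g <- rep(0.2, length(t))       # constant residual error model, a = 0.2
y <- f + g*rnorm(length(t))    # simulated observations

L1 <- prod(dnorm(y, mean = f, sd = g))                   # likelihood via dnorm
L2 <- exp(-0.5*sum(((y - f)/g)^2)) / prod(sqrt(2*pi)*g)  # closed-form product
```

The two values agree to machine precision, confirming that the product formula is just the joint Gaussian pdf evaluated at the observations.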
Maximizing $L$ is equivalent to minimizing the deviance, i.e., -2 $\times$ the $\log$-likelihood ($LL$):<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;LLL&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \hat{\psi} &amp;=&amp; \argmin{\psi} \left\{ -2 \,LL \right\}\\<br /> &amp;=&amp; \argmin{\psi} \left\{<br /> \sum_{j=1}^n \log\left(g(t_j ; \psi)^2\right) + \sum_{j=1}^n \left(\displaystyle{ \frac{y_j - f(t_j ; \psi)}{g(t_j ; \psi)} }\right)^2 \right\} . <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> <br /> This minimization problem does not usually have an [http://en.wikipedia.org/wiki/Analytical_expression analytical solution] for nonlinear models, so an [http://en.wikipedia.org/wiki/Mathematical_optimization optimization] procedure needs to be used.<br /> However, for a few specific models, analytical solutions do exist.<br /> <br /> For instance, suppose we have a constant error model: $y_{j} = f(t_j ; \psi) + a \, \teps_j,\,\, 1\leq j \leq n,$ that is: $g(t_j;\psi) = a$. 
In practice, $f$ is not itself a function of $a$, so we can write $\psi = (\phi,a)$ and therefore: $y_{j} = f(t_j ; \phi) + a \, \teps_j.$ Thus, [[#LLL|(2)]] simplifies to:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; (\hat{\phi},\hat{a}) \ \ = \ \ \argmin{(\phi,a)} \left\{<br /> n \log(a^2) + \sum_{j=1}^n \left(\displaystyle{ \frac{y_j - f(t_j ; \phi)}{a} }\right)^2 \right\} .<br /> &lt;/math&gt; }}<br /> <br /> The solution is then:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \hat{\phi} &amp;=&amp; \argmin{\phi} \sum_{j=1}^n \left( y_j - f(t_j ; \phi)\right)^2 \\<br /> \hat{a}^2&amp;=&amp; \frac{1}{n}\sum_{j=1}^n \left( y_j - f(t_j ; \hat{\phi})\right)^2 ,<br /> \end{eqnarray} &lt;/math&gt; }}<br /> <br /> where $\hat{a}^2$ is found by setting the [http://en.wikipedia.org/wiki/Partial_derivative partial derivative] of $-2LL$ to zero.<br /> <br /> Whether this has an analytical solution or not depends on the form of $f$. For example, if $f(t_j;\phi)$ is just a linear function of the components of the vector $\phi$, we can represent it as a matrix $F$ whose $j$th row gives the coefficients at time $t_j$. Therefore, we have the matrix equation $y = F \phi + a \teps$.<br /> <br /> The solution for $\hat{\phi}$ is thus the least-squares one, and for $\hat{a}^2$ it is the same as before:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \hat{\phi} &amp;=&amp; (F^\prime F)^{-1} F^\prime y \\<br /> \hat{a}^2&amp;=&amp; \frac{1}{n}\sum_{j=1}^n \left( y_j - F_j \hat{\phi}\right)^2 . 
\\<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> ===Computing the Fisher information matrix===<br /> <br /> The [http://en.wikipedia.org/wiki/Fisher_information Fisher information] is a way of measuring the amount of information that an observable random variable carries about an unknown parameter upon which its probability distribution depends.<br /> <br /> Let $\psis$ be the true unknown value of $\psi$, and let $\hatpsi$ be the maximum likelihood estimate of $\psi$. If the observed likelihood function is sufficiently smooth, asymptotic theory for maximum-likelihood estimation holds and<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;intro_individualCLT&quot;&gt;&lt;math&gt;<br /> I_n(\psis)^{\frac{1}{2} }(\hatpsi-\psis) \limite{n\to \infty}{} {\mathcal N}(0,\id) ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(3) }}<br /> <br /> where $I_n(\psis)$ is (minus) the Hessian (i.e., the matrix of the second derivatives) of the log-likelihood:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;I_n(\psis)=- \displaystyle{ \frac{\partial^2}{\partial \psi \partial \psi^\prime} } LL(\psis;y_1,y_2,\ldots,y_n)<br /> &lt;/math&gt; }}<br /> <br /> is the ''observed Fisher information matrix''. Here, &quot;observed&quot; means that it is a function of observed variables $y_1,y_2,\ldots,y_n$.<br /> <br /> Thus, an estimate of the covariance of $\hatpsi$ is the inverse of the observed Fisher information matrix as expressed by the formula:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;C(\hatpsi) = I_n(\hatpsi)^{-1} . &lt;/math&gt; }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> ===Deriving confidence intervals for parameters===<br /> <br /> Let $\psi_k$ be the $k$th of $d$ components of $\psi$.
Imagine that we have estimated $\psi_k$ with $\hatpsi_k$, the $k$th component of the MLE $\hatpsi$, that is, a random variable that converges to $\psi_k^{\star}$ when $n \to \infty$ under very general conditions.<br /> <br /> An estimator of its variance is the $k$th element of the diagonal of the covariance matrix $C(\hatpsi)$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\widehat{\rm Var}(\hatpsi_k) = C_{kk}(\hatpsi) .&lt;/math&gt; }}<br /> <br /> We can thus derive an estimator of its [http://en.wikipedia.org/wiki/Standard_error standard error]:<br /> {{Equation1<br /> |equation=&lt;math&gt;\widehat{\rm s.e.}(\hatpsi_k) = \sqrt{C_{kk}(\hatpsi)} ,&lt;/math&gt; }}<br /> <br /> and a [http://en.wikipedia.org/wiki/Confidence_interval confidence interval] of level $1-\alpha$ for $\psi_k^\star$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;{\rm CI}(\psi_k^\star) = \left[\hatpsi_k + \widehat{\rm s.e.}(\hatpsi_k)\,q\left(\frac{\alpha}{2}\right), \ \hatpsi_k + \widehat{\rm s.e.}(\hatpsi_k)\,q\left(1-\frac{\alpha}{2}\right)\right] , &lt;/math&gt; }}<br /> <br /> where $q(w)$ is the [http://en.wikipedia.org/wiki/Quantile quantile] of order $w$ of a ${\cal N}(0,1)$ distribution.<br /> <br /> <br /> {{Remarks<br /> |title=Remarks<br /> |text= Approximating the distribution of $(\hatpsi_k-\psi_k^\star)/\widehat{\rm s.e.}(\hatpsi_k)$ by a standard normal distribution is a &quot;good&quot; approximation only when the number of observations $n$ is large. A better approximation should be used for small $n$. In the model $y_j = f(t_j ; \phi) + a\teps_j$, the distribution of $\hat{a}^2$ can be approximated by a [http://en.wikipedia.org/wiki/Chi-squared_distribution chi-squared distribution] with $(n-d_\phi)$ [http://en.wikipedia.org/wiki/Degrees_of_freedom_%28statistics%29 degrees of freedom], where $d_\phi$ is the dimension of $\phi$.
The quantiles of the normal distribution can then be replaced by those of a [http://en.wikipedia.org/wiki/Student%27s_t-distribution Student's $t$-distribution] with $(n-d_\phi)$ degrees of freedom.<br /> &lt;!-- %$${\rm CI}(\psi_k) = [\hatpsi_k - \widehat{\rm s.e}(\hatpsi_k)q((1-\alpha)/2,n-d) , \hatpsi_k + \widehat{\rm s.e}(\hatpsi_k)q((1+\alpha)/2,n-d)]$$ --&gt;<br /> &lt;!-- %where $q(\alpha,\nu)$ is the quantile of order $\alpha$ of a $t$-distribution with $\nu$ degrees of freedom. --&gt;<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> ===Deriving confidence intervals for predictions===<br /> <br /> <br /> The structural model $f$ can be predicted for any $t$ using the estimated value $f(t; \hatphi)$. For that $t$, we can then derive a confidence interval for $f(t,\phi)$ using the estimated variance of $\hatphi$. Indeed, as a first approximation we have:<br /> <br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; f(t ; \hatphi) \simeq f(t ; \phis) + \nabla f (t,\phis) (\hatphi - \phis) ,&lt;/math&gt; }}<br /> <br /> where $\nabla f(t,\phis)$ is the gradient of $f$ at $\phis$, i.e., the vector of the first-order partial derivatives of $f$ with respect to the components of $\phi$, evaluated at $\phis$. Of course, we do not actually know $\phis$, but we can estimate $\nabla f(t,\phis)$ with $\nabla f(t,\hatphi)$. The variance of $f(t ; \hatphi)$ can then be estimated by<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \widehat{\rm Var}\left(f(t ; \hatphi)\right) \simeq \nabla f (t,\hatphi)\widehat{\rm Var}(\hatphi) \left(\nabla f (t,\hatphi) \right)^\prime . 
&lt;/math&gt; }}<br /> <br /> We can then derive an estimate of the standard error of $f (t,\hatphi)$ for any $t$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\widehat{\rm s.e.}(f(t ; \hatphi)) = \sqrt{\widehat{\rm Var}\left(f(t ; \hatphi)\right)} , &lt;/math&gt; }}<br /> <br /> and a confidence interval of level $1-\alpha$ for $f(t ; \phi^\star)$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;{\rm CI}(f(t ; \phi^\star)) = \left[f(t ; \hatphi) + \widehat{\rm s.e.}(f(t ; \hatphi))\,q\left(\frac{\alpha}{2}\right), \ f(t ; \hatphi) + \widehat{\rm s.e.}(f(t ; \hatphi))\,q\left(1-\frac{\alpha}{2}\right)\right].&lt;/math&gt; }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> ===Estimating confidence intervals using Monte Carlo simulation===<br /> <br /> The use of [http://en.wikipedia.org/wiki/Monte_Carlo_method Monte Carlo methods] to estimate a distribution does not require any approximation of the model.<br /> <br /> We proceed in the following way. Suppose we have found a MLE $\hatpsi$ of $\psi$. We then simulate a data vector $y^{(1)}$ by first randomly generating the vector $\teps^{(1)}$ and then calculating for $1 \leq j \leq n$,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; y^{(1)}_j = f(t_j ;\hatpsi) + g(t_j ;\hatpsi)\teps^{(1)}_j . &lt;/math&gt; }}<br /> <br /> In a sense, this gives us an example of &quot;new&quot; data from the &quot;same&quot; model. We can then compute a new MLE $\hat{\psi}^{(1)}$ of $\psi$ using $y^{(1)}$.<br /> <br /> Repeating this process $M$ times gives $M$ estimates of $\psi$ from which we can obtain an empirical estimation of the distribution of $\hatpsi$, or any quantile we like.<br /> <br /> Any confidence interval for $\psi_k$ (resp. $f(t,\psi_k)$) can then be approximated by a prediction interval for $\hatpsi_k$ (resp. $f(t,\hatpsi_k)$). 
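This simulation loop can be sketched in a few lines of {{Verbatim|R}}. The structural model, parameter values and the least-squares refit below are arbitrary illustrations of the procedure (for a constant error model, maximizing the likelihood in $\phi$ reduces to least squares, as shown earlier):

```r
# parametric bootstrap sketch: simulate M datasets from the fitted model, refit each
set.seed(42)
t <- 1:15
f <- function(t, phi) phi[1]*exp(-phi[2]*t)     # illustrative structural model
a <- 0.3                                        # constant error model s.d.
y <- f(t, c(10, 0.2)) + a*rnorm(length(t))      # "observed" data

# refit by least squares (equivalent to the MLE of phi under constant error)
fit <- function(y) nlm(function(p) sum((y - f(t, p))^2), c(10, 0.2))$estimate
phi.hat <- fit(y)

M    <- 200
boot <- replicate(M, fit(f(t, phi.hat) + a*rnorm(length(t))))

# empirical 95% prediction interval for each component of phi.hat
ci <- apply(boot, 1, quantile, probs = c(0.025, 0.975))
```

Each column of {{Verbatim|boot}} is one re-estimated parameter vector; the quantiles of each row give the interval described above.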
For instance, a two-sided confidence interval of level $1-\alpha$ for $\psi_k^\star$ can be estimated by the prediction interval<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; [\hat{\psi}_{k,([\frac{\alpha}{2} M])} \ , \ \hat{\psi}_{k,([ (1-\frac{\alpha}{2})M])} ], &lt;/math&gt; }}<br /> <br /> where $[\cdot]$ denotes the [http://en.wikipedia.org/wiki/Floor_and_ceiling_functions integer part] and $(\psi_{k,(m)},\ 1 \leq m \leq M)$ the order statistic, i.e., the parameters $(\hatpsi_k^{(m)}, 1 \leq m \leq M)$ reordered so that $\hatpsi_{k,(1)} \leq \hatpsi_{k,(2)} \leq \ldots \leq \hatpsi_{k,(M)}$.<br /> <br /> <br /> <br /> <br /> &lt;br&gt;<br /> ==A PK example ==<br /> <br /> In the real world, it is often not enough to look at the data, choose one possible model and estimate the parameters. The chosen structural model may or may not be &quot;good&quot; at representing the data. It may be good but the chosen residual error model bad, meaning that the overall model is poor, and so on. That is why in practice we may want to try out several structural and residual error models. After performing parameter estimation for each model, various assessment tasks can then be performed in order to conclude which model is best.<br /> <br /> <br /> &lt;br&gt;<br /> ===The data===<br /> <br /> This modeling process is illustrated in detail in the following [http://en.wikipedia.org/wiki/Pharmacokinetics PK] example. Let us consider a dose D=50mg of a drug administered orally to a patient at time $t=0$. 
The concentration of the drug in the bloodstream is then measured at times $(t_j) = (0.5, 1,\,1.5,\,2,\,3,\,4,\,8,\,10,\,12,\,16,\,20,\,24).$ Here is the file {{Verbatim|individualFitting_data.txt}} with the data:<br /> <br /> <br /> {| class=&quot;wikitable&quot; align=&quot;center&quot; style=&quot;width: 30%;margin-left:15em&quot;<br /> !| Time || Concentration <br /> |-<br /> |0.5 || 0.94<br /> |-<br /> | 1.0 || 1.30<br /> |-<br /> | 1.5 || 1.64<br /> |-<br /> | 2.0 || 3.38<br /> |-<br /> | 3.0 || 3.72<br /> |-<br /> | 4.0 || 3.29<br /> |-<br /> | 8.0 || 1.31<br /> |-<br /> | 10.0 || 0.80<br /> |-<br /> | 12.0 || 0.39<br /> |-<br /> | 16.0 || 0.31<br /> |-<br /> | 20.0 || 0.10<br /> |-<br /> | 24.0 || 0.09<br /> |}<br /> <br /> <br /> We are going to perform the analyses for this example with the free statistical software [http://www.r-project.org/ {{Verbatim|R}}]. First, we import the data and plot it to have a look:<br /> {| cellpadding=&quot;5&quot; cellspacing=&quot;0&quot; <br /> | style=&quot;width: 50%&quot; | <br /> [[File:NewIndividual1.png|link=]]<br /> | style=&quot;width: 50%&quot; | {{RcodeForTable<br /> |name=<br /> |code=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> pk1=read.table(&quot;individualFitting_data.txt&quot;,header=T) <br /> t=pk1$time <br /> y=pk1$concentration<br /> plot(t, y, xlab=&quot;time(hour)&quot;,<br /> ylab=&quot;concentration(mg/l)&quot;, col=&quot;blue&quot;) <br /> &lt;/pre&gt; }}<br /> |}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Fitting two PK models===<br /> <br /> We are going to consider two possible structural models that may describe the observed time-course of the concentration:<br /> <br /> <br /> &lt;ul&gt;<br /> * A [http://en.wikipedia.org/wiki/Multi-compartment_model#Single-compartment_model one compartment model] with first-order [http://en.wikipedia.org/wiki/Absorption_%28pharmacokinetics%29 absorption] and linear elimination:<br /> <br /> {{Equation1<br /> 
|equation=&lt;math&gt;\begin{eqnarray}<br /> \phi_1 &amp;=&amp; (k_a, V, k_e) \\<br /> f_1(t ; \phi_1) &amp;=&amp; \frac{D\, k_a}{V(k_a-k_e)} \left( e^{-k_e \, t} - e^{-k_a \, t} \right).<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> * A one compartment model with zero-order absorption and linear elimination:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \phi_2 &amp;=&amp; (T_{k0}, V, k_e) \\<br /> f_2(t ; \phi_2) &amp;=&amp; \left\{ \begin{array}{ll}<br /> \displaystyle{ \frac{D}{V \,T_{k0} \, k_e} }\left( 1- e^{-k_e \, t} \right) &amp; {\rm if }\ t\leq T_{k0} \\<br /> \displaystyle{ \frac{D}{V \,T_{k0} \, k_e} } \left( 1- e^{-k_e \, T_{k0} } \right)e^{-k_e \, (t- T_{k0})} &amp; {\rm otherwise} .<br /> \end{array}<br /> \right.<br /> \end{eqnarray}&lt;/math&gt; }}<br /> &lt;/ul&gt;<br /> <br /> <br /> We define each of these functions in {{Verbatim|R}}:<br /> <br /> <br /> {{Rcode<br /> |name=<br /> |code=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> # first-order absorption: x = (ka, V, ke), D = 50<br /> predc1=function(t,x){<br /> f=50*x[1]/x[2]/(x[1]-x[3])*(exp(-x[3]*t)-exp(-x[1]*t)) }<br /> <br /> # zero-order absorption: x = (Tk0, V, ke), D = 50<br /> predc2=function(t,x){<br /> ff=50/x[2]/x[1]/x[3]*(1-exp(-x[3]*t))<br /> ff[t&gt;x[1]]=50/x[2]/x[1]/x[3]*(1-exp(-x[3]*x[1]))*exp(-x[3]*(t[t&gt;x[1]]-x[1]))<br /> f=ff} &lt;/pre&gt;<br /> }}<br /> <br /> We then define two models ${\cal M}_1$ and ${\cal M}_2$ that assume (for now) constant residual error models:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> {\cal M}_1 : \quad y_j &amp; = &amp; f_1(t_j ; \phi_1) + a_1\teps_j \\<br /> {\cal M}_2 : \quad y_j &amp; = &amp; f_2(t_j ; \phi_2) + a_2\teps_j .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> We can fit these two models to our data by computing the MLEs $\hatpsi_1=(\hatphi_1,\hat{a}_1)$ and $\hatpsi_2=(\hatphi_2,\hat{a}_2)$ of $\psi$ under each model:<br /> <br /> {| cellpadding=&quot;10&quot; cellspacing=&quot;10&quot; <br /> | style=&quot;width:50%&quot; | <br /> {{RcodeForTable<br /> |name=<br /> |code=<br />
&lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> fmin1=function(x,y,t)<br /> {f=predc1(t,x)<br /> g=x[4]<br /> e=sum( ((y-f)/g)^2 + log(g^2))<br /> }<br /> <br /> fmin2=function(x,y,t)<br /> {f=predc2(t,x)<br /> g=x[4]<br /> e=sum( ((y-f)/g)^2 + log(g^2))<br /> }<br /> <br /> #--------- MLE --------------------------------<br /> <br /> pk.nlm1=nlm(fmin1, c(0.3,6,0.2,1), y, t, hessian=&quot;true&quot;)<br /> psi1=pk.nlm1$estimate<br /> phi1=psi1[c(1,2,3)]<br /> <br /> pk.nlm2=nlm(fmin2, c(3,10,0.2,4), y, t, hessian=&quot;true&quot;)<br /> psi2=pk.nlm2$estimate<br /> phi2=psi2[c(1,2,3)]<br /> &lt;/pre&gt;<br /> }}<br /> | style=&quot;width:50%&quot; | <br /> :Here are the parameter estimation results:<br /> <br /> <br /> {{JustCodeForTable<br /> |code=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none; color:blue&quot;&gt;<br /> &gt; cat(&quot; psi1 =&quot;,psi1,&quot;\n\n&quot;)<br /> psi1 = 0.3240916 6.001204 0.3239337 0.4366948<br /> <br /> &gt; cat(&quot; psi2 =&quot;,psi2,&quot;\n\n&quot;)<br /> psi2 = 3.203111 8.999746 0.229977 0.2555242<br /> &lt;/pre&gt; }}<br /> |}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Assessing and selecting the PK model===<br /> <br /> The estimated parameters $\hatphi_1$ and $\hatphi_2$ can then be used for computing the predicted concentrations $\hat{f}_1(t)$ and $\hat{f}_2(t)$ under both models at any time $t$. 
These curves can then be plotted over the original data and compared:<br /> <br /> {| cellpadding=&quot;5&quot; cellspacing=&quot;0&quot; <br /> | style=&quot;width:50%&quot; | <br /> [[File:New_Individual2.png|link=]]<br /> | style=&quot;width:50%&quot; |<br /> {{RcodeForTable<br /> |name=<br /> |code=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> tc=seq(from=0,to=25,by=0.1)<br /> fc1=predc1(tc,phi1)<br /> fc2=predc2(tc,phi2)<br /> <br /> plot(t,y,ylim=c(0,4.1),xlab=&quot;time (hour)&quot;, <br /> ylab=&quot;concentration (mg/l)&quot;,col = &quot;blue&quot;)<br /> lines(tc,fc1, type = &quot;l&quot;, col = &quot;green&quot;, lwd=2)<br /> lines(tc,fc2, type = &quot;l&quot;, col = &quot;red&quot;, lwd=2)<br /> abline(a=0,b=0,lty=2)<br /> legend(13,4,c(&quot;observations&quot;, &quot;first order absorption&quot;,<br /> &quot;zero order absorption&quot;), lty=c(-1,1,1), <br /> pch=c(1,-1,-1), lwd=2,<br /> col=c(&quot;blue&quot;,&quot;green&quot;,&quot;red&quot;))<br /> &lt;/pre&gt; }}<br /> |}<br /> <br /> We clearly see that a much better fit is obtained with model ${\cal M}_2$, i.e., the one assuming a zero-order absorption process.<br /> <br /> Another useful goodness-of-fit plot is obtained by displaying the observations $(y_j)$ versus the predictions $\hat{y}_j=f(t_j ; \hatpsi)$ given by the models:<br /> <br /> {| cellpadding=&quot;5&quot; cellspacing=&quot;0&quot; <br /> | style=&quot;width:50%&quot; | <br /> [[File:individual3.png|link=]]<br /> | style=&quot;width:50%&quot; |<br /> {{RcodeForTable<br /> |name=<br /> |code=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> f1=predc1(t,phi1)<br /> f2=predc2(t,phi2)<br /> <br /> par(mfrow= c(1,2))<br /> plot(f1,y,xlim=c(0,4),ylim=c(0,4),main=&quot;model 1&quot;)<br /> abline(a=0,b=1,lty=1)<br /> plot(f2,y,xlim=c(0,4),ylim=c(0,4),main=&quot;model 2&quot;)<br /> abline(a=0,b=1,lty=1)<br /> &lt;/pre&gt; }}<br /> |}<br /> <br /> <br /> <br /> 
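Note that the quantity minimized by {{Verbatim|fmin1}} and {{Verbatim|fmin2}} above equals $-2$ times the Gaussian log-likelihood up to the additive constant $n\log(2\pi)$, which is why a deviance can be recovered directly from the minimum returned by {{Verbatim|nlm}}. Here is a standalone numerical check of this identity (a Python sketch with arbitrary illustrative values; it is not part of the original {{Verbatim|R}} script):

```python
import math

# Arbitrary illustrative values (not the wiki's data)
y = [0.94, 1.30, 1.64]   # observations
f = [1.00, 1.40, 1.60]   # model predictions f(t_j; phi)
a = 0.4                  # constant residual error parameter
n = len(y)

# The objective minimized by fmin1/fmin2, translated literally
obj = sum(((yj - fj) / a) ** 2 + math.log(a ** 2) for yj, fj in zip(y, f))

# -2 * log-likelihood of independent y_j ~ N(f(t_j; phi), a^2)
m2ll = sum(math.log(2 * math.pi * a ** 2) + ((yj - fj) / a) ** 2
           for yj, fj in zip(y, f))

# The two agree up to the constant n*log(2*pi)
assert abs((obj + n * math.log(2 * math.pi)) - m2ll) < 1e-12
```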
&lt;br&gt;<br /> <br /> ===Model selection===<br /> <br /> <br /> Again, ${\cal M}_2$ would seem to have a slight edge. This comparison can be made more quantitative using the [http://en.wikipedia.org/wiki/Bayesian_information_criterion Bayesian Information Criterion] (BIC):<br /> <br /> {| cellpadding=&quot;10&quot; cellspacing=&quot;10&quot; <br /> | style=&quot;width:50%&quot; | <br /> {{RcodeForTable<br /> |name=<br /> |code=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> n=length(y)<br /> deviance1=pk.nlm1$minimum + n*log(2*pi)<br /> bic1=deviance1+log(n)*length(psi1)<br /> deviance2=pk.nlm2$minimum + n*log(2*pi)<br /> bic2=deviance2+log(n)*length(psi2)<br /> &lt;/pre&gt; }}<br /> | style=&quot;width:50%&quot; | <br /> {{JustCodeForTable<br /> |code=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none; color:blue&quot;&gt;<br /> &gt; cat(&quot; bic1 =&quot;,bic1,&quot;\n\n&quot;)<br /> bic1 = 24.10972<br /> <br /> &gt; cat(&quot; bic2 =&quot;,bic2,&quot;\n\n&quot;)<br /> bic2 = 11.24769<br /> &lt;/pre&gt; }}<br /> |}<br /> <br /> A smaller BIC is better. Therefore, this criterion also suggests that model ${\cal M}_2$ should be selected.<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Fitting different error models===<br /> <br /> <br /> For the moment, we have only considered constant error models. However, the &quot;observations vs predictions&quot; figure hints that the amplitude of the residual errors may increase with the size of the predicted value. 
Let us therefore take a closer look at four different residual error models, each of which we will associate with the &quot;best&quot; structural model $f_2$:<br /> <br /> {| cellpadding=&quot;2&quot; cellspacing=&quot;8&quot; style=&quot;text-align:left; margin-left:4%&quot;<br /> |${\cal M}_2$ || Constant error model: || $y_j=f_2(t_j;\phi_2)+a_2\teps_j$<br /> |-<br /> |${\cal M}_3$ || Proportional error model: || $y_j=f_2(t_j;\phi_3)+b_3f_2(t_j;\phi_3)\teps_j$<br /> |-<br /> |${\cal M}_4$ || Combined error model: || $y_j=f_2(t_j;\phi_4)+(a_4+b_4f_2(t_j;\phi_4))\teps_j$ <br /> |-<br /> |${\cal M}_5$ || Exponential error model: || $\log(y_j)=\log(f_2(t_j;\phi_5)) + a_5\teps_j$.<br /> |}<br /> <br /> The three new ones need to be entered into {{Verbatim|R}}:<br /> <br /> <br /> {{Rcode<br /> |name=<br /> |code=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> fmin3=function(x,y,t)<br /> {f=predc2(t,x)<br /> g=x[4]*f<br /> e=sum( ((y-f)/g)^2 + log(g^2))<br /> }<br /> <br /> fmin4=function(x,y,t)<br /> {f=predc2(t,x)<br /> g=abs(x[4])+abs(x[5])*f<br /> e=sum( ((y-f)/g)^2 + log(g^2))<br /> }<br /> <br /> fmin5=function(x,y,t)<br /> {f=predc2(t,x)<br /> g=x[4]<br /> e=sum( ((log(y)-log(f))/g)^2 + log(g^2))<br /> }<br /> &lt;/pre&gt; }}<br /> <br /> <br /> We can now compute the MLE $\hatpsi_3=(\hatphi_3,\hat{b}_3)$, $\hatpsi_4=(\hatphi_4,\hat{a}_4,\hat{b}_4)$ and $\hatpsi_5=(\hatphi_5,\hat{a}_5)$ of $\psi$ under models ${\cal M}_3$, ${\cal M}_4$ and ${\cal M}_5$:<br /> <br /> {| cellpadding=&quot;10&quot; cellspacing=&quot;10&quot; <br /> |style=&quot;width:50%&quot; |<br /> {{RcodeForTable<br /> |name=<br /> |code=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> #---------------- MLE -------------------<br /> <br /> pk.nlm3=nlm(fmin3, c(phi2,0.1), y, t, <br /> hessian=&quot;true&quot;)<br /> psi3=pk.nlm3$estimate<br /> <br /> pk.nlm4=nlm(fmin4, c(phi2,1,0.1), y, t, <br /> hessian=&quot;true&quot;)<br /> 
psi4=pk.nlm4$estimate<br /> psi4[c(4,5)]=abs(psi4[c(4,5)])<br /> <br /> pk.nlm5=nlm(fmin5, c(phi2,0.1), y, t, <br /> hessian=&quot;true&quot;)<br /> psi5=pk.nlm5$estimate <br /> <br /> phi3=psi3[c(1,2,3)]<br /> phi4=psi4[c(1,2,3)]<br /> phi5=psi5[c(1,2,3)]<br /> &lt;/pre&gt; }}<br /> |style=&quot;width:50%&quot; |<br /> {{JustCodeForTable<br /> |code=&lt;pre style=&quot;background-color: #EFEFEF; border:none; color:blue&quot;&gt;<br /> &gt; cat(&quot; psi3 =&quot;,psi3,&quot;\n\n&quot;)<br /> psi3 = 2.642409 11.44113 0.1838779 0.2189221<br /> <br /> &gt; cat(&quot; psi4 =&quot;,psi4,&quot;\n\n&quot;)<br /> psi4 = 2.890066 10.16836 0.2068221 0.02741416 0.1456332<br /> <br /> &gt; cat(&quot; psi5 =&quot;,psi5,&quot;\n\n&quot;)<br /> psi5 = 2.710984 11.2744 0.188901 0.2310001<br /> &lt;/pre&gt; }}<br /> |}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Selecting the error model===<br /> <br /> As before, the predicted concentration curves can be plotted over the original data and compared:<br /> <br /> <br /> {| cellpadding=&quot;5&quot; cellspacing=&quot;0&quot; <br /> |style=&quot;width=50%&quot;|<br /> [[File:New_Individual4.png|link=]]<br /> |style=&quot;width=50%&quot;|<br /> {{RcodeForTable<br /> |name=<br /> |code=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> tc=seq(from=0,to=25,by=0.1)<br /> fc3=predc2(tc,phi3)<br /> fc4=predc2(tc,phi4)<br /> fc5=predc2(tc,phi5)<br /> <br /> plot(t,y,ylim=c(0,4.1), xlab=&quot;time (hour)&quot;, <br /> ylab=&quot;concentration (mg/l)&quot;, col=&quot;blue&quot;)<br /> lines(tc,fc3, type = &quot;l&quot;, col = &quot;green&quot;, lwd=2)<br /> lines(tc,fc4, type = &quot;l&quot;, col = &quot;red&quot;, lwd=2)<br /> lines(tc,fc5, type = &quot;l&quot;, col = &quot;cyan&quot;, lwd=2)<br /> abline(a=0,b=0,lty=2)<br /> legend(13,4,c(&quot;observations&quot;, <br /> &quot;proportional error model&quot;,<br /> &quot;combined error model&quot;,<br /> &quot;exponential error model&quot;),<br /> lty=c(-1,1,1,1), pch=c(1,-1,-1,-1), lwd=2, <br /> col=c(&quot;blue&quot;,&quot;green&quot;,&quot;red&quot;,&quot;cyan&quot;))<br /> &lt;/pre&gt; }}<br /> |} <br /> <br /> <br /> As you can see, the three predicted concentrations obtained with models ${\cal M}_3$, ${\cal
M}_4$ and ${\cal M}_5$ are quite similar. We now calculate the BIC for each:<br /> <br /> <br /> {| cellpadding=&quot;10&quot; cellspacing=&quot;10&quot; <br /> |style=&quot;width=50%&quot;|<br /> {{RcodeForTable<br /> |name=<br /> |code=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> deviance3=pk.nlm3$minimum + n*log(2*pi)<br /> bic3=deviance3 + log(n)*length(psi3)<br /> deviance4=pk.nlm4$minimum + n*log(2*pi)<br /> bic4=deviance4 + log(n)*length(psi4)<br /> deviance5=pk.nlm5$minimum + 2*sum(log(y)) + n*log(2*pi)<br /> bic5=deviance5 + log(n)*length(psi5)<br /> &lt;/pre&gt; }}<br /> |style=&quot;width=50%&quot;|<br /> {{JustCodeForTable<br /> |code=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none; color:blue&quot;&gt;<br /> &gt; cat(&quot; bic3 =&quot;,bic3,&quot;\n\n&quot;)<br /> bic3 = 3.443607<br /> <br /> &gt; cat(&quot; bic4 =&quot;,bic4,&quot;\n\n&quot;)<br /> bic4 = 3.475841<br /> <br /> &gt; cat(&quot; bic5 =&quot;,bic5,&quot;\n\n&quot;)<br /> bic5 = 4.108521<br /> &lt;/pre&gt; }}<br /> |} <br /> <br /> All three of these BICs are lower than that of the constant residual error model. The smallest belongs to model ${\cal M}_3$, the one with a proportional error component, although the differences are small: the proportional and combined error models give essentially identical BIC. 
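A word on the term {{Verbatim|2*sum(log(y))}} appearing in {{Verbatim|deviance5}}: under the exponential error model ${\cal M}_5$ it is $\log(y_j)$ that is normally distributed, so the density of $y_j$ itself picks up a Jacobian factor $1/y_j$ from the change of variables, contributing $2\log(y_j)$ per observation to $-2\,\log {\like}$. Here is a standalone numerical check (a Python sketch with arbitrary values; it is not part of the original {{Verbatim|R}} script):

```python
import math

y, mu, s = 2.5, 0.8, 0.3   # arbitrary: log(y) ~ N(mu, s^2)

def m2ll_normal(z, mu, s):
    # -2 * log density of N(mu, s^2) evaluated at z
    return math.log(2 * math.pi * s ** 2) + ((z - mu) / s) ** 2

# -2 log p(y) via the change of variables z = log(y): adds 2*log(y)
m2ll_y = m2ll_normal(math.log(y), mu, s) + 2 * math.log(y)

# Direct evaluation of the lognormal density of y for comparison
p_y = (math.exp(-(math.log(y) - mu) ** 2 / (2 * s ** 2))
       / (y * s * math.sqrt(2 * math.pi)))
assert abs(m2ll_y - (-2 * math.log(p_y))) < 1e-12
```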
We decide to use the combined error model ${\cal M}_4$ in the following (the same types of analysis could be done with the proportional error model).<br /> <br /> A 90% confidence interval for $\psi_4$ can be derived from the Hessian (i.e., the square matrix of second-order partial derivatives) of the objective function (i.e., -2 $\times \ LL$):<br /> <br /> <br /> {| cellpadding=&quot;10&quot; cellspacing=&quot;10&quot; <br /> |style=&quot;width=50%&quot;|<br /> {{RcodeForTable<br /> |name=<br /> |code=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> ialpha=0.9<br /> df=n-length(phi4)<br /> I4=pk.nlm4$hessian/2<br /> H4=solve(I4)<br /> s4=sqrt(diag(H4)*n/df)<br /> delta4=s4*qt(0.5+ialpha/2, df)<br /> ci4=cbind(psi4-delta4, psi4+delta4)<br /> &lt;/pre&gt; }}<br /> |style=&quot;width=50%&quot;|<br /> {{JustCodeForTable<br /> |code=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none; color:blue&quot;&gt;<br /> &gt; ci4<br /> [,1] [,2]<br /> [1,] 2.22576690 3.55436561<br /> [2,] 7.93442421 12.40228967<br /> [3,] 0.16628224 0.24736196<br /> [4,] -0.02444571 0.07927403<br /> [5,] 0.04119983 0.25006660<br /> &lt;/pre&gt;}}<br /> |}<br /> <br /> <br /> We can also calculate a 90% confidence interval for $f_4(t)$ using the [http://en.wikipedia.org/wiki/Central_limit_theorem Central Limit Theorem] (see [[#intro_individualCLT|(3)]]):<br /> <br /> <br /> {{Rcode<br /> |name=<br /> |code=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> nlpredci=function(phi,f,H)<br /> {<br /> dphi=length(phi)<br /> nf=length(f)<br /> H=H*n/(n-dphi)<br /> S=H[seq(1,dphi),seq(1,dphi)]<br /> G=matrix(nrow=nf, ncol=dphi)<br /> for (k in seq(1,dphi)) {<br /> dk=phi[k]*(1e-5)<br /> phid=phi<br /> phid[k]=phi[k] + dk<br /> fd=predc2(tc,phid)<br /> G[,k]=(fd-f)/dk<br /> }<br /> M=rowSums((G%*%S)*G)<br /> deltaf=sqrt(M)*qt(0.5+ialpha/2,df)<br /> }<br /> <br /> deltafc4=nlpredci(phi4,fc4,H4)<br /> &lt;/pre&gt;}}<br /> <br /> This can then be plotted:<br /> <br /> <br /> 
{| cellpadding=&quot;5&quot; cellspacing=&quot;0&quot; <br /> |style=&quot;width=50%&quot;|<br /> [[File:NewIndividual6.png|link=]]<br /> |style=&quot;width=50%&quot;|<br /> {{RcodeForTable<br /> |name=<br /> |code=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> plot(t,y,ylim=c(0,4.5), xlab=&quot;time (hour)&quot;, <br /> ylab=&quot;concentration (mg/l)&quot;, col=&quot;blue&quot;)<br /> lines(tc,fc4, type = &quot;l&quot;,col = &quot;red&quot;,lwd=2)<br /> lines(tc, fc4-deltafc4, type = &quot;l&quot;,<br /> col = &quot;red&quot; ,lwd=1, lty=3)<br /> lines(tc,fc4+deltafc4,type = &quot;l&quot;,<br /> col = &quot;red&quot;, lwd=1, lty=3)<br /> abline(a=0,b=0,lty=2)<br /> legend(10.5,4.5,c(&quot;observed concentrations&quot;,<br /> &quot;predicted concentration&quot;, <br /> &quot;CI for predicted concentration&quot;),<br /> lty=c(-1,1,3),pch=c(1,-1,-1),lwd=c(2,2,1),<br /> col=c(&quot;blue&quot;,&quot;red&quot;,&quot;red&quot;))<br /> &lt;/pre&gt; }}<br /> |} <br /> <br /> Alternatively, prediction intervals for $\hatpsi_4$, $\hat{f}_4(t;\hatpsi_4)$ and new observations for any time $t$ can be estimated by Monte Carlo simulation:<br /> <br /> <br /> {{Rcode<br /> |name=<br /> |code=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> f=predc2(t,phi4)<br /> a4=psi4[4]<br /> b4=psi4[5]<br /> g=a4+b4*f<br /> dpsi=length(psi4)<br /> nc=length(tc)<br /> N=1000<br /> qalpha=c(0.5 - ialpha/2,0.5 + ialpha/2)<br /> PSI=matrix(nrow=N,ncol=dpsi)<br /> FC=matrix(nrow=N,ncol=nc)<br /> Y=matrix(nrow=N,ncol=nc)<br /> for (k in seq(1,N)) {<br /> eps=rnorm(n)<br /> ys=f+g*eps<br /> pk.nlm=nlm(fmin4, psi4, ys, t)<br /> psie=pk.nlm$estimate<br /> psie[c(4,5)]=abs(psie[c(4,5)])<br /> PSI[k,]=psie<br /> fce=predc2(tc,psie[c(1,2,3)])<br /> FC[k,]=fce<br /> gce=a4+b4*fce<br /> Y[k,]=fce + gce*rnorm(nc)<br /> }<br /> <br /> ci4s=matrix(nrow=dpsi,ncol=2)<br /> for (k in seq(1,dpsi)){<br /> ci4s[k,]=quantile(PSI[,k],qalpha,names=FALSE)<br /> }<br
/> m4s=colMeans(PSI)<br /> sd4s=apply(PSI,2,sd)<br /> <br /> cifc4s=matrix(nrow=nc,ncol=2)<br /> for (k in seq(1,nc)){<br /> cifc4s[k,]=quantile(FC[,k],qalpha,names=FALSE)<br /> }<br /> <br /> ciy4s=matrix(nrow=nc,ncol=2)<br /> for (k in seq(1,nc)){<br /> ciy4s[k,]=quantile(Y[,k],qalpha,names=FALSE)<br /> }<br /> <br /> par(mfrow= c(1,1))<br /> plot(t,y,ylim=c(0,4.5),xlab=&quot;time (hour)&quot;,<br /> ylab=&quot;concentration (mg/l)&quot;,col = &quot;blue&quot;)<br /> lines(tc,fc4, type = &quot;l&quot;, col = &quot;red&quot;, lwd=2)<br /> lines(tc,cifc4s[,1], type = &quot;l&quot;, col = &quot;red&quot;, lwd=1, lty=3)<br /> lines(tc,cifc4s[,2], type = &quot;l&quot;, col = &quot;red&quot;, lwd=1, lty=3)<br /> lines(tc,ciy4s[,1], type = &quot;l&quot;, col = &quot;green&quot;, lwd=1, lty=3)<br /> lines(tc,ciy4s[,2], type = &quot;l&quot;, col = &quot;green&quot;, lwd=1, lty=3)<br /> abline(a=0,b=0,lty=2)<br /> legend(10.5,4.5,c(&quot;observed concentrations&quot;, &quot;predicted concentration&quot;, <br /> &quot;CI for predicted concentration&quot;, &quot;CI for observed concentrations&quot;), <br /> lty=c(-1,1,3,3), pch=c(1,-1,-1,-1), lwd=c(2,2,1,1), col=c(&quot;blue&quot;,&quot;red&quot;,&quot;red&quot;,&quot;green&quot;))<br /> &lt;/pre&gt; }}<br /> <br /> <br /> {| cellpadding=&quot;5&quot; cellspacing=&quot;0&quot; <br /> |style=&quot;width=50%&quot;|<br /> [[File:NewIndividual7.png|link=]]<br /> |style=&quot;width=50%&quot;|<br /> {{JustCodeForTable<br /> |code=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none; color:blue&quot;&gt;<br /> &gt; ci4s<br /> [,1] [,2]<br /> [1,] 2.350653e+00 3.53526320<br /> [2,] 8.350764e+00 12.04910579<br /> [3,] 1.818431e-01 0.24156832<br /> [4,] 5.445459e-09 0.08819339<br /> [5,] 1.563625e-02 0.19638889<br /> &lt;/pre&gt; }}<br /> |}<br /> <br /> <br /> The R code and input data used in this section can be downloaded here: {{filepath:R_IndividualFitting.rar}}.<br /> &lt;br&gt;<br /> <br /> ==Bibliography==<br /> 
<br /> <br /> &lt;bibtex&gt;<br /> @book{buonaccorsi2010measurement,<br /> title={Measurement Error: Models, Methods, and Applications},<br /> author={Buonaccorsi, J.P.},<br /> isbn={9781420066586},<br /> lccn={2009048849},<br /> series={Chapman &amp; Hall/CRC Interdisciplinary Statistics},<br /> url={http://books.google.fr/books?id=QVtVmaCqLHMC},<br /> year={2010},<br /> publisher={Taylor &amp; Francis}<br /> }<br /> &lt;/bibtex&gt;&lt;bibtex&gt;<br /> @book{carroll2010measurement,<br /> title={Measurement Error in Nonlinear Models: A Modern Perspective, Second Edition},<br /> author={Carroll, R.J. and Ruppert, D. and Stefanski, L.A. and Crainiceanu, C.M.},<br /> isbn={9781420010138},<br /> lccn={2006045485},<br /> series={Chapman &amp; Hall/CRC Monographs on Statistics &amp; Applied Probability},<br /> url={http://books.google.fr/books?id=9kBx5CPZCqkC},<br /> year={2010},<br /> publisher={Taylor &amp; Francis}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> &lt;bibtex&gt;<br /> @book{fitzmaurice2004applied,<br /> title={Applied Longitudinal Analysis},<br /> author={Fitzmaurice, G.M. and Laird, N.M. and Ware, J.H.},<br /> isbn={9780471214878},<br /> lccn={04040891},<br /> series={Wiley Series in Probability and Statistics},<br /> url={http://books.google.fr/books?id=gCoTIFejMgYC},<br /> year={2004},<br /> publisher={Wiley}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{gallant2009nonlinear,<br /> title={Nonlinear Statistical Models},<br /> author={Gallant, A.R.},<br /> isbn={9780470317372},<br /> series={Wiley Series in Probability and Statistics},<br /> url={http://books.google.fr/books?id=imv-NMozseEC},<br /> year={2009},<br /> publisher={Wiley}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{huet2003statistical,<br /> title={Statistical tools for nonlinear regression: a practical guide with S-PLUS and R examples},<br /> author={Huet, S. and Bouvier, A. and Poursat, M.A. 
and Jolivet, E.},<br /> year={2003},<br /> publisher={Springer}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{ritz2008nonlinear,<br /> title={Nonlinear regression with R},<br /> author={Ritz, C. and Streibig, J.C.},<br /> volume={33},<br /> year={2008},<br /> publisher={Springer New York}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{ross1990nonlinear,<br /> title={Nonlinear estimation},<br /> author={Ross, G.J.S.},<br /> isbn={9780387972787},<br /> lccn={90032797},<br /> series={Springer series in statistics},<br /> url={http://books.google.fr/books?id=7LkyzdLMghIC},<br /> year={1990},<br /> publisher={Springer-Verlag}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{seber2003nonlinear,<br /> title={Nonlinear Regression},<br /> author={Seber, G.A.F. and Wild, C.J.},<br /> isbn={9780471471356},<br /> lccn={88017194},<br /> series={Wiley Series in Probability and Statistics},<br /> url={http://books.google.fr/books?id=YBYlCpBNo\_cC},<br /> year={2003},<br /> publisher={Wiley}<br /> }<br /> &lt;/bibtex&gt;&lt;bibtex&gt;<br /> @article{serroyen2009nonlinear,<br /> title={Nonlinear models for longitudinal data},<br /> author={Serroyen, J. and Molenberghs, G. and Verbeke, G. and Davidian, M. },<br /> journal={The American Statistician},<br /> volume={63},<br /> number={4},<br /> pages={378-388},<br /> year={2009},<br /> publisher={Taylor &amp; Francis}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{wolberg2006data,<br /> title={Data analysis using the method of least squares: extracting the most information from experiments},<br /> author={Wolberg, J.R.},<br /> volume={1},<br /> year={2006},<br /> publisher={Springer Berlin, Germany}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> <br /> {{Back&amp;Next<br /> |linkBack=Overview <br /> |linkNext=What is a model? A joint probability distribution! 
}}</div> Admin http://wiki.webpopix.org/index.php/Overview Overview 2013-06-21T08:34:46Z <p>Admin : </p> <hr /> <div>The desire to model a biological or physical phenomenon often arises when we are able to record observations of that phenomenon. Nothing could be more natural, therefore, than to begin this introduction by looking at some observed data.<br /> <br /> <br /> {{ExampleWithImage<br /> |text= This first plot displays the [http://en.wikipedia.org/wiki/Viral_load viral load] of four patients with [http://en.wikipedia.org/wiki/Hepatitis_C hepatitis C] who started a treatment at time $t=0$.<br /> |image = NEWintro1.png<br /> }} <br /> <br /> <br /> {{ExampleWithImage<br /> |text=This second example involves weight data for rats measured over 14 weeks, collected during a sub-chronic [http://en.wikipedia.org/wiki/Toxicity toxicity] study related to the question of [http://en.wikipedia.org/wiki/Genetically_modified_maize genetically modified corn].<br /> |image = NEWintro2.png}}<br /> <br /> <br /> {{ExampleWithImage<br /> |text= In this third example, data are [http://en.wikipedia.org/wiki/Fluorescence fluorescence] intensities measured over time in a cellular biology experiment.<br /> |image=NEWintro3.png }}<br /> <br /> <br /> {{ExampleWithImage<br /> |text= Note that repeated measurements are not necessarily always functions of time.<br /> For example, we may be interested in corn production as a function of fertilizer quantity.<br /> |image= NEWintro4.png}}<br /> <br /> <br /> Even though these examples come from quite different domains, in each case the data is made up of repeated measurements on several individuals from a population. What we will call a &quot;population approach&quot; is therefore relevant for characterizing and modeling this data. 
The modeling goal is thus twofold: first, to characterize the biological or physical phenomena observed for each individual, and second, to characterize the variability seen between individuals.<br /> <br /> In the example with the rats, the model needs to integrate a growth model that describes how a rat's weight increases with time, and a statistical model that describes why these kinetics can vary from one rat to another. The goal is thus to end up with a &quot;typical&quot; curve for the population (in red) and to be able to explain the variability of the individual curves (in green) around this population curve.<br /> <br /> <br /> ::[[File:NEWintro5.png|link=]]<br /> <br /> <br /> The model will explain some of this variability using individual [http://en.wikipedia.org/wiki/Covariate covariates] such as sex or diet (rats 1 and 3 are male while rats 2 and 4 are female), but some of the variability will remain unexplained and will be considered as random. Integrating into the same model effects considered fixed and others considered random leads naturally to the use of [http://en.wikipedia.org/wiki/Mixed_model mixed-effects models].<br /> <br /> An alternative yet equivalent approach considers this model as a [http://en.wikipedia.org/wiki/Multilevel_model hierarchical] one: each curve is described by a single model, and the variability between individual models is described by a population model. In the case of [http://en.wikipedia.org/wiki/Parametric_model parametric models], this means that the observations for a given individual are described by a model of the observations that depends on a vector of individual parameters: this is the classic individual approach. The population approach is then a direct extension of [[The individual approach|the individual approach]]: we add a component to the model that describes the variability of the individual parameters within the population.<br /> <br /> A model can thus be seen as a [[What is a model? A joint probability distribution! 
| joint probability distribution]], which can easily be extended to the case where other variables in the model are considered as random variables: covariates, population parameters, the design, etc. The hierarchical structure of the model leads to a natural decomposition of the joint distribution into a product of [http://en.wikipedia.org/wiki/Conditional_probability_distribution conditional] and [http://en.wikipedia.org/wiki/Marginal_distribution marginal] distributions.<br /> <br /> Models for [[Modeling the individual parameters |individual parameters]] and models for [[Modeling the observations | observations]] are described in the [[Introduction_%26_notation|Models]] chapter. In particular, models for [[Continuous data models|continuous observations]], [[Model for categorical data|categorical data]], [[Models for count data|count data]] and [[ Models for time-to-event data | survival data]] are presented and illustrated by various examples. Extensions for [[ Mixture models|mixture models]], [[Hidden Markov models|hidden Markov models]] and [[Stochastic differential equations based models| stochastic differential equation based models]] are also presented.<br /> <br /> The Tasks &amp; Tools chapter presents practical examples of using these models: [[Visualization|exploration and visualization]], [[Estimation|estimation]], [[Model evaluation#Model diagnostics|model diagnostics]], [[Model evaluation#Model selection|model selection]] and [[Simulation|simulation]]. All approaches and proposed methods are rigorously detailed in the [[Introduction and notation|Methods]] chapter.<br /> <br /> The main purpose of a model is to be used. Mathematical modeling and statistics remain useful tools for many disciplines (biology, agronomy, environmental studies, pharmacology, etc.), but it is important that these tools are used properly. 
The various software packages used in this wiki have been developed with this in mind: they serve the modeler well, while fully complying with a coherent mathematical formalism and using well-known and theoretically justified methods.<br /> <br /> Tools for model exploration ($\mlxplore$), modeling ($\monolix$) and simulation ($\simulix$) use the same model coding language $\mlxtran$. This allows us to define a complete workflow using the same model implementation, i.e., to run several different tasks based on the same model.<br /> <br /> $\mlxtran$ is extremely flexible and well-adapted to implementing complex mixed-effects models.<br /> With $\mlxtran$ we can easily write ODE-based models, implement [[Introduction_to_PK_modeling_using_MLXPlore_-_Part_I|pharmacokinetic models]] with complex administration schedules, include inter-individual variability in parameters, define statistical models for covariates, etc.<br /> Another crucial property of $\mlxtran$ is that it rigorously adopts the model representation formalism proposed in $\wikipopix$. In other words, the model implementation is fully consistent with its mathematical representation.<br /> <br /> $\mlxplore$ provides a clear graphical interface that allows us to visualize not only the structural model but also the statistical model, which is of fundamental importance in the population approach. We can visualize for instance the impact of covariates and inter-individual variability of model parameters on predictions. 
$\mlxplore$ is an ideal tool for teaching or discovering what a [[Introduction_to_PK_modeling_using_MLXPlore_-_Part_I|pharmacokinetic model]] is, for example.<br /> <br /> The algorithms implemented in $\monolix$ ([http://en.wikipedia.org/wiki/Stochastic_approximation Stochastic Approximation] of EM, [http://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo MCMC], [http://en.wikipedia.org/wiki/Simulated_Annealing Simulated Annealing], [http://en.wikipedia.org/wiki/Importance_sampling Importance Sampling], etc.) are extremely efficient for a wide variety of complex models. Furthermore, convergence of [[The SAEM algorithm for estimating population parameters|SAEM]] and its extensions ([[Mixture models|mixture models]], [[Hidden Markov models|hidden Markov models]], [[Stochastic differential equations based models|SDE-based models]], censored data, etc.) has been rigorously proved and published in statistical journals.<br /> <br /> $\simulix$ is a model computation engine which enables us to simulate a $\mlxtran$ model from within various environments. $\simulix$ is now available for the Matlab and R platforms, allowing any user to combine the flexibility of R and Matlab scripts with the power of $\mlxtran$ in order to easily encode complex models and simulate data.<br /> <br /> For these reasons, $\wikipopix$ and these tools can be used with confidence for training and teaching. This is even more the case because $\mlxplore$, $\monolix$ and $\simulix$ are free for academic research and education purposes.<br /> <br /> <br /> {{Next<br /> |link=The individual approach }}</div> Admin http://wiki.webpopix.org/index.php/Estimation_of_the_observed_Fisher_information_matrix Estimation of the observed Fisher information matrix 2013-06-17T13:30:45Z <p>Admin : /* Estimation using stochastic approximation */</p> <hr /> <div>==Estimation using stochastic approximation==<br /> <br /> The ''observed'' Fisher information matrix (F.I.M.) 
is a function of $\theta$ defined as<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq_fim1&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> I(\theta) &amp;=&amp; -\DDt{\log ({\like}(\theta;\by))} \\<br /> &amp;=&amp; -\DDt{\log (\py(\by;\theta))} .<br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> Because the likelihood is quite complex, $I(\theta)$ usually has no closed-form expression. It is however possible to estimate it using a stochastic approximation procedure based on &lt;balloon title=&quot;Kuhn05: put here the reference!!!&quot; style=&quot;color:#177245&quot;&gt;Louis' formula&lt;/balloon&gt;:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\DDt{\log (\pmacro(\by;\theta))} = \esp{\DDt{\log (\pmacro(\by,\bpsi;\theta))} {{!}} \by ;\theta} + \cov{\Dt{\log (\pmacro(\by,\bpsi;\theta))} {{!}} \by ; \theta},<br /> &lt;/math&gt; }}<br /> <br /> where <br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \cov{\Dt{\log (\pmacro(\by,\bpsi;\theta))} {{!}} \by ; \theta} &amp;=&amp;<br /> \esp{ \left(\Dt{\log (\pmacro(\by,\bpsi;\theta))} \right)\left(\Dt{\log (\pmacro(\by,\bpsi;\theta))}\right)^{\transpose} {{!}} \by ; \theta} \\<br /> &amp;&amp; - \esp{\Dt{\log (\pmacro(\by,\bpsi;\theta))} {{!}} \by ; \theta}\esp{\Dt{\log (\pmacro(\by,\bpsi;\theta))} {{!}} \by ; \theta}^{\transpose} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> Thus, $\DDt{\log (\pmacro(\by;\theta))}$ is expressed as a combination of conditional expectations, each of which can be estimated by Monte Carlo, or equivalently approximated using a stochastic approximation algorithm.<br /> <br /> We can then draw a sequence $(\psi_i^{(k)})$ using a [[The Metropolis-Hastings algorithm for simulating the individual parameters|Metropolis-Hastings algorithm]] and estimate the observed F.I.M. online. 
At iteration $k$ of the algorithm:<br /> <br /> <br /> * '''Simulation step''': for $i=1,2,\ldots,N$, draw $\psi_i^{(k)}$ from $m$ iterations of the Metropolis-Hastings algorithm described in [[The Metropolis-Hastings algorithm for simulating the individual parameters| The Metropolis-Hastings algorithm]] section with $\pmacro(\psi_i |y_i ;{\theta})$ as the limit distribution.<br /> <br /> <br /> * '''Stochastic approximation''': update $D_k$, $G_k$ and $\Delta_k$ according to the following recurrence relations:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \Delta_k &amp; = &amp; \Delta_{k-1} + \gamma_k \left(\Dt{\log (\pmacro(\by,\bpsi^{(k)};{\theta}))} - \Delta_{k-1} \right) \\<br /> D_k &amp; = &amp; D_{k-1} + \gamma_k \left(\DDt{\log (\pmacro(\by,\bpsi^{(k)};{\theta}))} - D_{k-1} \right)\\<br /> G_k &amp; = &amp; G_{k-1} + \gamma_k \left((\Dt{\log (\pmacro(\by,\bpsi^{(k)};{\theta}))})(\Dt{\log (\pmacro(\by,\bpsi^{(k)};{\theta}))})^\transpose -G_{k-1} \right),<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> : where $(\gamma_k)$ is a decreasing sequence of positive numbers such that $\gamma_1=1$, $\sum_{k=1}^{\infty} \gamma_k = \infty$, and $\sum_{k=1}^{\infty} \gamma_k^2 &lt; \infty$.<br /> <br /> <br /> * '''Estimation step''': update the estimate $H_k$ of the F.I.M. according to<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;H_k = D_k + G_k - \Delta_k \Delta_k^{\transpose}. 
&lt;/math&gt; }} <br /> <br /> <br /> <br /> Implementing this algorithm therefore requires computation of the first and second derivatives of<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\log (\pmacro(\by,\bpsi;\theta))=\sum_{i=1}^{N} \log (\pmacro(y_i,\psi_i;\theta)).&lt;/math&gt; }}<br /> <br /> Assume first that the joint distribution of $\by$ and $\bpsi$ decomposes as<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:fim_dec1&quot;&gt;&lt;math&gt;<br /> \pypsi(\by,\bpsi;\theta) = \pcypsi(\by {{!}} \bpsi)\ppsi(\bpsi;\theta).<br /> &lt;/math&gt;&lt;/div&gt; <br /> |reference=(2) }}<br /> <br /> This assumption means that for any $i=1,2,\ldots,N$, all of the components of $\psi_i$ are random and there exists a sufficient statistic ${\cal S}(\bpsi)$ for the estimation of $\theta$. It is then sufficient to compute the first and second derivatives of $\log (\pmacro(\bpsi;\theta))$ in order to estimate the F.I.M. This can be done relatively simply in closed form when the individual parameters are normally distributed (or a transformation $h$ of them is).<br /> <br /> If some component of $\psi_i$ has no variability, [[#eq:fim_dec1|(2)]] no longer holds, but we can decompose $\theta$ into $(\theta_y,\theta_\psi)$ such that<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pyipsii(y_i,\psi_i;\theta) = \pcyipsii(y_i {{!}} \psi_i ; \theta_y)\ppsii(\psi_i;\theta_\psi).<br /> &lt;/math&gt; }}<br /> <br /> We then need to compute the first and second derivatives of $\log(\pcyipsii(y_i |\psi_i ; \theta_y))$ and $\log(\ppsii(\psi_i;\theta_\psi))$. Derivatives of $\log(\pcyipsii(y_i |\psi_i ; \theta_y))$ that do not have a closed form expression can be obtained using central differences.<br /> <br /> <br /> {{Remarks<br /> |title=Remarks<br /> |text=<br /> 1. Using $\gamma_k=1/k$ for $k \geq 1$ means that each term is approximated with an empirical mean obtained from $(\bpsi^{(k)}, k \geq 1)$. 
For instance,<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:fim_Delta1&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \Delta_k<br /> &amp;=&amp; \Delta_{k-1} + \displaystyle{ \frac{1}{k} } \left(\Dt{\log (\pmacro(\by,\bpsi^{(k)};\theta))} - \Delta_{k-1} \right) <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(3) }}<br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:fim_Delta2&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \Delta_k<br /> &amp;=&amp; \displaystyle{ \frac{1}{k} }\sum_{j=1}^{k} \Dt{\log (\pmacro(\by,\bpsi^{(j)};\theta))} . <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(4) }}<br /> <br /> [[#eq:fim_Delta1|(3)]] (resp. [[#eq:fim_Delta2|(4)]]) defines $\Delta_k$ using an online (resp. offline) algorithm. Writing $\Delta_k$ as in [[#eq:fim_Delta1|(3)]] instead of [[#eq:fim_Delta2|(4)]] avoids having to store all simulated sequences $(\bpsi^{(j)}, 1\leq j \leq k)$ when computing $\Delta_k$.<br /> <br /> <br /> 2. This approach is used for computing the F.I.M. $I(\hat{\theta})$ in practice, where $\hat{\theta}$ is the maximum likelihood estimate of $\theta$. The only difference from the [[The Metropolis-Hastings algorithm for simulating the individual parameters|Metropolis-Hastings]] algorithm used for SAEM is that the population parameter $\theta$ is not updated and remains fixed at $\hat{\theta}$.<br /> }}<br /> <br /> <br /> {{OutlineText<br /> |text=In summary, for a given estimate $\hat{\theta}$ of the population parameter $\theta$, a stochastic approximation algorithm for estimating the observed Fisher Information Matrix $I(\hat{\theta})$ consists of:<br /> <br /> &lt;blockquote&gt;<br /> 1. For $i=1,2,\ldots,N$, run a [[The Metropolis-Hastings algorithm for simulating the individual parameters|Metropolis-Hastings algorithm]] to draw a sequence $\psi_i^{(k)}$ with limit distribution $\pmacro(\psi_i {{!}}y_i ;\hat{\theta})$.<br /> &lt;/blockquote&gt;<br /> &lt;blockquote&gt;<br /> 2. 
At iteration $k$ of the Metropolis-Hastings algorithm, compute the first and second derivatives of $\log (\pypsi(\by,\bpsi^{(k)};\hat{\theta}))$.<br /> &lt;/blockquote&gt;<br /> &lt;blockquote&gt;<br /> 3. Update $\Delta_k$, $G_k$, $D_k$ and compute an estimate $H_k$ of the F.I.M.<br /> &lt;/blockquote&gt;<br /> }} <br /> <br /> <br /> {{Example<br /> |title=Example 1<br /> |text=Consider the model<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> y_i {{!}} \psi_i &amp;\sim&amp; \pcyipsii(y_i {{!}} \psi_i) \\<br /> h(\psi_i) &amp;\sim_{i.i.d}&amp; {\cal N}( h(\psi_{\rm pop}) , \Omega),<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> where $\Omega = {\rm diag}(\omega_1^2,\omega_2^2,\ldots,\omega_d^2)$ is a diagonal matrix and $h(\psi_i)=(h_1(\psi_{i,1}), h_2(\psi_{i,2}), \ldots , h_d(\psi_{i,d}) )^{\transpose}$.<br /> The vector of population parameters is $\theta = (\psi_{\rm pop} , \Omega)=(\psi_{ {\rm pop},1},\ldots,\psi_{ {\rm pop},d},\omega_1^2,\ldots,\omega_d^2)$.<br /> <br /> Here,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \log (\pyipsii(y_i,\psi_i;\theta)) = \log (\pcyipsii(y_i {{!}} \psi_i)) + \log (\ppsii(\psi_i;\theta)).<br /> &lt;/math&gt; }}<br /> <br /> Then,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \Dt{\log (\pyipsii(y_i,\psi_i;\theta))} &amp;=&amp; \Dt{\log (\ppsii(\psi_i;\theta))} \\<br /> \DDt{\log (\pyipsii(y_i,\psi_i;\theta))} &amp;=&amp; \DDt{\log (\ppsii(\psi_i;\theta))} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> More precisely,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \log (\ppsii(\psi_i;\theta)) &amp;=&amp; -\displaystyle{\frac{d}{2} }\log(2\pi) + \sum_{\iparam=1}^d \log(h_\iparam^{\prime}(\psi_{i,\iparam}))<br /> -\displaystyle{ \frac{1}{2} } \sum_{\iparam=1}^d \log(\omega_\iparam^2)<br /> -\sum_{\iparam=1}^d \displaystyle{ \frac{1}{2\, \omega_\iparam^2} }( h_\iparam(\psi_{i,\iparam}) - h_\iparam(\psi_{ {\rm pop},\iparam}) 
)^2 \\<br /> \partial \log (\ppsii(\psi_i;\theta))/\partial \psi_{ {\rm pop},\iparam} &amp;=&amp;<br /> \displaystyle{\frac{1}{\omega_\iparam^2} }h_\iparam^{\prime}(\psi_{ {\rm pop},\iparam})( h_\iparam(\psi_{i,\iparam}) - h_\iparam(\psi_{ {\rm pop},\iparam}) ) \\<br /> \partial \log (\ppsii(\psi_i;\theta))/\partial \omega^2_{\iparam} &amp;=&amp;<br /> -\displaystyle{ \frac{1}{2\omega_\iparam^2} }<br /> +\displaystyle{\frac{1}{2\, \omega_\iparam^4} }( h_\iparam(\psi_{i,\iparam}) - h_\iparam(\psi_{ {\rm pop},\iparam}) )^2 \\<br /> \partial^2 \log (\ppsii(\psi_i;\theta))/\partial \psi_{ {\rm pop},\iparam} \partial \psi_{ {\rm pop},\jparam} &amp;=&amp;<br /> \left\{<br /> \begin{array}{ll}<br /> &lt;!-- % \frac{1}{\omega_\iparam^2} --&gt;<br /> \left( h_\iparam^{\prime\prime}(\psi_{ {\rm pop},\iparam})( h_\iparam(\psi_{i,\iparam}) - h_\iparam(\psi_{ {\rm pop},\iparam}) )- h_\iparam^{\prime \, 2}(\psi_{ {\rm pop},\iparam}) \right)/\omega_\iparam^2 &amp; {\rm if \quad } \iparam=\jparam \\<br /> 0 &amp; {\rm otherwise}<br /> \end{array}<br /> \right.<br /> \\<br /> \partial^2 \log (\ppsii(\psi_i;\theta))/\partial \omega^2_{\iparam} \partial \omega^2_{\jparam} &amp;=&amp; \left\{<br /> \begin{array}{ll}<br /> &lt;!-- % \frac{1}{2\omega_\iparam^4} - \frac{1}{\omega_\iparam^6} --&gt;<br /> 1/(2\omega_\iparam^4) -<br /> ( h_\iparam(\psi_{i,\iparam}) - h_\iparam(\psi_{ {\rm pop},\iparam}) )^2/\omega_\iparam^6 &amp; {\rm if \quad} \iparam=\jparam \\<br /> 0 &amp; {\rm otherwise}<br /> \end{array}<br /> \right.<br /> \\<br /> \partial^2 \log (\ppsii(\psi_i;\theta))/\partial \psi_{ {\rm pop},\iparam} \partial \omega^2_{\jparam} &amp;=&amp; \left\{<br /> \begin{array}{ll}<br /> &lt;!-- % -\frac{1}{\omega_\iparam^4} --&gt;<br /> -h_\iparam^{\prime}(\psi_{ {\rm pop},\iparam})( h_\iparam(\psi_{i,\iparam}) - h_\iparam(\psi_{ {\rm pop},\iparam}) )/\omega_\iparam^4 &amp; {\rm if \quad} \iparam=\jparam \\<br /> 0 &amp; {\rm otherwise.}<br /> \end{array}<br /> \right.<br /> 
\end{eqnarray}&lt;/math&gt; }}<br /> }}<br /> <br /> <br /> <br /> {{Example<br /> |title=Example 2<br /> |text= We consider the same model for continuous data, assuming a constant error model and that the variance $a^2$ of the residual error has no variability:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> y_{ij} {{!}} \psi_i &amp;\sim&amp; {\cal N}(f(t_{ij}, \psi_i) \ , \ a^2), \ \ 1 \leq j \leq n_i \\<br /> h(\psi_i) &amp;\sim_{i.i.d}&amp; {\cal N}( h(\psi_{\rm pop}) , \Omega).<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> Here, $\theta_y=a^2$, $\theta_\psi=(\psi_{\rm pop},\Omega)$ and<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \log(\pyipsii(y_i,\psi_i;\theta)) = \log(\pcyipsii(y_i {{!}} \psi_i ; a^2)) + \log(\ppsii(\psi_i;\psi_{\rm pop},\Omega)),<br /> &lt;/math&gt; }}<br /> <br /> where<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \log(\pcyipsii(y_i {{!}} \psi_i ; a^2))<br /> =-\displaystyle{\frac{n_i}{2} }\log(2\pi)- \displaystyle{\frac{n_i}{2} }\log(a^2) - \displaystyle{\frac{1}{2a^2} }\sum_{j=1}^{n_i}(y_{ij} - f(t_{ij}, \psi_i))^2 .<br /> &lt;/math&gt; }}<br /> <br /> Derivatives of $\log(\pcyipsii(y_i {{!}} \psi_i ; a^2))$ with respect to $a^2$ are straightforward to compute. 
Derivatives of $\log(\ppsii(\psi_i;\psi_{\rm pop},\Omega))$ with respect to $\psi_{\rm pop}$ and $\Omega$ remain unchanged.<br /> }}<br /> <br /> <br /> <br /> <br /> {{Example<br /> |title=Example 3<br /> |text= Consider again the same model for continuous data, assuming now that a subset $\xi$ of the parameters of the structural model has no variability:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> y_{ij} {{!}} \psi_i &amp;\sim&amp; {\cal N}(f(t_{ij}, \psi_i,\xi) \ , \ a^2), \ \ 1 \leq j \leq n_i \\<br /> h(\psi_i) &amp;\sim_{i.i.d}&amp; {\cal N}( h(\psi_{\rm pop}) , \Omega).<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> Let $\psi$ remain as the subset of individual parameters with variability. Here, $\theta_y=(\xi,a^2)$, $\theta_\psi=(\psi_{\rm pop},\Omega)$, and<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \log(\pcyipsii(y_i {{!}} \psi_i ; \xi,a^2))<br /> =-\displaystyle{\frac{n_i}{2} }\log(2\pi)- \displaystyle{\frac{n_i}{2} }\log(a^2) - \displaystyle{\frac{1}{2 a^2} }\sum_{j=1}^{n_i}(y_{ij} - f(t_{ij}, \psi_i,\xi))^2 .<br /> &lt;/math&gt; }}<br /> <br /> Derivatives of $\log(\pcyipsii(y_i {{!}} \psi_i ; \xi, a^2))$ with respect to $\xi$ require computation of the derivative of $f$ with respect to $\xi$. These derivatives are usually not calculable. 
One possibility is to numerically approximate them using finite differences.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Estimation using linearization of the model == <br /> <br /> Consider here a model for continuous data that uses a $\phi$-parametrization for the individual parameters:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> y_{ij} &amp;= &amp; f(t_{ij} , \phi_i) + g(t_{ij} , \phi_i)\teps_{ij} \\<br /> \phi_i &amp;=&amp; \phi_{\rm pop} + \eta_i .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> Let $\hphi_i$ be some predicted value of $\phi_i$, such as for instance the estimated mean or estimated mode of the conditional distribution $\pmacro(\phi_i |y_i ; \hat{\theta})$.<br /> <br /> We can then choose to linearize the model for the observations $(y_{ij}, 1\leq j \leq n_i)$ of individual $i$ around the vector of predicted individual parameters. Let $\Dphi{f(t , \phi)}$ be the row vector of derivatives of $f(t , \phi)$ with respect to $\phi$. 
Then,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> y_{ij} &amp;\simeq&amp; f(t_{ij} , \hphi_i) + \Dphi{f(t_{ij} , \hphi_i)} \, (\phi_i - \hphi_i) + g(t_{ij} , \hphi_i)\teps_{ij} \\<br /> &amp;\simeq&amp; f(t_{ij} , \hphi_i) + \Dphi{f(t_{ij} , \hphi_i)} \, (\phi_{\rm pop} - \hphi_i)<br /> + \Dphi{f(t_{ij} , \hphi_i)} \, \eta_i + g(t_{ij} , \hphi_i)\teps_{ij} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> Then, we can approximate the marginal distribution of the vector $y_i$ as a normal distribution:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:fim_approx&quot;&gt;&lt;math&gt;<br /> y_{i} \approx {\cal N}\left(f(t_{i} , \hphi_i) + \Dphi{f(t_{i} , \hphi_i)} \, (\phi_{\rm pop} - \hphi_i) ,<br /> \Dphi{f(t_{i} , \hphi_i)} \Omega \Dphi{f(t_{i} , \hphi_i)}^{\transpose} + g(t_{i} , \hphi_i)\Sigma_{n_i} g(t_{i} , \hphi_i)^{\transpose} \right),<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(5) }}<br /> <br /> where $\Sigma_{n_i}$ is the variance-covariance matrix of $\teps_{i,1},\ldots,\teps_{i,n_i}$. If the $\teps_{ij}$ are i.i.d., then<br /> $\Sigma_{n_i}$ is the identity matrix.<br /> <br /> We can equivalently use the original $\psi$-parametrization and the fact that $\phi_i=h(\psi_i)$. Then,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \Dphi{f(t_{i} , \hphi_i)} = \Dpsi{f(t_{i} , \hpsi_i)} J_h(\hpsi_i)^{\transpose} , &lt;/math&gt; }}<br /> <br /> where $J_h$ is the Jacobian of $h$.<br /> <br /> We can then approximate the observed log-likelihood ${\llike}(\theta) = \log(\like(\theta;\by))=\sum_{i=1}^N \log(\pyi(y_i;\theta))$ using this normal approximation. We can also derive the F.I.M. by computing the matrix of second-order partial derivatives of ${\llike}(\theta)$.<br /> <br /> Except for very simple models, computing these second-order partial derivatives in closed form is not straightforward. In such cases, finite differences can be used for numerically approximating them. 
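As a generic sketch of such a finite-difference scheme (the helper name, test function and step size are illustrative choices, not part of any reference implementation), the matrix of second-order derivatives of a smooth function can be approximated with four-point central differences:

```python
import numpy as np

def hessian_central(loglik, theta, nu=1e-3):
    """Approximate the Hessian of loglik at theta with four-point central differences."""
    m = len(theta)
    H = np.zeros((m, m))
    for j in range(m):
        for k in range(m):
            ej = np.zeros(m); ej[j] = nu   # perturbation along coordinate j
            ek = np.zeros(m); ek[k] = nu   # perturbation along coordinate k
            H[j, k] = (loglik(theta + ej + ek) - loglik(theta + ej - ek)
                       - loglik(theta - ej + ek) + loglik(theta - ej - ek)) / (4 * nu**2)
    return H

# illustrative quadratic "log-likelihood": its Hessian is exactly -A
A = np.array([[2.0, 1.0], [1.0, 3.0]])
H = hessian_central(lambda t: -0.5 * t @ A @ t, np.array([0.5, -1.0]))
```

For a quadratic function the four-point formula is exact up to rounding error; for a genuine log-likelihood the step size must balance truncation error against numerical noise.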
We can use for instance a central difference approximation of the second derivative of $\llike(\theta)$. To this end, let $\nu&gt;0$. For $j=1,2,\ldots, m$, let $\nu^{(j)}=(\nu^{(j)}_{k}, 1\leq k \leq m)$ be the $m$-vector such that<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \nu^{(j)}_{k} = \left\{<br /> \begin{array}{ll}<br /> \nu &amp; {\rm if \quad j= k} \\<br /> 0 &amp; {\rm otherwise.}<br /> \end{array}<br /> \right.<br /> &lt;/math&gt; }}<br /> <br /> Then, for $\nu$ small enough,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \partial_{\theta_j}{ {\llike}(\theta)} &amp;\approx&amp; \displaystyle{ \frac{ {\llike}(\theta+\nu^{(j)})- {\llike}(\theta-\nu^{(j)})}{2\nu} } \\<br /> \end{eqnarray}&lt;/math&gt; }}<br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:fim_diff&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> \partial^2_{\theta_j,\theta_k}{ {\llike}(\theta)} &amp;\approx&amp; \displaystyle{\frac{ {\llike}(\theta+\nu^{(j)}+\nu^{(k)})- {\llike}(\theta+\nu^{(j)}-\nu^{(k)})<br /> -{\llike}(\theta-\nu^{(j)}+\nu^{(k)})+{\llike}(\theta-\nu^{(j)}-\nu^{(k)})}{4\nu^2} } . <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(6) }}<br /> <br /> &lt;br&gt;&lt;br&gt;<br /> ------<br /> &lt;br&gt;&lt;br&gt;<br /> <br /> {{OutlineText<br /> |text=In summary, for a given estimate $\hat{\theta}$ of the population parameter $\theta$, the algorithm for approximating the Fisher Information Matrix $I(\hat{\theta})$ using a linear approximation of the model consists of:<br /> <br /> &lt;blockquote&gt;<br /> 1. For $i=1,2,\ldots,N$, obtain some estimate $(\hpsi_i)$ of the individual parameters $(\psi_i)$ (for example, we can average the values $\psi_i^{(k)}$ drawn during the final iterations of the [[The SAEM algorithm for estimating population parameters| SAEM algorithm]]).<br /> &lt;/blockquote&gt;<br /> &lt;blockquote&gt;<br /> 2. 
For $i=1,2,\ldots,N$, compute $\hphi_i=h(\hpsi_i)$, the mean and the variance of the normal distribution defined in [[#eq:fim_approx|(5)]], and ${\llike}(\theta)$ using this normal approximation.<br /> &lt;/blockquote&gt;<br /> &lt;blockquote&gt;<br /> 3. Use [[#eq:fim_diff|(6)]] to approximate the matrix of second-order derivatives of ${\llike}(\theta)$.<br /> &lt;/blockquote&gt;<br /> }}<br /> <br /> {{Back&amp;Next<br /> |linkNext=Estimation of the log-likelihood<br /> |linkBack=The Metropolis-Hastings algorithm for simulating the individual parameters }}</div> Admin http://wiki.webpopix.org/index.php/Stochastic_differential_equations_based_models Stochastic differential equations based models 2013-06-07T14:03:09Z <p>Admin : </p> <hr /> <div>&lt;!-- Menu for the Extensions chapter --&gt;<br /> &lt;sidebarmenu&gt;<br /> +[[Extensions]]<br /> *[[Extensions| Introduction ]] | [[ Mixture models ]] | [[Hidden Markov models]] | [[Stochastic differential equations based models]] <br /> &lt;/sidebarmenu&gt;<br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> ==Introduction==<br /> <br /> <br /> Diffusion models are known to be a relevant tool for modeling stochastic dynamic phenomena, and are widely used in various fields including finance, physics, biology, physiology and control.<br /> In a population approach, a mixed-effects diffusion model describes each individual series of observations using a system of stochastic differential equations (SDE) while also taking into account variability between individuals.<br /> <br /> For the sake of simplicity we will consider first a diffusion model for a single individual, and illustrate it with a very general dynamical system with linear transfers and PK examples. 
We will then show that the extension to mixed diffusion models is fairly straightforward.<br /> <br /> Note that the conditional distribution $\qcypsi$ of the observations usually does not have a closed-form expression. When the underlying system is a Gaussian linear dynamical one, the conditional pdf of the observations, $\pcypsi(y_i|\psi_i)$, can be computed using the [http://en.wikipedia.org/wiki/Kalman_filter ''Kalman filter'' (KF)]. When the system is not linear, the [http://en.wikipedia.org/wiki/Extended_Kalman_Filter ''extended Kalman filter'' (EKF)] provides an approximation of the conditional pdf.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Diffusion model==<br /> <br /> <br /> We assume that one diffusion trajectory is observed with noise at discrete time points $t_1&lt;\ldots&lt;t_j&lt;\ldots&lt;t_n$. Let us denote by $(X(t),t&gt;0) \in \Rset^d$ the underlying dynamical process and by $y_j \in \Rset$ a noisy measurement of $X(t_j)$, $j=1,\ldots,n$. The general form of the diffusion model is given by:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:SDEmodel&quot;&gt;&lt;math&gt;<br /> \left\{<br /> \begin{array}{lll}<br /> dX(t) &amp;=&amp; b(X(t),\psi)dt + \gamma(X(t),\psi)dW(t)\\[0.2cm]<br /> y_{j} &amp;=&amp; c(X(t_{j}),\psi) + \varepsilon_{j} \\[0.2cm]<br /> \varepsilon_{j} &amp;\underset{i.i.d.}{\sim}&amp; \mathcal{N}(0,a^2(\psi)), \quad j=1,\ldots,n ,<br /> \end{array}<br /> \right. &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> with the initial condition $X(t_1) = x \in \Rset^d$. Here, $(W(t),t&gt;0)$ is a standard [http://en.wikipedia.org/wiki/Wiener_process Wiener process] in $\Rset^d$ and $\varepsilon_j \in \Rset$ represents the measurement error occurring at the $j^{\mathrm{th}}$ observation, independent of $W(t)$. 
The measurement function $c: \ \Rset^d \times \Rset^p \rightarrow \Rset$, the drift function $b: \ \Rset^d \times \Rset^p \rightarrow \Rset^d$ and the diffusion function $\gamma: \ \Rset^d \times \Rset^p \rightarrow \mathcal{M}_d(\Rset)$, where $\mathcal{M}_d(\Rset)$ is the set of $d \times d$ matrices with real elements, are known functions that depend on an unknown parameter $\psi \in \Rset^p$.<br /> <br /> We can in fact consider an SDE-based model as an ODE-based one with a stochastic component.<br /> <br /> <br /> {{Example1<br /> |title1=Example: <br /> |title2= &amp;#32; IV bolus with linear elimination<br /> <br /> |text= The ordinary differential equation <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:ode1&quot;&gt;&lt;math&gt; <br /> dA_c(t) = -k A_c(t) dt<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> is commonly used to describe the kinetics of a drug administered by rapid injection (IV bolus) into plasma. In bolus-specific compartmental models, plasma is treated as the single compartment of the human body. $A_c(t)$ represents the amount of a drug ingredient in plasma at time $t$ after injection, and $k$ is the elimination rate constant. The figure below displays the typical evolution of the amount found in the central compartment when $k=4$.<br /> <br /> {{ImageWithCaption|image=sde0.png|caption=Drug concentration evolution for ODE diffusion example }}<br /> <br /> <br /> Imagine now that we aim to describe the evolution of the drug amount over time by means of stochastic differential equations rather than ordinary differential equations, in order to better describe the ''intra-individual variability'' of the observed process. 
We can assume for example that the system [[#eq:ode1|(2)]] is randomly perturbed by an additive [http://en.wikipedia.org/wiki/Wiener_process Wiener process]:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:sde1&quot;&gt;&lt;math&gt;<br /> dA_c(t) = -k A_c(t) dt + \gamma dW(t). <br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(3) }}<br /> <br /> The figure below displays four kinetics for the amount in the central compartment, simulated from this model with $k=4$ and $\gamma=2$.<br /> <br /> <br /> {{ImageWithCaption|image=sde1.png|caption=Drug concentration evolution for SDE diffusion example }}<br /> <br /> }}<br /> <br /> <br /> These kinetics are clearly stochastic. Nevertheless, they are not realistic because:<br /> <br /> <br /> * they give an overly erratic description of the evolution of the drug concentration within the compartments of the human body.<br /> <br /> * they do not comply with certain constraints on biological dynamics (sign, monotonicity).<br /> <br /> <br /> A more relevant model might consider that some parameters of the model randomly fluctuate over time, rather than the observed variable itself, modeling for example the elimination rate &quot;constant&quot; $k$ as a stochastic process $k(t)$ that randomly varies around a typical value $k^\star$.<br /> <br /> More generally, we can describe fluctuations within linear dynamical systems by considering the transfer rates, described below, as diffusion processes rather than the observed processes themselves.<br /> <br /> <br /> <br /> &lt;br&gt;<br /> ==Diffusion models for dynamical systems with linear transfers==<br /> <br /> <br /> Dynamical systems have applications in many fields. They can be used to model viral dynamics, population flows, interactions between cells, and drug pharmacokinetics. 
Dynamical systems involving linear transfers between different entities are usually modeled by means of a system of ODEs with the following general form:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:linearTransferODEModel&quot;&gt;&lt;math&gt;<br /> dA(t) = K\, A(t)dt,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(4) }}<br /> <br /> <br /> where $A(t)$ is a vector whose $l^{\textrm{th}}$ component represents the condition of the $l^{\textrm{th}}$ entity at time $t$ and $K=(K_{l,l^\prime}, \ 1\leq l , l^\prime \leq d)$ is a deterministic matrix defined as:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:K&quot;&gt;&lt;math&gt;<br /> \left\{<br /> \begin{array}{ll}<br /> K_{l,l^\prime} = k_{l^\prime,l} &amp; \textrm{if} \quad l \neq l^\prime\\<br /> K_{l,l} = - k_{l,0} - \sum_{l^\prime \neq l} k_{l,l^\prime} ,<br /> \end{array}<br /> \right.<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(5) }}<br /> <br /> where $k_{l,l^\prime}$ represents the transfer rate from entity $l$ to entity $l^\prime$, and $k_{l,0}$ the elimination rate from entity $l$. An example of such a dynamical system with $3$ components is schematized below.<br /> <br /> <br /> {{ImageWithCaption|image=linear.png|caption=A dynamical system with $3$ components (circles) and linear transfers between components (arrows) }}<br /> <br /> <br /> In this particular example, matrix $K$ would be defined as<br /> <br /> {{Equation1<br /> |equation= &lt;math&gt;<br /> K = \begin{pmatrix}<br /> -k_{10} -k_{12} -k_{13} &amp; k_{21} &amp; k_{31}\\<br /> k_{12} &amp; -k_{20} -k_{21} -k_{23} &amp; k_{32}\\<br /> k_{13} &amp; k_{23} &amp; -k_{30} -k_{31} -k_{32}<br /> \end{pmatrix}.<br /> &lt;/math&gt; }}<br /> <br /> The model defined by equations [[#eq:linearTransferODEModel|(4)]] and [[#eq:K|(5)]] is a deterministic model which assumes that transfers take place at the same rate at all times. 
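To make this construction concrete, here is a small sketch (the function name and rate values are arbitrary illustrations) that assembles $K$ for the three-compartment example above from the transfer rates $k_{l,l^\prime}$ and elimination rates $k_{l,0}$; mass balance then implies that each column of $K$ sums to minus the corresponding elimination rate:

```python
import numpy as np

def build_K(d, transfer, elimination):
    """Assemble K for dA = K A dt, following the example matrix above:
    column l collects the outflows of compartment l, so the off-diagonal
    entry in row l' of column l is the transfer rate k_{l,l'}."""
    K = np.zeros((d, d))
    for (l, lp), rate in transfer.items():   # rate k_{l,l'}: transfer from l to l'
        K[lp - 1, l - 1] += rate             # inflow into l' from l
        K[l - 1, l - 1] -= rate              # matching outflow from l
    for l, rate in elimination.items():      # rate k_{l,0}: elimination from l
        K[l - 1, l - 1] -= rate
    return K

# arbitrary rates for the 3-compartment example
transfer = {(1, 2): 0.5, (1, 3): 0.2, (2, 1): 0.3,
            (2, 3): 0.1, (3, 1): 0.4, (3, 2): 0.6}
elimination = {1: 1.0, 2: 0.7, 3: 0.2}
K = build_K(3, transfer, elimination)
```

With constant rates, the resulting model (4) remains fully deterministic.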
This is often a restrictive assumption since in reality, dynamical systems usually exhibit some random behavior. It is therefore reasonable to consider that transfers are not constant but randomly fluctuate over time. This new assumption leads to the following dynamical system:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:linearTransferSDEModel&quot;&gt;&lt;math&gt;<br /> dA(t) = K(t)A(t)dt,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(6) }}<br /> <br /> where $K$ has the same structure as in [[#eq:K|(5)]] but now some components $k_{l,l^\prime}$ are stochastic processes which take non-negative values and randomly fluctuate around a typical value $k_{l,l^\prime}^\star$.<br /> <br /> Let us now illustrate the construction of such diffusion models using some specific examples in pharmacokinetics.<br /> <br /> <br /> {{Example1<br /> |title1=Example 1: <br /> |title2= &amp;#32; IV bolus administration with stochastic linear elimination<br /> <br /> |text= We will first extend the ODE based model defined in [[#eq:ode1|(2)]] by assuming that $k$ is a diffusion process which takes non-negative values and fluctuates around a typical value $k^\star$.<br /> In this example, non-negativity of $k(t)$ is ensured by defining the logarithm of the transfer rate as an Ornstein-Uhlenbeck diffusion process:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; d\log k(t) = - \alpha \left( \log k(t) - \log k^\star \right) dt + \gamma d W(t), &lt;/math&gt; }}<br /> <br /> where $W$ is a standard one-dimensional [http://en.wikipedia.org/wiki/Wiener_process Wiener process]. 
This results in the following diffusion system:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; dX(t) = b(X(t))dt + \gamma(X(t))dW(t), &lt;/math&gt; }}<br /> <br /> where<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> X(t) = \begin{pmatrix} A_c(t) \\ \log k(t) \end{pmatrix}, \ \ \ \<br /> b(x) = \begin{pmatrix} -x_1 \exp(x_2) \\ -\alpha (x_2-\log k^{\star}) \end{pmatrix}, \ \ \ \<br /> \gamma(x) = \begin{pmatrix} 0 &amp; 0 \\ 0 &amp; \gamma \end{pmatrix}.<br /> &lt;/math&gt; }}<br /> <br /> Note that in this specific example, the Jacobian matrix of the drift function $b$ has a simple form: <br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; B(x)=\begin{pmatrix} - \exp(x_2) &amp; -x_1 \exp(x_2)\\ 0 &amp; -\alpha \end{pmatrix}. &lt;/math&gt; }}<br /> <br /> The two figures below display four simulated processes $k(t)$ and the associated amount processes $A_c(t)$.<br /> <br /> <br /> ::[[File:sde2.png|link=]]<br /> <br /> :::[[File:sde3.png|link=]]<br /> <br /> <br /> We measure the concentration at times $(t_{j}, 1\leq j \leq n)$:<br /> <br /> {{Equation1<br /> |equation= &lt;math&gt;y_j = \displaystyle{\frac{A_c(t_{j})}{V} } + a \, \teps_j . &lt;/math&gt; }}<br /> <br /> The parameter vector of the model is therefore $\psi = (V, k^\star, \alpha, \gamma, a)$. 
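For illustration, trajectories of this model can be simulated with a simple Euler-Maruyama scheme (a sketch only; the function name, parameter values and step size are arbitrary choices, not those used for the figures):

```python
import numpy as np

def simulate_bolus_ou(k_star=4.0, alpha=2.0, gamma=0.5, A0=10.0,
                      T=1.0, dt=1e-3, seed=42):
    """Euler-Maruyama scheme for  dA_c = -k(t) A_c dt,
    d log k = -alpha (log k - log k_star) dt + gamma dW."""
    rng = np.random.default_rng(seed)
    n = int(T / dt)
    A = np.empty(n + 1)
    logk = np.empty(n + 1)
    A[0], logk[0] = A0, np.log(k_star)
    for i in range(n):
        A[i + 1] = A[i] - np.exp(logk[i]) * A[i] * dt      # amount dynamics
        logk[i + 1] = (logk[i]
                       - alpha * (logk[i] - np.log(k_star)) * dt
                       + gamma * np.sqrt(dt) * rng.standard_normal())
    return A, np.exp(logk)

A_c, k_t = simulate_bolus_ou()
```

Each step multiplies $A_c$ by $1 - k(t)\,dt \in (0,1)$, so the simulated amount stays positive and decreases monotonically while $k(t)$ fluctuates around $k^\star$.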
We see in this example that the simulated kinetics are much more realistic than those obtained with the previous model, because:<br /> <br /> <br /> * the elimination rate process $k(t)$ is a stochastic process that takes non-negative values,<br /> <br /> * even though the amount process is stochastic, it is smooth and decreases monotonically with time.<br /> }}<br /> <br /> <br /> <br /> {{Example1<br /> |title1=Example 2: <br /> |title2= &amp;#32; Oral administration with first-order absorption and stochastic linear elimination<br /> <br /> |text=Oral PK models with first-order absorption and linear elimination are widely used to describe the time-course of a drug orally administered to a single compartment of the human body. The drug is administered into a depot compartment, absorbed by the central compartment with absorption rate $k_a$ and eliminated with elimination rate $k_e$. Such a model is described by the following system of ODEs:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:oral1&quot;&gt;&lt;math&gt;<br /> \displaystyle{ \frac{d}{dt} } \begin{pmatrix} A_d(t) \\ A_c(t) \end{pmatrix} \ \ = \ \ \begin{pmatrix} -k_a &amp; 0\\ k_a &amp; -k_e\end{pmatrix} \begin{pmatrix} A_d(t) \\ A_c(t) \end{pmatrix},<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(7) }}<br /> <br /> where $A_d(t)$ and $A_c(t)$ respectively represent the amounts of drug at time $t$ in the depot and central compartments. Assume now that the elimination constant is driven by a stochastic process, the solution of the stochastic differential equation<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; d k_e(t) = - \alpha (k_e(t) - k_e^\star ) dt + \gamma \sqrt{k_e(t)} dW(t),<br /> &lt;/math&gt; }}<br /> <br /> where $W$ is a standard one-dimensional [http://en.wikipedia.org/wiki/Wiener_process Wiener process]. Then [[#eq:oral1|(7)]] becomes:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; dX(t) = b(X(t))dt + \gamma(X(t))dW(t). 
&lt;/math&gt; }}<br /> <br /> Here,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> X(t)= \begin{pmatrix} A_d(t) \\ A_c(t) \\ k_e(t) \end{pmatrix}, \ \ \ \<br /> b(x) = \begin{pmatrix} -k_a x_1 \\ k_a x_1 -x_3 x_2 \\ -\alpha(x_3-k_e^\star ) \end{pmatrix}, \ \ \ \<br /> \gamma(x) = \begin{pmatrix} 0 &amp; 0 &amp; 0 \\ 0 &amp; 0 &amp; 0\\ 0 &amp; 0 &amp; \gamma \sqrt{x_3}\end{pmatrix} ,<br /> &lt;/math&gt; }}<br /> <br /> and the parameter vector of the model is $\psi = (V, k_a, k_e^\star, \alpha, \gamma, a) .$<br /> }}<br /> <br /> In both examples, the diffusion model can be easily extended to a population approach by defining the system's parameters $\psi$ as an individual random vector.<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Mixed-effects diffusion models==<br /> <br /> Let us now consider model [[#eq:SDEmodel|(1)]] with observations coming from several subjects. An adequate adaptation of model [[#eq:SDEmodel|(1)]] in such a context consists of considering as many dynamical systems as individuals, and defining the parameters of the individual dynamical systems as independent random variables, in such a way as to correctly reflect variability between the different trajectories. To standardize notation, we consider $N$ different subjects randomly chosen from a population and denote by $n_i$ the number of observations for individual $i$, so that $t_{i1}&lt;\ldots&lt;t_{i,n_i}$ are subject $i$'s observation time points. $(X_i(t),t&gt;0) \in \Rset^d$ and $y_{ij} \in \Rset$ will respectively denote individual $i$'s diffusion process and the noisy observation of $X_i(t_{ij})$. 
The $y_{ij}$, $i=1,\ldots,N$, $j=1,\ldots,n_i$ are governed by a mixed-effects model based on a $d$-dimensional real-valued system of stochastic differential equations with the general form:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:SDEmixedModel&quot;&gt;&lt;math&gt;<br /> \left\{<br /> \begin{array}{l}<br /> dX_i(t) = b(X_i(t),\psi_i)dt + \gamma(X_i(t),\psi_i)dW_i(t),\\[0.2cm]<br /> y_{ij} = c(X_i(t_{ij}),\psi_i) + \teps_{ij},\\[0.2cm]<br /> \teps_{ij} \underset{i.i.d.}{\sim} \mathcal{N}(0,a^2(\psi_i)) \; , \; j=1,\ldots, n_i \; , \; i=1,\ldots,N,\\<br /> \end{array}<br /> \right.<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(8) }}<br /> <br /> with initial condition $X_i(t_{i1}) = x_{i1} \in \Rset^d$ for $i=1,\ldots,N$. The $\psi_i$'s are unobserved independent $p$-dimensional random subject-specific parameters, drawn from a distribution $\qpsi$ which depends on a set of population parameters $\theta$, $(W_1(t),t&gt;0), \ldots, (W_N(t),t&gt;0)$ are standard independent [http://en.wikipedia.org/wiki/Wiener_process Wiener processes], and the $\teps_{ij}$ are independent Gaussian random variables representing residual errors such that the $\psi_i$, $W_i$ and $\teps_{ij}$ are mutually independent.<br /> The measurement function $c$, the drift function $b$ and the diffusion function $\gamma$ are known functions that are common to the $N$ subjects and depend on the unknown parameters $\psi_i$.<br /> <br /> Assuming that the $N$ individuals are independent, the conditional pdf of the observations is given by:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:sdepdf&quot;&gt;&lt;math&gt;<br /> \pcypsi(y_1,\ldots,y_N {{!}} \psi_1,\ldots,\psi_N) = \prod_{i=1}^{N}\pcyipsii(y_i {{!}} \psi_i).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(9) }}<br /> <br /> Computing the conditional distribution $\pcyipsii$ of the observations for any individual $i$ requires computing the conditional distribution of each observation given the past:<br /> <br /> 
{{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcyipsii(y_i {{!}} \psi_i) &amp;=&amp; \pyipsiONE(y_{i1} {{!}} \psi_i)\prod_{j=2}^{n_i} p(y_{i,j} {{!}} y_{i,1},\ldots,y_{i,j-1} {{!}} \psi_i) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> Except in some very specific classes of mixed-effects diffusion models, the transition density $\pmacro(y_{i,j}|y_{i,1},\ldots,y_{i,j-1} | \psi_i)$ does not have a closed-form expression since it involves the transition densities of the underlying diffusion processes $X_i$.<br /> When the underlying system is a Gaussian linear dynamical system, this density is a Gaussian density whose mean and variance can be computed using the Kalman filter. When the system is not linear, a first solution consists in approximating this density by a Gaussian density and using the extended Kalman filter for quickly computing the mean and the variance of this density. On the other hand, particle filters do not make any approximations of the transition density, but are very demanding in terms of simulation volume and computation time.<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Bibliography==<br /> <br /> <br /> &lt;bibtex&gt;<br /> @article{delattre2013sii,<br /> title={Coupling the SAEM algorithm and the extended Kalman filter for maximum likelihood estimation in mixed-effects diffusion models},<br /> author={Delattre, M. and Lavielle, M.},<br /> journal={Statistics and Its Interface},<br /> year={2013}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @Article{Ditlevsen2005,<br /> title = {Mixed Effects in Stochastic Differential Equation Models},<br /> author = {Ditlevsen, S. and De Gaetano, A.},<br /> journal = {REVSTAT Statistical Journal},<br /> volume = {3},<br /> year = {2005},<br /> pages = {137-153}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @Article{Donnet2008,<br /> title = {Parametric Inference for Mixed Models Defined by Stochastic Differential Equations},<br /> author = {Donnet, S. 
and Samson, A.},<br /> journal = {ESAIM: Probability and Statistics},<br /> volume = {12},<br /> year = {2008},<br /> pages = {196-218}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @inproceedings{doucet2011tutorial,<br /> title={A tutorial on particle filtering and smoothing: Fifteen years later},<br /> author={Doucet, A. and Johansen, A. M.},<br /> booktitle={Oxford Handbook of Nonlinear Filtering},<br /> year={2011},<br /> organization={Citeseer}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{Klim2009,<br /> author = {Klim, S. and Mortensen, S. B. and Kristensen, N. R. and Overgaard, R. V. and Madsen, H.},<br /> title = {Population stochastic modelling (PSM)-an R package for mixed-effects models based on stochastic differential equations},<br /> journal = {Computer methods and programs in biomedicine},<br /> volume = {94},<br /> pages = {279-289},<br /> year = {2009}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @Article{Kristensen2005,<br /> title = {Using Stochastic Differential Equations for PK/PD Model Development},<br /> author = {Kristensen, N. R. and Madsen, H. and Ingwersen, S. H.},<br /> journal = {Journal of Pharmacokinetics and Pharmacodynamics},<br /> volume = {32},<br /> year = {2005},<br /> pages = {109-141}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{Mazzoni2008,<br /> title = {Computational aspects of continuous-discrete extended Kalman-filtering},<br /> author = {Mazzoni, T.},<br /> journal = {Computational Statistics},<br /> volume = {23},<br /> year = {2008},<br /> pages = {519-39}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @Article{PSM,<br /> title = {Population Stochastic Modelling (PSM): Model definition, description and examples},<br /> author = {Mortensen, S. 
and Klim, S.}, <br /> year = {2008},<br /> url = {http://www2.imm.dtu.dk/projects/psm/},<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @Article{Mortensen2007,<br /> title = {A Matlab framework for estimation of NLME models using stochastic differential equations - Applications for estimation of insulin secretion rates},<br /> author = {Mortensen, S. B. and Klim, S. and Dammann, B. and Kristensen, N. R. and Madsen, H. and Overgaard, R. V.},<br /> journal = {Journal of Pharmacokinetics and Pharmacodynamics},<br /> volume = {34},<br /> year = {2007},<br /> pages = {623-642}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @Article{Overgaard2005,<br /> title = {Non-Linear Mixed-Effects Models with Stochastic Differential Equations: Implementation of an Estimation Algorithm},<br /> author = {Overgaard, R. V. and Jonsson, N. and Torn&amp;oslash;e, C. W. and Madsen, H.},<br /> journal = {Journal of Pharmacokinetics and Pharmacodynamics},<br /> volume = {32},<br /> year = {2005},<br /> pages = {85-107}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @Article{Picchini2010,<br /> title = {Stochastic Differential Mixed-Effects Models},<br /> author = {Picchini, U. and De Gaetano, A. and Ditlevsen, S.},<br /> journal = {Scandinavian Journal of Statistics},<br /> volume = {37},<br /> year = {2010},<br /> pages = {67-90}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @Article{Picchini2011,<br /> title = {Practical Estimation of High Dimensional Stochastic Differential Mixed-Effects Models},<br /> author = {Picchini, U. and Ditlevsen, S.},<br /> journal = {Computational Statistics and Data Analysis},<br /> volume = {55},<br /> number = {3},<br /> year = {2011},<br /> pages = {1426-1444}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @Article{Tornoe2005,<br /> title = {Stochastic Differential Equations in NONMEM: Implementation, Application, and Comparison with Ordinary Differential Equations},<br /> author = {Torn&amp;oslash;e, C. 
W. and Overgaard, R. V. and Agers&amp;oslash;, H. and Nielsen, H. A. and Madsen, H. and Jonsson, E. N.},<br /> journal = {Pharmaceutical Research},<br /> volume = {22},<br /> year = {2005},<br /> pages = {1247-1258}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> <br /> {{Back<br /> |link=Hidden Markov models }}</div> Admin http://wiki.webpopix.org/index.php/Stochastic_differential_equations_based_models Stochastic differential equations based models 2013-06-07T14:00:49Z <p>Admin : </p> <hr /> <div>&lt;!-- Menu for the Extensions chapter --&gt;<br /> &lt;sidebarmenu&gt;<br /> +[[Extensions]]<br /> *[[Extensions| Introduction ]] | [[ Mixture models ]] | [[Hidden Markov models]] | [[Stochastic differential equations based models]] <br /> &lt;/sidebarmenu&gt;<br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> ==Introduction==<br /> <br /> <br /> Diffusion models are known to be a relevant tool for modeling stochastic dynamic phenomena, and are widely used in various fields including finance, physics, biology, physiology and control.<br /> In a population approach, a mixed-effects diffusion model describes each individual series of observations using a system of stochastic differential equations (SDE) while also taking into account variability between individuals.<br /> <br /> For the sake of simplicity we will consider first a diffusion model for a single individual, and illustrate it with a very general dynamical system with linear transfers and PK examples. We will then show that the extension to mixed diffusion models is fairly straightforward.<br /> <br /> Note that the conditional distribution $\qcypsi$ of the observations usually does not have a closed-form expression. When the underlying system is a Gaussian linear dynamical one, the conditional pdf of the observations, $\pcypsi(y_i|\psi_i)$ can be computed using the [http://en.wikipedia.org/wiki/Kalman_filter ''Kalman filter'' (KF)]. 
When the system is not linear, the [http://en.wikipedia.org/wiki/Extended_Kalman_Filter ''extended Kalman filter'' (EKF)] provides an approximation of the conditional pdf.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Diffusion model==<br /> <br /> <br /> We assume that one diffusion trajectory is observed with noise at discrete time points $t_1&lt;\ldots&lt;t_j&lt;\ldots&lt;t_n$. Let us denote by $(X(t),t&gt;0) \in \Rset^d$ the underlying dynamical process and by $y_j \in \Rset$ a noisy measurement of $X(t_j)$, $j=1,\ldots,n$. The general form of the diffusion model is given by:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:SDEmodel&quot;&gt;&lt;math&gt;<br /> \left\{<br /> \begin{array}{lll}<br /> dX(t) &amp;=&amp; b(X(t),\psi)dt + \gamma(X(t),\psi)dW(t)\\[0.2cm]<br /> y_{j} &amp;=&amp; c(X(t_{j}),\psi) + \varepsilon_{j} \\[0.2cm]<br /> \varepsilon_{j} &amp;\underset{i.i.d.}{\sim}&amp; \mathcal{N}(0,a^2(\psi)), \quad j=1,\ldots,n ,<br /> \end{array}<br /> \right. &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> with the initial condition $X(t_1) = x \in \Rset^d$. Here, $(W(t),t&gt;0)$ is a standard Wiener process in $\Rset^d$ and $\varepsilon_j \in \Rset$ represents the measurement error occurring at the $j^{\mathrm{th}}$ observation, independent of $W(t)$.
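Model [[#eq:SDEmodel|(1)]] rarely admits exact simulation, so in practice trajectories are approximated on a fine time grid, e.g. with the Euler-Maruyama scheme. A minimal sketch for the scalar case ($d=1$); the function names and parameter values are illustrative, not part of the model specification:

```python
import math
import random

def euler_maruyama(b, gamma, x0, times, rng):
    """Approximate one path of dX = b(X)dt + gamma(X)dW on a time grid."""
    x, path = x0, [x0]
    for t0, t1 in zip(times, times[1:]):
        dt = t1 - t0
        dw = rng.gauss(0.0, math.sqrt(dt))  # Wiener increment ~ N(0, dt)
        x = x + b(x) * dt + gamma(x) * dw
        path.append(x)
    return path

def observe(path, c, a, rng):
    """Noisy observations y_j = c(X(t_j)) + eps_j, eps_j ~ N(0, a^2)."""
    return [c(x) + rng.gauss(0.0, a) for x in path]
```

For instance, the randomly perturbed bolus model below corresponds to `b = lambda x: -k * x` with a constant diffusion coefficient `gamma`.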
The measurement function $c: \ \Rset^d \times \Rset^p \rightarrow \Rset$, the drift function $b: \ \Rset^d \times \Rset^p \rightarrow \Rset^d$ and the diffusion function $\gamma: \ \Rset^d \times \Rset^p \rightarrow \mathcal{M}_d(\Rset)$, where $\mathcal{M}_d(\Rset)$ is the set of $d \times d$ matrices with real elements, are known functions that depend on an unknown parameter $\psi \in \Rset^p$.<br /> <br /> We can in fact consider an SDE-based model as an ODE-based one with a stochastic component.<br /> <br /> <br /> {{Example1<br /> |title1=Example: <br /> |title2= &amp;#32; IV bolus with linear elimination<br /> <br /> |text= The ordinary differential equation <br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:ode1&quot;&gt;&lt;math&gt; <br /> dA_c(t) = -k A_c(t) dt<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> is commonly used to describe the kinetics of a drug administered by rapid injection (IV bolus) into plasma. In bolus-specific compartmental models, plasma is treated as the single compartment of the human body. $A_c(t)$ represents the amount of a drug ingredient in plasma at time $t$ after injection, and $k$ is the elimination rate constant. The figure below displays the typical evolution of the amount found in the central compartment when $k=4$.<br /> <br /> {{ImageWithCaption|image=sde0.png|caption=Drug amount evolution for the ODE example }}<br /> <br /> <br /> Imagine now that we aim to describe the evolution of the drug amount over time by means of stochastic differential equations rather than ordinary differential equations, in order to better describe the ''intra-individual variability'' of the observed process. We can assume for example that the system [[#eq:ode1|(2)]] is randomly perturbed by an additive Wiener process:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:sde1&quot;&gt;&lt;math&gt;<br /> dA_c(t) = -k A_c(t) dt + \gamma dW(t).
<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(3) }}<br /> <br /> The figure below displays four kinetics for the amount in the central compartment, simulated from this model with $k=4$ and $\gamma=2$.<br /> <br /> <br /> {{ImageWithCaption|image=sde1.png|caption=Drug amount evolution for the SDE example }}<br /> <br /> }}<br /> <br /> <br /> These kinetics are clearly stochastic. Nevertheless, they are not realistic because:<br /> <br /> <br /> * they give an overly erratic description of the evolution of the drug concentration within the compartments of the human body.<br /> <br /> * they do not comply with certain constraints on biological dynamics (sign, monotonicity).<br /> <br /> <br /> A more relevant model might consider that some parameters of the model randomly fluctuate over time, rather than the observed variable itself, modeling for example the elimination rate &quot;constant&quot; $k$ as a stochastic process $k(t)$ that randomly varies around a typical value $k^\star$.<br /> <br /> More generally, we can describe the fluctuations within linear dynamical systems by considering the transfer rates, described below, as diffusion processes rather than the observed processes themselves.<br /> <br /> <br /> <br /> &lt;br&gt;<br /> ==Diffusion models for dynamical systems with linear transfers==<br /> <br /> <br /> Dynamical systems have applications in many fields. They can be used to model viral dynamics, population flows, interactions between cells, and drug pharmacokinetics.
Dynamical systems involving linear transfers between different entities are usually modeled by means of a system of ODEs with the following general form:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:linearTransferODEModel&quot;&gt;&lt;math&gt;<br /> dA(t) = K\, A(t)dt,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(4) }}<br /> <br /> <br /> where $A(t)$ is a vector whose $l^{\textrm{th}}$ component represents the state of the $l^{\textrm{th}}$ entity at time $t$ and $K=(K_{l,l^\prime}, \ 1\leq l, l^\prime \leq d)$ is a deterministic matrix defined as:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:K&quot;&gt;&lt;math&gt;<br /> \left\{<br /> \begin{array}{ll}<br /> K_{l,l^\prime} = k_{l^\prime,l} &amp; \textrm{if} \quad l \neq l^\prime\\<br /> K_{l,l} = - k_{l,0} - \sum_{l^\prime \neq l} k_{l,l^\prime} ,<br /> \end{array}<br /> \right.<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(5) }}<br /> <br /> where $k_{l,l^\prime}$ represents the transfer rate from entity $l$ to entity $l^\prime$, and $k_{l,0}$ the elimination rate from entity $l$. An example of such a dynamical system with $3$ components is schematized below.<br /> <br /> <br /> {{ImageWithCaption|image=linear.png|caption=A dynamical system with $3$ components (circles) and linear transfers between components (arrows) }}<br /> <br /> <br /> In this particular example, matrix $K$ would be defined as<br /> <br /> {{Equation1<br /> |equation= &lt;math&gt;<br /> K = \begin{pmatrix}<br /> -k_{10} -k_{12} -k_{13} &amp; k_{21} &amp; k_{31}\\<br /> k_{12} &amp; -k_{20} -k_{21} -k_{23} &amp; k_{32}\\<br /> k_{13} &amp; k_{23} &amp; -k_{30} -k_{31} -k_{32}<br /> \end{pmatrix}.<br /> &lt;/math&gt; }}<br /> <br /> The model defined by equations [[#eq:linearTransferODEModel|(4)]] and [[#eq:K|(5)]] is a deterministic model which assumes that transfers take place at the same rate at all times.
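The mapping from transfer rates to $K$ can be checked mechanically. Following the convention of the example matrix above (entry $(l^\prime, l)$ receives the rate $k_{l,l^\prime}$ from entity $l$ to entity $l^\prime$), here is a sketch that builds $K$ for arbitrary rates (the numerical values are purely illustrative) and verifies the mass-balance property that column $l$ of $K$ sums to $-k_{l,0}$:

```python
def build_K(d, elim, rates):
    """Build the matrix K of dA/dt = K A for a system with d entities.

    elim[l]        : elimination rate k_{l,0} from entity l   (1-based)
    rates[(l, lp)] : transfer rate  k_{l,lp} from l to lp     (1-based)
    """
    K = [[0.0] * d for _ in range(d)]
    for (l, lp), k in rates.items():
        K[lp - 1][l - 1] += k   # inflow into lp coming from l
        K[l - 1][l - 1] -= k    # matching outflow from l
    for l in range(1, d + 1):
        K[l - 1][l - 1] -= elim.get(l, 0.0)
    return K

# Three-compartment example; rate values are illustrative.
rates = {(1, 2): 0.5, (1, 3): 0.2, (2, 1): 0.3, (2, 3): 0.1,
         (3, 1): 0.4, (3, 2): 0.6}
elim = {1: 1.0, 2: 0.7, 3: 0.9}
K = build_K(3, elim, rates)
# Mass balance: column l of K sums to -k_{l,0}.
for l in range(3):
    assert abs(sum(K[row][l] for row in range(3)) + elim[l + 1]) < 1e-12
```

The column-sum property is exactly conservation of mass: everything leaving entity $l$ either arrives in another entity or is eliminated at rate $k_{l,0}$.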
This is often a restrictive assumption since, in reality, dynamical systems usually exhibit some random behavior. It is therefore reasonable to consider that transfers are not constant but randomly fluctuate over time. This new assumption leads to the following dynamical system:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:linearTransferSDEModel&quot;&gt;&lt;math&gt;<br /> dA(t) = K(t)A(t)dt,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(6) }}<br /> <br /> where $K(t)$ has the same structure as in [[#eq:K|(5)]] but now some components $k_{l,l^\prime}$ are stochastic processes which take non-negative values and randomly fluctuate around a typical value $k_{l,l^\prime}^\star$.<br /> <br /> Let us now illustrate the construction of such diffusion models using some specific examples in pharmacokinetics.<br /> <br /> <br /> {{Example1<br /> |title1=Example 1: <br /> |title2= &amp;#32; IV bolus administration with stochastic linear elimination<br /> <br /> |text= We will first extend the ODE-based model defined in [[#eq:ode1|(2)]] by assuming that $k$ is a diffusion process which takes non-negative values and fluctuates around a typical value $k^\star$.<br /> In this example, non-negativity of $k(t)$ is ensured by defining the logarithm of the transfer rate as an Ornstein-Uhlenbeck diffusion process:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; d\log k(t) = - \alpha \left( \log k(t) - \log k^\star \right) dt + \gamma d W(t), &lt;/math&gt; }}<br /> <br /> where $W$ is a standard one-dimensional Wiener process.
This results in the following diffusion system:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; dX(t) = b(X(t))dt + \gamma(X(t))dW(t), &lt;/math&gt; }}<br /> <br /> where<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> X(t) = \begin{pmatrix} A_c(t) \\ \log k(t) \end{pmatrix}, \ \ \ \<br /> b(x) = \begin{pmatrix} -x_1 \exp(x_2) \\ -\alpha (x_2-\log k^{\star}) \end{pmatrix}, \ \ \ \<br /> \gamma(x) = \begin{pmatrix} 0 &amp; 0 \\ 0 &amp; \gamma \end{pmatrix}.<br /> &lt;/math&gt; }}<br /> <br /> Note that in this specific example, the Jacobian matrix of the drift function $b$ has a simple form: <br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; B(x)=\begin{pmatrix} - \exp(x_2) &amp; -x_1 \exp(x_2)\\ 0 &amp; -\alpha \end{pmatrix}. &lt;/math&gt; }}<br /> <br /> The two figures below display four simulated processes $k(t)$ and the associated amount processes $A_c(t)$.<br /> <br /> <br /> ::[[File:sde2.png|link=]]<br /> <br /> :::[[File:sde3.png|link=]]<br /> <br /> <br /> We measure the concentration at times $(t_{j}, 1\leq j \leq n)$:<br /> <br /> {{Equation1<br /> |equation= &lt;math&gt;y_j = \displaystyle{\frac{A_c(t_{j})}{V} } + a \, \teps_j . &lt;/math&gt; }}<br /> <br /> The parameter vector of the model is therefore $\psi = (V, k^\star, \alpha, \gamma, a)$. 
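A short Euler-Maruyama simulation sketch of this system (parameter values are illustrative) makes the qualitative behaviour easy to check: $k(t)$ stays positive because the noise acts on $\log k(t)$, and $A_c(t)$ decreases smoothly because no diffusion term acts directly on the amount:

```python
import math
import random

def simulate_bolus_sde(k_star, alpha, gamma, a0, dt, n_steps, rng):
    """Euler-Maruyama simulation of Example 1:
       d log k(t) = -alpha (log k(t) - log k_star) dt + gamma dW(t)
       dA_c(t)    = -k(t) A_c(t) dt   (no diffusion on the amount)."""
    log_k = math.log(k_star)
    a_c = a0
    ks, amounts = [k_star], [a0]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        log_k += -alpha * (log_k - math.log(k_star)) * dt + gamma * dw
        a_c += -math.exp(log_k) * a_c * dt
        ks.append(math.exp(log_k))
        amounts.append(a_c)
    return ks, amounts

rng = random.Random(1)
ks, amounts = simulate_bolus_sde(4.0, 10.0, 1.0, 100.0, 1e-3, 1000, rng)
assert all(k > 0 for k in ks)                                 # k(t) > 0
assert all(x > y for x, y in zip(amounts, amounts[1:]))       # monotone decay
```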
We see in this example that the simulated kinetics are much more realistic than those obtained with the previous model, because:<br /> <br /> <br /> * the elimination rate $k(t)$ is now a stochastic process that takes only non-negative values,<br /> <br /> * even though the amount process is stochastic, it is smooth and decreases monotonically with time.<br /> }}<br /> <br /> <br /> <br /> {{Example1<br /> |title1=Example 2: <br /> |title2= &amp;#32; Oral administration with first-order absorption and stochastic linear elimination<br /> <br /> |text=Oral PK models with first-order absorption and linear elimination are widely used to describe the time-course of a drug orally administered into a single compartment of the human body. The drug is administered into a depot compartment, absorbed by the central compartment with absorption rate $k_a$ and eliminated with elimination rate $k_e$. Such a model is described by the following system of ODEs:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:oral1&quot;&gt;&lt;math&gt;<br /> \displaystyle{ \frac{d}{dt} } \begin{pmatrix} A_d(t) \\ A_c(t) \end{pmatrix} \ \ = \ \ \begin{pmatrix} -k_a &amp; 0\\ k_a &amp; -k_e\end{pmatrix} \begin{pmatrix} A_d(t) \\ A_c(t) \end{pmatrix},<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(7) }}<br /> <br /> where $A_d(t)$ and $A_c(t)$ respectively represent the amounts of drug at time $t$ in the depot and central compartments. Assume now that the elimination constant is driven by a stochastic process, solution to the stochastic differential equation<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; d k_e(t) = - \alpha (k_e(t) - k_e^\star ) dt + \gamma \sqrt{k_e(t)} dW(t),<br /> &lt;/math&gt; }}<br /> <br /> where $W$ is a standard one-dimensional Wiener process. Then [[#eq:oral1|(7)]] becomes:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; dX(t) = b(X(t))dt + \gamma(X(t))dW(t).
&lt;/math&gt; }}<br /> <br /> Here,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> X(t)= \begin{pmatrix} A_d(t) \\ A_c(t) \\ k_e(t) \end{pmatrix}, \ \ \ \<br /> b(x) = \begin{pmatrix} -k_a x_1 \\ k_a x_1 -x_3 x_2 \\ -\alpha(x_3-k_e^\star ) \end{pmatrix}, \ \ \ \<br /> \gamma(x) = \begin{pmatrix} 0 &amp; 0 &amp; 0 \\ 0 &amp; 0 &amp; 0\\ 0 &amp; 0 &amp; \gamma \sqrt{x_3}\end{pmatrix} ,<br /> &lt;/math&gt; }}<br /> <br /> and the parameter vector of the model is $\psi = (V, k_a, k_e^\star, \alpha, \gamma, a) .$<br /> }}<br /> <br /> In both examples, the diffusion model can be easily extended to a population approach by defining the system's parameters $\psi$ as an individual random vector.<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Mixed-effects diffusion models==<br /> <br /> Let us now consider model [[#eq:SDEmodel|(1)]] with observations coming from several subjects. An adequate adaptation of model [[#eq:SDEmodel|(1)]] in such a context consists of considering as many dynamical systems as individuals, and defining the parameters of the individual dynamical systems as independent random variables, in such a way as to correctly reflect variability between the different trajectories. To standardize notation, we consider $N$ different subjects randomly chosen from a population and denote by $n_i$ the number of observations for individual $i$, so that $t_{i1}&lt;\ldots&lt;t_{i,n_i}$ are subject $i$'s observation time points. $(X_i(t),t&gt;0) \in \Rset^d$ and $y_{ij} \in \Rset$ will denote, respectively, individual $i$'s diffusion process and the noisy observation of $X_i(t_{ij})$.
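Before the model is written down formally, note that the population extension simply draws one parameter vector per subject and then runs one dynamical system per subject. A minimal sketch, assuming (purely for illustration) independent lognormal individual parameters:

```python
import math
import random

def draw_individual_parameters(pop_mean, omega, n_subjects, rng):
    """Draw subject-specific parameters psi_i around population values.

    Lognormal distributions are assumed here (an illustrative choice):
    log psi_i ~ N(log pop_mean, omega^2), component-wise.
    """
    draws = []
    for _ in range(n_subjects):
        psi = {name: mu * math.exp(rng.gauss(0.0, omega[name]))
               for name, mu in pop_mean.items()}
        draws.append(psi)
    return draws

rng = random.Random(0)
pop_mean = {"V": 10.0, "k_star": 4.0}   # illustrative population values
omega = {"V": 0.2, "k_star": 0.3}       # illustrative standard deviations
psis = draw_individual_parameters(pop_mean, omega, 5, rng)
assert len(psis) == 5 and all(p["V"] > 0 for p in psis)
```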
The $y_{ij}$, $i=1,\ldots,N$, $j=1,\ldots,n_i$, are governed by a mixed-effects model based on a $d$-dimensional real-valued system of stochastic differential equations with the general form:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:SDEmixedModel&quot;&gt;&lt;math&gt;<br /> \left\{<br /> \begin{array}{l}<br /> dX_i(t) = b(X_i(t),\psi_i)dt + \gamma(X_i(t),\psi_i)dW_i(t),\\[0.2cm]<br /> y_{ij} = c(X_i(t_{ij}),\psi_i) + \teps_{ij},\\[0.2cm]<br /> \teps_{ij} \underset{i.i.d.}{\sim} \mathcal{N}(0,a^2(\psi_i)) \; , \; j=1,\ldots, n_i \; , \; i=1,\ldots,N,\\<br /> \end{array}<br /> \right.<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(8) }}<br /> <br /> with initial condition $X_i(t_{i1}) = x_{i1} \in \Rset^d$ for $i=1,\ldots,N$. The $\psi_i$'s are unobserved independent random subject-specific parameter vectors, drawn from a distribution $\qpsi$ which depends on a set of population parameters $\theta$, $(W_1(t),t&gt;0), \ldots, (W_N(t),t&gt;0)$ are standard independent [http://en.wikipedia.org/wiki/Wiener_process Wiener processes], and the $\teps_{ij}$ are independent Gaussian random variables representing residual errors such that the $\psi_i$, $W_i$ and $\teps_{ij}$ are mutually independent.<br /> The measurement function $c$, the drift function $b$ and the diffusion function $\gamma$ are known functions that are common to the $N$ subjects and depend on the unknown parameters $\psi_i$.<br /> <br /> Assuming that the $N$ individuals are independent, the joint pdf is given by:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:sdepdf&quot;&gt;&lt;math&gt;<br /> \pcypsi(y_1,\ldots,y_N {{!}} \psi_1,\ldots,\psi_N) = \prod_{i=1}^{N}\pcyipsii(y_i {{!}} \psi_i).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(9) }}<br /> <br /> Computing the conditional distribution $\pcyipsii$ of the observations for any individual $i$ requires computing the conditional distribution of each observation given its past:<br /> <br /> {{Equation1<br /> 
|equation=&lt;math&gt;\begin{eqnarray}<br /> \pcyipsii(y_i {{!}} \psi_i) &amp;=&amp; \pyipsiONE(y_{i1} {{!}} \psi_i)\prod_{j=2}^{n_i} p(y_{i,j} {{!}} y_{i,1},\ldots,y_{i,j-1}, \psi_i) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> Except in some very specific classes of mixed-effects diffusion models, the transition density $\pmacro(y_{i,j}|y_{i,1},\ldots,y_{i,j-1}, \psi_i)$ does not have a closed-form expression since it involves the transition densities of the underlying diffusion processes $X_i$.<br /> When the underlying system is a Gaussian linear dynamical system, this density is a Gaussian density whose mean and variance can be computed using the Kalman filter. When the system is not linear, one solution is to approximate this density by a Gaussian density and use the extended Kalman filter to quickly compute its mean and variance. Particle filters, on the other hand, make no approximation of the transition density, but are very demanding in terms of simulation volume and computation time.<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Bibliography==<br /> <br /> <br /> &lt;bibtex&gt;<br /> @article{delattre2013sii,<br /> title={Coupling the SAEM algorithm and the extended Kalman filter for maximum likelihood estimation in mixed-effects diffusion models},<br /> author={Delattre, M. and Lavielle, M.},<br /> journal={Statistics and Its Interface},<br /> year={2013}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @Article{Ditlevsen2005,<br /> title = {Mixed Effects in Stochastic Differential Equation Models},<br /> author = {Ditlevsen, S. and De Gaetano, A.},<br /> journal = {REVSTAT Statistical Journal},<br /> volume = {3},<br /> year = {2005},<br /> pages = {137-153}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @Article{Donnet2008,<br /> title = {Parametric Inference for Mixed Models Defined by Stochastic Differential Equations},<br /> author = {Donnet, S. 
and Samson, A.},<br /> journal = {ESAIM: Probability and Statistics},<br /> volume = {12},<br /> year = {2008},<br /> pages = {196-218}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @inproceedings{doucet2011tutorial,<br /> title={A tutorial on particle filtering and smoothing: Fifteen years later},<br /> author={Doucet, A. and Johansen, A. M.},<br /> booktitle={Oxford Handbook of Nonlinear Filtering},<br /> year={2011},<br /> organization={Citeseer}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{Klim2009,<br /> author = {Klim, S. and Mortensen, S. B. and Kristensen, N. R. and Overgaard, R. V. and Madsen, H.},<br /> title = {Population stochastic modelling (PSM)-an R package for mixed-effects models based on stochastic differential equations},<br /> journal = {Computer methods and programs in biomedicine},<br /> volume = {94},<br /> pages = {279-289},<br /> year = {2009}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @Article{Kristensen2005,<br /> title = {Using Stochastic Differential Equations for PK/PD Model Development},<br /> author = {Kristensen, N. R. and Madsen, H. and Ingwersen, S. H.},<br /> journal = {Journal of Pharmacokinetics and Pharmacodynamics},<br /> volume = {32},<br /> year = {2005},<br /> pages = {109-141}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{Mazzoni2008,<br /> title = {Computational aspects of continuous-discrete extended Kalman-filtering},<br /> author = {Mazzoni, T.},<br /> journal = {Computational Statistics},<br /> volume = {23},<br /> year = {2008},<br /> pages = {519-39}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @Article{PSM,<br /> title = {Population Stochastic Modelling (PSM): Model definition, description and examples},<br /> author = {Mortensen, S. 
and Klim, S.}, <br /> year = {2008},<br /> url = {http://www2.imm.dtu.dk/projects/psm/},<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @Article{Mortensen2007,<br /> title = {A Matlab framework for estimation of NLME models using stochastic differential equations - Applications for estimation of insulin secretion rates},<br /> author = {Mortensen, S. B. and Klim, S. and Dammann, B. and Kristensen, N. R. and Madsen, H. and Overgaard, R. V.},<br /> journal = {Journal of Pharmacokinetics and Pharmacodynamics},<br /> volume = {34},<br /> year = {2007},<br /> pages = {623-642}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @Article{Overgaard2005,<br /> title = {Non-Linear Mixed-Effects Models with Stochastic Differential Equations: Implementation of an Estimation Algorithm},<br /> author = {Overgaard, R. V. and Jonsson, N. and Torn&amp;oslash;e, C. W. and Madsen, H.},<br /> journal = {Journal of Pharmacokinetics and Pharmacodynamics},<br /> volume = {32},<br /> year = {2005},<br /> pages = {85-107}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @Article{Picchini2010,<br /> title = {Stochastic Differential Mixed-Effects Models},<br /> author = {Picchini, U. and De Gaetano, A. and Ditlevsen, S.},<br /> journal = {Scandinavian Journal of Statistics},<br /> volume = {37},<br /> year = {2010},<br /> pages = {67-90}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @Article{Picchini2011,<br /> title = {Practical Estimation of High Dimensional Stochastic Differential Mixed-Effects Models},<br /> author = {Picchini, U. and Ditlevsen, S.},<br /> journal = {Computational Statistics and Data Analysis},<br /> volume = {55},<br /> number = {3},<br /> year = {2011},<br /> pages = {1426-1444}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @Article{Tornoe2005,<br /> title = {Stochastic Differential Equations in NONMEM: Implementation, Application, and Comparison with Ordinary Differential Equations},<br /> author = {Torn&amp;oslash;e, C. 
W. and Overgaard, R. V. and Agers&amp;oslash;, H. and Nielsen, H. A. and Madsen, H. and Jonsson, E. N.},<br /> journal = {Pharmaceutical Research},<br /> volume = {22},<br /> year = {2005},<br /> pages = {1247-1258}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> <br /> {{Back<br /> |link=Hidden Markov models }}</div> Admin http://wiki.webpopix.org/index.php/Hidden_Markov_models Hidden Markov models 2013-06-07T13:58:54Z <p>Admin : /* Distributions of observations */</p> <hr /> <div>&lt;!-- Menu for the Extensions chapter --&gt;<br /> &lt;sidebarmenu&gt;<br /> +[[Extensions]]<br /> *[[Extensions| Introduction ]] | [[ Mixture models ]] | [[Hidden Markov models]] | [[Stochastic differential equations based models]] <br /> &lt;/sidebarmenu&gt;<br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> ==Introduction==<br /> <br /> <br /> Markov chains are a useful tool for analyzing categorical longitudinal data. However, sometimes the Markov process cannot be directly observed, though some output, dependent on the<br /> (hidden) state, is visible. More precisely, we assume that the distribution of this observable output depends on the underlying hidden state. Such models are called hidden Markov models (HMMs).<br /> HMMs can be applied in many contexts and have turned out to be particularly pertinent in several biological contexts. 
For example, they are useful when characterizing diseases for which the existence of several discrete stages of illness is a realistic assumption, e.g., epilepsy and migraines.<br /> <br /> Here, we will consider a parametric framework with [http://en.wikipedia.org/wiki/Markov_chain Markov chains] in a discrete and finite state space $\mathbf{K} = \{1,\ldots,K\}$.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Mixed hidden Markov models==<br /> <br /> <br /> HMMs have been developed to describe how a given system moves from one state to another over time, in situations where the successive visited states are unknown and a set of observations is the only available information to describe the dynamics of the system. HMMs can be seen as a variant of mixture models that allow for possible memory in the sequence of hidden states. An HMM is thus defined as a pair of processes $(z_j,y_j, j=1,2,\ldots)$, where the latent sequence $(z_j)$ is a Markov chain and where the distribution of the observation $y_j$ at time $t_j$ depends on the state $z_j$.<br /> <br /> <br /> {{ImageWithCaption|image=hmm0.png|caption=Dynamics of a hidden Markov model}}<br /> <br /> <br /> In a population approach, HMMs from several individuals can be described simultaneously by considering ''mixed'' HMMs.<br /> Let $y_i=\left(y_{i,1},\ldots,y_{i,n_i}\right)$ and $z_i= \left(z_{i,1}, \ldots,z_{i,n_i}\right)$ denote respectively the sequences of observations and hidden states for individual $i$.<br /> <br /> We suppose that the joint distribution of $(z_i,y_i)$ is a parametric distribution that depends on a vector of parameters $\psi_i$ and can be decomposed as<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:hmm1&quot;&gt;&lt;math&gt;<br /> \pcyzipsii(z_i,y_i {{!}} \psi_i) = \pczipsii(z_i {{!}}\psi_i) \, \pcyizpsii(y_i {{!}} z_i,\psi_i) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> For each individual $i$, $z_i$ is a Markov chain whose probability 
distribution is defined by<br /> <br /> <br /> &lt;ul&gt;<br /> * the distribution $\pi_{i,1} = (\pi_{i,1}^{k},\ k=1,2,\ldots,K)$ of the first state $z_{i,1}$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \pi_{i,1}^{k} = \prob{z_{i,1} = k {{!}} \psi_i} . &lt;/math&gt; }}<br /> <br /> <br /> * the sequence of ''transition matrices'' $(Q_{i,j} \ ; \, j=2,3,\ldots)$, where for each $j$, $Q_{i,j} = (q_{i,j}^{\ell,k} \ ; \, 1\leq \ell,k \leq K)$ is a matrix of size $K \times K$ such that $q_{i,j}^{\ell,k} = \prob{z_{i,j} = k | z_{i,j-1}=\ell , \psi_i}$.<br /> &lt;/ul&gt;<br /> <br /> <br /> {{ImageWithCaption|image=markov_1.png|caption=Transitions of a Markov chain with 3 states}}<br /> <br /> <br /> The conditional distribution $\qcyizpsii$ depends on the model for the observations: in each state, the observation $y_{ij}$ has a specific distribution. Let us look at some examples.<br /> <br /> <br /> &lt;br&gt;<br /> === Examples ===<br /> <br /> <br /> 1. In a continuous data model, one possibility is that the residual error model switches randomly between $K$ possible error models, according to the hidden Markov chain.<br /> <br /> <br /> {{Example<br /> |title=Example 1<br /> |text=In this example, we consider a 2-state Markov chain. A constant error model is assumed in each state:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> y_{ij} &amp;=&amp; \sin(\alpha \, t_{ij}) + a_{i,1} \teps_{ij} \quad \text{if } z_{ij}=1 \\<br /> y_{ij} &amp;=&amp; \sin(\alpha \, t_{ij}) + a_{i,2} \teps_{ij} \quad \text{if } z_{ij}=2.<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The figure below displays simulated data from this model for 4 individuals. Observations drawn from state 1 (resp. state 2) are displayed in magenta (resp. black).
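Data of this kind can be generated with a short sketch. The transition matrix and the uniform initial distribution below are illustrative assumptions; the model above only specifies the state-dependent emission distributions:

```python
import math
import random

def simulate_switching_noise(alpha, a1, a2, Q, times, rng):
    """Simulate y_j = sin(alpha t_j) + a_{z_j} eps_j, eps_j ~ N(0,1), where
    (z_j) is a 2-state Markov chain with transition matrix Q (illustrative)."""
    z = 1 if rng.random() < 0.5 else 2            # uniform initial state
    states, ys = [], []
    for t in times:
        a = a1 if z == 1 else a2                  # state-dependent noise level
        ys.append(math.sin(alpha * t) + a * rng.gauss(0.0, 1.0))
        states.append(z)
        z = 1 if rng.random() < Q[z - 1][0] else 2  # draw the next state
    return states, ys

Q = [[0.95, 0.05], [0.10, 0.90]]                  # illustrative transitions
times = [0.1 * j for j in range(100)]
rng = random.Random(2)
states, ys = simulate_switching_noise(1.0, 0.1, 1.0, Q, times, rng)
assert set(states) <= {1, 2} and len(ys) == len(times)
```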
Of course, the states are unknown in the case of hidden Markov models, i.e., only the values are observed in practice, not the colors.<br /> <br /> <br /> ::[[File:hmm1bis.png|link=]]<br /> <br /> }}<br /> <br /> <br /> <br /> 2. In a Poisson model for count data, the Poisson parameter might randomly switch between $K$ intensities. Such models have been used for describing the evolution of seizures in epileptic patients:<br /> <br /> <br /> {{Example<br /> |title=Example 2<br /> |text= Instead of assuming a single Poisson distribution for the observed numbers of seizures, this model assumes that patients go through alternating periods of low and high epileptic susceptibility. Therefore we consider what is called a 2-state Poisson mixed-HMM:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> y_{ij} &amp;\sim&amp; {\rm Poisson}(\lambda_{i,1}) \quad \text{if } z_{ij}=1 \\<br /> y_{ij} &amp;\sim&amp; {\rm Poisson}(\lambda_{i,2}) \quad \text{if } z_{ij}=2.<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> :: [[File:hmm2bis.png|link=]]<br /> <br /> }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Distributions of observations==<br /> <br /> <br /> Assuming that the $N$ individuals are independent, the joint pdf is given by:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:sdepdf&quot;&gt;&lt;math&gt;<br /> \pcypsi(y_1,\ldots,y_N {{!}} \psi_1,\ldots,\psi_N ) = \prod_{i=1}^{N}\pcyipsii(y_i {{!}} \psi_i).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> Then, computing the conditional distribution of the observations $\qcyipsii$ for any individual $i$ requires summation of the joint conditional distribution $\qcyzipsii$ over the set $\mathbf{S} = \mathbf{K}^{n_i}$ of all possible state sequences:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcyipsii(y_i {{!}} \psi_i) &amp;=&amp; \sum_{z_i \in \mathbf{S} } \pcyzipsii(z_i, y_i {{!}} \psi_i) \\<br /> &amp;=&amp; \sum_{z_i \in \mathbf{S} } \pczipsii(z_i {{!}} \psi_i) \,
\pcyizpsii(y_i {{!}} z_i,\psi_i) \\<br /> &amp;=&amp; \sum_{z_i \in \mathbf{S} } \left\{ \pi_{i,1}^{z_{i,1} } \pcyiONEzpsii(y_{i,1} {{!}} z_{i,1},\psi_i)\prod_{j=2}^{n_i} \left( q_{i,j}^{z_{i,j-1},z_{i,j} } \, \pcyijzpsii(y_{i,j} {{!}} z_{i,j},\psi_i) \right) \right\} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> Though this sum contains $K^{n_i}$ terms, forward recursion of the [http://en.wikipedia.org/wiki/Baum-Welch_algorithm Baum-Welch algorithm] provides a quick way to compute it numerically, in $O(n_i K^2)$ operations.<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Bibliography==<br /> <br /> <br /> &lt;bibtex&gt;<br /> @article{Albert1991,<br /> title = &quot;A two-state Markov mixture model for a time series of epileptic seizure counts&quot;,<br /> author = &quot;Albert, P. S.&quot;,<br /> journal = &quot;Biometrics&quot;,<br /> volume = &quot;47&quot;,<br /> year = &quot;1991&quot;,<br /> pages = &quot;1371-1381&quot;}<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{Altman2007,<br /> title = &quot;Mixed hidden Markov models: an extension of the hidden Markov model to the longitudinal data setting&quot;,<br /> author = &quot;Altman, R. M.&quot;,<br /> journal = &quot;Journal of the American Statistical Association&quot;,<br /> volume = &quot;102&quot;,<br /> year = &quot;2007&quot;,<br /> pages = &quot;201-210&quot;}<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{Anisimov2007,<br /> title = &quot;Analysis of responses in migraine modelling using hidden Markov models&quot;,<br /> author = &quot;Anisimov, W. and Maas, H. J. and Danhof, M. and Della Pasqua, O.&quot;,<br /> journal = &quot;Statistics in Medicine&quot;,<br /> volume = &quot;26&quot;,<br /> year = &quot;2007&quot;,<br /> pages = &quot;4163-4178&quot;}<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{Cappe2005,<br /> author = &quot;Capp&amp;eacute;, O. and Moulines, E.
and Ryd&amp;eacute;n, T.&quot;,<br /> title = &quot;Inference in hidden Markov models&quot;,<br /> year = &quot;2005&quot;,<br /> publisher= &quot;Springer Series in Statistics&quot;}<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{ChaubertPereira2011,<br /> title = &quot;Markov and Semi-Markov Switching Linear Mixed Models Used to Identify<br /> Forest Tree Growth Components&quot;,<br /> author = &quot;Chaubert-Pereira, F. and Gu&amp;eacute;don, Y. and Lavergne, C. and Trottier, C.&quot;,<br /> journal = &quot;Biometrics&quot;,<br /> volume = &quot;66&quot;,<br /> year = &quot;2011&quot;,<br /> pages = &quot;753-762&quot;}<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{delattre2012maximum,<br /> title={Maximum likelihood estimation in discrete mixed hidden Markov models using the SAEM algorithm},<br /> author={Delattre, M. and Lavielle, M.},<br /> journal={Computational Statistics &amp; Data Analysis},<br /> year={2012},<br /> publisher={Elsevier}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{delattre2012analysis,<br /> title={Analysis of exposure-response of CI-945 in patients with epilepsy: application of novel mixed hidden Markov modeling methodology},<br /> author={Delattre, M. and Savic, R. M. and Miller, R. and Karlsson, M. O. and Lavielle, M.},<br /> journal={Journal of pharmacokinetics and pharmacodynamics},<br /> pages={1-9},<br /> year={2012},<br /> publisher={Springer}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{Maruotti2009,<br /> title = &quot;A semiparametric approach to hidden Markov models under longitudinal<br /> observations&quot;,<br /> author = &quot;Maruotti, A.
and Ryd&amp;eacute;n, T.&quot;,<br /> journal = &quot;Statistics and Computing&quot;,<br /> volume = &quot;19&quot;,<br /> year = &quot;2009&quot;,<br /> pages = &quot;381-393&quot;}<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{Rabiner1989,<br /> title = &quot;A tutorial on Hidden Markov Models and selected applications in speech recognition&quot;,<br /> author = &quot;Rabiner, L. R.&quot;,<br /> journal = &quot;Proceedings of the IEEE&quot;,<br /> volume = &quot;77&quot;,<br /> year = &quot;1989&quot;,<br /> pages = &quot;257-286&quot;}<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{Rijmen2008,<br /> title = &quot;Qualitative longitudinal analysis of symptoms in patients with primary<br /> and metastatic brain tumours&quot;,<br /> author = &quot;Rijmen, F. and Ip, E. H. and Rapp, S. and Shaw, E. G.&quot;,<br /> journal = &quot;Journal of the Royal Statistical Society - Series A.&quot;,<br /> volume = &quot;171, Part 3&quot;,<br /> year = &quot;2008&quot;,<br /> pages = &quot;739-753&quot;}<br /> &lt;/bibtex&gt;<br /> <br /> <br /> {{Back&amp;Next<br /> |linkBack= Mixture models<br /> |linkNext= Stochastic differential equations based models }}</div> Admin http://wiki.webpopix.org/index.php/Hidden_Markov_models Hidden Markov models 2013-06-07T13:58:05Z <p>Admin : /* Introduction */</p> <hr /> <div>&lt;!-- Menu for the Extensions chapter --&gt;<br /> &lt;sidebarmenu&gt;<br /> +[[Extensions]]<br /> *[[Extensions| Introduction ]] | [[ Mixture models ]] | [[Hidden Markov models]] | [[Stochastic differential equations based models]] <br /> &lt;/sidebarmenu&gt;<br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> ==Introduction==<br /> <br /> <br /> Markov chains are a useful tool for analyzing categorical longitudinal data. However, sometimes the Markov process cannot be directly observed, though some output, dependent on the
More precisely, we assume that the distribution of this observable output depends on the underlying hidden state. Such models are called hidden Markov models (HMMs).<br /> HMMs can be applied in many contexts and have turned out to be particularly pertinent in several biological settings. For example, they are useful when characterizing diseases for which the existence of several discrete stages of illness is a realistic assumption, e.g., epilepsy and migraines.<br /> <br /> Here, we will consider a parametric framework with [http://en.wikipedia.org/wiki/Markov_chain Markov chains] in a discrete and finite state space $\mathbf{K} = \{1,\ldots,K\}$.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Mixed hidden Markov models==<br /> <br /> <br /> HMMs have been developed to describe how a given system moves from one state to another over time, in situations where the successively visited states are unknown and a set of observations is the only available information to describe the dynamics of the system. HMMs can be seen as a variant of mixture models that allow for possible memory in the sequence of hidden states.
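This "mixture model with memory" description translates directly into a simulation: first draw the hidden chain, then draw each observation from the distribution attached to the current state. Here is a minimal sketch in Python; the two-state chain, its transition probabilities and the Poisson intensities are illustrative assumptions (Poisson emissions are chosen to echo Example 2 below), not values taken from this page:

```python
import math
import random

random.seed(0)

# Hypothetical 2-state chain (states coded 0 and 1 here).
pi1 = [0.5, 0.5]            # distribution of the first state
Q = [[0.9, 0.1],            # transition probabilities from state 0
     [0.2, 0.8]]            # transition probabilities from state 1
lam = [2.0, 10.0]           # state-dependent Poisson intensities

def draw(probs):
    """Draw an index k with probability probs[k]."""
    u, acc = random.random(), 0.0
    for k, p in enumerate(probs):
        acc += p
        if u < acc:
            return k
    return len(probs) - 1

def draw_poisson(intensity):
    """Poisson draw by inversion of the cdf (fine for small intensities)."""
    u, k = random.random(), 0
    p = math.exp(-intensity)
    cdf = p
    while u > cdf:
        k += 1
        p *= intensity / k
        cdf += p
    return k

def simulate_hmm(n):
    """Return the hidden states z and the observations y (length n)."""
    z = [draw(pi1)]
    for _ in range(1, n):
        z.append(draw(Q[z[-1]]))       # memory: next state depends on current state
    y = [draw_poisson(lam[k]) for k in z]
    return z, y

z, y = simulate_hmm(200)
```

In practice only $y$ would be available; the hidden sequence $z$ is precisely what has to be marginalized out when computing the likelihood.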
An HMM is thus defined as a pair of processes $(z_j,y_j, j=1,2,\ldots)$, where the latent sequence $(z_j)$ is a Markov chain and where the distribution of the observation $y_j$ at time $t_j$ depends on the state $z_j$.<br /> <br /> <br /> {{ImageWithCaption|image=hmm0.png|caption=Dynamics of a hidden Markov model}}<br /> <br /> <br /> In a population approach, HMMs from several individuals can be described simultaneously by considering ''mixed'' HMMs.<br /> Let $y_i=\left(y_{i,1},\ldots,y_{i,n_i}\right)$ and $z_i= \left(z_{i,1}, \ldots,z_{i,n_i}\right)$ denote respectively the sequences of observations and hidden states for individual $i$.<br /> <br /> We suppose that the joint distribution of $(z_i,y_i)$ is a parametric distribution that depends on a vector of parameters $\psi_i$ and can be decomposed as<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:hmm1&quot;&gt;&lt;math&gt;<br /> \pcyzipsii(z_i,y_i {{!}} \psi_i) = \pczipsii(z_i {{!}}\psi_i) \, \pcyizpsii(y_i {{!}} z_i,\psi_i) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> For each individual $i$, $z_i$ is a Markov chain whose probability distribution is defined by<br /> <br /> <br /> &lt;ul&gt;<br /> * the distribution $\pi_{i,1} = (\pi_{i,1}^{k},\ k=1,2,\ldots,K)$ of the first state $z_{i,1}$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \pi_{i,1}^{k} = \prob{z_{i,1} = k {{!}} \psi_i} . 
&lt;/math&gt; }}<br /> <br /> <br /> * the sequence of ''transition matrices'' $(Q_{i,j} \ ; \, j=2,3,\ldots)$, where for each $j$, $Q_{i,j} = (q_{i,j}^{\ell,k} \ ; \, 1\leq \ell,k \leq K)$ is a matrix of size $K \times K$ such that $q_{i,j}^{\ell,k} = \prob{z_{i,j} = k | z_{i,j-1}=\ell , \psi_i}$.<br /> &lt;/ul&gt;<br /> <br /> <br /> {{ImageWithCaption|image=markov_1.png|caption=Transitions of a Markov chain with 3 states}}<br /> <br /> <br /> The conditional distribution $\qcyizpsii$ depends on the model for the observations: for each state, observation $y_{ij}$ has a certain distribution. Let us see some examples:<br /> <br /> <br /> &lt;br&gt;<br /> === Examples ===<br /> <br /> <br /> 1. In a continuous data model, one possibility is that the residual error model is a hidden Markov model that can randomly switch between $K$ possible residual error models.<br /> <br /> <br /> {{Example<br /> |title=Example 1<br /> |text=In this example, we consider a 2-state Markov chain. A constant error model is assumed in each state:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> y_{ij} &amp;=&amp; \sin(\alpha \, t_{ij}) + a_{i,1} \teps_{ij} \quad \text{if } z_{ij}=1 \\<br /> y_{ij} &amp;=&amp; \sin(\alpha \, t_{ij}) + a_{i,2} \teps_{ij} \quad \text{if } z_{ij}=2.<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The figure below displays simulated data from this model for 4 individuals. Observations drawn from state 1 (resp. state 2) are displayed in magenta (resp. black). Of course, the states are unknown in the case of hidden Markov models, i.e., only the values are observed in practice, not the colors.<br /> <br /> <br /> ::[[File:hmm1bis.png|link=]]<br /> <br /> }}<br /> <br /> <br /> <br /> 2. In a Poisson model for count data, the Poisson parameter might randomly switch between $K$ intensities. 
Such models have been used for describing the evolution of seizures in epileptic patients:<br /> <br /> <br /> {{Example<br /> |title=Example 2<br /> |text= Instead of assuming a single Poisson distribution for the observed numbers of seizures, this model assumes that patients go through alternating periods of low and high epileptic susceptibility. Therefore we consider what is called a 2-state Poisson mixed-HMM:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> y_{ij} &amp;\sim&amp; {\rm Poisson}(\lambda_{i,1}) \quad \text{if } z_{ij}=1 \\<br /> y_{ij} &amp;\sim&amp; {\rm Poisson}(\lambda_{i,2}) \quad \text{if } z_{ij}=2.<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> :: [[File:hmm2bis.png|link=]]<br /> <br /> }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Distributions of observations==<br /> <br /> <br /> Assuming that the $N$ individuals are independent, the joint pdf is given by:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:sdepdf&quot;&gt;&lt;math&gt;<br /> \pcypsi(y_1,\ldots,y_N {{!}} \psi_1,\ldots,\psi_N ) = \prod_{i=1}^{N}\pcyipsii(y_i {{!}} \psi_i).<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> Then, computing the conditional distribution of the observations $\qcyipsii$ for any individual $i$ requires summation of the joint conditional distribution $\qcyzipsii$ over the set $\mathbf{S} = \mathbf{K}^{n_i}$ of all possible state sequences:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcyipsii(y_i {{!}} \psi_i) &amp;=&amp; \sum_{z_i \in \mathbf{S} } \pcyzipsii(z_i, y_i {{!}} \psi_i) \\<br /> &amp;=&amp; \sum_{z_i \in \mathbf{S} } \pczipsii(z_i {{!}} \psi_i) \, \pcyizpsii(y_i {{!}} z_i,\psi_i) \\<br /> &amp;=&amp; \sum_{z_i \in \mathbf{S} } \left\{ \pi_{i,1}^{z_{i,1} } \pcyiONEzpsii(y_{i,1} {{!}} z_{i,1},\psi_i)\prod_{j=2}^{n_i} \left( q_{i,j}^{z_{i,j-1},z_{i,j} } \, \pcyijzpsii(y_{i,j} {{!}} z_{i,j},\psi_i) \right) \right\} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> Though this looks
complicated, it turns out that forward recursion of the [http://en.wikipedia.org/wiki/Baum-Welch_algorithm Baum-Welch algorithm] provides a quick way to numerically compute it.<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Bibliography==<br /> <br /> <br /> &lt;bibtex&gt;<br /> @article{Albert1991,<br /> title = &quot;A two-state Markov mixture model for a time series of epileptic seizure counts&quot;,<br /> author = &quot;Albert, P. S.&quot;,<br /> journal = &quot;Biometrics&quot;,<br /> volume = &quot;47&quot;,<br /> year = &quot;1991&quot;,<br /> pages = &quot;1371-1381&quot;}<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{Altman2007,<br /> title = &quot;Mixed hidden Markov models: an extension of the hidden Markov model to the longitudinal data setting&quot;,<br /> author = &quot;Altman, R. M.&quot;,<br /> journal = &quot;Journal of the American Statistical Association&quot;,<br /> volume = &quot;102&quot;,<br /> year = &quot;2007&quot;,<br /> pages = &quot;201-210&quot;}<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{Anisimov2007,<br /> title = &quot;Analysis of responses in migraine modelling using hidden Markov models&quot;,<br /> author = &quot;Anisimov, W. and Maas, H. J. and Danhof, M. and Della Pasqua, O.&quot;,<br /> journal = &quot;Statistics in Medicine&quot;,<br /> volume = &quot;26&quot;,<br /> year = &quot;2007&quot;,<br /> pages = &quot;4163-4178&quot;}<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{Cappe2005,<br /> author = &quot;Capp&amp;eacute;, O. and Moulines, E. and Ryd&amp;eacute;n, T.&quot;,<br /> title = &quot;Inference in hidden Markov models&quot;,<br /> year = &quot;2005&quot;,<br /> publisher= &quot;Springer Series in Statistics&quot;}<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{ChaubertPereira2011,<br /> title = &quot;Markov and Semi-Markov Switching Linear Mixed Models Used to Identify<br /> Forest Tree Growth Components&quot;,<br /> author = &quot;Chaubert-Pereira, F. and Gu&amp;eacute;don, Y. and Lavergne, C.
and Trottier, C.&quot;,<br /> journal = &quot;Biometrics&quot;,<br /> volume = &quot;66&quot;,<br /> year = &quot;2011&quot;,<br /> pages = &quot;753-762&quot;}<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{delattre2012maximum,<br /> title={Maximum likelihood estimation in discrete mixed hidden Markov models using the SAEM algorithm},<br /> author={Delattre, M. and Lavielle, M.},<br /> journal={Computational Statistics &amp; Data Analysis},<br /> year={2012},<br /> publisher={Elsevier}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{delattre2012analysis,<br /> title={Analysis of exposure-response of CI-945 in patients with epilepsy: application of novel mixed hidden Markov modeling methodology},<br /> author={Delattre, M. and Savic, R. M. and Miller, R. and Karlsson, M. O. and Lavielle, M.},<br /> journal={Journal of pharmacokinetics and pharmacodynamics},<br /> pages={1-9},<br /> year={2012},<br /> publisher={Springer}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{Maruotti2009,<br /> title = &quot;A semiparametric approach to hidden Markov models under longitudinal<br /> observations&quot;,<br /> author = &quot;Maruotti, A. and Ryd&amp;eacute;n, T.&quot;,<br /> journal = &quot;Statistics and Computing&quot;,<br /> volume = &quot;19&quot;,<br /> year = &quot;2009&quot;,<br /> pages = &quot;381-393&quot;}<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{Rabiner1989,<br /> title = &quot;A tutorial on Hidden Markov Models and selected applications in speech recognition&quot;,<br /> author = &quot;Rabiner, L. R.&quot;,<br /> journal = &quot;Proceedings of the IEEE&quot;,<br /> volume = &quot;77&quot;,<br /> year = &quot;1989&quot;,<br /> pages = &quot;257-286&quot;}<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{Rijmen2008,<br /> title = &quot;Qualitative longitudinal analysis of symptoms in patients with primary<br /> and metastatic brain tumours&quot;,<br /> author = &quot;Rijmen, F. and Ip, E. H.
and Rapp, S. and Shaw, E. G.&quot;,<br /> journal = &quot;Journal of the Royal Statistical Society - Series A.&quot;,<br /> volume = &quot;171, Part 3&quot;,<br /> year = &quot;2008&quot;,<br /> pages = &quot;739-753&quot;}<br /> &lt;/bibtex&gt;<br /> <br /> <br /> {{Back&amp;Next<br /> |linkBack= Mixture models<br /> |linkNext= Stochastic differential equations based models }}</div> Admin http://wiki.webpopix.org/index.php/Joint_models Joint models 2013-06-07T13:57:02Z <p>Admin : </p> <hr /> <div>&lt;!-- Menu for the Observations chapter --&gt;<br /> &lt;sidebarmenu&gt;<br /> +[[Modeling the observations]]<br /> *[[Modeling the observations| Introduction ]] | [[ Continuous data models ]] | [[Models for count data]] | [[Model for categorical data]] | [[Models for time-to-event data ]] | [[Joint models]] <br /> &lt;/sidebarmenu&gt;<br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> ==Introduction==<br /> <br /> An important goal of longitudinal studies is to characterize relationships between different types of response data.<br /> <br /> For instance, in a PKPD population study, we may be interested in the relationship between certain pharmacokinetics (absorption, distribution, metabolism and excretion) and pharmacodynamics (biochemical and physiological effects) of a drug. To do this, we need to measure some of both types of response data for several individuals from the same population, then try and characterize their relationship.<br /> <br /> Alternatively, many clinical trials and reliability studies generate both longitudinal and survival (time-to-event) data. For example, in HIV clinical trials the viral load and the concentration of CD4 cells are widely used as biomarkers for progression to AIDS when studying the efficacy of drugs to treat HIV-infected patients. 
We might then be interested in the relationship between these variables and events such as seroconversion or death.<br /> <br /> Therefore, in general a ''joint model'' is one that allows us to simultaneously describe the distribution of different types of observations made on the same individual. As usual, we work in the population context.<br /> <br /> Suppose that we have $L$ different types of observations for individual $i$: $y_i^{(1)}=(y_{ij}^{(1)},1\leq j \leq n_{i,1})$, $y_i^{(2)}=(y_{ij}^{(2)},1\leq j \leq n_{i,2})$, ..., $y_i^{(L)}=(y_{ij}^{(L)},1\leq j \leq n_{i,L})$, where $n_{i,\ell}$ is the number of observations of type $\ell$ made on individual $i$.<br /> Note that for a given individual, the number of observations $n_{i,\ell}$ and the observation times $(t_{ij}^{(\ell)})$ may differ from one type $\ell$ to another.<br /> <br /> Let $y_i$ denote the set of observations for individual $i$: $y_i = (y_i^{(1)},y_i^{(2)},\ldots,y_i^{(L)})$.<br /> For each individual, the joint probability distribution of the observations $y_i$ and the individual parameters $\psi_i$ can be decomposed as follows:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray} <br /> \pyipsii(y_i,\psi_i;\theta) &amp;=&amp; \pcyipsii(y_i {{!}} \psi_i) \, \ppsii(\psi_i;\theta) \\<br /> &amp; =&amp; \pcyipsii(y_i^{(1)},y_i^{(2)},\ldots,y_i^{(L)} {{!}} \psi_i) \, \ppsii(\psi_i;\theta) .
<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> We can then distinguish between different types of dependency between observations: independence, conditional independence and conditional dependence.<br /> <br /> <br /> &lt;br&gt;<br /> == Independent observations ==<br /> <br /> Suppose first that the vector of individual parameters $\psi_i$ can be decomposed into $L$ independent sub-vectors $\psi_i^{(1)}$, $\psi_i^{(2)}$, ..., $\psi_i^{(L)}$ such that $y_i^{(\ell)}$ depends only on $\psi_i^{(\ell)}$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pyipsii(y_i,\psi_i;\theta) &amp;=&amp; \pyipsii\left(y_i^{(1)},y_i^{(2)},\ldots,y_i^{(L)},\psi_i^{(1)}, \psi_i^{(2)}, \ldots , \psi_i^{(L)};\theta\right) \\<br /> &amp;=&amp; \prod_{\ell=1}^{L} \pmacro\left(y_i^{(\ell)},\psi_i^{(\ell)};\theta\right) \\<br /> &amp;=&amp; \prod_{\ell=1}^{L} \pmacro\left(y_i^{(\ell)} {{!}} \psi_i^{(\ell)}\right) \pmacro\left(\psi_i^{(\ell)};\theta\right) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> Here, joint modeling does not bring anything new to the picture because all information on $\psi_i^{(\ell)}$ is contained in the related set of observations $y_i^{(\ell)}$. 
We can therefore model each set of observations separately.<br /> <br /> <br /> {{Example1<br /> |title1=Example<br /> |title2=A PK and PD model for warfarin data<br /> |text=<br /> Here, 32 healthy volunteers received a single 1.5 mg/kg oral dose of warfarin, an anticoagulant normally used in the prevention of thrombosis.<br /> The warfarin plasma concentration $C$ and the prothrombin complex activity (PCA) $E$ were then measured at various times for these patients.<br /> The figure represents the PK data (on the left) and the PD data (on the right).<br /> <br /> <br /> {{ImageWithCaption|image=warf0.png|caption= warfarin PK and PD data }}<br /> <br /> <br /> First, we consider two entirely independent parametric models for the PK and PD data: a simple one-compartment model $f_1$ for the PK and a rebound model $f_2$ for the PD. For any $t&gt;0$,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> C(t) &amp;=&amp; \displaystyle{ \frac{D\, k_a}{V(k_a-k_e)} } \left( e^{-k_e \, t} - e^{-k_a \, t} \right) \\<br /> E(t) &amp;=&amp; 100\left(\displaystyle{ \frac{\beta}{1+\beta} } e^{-\alpha \, t} + \displaystyle{ \frac{1}{1+\beta \, e^{-\gamma \, t} } }\right) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> We can then model the observations assuming, for example, a combined error model for the PK data and an additive one for the PD data:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:warf1&quot;&gt;&lt;math&gt;\begin{array}{c}<br /> y_{ij}^{(1)} &amp;=&amp; C(t_{ij}^{(1)} ; \psi_i^{(1)}) + (a_1 + b_1\,C(t_{ij}^{(1)};\psi_i^{(1)}))\teps_{ij}^{(1)} \end{array}&lt;/math&gt;&lt;/div&gt; <br /> |reference=(1) }}<br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:warf2&quot;&gt;&lt;math&gt;\begin{array}{c}<br /> y_{ij}^{(2)} &amp;=&amp; E(t_{ij}^{(2)} ; \psi_i^{(2)}) + a_2 \, \teps_{ij}^{(2)} , <br /> \end{array}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> where $\psi_i^{(1)}=(ka_i,V_i,
ke_i)$ and $\psi_i^{(2)}=(\alpha_i,\beta_i,\gamma_i)$ are independent individual parameter vectors that we suppose log-normally distributed.<br /> <br /> Now that the two models have been defined, we can jointly model the two data types. As they are independent, we can simply use the PK model to fit the concentration data<br /> and the PD model to fit the PCA data. The figure shows the observed data and the individual predictions given by the two models for the &lt;balloon title=&quot;Monolix was used to fit the models. Note that the PD model is for illustrative purposes only; even though it fits the data well, it has no biological interpretation&quot; style=&quot;color:#177245&quot;&gt;4 individuals&lt;/balloon&gt;. <br /> <br /> <br /> &lt;div style=&quot;padding-left:4em&quot;&gt;[[File:warfpkfit1.png|link=]]&lt;/div&gt;<br /> <br /> {{ImageWithCaption|image=warfpdfit1.png|caption=Jointly fitted PK and PD warfarin data for 4 individuals using two independent models }}<br /> }}<br /> <br /> <br /> In the same way that we jointly modeled these two types of independent continuous data, we can construct joint models using different types of data at the same time, i.e., various combinations of continuous, categorical, count and survival data, etc., if they are independent.<br /> <br /> <br /> {{Example1<br /> |title1=Example<br /> |title2=Longitudinal and time-to-event data model<br /> |text=<br /> Consider the following joint model for survival and longitudinal data:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> y_{ij} &amp;=&amp; f(t_{ij} ; \psi_i^{(1)}) + g(t_{ij} ;\psi_i^{(1)})\teps_{ij} \\<br /> \prob{T_i&gt;t} &amp;=&amp; S(t ; \psi_i^{(2)}) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The continuous outcome $y_{ij}$ and the time to event $T_i$ are independent if $\psi_i^{(1)}$ and $\psi_i^{(2)}$ are independent.<br /> <br /> <br /> {{Remarks <br /> |title=Remark<br /> |text= If the event is ''drop-out'', it
is sometimes called MCAR (missing completely at random). This means that the continuous outcome does not provide any information about drop-out. }}<br /> }}<br /> <br /> &lt;br&gt;<br /> <br /> == Conditionally independent observations ==<br /> <br /> In this case, the various observation types no longer depend only on disjoint (i.e., independent) individual parameters. We therefore write $\psi_i$ for the overall set of (partially or fully shared)<br /> individual parameters. The observations are nevertheless assumed to be independent conditionally on $\psi_i$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pyipsii(y_i,\psi_i;\theta) &amp;=&amp; \pyipsii(y_i^{(1)},y_i^{(2)},\ldots,y_i^{(L)},\psi_i;\theta) \\<br /> &amp;=&amp; \left( \prod_{\ell=1}^{L} \pmacro(y_i^{(\ell)} {{!}} \psi_i) \right) \pmacro(\psi_i;\theta) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> In such cases, each observation provides information on the individual parameter vector $\psi_i$.<br /> <br /> This is the most common case when we are simultaneously modeling different types of longitudinal data of the form:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> y_{ij}^{(1)} &amp;=&amp; f_1(t_{ij}^{(1)} ; \psi_i) + g_1(t_{ij}^{(1)};\psi_i)\teps_{ij}^{(1)} \\<br /> y_{ij}^{(2)} &amp;=&amp; f_2(t_{ij}^{(2)} ; \psi_i) + g_2(t_{ij}^{(2)};\psi_i)\teps_{ij}^{(2)} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> Here, the predictions $f_1$ and $f_2$ both depend on the same vector of individual parameters, which induces dependency between the observations $y_{i}^{(1)}$ and $y_{i}^{(2)}$.
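This shared-parameter mechanism is easy to simulate. In the following sketch (the functions, the log-normal parameter and all numerical values are hypothetical illustrations, not the warfarin model), both outcomes are generated from a common individual parameter with independently drawn residual errors, so any marginal correlation between them comes from $\psi_i$ alone:

```python
import math
import random

random.seed(1)

times = [0.5, 1.0, 2.0, 4.0, 8.0]

def simulate_individual():
    """Two outcomes driven by the same individual parameter psi_i;
    the residual errors are drawn independently, so the dependence
    between y1 and y2 is induced by psi_i only."""
    psi = math.exp(random.gauss(0.0, 0.3))   # shared log-normal parameter
    y1 = [psi * math.exp(-0.2 * t) + 0.05 * random.gauss(0.0, 1.0) for t in times]
    y2 = [100.0 * psi * t / (1.0 + t) + 2.0 * random.gauss(0.0, 1.0) for t in times]
    return psi, y1, y2

data = [simulate_individual() for _ in range(100)]

# Marginally, y1 and y2 are correlated across individuals (through psi_i),
# even though they are conditionally independent given psi_i.
n = len(data)
a = [d[1][0] for d in data]   # first outcome at the first time point
b = [d[2][0] for d in data]   # second outcome at the first time point
ma, mb = sum(a) / n, sum(b) / n
cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n
```

Both outcomes increase with $\psi_i$ here, so the empirical covariance `cov` is positive across simulated individuals.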
However, these observations are ''conditionally independent'' if the residual errors $\teps_{ij}^{(1)}$ and $\teps_{ij}^{(2)}$ are independent.<br /> <br /> <br /> {{Example1<br /> |title1=Example<br /> |title2=A joint PKPD model for warfarin data<br /> <br /> |text=<br /> Pertinent PKPD models aim to establish a link between a drug's concentration and its effect.<br /> An indirect response model assumes that a drug does not instantaneously affect the PD response. Instead, the drug affects a precursor which then influences the PD measure. Here, as warfarin levels increase, prothrombin synthesis is inhibited, which in turn has anti-coagulant effects. Such phenomena can be approximated with a very simple ODE-based mathematical model for the PD component (we use the same one compartment model for the PK component):<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> C(t) &amp;=&amp; \displaystyle{ \frac{D\, k_a}{V(k_a-k_e)} } \left( e^{-k_e \, t} - e^{-k_a \, t} \right) \\<br /> E(t) &amp;=&amp; \displaystyle{ \frac{k_{in} }{ k_{out} } }, \ \ \ \ t\leq 0 \\<br /> \displaystyle{ \frac{d}{dt} }E(t) &amp;=&amp; k_{in}\left( 1 - \displaystyle{ \frac{C(t)}{IC_{50} + C(t)} } \right) - k_{out}\,E(t), \ \ \ \ t &gt;0 .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> We could then use the same residual error models [[#eq:warf1|(1)]] and [[#eq:warf2|(2)]] given in the previous example.<br /> <br /> We can also suppose that the vectors $\psi_i^{(1)}=(ka_i,V_i, ke_i)$ and $\psi_i^{(2)}=(IC_{50,i},k_{in,i},k_{out,i})$ are independent, but the fact that the effect $E$ predicted by the model is a function<br /> of the concentration $C$ introduces dependence between the two observation types because both depend on the PK parameters $\psi_i^{(1)}$.<br /> <br /> If the residual errors $(\teps_{ij}^{(1)})$ and $(\teps_{ij}^{(2)})$ are independent, then the observations are conditionally independent, i.e., when the predicted concentration $C(t)$ is given, the 
observed concentrations $\by^{(1)}$ do not bring any further information on the distribution of the PD observations $\by^{(2)}$.<br /> <br /> This joint model can be used to model the same warfarin data as before (again, using $\monolix$).<br /> The figure shows the resulting individual predictions.<br /> <br /> <br /> &lt;div style=&quot;margin-left:4.2em&quot;&gt;[[File:warfpkfit2.png|link=]]&lt;/div&gt;<br /> <br /> {{ImageWithCaption|image=warfpdfit2.png|caption=Fitted PK and PD warfarin data for 4 individuals using a conditionally independent joint model}}<br /> }}<br /> <br /> <br /> We can extend this framework to different types of data, considering for example categorical observations $y_i^{(2)}$ for which the probabilities $\prob{y_{ij}^{(2)} = k}$ depend on $f_1(t_{ij}^{(2)};\psi_i)$ and consequently on $\psi_i$. We can also consider survival data for which the risk function depends on $f_1$.<br /> <br /> <br /> <br /> {{Example1<br /> |title1=Example<br /> |title2=Longitudinal and time-to-event data model<br /> <br /> |text=Consider a joint model for survival and longitudinal data, assuming now that the hazard function (or equivalently the survival function) depends on the continuous data prediction:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> y_{ij} &amp;=&amp; f(t_{ij} ; \psi_i) + g(t_{ij} ;\psi_i)\teps_{ij} \\<br /> \prob{T_i&gt;t} &amp;=&amp; S(t ; f(t ; \psi_i)) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> If for instance $(y_{ij})$ is the measured viral load of an HIV-infected patient, we can assume that the probability of events such as death, seroconversion or drop-out depends on the &quot;true&quot; viral load $f(t ; \psi_i)$.<br /> <br /> <br /> {{Remarks<br /> |title=Remark<br /> |text= If the event is ''drop-out'', the drop-out mechanism is sometimes called MAR (missing at random).
This means that the probability of drop-out depends on some of the individual parameters, but that the observation itself of the continuous outcome does not provide any additional information. In our example, this means that the probability that a patient leaves the study depends on their true state (i.e., their true but unknown viral load), and not on the measured viral load values. }}<br /> }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Conditionally dependent observations ==<br /> <br /> In this case, there is a dependency structure between types of observation that no longer allows us to decompose the joint model into a product of models with only one type of observation in each.<br /> <br /> This kind of dependency occurs when several types of longitudinal data are obtained at the same times, with correlated measurement errors. The joint conditional distribution $\qcyipsii$ of the observations is<br /> Gaussian if the residual errors are. The dependency structure between observations can then be characterized by a variance-covariance matrix for the errors.<br /> <br /> We can also consider a natural decomposition of this joint distribution into a product of conditional distributions:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pyipsii(y_i,\psi_i;\theta) &amp;=&amp; \pyipsii(y_i^{(1)},y_i^{(2)},\ldots,y_i^{(L)},\psi_i;\theta) \\<br /> &amp;=&amp; \pmacro(y_i^{(1)} {{!}} \psi_i;\theta) \pmacro(y_i^{(2)} {{!}} y_i^{(1)}, \psi_i;\theta)\ldots \pmacro(y_i^{(L)} {{!}} y_i^{(1)},\ldots,y_i^{(L-1)}, \psi_i;\theta) \pmacro(\psi_i;\theta) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> Here, the distribution of $y_i^{(2)}$ depends on the observation $y_i^{(1)}$, the distribution of $y_i^{(3)}$ depends on $y_i^{(1)}$ and $y_i^{(2)}$, etc.<br /> <br /> <br /> {{Example1<br /> |title1=Example<br /> |title2=A longitudinal data and drop-out model<br /> <br /> |text= Consider a joint model for longitudinal data and drop-out, 
assuming now that the hazard function (or equivalently the survival function) depends on the observed data itself:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> y_{ij} &amp;=&amp; f(t_{ij} ; \psi_i) + g(t_{ij} ;\psi_i)\teps_{ij} \\<br /> \prob{T_i&gt;t} &amp;=&amp; S(t ; (y_{ij}, t_{ij}&lt;t)) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> This drop-out mechanism is sometimes called MNAR (missing not at random).<br /> In this example, where $(y_{ij}, t_{ij}&lt;t)$ is the sequence of measured viral loads before time $t$, MNAR means that the probability that a patient leaves the study depends on their previously-measured viral concentrations. <br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> &lt;!--<br /> == $\mlxtran$ for joint models==<br /> <br /> TO DO<br /> --&gt;<br /> <br /> <br /> <br /> <br /> &lt;br&gt;<br /> == Bibliography ==<br /> <br /> <br /> &lt;bibtex&gt;<br /> @article{albert2004modeling,<br /> title={Modeling repeated count data subject to informative dropout},<br /> author={Albert, P. S. and Follmann, D. A.},<br /> journal={Biometrics},<br /> volume={56},<br /> number={3},<br /> pages={667-677},<br /> year={2000},<br /> publisher={Wiley Online Library}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{chi2006joint,<br /> title={Joint models for multivariate longitudinal and multivariate survival data},<br /> author={Chi, Y.-Y. and Ibrahim, J. G.},<br /> journal={Biometrics},<br /> volume={62},<br /> number={2},<br /> pages={432-445},<br /> year={2006},<br /> publisher={Wiley Online Library}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{de1994modelling,<br /> title={Modelling progression of CD4-lymphocyte count and its relationship to survival time},<br /> author={De Gruttola, V. and Tu, X.
M.},<br /> journal={Biometrics},<br /> pages={1003-1014},<br /> year={1994},<br /> publisher={JSTOR}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{henderson2000joint,<br /> title={Joint modelling of longitudinal measurements and event time data},<br /> author={Henderson, R. and Diggle, P. and Dobson, A.},<br /> journal={Biostatistics},<br /> volume={1},<br /> number={4},<br /> pages={465-480},<br /> year={2000},<br /> publisher={Biometrika Trust}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{hsieh2006joint,<br /> title={Joint modeling of survival and longitudinal data: likelihood approach revisited},<br /> author={Hsieh, F. and Tseng, Y.-K. and Wang, J.-L.},<br /> journal={Biometrics},<br /> volume={62},<br /> number={4},<br /> pages={1037-1043},<br /> year={2006},<br /> publisher={Wiley Online Library}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{hu2003joint,<br /> title={A joint model for nonlinear longitudinal data with informative dropout},<br /> author={Hu, C. and Sale, M. E.},<br /> journal={Journal of Pharmacokinetics and Pharmacodynamics},<br /> volume={30},<br /> number={1},<br /> pages={83-103},<br /> year={2003},<br /> publisher={Springer}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{liu2009,<br /> author = {Liu, L. and Huang, X.},<br /> title = {Joint analysis of correlated repeated measures and recurrent events processes in the presence of a dependent terminal event},<br /> journal = {Journal of the Royal Statistical Society: Series C (Applied Statistics)},<br /> volume = {58},<br /> pages = {65-81},<br /> year = {2009}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{rizopoulos2012,<br /> author = {Rizopoulos, D.},<br /> title = {Joint Models for Longitudinal and Time-to-Event Data.
With Applications in R.},<br /> publisher = {Chapman &amp; Hall/CRC Biostatistics},<br /> address = {Boca Raton},<br /> year = {2012}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{rondeau2007,<br /> author = {Rondeau, V. and Mathoulin-Pelissier, S. and Jacqmin-Gadda, H. and Brouste, V. and Soubeyran, P.},<br /> title = {Joint frailty models for recurring events and death using maximum penalized likelihood estimation: application on cancer events},<br /> journal = {Biostatistics},<br /> volume = {8},<br /> pages = {708-721},<br /> year = {2007}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{song2004semiparametric,<br /> title={A semiparametric likelihood approach to joint modeling of longitudinal and time-to-event data},<br /> author={Song, X. and Davidian, M. and Tsiatis, A. A.},<br /> journal={Biometrics},<br /> volume={58},<br /> number={4},<br /> pages={742-753},<br /> year={2002},<br /> publisher={Wiley Online Library}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{tsiatis2004joint,<br /> title={Joint modeling of longitudinal and time-to-event data: an overview},<br /> author={Tsiatis, A. A. and Davidian, M.},<br /> journal={Statistica Sinica},<br /> volume={14},<br /> number={3},<br /> pages={809-834},<br /> year={2004}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{wu2002joint,<br /> title={A joint model for nonlinear mixed-effects models with censoring and covariates measured with error, with application to AIDS studies},<br /> author={Wu, L.},<br /> journal={Journal of the American Statistical Association},<br /> volume={97},<br /> number={460},<br /> pages={955-964},<br /> year={2002},<br /> publisher={American Statistical Association}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{wulfsohn1997joint,<br /> title={A joint model for survival and longitudinal data measured with error},<br /> author={Wulfsohn, M. S. and Tsiatis, A.
A.},<br /> journal={Biometrics},<br /> pages={330-339},<br /> year={1997},<br /> publisher={JSTOR}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> <br /> {{Back&amp;Next<br /> |linkNext=Extensions<br /> |linkBack=Models for time-to-event data }}</div> Admin http://wiki.webpopix.org/index.php/Joint_models Joint models 2013-06-07T13:56:53Z <p>Admin : </p> <hr /> <div>&lt;!-- Menu for the Observations chapter --&gt;<br /> &lt;sidebarmenu&gt;<br /> +[[Modeling the observations]]<br /> *[[Modeling the observations| Introduction ]] | [[ Continuous data models ]] | [[Models for count data]] | [[Model for categorical data]] | [[Models for time-to-event data ]] | [[Joint models]] <br /> &lt;/sidebarmenu&gt;<br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> ==Introduction==<br /> <br /> An important goal of longitudinal studies is to characterize relationships between different types of response data.<br /> <br /> For instance, in a PKPD population study, we may be interested in the relationship between certain pharmacokinetics (absorption, distribution, metabolism and excretion) and pharmacodynamics (biochemical and physiological effects) of a drug. To do this, we need to measure some of both types of response data for several individuals from the same population, then try and characterize their relationship.<br /> <br /> Alternatively, many clinical trials and reliability studies generate both longitudinal and survival (time-to-event) data. For example, in HIV clinical trials the viral load and the concentration of CD4 cells are widely used as biomarkers for progression to AIDS when studying the efficacy of drugs to treat HIV-infected patients. 
We might then be interested in the relationship between these variables and events such as seroconversion or death.<br /> <br /> In general, therefore, a ''joint model'' is one that allows us to simultaneously describe the distribution of different types of observations made on the same individual. As usual, we consider this in the population context.<br /> <br /> Suppose that we have $L$ different types of observations for individual $i$: $y_i^{(1)}=(y_{ij}^{(1)},1\leq j \leq n_{i,1})$, $y_i^{(2)}=(y_{ij}^{(2)},1\leq j \leq n_{i,2})$, ..., $y_i^{(L)}=(y_{ij}^{(L)},1\leq j \leq n_{i,L})$, where $n_{i,\ell}$ is the number of observations of type $\ell$ made on individual $i$.<br /> Note that for a given individual, $n_{i,\ell}$ may differ from one observation type $\ell$ to another, and so may the observation times $(t_{ij}^{(\ell)})$.<br /> <br /> Denote by $y_i$ the set of observations for individual $i$: $y_i = (y_i^{(1)},y_i^{(2)},\ldots,y_i^{(L)})$.<br /> For each individual, the joint probability distribution of the observations $y_i$ and the individual parameters $\psi_i$ can be decomposed as follows:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray} <br /> \pyipsii(y_i,\psi_i;\theta) &amp;=&amp; \pcyipsii(y_i {{!}} \psi_i) \, \ppsii(\psi_i;\theta) \\<br /> &amp; =&amp; \pcyipsii(y_i^{(1)},y_i^{(2)},\ldots,y_i^{(L)} {{!}} \psi_i) \, \ppsii(\psi_i;\theta) .
<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> We can then distinguish between different types of dependency between observations: independence, conditional independence and conditional dependence.<br /> <br /> <br /> &lt;br&gt;<br /> == Independent observations ==<br /> <br /> Suppose first that the vector of individual parameters $\psi_i$ can be decomposed into $L$ independent sub-vectors $\psi_i^{(1)}$, $\psi_i^{(2)}$, ..., $\psi_i^{(L)}$ such that $y_i^{(\ell)}$ depends only on $\psi_i^{(\ell)}$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pyipsii(y_i,\psi_i;\theta) &amp;=&amp; \pyipsii\left(y_i^{(1)},y_i^{(2)},\ldots,y_i^{(L)},\psi_i^{(1)}, \psi_i^{(2)}, \ldots , \psi_i^{(L)};\theta\right) \\<br /> &amp;=&amp; \prod_{\ell=1}^{L} \pmacro\left(y_i^{(\ell)},\psi_i^{(\ell)};\theta\right) \\<br /> &amp;=&amp; \prod_{\ell=1}^{L} \pmacro\left(y_i^{(\ell)} {{!}} \psi_i^{(\ell)}\right) \pmacro\left(\psi_i^{(\ell)};\theta\right) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> Here, joint modeling does not bring anything new to the picture because all information on $\psi_i^{(\ell)}$ is contained in the related set of observations $y_i^{(\ell)}$. 
We can therefore model each set of observations separately.<br /> <br /> <br /> {{Example1<br /> |title1=Example<br /> |title2=A PK and PD model for warfarin data<br /> |text=<br /> Here, 32 healthy volunteers received a 1.5 mg/kg single oral dose of warfarin, an anticoagulant normally used in the prevention of thrombosis.<br /> We then measured, at various times, the warfarin plasma concentration $C$ and the prothrombin complex activity (PCA) $E$ for these volunteers.<br /> The figure represents the PK data (on the left) and the PD data (on the right).<br /> <br /> <br /> {{ImageWithCaption|image=warf0.png|caption= warfarin PK and PD data }}<br /> <br /> <br /> First, we consider two entirely independent parametric models for the PK and PD data: a simple one compartment model $f_1$ for the PK and a rebound model $f_2$ for the PD. For any $t&gt;0$,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> C(t) &amp;=&amp; \displaystyle{ \frac{D\, k_a}{V(k_a-k_e)} } \left( e^{-k_e \, t} - e^{-k_a \, t} \right) \\<br /> E(t) &amp;=&amp; 100\left(\displaystyle{ \frac{\beta}{1+\beta} } e^{-\alpha \, t} + \displaystyle{ \frac{1}{1+\beta \, e^{-\gamma \, t} } }\right) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> We can then model the observations by supposing, for example, a combined error model for the PK data and an additive one for the PD data:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:warf1&quot;&gt;&lt;math&gt;\begin{array}{ccl}<br /> y_{ij}^{(1)} &amp;=&amp; C(t_{ij}^{(1)} ; \psi_i^{(1)}) + (a_1 + b_1\,C(t_{ij}^{(1)};\psi_i^{(1)}))\teps_{ij}^{(1)} \end{array}&lt;/math&gt;&lt;/div&gt; <br /> |reference=(1) }}<br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;eq:warf2&quot;&gt;&lt;math&gt;\begin{array}{ccl}<br /> y_{ij}^{(2)} &amp;=&amp; E(t_{ij}^{(2)} ; \psi_i^{(2)}) + a_2 \, \teps_{ij}^{(2)} , <br /> \end{array}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> where $\psi_i^{(1)}=(ka_i,V_i,
ke_i)$ and $\psi_i^{(2)}=(\alpha_i,\beta_i,\gamma_i)$ are independent individual parameter vectors that we suppose to be log-normally distributed.<br /> <br /> Now that the two models have been defined, we can jointly model the two data types. As they are independent, we can simply use the PK model to fit the concentration data<br /> and the PD model to fit the PCA data. The figure shows the observed data and the individual predictions given by the two models for the &lt;balloon title=&quot;Monolix was used to fit the models. Note that the PD model is for illustrative purposes only; even though it fits the data well, it has no biological interpretation&quot; style=&quot;color:#177245&quot;&gt;4 individuals&lt;/balloon&gt;. <br /> <br /> <br /> &lt;div style=&quot;padding-left:4em&quot;&gt;[[File:warfpkfit1.png|link=]]&lt;/div&gt;<br /> <br /> {{ImageWithCaption|image=warfpdfit1.png|caption=Jointly fitted PK and PD warfarin data for 4 individuals using two independent models }}<br /> }}<br /> <br /> <br /> In the same way that we jointly modeled these two types of independent continuous data, we can construct joint models using different types of data at the same time, i.e., various combinations of continuous, categorical, count and survival data, etc., if they are independent.<br /> <br /> <br /> {{Example1<br /> |title1=Example<br /> |title2=Longitudinal and time-to-event data model<br /> |text=<br /> Consider the following joint model for survival and longitudinal data:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> y_{ij} &amp;=&amp; f(t_{ij} ; \psi_i^{(1)}) + g(t_{ij} ;\psi_i^{(1)})\teps_{ij} \\<br /> \prob{T_i&gt;t} &amp;=&amp; S(t ; \psi_i^{(2)}) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The continuous outcome $y_{ij}$ and the time to event $T_i$ are independent if $\psi_i^{(1)}$ and $\psi_i^{(2)}$ are independent.<br /> <br /> <br /> {{Remarks <br /> |title=Remark<br /> |text= If the event is ''drop-out'', it
is sometimes called MCAR (missing completely at random). This means that the continuous outcome does not provide any information about drop-out. }}<br /> }}<br /> <br /> &lt;br&gt;<br /> <br /> == Conditionally independent observations ==<br /> <br /> In this case, the various observation types no longer depend only on disjoint (i.e., independent) individual parameters. We therefore write $\psi_i$ for the overall set of (partially or fully shared)<br /> individual parameters. The observations are nevertheless assumed to be independent conditional on $\psi_i$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pyipsii(y_i,\psi_i;\theta) &amp;=&amp; \pyipsii(y_i^{(1)},y_i^{(2)},\ldots,y_i^{(L)},\psi_i;\theta) \\<br /> &amp;=&amp; \left( \prod_{\ell=1}^{L} \pmacro(y_i^{(\ell)} {{!}} \psi_i) \right) \pmacro(\psi_i;\theta) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> In such cases, each observation provides information on the individual parameter vector $\psi_i$.<br /> <br /> This is the most common case when we are simultaneously modeling different types of longitudinal data of the form:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> y_{ij}^{(1)} &amp;=&amp; f_1(t_{ij}^{(1)} ; \psi_i) + g_1(t_{ij}^{(1)};\psi_i)\teps_{ij}^{(1)} \\<br /> y_{ij}^{(2)} &amp;=&amp; f_2(t_{ij}^{(2)} ; \psi_i) + g_2(t_{ij}^{(2)};\psi_i)\teps_{ij}^{(2)} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> Here, the predictions $f_1$ and $f_2$ both depend on the same vector of individual parameters, which induces dependency between the observations $y_{i}^{(1)}$ and $y_{i}^{(2)}$.
However, these observations are ''conditionally independent'' if the residual errors $\teps_{ij}^{(1)}$ and $\teps_{ij}^{(2)}$ are independent.<br /> <br /> <br /> {{Example1<br /> |title1=Example<br /> |title2=A joint PKPD model for warfarin data<br /> <br /> |text=<br /> Pertinent PKPD models aim to establish a link between a drug's concentration and its effect.<br /> An indirect response model assumes that a drug does not instantaneously affect the PD response. Instead, the drug affects a precursor which then influences the PD measure. Here, as warfarin levels increase, prothrombin synthesis is inhibited, which in turn has anti-coagulant effects. Such phenomena can be approximated with a very simple ODE-based mathematical model for the PD component (we use the same one compartment model for the PK component):<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> C(t) &amp;=&amp; \displaystyle{ \frac{D\, k_a}{V(k_a-k_e)} } \left( e^{-k_e \, t} - e^{-k_a \, t} \right) \\<br /> E(t) &amp;=&amp; \displaystyle{ \frac{k_{in} }{ k_{out} } }, \ \ \ \ t\leq 0 \\<br /> \displaystyle{ \frac{d}{dt} }E(t) &amp;=&amp; k_{in}\left( 1 - \displaystyle{ \frac{C(t)}{IC_{50} + C(t)} } \right) - k_{out}\,E(t), \ \ \ \ t &gt;0 .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> We could then use the same residual error models [[#eq:warf1|(1)]] and [[#eq:warf2|(2)]] given in the previous example.<br /> <br /> We can also suppose that the vectors $\psi_i^{(1)}=(ka_i,V_i, ke_i)$ and $\psi_i^{(2)}=(IC_{50,i},k_{in,i},k_{out,i})$ are independent, but the fact that the effect $E$ predicted by the model is a function<br /> of the concentration $C$ introduces dependence between the two observation types because both depend on the PK parameters $\psi_i^{(1)}$.<br /> <br /> If the residual errors $(\teps_{ij}^{(1)})$ and $(\teps_{ij}^{(2)})$ are independent, then the observations are conditionally independent, i.e., when the predicted concentration $C(t)$ is given, the 
observed concentrations $\by^{(1)}$ do not bring any further information on the distribution of the PD observations $\by^{(2)}$.<br /> <br /> This joint model can be used to model the same warfarin data as before (again, using $\monolix$).<br /> The figure shows the resulting individual predictions.<br /> <br /> <br /> &lt;div style=&quot;margin-left:4.2em&quot;&gt;[[File:warfpkfit2.png|link=]]&lt;/div&gt;<br /> <br /> {{ImageWithCaption|image=warfpdfit2.png|caption=Fitted PK and PD warfarin data for 4 individuals using a conditionally independent joint model}}<br /> }}<br /> <br /> <br /> We can extend this framework to different types of data, considering for example categorical observations $y_i^{(2)}$ for which the probabilities $\prob{y_{ij}^{(2)} = k}$ depend on $f_1(t_{ij}^{(2)};\psi_i)$ and consequently $\psi_i$. We can also consider survival data for which the hazard function depends on $f_1$.<br /> <br /> <br /> <br /> {{Example1<br /> |title1=Example<br /> |title2=Longitudinal and time-to-event data model<br /> <br /> |text=Consider a joint model for survival and longitudinal data, assuming now that the hazard function (or equivalently the survival function) depends on the continuous data prediction:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> y_{ij} &amp;=&amp; f(t_{ij} ; \psi_i) + g(t_{ij} ;\psi_i)\teps_{ij} \\<br /> \prob{T_i&gt;t} &amp;=&amp; S(t ; f(t ; \psi_i)) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> If, for instance, $(y_{ij})$ is the measured viral load of an HIV-infected patient, we can assume that the probability of events such as death, seroconversion or drop-out depends on the &quot;true&quot; viral load $f(t ; \psi_i)$.<br /> <br /> <br /> {{Remarks<br /> |title=Remark<br /> |text= If the event is ''drop-out'', it is sometimes called MAR (missing at random).
This means that the probability of drop-out depends on some of the individual parameters, but that the observed values of the continuous outcome themselves provide no additional information. In our example, this means that the probability that a patient leaves the study depends on their true state (i.e., their true but unknown viral load), and not on the measured viral load values. }}<br /> }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Conditionally dependent observations ==<br /> <br /> In this case, there is a dependency structure between types of observation that no longer allows us to decompose the joint model into a product of models with only one type of observation in each.<br /> <br /> This kind of dependency occurs when several types of longitudinal data are obtained at the same times, with correlated measurement errors. The joint conditional distribution $\qcyipsii$ of the observations is<br /> Gaussian if the residual errors are. The dependency structure between observations can then be characterized by a variance-covariance matrix for the errors.<br /> <br /> We can also consider a natural decomposition of this joint distribution into a product of conditional distributions:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pyipsii(y_i,\psi_i;\theta) &amp;=&amp; \pyipsii(y_i^{(1)},y_i^{(2)},\ldots,y_i^{(L)},\psi_i;\theta) \\<br /> &amp;=&amp; \pmacro(y_i^{(1)} {{!}} \psi_i;\theta) \pmacro(y_i^{(2)} {{!}} y_i^{(1)}, \psi_i;\theta)\ldots \pmacro(y_i^{(L)} {{!}} y_i^{(1)},\ldots,y_i^{(L-1)}, \psi_i;\theta) \pmacro(\psi_i;\theta) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> Here, the distribution of $y_i^{(2)}$ depends on the observation $y_i^{(1)}$, the distribution of $y_i^{(3)}$ depends on $y_i^{(1)}$ and $y_i^{(2)}$, etc.<br /> <br /> <br /> {{Example1<br /> |title1=Example<br /> |title2=A longitudinal data and drop-out model<br /> <br /> |text= Consider a joint model for longitudinal data and drop-out,
assuming now that the hazard function (or equivalently the survival function) depends on the observed data itself:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> y_{ij} &amp;=&amp; f(t_{ij} ; \psi_i) + g(t_{ij} ;\psi_i)\teps_{ij} \\<br /> \prob{T_i&gt;t} &amp;=&amp; S(t ; (y_{ij}, t_{ij}&lt;t)) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> This drop-out mechanism is sometimes called MNAR (missing not at random).<br /> In this example, where $(y_{ij}, t_{ij}&lt;t)$ is the sequence of measured viral loads before time $t$, MNAR means that the probability that a patient leaves the study depends on their previously-measured viral concentrations. <br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> &lt;!--<br /> == $\mlxtran$ for joint models==<br /> <br /> TO DO<br /> --&gt;<br /> <br /> <br /> <br /> <br /> &lt;br&gt;<br /> == Bibliography ==<br /> <br /> <br /> &lt;bibtex&gt;<br /> @article{albert2004modeling,<br /> title={Modeling repeated count data subject to informative dropout},<br /> author={Albert, P. S. and Follmann, D. A.},<br /> journal={Biometrics},<br /> volume={56},<br /> number={3},<br /> pages={667-677},<br /> year={2000},<br /> publisher={Wiley Online Library}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{chi2006joint,<br /> title={Joint models for multivariate longitudinal and multivariate survival data},<br /> author={Chi, Y.-Y. and Ibrahim, J. G.},<br /> journal={Biometrics},<br /> volume={62},<br /> number={2},<br /> pages={432-445},<br /> year={2006},<br /> publisher={Wiley Online Library}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{de1994modelling,<br /> title={Modelling progression of CD4-lymphocyte count and its relationship to survival time},<br /> author={De Gruttola, V. and Tu, X.
M.},<br /> journal={Biometrics},<br /> pages={1003-1014},<br /> year={1994},<br /> publisher={JSTOR}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{henderson2000joint,<br /> title={Joint modelling of longitudinal measurements and event time data},<br /> author={Henderson, R. and Diggle, P. and Dobson, A.},<br /> journal={Biostatistics},<br /> volume={1},<br /> number={4},<br /> pages={465-480},<br /> year={2000},<br /> publisher={Biometrika Trust}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{hsieh2006joint,<br /> title={Joint modeling of survival and longitudinal data: likelihood approach revisited},<br /> author={Hsieh, F. and Tseng, Y.-K. and Wang, J.-L.},<br /> journal={Biometrics},<br /> volume={62},<br /> number={4},<br /> pages={1037-1043},<br /> year={2006},<br /> publisher={Wiley Online Library}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{hu2003joint,<br /> title={A joint model for nonlinear longitudinal data with informative dropout},<br /> author={Hu, C. and Sale, M. E.},<br /> journal={Journal of Pharmacokinetics and Pharmacodynamics},<br /> volume={30},<br /> number={1},<br /> pages={83-103},<br /> year={2003},<br /> publisher={Springer}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{liu2009,<br /> author = {Liu, L. and Huang, X.},<br /> title = {Joint analysis of correlated repeated measures and recurrent events processes in the presence of a dependent terminal event},<br /> journal = {Journal of the Royal Statistical Society: Series C (Applied Statistics)},<br /> volume = {58},<br /> pages = {65-81},<br /> year = {2009}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{rizopoulos2012,<br /> author = {Rizopoulos, D.},<br /> title = {Joint Models for Longitudinal and Time-to-Event Data.
With Applications in R.},<br /> publisher = {Chapman &amp; Hall/CRC Biostatistics},<br /> address = {Boca Raton},<br /> year = {2012}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{rondeau2007,<br /> author = {Rondeau, V. and Mathoulin-Pelissier, S. and Jacqmin-Gadda, H. and Brouste, V. and Soubeyran, P.},<br /> title = {Joint frailty models for recurring events and death using maximum penalized likelihood estimation: application on cancer events},<br /> journal = {Biostatistics},<br /> volume = {8},<br /> pages = {708-721},<br /> year = {2007}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{song2004semiparametric,<br /> title={A semiparametric likelihood approach to joint modeling of longitudinal and time-to-event data},<br /> author={Song, X. and Davidian, M. and Tsiatis, A. A.},<br /> journal={Biometrics},<br /> volume={58},<br /> number={4},<br /> pages={742-753},<br /> year={2002},<br /> publisher={Wiley Online Library}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{tsiatis2004joint,<br /> title={Joint modeling of longitudinal and time-to-event data: an overview},<br /> author={Tsiatis, A. A. and Davidian, M.},<br /> journal={Statistica Sinica},<br /> volume={14},<br /> number={3},<br /> pages={809-834},<br /> year={2004}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{wu2002joint,<br /> title={A joint model for nonlinear mixed-effects models with censoring and covariates measured with error, with application to AIDS studies},<br /> author={Wu, L.},<br /> journal={Journal of the American Statistical Association},<br /> volume={97},<br /> number={460},<br /> pages={955-964},<br /> year={2002},<br /> publisher={American Statistical Association}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{wulfsohn1997joint,<br /> title={A joint model for survival and longitudinal data measured with error},<br /> author={Wulfsohn, M. S. and Tsiatis, A.
A.},<br /> journal={Biometrics},<br /> pages={330-339},<br /> year={1997},<br /> publisher={JSTOR}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> <br /> {{Back&amp;Next<br /> |linkNext=Extensions<br /> |linkBack=Models for time-to-event data }}</div> Admin http://wiki.webpopix.org/index.php/Models_for_time-to-event_data Models for time-to-event data 2013-06-07T13:51:18Z <p>Admin : /* Censoring and probability distributions */</p> <hr /> <div>&lt;!-- Menu for the Observations chapter --&gt;<br /> &lt;sidebarmenu&gt;<br /> +[[Modeling the observations]]<br /> *[[Modeling the observations| Introduction ]] | [[ Continuous data models ]] | [[Models for count data]] | [[Model for categorical data]] | [[Models for time-to-event data]] | [[Joint models]] <br /> &lt;/sidebarmenu&gt;<br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> Here, observations are the &quot;times at which events occur&quot;. An event may be one-off (e.g., death, hardware failure) or repeated (e.g., epileptic seizures, metro strike).<br /> <br /> <br /> ==Single event==<br /> <br /> <br /> To begin with, we will consider a one-off event.<br /> Depending on the application, the length of time to this event may be called the ''survival'' time (until death), ''failure'' time (until hardware fails), etc. To be general, we can just say ''event'' time.<br /> <br /> The random variable representing the event time for subject $i$ is typically written $T_i$. Several situations are then possible for defining the observations:<br /> <br /> <br /> &lt;ul&gt;<br /> * The event time is exactly observed.<br /> <br /> <br /> ::[[File:survival1.png|link=]]<br /> <br /> <br /> : Then, the observation for individual $i$ is $y_i = t_i$, where $t_i$ is a realization of the random variable $T_i$.<br /> &lt;br&gt;<br /> <br /> * We may know the event has happened in an interval $I_i$ but not know the exact time $t_i$. This is ''interval censoring''.
For example, at a routine check-up, cancer recurrence may be detected, and we only know that it has occurred at some point in time since the last check-up.<br /> <br /> <br /> ::[[File:survival3.png|link=]]<br /> <br /> <br /> : The observation for individual $i$ is the event: $y_i =$ &quot;$a_i &lt; t_i \leq b_i$&quot;.<br /> &lt;br&gt;<br /> <br /> * If we assume that the trial ends at time $\tstop$, then the event may happen after the end of the trial period. This is ''right censoring''.<br /> <br /> <br /> ::[[File:survival2.png|link=]]<br /> <br /> <br /> : There are several variations of this for defining what the observations are:<br /> &lt;br&gt;<br /> <br /> * If events (before $\tstop$) are exactly observed, then for $i=1,2,\ldots, N$,<br /> <br /> {{Equation1|<br /> equation=&lt;math&gt;<br /> y_i = \left\{<br /> \begin{array}{ll}<br /> t_i &amp; {\rm if} \quad t_i \leq \tstop \\<br /> t_i &gt; \tstop &amp; {\rm otherwise.}<br /> \end{array} \right.<br /> &lt;/math&gt;}}<br /> &lt;/ul&gt;<br /> <br /> <br /> {{ExampleWithText&amp;Table<br /> |title1=Example:<br /> |title2=<br /> |equation=<br /> Assume that a trial starts at $\tstart=0$ and ends at $\tstop=5$, and that we obtain the following observations from 4 individuals: <br /> <br /> $y_1 = 3.2$ <br /> <br /> $y_2=$ &quot;$t_2&gt;5$&quot;<br /> <br /> $y_3= 2.7$ <br /> <br /> $y_4 =$ &quot;$t_4&gt;5$&quot;<br /> <br /> <br /> These observations can be stored in a data file as shown in the table on the right.<br /> <br /> Here, &quot;event=0&quot; at time $t$ means that the event happened after $t$ while &quot;event=1&quot; means that the event happened at time $t$.
<br /> <br /> The lines with $t=0$ are used to state the trial start time $\tstart=0$.<br /> <br /> |table=<br /> {{{!}} class=&quot;wikitable&quot; align=&quot;center&quot; style=&quot;width: 75%&quot;<br /> !{{!}} ID {{!}}{{!}} TIME {{!}}{{!}} EVENT <br /> {{!}}-<br /> {{!}}1 {{!}}{{!}} 0 {{!}}{{!}} 0 <br /> {{!}}-<br /> {{!}}1 {{!}}{{!}} 3.2 {{!}}{{!}} 1 <br /> {{!}}-<br /> {{!}}2 {{!}}{{!}} 0 {{!}}{{!}} 0 <br /> {{!}}-<br /> {{!}}2 {{!}}{{!}} 5 {{!}}{{!}} 0 <br /> {{!}}-<br /> {{!}}3 {{!}}{{!}} 0 {{!}}{{!}} 0 <br /> {{!}}-<br /> {{!}}3 {{!}}{{!}} 2.7 {{!}}{{!}} 1 <br /> {{!}}-<br /> {{!}}4 {{!}}{{!}} 0 {{!}}{{!}} 0 <br /> {{!}}-<br /> {{!}}4 {{!}}{{!}} 5 {{!}}{{!}} 0 <br /> {{!}}} <br /> }}<br /> <br /> <br /> &lt;ul&gt;<br /> * If events before $\tstop$ are interval censored, then for $i=1,2,\ldots, N$,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> y_i = \left\{<br /> \begin{array}{ll}<br /> a_i &lt; t_i \leq b_i &amp; {\rm if} \quad t_i\leq \tstop \\<br /> t_i &gt; \tstop &amp; {\rm otherwise.}<br /> \end{array}<br /> \right.<br /> &lt;/math&gt; }}<br /> &lt;/ul&gt;<br /> <br /> <br /> {{ExampleWithText&amp;Table<br /> |title1=Example:<br /> |title2=<br /> |equation=<br /> Assume that we have censoring intervals of length 1: <br /> <br /> <br /> $(0,1],(1,2],\ldots,(4,5]$.<br /> <br /> <br /> For the same four individuals as in the previous example, we now have the following observations: <br /> <br /> <br /> $y_1=$ &quot;$3 &lt; t_1 \leq 4$&quot;, <br /> <br /> $y_2=$ &quot;$t_2&gt;5$&quot;, <br /> <br /> $y_3=$ &quot;$2&lt; t_3 \leq 3$&quot;, <br /> <br /> $y_4=$ &quot;$t_4&gt;5$&quot;.
<br /> <br /> <br /> These observations can be stored in a data file as shown in the table on the right.<br /> <br /> Here &quot;event=0&quot; at time $t$ means that the event happened after $t$, while &quot;event=1&quot; means that the event happened in the interval between the previous recorded time and $t$.<br /> |table=<br /> {{{!}} class=&quot;wikitable&quot; align=&quot;center&quot; style=&quot;width: 75%&quot;<br /> !{{!}} ID {{!}}{{!}} TIME {{!}}{{!}} EVENT <br /> {{!}}-<br /> {{!}}1 {{!}}{{!}} 0 {{!}}{{!}} 0 <br /> {{!}}-<br /> {{!}}1 {{!}}{{!}} 3 {{!}}{{!}} 0 <br /> {{!}}-<br /> {{!}}1 {{!}}{{!}} 4 {{!}}{{!}} 1 <br /> {{!}}-<br /> {{!}}2 {{!}}{{!}} 0 {{!}}{{!}} 0 <br /> {{!}}-<br /> {{!}}2 {{!}}{{!}} 5 {{!}}{{!}} 0 <br /> {{!}}-<br /> {{!}}3 {{!}}{{!}} 0 {{!}}{{!}} 0 <br /> {{!}}-<br /> {{!}}3 {{!}}{{!}} 2 {{!}}{{!}} 0 <br /> {{!}}-<br /> {{!}}3 {{!}}{{!}} 3 {{!}}{{!}} 1 <br /> {{!}}-<br /> {{!}}4 {{!}}{{!}} 0 {{!}}{{!}} 0 <br /> {{!}}-<br /> {{!}}4 {{!}}{{!}} 5 {{!}}{{!}} 0 <br /> {{!}}}<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Probability distributions == <br /> <br /> <br /> Several functions play key roles in time-to-event analysis: the survival function, the hazard function and the cumulative hazard function.<br /> We are still working under a population approach here, so these functions, detailed below, are individual functions: each subject has its own. 
As we are using parametric models, this means that these functions depend on individual parameters $(\psi_i)$.<br /> <br /> <br /> &lt;ul&gt;<br /> * The '''survival function''' $S(t; \psi_i)$ gives the probability that the event happens to individual $i$ after time $t&gt;t_{start}$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> S(t; \psi_i) \ \ \eqdef \ \ \prob{T_i&gt;t ; \psi_i} .<br /> &lt;/math&gt; }}<br /> <br /> <br /> <br /> * The '''hazard function''' $\hazard(t;\psi_i)$ is defined for individual $i$ as the instantaneous rate of the event at time $t$, given that the event has not already occurred:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \hazard(t;\psi_i) \ \ \eqdef \ \ \lim_{dt\to 0} \displaystyle{\frac{S(t;\psi_i) - S(t + dt;\psi_i)}{ S(t;\psi_i) \, dt} }. <br /> &lt;/math&gt; }}<br /> <br /> : This is equivalent to: <br /> <br /> {{Equation1<br /> |equation=&lt;div id=&quot;HazardSurvival&quot; &gt;&lt;math&gt; <br /> \hazard(t;\psi_i) \ \ = \ \ -\displaystyle{ \frac{d}{dt} } \log{S(t;\psi_i)}. <br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1)<br /> }} <br /> <br /> <br /> * Another useful quantity is the '''cumulative hazard function''' $\cumhaz(a,b;\psi_i)$, defined for individual $i$ as:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \cumhaz(a,b;\psi_i) \ \ \eqdef \ \ \displaystyle{\int_a^b \hazard(t;\psi_i) \, dt }.<br /> &lt;/math&gt;}}<br /> <br /> : Note that [[#HazardSurvival|(1)]] implies that:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> S(t;\psi_i) \ \ = \ \ e^{-\cumhaz(t_{start},t;\psi_i)}.<br /> &lt;/math&gt; }}<br /> &lt;/ul&gt;<br /> <br /> <br /> Equation [[#HazardSurvival|(1)]] shows that the hazard function $\hazard(t;\psi_i)$ characterizes the problem, because knowing it is the same as knowing the survival function $S(t;\psi_i)$. 
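To make the hazard-survival relationship concrete, here is a short numerical check (a sketch in plain Python; the Weibull hazard $\hazard(t) = (k/\lambda)(t/\lambda)^{k-1}$ and its parameter values are hypothetical choices for illustration, not part of the model above). Integrating the hazard numerically and exponentiating recovers the closed-form Weibull survival function:

```python
import math

# Numerical check of S(t) = exp(-H(t_start, t)).
# The Weibull hazard below and its parameter values (lam, k) are
# hypothetical choices used only for illustration.

def hazard(t, lam=2.0, k=1.5):
    # Weibull hazard: h(t) = (k/lam) * (t/lam)**(k-1)
    return (k / lam) * (t / lam) ** (k - 1)

def cumulative_hazard(a, b, n=100000, **kw):
    # H(a, b) = integral of the hazard over (a, b], midpoint rule
    dt = (b - a) / n
    return dt * sum(hazard(a + (i + 0.5) * dt, **kw) for i in range(n))

def survival(t, t_start=0.0, **kw):
    # S(t) = exp(-H(t_start, t))
    return math.exp(-cumulative_hazard(t_start, t, **kw))

# For this Weibull hazard, S(t) = exp(-(t/lam)**k) in closed form:
t = 3.0
S_numeric = survival(t)
S_closed = math.exp(-((t / 2.0) ** 1.5))
print(S_numeric, S_closed)  # the two values agree up to integration error
```

Any other nonnegative hazard function could be substituted for `hazard` without changing the rest of the sketch.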
The probability distribution of survival data is therefore completely defined by the hazard function.<br /> Let $\qcyipsii$ be the conditional distribution of the observation $y_i$ given the vector of individual parameters $\psi_i$. Its pdf can be easily computed for the various censoring situations discussed above:<br /> <br /> <br /> &lt;ol&gt;<br /> &lt;li&gt;If the event is exactly observed with $y_i=t_i$, the density is the derivative of the cumulative distribution function, i.e., the derivative of $1 - S(t_i;\psi_i)$:&lt;/li&gt;<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \begin{eqnarray}\pcyipsii(y_i {{!}} \psi_i) &amp;=&amp; \frac{d}{dt_i}\left(1 - e^{-\cumhaz(t_{start},t_i;\psi_i)}\right)\\<br /> %&amp;=&amp; \left(\frac{d}{dt_i} \int_{t_{start} }^{t_i} \hazard(u;\psi_i) \, du \right) e^{-\cumhaz(t_{start},t_i;\psi_i)}\\<br /> &amp;=&amp;\hazard(t_i;\psi_i)e^{-\cumhaz(t_{start},t_i;\psi_i)} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> &lt;li&gt;If the event is interval-censored with $y_i=\,$ &quot;$a_i&lt;t_i\leq b_i$&quot;:&lt;/li&gt;<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcyipsii(y_i {{!}} \psi_i) &amp;=&amp; \prob{T_i \in (a_i,b_i]\,{{!}} \,\psi_i} \\<br /> %&amp;=&amp; \prob{T_i \leq b_i {{!}} \psi_i} - \prob{T_i \leq a_i {{!}} \psi_i} \\<br /> %&amp;=&amp; (1-S( b_i ; \psi_i)) - (1-S( a_i ; \psi_i)) \\<br /> &amp;=&amp; e^{-\cumhaz(t_{start},a_i;\psi_i)} - e^{-\cumhaz(t_{start},b_i;\psi_i)} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> &lt;li&gt;If the event is right-censored with $y_i= \,$ &quot;$t_i&gt;t_{stop}$&quot;:&lt;/li&gt;<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcyipsii(y_i {{!}} \psi_i) &amp;=&amp; \prob{T_i &gt; t_{stop} {{!}} \psi_i} \\<br /> %&amp;=&amp; S( t_{stop} ; \psi_i) \\<br /> &amp;=&amp; e^{-\cumhaz(t_{start},t_{stop};\psi_i)} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> &lt;/ol&gt;<br /> <br /> <br /> 
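These three densities can be checked directly in the simplest setting of a constant hazard. The sketch below (plain Python; the hazard value `lam` and the trial period $(0,5]$ are hypothetical choices, not taken from the text) evaluates each contribution and verifies that the interval-censored probabilities over a partition of the trial period, together with the right-censoring probability, sum to one:

```python
import math

# The three likelihood contributions above, in the simplest case of a
# constant hazard h(t; psi_i) = lam (the value of lam and the trial
# period (0, 5] are hypothetical choices used only for illustration).
# With a constant hazard, H(a, b) = lam * (b - a).

def cum_hazard(a, b, lam):
    return lam * (b - a)

def pdf_exact(t, lam, t_start=0.0):
    # event observed exactly at t: h(t) * exp(-H(t_start, t))
    return lam * math.exp(-cum_hazard(t_start, t, lam))

def pdf_interval(a, b, lam, t_start=0.0):
    # event known to lie in (a, b]: S(a) - S(b)
    return math.exp(-cum_hazard(t_start, a, lam)) - math.exp(-cum_hazard(t_start, b, lam))

def pdf_right_censored(t_stop, lam, t_start=0.0):
    # event known to occur after t_stop: S(t_stop)
    return math.exp(-cum_hazard(t_start, t_stop, lam))

lam = 0.3
# The interval-censored probabilities over the partition (0,1], ..., (4,5]
# plus the right-censoring probability must sum to 1:
total = sum(pdf_interval(k, k + 1, lam) for k in range(5)) + pdf_right_censored(5, lam)
print(total)  # 1.0 up to rounding
```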
&lt;br&gt;&lt;br&gt;<br /> <br /> ==Repeated events==<br /> <br /> <br /> <br /> Sometimes, an event can potentially happen again and again, e.g., epileptic seizures, heart attacks, etc.<br /> For any given hazard function $\hazard$, the survival function $S$ for individual $i$ now represents survival since the previous event at $t_{i,j-1}$, written here in terms of the cumulative hazard from $t_{i,j-1}$ to $t_{i,j}$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> S(t_{i,j} {{!}} t_{i,j-1};\psi_i) &amp;=&amp; \prob{T_{i,j} &gt; t_{i,j}\, {{!}} \,T_{i,j-1} = t_{i,j-1};\psi_i} \\<br /> &amp;=&amp; e^{-\cumhaz(t_{i,j-1},t_{i,j};\psi_i)} \\<br /> &amp;=&amp; \exp\left({-\int_{t_{i,j-1} }^{t_{i,j} } \hazard(t;\psi_i) \, dt}\right) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> &lt;!--%In the most simple case, $y_i$ is a vector of known event times: $y_i = (t_{i1},t_{i2},\ldots,t_{i\,n_i}).$ --&gt;<br /> <br /> <br /> &lt;br&gt;<br /> ==Censoring and probability distributions==<br /> <br /> <br /> Taking into account censoring for repeated events is slightly more complicated than for one-off events.<br /> First, let us assume that a trial starts at time $t_{start}$ and ends at time $t_{stop}$. Let $(T_{i1}, T_{i2}, \ldots )$ be random event times after $t_{start}$. Then, we can distinguish between the two following situations:<br /> <br /> <br /> <br /> &lt;ul&gt;<br /> 1. ''Exactly observed events:'' A sequence of $n_i$ event times is precisely observed before $t_{stop}$, i.e., ${\rm y_i = (t_{i,1},t_{i,2},\ldots,t_{i,n_i}, \quad t_{i,n_i+1}&gt;\tstop)}$. 
<br /> <br /> : The conditional pdf of $y_i$ is given by:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;repeatcensor&quot; &gt;&lt;math&gt; <br /> \pcyipsii(y_i {{!}} \psi_i) = \left(\prod_{j=1}^{n_i}\hazard(t_{i,j};\psi_i)e^{-\cumhaz(t_{i,j-1},t_{i,j};\psi_i)} \right)e^{-\cumhaz(t_{i,n_i},\tstop;\psi_i)} ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> : where $t_{i,0}=\tstart$.<br /> &lt;/ul&gt;<br /> <br /> {{ExampleWith2Tables<br /> |title1=Example<br /> |title2=<br /> |text=<br /> Suppose that for individual $i=1$ we know there were 8 events but only 7 of them occurred before $\tstop$. Here is a graphic showing the events that were exactly observed:<br /> <br /> <br /> ::[[File:survival4.png|link=]]<br /> <br /> <br /> This data is then stored in the table on the left below. We see that the 8th and final event is recorded as &quot;event = 0&quot; at time $\tstop = 18$, indicating that it had not yet occurred when the trial ended. In the table on the right, we show the contributions of each observation to the conditional pdf of $y_1$. 
Indeed, equation [[#repeatcensor|(1)]] means that the pdf of $y_1=(y_{1,1}, \ldots, y_{1,8})$ is the product of the conditional pdfs given in the right table.<br /> <br /> <br /> |table1=<br /> {{{!}} class=&quot;wikitable&quot; align=&quot;center&quot; style=&quot;width:120%; margin-left:10%;margin-right:10%&quot;<br /> !{{!}} ID {{!}}{{!}} TIME {{!}}{{!}} EVENT <br /> {{!}}-<br /> {{!}}1 {{!}}{{!}} 0 {{!}}{{!}} 0 <br /> {{!}}-<br /> {{!}}1 {{!}}{{!}} 1.4 {{!}}{{!}} 1 <br /> {{!}}-<br /> {{!}}1 {{!}}{{!}} 3.5 {{!}}{{!}} 1 <br /> {{!}}-<br /> {{!}}1 {{!}}{{!}} 4.4 {{!}}{{!}} 1 <br /> {{!}}-<br /> {{!}}1 {{!}}{{!}} 5.6 {{!}}{{!}} 1 <br /> {{!}}-<br /> {{!}}1 {{!}}{{!}} 9.7 {{!}}{{!}} 1 <br /> {{!}}-<br /> {{!}}1 {{!}}{{!}} 11.4 {{!}}{{!}} 1 <br /> {{!}}-<br /> {{!}}1 {{!}}{{!}} 15.8 {{!}}{{!}} 1 <br /> {{!}}-<br /> {{!}}1 {{!}}{{!}} 18 {{!}}{{!}} 0 <br /> {{!}}}<br /> <br /> |table2 =<br /> {{{!}} class=&quot;wikitable&quot; align=&quot;center&quot; style=&quot;width:200%; margin-right:10%; margin-left:10%&quot;<br /> !{{!}} pdf <br /> {{!}}-<br /> {{!}} 1 <br /> {{!}}-<br /> {{!}} $\hazard(1.4;\psi_1)e^{-\cumhaz(0,1.4;\psi_1)}$<br /> {{!}}-<br /> {{!}} $\hazard(3.5;\psi_1)e^{-\cumhaz(1.4,3.5;\psi_1)}$<br /> {{!}}-<br /> {{!}} $\hazard(4.4;\psi_1)e^{-\cumhaz(3.5,4.4;\psi_1)}$<br /> {{!}}-<br /> {{!}} $\hazard(5.6;\psi_1)e^{-\cumhaz(4.4,5.6;\psi_1)}$<br /> {{!}}-<br /> {{!}} $\hazard(9.7;\psi_1)e^{-\cumhaz(5.6,9.7;\psi_1)}$<br /> {{!}}-<br /> {{!}} $\hazard(11.4;\psi_1)e^{-\cumhaz(9.7,11.4;\psi_1)}$<br /> {{!}}-<br /> {{!}} $\hazard(15.8;\psi_1)e^{-\cumhaz(11.4,15.8;\psi_1)}$<br /> {{!}}-<br /> {{!}} $e^{-\cumhaz(15.8,18;\psi_1)}$<br /> {{!}}}<br /> }}<br /> <br /> <br /> &lt;ul&gt;<br /> 2. ''Interval-censored events:'' Let $(b_{0}, b_1], (b_{1}, b_2], \ldots , (b_{K-1}, b_K]$ be a sequence of successive intervals with $\tstart=b_0&lt;b_1&lt;b_2 &lt; \ldots &lt;b_K = \tstop$. 
We do not know the exact event times, but a sequence $(m_{ik}; \, 1 \leq k \leq K)$ is observed, where $m_{ik}$ is the number of events that occurred for individual $i$ in interval $(b_{k-1}, b_k]$.<br /> <br /> : We can show that the conditional pdf of $y_i$ is given by:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;pdf_mult_int&quot; &gt;&lt;math&gt;<br /> \pcyipsii(y_i {{!}} \psi_i) = \prod_{k=1}^{K} e^{-\cumhaz(b_{k-1}, b_k;\psi_i)} \displaystyle{\frac{\cumhaz^{m_{ik} }(b_{k-1}, b_k;\psi_i)}{m_{ik}!} } .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> : In other words, the events for individual $i$ follow a (possibly non-homogeneous) Poisson process: the number of events in interval $(b_{k-1}, b_k]$ has a Poisson distribution with mean $\cumhaz(b_{k-1}, b_k;\psi_i)$.<br /> &lt;/ul&gt;<br /> <br /> <br /> {{ExampleWith2Tables<br /> |title1=Example<br /> |title2=<br /> <br /> |text= Here is a graphic that shows an example of the interval boundaries and the number of events that occurred in each interval for individual $i=1$.<br /> <br /> <br /> ::[[File:survival5.png|link=]]<br /> <br /> <br /> The table on the left below shows the same data. 
Using [[#pdf_mult_int|(2)]] we see that the conditional pdf of $y_1=(y_{1,1}, \ldots, y_{1,6})$ is the product of the conditional pdfs given in the table on the right.<br /> <br /> <br /> |table1=<br /> {{{!}} class=&quot;wikitable&quot; align=&quot;center&quot; style=&quot;width:120%; margin-left:10%;margin-right:10%&quot;<br /> !{{!}} ID {{!}}{{!}} TIME {{!}}{{!}} EVENT <br /> {{!}}-<br /> {{!}}1 {{!}}{{!}} 0 {{!}}{{!}} 0 <br /> {{!}}-<br /> {{!}}1 {{!}}{{!}} 3 {{!}}{{!}} 1 <br /> {{!}}-<br /> {{!}}1 {{!}}{{!}} 6 {{!}}{{!}} 3 <br /> {{!}}-<br /> {{!}}1 {{!}}{{!}} 9 {{!}}{{!}} 0 <br /> {{!}}-<br /> {{!}}1 {{!}}{{!}} 12 {{!}}{{!}} 2 <br /> {{!}}-<br /> {{!}}1 {{!}}{{!}} 15 {{!}}{{!}} 0 <br /> {{!}}-<br /> {{!}}1 {{!}}{{!}} 18 {{!}}{{!}} 1 <br /> {{!}}}<br /> <br /> |table2=<br /> {{{!}} class=&quot;wikitable&quot; align=&quot;center&quot; style=&quot;width:200%; margin-right:10%; margin-left:10% &quot;<br /> !{{!}} pdf <br /> {{!}}-<br /> {{!}} 1 <br /> {{!}}-<br /> {{!}} $e^{-\cumhaz(0,3;\psi_1)}\cumhaz(0,3;\psi_1)$<br /> {{!}}-<br /> {{!}} $e^{-\cumhaz(3,6;\psi_1)} {\cumhaz^{3}(3,6;\psi_1)}/{6}$<br /> {{!}}-<br /> {{!}} $e^{-\cumhaz(6,9;\psi_1)}$<br /> {{!}}-<br /> {{!}} $e^{-\cumhaz(9,12;\psi_1)} {\cumhaz^{2}(9,12;\psi_1)}/{2}$<br /> {{!}}-<br /> {{!}} $e^{-\cumhaz(12,15;\psi_1)}$<br /> {{!}}-<br /> {{!}} $e^{-\cumhaz(15,18;\psi_1)}\cumhaz(15,18;\psi_1)$<br /> {{!}}}<br /> }}<br /> <br /> <br /> {{Remarks<br /> |title=Remark<br /> |text= If the total number $n_i$ of (observed and unobserved) events for individual $i$ is known to be finite, then formula [[#pdf_mult_int|(2)]] is slightly modified when the last event occurs before $\tstop$ ($t_{i,n_i}&lt;\tstop$).<br /> Assume that the last event for individual $i$ occurs in the $K_i$-th interval. Let $s_{i} = \sum_{k=1}^{K_i-1} m_{ik}$ be the number of events that occurred before this interval. 
Then, we can show that<br /> <br /> {{EquationWithRef_Special<br /> |equation=&lt;div id=&quot;pdf_mult_int2&quot;&gt;&lt;math&gt;<br /> \pcyipsii(y_i {{!}} \psi_i) = \prod_{k=1}^{K_i-1} \left( \displaystyle{ \frac{\cumhaz^{m_{ik} }(b_{k-1}, b_k;\psi_i)}{m_{ik}!} }e^{-\cumhaz(b_{k-1}, b_k;\psi_i)} \right)<br /> \!\times \!\left(1 - \sum_{\ell=0}^{n_i-s_{i} } \displaystyle{ \frac{\cumhaz^{\ell}(b_{K_i -1},b_{K_i};\psi_i)}{\ell!} } e^{-\cumhaz(b_{K_i -1},b_{K_i};\psi_i)}\right) . &lt;/math&gt;&lt;/div&gt;<br /> |reference=(3) }}<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Examples of hazard functions==<br /> <br /> <br /> <br /> &lt;ul&gt;<br /> * ''Constant hazard model:'' <br /> : The simplest case is that of a constant hazard function: $\hazard(t;\psi_i) = \hazard_i &gt; 0$. Here, $\psi_i=\hazard_i$. <br /> &lt;br&gt;<br /> <br /> <br /> * ''Proportional hazards model:''<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \hazard(t;\psi_i) = \hazard_0(t;\alpha_i) \, e^{ \langle \beta , c_i \rangle}.<br /> &lt;/math&gt;}}<br /> <br /> : Here, the hazard is decomposed into two terms: a baseline function $\hazard_0$ of $t$, and an &quot;individual&quot; term that is a function of individual covariates $c_i$. $\langle \beta , c_i \rangle$ denotes a scalar product, i.e., a linear function of $c_i$. In a proportional hazards model, a unit increase in the value of a covariate has a multiplicative effect on the hazard.<br /> <br /> : In the usual proportional hazards model, $\alpha_i$ is a population constant ($\alpha_i=\alpha$). Then, $\psi_i$ can be decomposed into a set of population parameters $\alpha$ and an individual parameter $\langle \beta , c_i \rangle$. 
A straightforward extension consists in assuming that $\alpha_i$ is also an individual parameter.<br /> &lt;br&gt;<br /> <br /> <br /> * ''Extended proportional hazards model:''<br /> <br /> : Another possible extension assumes that the hazard function is a (possibly nonlinear) function $u$ of a regression variable $x_i$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \hazard(t;\psi_i) = \hazard_0(t;\alpha_{i}) \, e^{ u(\beta_i,x_i(t))} .<br /> &lt;/math&gt; }}<br /> <br /> :Consider for example that $x_i(t)$ is the plasma concentration of a drug at time $t$ for individual $i$. Then, $u(\beta_i,x_i(t))$ is the term that represents (i.e., models) the effect of the drug on the hazard, while $\hazard_0(t;\alpha_i)$ might model the effect of disease progression on the hazard.<br /> &lt;!--%We consider here parametric functions that possibly depend on individual parameters.--&gt;<br /> <br /> : In this example, $x_i(t)$ is the &quot;true&quot; plasma concentration for subject $i$ at time $t$, and it is a continuous function of time. However, in practice it is only measured at a finite number of time points, so a longitudinal model for plasma concentration is needed to give a concentration value for each $t$.<br /> :Therefore, in practice we need to develop a ''joint model'' in order to simultaneously model time-to-event data and longitudinal data. Such an approach is introduced in the [[Joint models]] section.<br /> &lt;br&gt;<br /> <br /> <br /> * ''Accelerated failure time (AFT) model:''<br /> <br /> :Unlike proportional hazards models, the AFT model assumes that a change in a covariate has a multiplicative effect not on the hazard but on the ''event time'' itself. This can be written as:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \log(T_i) = \langle \psi_i , c_i \rangle + \xi_i<br /> &lt;/math&gt;<br /> }}<br /> <br /> : where $\xi_i$ is a zero-mean random variable, e.g., one with a centered normal distribution. 
Usually, parameters are fixed effects: $\psi_i=\psi$ for each subject $i$.<br /> : To calculate the hazard function, let $p_{\xi_i}$ denote the density and $F_{\xi_i}$ the cdf of $\xi_i$, and, to simplify, let $\mu_i = \langle \psi_i , c_i \rangle$ denote the mean of $\log(T_i)$. We begin by calculating the survival function:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> S(t;\psi_i) &amp;=&amp; \prob{\log{T_i} &gt; \log{t} ; \psi_i} \\<br /> &amp;=&amp; \int_{\log{t}-\mu_i}^{\infty} p_{\xi_i}(u; \psi_i) \, du \\<br /> &amp;=&amp; 1 - F_{\xi_i}(\log{t}-\mu_i ; \psi_i) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> :Applying [[#HazardSurvival|(1)]] then gives the hazard function:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \hazard(t;\psi_i) = \displaystyle{ \frac{p_{\xi_i}(\log{t} - \mu_i; \psi_i)}{t(1- F_{\xi_i}(\log{t} - \mu_i; \psi_i))} } .<br /> &lt;/math&gt; }}<br /> <br /> &lt;br&gt;&lt;br&gt;<br /> -------<br /> &lt;br&gt;&lt;br&gt;<br /> <br /> {{Summary <br /> |title=Summary<br /> |text=<br /> For a given vector of individual parameters $\psi_i$, a model for (repeated) time-to-event data is completely defined by<br /> <br /> <br /> &lt;ol&gt;<br /> &lt;li&gt; the hazard function $\hazard(t ; \psi_i)$, or the survival function $S(t ; \psi_i)$ &lt;/li&gt;<br /> <br /> &lt;li&gt; (possibly) the interval and/or right censoring process &lt;/li&gt;<br /> <br /> &lt;li&gt; (possibly) the maximum number of possible events &lt;/li&gt;<br /> &lt;/ol&gt; }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> &lt;!--<br /> ==$\mlxtran$ for time-to-event data models==<br /> --&gt;<br /> <br /> &lt;br&gt;<br /> <br /> ==Bibliography==<br /> <br /> &lt;bibtex&gt;<br /> @book{aalen2008,<br /> author = {Aalen, O. and Borgan, O. and Gjessing, H.},<br /> title = {Survival and Event History Analysis
},<br /> publisher = {Springer},<br /> address = {New York},<br /> year = {2008}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{andersen2006survival,<br /> title={Survival analysis},<br /> author={Andersen, P. K.},<br /> year={2006},<br /> publisher={Wiley Online Library}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{diggle1994,<br /> author = {Diggle, P. and Kenward, M. G.},<br /> title = {Informative drop-out in longitudinal data analysis},<br /> journal = {Appl. Stats},<br /> volume = {43},<br /> pages = {49-93},<br /> year = {1994}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> &lt;bibtex&gt;<br /> @book{duchateau2008,<br /> author = {Duchateau, L. and Janssen, P.},<br /> title = {The Frailty Model},<br /> series = {Statistics for Biology and Health},<br /> publisher = {Springer},<br /> year = {2008},<br /> address = {New York}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{fleming2011counting,<br /> title={Counting processes and survival analysis},<br /> author={Fleming, T. R. and Harrington, D. P.},<br /> volume={169},<br /> year={2011},<br /> publisher={Wiley}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{huang2007,<br /> author = {Huang, X. and Liu, L.},<br /> title = {A joint frailty model for survival and gap times between recurrent events},<br /> journal = {Biometrics},<br /> volume = {63},<br /> pages = {389-397},<br /> year = {2007}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{ibrahim2005bayesian,<br /> title={Bayesian survival analysis},<br /> author={Ibrahim, J. G. and Chen, M.-H. and Sinha, D.},<br /> year={2005},<br /> publisher={Wiley Online Library}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{kalbfleisch2011statistical,<br /> title={The statistical analysis of failure time data},<br /> author={Kalbfleisch, J. D. 
and Prentice, R. L.},<br /> year={2011},<br /> publisher={Wiley-Interscience}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{kelly2000,<br /> author = {Kelly, P. J. and Lim, L. L.},<br /> title = {Survival analysis for recurrent event data: an application to childhood infectious disease},<br /> journal = {Statistics in Medicine},<br /> volume = {19},<br /> number = {1},<br /> pages = {13-33},<br /> year = {2000}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{klein2003survival,<br /> title={Survival analysis: techniques for censored and truncated data},<br /> author={Klein, J. P. and Moeschberger, M. L.},<br /> year={2003},<br /> publisher={Springer}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{klein1997,<br /> author = {Klein, J. P. and Moeschberger, M. L.},<br /> title = {Survival Analysis - Techniques for Censored and Truncated Data},<br /> publisher = {Springer-Verlag},<br /> year = {1997},<br /> address = {New York}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{kleinbaum2011survival,<br /> title={Survival analysis},<br /> author={Kleinbaum, D. G.},<br /> year={2011},<br /> publisher={Springer}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{littell2006sas,<br /> title={SAS for mixed models},<br /> author={Littell, R. C.},<br /> year={2006},<br /> publisher={SAS institute}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{miller2011survival,<br /> title={Survival analysis},<br /> author={Miller Jr, R. 
G.},<br /> year={2011},<br /> publisher={Wiley-Interscience}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{wienke2010frailty,<br /> title={Frailty models in survival analysis},<br /> author={Wienke, A.},<br /> volume={37},<br /> year={2010},<br /> publisher={Chapman &amp; Hall}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> <br /> <br /> {{Back&amp;Next<br /> |linkBack=Model for categorical data<br /> |linkNext=Joint models }}</div> Admin http://wiki.webpopix.org/index.php/Model_for_categorical_data Model for categorical data 2013-06-07T13:47:16Z <p>Admin : /* Continuous time Markov chains */</p> <hr /> <div><br /> <br /> == Overview == <br /> <br /> Assume now that the observed data takes its values in a fixed and finite set of nominal categories $\{c_1, c_2,\ldots , c_K\}$.<br /> Considering the observations $(y_{ij}, 1 \leq j \leq n_i)$ of any individual $i$ as a sequence of independent random variables, the model is completely defined by the probability mass functions $\prob{y_{ij}=c_k | \psi_i}$, for $k=1,\ldots, K$ and $1 \leq j \leq n_i$.<br /> <br /> For a given $(i,j)$, the sum of the $K$ probabilities is 1, so in fact only $K-1$ of them need to be defined.<br /> <br /> In the most general way possible, any model can be considered so long as it defines a probability distribution, i.e., for each $k$, $\prob{y_{ij}=c_k | \psi_i} \in [0,1]$, and $\sum_{k=1}^{K} \prob{y_{ij}=c_k | \psi_i} = 1$. 
For instance, we could define $K$ time-dependent parametric functions $a_1$, $a_2$, ..., $a_K$ and set for any individual $i$, time $t_{ij}$ and $k \in \{1,\ldots,K\}$,<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;categorical1&quot; &gt;&lt;math&gt; <br /> \prob{y_{ij}=c_k {{!}} \psi_i} = \displaystyle{\frac{e^{a_k(t_{ij},\psi_i)} }{\sum_{m=1}^K e^{a_m(t_{ij},\psi_i)} } }. &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= Suppose we want to model binary data, i.e., data where $y_{ij} \in \{0,1\}$.<br /> <br /> Let $\psi_i=(\alpha_i,\beta_i)$ and let $a_1(t,\psi_i)=0$ and $a_2(t,\psi_i) = \alpha_i + \beta_i \, t$. Then, [[#categorical1|(1)]] gives a probability distribution for binary outcomes:<br /> <br /> {{Equation1|equation= &lt;math&gt;<br /> \prob{y_{ij}=0 {{!}} \psi_i} = \displaystyle{\frac{1}{1 + e^{\alpha_i + \beta_i \, t_{ij} } } } \quad \ \ \ \text{and} \quad<br /> \ \ \ \prob{y_{ij}=1 {{!}} \psi_i} = \displaystyle{\frac{e^{\alpha_i + \beta_i \, t_{ij} } }{1 + e^{\alpha_i + \beta_i \, t_{ij} } } }. 
<br /> &lt;/math&gt;}}<br /> }}<br /> <br /> <br /> Such parametrizations are extremely flexible and easy to interpret in simple situations.<br /> In the previous example for instance, $\prob{y_{ij}=1 | \psi_i}$ and $a_2(t_{ij},\psi_i)$ move in the same direction as time increases.<br /> <br /> <br /> &lt;br&gt;<br /> == Ordinal data ==<br /> <br /> <br /> For ordinal data, we further assume that the categories are ordered, i.e., there exists an order $\prec$ such that<br /> <br /> {{Equation1|equation=&lt;math&gt;<br /> c_1 \prec c_2 \prec \ldots \prec c_K .<br /> &lt;/math&gt;}}<br /> <br /> We can think for instance of levels of pain (low, moderate, severe), or any scores on a discrete scale, e.g., from 1 to 10.<br /> <br /> Instead of defining the probabilities of each category, it may be convenient to define the cumulative probabilities $\prob{y_{ij} \preceq c_k | \psi_i}$ for $k=1,\ldots ,K-1$, or in the other direction: $\prob{y_{ij} \succeq c_k | \psi_i}$ for $k=2,\ldots, K$. <br /> Any model is possible as long as it defines a probability distribution, i.e., satisfies:<br /> <br /> {{Equation1|equation=&lt;math&gt;<br /> 0 \leq \prob{y_{ij} \preceq c_1 {{!}} \psi_i} \leq \prob{y_{ij} \preceq c_2 {{!}} \psi_i} \leq \ldots \leq \prob{y_{ij} \preceq c_K {{!}} \psi_i} =1 .<br /> &lt;/math&gt; }}<br /> <br /> Without any loss of generality, we will consider numerical categories in what follows. The order $\prec$ then reduces to the usual order $&lt;$ on $\Rset$.<br /> Currently, the most popular model for ordinal data is the proportional odds model, which uses ''logits'' of these cumulative probabilities, also called ''cumulative logits''. 
We assume that there exist $\alpha_{i,1} \in \Rset$ and $\alpha_{i,2}\geq 0, \ldots , \alpha_{i,K-1}\geq 0$ such that for $k=1,2,\ldots,K-1$,<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;propodds_model&quot;&gt;&lt;math&gt; \logit \left(\prob{y_{ij} \leq c_k {{!}} \psi_i} \right) = \left( \sum_{m=1}^k \alpha_{im}\right) + \beta_i \, x(t_{ij}) ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> where $x(t_{ij})$ is a vector of regression variables and $\beta_i$ a vector of coefficients. Here, $\psi_i=(\alpha_{i,1},\alpha_{i,2},\ldots,\alpha_{i,K-1},\beta_i)$. The nonnegativity of $\alpha_{i,2},\ldots,\alpha_{i,K-1}$ ensures that the cumulative probabilities increase with $k$.<br /> <br /> Recall that $\logit(p) = \log\left(p/(1-p)\right)$. Then, the probability defined in [[#propodds_model|(2)]] can also be expressed as<br /> <br /> {{Equation1|equation=&lt;math&gt;<br /> \prob{y_{ij} \leq c_k {{!}} \psi_i} = \displaystyle{\frac{1}{1 + e^{ -\left(\sum_{m=1}^k \alpha_{im}\right) - \beta_i \, x(t_{ij})} } }.<br /> &lt;/math&gt;}} <br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= We give patients a drug which is supposed to decrease the level of a given type of pain. <br /> The level of pain is measured on a scale from 1 to 3: 1=low, 2=moderate, 3=high. We consider the following model with the constraint that $\alpha_{i,2}\geq 0$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \logit \left(\prob{y_{ij} \leq 1 {{!}} \psi_i}\right) &amp;=&amp; \alpha_{i,1} + \beta_{i,1}\, t_{ij} + \beta_{i,2}\, C_{ij} \\<br /> \logit \left(\prob{y_{ij} \leq 2 {{!}} \psi_i}\right) &amp;=&amp; \alpha_{i,1} + \alpha_{i,2} + \beta_{i,1}\, t_{ij} + \beta_{i,2}\, C_{ij} \\<br /> \prob{y_{ij} \leq 3 {{!}} \psi_i} &amp;=&amp; 1,<br /> \end{eqnarray}&lt;/math&gt; }} <br /> <br /> where $C_{ij}$ is the concentration of the drug at time $t_{ij}$. 
The model parameters are quite easy to explain:<br /> <br /> <br /> * $\beta_{i,1}=0$ means that without treatment, the level of pain tends to remain stable over time.<br /> * $\beta_{i,1}&lt;0$ (resp. $\beta_{i,1}&gt;0$) means that the pain tends to increase (resp. decrease) over time.<br /> * $\beta_{i,2}=0$ means that the drug has no effect on pain.<br /> * $\beta_{i,2}&gt;0$ means that the level of pain tends to decrease when the drug concentration increases, whereas $\beta_{i,2}&lt;0$ means that increased pain is an adverse effect of the drug.<br /> }}<br /> <br /> <br /> {{Remarks<br /> |title=Remarks<br /> |text= Exclusive use of linear models (or generalized linear models) has no real justification today since very efficient tools are available for nonlinear models.<br /> Model [[#propodds_model|(2)]] can be easily extended to a nonlinear model:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;propodds_model2&quot;&gt;&lt;math&gt; \logit \left(\prob{y_{ij} \leq k {{!}} \psi_i } \right) = \sum_{m=1}^k \alpha_{i,m} + \beta(x(t_{ij})) , &lt;/math&gt;&lt;/div&gt;<br /> |reference=(3) }}<br /> <br /> where $\beta$ is any (linear or nonlinear) function of $x(t_{ij})$. }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> == Markovian dependence ==<br /> <br /> <br /> For the sake of simplicity, we will assume here that the observations $(y_{ij})$ take their values in $\{1, 2, \ldots, K\}$.<br /> <br /> We have so far assumed that the categorical observations $(y_{ij},\,j=1,2,\ldots,n_i)$ for individual $i$ are independent. It is however possible to introduce dependency between observations from the same individual by assuming that $(y_{ij},\,j=1,2,\ldots,n_i)$ forms a [http://en.wikipedia.org/wiki/Markov_chain Markov chain]. For instance, a [http://en.wikipedia.org/wiki/Markov_chain Markov chain] with memory 1 assumes that all that is required from the past to determine the distribution of $y_{i,j}$ is the value of the previous observation $y_{i,j-1}$. 
That is, for all $k=1,2,\ldots ,K$,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \prob{y_{i,j} = k\, {{!}} \,y_{i,j-1}, y_{i,j-2}, y_{i,j-3},\ldots,\psi_i} = \prob{y_{i,j} = k {{!}} y_{i,j-1},\psi_i}.<br /> &lt;/math&gt; }}<br /> <br /> <br /> &lt;br&gt;<br /> === Discrete time Markov chains ===<br /> <br /> If the observation times are regularly spaced (constant length of time between successive observations), we can consider the observations $(y_{ij},\,j=1,2,\ldots,n_i)$ to be a discrete time [http://en.wikipedia.org/wiki/Markov_chain Markov chain]. Here, for each individual $i$, the probability distribution of the sequence $(y_{ij},\,j=1,2,\ldots,n_i)$ is defined by:<br /> <br /> <br /> &lt;ul&gt;<br /> * the distribution $\pi_{i,1} = (\pi_{i,1}^{k} , k=1,2,\ldots,K)$ of the first observation $y_{i,1}$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \pi_{i,1}^{k} = \prob{y_{i,1} = k {{!}} \psi_i} &lt;/math&gt; }}<br /> <br /> <br /> * the sequence of ''transition matrices'' $(Q_{i,j}, j=2,3,\ldots)$, where for each $j$, $Q_{i,j} = (q_{i,j}^{\ell,k}, 1\leq \ell,k \leq K)$ is a matrix of size $K \times K$ such that,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> q_{i,j}^{\ell,k} &amp;=&amp; \prob{y_{i,j} = k {{!}} y_{i,j-1}=\ell , \psi_i} \quad \text{ for all } (\ell,k),\\<br /> \sum_{k=1}^{K}q_{i,j}^{\ell,k} &amp;=&amp; 1 \quad \text{ for all } \ell.<br /> \end{eqnarray}&lt;/math&gt; }}<br /> &lt;/ul&gt;<br /> <br /> <br /> The conditional distribution of $y_i=(y_{i,j}, j=1,2,\ldots, n_i)$ is then well-defined:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pcyipsii(y_i {{!}} \psi_i) = \pmacro(y_{i,1}{{!}}\psi_i) \prod_{j=2}^{n_i} \pmacro(y_{i,j} {{!}} y_{i,j-1},\psi_i) .<br /> &lt;/math&gt; }}<br /> <br /> For a given individual $i$, $Q_{i,j}$ defines the transition probabilities between states at a given time $t_{ij}$:<br /> <br /> <br /> ::[[File:markov_1.png|link=]]<br /> <br /> <br /> Our model 
must therefore give, for each individual $i$, the distribution of first observation $(y_{i,1})$ and a description of how the transition probabilities evolve with time.<br /> <br /> The figure below shows several examples of simulated sequences coming from a model with 2 states defined by:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \logit\left(q_{i,j}^{1,2}\right) &amp;=&amp; a_i+b_i \, t_j \\<br /> \logit\left(q_{i,j}^{2,1}\right) &amp;=&amp; c_i+d_i \, t_j \\<br /> \prob{y_{i,1}=1} &amp;=&amp; 0.5 ,<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> where $t_j = j$.<br /> <br /> [[File:markov_2.png|link=]]<br /> <br /> In the first example (left), the logits of the transitions between states are constant ($b_i = d_i = 0$).<br /> Transition probabilities are therefore constant over time. Here, $q^{1,2}=1/(1+\exp(2.5))=0.0759$ and $q^{2,1}=1/(1+\exp(2))=0.1192$. As $q^{1,2}$ and $q^{2,1}$ are small with $q^{1,2}&lt;q^{2,1}$, transitions between the two states are rare, and a larger amount of time (on average) is spent in state 1. Indeed, the stationary distribution is the eigenvector of the transition matrix $P$: $\prob{y_{ij}=1}=0.611$ and $\prob{y_{ij}=2}=0.389$.<br /> The figure (left) displays the transition rates $q^{1,2}$ and $q^{2,1}$ as function of the time (top left) and two simulated sequences of states (centre and bottom left).<br /> <br /> In the second example (center), $b_i$ and $d_i$ are negative. This means that as time progresses, transitions from state 1 to 2 become rarer, and the same is true from 2 to 1.<br /> <br /> In the third example (right), now $b_i$ and $d_i$ are positive. This means that as time progresses, transitions from state 1 to 2 become more and more frequent, and also more frequent from 2 to 1.<br /> Note that the value of $a_i$ (resp. $c_i$) can be seen as the transition probability from state 1 to 2 (resp. 
2 to 1) at time $t=0$.<br /> <br /> Different choices can be made for defining an initial distribution $\pi_{i,1}$:<br /> <br /> <br /> &lt;ul&gt;<br /> * The initial state can be defined arbitrarily: $y_{i,1}=k_0$. This means that $\pi_{i,1}^{k_0} = 1$ and $\pi_{i,1}^{k} = 0$ for $k\neq k_0$.<br /> &lt;br&gt;<br /> <br /> * More generally, any simple probability distribution can be put on the choice of the initial state, e.g., the uniform distribution $\pi_{i,1}^{k} = 1/K$ for $k=1,2,\ldots , K$.<br /> &lt;br&gt;<br /> <br /> * If a transition matrix $Q_{i1}$ has been defined at time $t_1$, we might consider using its stationary distribution, i.e., taking for $\pi_{i,1}$ the solution to:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pi_{i,1} = \pi_{i,1} Q_{i1} .<br /> &lt;/math&gt; }}<br /> &lt;/ul&gt;<br /> <br /> <br /> &lt;br&gt;<br /> <br /> === Continuous time Markov chains ===<br /> <br /> <br /> <br /> The previous situation can be extended to the case where observation times are irregular, by modeling the<br /> sequence of states as a continuous-time [http://en.wikipedia.org/wiki/Markov_process Markov process]. The difference is that rather than transitioning to a new (possibly the same) state at each time step, the system remains in the current state for some random amount of time before transitioning. This process is now characterized by ''transition rates'' instead of transition probabilities:<br /> <br /> {{Equation1 <br /> |equation=&lt;math&gt;<br /> \prob{y_{i}(t+h) = k\, {{!}} \,y_{i}(t)=\ell , \psi_i} = h \, \rho_{i}^{\ell,k}(t) + o(h),\quad k \neq \ell .<br /> &lt;/math&gt; }}<br /> <br /> The probability that no transition happens between $t$ and $t+h$ is<br /> <br /> {{Equation1 <br /> |equation=&lt;math&gt;<br /> \prob{y_{i}(s) = \ell, \forall s\in(t, t+h) \ {{!}} \ y_{i}(t)=\ell , \psi_i} = e^{h \, \rho_{i}^{\ell,\ell}(t)} . 
<br /> &lt;/math&gt; }}<br /> <br /> <br /> &lt;br&gt;&lt;br&gt;<br /> ------------------------<br /> &lt;br&gt;&lt;br&gt;<br /> <br /> {{Summary<br /> |title=Summary<br /> |text= <br /> A model for independent categorical data is completely defined by:<br /> <br /> &lt;ul&gt;<br /> &lt;li&gt;The probability mass functions $\left(\prob{y_{ij} = k {{!}} \psi_i} \right)$<br /> &lt;li&gt; (or) the cumulative probability functions $\left(\prob{y_{ij} \leq c_k {{!}} \psi_i} \right)$ for ordinal data<br /> &lt;li&gt; (or) the cumulative logits $\left(\logit \left( \prob{y_{ij} \leq k {{!}} \psi_i} \right)\right)$ for a proportional odds model<br /> &lt;/ul&gt;<br /> <br /> <br /> A model for categorical data with Markovian dependency is completely defined by:<br /> <br /> <br /> &lt;ol&gt;<br /> &lt;li&gt; the probability transitions in the case of a discrete-time [http://en.wikipedia.org/wiki/Markov_chain Markov chain]&lt;/li&gt;<br /> <br /> &lt;li&gt; (or) the transition rates in the case of a continuous-time [http://en.wikipedia.org/wiki/Markov_process Markov process]&lt;/li&gt;<br /> <br /> &lt;li&gt; the probability distribution of the initial states&lt;/li&gt;<br /> &lt;/ol&gt;<br /> }}<br /> <br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> == $\mlxtran$ for categorical data models == <br /> <br /> <br /> <br /> {{ExampleWithCode<br /> |title1=Example 1:<br /> |title2= $\quad y_{ij} \in \{0, 1, 2\}$<br /> |text=<br /> <br /> |equation=&lt;math&gt; \begin{eqnarray}<br /> \psi_i &amp;=&amp; (V_i, k_i, \alpha_{0,i}, \alpha_{1,i}, \gamma_i) \\[0.2cm]<br /> D &amp;=&amp;100 \\<br /> C(t,\psi_i) &amp;=&amp; \frac{D_i}{V_i} e^{-k_i \, t} \\[0.2cm]<br /> \prob{y_{ij}\leq 0} &amp;=&amp; \alpha_{0,i} + \gamma_i \, C(t_{ij},\psi_i) \\<br /> \prob{y_{ij}\leq 1} &amp;=&amp; \alpha_{0,i} + \alpha_{1,i} + \gamma_i \, C(t_{ij},\psi_i) \\<br /> \prob{y_{ij}\leq 2} &amp;=&amp; 1<br /> \end{eqnarray}&lt;/math&gt;<br /> |code=<br /> {{MLXTranForTable<br /> |name=<br /> |text=<br 
/> &lt;pre style=&quot; background-color:#EFEFEF; border:none;&quot;&gt;<br /> INPUT:<br /> input = {V, k, alpha0, alpha1, gamma}<br /> <br /> EQUATION:<br /> D = 100<br /> C = D/V*exp(-k*t)<br /> p0 = alpha0 + gamma*C<br /> p1 = p0 + alpha1<br /> <br /> DEFINITION:<br /> y = {type=categorical,<br /> categories={0, 1, 2},<br /> P(y&lt;=0)=p0,<br /> P(y&lt;=1)=p1<br /> }<br /> &lt;/pre&gt; }}<br /> }}<br /> <br /> <br /> <br /> <br /> {{ExampleWithCode<br /> |title1=Example 2:<br /> |title2= $\quad$ 2-state discrete-time Markov chain<br /> |text=<br /> <br /> |equation=&lt;math&gt; \begin{eqnarray}<br /> \psi_i &amp;=&amp; (a_i,b_i,c_i,d_i) \\[0.2cm]<br /> \logit(p_{ij}^{12}) &amp;=&amp; a_i+b_i \, t_{ij} \\<br /> \logit(p_{ij}^{21}) &amp;=&amp; c_i+d_i \, t_{ij} \\<br /> \prob{y_{i,1}=1} &amp;=&amp; 0.5<br /> \end{eqnarray}&lt;/math&gt;<br /> |code=<br /> {{MLXTranForTable<br /> |name=<br /> |text=<br /> &lt;pre style=&quot; background-color:#EFEFEF; border:none;&quot;&gt; <br /> INPUT:<br /> input = {a, b, c, d}<br /> <br /> DEFINITION:<br /> Y = { type = categorical,<br /> categories = {1, 2},<br /> dependence = Markov<br /> P(Y_1=1) = 0.5<br /> logit(P(Y=2 | Y_p=1)) = a + b*t<br /> logit(P(Y=1 | Y_p=2)) = c + d*t<br /> }<br /> &lt;/pre&gt; }}<br /> }}<br /> <br /> <br /> <br /> {{ExampleWithCode<br /> |title1=Example 3:<br /> |title2= $\quad$ 2-state continuous-time Markov chain<br /> |text=<br /> <br /> |equation=&lt;math&gt; \begin{eqnarray}<br /> \psi_i &amp;=&amp; (a_i,b_i,c_i,d_i,\pi_i) \\[0.2cm]<br /> q_{i}^{12}(t) &amp;=&amp; e^{a_i+b_i \, t} \\<br /> q_{i}^{21}(t) &amp;=&amp; e^{c_i+d_i \, t} \\<br /> \prob{y_{i,1}=1} &amp;=&amp; \pi_i<br /> \end{eqnarray}&lt;/math&gt;<br /> |code=<br /> {{MLXTranForTable<br /> |name=<br /> |text=<br /> &lt;pre style=&quot; background-color:#EFEFEF; border:none;&quot;&gt; <br /> INPUT:<br /> input = {a, b, c, d, pi}<br /> <br /> DEFINITION:<br /> Y = { type = categorical,<br /> categories = {1, 2},<br /> dependence = 
Markov<br /> P(Y_1=1) = pi<br /> transitionRate(1,2) = exp(a + b*t)<br /> transitionRate(2,1) = exp(c + d*t)<br /> }<br /> &lt;/pre&gt; }}<br /> }}<br /> <br /> == Bibliography==<br /> <br /> &lt;bibtex&gt;<br /> @book{agresti2010analysis,<br /> title={Analysis of ordinal categorical data},<br /> author={Agresti, A.},<br /> volume={656},<br /> year={2010},<br /> publisher={Wiley}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{agresti2007introduction,<br /> title={An introduction to categorical data analysis},<br /> author={Agresti, A.},<br /> volume={423},<br /> year={2007},<br /> publisher={Wiley-Interscience}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{bolker2009generalized,<br /> title={Generalized linear mixed models: a practical guide for ecology and evolution},<br /> author={Bolker, B. M. and Brooks, M. E. and Clark, C. J. and Geange, S. W. and Poulsen, J. R. and Stevens, M. H. H. and White, J.-S. S. and others},<br /> journal={Trends in ecology &amp; evolution},<br /> volume={24},<br /> number={3},<br /> pages={127-135},<br /> year={2009},<br /> publisher={Elsevier Science}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{davidian1995,<br /> author = {Davidian, M. and Giltinan, D. M.},<br /> title = {Nonlinear Models for Repeated Measurements Data },<br /> publisher = {Chapman &amp; Hall.},<br /> address = {London},<br /> edition = {},<br /> year = {1995}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{jiang2007,<br /> author = {Jiang., J.},<br /> title = {Linear and Generalized Linear Mixed Models and Their Applications.},<br /> publisher = {Springer Series in Statistics},<br /> volume = {},<br /> pages = {},<br /> year = {2007},<br /> series = {},<br /> address = {New York},<br /> edition = {},<br /> month = {}<br /> }<br /> <br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{littell2006sas,<br /> title={SAS for mixed models},<br /> author={Littell, R. 
C.},<br /> year={2006},<br /> publisher={SAS institute}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{mcculloch2011generalized,<br /> title={Generalized, Linear, and Mixed Models},<br /> author={McCulloch, C. E. and Searle, S. R. and Neuhaus, J. M.},<br /> isbn={9781118209967},<br /> series={Wiley Series in Probability and Statistics},<br /> year={2011},<br /> publisher={Wiley}<br /> url={http://books.google.fr/books?id=kyvgyK\_sBlkC},<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{molenberghs2005models,<br /> title={Models for discrete longitudinal data},<br /> author={Molenberghs, G. and Verbeke, G.},<br /> year={2005},<br /> publisher={Springer}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{powers2008statistical,<br /> title={Statistical methods for categorical data analysis},<br /> author={Powers, D. A. and Xie, Y.},<br /> year={2008},<br /> publisher={Emerald Group Publishing}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{wolfinger1993generalized,<br /> title={Generalized linear mixed models a pseudo-likelihood approach},<br /> author={Wolfinger, R. 
and O'Connell, M.},<br /> journal={Journal of statistical Computation and Simulation},<br /> volume={48},<br /> number={3-4},<br /> pages={233-243},<br /> year={1993},<br /> publisher={Taylor &amp; Francis}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> <br /> {{Back&amp;Next<br /> |linkBack=Models for count data<br /> |linkNext=Models for time-to-event data }}</div> Admin http://wiki.webpopix.org/index.php/Model_for_categorical_data Model for categorical data 2013-06-07T13:46:24Z <p>Admin : </p> <hr /> <div>&lt;!-- Menu for the Observations chapter --&gt;<br /> &lt;sidebarmenu&gt;<br /> +[[Modeling the observations]]<br /> *[[Modeling the observations| Introduction ]] | [[ Continuous data models ]] | [[Models for count data]] | [[Model for categorical data]] | [[Models for time-to-event data ]] | [[Joint models]] <br /> &lt;/sidebarmenu&gt;<br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> == Overview == <br /> <br /> Assume now that the observed data takes its values in a fixed and finite set of nominal categories $\{c_1, c_2,\ldots , c_K\}$.<br /> Considering the observations $(y_{ij}, 1 \leq j \leq n_i)$ of any individual $i$ as a sequence of independent random variables, the model is completely defined by the probability mass functions $\prob{y_{ij}=c_k | \psi_i}$, for $k=1,\ldots, K$ and $1 \leq j \leq n_i$.<br /> <br /> For a given $(i,j)$, the sum of the $K$ probabilities is 1, so in fact only $K-1$ of them need to be defined.<br /> <br /> In the most general way possible, any model can be considered so long as it defines a probability distribution, i.e., for each $k$, $\prob{y_{ij}=c_k | \psi_i} \in [0,1]$, and $\sum_{k=1}^{K} \prob{y_{ij}=c_k | \psi_i} = 1$. 
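These two requirements are easy to check numerically for any candidate model. The sketch below is a minimal Python illustration (the probability values are arbitrary, chosen only for the example, and the tolerance is an assumption to absorb floating-point rounding):

```python
def is_valid_pmf(p, tol=1e-9):
    """Check that p = (P(y=c_1|psi), ..., P(y=c_K|psi)) defines a
    probability distribution: each mass lies in [0, 1] and the
    total mass is 1 (up to the tolerance `tol`)."""
    in_range = all(-tol <= pk <= 1 + tol for pk in p)
    normalized = abs(sum(p) - 1.0) <= tol
    return in_range and normalized

# Illustrative checks for K = 3 categories (arbitrary values):
print(is_valid_pmf([0.2, 0.5, 0.3]))   # valid: in [0,1] and sums to 1
print(is_valid_pmf([0.6, 0.6, -0.2]))  # invalid: one mass is negative
print(is_valid_pmf([0.3, 0.3, 0.3]))   # invalid: total mass is 0.9
```

A small tolerance is needed because parametric probabilities computed in floating point rarely sum to exactly 1.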
For instance, we could define $K$ time-dependent parametric functions $a_1$, $a_2$, ..., $a_K$ and set for any individual $i$, time $t_{ij}$ and $k \in \{1,\ldots,K\}$,<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;categorical1&quot; &gt;&lt;math&gt; <br /> \prob{y_{ij}=c_k {{!}} \psi_i} = \displaystyle{\frac{e^{a_k(t_{ij},\psi_i)} }{\sum_{m=1}^K e^{a_m(t_{ij},\psi_i)} } }. &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= Suppose we want to model binary data, i.e., data where $y_{ij} \in \{0,1\}$.<br /> <br /> Let $\psi_i=(\alpha_i,\beta_i)$ and let $a_1(t,\psi_i)=0$ and $a_2(t,\psi_i) = \alpha_i + \beta_i \, t$. Then, [[#categorical1|(1)]] gives a probability distribution for binary outcomes:<br /> <br /> {{Equation1|equation= &lt;math&gt;<br /> \prob{y_{ij}=0 {{!}} \psi_i} = \displaystyle{\frac{1}{1 + e^{\alpha_i + \beta_i \, t_{ij} } } } \quad \ \ \ \text{and} \quad<br /> \ \ \ \prob{y_{ij}=1 {{!}} \psi_i} = \displaystyle{\frac{e^{\alpha_i + \beta_i \, t_{ij} } }{1 + e^{\alpha_i + \beta_i \, t_{ij} } } }. 
<br /> &lt;/math&gt;}}<br /> }}<br /> <br /> <br /> Such parametrizations are extremely flexible and easy to interpret in simple situations.<br /> In the previous example for instance, $\prob{y_{ij}=1 | \psi_i}$ and $a_2(t_{ij},\psi_i)$ move in the same direction as time increases.<br /> <br /> <br /> &lt;br&gt;<br /> == Ordinal data ==<br /> <br /> <br /> Ordinal data further assumes that the categories are ordered, i.e., there exists an order $\prec$ such that<br /> <br /> {{Equation1|equation=&lt;math&gt;<br /> c_1 \prec c_2 \prec \ldots \prec c_K .<br /> &lt;/math&gt;}}<br /> <br /> We can think for instance of levels of pain (low, moderate, severe), or any scores on a discrete scale, e.g., from 1 to 10.<br /> <br /> Instead of defining the probabilities of each category, it may be convenient to define the cumulative probabilities $\prob{y_{ij} \preceq c_k | \psi_i}$ for $k=1,\ldots ,K-1$, or in the other direction: $\prob{y_{ij} \succeq c_k | \psi_i}$ for $k=2,\ldots, K$. <br /> Any model is possible as long as it defines a probability distribution, i.e., satisfies:<br /> <br /> {{Equation1|equation=&lt;math&gt;<br /> 0 \leq \prob{y_{ij} \preceq c_1 {{!}} \psi_i} \leq \prob{y_{ij} \preceq c_2 {{!}} \psi_i} \leq \ldots \leq \prob{y_{ij} \preceq c_K {{!}} \psi_i} =1 .<br /> &lt;/math&gt; }}<br /> <br /> Without any loss of generality, we will consider numerical categories in what follows. The order $\prec$ then reduces to the usual order $&lt;$ on $\Rset$.<br /> Currently, the most popular model for ordinal data is the proportional odds model, which uses ''logits'' of these cumulative probabilities, also called ''cumulative logits''. 
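Cumulative probabilities are convenient because the per-category masses can always be recovered by successive differences. Here is a minimal Python sketch (the numerical values are arbitrary illustrations, not taken from any model above):

```python
def pmf_from_cumulative(cum):
    """Recover (P(y = c_1), ..., P(y = c_K)) from the cumulative
    probabilities (P(y <= c_1), ..., P(y <= c_K)); the last entry
    of `cum` must be 1."""
    probs = []
    previous = 0.0
    for ck in cum:
        # P(y = c_k) = P(y <= c_k) - P(y <= c_{k-1})
        probs.append(ck - previous)
        previous = ck
    return probs

# Three ordered categories, e.g. pain levels 1 < 2 < 3 (arbitrary values):
print(pmf_from_cumulative([0.5, 0.8, 1.0]))
# approximately [0.5, 0.3, 0.2], up to floating-point rounding
```

The monotonicity constraint above is exactly what guarantees that each difference is nonnegative, so the recovered masses always form a valid distribution.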
We assume that there exist $\alpha_{i,1} \in \Rset$ and $\alpha_{i,2}\geq 0, \ldots , \alpha_{i,K-1}\geq 0$ such that for $k=1,2,\ldots,K-1$,<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;propodds_model&quot;&gt;&lt;math&gt; \logit \left(\prob{y_{ij} \leq c_k {{!}} \psi_i} \right) = \left( \sum_{m=1}^k \alpha_{i,m}\right) + \beta_i \, x(t_{ij}) ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> where $x(t_{ij})$ is a vector of regression variables and $\beta_i$ a vector of coefficients. Here, $\psi_i=(\alpha_{i,1},\alpha_{i,2},\ldots,\alpha_{i,K-1},\beta_i)$.<br /> <br /> Recall that $\logit(p) = \log\left(p/(1-p)\right)$. The probability defined in [[#propodds_model|(2)]] can therefore also be expressed as<br /> <br /> {{Equation1|equation=&lt;math&gt;<br /> \prob{y_{ij} \leq c_k {{!}} \psi_i} = \displaystyle{\frac{1}{1 + e^{ -\left( \left(\sum_{m=1}^k \alpha_{i,m}\right) + \beta_i \, x(t_{ij})\right)} } }.<br /> &lt;/math&gt;}} <br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= We give patients a drug which is supposed to decrease the level of a given type of pain. <br /> The level of pain is measured on a scale from 1 to 3: 1=low, 2=moderate, 3=high. We consider the following model, with the constraint that $\alpha_{i,2}\geq 0$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \logit \left(\prob{y_{ij} \leq 1 {{!}} \psi_i}\right) &amp;=&amp; \alpha_{i,1} + \beta_{i,1}\, t_{ij} + \beta_{i,2}\, C_{ij} \\<br /> \logit \left(\prob{y_{ij} \leq 2 {{!}} \psi_i}\right) &amp;=&amp; \alpha_{i,1} + \alpha_{i,2} + \beta_{i,1}\, t_{ij} + \beta_{i,2}\, C_{ij} \\<br /> \prob{y_{ij} \leq 3 {{!}} \psi_i} &amp;=&amp; 1,<br /> \end{eqnarray}&lt;/math&gt; }} <br /> <br /> where $C_{ij}$ is the concentration of the drug at time $t_{ij}$. The model parameters are easy to interpret:<br /> <br /> <br /> * $\beta_{i,1}=0$ means that without treatment, the level of pain tends to remain stable over time.<br /> * $\beta_{i,1}&lt;0$ (resp. $\beta_{i,1}&gt;0$) means that the pain tends to increase (resp. decrease) over time, since the probability $\prob{y_{ij} \leq 1 {{!}} \psi_i}$ of low pain then decreases (resp. increases) with time.<br /> * $\beta_{i,2}=0$ means that the drug has no effect on pain.<br /> * $\beta_{i,2}&gt;0$ means that the level of pain tends to decrease when the drug concentration increases, whereas $\beta_{i,2}&lt;0$ means that increased pain is an adverse drug effect.<br /> }}<br /> <br /> <br /> {{Remarks<br /> |title=Remarks<br /> |text= Exclusive use of linear models (or generalized linear models) has no real justification today, since very efficient tools are available for nonlinear models.<br /> Model [[#propodds_model|(2)]] can easily be extended to a nonlinear model:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;propodds_model2&quot;&gt;&lt;math&gt; \logit \left(\prob{y_{ij} \leq k {{!}} \psi_i } \right) = \sum_{m=1}^k \alpha_{i,m} + \beta(x(t_{ij})) , &lt;/math&gt;&lt;/div&gt;<br /> |reference=(3) }}<br /> <br /> where $\beta$ is any (linear or nonlinear) function of $x(t_{ij})$. }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> == Markovian dependence ==<br /> <br /> <br /> For the sake of simplicity, we will assume here that the observations $(y_{ij})$ take their values in $\{1, 2, \ldots, K\}$.<br /> <br /> We have so far assumed that the categorical observations $(y_{ij},\,j=1,2,\ldots,n_i)$ for individual $i$ are independent. It is however possible to introduce dependency between observations from the same individual by assuming that $(y_{ij},\,j=1,2,\ldots,n_i)$ forms a [http://en.wikipedia.org/wiki/Markov_chain Markov chain]. For instance, a [http://en.wikipedia.org/wiki/Markov_chain Markov chain] with memory 1 assumes that all that is required from the past to determine the distribution of $y_{i,j}$ is the value of the previous observation $y_{i,j-1}$, 
i.e., for all $k=1,2,\ldots ,K$,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \prob{y_{i,j} = k\, {{!}} \,y_{i,j-1}, y_{i,j-2}, y_{i,j-3},\ldots,\psi_i} = \prob{y_{i,j} = k {{!}} y_{i,j-1},\psi_i}.<br /> &lt;/math&gt; }}<br /> <br /> <br /> &lt;br&gt;<br /> === Discrete-time Markov chains ===<br /> <br /> If the observation times are regularly spaced (constant length of time between successive observations), we can consider the observations $(y_{ij},\,j=1,2,\ldots,n_i)$ to be a discrete-time [http://en.wikipedia.org/wiki/Markov_chain Markov chain]. Here, for each individual $i$, the probability distribution of the sequence $(y_{ij},\,j=1,2,\ldots,n_i)$ is defined by:<br /> <br /> <br /> &lt;ul&gt;<br /> * the distribution $\pi_{i,1} = (\pi_{i,1}^{k} , k=1,2,\ldots,K)$ of the first observation $y_{i,1}$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \pi_{i,1}^{k} = \prob{y_{i,1} = k {{!}} \psi_i} &lt;/math&gt; }}<br /> <br /> <br /> * the sequence of ''transition matrices'' $(Q_{i,j}, j=2,3,\ldots)$, where for each $j$, $Q_{i,j} = (q_{i,j}^{\ell,k}, 1\leq \ell,k \leq K)$ is a matrix of size $K \times K$ such that<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> q_{i,j}^{\ell,k} &amp;=&amp; \prob{y_{i,j} = k {{!}} y_{i,j-1}=\ell , \psi_i} \quad \text{ for all } (\ell,k),\\<br /> \sum_{k=1}^{K}q_{i,j}^{\ell,k} &amp;=&amp; 1 \quad \text{ for all } \ell.<br /> \end{eqnarray}&lt;/math&gt; }}<br /> &lt;/ul&gt;<br /> <br /> <br /> The conditional distribution of $y_i=(y_{i,j}, j=1,2,\ldots, n_i)$ is then well-defined:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pcyipsii(y_i {{!}} \psi_i) = \pmacro(y_{i,1}{{!}}\psi_i) \prod_{j=2}^{n_i} \pmacro(y_{i,j} {{!}} y_{i,j-1},\psi_i) .<br /> &lt;/math&gt; }}<br /> <br /> For a given individual $i$, $Q_{i,j}$ defines the transition probabilities between states at a given time $t_{ij}$:<br /> <br /> <br /> ::[[File:markov_1.png|link=]]<br /> <br /> <br /> Our model 
must therefore give, for each individual $i$, the distribution of the first observation $y_{i,1}$ and a description of how the transition probabilities evolve with time.<br /> <br /> The figure below shows several examples of simulated sequences coming from a model with two states defined by:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \logit\left(q_{i,j}^{1,2}\right) &amp;=&amp; a_i+b_i \, t_j \\<br /> \logit\left(q_{i,j}^{2,1}\right) &amp;=&amp; c_i+d_i \, t_j \\<br /> \prob{y_{i,1}=1} &amp;=&amp; 0.5 ,<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> where $t_j = j$.<br /> <br /> [[File:markov_2.png|link=]]<br /> <br /> In the first example (left), the logits of the transitions between states are constant ($b_i = d_i = 0$).<br /> Transition probabilities are therefore constant over time. Here, $q^{1,2}=1/(1+\exp(2.5))=0.0759$ and $q^{2,1}=1/(1+\exp(2))=0.1192$. As $q^{1,2}$ and $q^{2,1}$ are small with $q^{1,2}&lt;q^{2,1}$, transitions between the two states are rare, and a larger amount of time (on average) is spent in state 1. Indeed, the stationary distribution, i.e., the left eigenvector of the transition matrix associated with the eigenvalue 1, is $\prob{y_{ij}=1}=0.611$ and $\prob{y_{ij}=2}=0.389$.<br /> The figure (left) displays the transition probabilities $q^{1,2}$ and $q^{2,1}$ as functions of time (top left) and two simulated sequences of states (centre and bottom left).<br /> <br /> In the second example (centre), $b_i$ and $d_i$ are negative. This means that as time progresses, transitions from state 1 to 2 become rarer, and the same is true from 2 to 1.<br /> <br /> In the third example (right), $b_i$ and $d_i$ are positive. This means that as time progresses, transitions from state 1 to 2 become more and more frequent, and likewise from 2 to 1.<br /> Note that the value of $a_i$ (resp. $c_i$) can be seen as the logit of the transition probability from state 1 to 2 (resp. 
2 to 1) at time $t=0$.<br /> <br /> Different choices can be made for defining an initial distribution $\pi_{i,1}$:<br /> <br /> <br /> &lt;ul&gt;<br /> * The initial state can be defined arbitrarily: $y_{i,1}=k_0$. This means that $\pi_{i,1}^{k_0} = 1$ and $\pi_{i,1}^{k} = 0$ for $k\neq k_0$.<br /> &lt;br&gt;<br /> <br /> * More generally, any simple probability distribution can be put on the choice of the initial state, e.g., the uniform distribution $\pi_{i,1}^{k} = 1/K$ for $k=1,2,\ldots , K$.<br /> &lt;br&gt;<br /> <br /> * If a transition matrix $Q_{i,1}$ has been defined at time $t_1$, we might consider using its stationary distribution, i.e., taking for $\pi_{i,1}$ the solution to:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pi_{i,1} = \pi_{i,1} Q_{i,1} .<br /> &lt;/math&gt; }}<br /> &lt;/ul&gt;<br /> <br /> <br /> &lt;br&gt;<br /> <br /> === Continuous-time Markov chains ===<br /> <br /> <br /> <br /> The previous situation can be extended to the case where observation times are irregular by modeling the sequence of states as a continuous-time [http://en.wikipedia.org/wiki/Markov_process Markov process]. The difference is that rather than transitioning to a new (possibly the same) state at each time step, the system remains in the current state for some random amount of time before transitioning. This process is now characterized by ''transition rates'' instead of transition probabilities:<br /> <br /> {{Equation1 <br /> |equation=&lt;math&gt;<br /> \prob{y_{i}(t+h) = k\, {{!}} \,y_{i}(t)=\ell , \psi_i} = h \, \rho_{i}^{\ell,k}(t) + o(h),\quad k \neq \ell .<br /> &lt;/math&gt; }}<br /> <br /> Setting $\rho_{i}^{\ell,\ell}(t) = -\sum_{k \neq \ell} \rho_{i}^{\ell,k}(t)$, the probability that no transition happens between $t$ and $t+h$ is approximately<br /> <br /> {{Equation1 <br /> |equation=&lt;math&gt;<br /> \prob{y_{i}(s) = \ell, \forall s\in(t, t+h) \ {{!}} \ y_{i}(t)=\ell , \psi_i} = e^{h \, \rho_{i}^{\ell,\ell}(t)} . 
<br /> &lt;/math&gt; }}<br /> <br /> <br /> &lt;br&gt;&lt;br&gt;<br /> ------------------------<br /> &lt;br&gt;&lt;br&gt;<br /> <br /> {{Summary<br /> |title=Summary<br /> |text= <br /> A model for independent categorical data is completely defined by:<br /> <br /> &lt;ul&gt;<br /> &lt;li&gt;the probability mass functions $\left(\prob{y_{ij} = k {{!}} \psi_i} \right)$<br /> &lt;li&gt; (or) the cumulative probability functions $\left(\prob{y_{ij} \leq c_k {{!}} \psi_i} \right)$ for ordinal data<br /> &lt;li&gt; (or) the cumulative logits $\left(\logit \left( \prob{y_{ij} \leq k {{!}} \psi_i} \right)\right)$ for a proportional odds model<br /> &lt;/ul&gt;<br /> <br /> <br /> A model for categorical data with Markovian dependency is completely defined by:<br /> <br /> <br /> &lt;ol&gt;<br /> &lt;li&gt; the transition probabilities in the case of a discrete-time [http://en.wikipedia.org/wiki/Markov_chain Markov chain]&lt;/li&gt;<br /> <br /> &lt;li&gt; (or) the transition rates in the case of a continuous-time [http://en.wikipedia.org/wiki/Markov_process Markov process]&lt;/li&gt;<br /> <br /> &lt;li&gt; the probability distribution of the initial states&lt;/li&gt;<br /> &lt;/ol&gt;<br /> }}<br /> <br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> == $\mlxtran$ for categorical data models == <br /> <br /> <br /> <br /> {{ExampleWithCode<br /> |title1=Example 1:<br /> |title2= $\quad y_{ij} \in \{0, 1, 2\}$<br /> |text=<br /> <br /> |equation=&lt;math&gt; \begin{eqnarray}<br /> \psi_i &amp;=&amp; (V_i, k_i, \alpha_{0,i}, \alpha_{1,i}, \gamma_i) \\[0.2cm]<br /> D &amp;=&amp;100 \\<br /> C(t,\psi_i) &amp;=&amp; \frac{D}{V_i} e^{-k_i \, t} \\[0.2cm]<br /> \prob{y_{ij}\leq 0} &amp;=&amp; \alpha_{0,i} + \gamma_i \, C(t_{ij},\psi_i) \\<br /> \prob{y_{ij}\leq 1} &amp;=&amp; \alpha_{0,i} + \alpha_{1,i} + \gamma_i \, C(t_{ij},\psi_i) \\<br /> \prob{y_{ij}\leq 2} &amp;=&amp; 1<br /> \end{eqnarray}&lt;/math&gt;<br /> |code=<br /> {{MLXTranForTable<br /> |name=<br /> |text=<br 
&lt;pre style=&quot; background-color:#EFEFEF; border:none;&quot;&gt;<br /> INPUT:<br /> input = {V, k, alpha0, alpha1, gamma}<br /> <br /> EQUATION:<br /> D = 100<br /> C = D/V*exp(-k*t)<br /> p0 = alpha0 + gamma*C<br /> p1 = p0 + alpha1<br /> <br /> DEFINITION:<br /> y = {type=categorical,<br /> categories={0, 1, 2},<br /> P(y&lt;=0)=p0,<br /> P(y&lt;=1)=p1<br /> }<br /> &lt;/pre&gt; }}<br /> }}<br /> <br /> <br /> <br /> <br /> {{ExampleWithCode<br /> |title1=Example 2:<br /> |title2= $\quad$ 2-state discrete-time Markov chain<br /> |text=<br /> <br /> |equation=&lt;math&gt; \begin{eqnarray}<br /> \psi_i &amp;=&amp; (a_i,b_i,c_i,d_i) \\[0.2cm]<br /> \logit(p_{ij}^{12}) &amp;=&amp; a_i+b_i \, t_{ij} \\<br /> \logit(p_{ij}^{21}) &amp;=&amp; c_i+d_i \, t_{ij} \\<br /> \prob{y_{i,1}=1} &amp;=&amp; 0.5<br /> \end{eqnarray}&lt;/math&gt;<br /> |code=<br /> {{MLXTranForTable<br /> |name=<br /> |text=<br /> &lt;pre style=&quot; background-color:#EFEFEF; border:none;&quot;&gt; <br /> INPUT:<br /> input = {a, b, c, d}<br /> <br /> DEFINITION:<br /> Y = { type = categorical,<br /> categories = {1, 2},<br /> dependence = Markov<br /> P(Y_1=1) = 0.5<br /> logit(P(Y=2 | Y_p=1)) = a + b*t<br /> logit(P(Y=1 | Y_p=2)) = c + d*t<br /> }<br /> &lt;/pre&gt; }}<br /> }}<br /> <br /> <br /> <br /> {{ExampleWithCode<br /> |title1=Example 3:<br /> |title2= $\quad$ 2-state continuous-time Markov chain<br /> |text=<br /> <br /> |equation=&lt;math&gt; \begin{eqnarray}<br /> \psi_i &amp;=&amp; (a_i,b_i,c_i,d_i,\pi_i) \\[0.2cm]<br /> q_{i}^{12}(t) &amp;=&amp; e^{a_i+b_i \, t} \\<br /> q_{i}^{21}(t) &amp;=&amp; e^{c_i+d_i \, t} \\<br /> \prob{y_{i,1}=1} &amp;=&amp; \pi_i<br /> \end{eqnarray}&lt;/math&gt;<br /> |code=<br /> {{MLXTranForTable<br /> |name=<br /> |text=<br /> &lt;pre style=&quot; background-color:#EFEFEF; border:none;&quot;&gt; <br /> INPUT:<br /> input = {a, b, c, d, pi}<br /> <br /> DEFINITION:<br /> Y = { type = categorical,<br /> categories = {1, 2},<br /> dependence = 
Markov<br /> P(Y_1=1) = pi<br /> transitionRate(1,2) = exp(a + b*t)<br /> transitionRate(2,1) = exp(c + d*t)<br /> }<br /> &lt;/pre&gt; }}<br /> }}<br /> <br /> == Bibliography ==<br /> <br /> &lt;bibtex&gt;<br /> @book{agresti2010analysis,<br /> title={Analysis of ordinal categorical data},<br /> author={Agresti, A.},<br /> volume={656},<br /> year={2010},<br /> publisher={Wiley}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{agresti2007introduction,<br /> title={An introduction to categorical data analysis},<br /> author={Agresti, A.},<br /> volume={423},<br /> year={2007},<br /> publisher={Wiley-Interscience}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{bolker2009generalized,<br /> title={Generalized linear mixed models: a practical guide for ecology and evolution},<br /> author={Bolker, B. M. and Brooks, M. E. and Clark, C. J. and Geange, S. W. and Poulsen, J. R. and Stevens, M. H. H. and White, J.-S. S. and others},<br /> journal={Trends in Ecology &amp; Evolution},<br /> volume={24},<br /> number={3},<br /> pages={127-135},<br /> year={2009},<br /> publisher={Elsevier Science}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{davidian1995,<br /> author = {Davidian, M. and Giltinan, D. M.},<br /> title = {Nonlinear Models for Repeated Measurements Data},<br /> publisher = {Chapman &amp; Hall},<br /> address = {London},<br /> year = {1995}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{jiang2007,<br /> author = {Jiang, J.},<br /> title = {Linear and Generalized Linear Mixed Models and Their Applications},<br /> publisher = {Springer},<br /> series = {Springer Series in Statistics},<br /> address = {New York},<br /> year = {2007}<br /> }<br /> <br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{littell2006sas,<br /> title={SAS for mixed models},<br /> author={Littell, R. 
C.},<br /> year={2006},<br /> publisher={SAS Institute}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{mcculloch2011generalized,<br /> title={Generalized, Linear, and Mixed Models},<br /> author={McCulloch, C. E. and Searle, S. R. and Neuhaus, J. M.},<br /> isbn={9781118209967},<br /> series={Wiley Series in Probability and Statistics},<br /> year={2011},<br /> publisher={Wiley},<br /> url={http://books.google.fr/books?id=kyvgyK\_sBlkC}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{molenberghs2005models,<br /> title={Models for discrete longitudinal data},<br /> author={Molenberghs, G. and Verbeke, G.},<br /> year={2005},<br /> publisher={Springer}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{powers2008statistical,<br /> title={Statistical methods for categorical data analysis},<br /> author={Powers, D. A. and Xie, Y.},<br /> year={2008},<br /> publisher={Emerald Group Publishing}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{wolfinger1993generalized,<br /> title={Generalized linear mixed models: a pseudo-likelihood approach},<br /> author={Wolfinger, R. 
and O'Connell, M.},<br /> journal={Journal of statistical Computation and Simulation},<br /> volume={48},<br /> number={3-4},<br /> pages={233-243},<br /> year={1993},<br /> publisher={Taylor &amp; Francis}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> <br /> {{Back&amp;Next<br /> |linkBack=Models for count data<br /> |linkNext=Models for time-to-event data }}</div> Admin http://wiki.webpopix.org/index.php/Model_for_categorical_data Model for categorical data 2013-06-07T13:46:14Z <p>Admin : </p> <hr /> <div>&lt;!-- Menu for the Observations chapter --&gt;<br /> &lt;sidebarmenu&gt;<br /> +[[Modeling the observations]]<br /> *[[Modeling the observations| Introduction ]] | [[ Continuous data models ]] | [[Models for count data]] | [[Model for categorical data]] | [[Models for time-to-event data ]] | [[Joint models]] <br /> &lt;/sidebarmenu&gt;<br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> == Overview == <br /> <br /> Assume now that the observed data takes its values in a fixed and finite set of nominal categories $\{c_1, c_2,\ldots , c_K\}$.<br /> Considering the observations $(y_{ij}, 1 \leq j \leq n_i)$ of any individual $i$ as a sequence of independent random variables, the model is completely defined by the probability mass functions $\prob{y_{ij}=c_k | \psi_i}$, for $k=1,\ldots, K$ and $1 \leq j \leq n_i$.<br /> <br /> For a given $(i,j)$, the sum of the $K$ probabilities is 1, so in fact only $K-1$ of them need to be defined.<br /> <br /> In the most general way possible, any model can be considered so long as it defines a probability distribution, i.e., for each $k$, $\prob{y_{ij}=c_k | \psi_i} \in [0,1]$, and $\sum_{k=1}^{K} \prob{y_{ij}=c_k | \psi_i} = 1$. 
For instance, we could define $K$ time-dependent parametric functions $a_1$, $a_2$, ..., $a_K$ and set for any individual $i$, time $t_{ij}$ and $k \in \{1,\ldots,K\}$,<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;categorical1&quot; &gt;&lt;math&gt; <br /> \prob{y_{ij}=c_k {{!}} \psi_i} = \displaystyle{\frac{e^{a_k(t_{ij},\psi_i)} }{\sum_{m=1}^K e^{a_m(t_{ij},\psi_i)} } }. &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= Suppose we want to model binary data, i.e., data where $y_{ij} \in \{0,1\}$.<br /> <br /> Let $\psi_i=(\alpha_i,\beta_i)$ and let $a_1(t,\psi_i)=0$ and $a_2(t,\psi_i) = \alpha_i + \beta_i \, t$. Then, [[#categorical1|(1)]] gives a probability distribution for binary outcomes:<br /> <br /> {{Equation1|equation= &lt;math&gt;<br /> \prob{y_{ij}=0 {{!}} \psi_i} = \displaystyle{\frac{1}{1 + e^{\alpha_i + \beta_i \, t_{ij} } } } \quad \ \ \ \text{and} \quad<br /> \ \ \ \prob{y_{ij}=1 {{!}} \psi_i} = \displaystyle{\frac{e^{\alpha_i + \beta_i \, t_{ij} } }{1 + e^{\alpha_i + \beta_i \, t_{ij} } } }. 
<br /> &lt;/math&gt;}}<br /> }}<br /> <br /> <br /> Such parametrizations are extremely flexible and easy to interpret in simple situations.<br /> In the previous example, for instance, $\prob{y_{ij}=1 | \psi_i}$ and $a_2(t_{ij},\psi_i)$ move in the same direction as time increases.<br /> <br /> <br /> &lt;br&gt;<br /> == Ordinal data ==<br /> <br /> <br /> Ordinal data further assumes that the categories are ordered, i.e., there exists an order $\prec$ such that<br /> <br /> {{Equation1|equation=&lt;math&gt;<br /> c_1 \prec c_2 \prec \ldots \prec c_K .<br /> &lt;/math&gt;}}<br /> <br /> We can think for instance of levels of pain (low, moderate, severe), or any scores on a discrete scale, e.g., from 1 to 10.<br /> <br /> Instead of defining the probabilities of each category, it may be convenient to define the cumulative probabilities $\prob{y_{ij} \preceq c_k | \psi_i}$ for $k=1,\ldots ,K-1$, or in the other direction: $\prob{y_{ij} \succeq c_k | \psi_i}$ for $k=2,\ldots, K$. <br /> Any model is possible as long as it defines a probability distribution, i.e., satisfies:<br /> <br /> {{Equation1|equation=&lt;math&gt;<br /> 0 \leq \prob{y_{ij} \preceq c_1 {{!}} \psi_i} \leq \prob{y_{ij} \preceq c_2 {{!}} \psi_i} \leq \ldots \leq \prob{y_{ij} \preceq c_K {{!}} \psi_i} =1 .<br /> &lt;/math&gt; }}<br /> <br /> Without any loss of generality, we will consider numerical categories in what follows. The order $\prec$ then reduces to the usual order $&lt;$ on $\Rset$.<br /> Currently, the most popular model for ordinal data is the proportional odds model, which uses ''logits'' of these cumulative probabilities, also called ''cumulative logits''. 
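Before specifying the proportional odds model, the constructions above are easy to check numerically. Here is a minimal Python sketch (the parameter values are arbitrary illustrative choices, not taken from any real model): it evaluates the softmax parametrization (1), verifies that it reduces to the logistic probabilities of the binary example, and forms the cumulative probabilities used for ordinal data as partial sums of the category probabilities.

```python
import math

def category_probs(a_values):
    """Softmax over the values a_k(t, psi): equation (1)."""
    exps = [math.exp(a) for a in a_values]
    total = sum(exps)
    return [e / total for e in exps]

# Binary example: a_1 = 0, a_2 = alpha + beta*t
# (alpha, beta, t are arbitrary illustrative values)
alpha, beta, t = -1.0, 0.3, 2.0
p0, p1 = category_probs([0.0, alpha + beta * t])

# Matches the logistic expressions of the binary example
assert abs(p0 - 1.0 / (1.0 + math.exp(alpha + beta * t))) < 1e-12
assert abs(p0 + p1 - 1.0) < 1e-12

# For ordinal data, the cumulative probabilities P(y <= c_k) are the
# partial sums of the category probabilities, so they are nondecreasing
# in k and end at 1
probs = category_probs([0.2, -0.5, 1.0])
cum = [sum(probs[: k + 1]) for k in range(len(probs))]
assert all(cum[k] <= cum[k + 1] for k in range(len(cum) - 1))
assert abs(cum[-1] - 1.0) < 1e-12
```

This also illustrates why defining the $K-1$ cumulative probabilities (or their logits) is equivalent to defining the $K$ category probabilities: one set is recovered from the other by summing or differencing.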
We assume that there exist $\alpha_{i,1}$ and nonnegative $\alpha_{i,2}, \ldots , \alpha_{i,K-1}$ (nonnegative increments ensure that the cumulative probabilities increase with $k$) such that for $k=1,2,\ldots,K-1$,<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;propodds_model&quot;&gt;&lt;math&gt; \logit \left(\prob{y_{ij} \leq c_k {{!}} \psi_i} \right) = \left( \sum_{m=1}^k \alpha_{i,m}\right) + \beta_i \, x(t_{ij}) ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> where $x(t_{ij})$ is a vector of regression variables and $\beta_i$ a vector of coefficients. Here, $\psi_i=(\alpha_{i,1},\alpha_{i,2},\ldots,\alpha_{i,K-1},\beta_i)$.<br /> <br /> Recall that $\logit(p) = \log\left(p/(1-p)\right)$. Then, the probability defined in [[#propodds_model|(2)]] can also be expressed as<br /> <br /> {{Equation1|equation=&lt;math&gt;<br /> \prob{y_{ij} \leq c_k {{!}} \psi_i} = \displaystyle{\frac{1}{1 + e^{ -\left(\sum_{m=1}^k \alpha_{i,m}\right) - \beta_i \, x(t_{ij})} } }.<br /> &lt;/math&gt;}} <br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= We give patients a drug that is supposed to decrease the level of a given type of pain. <br /> The level of pain is measured on a scale from 1 to 3: 1=low, 2=moderate, 3=high. We consider the following model with the constraint that $\alpha_{i,2}\geq 0$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \logit \left(\prob{y_{ij} \leq 1 {{!}} \psi_i}\right) &amp;=&amp; \alpha_{i,1} + \beta_{i,1}\, t_{ij} + \beta_{i,2}\, C_{ij} \\<br /> \logit \left(\prob{y_{ij} \leq 2 {{!}} \psi_i}\right) &amp;=&amp; \alpha_{i,1} + \alpha_{i,2} + \beta_{i,1}\, t_{ij} + \beta_{i,2}\, C_{ij} \\<br /> \prob{y_{ij} \leq 3 {{!}} \psi_i} &amp;=&amp; 1,<br /> \end{eqnarray}&lt;/math&gt; }} <br /> <br /> where $C_{ij}$ is the concentration of the drug at time $t_{ij}$. 
The model parameters are quite easy to interpret:<br /> <br /> <br /> * $\beta_{i,1}=0$ means that without treatment, the level of pain tends to remain stable over time.<br /> * $\beta_{i,1}&lt;0$ (resp. $\beta_{i,1}&gt;0$) means that the pain tends to increase (resp. decrease) over time.<br /> * $\beta_{i,2}=0$ means that the drug has no effect on pain.<br /> * $\beta_{i,2}&gt;0$ means that the level of pain tends to decrease when the drug concentration increases, whereas $\beta_{i,2}&lt;0$ means that pain is an adverse drug effect.<br /> }}<br /> <br /> <br /> {{Remarks<br /> |title=Remarks<br /> |text= Exclusive use of linear models (or generalized linear models) has no real justification today, since very efficient tools are available for nonlinear models.<br /> Model [[#propodds_model|(2)]] can easily be extended to a nonlinear model:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;propodds_model2&quot;&gt;&lt;math&gt; \logit \left(\prob{y_{ij} \leq k {{!}} \psi_i } \right) = \sum_{m=1}^k \alpha_{i,m} + \beta(x(t_{ij})) , &lt;/math&gt;&lt;/div&gt;<br /> |reference=(3) }}<br /> <br /> where $\beta$ is any (linear or nonlinear) function of $x(t_{ij})$. }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> == Markovian dependence ==<br /> <br /> <br /> For the sake of simplicity, we will assume here that the observations $(y_{ij})$ take their values in $\{1, 2, \ldots, K\}$.<br /> <br /> We have so far assumed that the categorical observations $(y_{ij},\,j=1,2,\ldots,n_i)$ for individual $i$ are independent. It is, however, possible to introduce dependence between observations from the same individual by assuming that $(y_{ij},\,j=1,2,\ldots,n_i)$ forms a [http://en.wikipedia.org/wiki/Markov_chain Markov chain]. For instance, a [http://en.wikipedia.org/wiki/Markov_chain Markov chain] with memory 1 assumes that all that is required from the past to determine the distribution of $y_{i,j}$ is the value of the previous observation $y_{i,j-1}$. 
That is, for all $k=1,2,\ldots ,K$,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \prob{y_{i,j} = k\, {{!}} \,y_{i,j-1}, y_{i,j-2}, y_{i,j-3},\ldots,\psi_i} = \prob{y_{i,j} = k {{!}} y_{i,j-1},\psi_i}.<br /> &lt;/math&gt; }}<br /> <br /> <br /> &lt;br&gt;<br /> === Discrete time Markov chains ===<br /> <br /> If the observation times are regularly spaced (constant length of time between successive observations), we can consider the observations $(y_{ij},\,j=1,2,\ldots,n_i)$ to be a discrete-time [http://en.wikipedia.org/wiki/Markov_chain Markov chain]. Here, for each individual $i$, the probability distribution of the sequence $(y_{ij},\,j=1,2,\ldots,n_i)$ is defined by:<br /> <br /> <br /> &lt;ul&gt;<br /> * the distribution $\pi_{i,1} = (\pi_{i,1}^{k} , k=1,2,\ldots,K)$ of the first observation $y_{i,1}$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \pi_{i,1}^{k} = \prob{y_{i,1} = k {{!}} \psi_i} &lt;/math&gt; }}<br /> <br /> <br /> * the sequence of ''transition matrices'' $(Q_{i,j}, j=2,3,\ldots)$, where for each $j$, $Q_{i,j} = (q_{i,j}^{\ell,k}, 1\leq \ell,k \leq K)$ is a matrix of size $K \times K$ such that<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> q_{i,j}^{\ell,k} &amp;=&amp; \prob{y_{i,j} = k {{!}} y_{i,j-1}=\ell , \psi_i} \quad \text{ for all } (\ell,k),\\<br /> \sum_{k=1}^{K}q_{i,j}^{\ell,k} &amp;=&amp; 1 \quad \text{ for all } \ell.<br /> \end{eqnarray}&lt;/math&gt; }}<br /> &lt;/ul&gt;<br /> <br /> <br /> The conditional distribution of $y_i=(y_{i,j}, j=1,2,\ldots, n_i)$ is then well-defined:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pcyipsii(y_i {{!}} \psi_i) = \pmacro(y_{i,1}{{!}}\psi_i) \prod_{j=2}^{n_i} \pmacro(y_{i,j} {{!}} y_{i,j-1},\psi_i) .<br /> &lt;/math&gt; }}<br /> <br /> For a given individual $i$, $Q_{i,j}$ defines the transition probabilities between states at a given time $t_{ij}$:<br /> <br /> <br /> ::[[File:markov_1.png|link=]]<br /> <br /> <br /> Our model 
must therefore give, for each individual $i$, the distribution of the first observation $y_{i,1}$ and a description of how the transition probabilities evolve with time.<br /> <br /> The figure below shows several examples of simulated sequences coming from a model with two states defined by:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \logit\left(q_{i,j}^{1,2}\right) &amp;=&amp; a_i+b_i \, t_j \\<br /> \logit\left(q_{i,j}^{2,1}\right) &amp;=&amp; c_i+d_i \, t_j \\<br /> \prob{y_{i,1}=1} &amp;=&amp; 0.5 ,<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> where $t_j = j$.<br /> <br /> [[File:markov_2.png|link=]]<br /> <br /> In the first example (left), the logits of the transitions between states are constant ($b_i = d_i = 0$).<br /> Transition probabilities are therefore constant over time. Here, $q^{1,2}=1/(1+\exp(2.5))=0.0759$ and $q^{2,1}=1/(1+\exp(2))=0.1192$. As $q^{1,2}$ and $q^{2,1}$ are small, with $q^{1,2}&lt;q^{2,1}$, transitions between the two states are rare, and a larger amount of time (on average) is spent in state 1. Indeed, the stationary distribution is the left eigenvector of the transition matrix associated with the eigenvalue 1: $\prob{y_{ij}=1}=0.611$ and $\prob{y_{ij}=2}=0.389$.<br /> The figure (left) displays the transition probabilities $q^{1,2}$ and $q^{2,1}$ as functions of time (top left) and two simulated sequences of states (centre and bottom left).<br /> <br /> In the second example (center), $b_i$ and $d_i$ are negative. This means that as time progresses, transitions from state 1 to 2 become rarer, and the same is true from 2 to 1.<br /> <br /> In the third example (right), $b_i$ and $d_i$ are positive. This means that as time progresses, transitions from state 1 to 2 become more and more frequent, and likewise from 2 to 1.<br /> Note that the value of $a_i$ (resp. $c_i$) can be seen as the logit of the transition probability from state 1 to 2 (resp. 
2 to 1) at time $t=0$.<br /> <br /> Different choices can be made for defining an initial distribution $\pi_{i,1}$:<br /> <br /> <br /> &lt;ul&gt;<br /> * The initial state can be defined arbitrarily: $y_{i,1}=k_0$. This means that $\pi_{i,1}^{k_0} = 1$ and $\pi_{i,1}^{k} = 0$ for $k\neq k_0$.<br /> &lt;br&gt;<br /> <br /> * More generally, any simple probability distribution can be put on the choice of the initial state, e.g., the uniform distribution $\pi_{i,1}^{k} = 1/K$ for $k=1,2,\ldots , K$.<br /> &lt;br&gt;<br /> <br /> * If a transition matrix $Q_{i1}$ has been defined at time $t_1$, we might consider using its stationary distribution, i.e., taking for $\pi_{i,1}$ the solution to:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pi_{i,1} = \pi_{i,1} Q_{i1} .<br /> &lt;/math&gt; }}<br /> &lt;/ul&gt;<br /> <br /> <br /> &lt;br&gt;<br /> <br /> === Continuous time Markov chains ===<br /> <br /> <br /> <br /> The previous situation can be extended to the case where observation times are irregular, by modeling the<br /> sequence of states as a continuous-time [http://en.wikipedia.org/wiki/Markov_process Markov process]. The difference is that rather than transitioning to a new (possibly the same) state at each time step, the system remains in the current state for some random amount of time before transitioning. This process is now characterized by ''transition rates'' instead of transition probabilities:<br /> <br /> {{Equation1 <br /> |equation=&lt;math&gt;<br /> \prob{y_{i}(t+h) = k\, {{!}} \,y_{i}(t)=\ell , \psi_i} = h \, \rho_{i}^{\ell,k}(t) + o(h),\quad k \neq \ell .<br /> &lt;/math&gt; }}<br /> <br /> The probability that no transition happens between $t$ and $t+h$ is<br /> <br /> {{Equation1 <br /> |equation=&lt;math&gt;<br /> \prob{y_{i}(s) = \ell, \forall s\in(t, t+h) \ {{!}} \ y_{i}(t)=\ell , \psi_i} = e^{h \, \rho_{i}^{\ell,\ell}(t)} . 
<br /> &lt;/math&gt; }}<br /> <br /> <br /> &lt;br&gt;&lt;br&gt;<br /> ------------------------<br /> &lt;br&gt;&lt;br&gt;<br /> <br /> {{Summary<br /> |title=Summary<br /> |text= <br /> A model for independent categorical data is completely defined by:<br /> <br /> &lt;ul&gt;<br /> &lt;li&gt; the probability mass functions $\left(\prob{y_{ij} = k {{!}} \psi_i} \right)$<br /> &lt;li&gt; (or) the cumulative probability functions $\left(\prob{y_{ij} \leq c_k {{!}} \psi_i} \right)$ for ordinal data<br /> &lt;li&gt; (or) the cumulative logits $\left(\logit \left( \prob{y_{ij} \leq k {{!}} \psi_i} \right)\right)$ for a proportional odds model<br /> &lt;/ul&gt;<br /> <br /> <br /> A model for categorical data with Markovian dependence is completely defined by:<br /> <br /> <br /> &lt;ol&gt;<br /> &lt;li&gt; the transition probabilities in the case of a discrete-time [http://en.wikipedia.org/wiki/Markov_chain Markov chain]&lt;/li&gt;<br /> <br /> &lt;li&gt; (or) the transition rates in the case of a continuous-time [http://en.wikipedia.org/wiki/Markov_process Markov process]&lt;/li&gt;<br /> <br /> &lt;li&gt; the probability distribution of the initial states&lt;/li&gt;<br /> &lt;/ol&gt;<br /> }}<br /> <br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> == $\mlxtran$ for categorical data models == <br /> <br /> <br /> <br /> {{ExampleWithCode<br /> |title1=Example 1:<br /> |title2= $\quad y_{ij} \in \{0, 1, 2\}$<br /> |text=<br /> <br /> |equation=&lt;math&gt; \begin{eqnarray}<br /> \psi_i &amp;=&amp; (V_i, k_i, \alpha_{0,i}, \alpha_{1,i}, \gamma_i) \\[0.2cm]<br /> D &amp;=&amp; 100 \\<br /> C(t,\psi_i) &amp;=&amp; \frac{D}{V_i} e^{-k_i \, t} \\[0.2cm]<br /> \prob{y_{ij}\leq 0} &amp;=&amp; \alpha_{0,i} + \gamma_i \, C(t_{ij},\psi_i) \\<br /> \prob{y_{ij}\leq 1} &amp;=&amp; \alpha_{0,i} + \alpha_{1,i} + \gamma_i \, C(t_{ij},\psi_i) \\<br /> \prob{y_{ij}\leq 2} &amp;=&amp; 1<br /> \end{eqnarray}&lt;/math&gt;<br /> |code=<br /> {{MLXTranForTable<br /> |name=<br /> |text=<br /> 
&lt;pre style=&quot; background-color:#EFEFEF; border:none;&quot;&gt;<br /> INPUT:<br /> input = {V, k, alpha0, alpha1, gamma}<br /> <br /> EQUATION:<br /> D = 100<br /> C = D/V*exp(-k*t)<br /> p0 = alpha0 + gamma*C<br /> p1 = p0 + alpha1<br /> <br /> DEFINITION:<br /> y = {type=categorical,<br /> categories={0, 1, 2},<br /> P(y&lt;=0)=p0,<br /> P(y&lt;=1)=p1<br /> }<br /> &lt;/pre&gt; }}<br /> }}<br /> <br /> <br /> <br /> <br /> {{ExampleWithCode<br /> |title1=Example 2:<br /> |title2= $\quad$ 2-state discrete-time Markov chain<br /> |text=<br /> <br /> |equation=&lt;math&gt; \begin{eqnarray}<br /> \psi_i &amp;=&amp; (a_i,b_i,c_i,d_i) \\[0.2cm]<br /> \logit(p_{ij}^{12}) &amp;=&amp; a_i+b_i \, t_{ij} \\<br /> \logit(p_{ij}^{21}) &amp;=&amp; c_i+d_i \, t_{ij} \\<br /> \prob{y_{i,1}=1} &amp;=&amp; 0.5<br /> \end{eqnarray}&lt;/math&gt;<br /> |code=<br /> {{MLXTranForTable<br /> |name=<br /> |text=<br /> &lt;pre style=&quot; background-color:#EFEFEF; border:none;&quot;&gt; <br /> INPUT:<br /> input = {a, b, c, d}<br /> <br /> DEFINITION:<br /> Y = { type = categorical,<br /> categories = {1, 2},<br /> dependence = Markov<br /> P(Y_1=1) = 0.5<br /> logit(P(Y=2 | Y_p=1)) = a + b*t<br /> logit(P(Y=1 | Y_p=2)) = c + d*t<br /> }<br /> &lt;/pre&gt; }}<br /> }}<br /> <br /> <br /> <br /> {{ExampleWithCode<br /> |title1=Example 3:<br /> |title2= $\quad$ 2-state continuous-time Markov chain<br /> |text=<br /> <br /> |equation=&lt;math&gt; \begin{eqnarray}<br /> \psi_i &amp;=&amp; (a_i,b_i,c_i,d_i,\pi_i) \\[0.2cm]<br /> q_{i}^{12}(t) &amp;=&amp; e^{a_i+b_i \, t} \\<br /> q_{i}^{21}(t) &amp;=&amp; e^{c_i+d_i \, t} \\<br /> \prob{y_{i,1}=1} &amp;=&amp; \pi_i<br /> \end{eqnarray}&lt;/math&gt;<br /> |code=<br /> {{MLXTranForTable<br /> |name=<br /> |text=<br /> &lt;pre style=&quot; background-color:#EFEFEF; border:none;&quot;&gt; <br /> INPUT:<br /> input = {a, b, c, d, pi}<br /> <br /> DEFINITION:<br /> Y = { type = categorical,<br /> categories = {1, 2},<br /> dependence = 
Markov<br /> P(Y_1=1) = pi<br /> transitionRate(1,2) = exp(a + b*t)<br /> transitionRate(2,1) = exp(c + d*t)<br /> }<br /> &lt;/pre&gt; }}<br /> }}<br /> <br /> == Bibliography==<br /> <br /> &lt;bibtex&gt;<br /> @book{agresti2010analysis,<br /> title={Analysis of ordinal categorical data},<br /> author={Agresti, A.},<br /> volume={656},<br /> year={2010},<br /> publisher={Wiley}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{agresti2007introduction,<br /> title={An introduction to categorical data analysis},<br /> author={Agresti, A.},<br /> volume={423},<br /> year={2007},<br /> publisher={Wiley-Interscience}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{bolker2009generalized,<br /> title={Generalized linear mixed models: a practical guide for ecology and evolution},<br /> author={Bolker, B. M. and Brooks, M. E. and Clark, C. J. and Geange, S. W. and Poulsen, J. R. and Stevens, M. H. H. and White, J.-S. S. and others},<br /> journal={Trends in ecology &amp; evolution},<br /> volume={24},<br /> number={3},<br /> pages={127-135},<br /> year={2009},<br /> publisher={Elsevier Science}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{davidian1995,<br /> author = {Davidian, M. and Giltinan, D. M.},<br /> title = {Nonlinear Models for Repeated Measurements Data },<br /> publisher = {Chapman &amp; Hall.},<br /> address = {London},<br /> edition = {},<br /> year = {1995}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{jiang2007,<br /> author = {Jiang., J.},<br /> title = {Linear and Generalized Linear Mixed Models and Their Applications.},<br /> publisher = {Springer Series in Statistics},<br /> volume = {},<br /> pages = {},<br /> year = {2007},<br /> series = {},<br /> address = {New York},<br /> edition = {},<br /> month = {}<br /> }<br /> <br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{littell2006sas,<br /> title={SAS for mixed models},<br /> author={Littell, R. 
C.},<br /> year={2006},<br /> publisher={SAS Institute}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{mcculloch2011generalized,<br /> title={Generalized, Linear, and Mixed Models},<br /> author={McCulloch, C. E. and Searle, S. R. and Neuhaus, J. M.},<br /> isbn={9781118209967},<br /> series={Wiley Series in Probability and Statistics},<br /> year={2011},<br /> publisher={Wiley},<br /> url={http://books.google.fr/books?id=kyvgyK\_sBlkC}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{molenberghs2005models,<br /> title={Models for discrete longitudinal data},<br /> author={Molenberghs, G. and Verbeke, G.},<br /> year={2005},<br /> publisher={Springer}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{powers2008statistical,<br /> title={Statistical methods for categorical data analysis},<br /> author={Powers, D. A. and Xie, Y.},<br /> year={2008},<br /> publisher={Emerald Group Publishing}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{wolfinger1993generalized,<br /> title={Generalized linear mixed models: a pseudo-likelihood approach},<br /> author={Wolfinger, R. 
and O'Connell, M.},<br /> journal={Journal of statistical Computation and Simulation},<br /> volume={48},<br /> number={3-4},<br /> pages={233-243},<br /> year={1993},<br /> publisher={Taylor &amp; Francis}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> <br /> {{Back&amp;Next<br /> |linkBack=Models for count data<br /> |linkNext=Models for time-to-event data }}</div> Admin http://wiki.webpopix.org/index.php/Model_for_categorical_data Model for categorical data 2013-06-07T13:45:53Z <p>Admin : /* Markovian dependence */</p> <hr /> <div>&lt;!-- Menu for the Observations chapter --&gt;<br /> &lt;sidebarmenu&gt;<br /> +[[Modeling the observations]]<br /> *[[Modeling the observations| Introduction ]] | [[ Continuous data models ]] | [[Models for count data]] | [[Model for categorical data]] | [[Models for time-to-event data ]] | [[Joint models]] <br /> &lt;/sidebarmenu&gt;<br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> == Overview == <br /> <br /> Assume now that the observed data takes its values in a fixed and finite set of nominal categories $\{c_1, c_2,\ldots , c_K\}$.<br /> Considering the observations $(y_{ij}, 1 \leq j \leq n_i)$ of any individual $i$ as a sequence of independent random variables, the model is completely defined by the probability mass functions $\prob{y_{ij}=c_k | \psi_i}$, for $k=1,\ldots, K$ and $1 \leq j \leq n_i$.<br /> <br /> For a given $(i,j)$, the sum of the $K$ probabilities is 1, so in fact only $K-1$ of them need to be defined.<br /> <br /> In the most general way possible, any model can be considered so long as it defines a probability distribution, i.e., for each $k$, $\prob{y_{ij}=c_k | \psi_i} \in [0,1]$, and $\sum_{k=1}^{K} \prob{y_{ij}=c_k | \psi_i} = 1$. 
For instance, we could define $K$ time-dependent parametric functions $a_1$, $a_2$, ..., $a_K$ and set for any individual $i$, time $t_{ij}$ and $k \in \{1,\ldots,K\}$,<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;categorical1&quot; &gt;&lt;math&gt; <br /> \prob{y_{ij}=c_k {{!}} \psi_i} = \displaystyle{\frac{e^{a_k(t_{ij},\psi_i)} }{\sum_{m=1}^K e^{a_m(t_{ij},\psi_i)} } }. &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= Suppose we want to model binary data, i.e., data where $y_{ij} \in \{0,1\}$.<br /> <br /> Let $\psi_i=(\alpha_i,\beta_i)$ and let $a_1(t,\psi_i)=0$ and $a_2(t,\psi_i) = \alpha_i + \beta_i \, t$. Then, [[#categorical1|(1)]] gives a probability distribution for binary outcomes:<br /> <br /> {{Equation1|equation= &lt;math&gt;<br /> \prob{y_{ij}=0 {{!}} \psi_i} = \displaystyle{\frac{1}{1 + e^{\alpha_i + \beta_i \, t_{ij} } } } \quad \ \ \ \text{and} \quad<br /> \ \ \ \prob{y_{ij}=1 {{!}} \psi_i} = \displaystyle{\frac{e^{\alpha_i + \beta_i \, t_{ij} } }{1 + e^{\alpha_i + \beta_i \, t_{ij} } } }. 
<br /> &lt;/math&gt;}}<br /> }}<br /> <br /> <br /> Such parametrizations are extremely flexible and easy to interpret in simple situations.<br /> In the previous example, for instance, $\prob{y_{ij}=1 | \psi_i}$ and $a_2(t_{ij},\psi_i)$ move in the same direction as time increases.<br /> <br /> <br /> &lt;br&gt;<br /> == Ordinal data ==<br /> <br /> <br /> Ordinal data further assumes that the categories are ordered, i.e., there exists an order $\prec$ such that<br /> <br /> {{Equation1|equation=&lt;math&gt;<br /> c_1 \prec c_2 \prec \ldots \prec c_K .<br /> &lt;/math&gt;}}<br /> <br /> We can think for instance of levels of pain (low, moderate, severe), or any scores on a discrete scale, e.g., from 1 to 10.<br /> <br /> Instead of defining the probabilities of each category, it may be convenient to define the cumulative probabilities $\prob{y_{ij} \preceq c_k | \psi_i}$ for $k=1,\ldots ,K-1$, or in the other direction: $\prob{y_{ij} \succeq c_k | \psi_i}$ for $k=2,\ldots, K$. <br /> Any model is possible as long as it defines a probability distribution, i.e., satisfies:<br /> <br /> {{Equation1|equation=&lt;math&gt;<br /> 0 \leq \prob{y_{ij} \preceq c_1 {{!}} \psi_i} \leq \prob{y_{ij} \preceq c_2 {{!}} \psi_i} \leq \ldots \leq \prob{y_{ij} \preceq c_K {{!}} \psi_i} =1 .<br /> &lt;/math&gt; }}<br /> <br /> Without any loss of generality, we will consider numerical categories in what follows. The order $\prec$ then reduces to the usual order $&lt;$ on $\Rset$.<br /> Currently, the most popular model for ordinal data is the proportional odds model, which uses ''logits'' of these cumulative probabilities, also called ''cumulative logits''. 
We assume that there exist $\alpha_{i,1}$ and nonnegative $\alpha_{i,2}, \ldots , \alpha_{i,K-1}$ (nonnegative increments ensure that the cumulative probabilities increase with $k$) such that for $k=1,2,\ldots,K-1$,<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;propodds_model&quot;&gt;&lt;math&gt; \logit \left(\prob{y_{ij} \leq c_k {{!}} \psi_i} \right) = \left( \sum_{m=1}^k \alpha_{i,m}\right) + \beta_i \, x(t_{ij}) ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> where $x(t_{ij})$ is a vector of regression variables and $\beta_i$ a vector of coefficients. Here, $\psi_i=(\alpha_{i,1},\alpha_{i,2},\ldots,\alpha_{i,K-1},\beta_i)$.<br /> <br /> Recall that $\logit(p) = \log\left(p/(1-p)\right)$. Then, the probability defined in [[#propodds_model|(2)]] can also be expressed as<br /> <br /> {{Equation1|equation=&lt;math&gt;<br /> \prob{y_{ij} \leq c_k {{!}} \psi_i} = \displaystyle{\frac{1}{1 + e^{ -\left(\sum_{m=1}^k \alpha_{i,m}\right) - \beta_i \, x(t_{ij})} } }.<br /> &lt;/math&gt;}} <br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text= We give patients a drug that is supposed to decrease the level of a given type of pain. <br /> The level of pain is measured on a scale from 1 to 3: 1=low, 2=moderate, 3=high. We consider the following model with the constraint that $\alpha_{i,2}\geq 0$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \logit \left(\prob{y_{ij} \leq 1 {{!}} \psi_i}\right) &amp;=&amp; \alpha_{i,1} + \beta_{i,1}\, t_{ij} + \beta_{i,2}\, C_{ij} \\<br /> \logit \left(\prob{y_{ij} \leq 2 {{!}} \psi_i}\right) &amp;=&amp; \alpha_{i,1} + \alpha_{i,2} + \beta_{i,1}\, t_{ij} + \beta_{i,2}\, C_{ij} \\<br /> \prob{y_{ij} \leq 3 {{!}} \psi_i} &amp;=&amp; 1,<br /> \end{eqnarray}&lt;/math&gt; }} <br /> <br /> where $C_{ij}$ is the concentration of the drug at time $t_{ij}$. 
The model parameters are quite easy to interpret:<br /> <br /> <br /> * $\beta_{i,1}=0$ means that without treatment, the level of pain tends to remain stable over time.<br /> * $\beta_{i,1}&lt;0$ (resp. $\beta_{i,1}&gt;0$) means that the pain tends to increase (resp. decrease) over time.<br /> * $\beta_{i,2}=0$ means that the drug has no effect on pain.<br /> * $\beta_{i,2}&gt;0$ means that the level of pain tends to decrease when the drug concentration increases, whereas $\beta_{i,2}&lt;0$ means that pain is an adverse drug effect.<br /> }}<br /> <br /> <br /> {{Remarks<br /> |title=Remarks<br /> |text= Exclusive use of linear models (or generalized linear models) has no real justification today, since very efficient tools are available for nonlinear models.<br /> Model [[#propodds_model|(2)]] can easily be extended to a nonlinear model:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;propodds_model2&quot;&gt;&lt;math&gt; \logit \left(\prob{y_{ij} \leq k {{!}} \psi_i } \right) = \sum_{m=1}^k \alpha_{i,m} + \beta(x(t_{ij})) , &lt;/math&gt;&lt;/div&gt;<br /> |reference=(3) }}<br /> <br /> where $\beta$ is any (linear or nonlinear) function of $x(t_{ij})$. }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> == Markovian dependence ==<br /> <br /> <br /> For the sake of simplicity, we will assume here that the observations $(y_{ij})$ take their values in $\{1, 2, \ldots, K\}$.<br /> <br /> We have so far assumed that the categorical observations $(y_{ij},\,j=1,2,\ldots,n_i)$ for individual $i$ are independent. It is, however, possible to introduce dependence between observations from the same individual by assuming that $(y_{ij},\,j=1,2,\ldots,n_i)$ forms a [http://en.wikipedia.org/wiki/Markov_chain Markov chain]. For instance, a [http://en.wikipedia.org/wiki/Markov_chain Markov chain] with memory 1 assumes that all that is required from the past to determine the distribution of $y_{i,j}$ is the value of the previous observation $y_{i,j-1}$. 
That is, for all $k=1,2,\ldots ,K$,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \prob{y_{i,j} = k\, {{!}} \,y_{i,j-1}, y_{i,j-2}, y_{i,j-3},\ldots,\psi_i} = \prob{y_{i,j} = k {{!}} y_{i,j-1},\psi_i}.<br /> &lt;/math&gt; }}<br /> <br /> <br /> &lt;br&gt;<br /> === Discrete time Markov chains ===<br /> <br /> If the observation times are regularly spaced (constant length of time between successive observations), we can consider the observations $(y_{ij},\,j=1,2,\ldots,n_i)$ to be a discrete-time [http://en.wikipedia.org/wiki/Markov_chain Markov chain]. Here, for each individual $i$, the probability distribution of the sequence $(y_{ij},\,j=1,2,\ldots,n_i)$ is defined by:<br /> <br /> <br /> &lt;ul&gt;<br /> * the distribution $\pi_{i,1} = (\pi_{i,1}^{k} , k=1,2,\ldots,K)$ of the first observation $y_{i,1}$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \pi_{i,1}^{k} = \prob{y_{i,1} = k {{!}} \psi_i} &lt;/math&gt; }}<br /> <br /> <br /> * the sequence of ''transition matrices'' $(Q_{i,j}, j=2,3,\ldots)$, where for each $j$, $Q_{i,j} = (q_{i,j}^{\ell,k}, 1\leq \ell,k \leq K)$ is a matrix of size $K \times K$ such that<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> q_{i,j}^{\ell,k} &amp;=&amp; \prob{y_{i,j} = k {{!}} y_{i,j-1}=\ell , \psi_i} \quad \text{ for all } (\ell,k),\\<br /> \sum_{k=1}^{K}q_{i,j}^{\ell,k} &amp;=&amp; 1 \quad \text{ for all } \ell.<br /> \end{eqnarray}&lt;/math&gt; }}<br /> &lt;/ul&gt;<br /> <br /> <br /> The conditional distribution of $y_i=(y_{i,j}, j=1,2,\ldots, n_i)$ is then well-defined:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pcyipsii(y_i {{!}} \psi_i) = \pmacro(y_{i,1}{{!}}\psi_i) \prod_{j=2}^{n_i} \pmacro(y_{i,j} {{!}} y_{i,j-1},\psi_i) .<br /> &lt;/math&gt; }}<br /> <br /> For a given individual $i$, $Q_{i,j}$ defines the transition probabilities between states at a given time $t_{ij}$:<br /> <br /> <br /> ::[[File:markov_1.png|link=]]<br /> <br /> <br /> Our model 
must therefore give, for each individual $i$, the distribution of the first observation $y_{i,1}$ and a description of how the transition probabilities evolve with time.<br /> <br /> The figure below shows several examples of simulated sequences coming from a model with two states defined by:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \logit\left(q_{i,j}^{1,2}\right) &amp;=&amp; a_i+b_i \, t_j \\<br /> \logit\left(q_{i,j}^{2,1}\right) &amp;=&amp; c_i+d_i \, t_j \\<br /> \prob{y_{i,1}=1} &amp;=&amp; 0.5 ,<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> where $t_j = j$.<br /> <br /> [[File:markov_2.png|link=]]<br /> <br /> In the first example (left), the logits of the transitions between states are constant ($b_i = d_i = 0$).<br /> Transition probabilities are therefore constant over time. Here, $q^{1,2}=1/(1+\exp(2.5))=0.0759$ and $q^{2,1}=1/(1+\exp(2))=0.1192$. As $q^{1,2}$ and $q^{2,1}$ are small, with $q^{1,2}&lt;q^{2,1}$, transitions between the two states are rare, and a larger amount of time (on average) is spent in state 1. Indeed, the stationary distribution is the left eigenvector of the transition matrix associated with the eigenvalue 1: $\prob{y_{ij}=1}=0.611$ and $\prob{y_{ij}=2}=0.389$.<br /> The figure (left) displays the transition probabilities $q^{1,2}$ and $q^{2,1}$ as functions of time (top left) and two simulated sequences of states (centre and bottom left).<br /> <br /> In the second example (center), $b_i$ and $d_i$ are negative. This means that as time progresses, transitions from state 1 to 2 become rarer, and the same is true from 2 to 1.<br /> <br /> In the third example (right), $b_i$ and $d_i$ are positive. This means that as time progresses, transitions from state 1 to 2 become more and more frequent, and likewise from 2 to 1.<br /> Note that the value of $a_i$ (resp. $c_i$) can be seen as the logit of the transition probability from state 1 to 2 (resp. 
2 to 1) at time $t=0$.<br /> <br /> Different choices can be made for defining an initial distribution $\pi_{i,1}$:<br /> <br /> <br /> &lt;ul&gt;<br /> * The initial state can be defined arbitrarily: $y_{i,1}=k_0$. This means that $\pi_{i,1}^{k_0} = 1$ and $\pi_{i,1}^{k} = 0$ for $k\neq k_0$.<br /> &lt;br&gt;<br /> <br /> * More generally, any simple probability distribution can be put on the choice of the initial state, e.g., the uniform distribution $\pi_{i,1}^{k} = 1/K$ for $k=1,2,\ldots , K$.<br /> &lt;br&gt;<br /> <br /> * If a transition matrix $Q_{i1}$ has been defined at time $t_1$, we might consider using its stationary distribution, i.e., taking for $\pi_{i,1}$ the solution to:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pi_{i,1} = \pi_{i,1} Q_{i1} .<br /> &lt;/math&gt; }}<br /> &lt;/ul&gt;<br /> <br /> <br /> &lt;br&gt;<br /> <br /> === Continuous time Markov chains ===<br /> <br /> <br /> <br /> The previous situation can be extended to the case where observation times are irregular, by modeling the<br /> sequence of states as a continuous-time [http://en.wikipedia.org/wiki/Markov_process Markov process]. The difference is that rather than transitioning to a new (possibly the same) state at each time step, the system remains in the current state for some random amount of time before transitioning. This process is now characterized by ''transition rates'' instead of transition probabilities:<br /> <br /> {{Equation1 <br /> |equation=&lt;math&gt;<br /> \prob{y_{i}(t+h) = k\, {{!}} \,y_{i}(t)=\ell , \psi_i} = h \, \rho_{i}^{\ell,k}(t) + o(h),\quad k \neq \ell .<br /> &lt;/math&gt; }}<br /> <br /> The probability that no transition happens between $t$ and $t+h$ is<br /> <br /> {{Equation1 <br /> |equation=&lt;math&gt;<br /> \prob{y_{i}(s) = \ell, \forall s\in(t, t+h) \ {{!}} \ y_{i}(t)=\ell , \psi_i} = e^{h \, \rho_{i}^{\ell,\ell}(t)} . 
<br /> &lt;/math&gt; }}<br /> <br /> where, by convention, $\rho_{i}^{\ell,\ell}(t) = -\sum_{k \neq \ell} \rho_{i}^{\ell,k}(t)$, so that this probability is at most 1.<br /> <br /> &lt;br&gt;&lt;br&gt;<br /> ------------------------<br /> &lt;br&gt;&lt;br&gt;<br /> <br /> {{Summary<br /> |title=Summary<br /> |text= <br /> A model for independent categorical data is completely defined by:<br /> <br /> &lt;ul&gt;<br /> &lt;li&gt; the probability mass functions $\left(\prob{y_{ij} = k {{!}} \psi_i} \right)$&lt;/li&gt;<br /> &lt;li&gt; (or) the cumulative probability functions $\left(\prob{y_{ij} \leq c_k {{!}} \psi_i} \right)$ for ordinal data&lt;/li&gt;<br /> &lt;li&gt; (or) the cumulative logits $\left(\logit \left( \prob{y_{ij} \leq k {{!}} \psi_i} \right)\right)$ for a proportional odds model&lt;/li&gt;<br /> &lt;/ul&gt;<br /> <br /> <br /> A model for categorical data with Markovian dependency is completely defined by:<br /> <br /> <br /> &lt;ol&gt;<br /> &lt;li&gt; the transition probabilities in the case of a discrete-time [http://en.wikipedia.org/wiki/Markov_chain Markov chain]&lt;/li&gt;<br /> <br /> &lt;li&gt; (or) the transition rates in the case of a continuous-time [http://en.wikipedia.org/wiki/Markov_process Markov process]&lt;/li&gt;<br /> <br /> &lt;li&gt; the probability distribution of the initial states&lt;/li&gt;<br /> &lt;/ol&gt;<br /> }}<br /> <br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> == $\mlxtran$ for categorical data models == <br /> <br /> <br /> <br /> {{ExampleWithCode<br /> |title1=Example 1:<br /> |title2= $\quad y_{ij} \in \{0, 1, 2\}$<br /> |text=<br /> <br /> |equation=&lt;math&gt; \begin{eqnarray}<br /> \psi_i &amp;=&amp; (V_i, k_i, \alpha_{0,i}, \alpha_{1,i}, \gamma_i) \\[0.2cm]<br /> D &amp;=&amp; 100 \\<br /> C(t,\psi_i) &amp;=&amp; \frac{D}{V_i} e^{-k_i \, t} \\[0.2cm]<br /> \prob{y_{ij}\leq 0} &amp;=&amp; \alpha_{0,i} + \gamma_i \, C(t_{ij},\psi_i) \\<br /> \prob{y_{ij}\leq 1} &amp;=&amp; \alpha_{0,i} + \alpha_{1,i} + \gamma_i \, C(t_{ij},\psi_i) \\<br /> \prob{y_{ij}\leq 2} &amp;=&amp; 1<br /> \end{eqnarray}&lt;/math&gt;<br /> |code=<br /> {{MLXTranForTable<br /> |name=<br /> |text=<br /> 
&lt;pre style=&quot; background-color:#EFEFEF; border:none;&quot;&gt;<br /> INPUT:<br /> input = {V, k, alpha0, alpha1, gamma}<br /> <br /> EQUATION:<br /> D = 100<br /> C = D/V*exp(-k*t)<br /> p0 = alpha0 + gamma*C<br /> p1 = p0 + alpha1<br /> <br /> DEFINITION:<br /> y = {type=categorical,<br /> categories={0, 1, 2},<br /> P(y&lt;=0)=p0,<br /> P(y&lt;=1)=p1<br /> }<br /> &lt;/pre&gt; }}<br /> }}<br /> <br /> <br /> <br /> <br /> {{ExampleWithCode<br /> |title1=Example 2:<br /> |title2= $\quad$ 2-state discrete-time Markov chain<br /> |text=<br /> <br /> |equation=&lt;math&gt; \begin{eqnarray}<br /> \psi_i &amp;=&amp; (a_i,b_i,c_i,d_i) \\[0.2cm]<br /> \logit\left(q_{i,j}^{1,2}\right) &amp;=&amp; a_i+b_i \, t_{ij} \\<br /> \logit\left(q_{i,j}^{2,1}\right) &amp;=&amp; c_i+d_i \, t_{ij} \\<br /> \prob{y_{i,1}=1} &amp;=&amp; 0.5<br /> \end{eqnarray}&lt;/math&gt;<br /> |code=<br /> {{MLXTranForTable<br /> |name=<br /> |text=<br /> &lt;pre style=&quot; background-color:#EFEFEF; border:none;&quot;&gt; <br /> INPUT:<br /> input = {a, b, c, d}<br /> <br /> DEFINITION:<br /> Y = { type = categorical,<br /> categories = {1, 2},<br /> dependence = Markov<br /> P(Y_1=1) = 0.5<br /> logit(P(Y=2 | Y_p=1)) = a + b*t<br /> logit(P(Y=1 | Y_p=2)) = c + d*t<br /> }<br /> &lt;/pre&gt; }}<br /> }}<br /> <br /> <br /> <br /> {{ExampleWithCode<br /> |title1=Example 3:<br /> |title2= $\quad$ 2-state continuous-time Markov chain<br /> |text=<br /> <br /> |equation=&lt;math&gt; \begin{eqnarray}<br /> \psi_i &amp;=&amp; (a_i,b_i,c_i,d_i,\pi_i) \\[0.2cm]<br /> \rho_{i}^{1,2}(t) &amp;=&amp; e^{a_i+b_i \, t} \\<br /> \rho_{i}^{2,1}(t) &amp;=&amp; e^{c_i+d_i \, t} \\<br /> \prob{y_{i,1}=1} &amp;=&amp; \pi_i<br /> \end{eqnarray}&lt;/math&gt;<br /> |code=<br /> {{MLXTranForTable<br /> |name=<br /> |text=<br /> &lt;pre style=&quot; background-color:#EFEFEF; border:none;&quot;&gt; <br /> INPUT:<br /> input = {a, b, c, d, pi}<br /> <br /> DEFINITION:<br /> Y = { type = categorical,<br /> categories = {1, 2},<br /> dependence = 
Markov<br /> P(Y_1=1) = pi<br /> transitionRate(1,2) = exp(a + b*t)<br /> transitionRate(2,1) = exp(c + d*t)<br /> }<br /> &lt;/pre&gt; }}<br /> }}<br /> <br /> == Bibliography ==<br /> <br /> &lt;bibtex&gt;<br /> @book{agresti2010analysis,<br /> title={Analysis of ordinal categorical data},<br /> author={Agresti, A.},<br /> volume={656},<br /> year={2010},<br /> publisher={Wiley}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{agresti2007introduction,<br /> title={An introduction to categorical data analysis},<br /> author={Agresti, A.},<br /> volume={423},<br /> year={2007},<br /> publisher={Wiley-Interscience}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{bolker2009generalized,<br /> title={Generalized linear mixed models: a practical guide for ecology and evolution},<br /> author={Bolker, B. M. and Brooks, M. E. and Clark, C. J. and Geange, S. W. and Poulsen, J. R. and Stevens, M. H. H. and White, J.-S. S. and others},<br /> journal={Trends in ecology &amp; evolution},<br /> volume={24},<br /> number={3},<br /> pages={127-135},<br /> year={2009},<br /> publisher={Elsevier Science}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{davidian1995,<br /> author = {Davidian, M. and Giltinan, D. M.},<br /> title = {Nonlinear Models for Repeated Measurement Data},<br /> publisher = {Chapman &amp; Hall},<br /> address = {London},<br /> year = {1995}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{jiang2007,<br /> author = {Jiang, J.},<br /> title = {Linear and Generalized Linear Mixed Models and Their Applications},<br /> publisher = {Springer},<br /> series = {Springer Series in Statistics},<br /> address = {New York},<br /> year = {2007}<br /> }<br /> <br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{littell2006sas,<br /> title={SAS for mixed models},<br /> author={Littell, R. 
C.},<br /> year={2006},<br /> publisher={SAS Institute}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{mcculloch2011generalized,<br /> title={Generalized, Linear, and Mixed Models},<br /> author={McCulloch, C. E. and Searle, S. R. and Neuhaus, J. M.},<br /> isbn={9781118209967},<br /> series={Wiley Series in Probability and Statistics},<br /> year={2011},<br /> publisher={Wiley},<br /> url={http://books.google.fr/books?id=kyvgyK\_sBlkC}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{molenberghs2005models,<br /> title={Models for discrete longitudinal data},<br /> author={Molenberghs, G. and Verbeke, G.},<br /> year={2005},<br /> publisher={Springer}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{powers2008statistical,<br /> title={Statistical methods for categorical data analysis},<br /> author={Powers, D. A. and Xie, Y.},<br /> year={2008},<br /> publisher={Emerald Group Publishing}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{wolfinger1993generalized,<br /> title={Generalized linear mixed models: a pseudo-likelihood approach},<br /> author={Wolfinger, R. 
and O'Connell, M.},<br /> journal={Journal of Statistical Computation and Simulation},<br /> volume={48},<br /> number={3-4},<br /> pages={233-243},<br /> year={1993},<br /> publisher={Taylor &amp; Francis}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> <br /> {{Back&amp;Next<br /> |linkBack=Models for count data<br /> |linkNext=Models for time-to-event data }}</div> Admin http://wiki.webpopix.org/index.php/Models_for_count_data Models for count data 2013-06-07T13:42:50Z <p>Admin : </p> <hr /> <div>&lt;!-- Menu for the Observations chapter --&gt;<br /> &lt;sidebarmenu&gt;<br /> +[[Modeling the observations]]<br /> *[[Modeling the observations| Introduction ]] | [[ Continuous data models ]] | [[Models for count data]] | [[Model for categorical data]] | [[Models for time-to-event data ]] | [[Joint models]] <br /> &lt;/sidebarmenu&gt;<br /> <br /> <br /> Count data is a special type of statistical data that takes only non-negative integer values $\{0, 1, 2,\ldots\}$ arising from counting something, e.g., the number of seizures, hemorrhages or lesions in each given time period. More precisely, data from individual $i$ is the sequence $y_i=(y_{ij},1\leq j \leq n_i)$ where $y_{ij}$ is the number of events observed in the $j$th time interval $I_{ij}$.<br /> <br /> For the moment, let us assume that all the intervals have the same length. This is the case, for instance, if data are daily seizure counts: $I_{ij}$ is the $j$th day after the start of the experiment and $y_{ij}$ the number of seizures observed during that day.<br /> <br /> We will then model the sequence $y_i=(y_{ij},1\leq j \leq n_i)$ as a sequence of random variables that take their values in $\{ 0, 1, 2,\ldots\}$.<br /> <br /> If we assume that these random variables are independent, then the model is completely defined by the probability mass functions $\prob{y_{ij}=k}$, for $k \geq 0$ and $1 \leq j \leq n_i$. 
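Under this independence assumption, the likelihood of an individual's count sequence factorizes into the product of the per-interval probability masses. The following Python sketch illustrates this factorization (the function names are ours, purely illustrative, and not part of $\mlxtran$):

```python
import math

def sequence_loglik(y, pmf):
    # Log-likelihood of an independent count sequence:
    # the sum over intervals j of log P(y_j = k).
    return sum(math.log(pmf(j, k)) for j, k in enumerate(y))

# Hypothetical illustration: a constant Poisson mass function with intensity 2.0
lam = 2.0

def poisson_pmf(j, k):
    return lam ** k * math.exp(-lam) / math.factorial(k)

ll = sequence_loglik([1, 3, 0, 2], poisson_pmf)
```

Any of the mass functions discussed below can be plugged in for `pmf`, possibly varying with the interval index $j$.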
Common distributions used to model count data include [http://en.wikipedia.org/wiki/Poisson_distribution Poisson], [http://en.wikipedia.org/wiki/Binomial_distribution binomial] and [http://en.wikipedia.org/wiki/Negative_binomial_distribution negative binomial].<br /> <br /> Here, we will only consider parametric distributions. In this context, building a model means defining:<br /> <br /> <br /> &lt;ul&gt;<br /> * the parameter function (or &quot;intensity&quot;) $\lambda_{ij} = \lambda(t_{ij},\psi_i)$ for any individual $i$, which depends on the individual parameters $\psi_i$ and possibly the time $t_{ij}$.&lt;br&gt;<br /> <br /> * the probability mass function $\prob{y_{ij}=k; \lambda_{ij}}$.<br /> &lt;/ul&gt;<br /> <br /> <br /> The conditional distribution of the observations is therefore written:<br /> <br /> {{Equation1<br /> |equation = &lt;math&gt; \prob{y_{ij}=k {{!}} \psi_i} = \prob{y_{ij}=k ; \lambda_{ij} }. &lt;/math&gt; }} <br /> <br /> <br /> {{Example<br /> |title=Example<br /> <br /> |text= Let us illustrate this approach for the [http://en.wikipedia.org/wiki/Poisson_distribution Poisson distribution].<br /> A [http://en.wikipedia.org/wiki/Poisson_distribution Poisson distribution] with intensity $\lambda$ is defined by its probability mass function:<br /> <br /> {{Equation1|equation=&lt;math&gt; \prob{y=k ; \lambda} = \displaystyle{\frac{\lambda^{k} \, e^{-\lambda} }{k!} }. &lt;/math&gt;}}<br /> <br /> <br /> ::[[File:poisson1.png|link=]]<br /> <br /> <br /> One of the main properties of the [http://en.wikipedia.org/wiki/Poisson_distribution Poisson distribution] is that $\lambda$ is both the mean and the variance of the distribution:<br /> <br /> {{Equation1|equation=&lt;math&gt;\esp{y} = \var{y} = \lambda &lt;/math&gt;}}<br /> <br /> All that remains is to define the Poisson intensity function $\lambda_{ij} = \lambda(t_{ij},\psi_i)$. 
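The mean-variance identity above is easy to check numerically from the mass function; a quick illustrative Python sketch (not part of the original example; the truncation level is our choice):

```python
import math

def poisson_pmf(k, lam):
    # P(y = k; lambda) = lambda^k * e^{-lambda} / k!
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam = 4.0
ks = range(100)  # truncation: the tail mass beyond k = 100 is negligible for lambda = 4
mean = sum(k * poisson_pmf(k, lam) for k in ks)
var = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in ks)
# mean and var both agree with lambda up to truncation/rounding error
```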
Then,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\prob{y_{ij}=k {{!}} \psi_i} = \displaystyle{\frac{\lambda_{ij}^{k}\, e^{-\lambda_{ij} } } {k!} }. &lt;/math&gt;}}<br /> }}<br /> <br /> <br /> There are many variations of the Poisson model:<br /> <br /> <br /> &lt;ul&gt;<br /> * ''Homogeneous [http://en.wikipedia.org/wiki/Poisson_distribution Poisson distribution]:'' this assumes a constant intensity $\lambda_i$ for each individual $i$. Here, $\psi_i = \lambda_i$ and $\lambda(t_{ij},\psi_i)=\lambda_i$. <br /> &lt;br&gt;&lt;br&gt;<br /> * ''Non-homogeneous [http://en.wikipedia.org/wiki/Poisson_distribution Poisson distribution]:'' this assumes that the Poisson intensity is a function of time. For example, suppose that we believe that the frequency of a disease-related event increases linearly with time. We could then model this using $\lambda(t_{ij},\psi_i) = \lambda_{i} + a_i t_{ij}$, where $t_{ij} = j$ (months). Here, $\psi_i=(\lambda_{i},a_i)$.<br /> &lt;br&gt;&lt;br&gt;<br /> * ''Additional regression variables:'' the Poisson intensity may depend on regression variables other than time. For example, assume that taking a drug tends to reduce the number of events. We can then link the time-varying drug concentration $C$ to the value of $\lambda$ at time $t_{ij}$ using for instance an &quot;Imax&quot; model:<br /> <br /> {{Equation1|equation=&lt;math&gt; <br /> \lambda(t_{ij},\psi_i) = \lambda_{i}\left(1-\Imax_i\displaystyle{\frac{ \ C_i(t_{ij})}{IC_{50,i} + C_i(t_{ij})} }\right) ,<br /> &lt;/math&gt; }}<br /> <br /> : where $\lambda_{i}$ is the baseline intensity and where $0\leq \Imax_i\leq 1$. 
Here, $\psi_{i} = (\lambda_{i}, \Imax_i, IC_{50,i})$.<br /> <br /> : This model can even be combined with the previous non-homogeneous model by assuming a time-varying baseline $\lambda_{i}(t)$, in order to combine a drug effect model with a disease model, for instance.&lt;br&gt;<br /> <br /> <br /> * Instead of assuming independent count data, we can introduce Markovian dependency into the model by assuming, for example, that $\lambda_{ij}$ is a function of $y_{i,j-1}$. Then, $\prob{y_{ij}=k\, |\, y_{i,j-1}, t_{ij},\psi_i}$ is the probability mass function of a Poisson random variable with parameter $\lambda_{ij} =\lambda(y_{i,j-1}, t_{ij},\psi_i)$.<br /> &lt;br&gt;&lt;br&gt;<br /> <br /> * If $y_{ij}$ is the number of events of a given type (seizures, hemorrhages, etc.) in a given time interval $I_{ij}$, and if $h_i(t)=h(t,\psi_i)$ is the hazard function associated with this sequence of events for individual $i$, then the events form a non-homogeneous Poisson process and $y_{ij}$ has a Poisson distribution with intensity $\lambda_{ij}=\displaystyle{ \int_{I_{ij}}} h(t,\psi_i)dt$ in interval $I_{ij}$ (see the [[Models for time-to-event data]] section).<br /> &lt;/ul&gt;<br /> <br /> <br /> Let us now look at some other examples of distributions for count data:<br /> <br /> <br /> &lt;ul&gt;<br /> * The zero-inflated [http://en.wikipedia.org/wiki/Poisson_distribution Poisson distribution]:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \prob{y=k ; \lambda,p_0} = \left\{ \begin{array}{cc}<br /> p_0 + (1-p_0)e^{-\lambda} &amp; {\rm if } \ k=0 \\<br /> (1-p_0) \displaystyle {\frac{e^{-\lambda} \lambda^{k} }{k!} } &amp; {\rm if } \ k&gt;0 .<br /> \end{array}<br /> \right.<br /> &lt;/math&gt;}}<br /> <br /> :where $0\leq p_0 &lt;1$. 
This is useful when data seem generally to follow a [http://en.wikipedia.org/wiki/Poisson_distribution Poisson distribution], except for an excess of zeros ($k=0$):<br /> <br /> <br /> ::[[File:poisson2.png|link=]]<br /> <br /> <br /> * The [http://en.wikipedia.org/wiki/Negative_binomial_distribution negative binomial distribution] is:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \prob{y=k ; p,r} = \displaystyle{ \frac{\Gamma(k+r)}{k!\, \Gamma(r)} }(1-p)^r p^k ,<br /> &lt;/math&gt;}}<br /> <br /> :with $0\leq p \leq 1$ and $r&gt;0$. If $r$ is an integer, then the [http://en.wikipedia.org/wiki/Negative_binomial_distribution negative binomial (NB) distribution] with parameters $(p,r)$ is the probability distribution of the number of successes in a sequence of [http://en.wikipedia.org/wiki/Bernoulli_trial Bernoulli trials] with probability of success $p$ before $r$ failures occur.<br /> <br /> <br /> ::[[File:poisson3.png|link=]]<br /> <br /> <br /> * The generalized [http://en.wikipedia.org/wiki/Poisson_distribution Poisson distribution] is: <br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \prob{y=k ; \lambda,\delta} = \displaystyle {\frac{\lambda (\lambda+k\delta)^{k-1} e^{-\lambda-k\delta} }{k!} },<br /> &lt;/math&gt; }}<br /> <br /> :with $\lambda&gt;0$ and $0\leq \delta &lt;1$.<br /> :The generalized Poisson (GP) distribution includes the [http://en.wikipedia.org/wiki/Poisson_distribution Poisson distribution] as a special case $(\delta=0)$, and is over-dispersed relative to the Poisson. 
Indeed, the variance to mean ratio exceeds 1:<br /> <br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \begin{eqnarray} \esp{y} &amp;=&amp; \frac{\lambda}{1-\delta} \\<br /> \var{y} &amp;=&amp; \frac{\lambda}{(1-\delta)^3}.<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> ::[[File:poisson4.png|link=]]<br /> &lt;/ul&gt;<br /> <br /> &lt;br&gt;&lt;br&gt;<br /> -----------------<br /> &lt;br&gt;&lt;br&gt;<br /> <br /> {{Summary<br /> |title=Summary<br /> |text=<br /> For a given design $\bx_{i}$ and a given vector of parameters $\psi_i$, a parametric model for count data is completely defined by:<br /> <br /> <br /> &lt;ul&gt;<br /> * the probability mass function used to represent the distribution of the data in a given time interval<br /> &lt;br&gt;&lt;br&gt;<br /> * a model which defines how the distribution's parameter function (i.e., intensity) varies over time.<br /> &lt;/ul&gt;<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> == $\mlxtran$ for count data models == <br /> <br /> <br /> <br /> {{ExampleWithCode<br /> |title1= Example 1: <br /> |title2= Poisson model with time-varying intensity<br /> |text=<br /> <br /> |equation=&lt;math&gt; \begin{eqnarray}<br /> \psi_i &amp;=&amp; (\alpha_i,\beta_i) \\[0.3cm]<br /> \lambda(t,\psi_i) &amp;=&amp; \alpha_i + \beta_i\,t \\[0.3cm]<br /> \prob{y_{ij}=k} &amp;=&amp; \displaystyle{ \frac{\lambda(t_{ij} , \psi_i)^k}{k!} } e^{-\lambda(t_{ij} , \psi_i)}<br /> \end{eqnarray}&lt;/math&gt;<br /> |code=<br /> {{MLXTranForTable<br /> |name=<br /> |text=<br /> &lt;pre style=&quot; background-color:#EFEFEF; border: none;&quot;&gt; <br /> INPUT:<br /> input = {alpha, beta}<br /> <br /> EQUATION:<br /> lambda = alpha + beta*t<br /> <br /> DEFINITION:<br /> y ~ poisson(lambda)<br /> &lt;/pre&gt; }}<br /> }}<br /> <br /> <br /> <br /> {{ExampleWithCode<br /> |title1= Example 2: <br /> |title2= generalized Poisson model<br /> |text=<br /> <br /> |equation=&lt;math&gt; \begin{eqnarray}<br /> \psi_i &amp;=&amp; (\lambda_i,\delta_i) \\<br /> \log\left( \prob{y_{ij}=k} \right) &amp;=&amp; \log(\lambda_i) + (k-1)\log(\lambda_i+k\delta_i) \\<br /> &amp;&amp; -\lambda_i-k\delta_i - \log(k!)<br /> \end{eqnarray}&lt;/math&gt;<br /> |code=<br /> {{MLXTranForTable<br /> |name=<br /> |text=<br /> &lt;pre style=&quot; background-color:#EFEFEF; border:none;&quot;&gt; <br /> INPUT:<br /> input = {lambda, delta}<br /> <br /> DEFINITION:<br /> Y = {<br /> type = count,<br /> log(P(Y=k)) = log(lambda)<br /> + (k-1)*log(lambda+k*delta)<br /> - lambda -k*delta - factln(k)<br /> } &lt;/pre&gt; }}<br /> }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Bibliography ==<br /> <br /> <br /> &lt;bibtex&gt;<br /> @article{blundell2002individual,<br /> title={Individual effects and dynamics in count data models},<br /> author={Blundell, R. and Griffith, R. and Windmeijer, F.},<br /> journal={Journal of Econometrics},<br /> volume={108},<br /> number={1},<br /> pages={113-131},<br /> year={2002},<br /> publisher={Elsevier}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{bolker2009generalized,<br /> title={Generalized linear mixed models: a practical guide for ecology and evolution},<br /> author={Bolker, B. M. and Brooks, M. E. and Clark, C. J. and Geange, S. W. and Poulsen, J. R. and Stevens, M. H. and White, J.-S. S. and others},<br /> journal={Trends in ecology &amp; evolution},<br /> volume={24},<br /> number={3},<br /> pages={127-135},<br /> year={2009},<br /> publisher={Elsevier Science}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{cameron1998regression,<br /> title={Regression analysis of count data},<br /> author={Cameron, A. C. and Trivedi, P. K.},<br /> volume={30},<br /> year={1998},<br /> publisher={Cambridge University Press}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{christensen2002bayesian,<br /> title={Bayesian prediction of spatial count data using generalized linear mixed models},<br /> author={Christensen, O. F. 
and Waagepetersen, R.},<br /> journal={Biometrics},<br /> volume={58},<br /> number={2},<br /> pages={280-286},<br /> year={2002},<br /> publisher={Wiley Online Library}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{fahrmeir1994multivariate,<br /> title={Multivariate statistical modelling based on generalized linear models},<br /> author={Fahrmeir, L. and Tutz, G. and Hennevogl, W.},<br /> volume={2},<br /> year={1994},<br /> publisher={Springer New York}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{hall2004zero,<br /> title={Zero-inflated Poisson and binomial regression with random effects: a case study},<br /> author={Hall, D. B.},<br /> journal={Biometrics},<br /> volume={56},<br /> number={4},<br /> pages={1030-1039},<br /> year={2004},<br /> publisher={Wiley Online Library}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{heilbron2007zero,<br /> title={Zero-Altered and other Regression Models for Count Data with Added Zeros},<br /> author={Heilbron, D. C.},<br /> journal={Biometrical Journal},<br /> volume={36},<br /> number={5},<br /> pages={531-547},<br /> year={2007},<br /> publisher={Wiley Online Library}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{lawless1987negative,<br /> title={Negative binomial and mixed Poisson regression},<br /> author={Lawless, J. F.},<br /> journal={Canadian Journal of Statistics},<br /> volume={15},<br /> number={3},<br /> pages={209-225},<br /> year={1987},<br /> publisher={Wiley Online Library}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{lee2006multi,<br /> title={Multi-level zero-inflated Poisson regression modelling of correlated count data with excess zeros},<br /> author={Lee, A. H. and Wang, K. and Scott, J. A. and Yau, K. K. W. and McLachlan, G. 
J.},<br /> journal={Statistical Methods in Medical Research},<br /> volume={15},<br /> number={1},<br /> pages={47-61},<br /> year={2006},<br /> publisher={SAGE Publications}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{mcculloch2011generalized,<br /> title={Generalized, Linear, and Mixed Models},<br /> author={McCulloch, C. E. and Searle, S. R. and Neuhaus, J. M.},<br /> isbn={9781118209967},<br /> series={Wiley Series in Probability and Statistics},<br /> url={http://books.google.fr/books?id=kyvgyK\_sBlkC},<br /> year={2011},<br /> publisher={Wiley}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{min2005random,<br /> title={Random effect models for repeated measures of zero-inflated count data},<br /> author={Min, Y. and Agresti, A.},<br /> journal={Statistical Modelling},<br /> volume={5},<br /> number={1},<br /> pages={1-19},<br /> year={2005},<br /> publisher={SAGE Publications}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{molenberghs2005models,<br /> title={Models for discrete longitudinal data},<br /> author={Molenberghs, G. and Verbeke, G.},<br /> year={2005},<br /> publisher={Springer}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{mullahy1998heterogeneity,<br /> title={Heterogeneity, excess zeros, and the structure of count data models},<br /> author={Mullahy, J.},<br /> journal={Journal of Applied Econometrics},<br /> volume={12},<br /> number={3},<br /> pages={337-350},<br /> year={1998},<br /> publisher={Wiley Online Library}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{savic2009performance,<br /> title={Performance in population models for count data, part II: a new SAEM algorithm},<br /> author={Savic, R. 
and Lavielle, M.},<br /> journal={Journal of Pharmacokinetics and Pharmacodynamics},<br /> volume={36},<br /> number={4},<br /> pages={367-379},<br /> year={2009},<br /> publisher={Springer}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{thall1988mixed,<br /> title={Mixed Poisson likelihood regression models for longitudinal interval count data},<br /> author={Thall, P. F.},<br /> journal={Biometrics},<br /> pages={197-209},<br /> year={1988},<br /> publisher={JSTOR}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{thall1990some,<br /> title={Some covariance models for longitudinal count data with overdispersion},<br /> author={Thall, P. F. and Vail, S. C.},<br /> journal={Biometrics},<br /> pages={657-671},<br /> year={1990},<br /> publisher={JSTOR}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{tempelman1996mixed,<br /> title={A mixed effects model for overdispersed count data in animal breeding},<br /> author={Tempelman, R. J. and Gianola, D.},<br /> journal={Biometrics},<br /> pages={265-279},<br /> year={1996},<br /> publisher={JSTOR}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{winkelmann2008econometric,<br /> title={Econometric analysis of count data},<br /> author={Winkelmann, R.},<br /> year={2008},<br /> publisher={Springer}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{wolfinger1993generalized,<br /> title={Generalized linear mixed models: a pseudo-likelihood approach},<br /> author={Wolfinger, R. and O'Connell, M.},<br /> journal={Journal of Statistical Computation and Simulation},<br /> volume={48},<br /> number={3-4},<br /> pages={233-243},<br /> year={1993},<br /> publisher={Taylor &amp; Francis}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{yau2003zero,<br /> title={Zero-Inflated Negative Binomial Mixed Regression Modeling of Over-Dispersed Count Data with Extra Zeros},<br /> author={Yau, K. K. W. and Wang, K. and Lee, A. 
H.},<br /> journal={Biometrical Journal},<br /> volume={45},<br /> number={4},<br /> pages={437-452},<br /> year={2003},<br /> publisher={Wiley Online Library}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{zeileis2008regression,<br /> title={Regression models for count data in R},<br /> author={Zeileis, A. and Kleiber, C. and Jackman, S.},<br /> journal={Journal of Statistical Software},<br /> volume={27},<br /> number={8},<br /> pages={1-25},<br /> year={2008}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> {{Back&amp;Next<br /> |linkBack=Continuous data models<br /> |linkNext=Model for categorical data }}</div> Admin http://wiki.webpopix.org/index.php/Models_for_count_data Models for count data 2013-06-07T13:42:36Z <p>Admin : </p> <hr /> <div>&lt;!-- Menu for the Observations chapter --&gt;<br /> &lt;sidebarmenu&gt;<br /> +[[Modeling the observations]]<br /> *[[Modeling the observations| Introduction ]] | [[ Continuous data models ]] | [[Models for count data]] | [[Model for categorical data]] | [[Models for time-to-event data ]] | [[Joint models]] <br /> &lt;/sidebarmenu&gt;<br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> Count data is a special type of statistical data that can only take non-negative integer values $\{0, 1, 2,\ldots\}$ that come from counting something, e.g., the number of seizures, hemorrhages or lesions in each given time period. More precisely, data from individual $i$ is the sequence $y_i=(y_{ij},1\leq j \leq n_i)$ where $y_{ij}$ is the number of events observed in the $j$th time interval $I_{ij}$.<br /> <br /> For the moment, let us assume that all the intervals have the same length. 
This is the case, for instance, if data are daily seizure counts: $I_{ij}$ is the $j$th day after the start of the experiment and $y_{ij}$ the number of seizures observed during that day.<br /> <br /> We will then model the sequence $y_i=(y_{ij},1\leq j \leq n_i)$ as a sequence of random variables that take its values in $\{ 0, 1, 2,\ldots\}$.<br /> <br /> If we assume that these random variables are independent, then the model is completely defined by the probability mass functions $\prob{y_{ij}=k}$, for $k \geq 0$ and $1 \leq j \leq n_i$. Common distributions used to model count data include [http://en.wikipedia.org/wiki/Poisson_distribution Poisson], [http://en.wikipedia.org/wiki/Binomial_distribution binomial] and [http://en.wikipedia.org/wiki/Negative_binomial_distribution negative binomial].<br /> <br /> Indeed, here we will only consider parametric distributions. In this context, building a model means defining:<br /> <br /> <br /> &lt;ul&gt;<br /> * the parameter function (or &quot;intensity&quot;) $\lambda_{ij} = \lambda(t_{ij},\psi_i)$ for any individual $i$ that depends on individual parameters $\psi_i$ and possibly the time $t_{ij}$.&lt;br&gt;<br /> <br /> * the probability mass function $\prob{y_{ij}=k; \lambda_{ij}}$.<br /> &lt;/ul&gt;<br /> <br /> <br /> The conditional distribution of the observations is therefore written:<br /> <br /> {{Equation1<br /> |equation = &lt;math&gt; \prob{y_{ij}=k {{!}} \psi_i} = \prob{y_{ij}=k ; \lambda_{ij} }. &lt;/math&gt; }} <br /> <br /> <br /> {{Example<br /> |title=Example<br /> <br /> |text= Let us illustrate this approach for the [http://en.wikipedia.org/wiki/Poisson_distribution Poisson distribution].<br /> A [http://en.wikipedia.org/wiki/Poisson_distribution Poisson distribution] with intensity $\lambda$ is defined by its probability mass function:<br /> <br /> {{Equation1|equation=&lt;math&gt; \prob{y=k ; \lambda} = \displaystyle{\frac{\lambda^{k} \, e^{-\lambda} }{k!} }. 
&lt;/math&gt;}}<br /> <br /> <br /> ::[[File:poisson1.png|link=]]<br /> <br /> <br /> One of the main property of the [http://en.wikipedia.org/wiki/Poisson_distribution Poisson distribution] is that $\lambda$ is both the mean and the variance of the distribution:<br /> <br /> {{Equation1|equation=&lt;math&gt;\esp{y} = \var{y} = \lambda &lt;/math&gt;}}<br /> <br /> All that remains is to define the Poisson intensity function $\lambda_{ij} = \lambda(t_{ij},\psi_i)$. Then,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\prob{y_{ij}=k {{!}} \psi_i} = \displaystyle{\frac{\lambda_{ij}^{k}\, e^{-\lambda_{ij} } } {k!} }. &lt;/math&gt;}}<br /> }}<br /> <br /> <br /> There are many variations of the Poisson model:<br /> <br /> <br /> &lt;ul&gt;<br /> * ''Homogeneous [http://en.wikipedia.org/wiki/Poisson_distribution Poisson distribution]:'' this assumes a constant intensity $\lambda_i$ for each individual $i$. Here, $\psi_i = \lambda_i$ and $\lambda(t_{ij},\psi_i)=\lambda_i$. <br /> &lt;br&gt;&lt;br&gt;<br /> * ''Non-homogeneous [http://en.wikipedia.org/wiki/Poisson_distribution Poisson distribution]:'' this assumes that the Poisson intensity is a function of time. For example, suppose that we believe that a disease-related event is increasing linearly in frequency each month. We could then model this using $\lambda(t_{ij},\psi_i) = \lambda_{i} + a_i t_{ij}$, where $t_{ij} = j$ (months). Here, $\psi_i=(\lambda_{i},a_i)$.<br /> &lt;br&gt;&lt;br&gt;<br /> * ''Additional regression variables:'' the Poisson intensity may depend on regression variables other than time. For example, assume that taking a drug tends to reduce the number of events. 
We can then link the time-varying drug concentration $C$ to the value of $\lambda$ at time $t_{ij}$ using for instance an &quot;Imax&quot; model:<br /> <br /> {{Equation1|equation=&lt;math&gt; <br /> \lambda(t_{ij},\psi_i) = \lambda_{i}\left(1-\Imax_i\displaystyle{\frac{ \ C_i(t_{ij})}{IC_{50,i} + C_i(t_{ij})} }\right) ,<br /> &lt;/math&gt; }}<br /> <br /> : where $\lambda_{i}$ is the baseline intensity and where $0\leq \Imax_i\leq 1$. Here, $\psi_{i} = (\lambda_{i}, \Imax_i, IC_{50,i})$.<br /> <br /> : This model can even be combined with the previous non-homogeneous model by assuming a time-varying baseline $\lambda_{i}(t)$, combining for instance a drug effect model with a disease model.&lt;br&gt;<br /> <br /> <br /> * Instead of assuming independent count data, we can introduce Markovian dependency into the model by assuming for example that $\lambda_{ij}$ is a function of $y_{i,j-1}$. Then, $\prob{y_{ij}=k\, |\, y_{i\,j-1}, t_{ij},\psi_i}$ is the probability mass function of a Poisson random variable with parameter $\lambda_{ij} =\lambda(y_{i,j-1}, t_{ij},\psi_i)$.<br /> &lt;br&gt;&lt;br&gt;<br /> <br /> * If $y_{ij}$ is the number of events of a given type (seizures, hemorrhages, etc.) 
in a given time interval $I_{ij}$, and if $h_i(t)=h(t,\psi_i)$ is the hazard function associated with this sequence of events for individual $i$, then the $(y_{ij})$ are the increments of a non-homogeneous Poisson process, i.e., $y_{ij}$ has a Poisson distribution with intensity $\lambda_{ij}=\displaystyle{ \int_{I_{ij}}} h(t,\psi_i)dt$ in interval $I_{ij}$ (see the [[Models for time-to-event data]] section).<br /> &lt;/ul&gt;<br /> <br /> <br /> Let us now look at some other distributions for count data:<br /> <br /> <br /> &lt;ul&gt;<br /> * The zero-inflated [http://en.wikipedia.org/wiki/Poisson_distribution Poisson distribution]:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \prob{y=k ; \lambda,p_0} = \left\{ \begin{array}{cc}<br /> p_0 + (1-p_0)e^{-\lambda} &amp; {\rm if } \ k=0 \\<br /> (1-p_0) \displaystyle {\frac{e^{-\lambda} \lambda^{k} }{k!} } &amp; {\rm if } \ k&gt;0 ,<br /> \end{array}<br /> \right.<br /> &lt;/math&gt;}}<br /> <br /> : where $0\leq p_0 &lt;1$. This is useful when the data seem generally to follow a [http://en.wikipedia.org/wiki/Poisson_distribution Poisson distribution] except for an excess of zeros ($k=0$):<br /> <br /> <br /> ::[[File:poisson2.png|link=]]<br /> <br /> <br /> * The [http://en.wikipedia.org/wiki/Negative_binomial_distribution negative binomial distribution] is:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \prob{y=k ; p,r} = \displaystyle{ \frac{\Gamma(k+r)}{k!\, \Gamma(r)} }(1-p)^r p^k ,<br /> &lt;/math&gt;}}<br /> <br /> : with $0\leq p \leq 1$ and $r&gt;0$. 
If $r$ is an integer, then the [http://en.wikipedia.org/wiki/Negative_binomial_distribution negative binomial (NB) distribution] with parameters $(p,r)$ is the probability distribution of the number of successes in a sequence of [http://en.wikipedia.org/wiki/Bernoulli_trial Bernoulli trials] with probability of success $p$ before $r$ failures occur.<br /> <br /> <br /> ::[[File:poisson3.png|link=]]<br /> <br /> <br /> * The generalized [http://en.wikipedia.org/wiki/Poisson_distribution Poisson distribution] is: <br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \prob{y=k ; \lambda,\delta} = \displaystyle {\frac{\lambda (\lambda+k\delta)^{k-1} e^{-\lambda-k\delta} }{k!} },<br /> &lt;/math&gt; }}<br /> <br /> : with $\lambda&gt;0$ and $0\leq \delta &lt;1$.<br /> : The generalized Poisson (GP) distribution includes the [http://en.wikipedia.org/wiki/Poisson_distribution Poisson distribution] as a special case $(\delta=0)$, and is over-dispersed relative to the Poisson. Indeed, the variance-to-mean ratio is $(1-\delta)^{-2}$, which exceeds 1 when $\delta&gt;0$:<br /> <br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \begin{eqnarray} \esp{y} &amp;=&amp; \frac{\lambda}{1-\delta} \\<br /> \var{y} &amp;=&amp; \frac{\lambda}{(1-\delta)^3}.<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> ::[[File:poisson4.png|link=]]<br /> &lt;/ul&gt;<br /> <br /> &lt;br&gt;&lt;br&gt;<br /> -----------------<br /> &lt;br&gt;&lt;br&gt;<br /> <br /> {{Summary<br /> |title=Summary<br /> |text=<br /> For a given design $\bx_{i}$ and a given vector of parameters $\psi_i$, a parametric model for count data is completely defined by:<br /> <br /> <br /> &lt;ul&gt;<br /> - the probability mass function used to represent the distribution of the data in a given time interval<br /> &lt;br&gt;&lt;br&gt;<br /> - a model which defines how the distribution's parameter function (i.e., intensity) varies over time.<br /> &lt;/ul&gt;<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> == $\mlxtran$ for count data models == <br /> <br /> <br /> 
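Before writing these models in $\mlxtran$, the formulas above can be sanity-checked numerically. The short Python sketch below (illustrative code, not part of any Popix tool; the function name is ours) evaluates the generalized Poisson probability mass function in log space, mirroring the log formulation of Example 2, and verifies that it sums to 1 and that its mean and variance are $\lambda/(1-\delta)$ and $\lambda/(1-\delta)^3$:

```python
import math

def gp_logpmf(k, lam, delta):
    """log P(y = k) for the generalized Poisson distribution:
    log(lambda) + (k-1) log(lambda + k delta) - lambda - k delta - log(k!)."""
    return (math.log(lam) + (k - 1) * math.log(lam + k * delta)
            - lam - k * delta - math.lgamma(k + 1))

lam, delta = 2.0, 0.4

# Truncate the infinite sum: the pmf decays geometrically, so k < 400 is ample here.
probs = [math.exp(gp_logpmf(k, lam, delta)) for k in range(400)]

total = sum(probs)                                  # should be ~ 1
mean = sum(k * p for k, p in enumerate(probs))      # should be ~ lam / (1 - delta)
var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))  # ~ lam / (1 - delta)**3

print(total)                        # ≈ 1.0
print(mean, lam / (1 - delta))      # both ≈ 3.3333
print(var, lam / (1 - delta) ** 3)  # both ≈ 9.2593
```

Working in log space avoids overflow of $(\lambda+k\delta)^{k-1}$ for large $k$; the same convenience explains why Example 2 specifies the model through $\log \prob{Y=k}$ rather than $\prob{Y=k}$ itself.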
<br /> {{ExampleWithCode<br /> |title1= Example 1: <br /> |title2= Poisson model with time-varying intensity<br /> |text=<br /> <br /> |equation=&lt;math&gt; \begin{array}{rcl}<br /> \psi_i &amp;=&amp; (\alpha_i,\beta_i) \\[0.3cm]<br /> \lambda(t,\psi_i) &amp;=&amp; \alpha_i + \beta_i\,t \\[0.3cm]<br /> \prob{y_{ij}=k} &amp;=&amp; \displaystyle{ \frac{\lambda(t_{ij} , \psi_i)^k}{k!} } e^{-\lambda(t_{ij} , \psi_i)}\\<br /> \end{array}&lt;/math&gt;<br /> |code=<br /> {{MLXTranForTable<br /> |name=<br /> |text=<br /> &lt;pre style=&quot; background-color:#EFEFEF; border: none;&quot;&gt; <br /> INPUT:<br /> input = {alpha, beta}<br /> <br /> EQUATION:<br /> lambda = alpha + beta*t<br /> <br /> DEFINITION:<br /> y ~ poisson(lambda)<br /> &lt;/pre&gt; }}<br /> }}<br /> <br /> <br /> <br /> {{ExampleWithCode<br /> |title1= Example 2: <br /> |title2= Generalized Poisson model<br /> |text=<br /> <br /> |equation=&lt;math&gt; \begin{array}{rcl}<br /> \psi_i &amp;=&amp; (\lambda_i,\delta_i) \\<br /> \log\left( \prob{y_{ij}=k} \right) &amp;=&amp; \log(\lambda_i) + (k-1)\log(\lambda_i+k\delta_i) \\<br /> &amp;&amp; -\lambda_i-k\delta_i - \log(k!)\\[1cm]<br /> \end{array}&lt;/math&gt;<br /> |code=<br /> {{MLXTranForTable<br /> |name=<br /> |text=<br /> &lt;pre style=&quot; background-color:#EFEFEF; border:none;&quot;&gt; <br /> INPUT:<br /> parameter = {lambda, delta}<br /> <br /> DEFINITION:<br /> Y = {<br /> type = count,<br /> log(P(Y=k)) = log(lambda)<br /> + (k-1)*log(lambda+k*delta)<br /> - lambda - k*delta - factln(k)<br /> } &lt;/pre&gt; }}<br /> }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> {{Back&amp;Next<br /> |linkBack=Continuous data models<br /> |linkNext=Model for categorical data }}</div>
Common distributions used to model count data include [http://en.wikipedia.org/wiki/Poisson_distribution Poisson], [http://en.wikipedia.org/wiki/Binomial_distribution binomial] and [http://en.wikipedia.org/wiki/Negative_binomial_distribution negative binomial].<br /> <br /> Indeed, here we will only consider parametric distributions. In this context, building a model means defining:<br /> <br /> <br /> &lt;ul&gt;<br /> * the parameter function (or &quot;intensity&quot;) $\lambda_{ij} = \lambda(t_{ij},\psi_i)$ for any individual $i$ that depends on individual parameters $\psi_i$ and possibly the time $t_{ij}$.&lt;br&gt;<br /> <br /> * the probability mass function $\prob{y_{ij}=k; \lambda_{ij}}$.<br /> &lt;/ul&gt;<br /> <br /> <br /> The conditional distribution of the observations is therefore written:<br /> <br /> {{Equation1<br /> |equation = &lt;math&gt; \prob{y_{ij}=k {{!}} \psi_i} = \prob{y_{ij}=k ; \lambda_{ij} }. &lt;/math&gt; }} <br /> <br /> <br /> {{Example<br /> |title=Example<br /> <br /> |text= Let us illustrate this approach for the [http://en.wikipedia.org/wiki/Poisson_distribution Poisson distribution].<br /> A [http://en.wikipedia.org/wiki/Poisson_distribution Poisson distribution] with intensity $\lambda$ is defined by its probability mass function:<br /> <br /> {{Equation1|equation=&lt;math&gt; \prob{y=k ; \lambda} = \displaystyle{\frac{\lambda^{k} \, e^{-\lambda} }{k!} }. &lt;/math&gt;}}<br /> <br /> <br /> ::[[File:poisson1.png|link=]]<br /> <br /> <br /> One of the main property of the [http://en.wikipedia.org/wiki/Poisson_distribution Poisson distribution] is that $\lambda$ is both the mean and the variance of the distribution:<br /> <br /> {{Equation1|equation=&lt;math&gt;\esp{y} = \var{y} = \lambda &lt;/math&gt;}}<br /> <br /> All that remains is to define the Poisson intensity function $\lambda_{ij} = \lambda(t_{ij},\psi_i)$. 
Then,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\prob{y_{ij}=k {{!}} \psi_i} = \displaystyle{\frac{\lambda_{ij}^{k}\, e^{-\lambda_{ij} } } {k!} }. &lt;/math&gt;}}<br /> }}<br /> <br /> <br /> There are many variations of the Poisson model:<br /> <br /> <br /> &lt;ul&gt;<br /> * ''Homogeneous [http://en.wikipedia.org/wiki/Poisson_distribution Poisson distribution]:'' this assumes a constant intensity $\lambda_i$ for each individual $i$. Here, $\psi_i = \lambda_i$ and $\lambda(t_{ij},\psi_i)=\lambda_i$. <br /> &lt;br&gt;&lt;br&gt;<br /> * ''Non-homogeneous [http://en.wikipedia.org/wiki/Poisson_distribution Poisson distribution]:'' this assumes that the Poisson intensity is a function of time. For example, suppose that we believe that a disease-related event is increasing linearly in frequency each month. We could then model this using $\lambda(t_{ij},\psi_i) = \lambda_{i} + a_i t_{ij}$, where $t_{ij} = j$ (months). Here, $\psi_i=(\lambda_{i},a_i)$.<br /> &lt;br&gt;&lt;br&gt;<br /> * ''Additional regression variables:'' the Poisson intensity may depend on regression variables other than time. For example, assume that taking a drug tends to reduce the number of events. We can then link the time-varying drug concentration $C$ to the value of $\lambda$ at time $t_{ij}$ using for instance an &quot;Imax&quot; model:<br /> <br /> {{Equation1|equation=&lt;math&gt; <br /> \lambda(t_{ij},\psi_i) = \lambda_{i}\left(1-\Imax_i\displaystyle{\frac{ \ C_i(t_{ij})}{IC_{50,i} + C_i(t_{ij})} }\right) ,<br /> &lt;/math&gt; }}<br /> <br /> : where $\lambda_{i}$ is the baseline intensity and where $0\leq \Imax_i\leq 1$. 
Here, $\psi_{i} = (\lambda_{i}, \Imax_i, IC_{50,i})$.<br /> <br /> : This model can even be combined with the previous non-homogeneous model by assuming a time-varying baseline $\lambda_{i}(t)$ in order to combine a drug effect model with a disease model for instance.&lt;br&gt;<br /> <br /> <br /> * Instead of assuming independent count data, we can introduce Markovian dependency into the model by assuming for example that $\lambda_{ij}$ is function of $y_{i,j-1}$. Then, $\prob{y_{ij}=k\, |\, y_{i\,j-1}, t_{ij},\psi_i}$ is the probability function of a Poisson random variable with parameter $\lambda_{ij} =\lambda(y_{i,j-1}, t_{ij},\psi_i)$.<br /> &lt;br&gt;&lt;br&gt;<br /> <br /> * If $y_{ij}$ is the number of a given type of events (seizures, hemorrhages, etc.) in a given time interval $I_{ij}$, and if $h_i(t)=h(t,\psi_i)$ is the hazard function associated with this sequence of events for individual $i$, then $y_{ij}$ is a non-homogeneous Poisson process with Poisson intensity $\lambda_{ij}=\displaystyle{ \int_{I_{ij}}} h(t,\psi_i)dt$ in interval $I_{ij}$ (see [[Models for time-to-event data]] section).<br /> &lt;/ul&gt;<br /> <br /> <br /> Let us see now some other examples of distributions for count data:<br /> <br /> <br /> &lt;ul&gt;<br /> * The inflated [http://en.wikipedia.org/wiki/Poisson_distribution Poisson distribution]:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \prob{y=k ; \lambda,p_0} = \left\{ \begin{array}{cc}<br /> p_0 + (1-p_0)e^{-\lambda} &amp; {\rm if } \ k=0 \\<br /> (1-p_0) \displaystyle {\frac{e^{-\lambda} \lambda^{k} }{k!} } &amp; {\rm if } \ k&gt;0 .<br /> \end{array}<br /> \right.<br /> &lt;/math&gt;}}<br /> <br /> :where $0\leq p_0 &lt;1$. 
This is useful when data seem generally to follow a [http://en.wikipedia.org/wiki/Poisson_distribution Poisson distribution] except for having an overly large quantity of cases when $k=0$:<br /> <br /> <br /> ::[[File:poisson2.png|link=]]<br /> <br /> <br /> * The [http://en.wikipedia.org/wiki/Negative_binomial_distribution negative binomial distribution] is:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \prob{y=k ; p,r} = \displaystyle{ \frac{\Gamma(k+r)}{k!\, \Gamma(r)} }(1-p)^r p^k ,<br /> &lt;/math&gt;}}<br /> <br /> :with $0\leq p \leq 1$ and $r&gt;0$. If $r$ is an integer, then the [http://en.wikipedia.org/wiki/Negative_binomial_distribution negative binomial (NB) distribution] with parameters $(p,r)$ is the probability distribution of the number of successes in a sequence of [http://en.wikipedia.org/wiki/Bernoulli_trial Bernoulli trials] with probability of success $p$ before $r$ failures occur.<br /> <br /> <br /> ::[[File:poisson3.png|link=]]<br /> <br /> <br /> * The generalized [http://en.wikipedia.org/wiki/Poisson_distribution Poisson distribution] is: <br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \prob{y=k ; \lambda,\delta} = \displaystyle {\frac{\lambda (\lambda+k\delta)^{k-1} e^{-\lambda-k\delta} }{k!} },<br /> &lt;/math&gt; }}<br /> <br /> :with $\lambda&gt;0$ and $0\leq \delta &lt;1$.<br /> :The generalized Poisson (GP) distribution includes the [[http://en.wikipedia.org/wiki/Poisson_distribution Poisson distribution] as a special case $(\delta=0)$, and is over-dispersed relative to the Poisson. 
Indeed, the variance to mean ratio exceeds 1:<br /> <br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \begin{eqnarray} \esp{y} &amp;=&amp; \frac{\lambda}{1-\delta} \\<br /> \var{y} &amp;=&amp; \frac{\lambda}{1-\delta^3}.<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> ::[[File:poisson4.png|link=]]<br /> &lt;ul&gt;<br /> <br /> &lt;br&gt;&lt;br&gt;<br /> -----------------<br /> &lt;br&gt;&lt;br&gt;<br /> <br /> {{Summary<br /> |title=Summary<br /> |text=<br /> For a given design $\bx_{i}$ and a given vector of parameters $\psi_i$, a parametric model for count data is completely defined by:<br /> <br /> <br /> &lt;ul&gt;<br /> - the probability mass function used to represent the distribution of the data in a given time interval<br /> &lt;br&gt;&lt;br&gt;<br /> - a model which defines how the distribution's parameter function (i.e., intensity) varies over time.<br /> &lt;/ul&gt;<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> == $\mlxtran$ for count data models == <br /> <br /> <br /> <br /> {{ExampleWithCode<br /> |title1= Example 1: <br /> |title2= Poisson model with time varying intensity<br /> |text=<br /> <br /> |equation=&lt;math&gt; \begin{array}{c}<br /> \psi_i &amp;=&amp; (\alpha_i,\beta_i) \\[0.3cm]<br /> \lambda(t,\psi_i) &amp;=&amp; \alpha_i + \beta_i\,t \\[0.3cm]<br /> \prob{y_{ij}=k} &amp;=&amp; \displaystyle{ \frac{\lambda(t_{ij} , \psi_i)^k}{k!} } e^{-\lambda(t_{ij} , \psi_i)}\\<br /> \end{array}&lt;/math&gt;<br /> |code=<br /> {{MLXTranForTable<br /> |name=<br /> |text=<br /> &lt;pre style=&quot; background-color:#EFEFEF; border: none;&quot;&gt; <br /> INPUT:<br /> input = {alpha, beta}<br /> <br /> EQUATION:<br /> lambda = alpha + beta*t<br /> <br /> DEFINITION:<br /> y ~ poisson(lambda)<br /> &lt;/pre&gt; }}<br /> }}<br /> <br /> <br /> <br /> {{ExampleWithCode<br /> |title1= Example 2: <br /> |title2= generalized Poisson model<br /> |text=<br /> <br /> |equation=&lt;math&gt; \begin{array}{c}<br /> \psi_i &amp;=&amp; 
(\lambda_i,\delta_i) \\<br /> \log\left( \prob{y_{ij}=k} \right) &amp;=&amp; \log(\lambda_i) + (k-1)\log(\lambda_i+k\delta_i) \\<br /> &amp;&amp; -\lambda_i-k\delta_i - \log(k!)\\[1cm]<br /> \end{array}&lt;/math&gt;<br /> |code=<br /> {{MLXTranForTable<br /> |name=<br /> |text=<br /> &lt;pre style=&quot; background-color:#EFEFEF; border:none;&quot;&gt; <br /> INPUT:<br /> parameter = {dlt, lbd}<br /> <br /> DEFINITION:<br /> Y = {<br /> type = count,<br /> log(P(Y=k)) = log(lambda)<br /> + (k-1)*log(lambda+k*delta)<br /> - lambda -k*delta - factln(k)<br /> } &lt;/pre&gt; }}<br /> }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Bibliography==<br /> <br /> <br /> &lt;bibtex&gt;<br /> @article{blundell2002individual,<br /> title={Individual effects and dynamics in count data models},<br /> author={Blundell, R. and Griffith, R. and Windmeijer, F.},<br /> journal={Journal of Econometrics},<br /> volume={108},<br /> number={1},<br /> pages={113-131},<br /> year={2002},<br /> publisher={Elsevier}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{bolker2009generalized,<br /> title={Generalized linear mixed models: a practical guide for ecology and evolution},<br /> author={Bolker, B. M. and Brooks, M. E. and Clark, C. J. and Geange, S. W. and Poulsen, J. R. and Stevens, M. H. and White, J.-S. S. and others},<br /> journal={Trends in ecology &amp; evolution},<br /> volume={24},<br /> number={3},<br /> pages={127-135},<br /> year={2009},<br /> publisher={Elsevier Science}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{cameron1998regression,<br /> title={Regression analysis of count data},<br /> author={Cameron, A. C. and Trivedi, P. K.},<br /> volume={30},<br /> year={1998},<br /> publisher={Cambridge University Press}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{christensen2002bayesian,<br /> title={Bayesian prediction of spatial count data using generalized linear mixed models},<br /> author={Christensen, O. F. 
and Waagepetersen, R.},<br /> journal={Biometrics},<br /> volume={58},<br /> number={2},<br /> pages={280-286},<br /> year={2002},<br /> publisher={Wiley Online Library}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{fahrmeir1994multivariate,<br /> title={Multivariate statistical modelling based on generalized linear models},<br /> author={Fahrmeir, L. and Tutz, G. and Hennevogl, W.},<br /> volume={2},<br /> year={1994},<br /> publisher={Springer New York}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{hall2004zero,<br /> title={Zero-inflated Poisson and binomial regression with random effects: a case study},<br /> author={Hall, D. B.},<br /> journal={Biometrics},<br /> volume={56},<br /> number={4},<br /> pages={103--1039},<br /> year={2004},<br /> publisher={Wiley Online Library}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{heilbron2007zero,<br /> title={Zero-Altered and other Regression Models for Count Data with Added Zeros},<br /> author={Heilbron, D. C.},<br /> journal={Biometrical Journal},<br /> volume={36},<br /> number={5},<br /> pages={531-547},<br /> year={2007},<br /> publisher={Wiley Online Library}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{lawless1987negative,<br /> title={Negative binomial and mixed Poisson regression},<br /> author={Lawless, J. F.},<br /> journal={Canadian Journal of Statistics},<br /> volume={15},<br /> number={3},<br /> pages={209-225},<br /> year={1987},<br /> publisher={Wiley Online Library}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{lee2006multi,<br /> title={Multi-level zero-inflated Poisson regression modelling of correlated count data with excess zeros},<br /> author={Lee, A. H. and Wang, K. and Scott, J. A. and Yau, K. K. W. and McLachlan, G. 
J.},<br /> journal={Statistical Methods in Medical Research},<br /> volume={15},<br /> number={1},<br /> pages={47-61},<br /> year={2006},<br /> publisher={SAGE Publications}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{mcculloch2011generalized,<br /> title={Generalized, Linear, and Mixed Models},<br /> author={McCulloch, C. E. and Searle, S. R. and Neuhaus, J. M.},<br /> isbn={9781118209967},<br /> series={Wiley Series in Probability and Statistics},<br /> url={http://books.google.fr/books?id=kyvgyK\_sBlkC},<br /> year={2011},<br /> publisher={Wiley}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{min2005random,<br /> title={Random effect models for repeated measures of zero-inflated count data},<br /> author={Min, Y. and Agresti, A.},<br /> journal={Statistical Modelling},<br /> volume={5},<br /> number={1},<br /> pages={1-19},<br /> year={2005},<br /> publisher={SAGE Publications}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{molenberghs2005models,<br /> title={Models for discrete longitudinal data},<br /> author={Molenberghs, G. and Verbeke, G.},<br /> year={2005},<br /> publisher={Springer}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{mullahy1998heterogeneity,<br /> title={Heterogeneity, excess zeros, and the structure of count data models},<br /> author={Mullahy, J.},<br /> journal={Journal of Applied Econometrics},<br /> volume={12},<br /> number={3},<br /> pages={337-350},<br /> year={1998},<br /> publisher={Wiley Online Library}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{savic2009performance,<br /> title={Performance in population models for count data, part ii: A new saem algorithm},<br /> author={Savic, R. 
and Lavielle, M.},<br /> journal={Journal of pharmacokinetics and pharmacodynamics},<br /> volume={36},<br /> number={4},<br /> pages={367-379},<br /> year={2009},<br /> publisher={Springer}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{thall1988mixed,<br /> title={Mixed Poisson likelihood regression models for longitudinal interval count data},<br /> author={Thall, P. F.},<br /> journal={Biometrics},<br /> pages={197-209},<br /> year={1988},<br /> publisher={JSTOR}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{thall1990some,<br /> title={Some covariance models for longitudinal count data with overdispersion},<br /> author={Thall, P. F. and Vail, S. C.},<br /> journal={Biometrics},<br /> pages={657-671},<br /> year={1990},<br /> publisher={JSTOR}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{tempelman1996mixed,<br /> title={A mixed effects model for overdispersed count data in animal breeding},<br /> author={Tempelman, R. J. and Gianola, D.},<br /> journal={Biometrics},<br /> pages={265-279},<br /> year={1996},<br /> publisher={JSTOR}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{winkelmann2008econometric,<br /> title={Econometric analysis of count data},<br /> author={Winkelmann, R.},<br /> year={2008},<br /> publisher={Springer}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{wolfinger1993generalized,<br /> title={Generalized linear mixed models a pseudo-likelihood approach},<br /> author={Wolfinger, R. and O'Connell, M.},<br /> journal={Journal of statistical Computation and Simulation},<br /> volume={48},<br /> number={3-4},<br /> pages={233-243},<br /> year={1993},<br /> publisher={Taylor &amp; Francis}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{yau2003zero,<br /> title={Zero-Inflated Negative Binomial Mixed Regression Modeling of Over-Dispersed Count Data with Extra Zeros},<br /> author={Yau, K. K. W. and Wang, K. and Lee, A. 
H.},<br /> journal={Biometrical Journal},<br /> volume={45},<br /> number={4},<br /> pages={437-452},<br /> year={2003},<br /> publisher={Wiley Online Library}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{zeileis2008regression,<br /> title={Regression models for count data in R},<br /> author={Zeileis, A. and Kleiber, C. and Jackman, S.},<br /> journal={Journal of Statistical Software},<br /> volume={27},<br /> number={8},<br /> pages={1-25},<br /> year={2008}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> {{Back&amp;Next<br /> |linkBack=Continuous data models<br /> |linkNext=Model for categorical data }}</div> Admin http://wiki.webpopix.org/index.php/Models_for_count_data Models for count data 2013-06-07T13:41:33Z <p>Admin : </p> <hr /> <div>&lt;!-- Menu for the Observations chapter --&gt;<br /> &lt;sidebarmenu&gt;<br /> +[[Modeling the observations]]<br /> *[[Modeling the observations| Introduction ]] | [[ Continuous data models ]] | [[Models for count data]] | [[Model for categorical data]] | [[Models for time-to-event data ]] | [[Joint models]] <br /> &lt;/sidebarmenu&gt;<br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> Count data is a special type of statistical data that can only take non-negative integer values $\{0, 1, 2,\ldots\}$ that come from counting something, e.g., the number of seizures, hemorrhages or lesions in each given time period. More precisely, data from individual $i$ is the sequence $y_i=(y_{ij},1\leq j \leq n_i)$ where $y_{ij}$ is the number of events observed in the $j$th time interval $I_{ij}$.<br /> <br /> For the moment, let us assume that all the intervals have the same length. 
This is the case, for instance, if data are daily seizure counts: $I_{ij}$ is the $j$th day after the start of the experiment and $y_{ij}$ the number of seizures observed during that day.<br /> <br /> We will then model the sequence $y_i=(y_{ij},1\leq j \leq n_i)$ as a sequence of random variables that take their values in $\{ 0, 1, 2,\ldots\}$.<br /> <br /> If we assume that these random variables are independent, then the model is completely defined by the probability mass functions $\prob{y_{ij}=k}$, for $k \geq 0$ and $1 \leq j \leq n_i$. Common distributions used to model count data include [http://en.wikipedia.org/wiki/Poisson_distribution Poisson], [http://en.wikipedia.org/wiki/Binomial_distribution binomial] and [http://en.wikipedia.org/wiki/Negative_binomial_distribution negative binomial].<br /> <br /> Here, we will only consider parametric distributions. In this context, building a model means defining:<br /> <br /> <br /> &lt;ul&gt;<br /> * the parameter function (or &quot;intensity&quot;) $\lambda_{ij} = \lambda(t_{ij},\psi_i)$ for each individual $i$, which depends on the individual parameters $\psi_i$ and possibly the time $t_{ij}$.&lt;br&gt;<br /> <br /> * the probability mass function $\prob{y_{ij}=k; \lambda_{ij}}$.<br /> &lt;/ul&gt;<br /> <br /> <br /> The conditional distribution of the observations is therefore written:<br /> <br /> {{Equation1<br /> |equation = &lt;math&gt; \prob{y_{ij}=k {{!}} \psi_i} = \prob{y_{ij}=k ; \lambda_{ij} }. &lt;/math&gt; }} <br /> <br /> <br /> {{Example<br /> |title=Example<br /> <br /> |text= Let us illustrate this approach for the Poisson distribution.<br /> A Poisson distribution with intensity $\lambda$ is defined by its probability mass function:<br /> <br /> {{Equation1|equation=&lt;math&gt; \prob{y=k ; \lambda} = \displaystyle{\frac{\lambda^{k} \, e^{-\lambda} }{k!} }.
&lt;/math&gt;}}<br /> <br /> <br /> ::[[File:poisson1.png|link=]]<br /> <br /> <br /> One of the main properties of the Poisson distribution is that $\lambda$ is both the mean and the variance of the distribution:<br /> <br /> {{Equation1|equation=&lt;math&gt;\esp{y} = \var{y} = \lambda &lt;/math&gt;}}<br /> <br /> All that remains is to define the Poisson intensity function $\lambda_{ij} = \lambda(t_{ij},\psi_i)$. Then,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\prob{y_{ij}=k {{!}} \psi_i} = \displaystyle{\frac{\lambda_{ij}^{k}\, e^{-\lambda_{ij} } } {k!} }. &lt;/math&gt;}}<br /> }}<br /> <br /> <br /> There are many variations of the Poisson model:<br /> <br /> <br /> &lt;ul&gt;<br /> * ''Homogeneous Poisson distribution:'' this assumes a constant intensity $\lambda_i$ for each individual $i$. Here, $\psi_i = \lambda_i$ and $\lambda(t_{ij},\psi_i)=\lambda_i$. <br /> &lt;br&gt;&lt;br&gt;<br /> * ''Non-homogeneous Poisson distribution:'' this assumes that the Poisson intensity is a function of time. For example, suppose that we believe that a disease-related event is increasing linearly in frequency each month. We could then model this using $\lambda(t_{ij},\psi_i) = \lambda_{i} + a_i t_{ij}$, where $t_{ij} = j$ (months). Here, $\psi_i=(\lambda_{i},a_i)$.<br /> &lt;br&gt;&lt;br&gt;<br /> * ''Additional regression variables:'' the Poisson intensity may depend on regression variables other than time. For example, assume that taking a drug tends to reduce the number of events. We can then link the time-varying drug concentration $C$ to the value of $\lambda$ at time $t_{ij}$ using for instance an &quot;Imax&quot; model:<br /> <br /> {{Equation1|equation=&lt;math&gt; <br /> \lambda(t_{ij},\psi_i) = \lambda_{i}\left(1-\Imax_i\displaystyle{\frac{C_i(t_{ij})}{IC_{50,i} + C_i(t_{ij})} }\right) ,<br /> &lt;/math&gt; }}<br /> <br /> : where $\lambda_{i}$ is the baseline intensity and where $0\leq \Imax_i\leq 1$.
Here, $\psi_{i} = (\lambda_{i}, \Imax_i, IC_{50,i})$.<br /> <br /> : This model can even be combined with the previous non-homogeneous model by assuming a time-varying baseline $\lambda_{i}(t)$, in order to combine, for instance, a drug effect model with a disease model.&lt;br&gt;<br /> <br /> <br /> * Instead of assuming independent count data, we can introduce Markovian dependency into the model by assuming for example that $\lambda_{ij}$ is a function of $y_{i,j-1}$. Then, $\prob{y_{ij}=k\, |\, y_{i\,j-1}, t_{ij},\psi_i}$ is the probability function of a Poisson random variable with parameter $\lambda_{ij} =\lambda(y_{i,j-1}, t_{ij},\psi_i)$.<br /> &lt;br&gt;&lt;br&gt;<br /> <br /> * If $y_{ij}$ is the number of events of a given type (seizures, hemorrhages, etc.) in a given time interval $I_{ij}$, and if $h_i(t)=h(t,\psi_i)$ is the hazard function associated with this sequence of events for individual $i$, then the counts $(y_{ij})$ come from a non-homogeneous Poisson process, and $y_{ij}$ has a Poisson distribution with intensity $\lambda_{ij}=\displaystyle{ \int_{I_{ij}}} h(t,\psi_i)dt$ in interval $I_{ij}$ (see the [[Models for time-to-event data]] section).<br /> &lt;/ul&gt;<br /> <br /> <br /> Let us now look at some other examples of distributions for count data:<br /> <br /> <br /> &lt;ul&gt;<br /> * The zero-inflated Poisson distribution:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> \prob{y=k ; \lambda,p_0} = \left\{ \begin{array}{cc}<br /> p_0 + (1-p_0)e^{-\lambda} &amp; {\rm if } \ k=0 \\<br /> (1-p_0) \displaystyle {\frac{e^{-\lambda} \lambda^{k} }{k!} } &amp; {\rm if } \ k&gt;0 .<br /> \end{array}<br /> \right.<br /> &lt;/math&gt;}}<br /> <br /> :where $0\leq p_0 &lt;1$.
This is useful when the data seem generally to follow a Poisson distribution but contain an excess of zeros ($k=0$):<br /> <br /> <br /> ::[[File:poisson2.png|link=]]<br /> <br /> <br /> * The negative binomial distribution is:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \prob{y=k ; p,r} = \displaystyle{ \frac{\Gamma(k+r)}{k!\, \Gamma(r)} }(1-p)^r p^k ,<br /> &lt;/math&gt;}}<br /> <br /> :with $0\leq p \leq 1$ and $r&gt;0$. If $r$ is an integer, then the negative binomial (NB) distribution with parameters $(p,r)$ is the probability distribution of the number of successes in a sequence of Bernoulli trials with probability of success $p$ before $r$ failures occur.<br /> <br /> <br /> ::[[File:poisson3.png|link=]]<br /> <br /> <br /> * The generalized Poisson distribution is: <br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \prob{y=k ; \lambda,\delta} = \displaystyle {\frac{\lambda (\lambda+k\delta)^{k-1} e^{-\lambda-k\delta} }{k!} },<br /> &lt;/math&gt; }}<br /> <br /> :with $\lambda&gt;0$ and $0\leq \delta &lt;1$.<br /> :The generalized Poisson (GP) distribution includes the Poisson distribution as a special case $(\delta=0)$, and is over-dispersed relative to the Poisson.
Indeed, the variance to mean ratio exceeds 1:<br /> <br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \begin{eqnarray} \esp{y} &amp;=&amp; \frac{\lambda}{1-\delta} \\<br /> \var{y} &amp;=&amp; \frac{\lambda}{(1-\delta)^3}.<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> ::[[File:poisson4.png|link=]]<br /> &lt;/ul&gt;<br /> <br /> &lt;br&gt;&lt;br&gt;<br /> -----------------<br /> &lt;br&gt;&lt;br&gt;<br /> <br /> {{Summary<br /> |title=Summary<br /> |text=<br /> For a given design $\bx_{i}$ and a given vector of parameters $\psi_i$, a parametric model for count data is completely defined by:<br /> <br /> <br /> &lt;ul&gt;<br /> - the probability mass function used to represent the distribution of the data in a given time interval<br /> &lt;br&gt;&lt;br&gt;<br /> - a model which defines how the distribution's parameter function (i.e., intensity) varies over time.<br /> &lt;/ul&gt;<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> == $\mlxtran$ for count data models == <br /> <br /> <br /> <br /> {{ExampleWithCode<br /> |title1= Example 1: <br /> |title2= Poisson model with time-varying intensity<br /> |text=<br /> <br /> |equation=&lt;math&gt; \begin{array}{rcl}<br /> \psi_i &amp;=&amp; (\alpha_i,\beta_i) \\[0.3cm]<br /> \lambda(t,\psi_i) &amp;=&amp; \alpha_i + \beta_i\,t \\[0.3cm]<br /> \prob{y_{ij}=k} &amp;=&amp; \displaystyle{ \frac{\lambda(t_{ij} , \psi_i)^k}{k!} } e^{-\lambda(t_{ij} , \psi_i)}\\<br /> \end{array}&lt;/math&gt;<br /> |code=<br /> {{MLXTranForTable<br /> |name=<br /> |text=<br /> &lt;pre style=&quot; background-color:#EFEFEF; border: none;&quot;&gt; <br /> INPUT:<br /> input = {alpha, beta}<br /> <br /> EQUATION:<br /> lambda = alpha + beta*t<br /> <br /> DEFINITION:<br /> y ~ poisson(lambda)<br /> &lt;/pre&gt; }}<br /> }}<br /> <br /> <br /> <br /> {{ExampleWithCode<br /> |title1= Example 2: <br /> |title2= generalized Poisson model<br /> |text=<br /> <br /> |equation=&lt;math&gt; \begin{array}{rcl}<br /> \psi_i &amp;=&amp;
(\lambda_i,\delta_i) \\<br /> \log\left( \prob{y_{ij}=k} \right) &amp;=&amp; \log(\lambda_i) + (k-1)\log(\lambda_i+k\delta_i) \\<br /> &amp;&amp; -\lambda_i-k\delta_i - \log(k!)\\[1cm]<br /> \end{array}&lt;/math&gt;<br /> |code=<br /> {{MLXTranForTable<br /> |name=<br /> |text=<br /> &lt;pre style=&quot; background-color:#EFEFEF; border:none;&quot;&gt; <br /> INPUT:<br /> input = {lambda, delta}<br /> <br /> DEFINITION:<br /> Y = {<br /> type = count,<br /> log(P(Y=k)) = log(lambda)<br /> + (k-1)*log(lambda+k*delta)<br /> - lambda - k*delta - factln(k)<br /> } &lt;/pre&gt; }}<br /> }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Bibliography==<br /> <br /> <br /> &lt;bibtex&gt;<br /> @article{blundell2002individual,<br /> title={Individual effects and dynamics in count data models},<br /> author={Blundell, R. and Griffith, R. and Windmeijer, F.},<br /> journal={Journal of Econometrics},<br /> volume={108},<br /> number={1},<br /> pages={113-131},<br /> year={2002},<br /> publisher={Elsevier}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{bolker2009generalized,<br /> title={Generalized linear mixed models: a practical guide for ecology and evolution},<br /> author={Bolker, B. M. and Brooks, M. E. and Clark, C. J. and Geange, S. W. and Poulsen, J. R. and Stevens, M. H. and White, J.-S. S. and others},<br /> journal={Trends in ecology &amp; evolution},<br /> volume={24},<br /> number={3},<br /> pages={127-135},<br /> year={2009},<br /> publisher={Elsevier Science}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{cameron1998regression,<br /> title={Regression analysis of count data},<br /> author={Cameron, A. C. and Trivedi, P. K.},<br /> volume={30},<br /> year={1998},<br /> publisher={Cambridge University Press}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{christensen2002bayesian,<br /> title={Bayesian prediction of spatial count data using generalized linear mixed models},<br /> author={Christensen, O. F.
and Waagepetersen, R.},<br /> journal={Biometrics},<br /> volume={58},<br /> number={2},<br /> pages={280-286},<br /> year={2002},<br /> publisher={Wiley Online Library}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{fahrmeir1994multivariate,<br /> title={Multivariate statistical modelling based on generalized linear models},<br /> author={Fahrmeir, L. and Tutz, G. and Hennevogl, W.},<br /> volume={2},<br /> year={1994},<br /> publisher={Springer New York}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{hall2004zero,<br /> title={Zero-inflated Poisson and binomial regression with random effects: a case study},<br /> author={Hall, D. B.},<br /> journal={Biometrics},<br /> volume={56},<br /> number={4},<br /> pages={1030-1039},<br /> year={2004},<br /> publisher={Wiley Online Library}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{heilbron2007zero,<br /> title={Zero-altered and other regression models for count data with added zeros},<br /> author={Heilbron, D. C.},<br /> journal={Biometrical Journal},<br /> volume={36},<br /> number={5},<br /> pages={531-547},<br /> year={2007},<br /> publisher={Wiley Online Library}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{lawless1987negative,<br /> title={Negative binomial and mixed Poisson regression},<br /> author={Lawless, J. F.},<br /> journal={Canadian Journal of Statistics},<br /> volume={15},<br /> number={3},<br /> pages={209-225},<br /> year={1987},<br /> publisher={Wiley Online Library}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{lee2006multi,<br /> title={Multi-level zero-inflated Poisson regression modelling of correlated count data with excess zeros},<br /> author={Lee, A. H. and Wang, K. and Scott, J. A. and Yau, K. K. W. and McLachlan, G.
J.},<br /> journal={Statistical Methods in Medical Research},<br /> volume={15},<br /> number={1},<br /> pages={47-61},<br /> year={2006},<br /> publisher={SAGE Publications}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{mcculloch2011generalized,<br /> title={Generalized, Linear, and Mixed Models},<br /> author={McCulloch, C. E. and Searle, S. R. and Neuhaus, J. M.},<br /> isbn={9781118209967},<br /> series={Wiley Series in Probability and Statistics},<br /> url={http://books.google.fr/books?id=kyvgyK\_sBlkC},<br /> year={2011},<br /> publisher={Wiley}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{min2005random,<br /> title={Random effect models for repeated measures of zero-inflated count data},<br /> author={Min, Y. and Agresti, A.},<br /> journal={Statistical Modelling},<br /> volume={5},<br /> number={1},<br /> pages={1-19},<br /> year={2005},<br /> publisher={SAGE Publications}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{molenberghs2005models,<br /> title={Models for discrete longitudinal data},<br /> author={Molenberghs, G. and Verbeke, G.},<br /> year={2005},<br /> publisher={Springer}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{mullahy1998heterogeneity,<br /> title={Heterogeneity, excess zeros, and the structure of count data models},<br /> author={Mullahy, J.},<br /> journal={Journal of Applied Econometrics},<br /> volume={12},<br /> number={3},<br /> pages={337-350},<br /> year={1998},<br /> publisher={Wiley Online Library}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{savic2009performance,<br /> title={Performance in population models for count data, part ii: A new saem algorithm},<br /> author={Savic, R. 
and Lavielle, M.},<br /> journal={Journal of pharmacokinetics and pharmacodynamics},<br /> volume={36},<br /> number={4},<br /> pages={367-379},<br /> year={2009},<br /> publisher={Springer}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{thall1988mixed,<br /> title={Mixed Poisson likelihood regression models for longitudinal interval count data},<br /> author={Thall, P. F.},<br /> journal={Biometrics},<br /> pages={197-209},<br /> year={1988},<br /> publisher={JSTOR}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{thall1990some,<br /> title={Some covariance models for longitudinal count data with overdispersion},<br /> author={Thall, P. F. and Vail, S. C.},<br /> journal={Biometrics},<br /> pages={657-671},<br /> year={1990},<br /> publisher={JSTOR}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{tempelman1996mixed,<br /> title={A mixed effects model for overdispersed count data in animal breeding},<br /> author={Tempelman, R. J. and Gianola, D.},<br /> journal={Biometrics},<br /> pages={265-279},<br /> year={1996},<br /> publisher={JSTOR}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{winkelmann2008econometric,<br /> title={Econometric analysis of count data},<br /> author={Winkelmann, R.},<br /> year={2008},<br /> publisher={Springer}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{wolfinger1993generalized,<br /> title={Generalized linear mixed models a pseudo-likelihood approach},<br /> author={Wolfinger, R. and O'Connell, M.},<br /> journal={Journal of statistical Computation and Simulation},<br /> volume={48},<br /> number={3-4},<br /> pages={233-243},<br /> year={1993},<br /> publisher={Taylor &amp; Francis}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{yau2003zero,<br /> title={Zero-Inflated Negative Binomial Mixed Regression Modeling of Over-Dispersed Count Data with Extra Zeros},<br /> author={Yau, K. K. W. and Wang, K. and Lee, A. 
H.},<br /> journal={Biometrical Journal},<br /> volume={45},<br /> number={4},<br /> pages={437-452},<br /> year={2003},<br /> publisher={Wiley Online Library}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{zeileis2008regression,<br /> title={Regression models for count data in R},<br /> author={Zeileis, A. and Kleiber, C. and Jackman, S.},<br /> journal={Journal of Statistical Software},<br /> volume={27},<br /> number={8},<br /> pages={1-25},<br /> year={2008}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> {{Back&amp;Next<br /> |linkBack=Continuous data models<br /> |linkNext=Model for categorical data }}</div> Admin http://wiki.webpopix.org/index.php/Continuous_data_models Continuous data models 2013-06-07T13:40:24Z <p>Admin : /* Censored data */</p> <hr /> <div>&lt;!-- Menu for the Observations chapter --&gt;<br /> &lt;sidebarmenu&gt;<br /> +[[Modeling the observations]]<br /> *[[Modeling the observations| Introduction ]] | [[ Continuous data models ]] | [[Models for count data]] | [[Model for categorical data]] | [[Models for time-to-event data ]] | [[Joint models]] <br /> &lt;/sidebarmenu&gt;<br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> == The data ==<br /> <br /> Continuous data is data that can take any real value within a given range. For instance, a concentration takes its values in $\Rset^+$, the log of the viral load in $\Rset$, an effect expressed as a percentage in $[0,100]$.<br /> <br /> The data can be stored in a table and represented graphically. 
Here is some simple pharmacokinetics data involving four individuals.<br /> <br /> <br /> {| cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; <br /> | style=&quot;width:60%&quot; align=&quot;center&quot;| <br /> :[[File:continuous_graf0a_1.png|link=]]<br /> | style=&quot;width: 40%&quot; align=&quot;left&quot;| <br /> {| class=&quot;wikitable&quot; style=&quot;width: 70%;font-size:7pt; &quot;<br /> !| ID || TIME ||CONCENTRATION<br /> |- <br /> |1 || 1.0 || 9.84 <br /> |-<br /> |1 || 2.0 || 8.19 <br /> |-<br /> |1 || 4.0 || 6.91 <br /> |-<br /> |1 || 8.0 || 3.71 <br /> |-<br /> |1 || 12.0 || 1.25 <br /> |-<br /> |2 || 1.0 || 17.23 <br /> |-<br /> |2 || 3.0 || 11.14 <br /> |-<br /> |2 || 5.0 || 4.35 <br /> |-<br /> |2 || 10.0 || 2.92 <br /> |-<br /> |3 || 2.0 || 9.78 <br /> |-<br /> |3 || 3.0 || 10.40 <br /> |-<br /> |3 || 4.0 || 7.67 <br /> |-<br /> |3 || 6.0 || 6.84 <br /> |-<br /> |3 || 11.0 || 1.10 <br /> |-<br /> |4 || 4.0 || 8.78 <br /> |-<br /> |4 || 6.0 || 3.87 <br /> |-<br /> |4 || 12.0 || 1.85 <br /> |}<br /> |}<br /> <br /> <br /> Instead of individual plots, we can plot them all together. Such a figure is usually called a ''spaghetti plot'':<br /> <br /> <br /> ::[[File:continuous_graf0b_1.png|link=]]<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == The model ==<br /> <br /> <br /> For continuous data, we are going to consider scalar outcomes ($y_{ij}\in \Yr \subset \Rset$) and assume the following general model:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;nlme&quot; &gt;&lt;math&gt;y_{ij}=f(t_{ij},\psi_i)+ g(t_{ij},\psi_i)\teps_{ij}, \quad\ \quad 1\leq i \leq N, \quad \ 1 \leq j \leq n_i. 
&lt;/math&gt;&lt;/div&gt;<br /> |reference=(1)<br /> }}<br /> <br /> where $g(t_{ij},\psi_i)\geq 0$.<br /> <br /> Here, the residual errors $(\teps_{ij})$ are standardized random variables (mean zero and standard deviation 1).<br /> In this case, it is clear that $f(t_{ij},\psi_i)$ and $g(t_{ij},\psi_i)$ are the conditional mean and standard deviation of $y_{ij}$ given $\psi_i$, i.e.,<br /> <br /> {{Equation1<br /> |equation= &lt;math&gt;\begin{eqnarray} \esp{y_{ij} {{!}} \psi_i} &amp;=&amp; f(t_{ij},\psi_i) \\ <br /> \std{y_{ij} {{!}} \psi_i} &amp;=&amp; g(t_{ij},\psi_i).<br /> \end{eqnarray}&lt;/math&gt;}}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == The structural model == <br /> <br /> <br /> $f$ is known as the ''structural model'' and aims to describe the time evolution of the phenomena under study. For a given subject $i$ and vector of individual parameters $\psi_i$, $f(t_{ij},\psi_i)$ is the prediction of the observed variable at time $t_{ij}$. In other words, it is the value that would be measured at time $t_{ij}$ if there were no error ($\teps_{ij}=0$).<br /> <br /> In the current example, we choose the structural model $f=A\exp\left(-\alpha t \right)$.<br /> Here are some example curves for various combinations of $A$ and $\alpha$:<br /> <br /> <br /> ::[[File:continuous_graf1bis.png|link=]]<br /> <br /> <br /> Other models involving more complicated dynamical systems can be imagined, such as those defined as solutions of systems of ordinary or partial differential equations. Real-life examples are found in the study of HIV, pharmacokinetics and tumor growth.<br /> <br /> <br /> <br /> &lt;br&gt;<br /> == The residual error model ==<br /> <br /> <br /> For a given structural model $f$, the conditional probability distribution of the observations $(y_{ij})$ is completely defined by the residual error model, i.e., the probability distribution of the residual errors $(\teps_{ij})$ and the standard deviation $g(t_{ij},\psi_i)$.
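Before cataloguing the possible error models, note that model (1) is straightforward to simulate. The following Python sketch (not part of the original page; the numerical values of $A$, $\alpha$ and $a$ are arbitrary) draws one individual's observations from the exponential structural model $f(t,\psi)=A e^{-\alpha t}$ introduced above, with a constant error model $g=a$:

```python
import math
import random

def f(t, A, alpha):
    # Structural model from the example: f(t, psi) = A * exp(-alpha * t)
    return A * math.exp(-alpha * t)

def simulate_individual(times, A, alpha, a, rng):
    # Model (1) with a constant error model: y_j = f(t_j, psi) + a * eps_j,
    # where the standardized residuals eps_j are i.i.d. N(0, 1).
    return [f(t, A, alpha) + a * rng.gauss(0.0, 1.0) for t in times]

rng = random.Random(42)
times = [1.0, 2.0, 4.0, 8.0, 12.0]  # same design as individual 1 in the table above
y = simulate_individual(times, A=12.0, alpha=0.2, a=0.5, rng=rng)
```

Replacing the last term with, e.g., `b * f(t, A, alpha) * rng.gauss(0.0, 1.0)` would give a proportional error model instead.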
The residual error model can take many forms. For example,<br /> <br /> <br /> &lt;ul&gt;<br /> * A constant error model assumes that $g(t_{ij},\psi_i)=a_i$. Model [[#nlme|(1)]] then reduces to<br /> <br /> {{EquationWithRef <br /> |equation=&lt;div id=&quot;nlme1&quot; &gt;&lt;math&gt;y_{ij}=f(t_{ij},\psi_i)+ a_i\teps_{ij}, \quad \quad \ 1\leq i \leq N<br /> \quad \ 1 \leq j \leq n_i. &lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> :The figure below shows four simulated sequences of observations $(y_{ij}, 1\leq i \leq 4, 1\leq j \leq 10)$ with their respective structural model $f(t,\psi_i)$ in blue. Here, $a_i=2$ is the standard deviation of $y_{ij}$ for all $(i,j)$.<br /> <br /> <br /> ::[[File: continuous_graf2a1.png|link=]]<br /> <br /> <br /> :Let $\hat{y}_{ij}=f(t_{ij},\psi_i)$ be the prediction of $y_{ij}$ given by the model [[#nlme1|(2)]]. The figure below shows, for 50 individuals:<br /> <br /> <br /> &lt;ul&gt;<br /> ::'''-left''': prediction errors $e_{ij}=y_{ij}-\hat{y}_{ij}$ vs. predictions $(\hat{y}_{ij})$. The pink line is the mean $\esp{e_{ij}}=0$; the green lines are $\pm$ 1 standard deviations: $[-\std{e_{ij}} , +\std{e_{ij}}]$ where $\std{e_{ij}}=a_i=0.5$. <br /> &lt;br&gt;<br /> ::'''-right''': observations $(y_{ij})$ vs. predictions $(\hat{y}_{ij})$. The pink line is the identity $y=\hat{y}$; the green lines represent an interval of $\pm 1$ standard deviations around $\hat{y}$: $[\hat{y}-\std{e_{ij}} , \hat{y}+\std{e_{ij}}]$.<br /> &lt;/ul&gt;<br /> <br /> <br /> ::[[File:continuous_graf2a2.png|link=]]<br /> <br /> <br /> :These figures are typical for constant error models. The standard deviation of the prediction errors does not depend on the value of the predictions $(\hat{y}_{ij})$, so both intervals have constant amplitude.<br /> <br /> <br /> * A proportional error model assumes that $g(t_{ij},\psi_i) =b_i f(t_{ij},\psi_i)$.
Model [[#nlme|(1)]] then becomes<br /> <br /> <br /> {{EquationWithRef <br /> |equation=&lt;div id=&quot;nlme2&quot;&gt;&lt;math&gt; y_{ij}=f(t_{ij},\psi_i)(1 + b_i\teps_{ij}), \quad\ \quad 1\leq i \leq N,<br /> \quad \ 1 \leq j \leq n_i . &lt;/math&gt;&lt;/div&gt;<br /> |reference=(3) }}<br /> <br /> :The standard deviation of the prediction error $e_{ij}=y_{ij}-\hat{y}_{ij}$ is proportional to the prediction $\hat{y}_{ij}$. Therefore, the amplitude of the $\pm 1$ standard deviation intervals increases linearly with $f$:<br /> <br /> <br /> ::[[File:continuous_graf2b.png|link=]]<br /> <br /> <br /> * A combined error model combines a constant and a proportional error model by assuming $g(t_{ij},\psi_i) =a_i + b_i f(t_{ij},\psi_i)$, where $a_i&gt;0$ and $b_i&gt;0$. The standard deviation of the prediction error $e_{ij}$ and thus the amplitude of the intervals are now affine functions of the prediction $\hat{y}_{ij}$:<br /> <br /> <br /> ::[[File:continuous_graf2c.png|link=]]<br /> <br /> <br /> * An alternative combined error model is $g(t_{ij},\psi_i) =\sqrt{a_i^2 + b_i^2 f^2(t_{ij},\psi_i)}$. This gives intervals that look fairly similar to the previous ones, though they are no longer affine.<br /> <br /> <br /> ::[[File:continuous_graf2d.png|link=]]<br /> &lt;/ul&gt;<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Extension to autocorrelated errors == <br /> <br /> <br /> For any subject $i$, the residual errors $(\teps_{ij},1\leq j \leq n_i)$ are usually assumed to be independent random variables. Extension to autocorrelated errors is possible by assuming for instance that $(\teps_{ij})$ is a stationary ARMA (Autoregressive Moving Average) process.<br /> For example, an autoregressive process of order 1, AR(1), assumes that autocorrelation decreases exponentially:<br /> <br /> {{EquationWithRef <br /> |equation=&lt;div id=&quot;autocorr1&quot;&gt;&lt;math&gt; {\rm corr}(\teps_{ij},\teps_{i\,{j+1} }) = \rho_i^{(t_{i\,j+1}-t_{ij})}.
&lt;/math&gt;&lt;/div&gt;<br /> |reference=(4) }}<br /> <br /> where $0\leq \rho_i &lt;1$ for each individual $i$.<br /> If we assume that $t_{ij}=j$ for all $(i,j)$, then $t_{i,j+1}-t_{i,j}=1$ and the autocorrelation function $\gamma$ is given by:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \begin{eqnarray}<br /> \gamma(\tau) &amp;=&amp; {\rm corr}(\teps_{ij},\teps_{i\,j+\tau}) \\ &amp;= &amp;\rho_i^{\tau} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The figure below displays 3 different sequences of residual errors simulated with 3 different autocorrelations $\rho_1=0.1$, $\rho_2=0.6$ and $\rho_3=0.95$. The autocorrelation functions $\gamma(\tau)$ are also displayed.<br /> <br /> <br /> ::[[File:continuousGraf3.png|link=]]<br /> <br /> <br /> <br /> &lt;br&gt;<br /> == Distribution of the standardized residual errors ==<br /> <br /> <br /> The distribution of the standardized residual errors $(\teps_{ij})$ is usually assumed to be the same for each individual $i$ and any observation time $t_{ij}$.<br /> Furthermore, for identifiability reasons it is also assumed to be symmetrical around 0, i.e., $\prob{\teps_{ij}&lt;-u}=\prob{\teps_{ij}&gt;u}$ for all $u\in \Rset$.<br /> Thus, for any $(i,j)$ the distribution of the observation $y_{ij}$ is also symmetrical around its prediction $f(t_{ij},\psi_i)$. This $f(t_{ij},\psi_i)$ is therefore both the mean and the median of the distribution of $y_{ij}$: $\esp{y_{ij}|\psi_i}=f(t_{ij},\psi_i)$ and $\prob{y_{ij}&gt;f(t_{ij},\psi_i)} = \prob{y_{ij}&lt;f(t_{ij},\psi_i)} = 1/2$. If we make the additional hypothesis that 0 is the mode of the distribution of $\teps_{ij}$, then $f(t_{ij},\psi_i)$ is also the mode of the distribution of $y_{ij}$.<br /> <br /> A widely used bell-shaped distribution for modeling residual errors is the normal distribution.
If we assume that $\teps_{ij}\sim {\cal N}(0,1)$, then $y_{ij}$ is also normally distributed: $y_{ij}\sim {\cal N}(f(t_{ij},\psi_i),\, g^2(t_{ij},\psi_i))$.<br /> <br /> Other distributions can be used, such as [http://en.wikipedia.org/wiki/Student's_t-distribution Student's t-distribution] (also known simply as the $t$-distribution), which is also symmetric and bell-shaped but has heavier tails, meaning that it is more prone to producing values that fall far from its prediction.<br /> <br /> <br /> ::[[File:continuous_graf4_bis.png|link=]]<br /> <br /> <br /> If we assume that $\teps_{ij}\sim t(\nu)$, then $y_{ij}$ has a non-standardized Student's $t$-distribution.<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> == The conditional likelihood ==<br /> <br /> <br /> The conditional likelihood for given observations $\by$ is defined as<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; {\like}(\bpsi; \by) \ \ \eqdef \ \ \pcypsi(\by {{!}} \bpsi), &lt;/math&gt; }}<br /> <br /> where $\pcypsi(\by | \bpsi)$ is the conditional density function of the observations. <br /> If we assume that the residual errors $(\teps_{ij},\ 1\leq i \leq N,\ 1\leq j \leq n_i)$ are i.i.d., then this conditional density is straightforward to compute:<br /> <br /> {{EquationWithRef <br /> |equation=&lt;div id=&quot;likeN_model1&quot;&gt;&lt;math&gt; \begin{eqnarray}\pcypsi(\by {{!}} \bpsi ) &amp; = &amp; \prod_{i=1}^N \pcyipsii(\by_i {{!}} \psi_i ) \\<br /> &amp; = &amp; \prod_{i=1}^N \prod_{j=1}^{n_i} \pypsiij(y_{ij} {{!}} \psi_i ) \\<br /> &amp; = &amp; \prod_{i=1}^N \prod_{j=1}^{n_i} \displaystyle{\frac{1}{g(t_{ij},\psi_i)} } \, \qeps\left(\frac{y_{ij} - f(t_{ij},\psi_i)}{g(t_{ij},\psi_i)}\right) ,<br /> \end{eqnarray} &lt;/math&gt;&lt;/div&gt;<br /> |reference=(5) }}<br /> <br /> where $\qeps$ is the pdf of the i.i.d.
residual errors ($\teps_{ij}$).<br /> <br /> For example, if we assume that the residual errors $\teps_{ij}$ are Gaussian random variables with mean 0 and variance 1, then $\qeps(x) = e^{-{x^2}/{2}}/\sqrt{2 \pi}$, and<br /> <br /> {{EquationWithRef <br /> |equation=&lt;div id=&quot;likeN_model2&quot; &gt;&lt;math&gt; \begin{eqnarray}<br /> \pcypsi(\by {{!}} \psi ) &amp; = &amp;<br /> \prod_{i=1}^N \prod_{j=1}^{n_i} \displaystyle{ \frac{1}{\sqrt{2 \pi} g(t_{ij},\psi_i)} }\, \exp\left\{-\frac{1}{2}\left(\frac{y_{ij} - f(t_{ij},\psi_i)}{g(t_{ij},\psi_i)}\right)^2\right\} .<br /> \end{eqnarray} &lt;/math&gt;&lt;/div&gt;<br /> |reference=(6) }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Transforming the data==<br /> <br /> <br /> The assumption that the distribution of any observation $y_{ij}$ is symmetrical around its predicted value is a very strong one. If this assumption does not hold, we may decide to transform the data to make it more symmetric around its (transformed) predicted value. In other cases, constraints on the values that observations can take may also lead us to want to transform the data.<br /> <br /> Model [[#nlme|(1)]] can be extended to include a transformation of the data:<br /> <br /> {{EquationWithRef <br /> |equation=&lt;div id=&quot;def_t&quot; &gt;&lt;math&gt; \transy(y_{ij})=\transy(f(t_{ij},\psi_i))+ g(t_{ij},\psi_i)\teps_{ij} &lt;/math&gt;&lt;/div&gt;<br /> |reference=(7) }}<br /> <br /> where $\transy$ is a monotonic transformation (a strictly increasing or decreasing function).<br /> As you can see, both the data $y_{ij}$ and the structural model $f$ are transformed by the function $\transy$ so that $f(t_{ij},\psi_i)$ remains the prediction of $y_{ij}$.<br /> <br /> <br /> <br /> {{Example<br /> |title=Examples: <br /> | text=<br /> 1. If $y$ takes non-negative values, a log transformation can be used: $\transy(y) = \log(y)$. 
We can then present the model with one of two equivalent representations:<br /> <br /> &lt;!-- Therefore, $y=f e^{g\teps}$. --&gt;<br /> <br /> {{Equation1<br /> |equation= &lt;math&gt; \begin{eqnarray}<br /> \log(y_{ij})&amp;=&amp;\log(f(t_{ij},\psi_i))+ g(t_{ij},\psi_i)\teps_{ij}, \\<br /> y_{ij}&amp;=&amp;f(t_{ij},\psi_i)\, e^{ \displaystyle{ g(t_{ij},\psi_i)\teps_{ij} } }.<br /> \end{eqnarray}&lt;/math&gt;<br /> }}<br /> <br /> <br /> ::[[File: continuous_graf5a.png|link=]]<br /> <br /> <br /> 2. If $y$ takes its values between 0 and 1, a logit transformation can be used:<br /> &lt;!-- %\begin{eqnarray*}<br /> %\transy(y)&amp;=&amp;\log(y/(1-y)) \\<br /> % y&amp;=&amp;\frac{f}{f+(1-f) e^{-g\teps}} .<br /> %\end{eqnarray*} --&gt;<br /> <br /> {{Equation1<br /> |equation= &lt;math&gt; \begin{eqnarray}<br /> \logit(y_{ij})&amp;=&amp;\logit(f(t_{ij},\psi_i))+ g(t_{ij},\psi_i)\teps_{ij} , \\<br /> y_{ij}&amp;=&amp; \displaystyle{\frac{ f(t_{ij},\psi_i) }{ f(t_{ij},\psi_i) + (1- f(t_{ij},\psi_i)) \, e^{ -g(t_{ij},\psi_i)\teps_{ij} } } }.<br /> \end{eqnarray}&lt;/math&gt;<br /> }}<br /> <br /> <br /> ::[[File:continuous_graf5b.png|link=]]<br /> <br /> <br /> 3.
The logit error model can be extended if the $y_{ij}$ are known to take their values in an interval $[A,B]$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \begin{eqnarray}<br /> \transy(y_{ij})&amp;=&amp;\log((y_{ij}-A)/(B-y_{ij})), \\<br /> y_{ij}&amp;=&amp;A+(B-A)\displaystyle{\frac{f(t_{ij},\psi_i)-A}{f(t_{ij},\psi_i)-A+(B-f(t_{ij},\psi_i)) e^{-g(t_{ij},\psi_i)\teps_{ij} } } }\, .<br /> \end{eqnarray}&lt;/math&gt;<br /> }}<br /> &lt;!-- [[File:continuous_graf5c.png]] --&gt;<br /> }}<br /> <br /> <br /> Using the transformation proposed in [[#def_t|(7)]], the conditional density $\pcypsi$ becomes<br /> <br /> {{EquationWithRef<br /> |equation= &lt;div id=&quot;likeN_model3&quot; &gt;&lt;math&gt; \begin{eqnarray}<br /> \pcypsi(\by {{!}} \bpsi ) &amp; = &amp; \prod_{i=1}^N \prod_{j=1}^{n_i} \pypsiij(y_{ij} {{!}} \psi_i ) \\<br /> &amp; = &amp; \prod_{i=1}^N \prod_{j=1}^{n_i} \transy^\prime(y_{ij}) \, \ptypsiij(\transy(y_{ij}) {{!}} \psi_i ) \\<br /> &amp; = &amp; \prod_{i=1}^N \prod_{j=1}^{n_i} \displaystyle{ \frac{\transy^\prime(y_{ij})}{g(t_{ij},\psi_i)} } \, \qeps\left(\frac{\transy(y_{ij}) - \transy(f(t_{ij},\psi_i))}{g(t_{ij},\psi_i)}\right)<br /> \end{eqnarray}<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(8) }}<br /> <br /> For example, if the observations are log-normally distributed given the individual parameters ($\transy(y) = \log(y)$), with a constant error model ($g(t;\psi_i)=a$), then<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \pcypsi(\by {{!}} \bpsi ) = \prod_{i=1}^N \prod_{j=1}^{n_i} \displaystyle{ \frac{1}{\sqrt{2 \pi a^2} \, y_{ij} } }\, \exp\left\{-\frac{1}{2 \, a^2}\left(\log(y_{ij}) - \log(f(t_{ij},\psi_i))\right)^2\right\}.<br /> &lt;/math&gt; }} <br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Censored data ==<br /> <br /> <br /> Censoring occurs when the value of a measurement or observation is only partially known.<br /> For continuous data measurements in the longitudinal context, censoring refers to the values of 
the measurements, not the times at which they were taken.<br /> <br /> For example, in analytical chemistry, the lower limit of detection (LLOD) is the lowest quantity of a substance that can be distinguished from the absence of that substance. Therefore, any time the quantity is below the LLOD, the &quot;measurement&quot; is not a number but the information that the quantity is less than the LLOD.<br /> <br /> Similarly, in pharmacokinetic studies, measurements of the concentration below a certain limit referred to as the lower limit of quantification (LLOQ) are so low that their reliability is considered suspect. A measuring device can also have an upper limit of quantification (ULOQ) such that any value above this limit cannot be measured and reported.<br /> <br /> As hinted above, censored values are not typically reported as a number, but their existence is known, as well as the type of censoring. Thus, the observation $\repy_{ij}$ (i.e., what is reported) is the measurement $y_{ij}$ if not censored, and the type of censoring otherwise.<br /> <br /> We usually distinguish three types of censoring: left, right and interval. We now introduce these, along with illustrative data sets.<br /> <br /> <br /> * '''Left censoring''': a data point is below a certain value $L$ but it is not known by how much:<br /> <br /> {{Equation1<br /> |equation = &lt;math&gt; <br /> \repy_{ij} = \left\{ \begin{array}{cc}<br /> y_{ij} &amp; {\rm if } \ y_{ij} \geq L \\<br /> y_{ij} &lt; L &amp; {\rm otherwise.}<br /> \end{array} \right. &lt;/math&gt; }} <br /> <br /> &lt;blockquote&gt;In the figures below, the &quot;data&quot; below the limit $L=-0.30$, shown in gray, is not observed. The values are therefore not reported in the dataset. An additional column {{Verbatim|cens}} can be used to indicate if an observation is left-censored ({{Verbatim|cens{{-}}1}}) or not ({{Verbatim|cens{{-}}0}}).
The column of observations {{Verbatim|log-VL}} displays the observed log-viral load when it is above the limit $L=-0.30$, and the limit $L=-0.30$ otherwise.&lt;/blockquote&gt;<br /> <br /> <br /> {| cellspacing=&quot;0&quot; cellpadding=&quot;0&quot; <br /> | style=&quot;width=60%&quot; |<br /> [[File:continuous_graf6a.png|link=]]<br /> | style=&quot;width=40%&quot; align=&quot;right&quot;|<br /> {| class=&quot;wikitable&quot; style=&quot;width: 150%&quot;<br /> !| ID || TIME ||log-VL || cens<br /> |- <br /> | 1 || 1.0 || 0.26 || 0<br /> |-<br /> | 1 || 2.0 || 0.02 || 0<br /> |-<br /> | 1 || 3.0 || -0.13 || 0<br /> |-<br /> | 1 || 4.0 || -0.13 || 0<br /> |-<br /> | 1 || 5.0 || -0.30 || 1<br /> |-<br /> | 1 || 6.0 || -0.30 || 1<br /> |-<br /> | 1 || 7.0 || -0.25 || 0<br /> |-<br /> | 1 || 8.0 || -0.30 || 1<br /> |-<br /> | 1 || 9.0 || -0.29 || 0<br /> |-<br /> | 1 || 10.0 || -0.30 || 1<br /> |}<br /> |}<br /> <br /> <br /> * '''Interval censoring:''' if a data point is in interval $I$, its exact value is not known:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \repy_{ij} = \left\{ \begin{array}{cc}<br /> y_{ij} &amp; {\rm if } \ y_{ij}\notin I \\<br /> y_{ij} \in I &amp; {\rm otherwise.}<br /> \end{array} \right. &lt;/math&gt; }}<br /> <br /> &lt;blockquote&gt;For example, suppose we are measuring a concentration which naturally only takes non-negative values, but again we cannot measure it below the level $L = 1$. Therefore, any data point $y_{ij}$ below $1$ will be recorded only as &quot;$y_{ij} \in [0,1)$&quot;. 
In the table, an additional column {{Verbatim|llimit}} is required to indicate the lower bound of the censoring interval.&lt;/blockquote&gt;<br /> <br /> <br /> {| cellspacing=&quot;0&quot; cellpadding=&quot;0&quot; <br /> | style=&quot;width=60%&quot; |<br /> [[File:continuous_graf6b.png|link=]]<br /> | style=&quot;width=40%&quot; align=&quot;right&quot;|<br /> {| class=&quot;wikitable&quot; style=&quot;width: 150%&quot;<br /> !| ID || TIME ||CONC. || llimit || cens<br /> |-<br /> | 1 || 0.3 || 1.20 || . || 0<br /> |-<br /> | 1 || 0.5 || 1.93 || . || 0<br /> |-<br /> | 1 || 1.0 || 3.38 || . || 0<br /> |-<br /> | 1 || 2.0 || 3.88 || . || 0<br /> |-<br /> | 1 || 4.0 || 3.24 || . || 0<br /> |-<br /> | 1 || 6.0 || 1.82 || . || 0<br /> |-<br /> | 1 || 8.0 || 1.07 || . || 0<br /> |-<br /> | 1 || 12.0 || 1.00 || 0.00 || 1<br /> |-<br /> | 1 || 16.0 || 1.00 || 0.00 || 1<br /> |-<br /> | 1 || 20.0 || 1.00 || 0.00 || 1<br /> |}<br /> |}<br /> <br /> <br /> <br /> * '''Right censoring:''' when a data point is above a certain value $U$, it is not known by how much:<br /> <br /> {{Equation1<br /> |equation= &lt;math&gt; \repy_{ij} = \left\{ \begin{array}{cc}<br /> y_{ij} &amp; {\rm if } \ y_{ij}\leq U \\<br /> y_{ij} &gt; U &amp; {\rm otherwise.}<br /> \end{array} \right. 
<br /> &lt;/math&gt; }}<br /> <br /> &lt;blockquote&gt;Column {{Verbatim|cens}} is used to indicate if an observation is right-censored ({{Verbatim|cens{{-}}-1}}) or not ({{Verbatim|cens{{-}}0}}).<br /> &lt;/blockquote&gt;<br /> <br /> {| cellspacing=&quot;0&quot; cellpadding=&quot;0&quot; <br /> | style=&quot;width=60%&quot; |<br /> [[File:continuous_graf6c.png|link=]]<br /> | style=&quot;width=40%&quot; align=&quot;right&quot; |<br /> {| class=&quot;wikitable&quot; style=&quot;width: 150%&quot;<br /> !| ID || TIME ||VOLUME || CENS<br /> |-<br /> | 1 || 2.0 || 1.85 || 0<br /> |-<br /> | 1 || 7.0 || 2.40 || 0<br /> |-<br /> | 1 || 12.0 || 3.27 || 0<br /> |-<br /> | 1 || 17.0 || 3.28 || 0<br /> |-<br /> | 1 || 22.0 || 3.62 || 0<br /> |- <br /> | 1 || 27.0 || 3.02 || 0<br /> |-<br /> | 1 || 32.0 || 3.80 || -1<br /> |-<br /> | 1 || 37.0 || 3.80 || -1<br /> |-<br /> | 1 || 42.0 || 3.80 || -1<br /> |-<br /> | 1 || 47.0 || 3.80 || -1<br /> |}<br /> |}<br /> <br /> <br /> <br /> {{Remarks<br /> |title=Remarks<br /> <br /> |text= &amp;#32;<br /> * Different censoring limits and intervals can be in play at different times and for different individuals.<br /> * Interval censoring covers any type of censoring, i.e., setting $I=(-\infty,L]$ for left censoring and $I=[U,+\infty)$ for right censoring.<br /> }}<br /> <br /> <br /> The likelihood needs to be computed carefully in the presence of censored data. To cover all three types of censoring in one go, let $I_{ij}$ be the (finite or infinite) censoring interval existing for individual $i$ at time $t_{ij}$. 
Then,<br /> <br /> {{EquationWithRef<br /> |equation = &lt;div id=&quot;likeN_model4&quot;&gt;&lt;math&gt; <br /> \begin{eqnarray} \pcypsi(\brepy {{!}} \bpsi ) &amp; = &amp; \prod_{i=1}^N \prod_{j=1}^{n_i} \pypsiij(y_{ij} {{!}} \psi_i )^{\mathbf{1}_{y_{ij} \notin I_{ij} } } \, \prob{y_{ij} \in I_{ij} {{!}} \psi_i}^{\mathbf{1}_{y_{ij} \in I_{ij} } }.<br /> \end{eqnarray}<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(9) }}<br /> <br /> where<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \prob{y_{ij} \in I_{ij} {{!}} \psi_i} = \int_{I_{ij} } \qypsiij(u {{!}} \psi_i )\, du &lt;/math&gt; }}<br /> <br /> We see that if $y_{ij}$ is not censored (i.e., $\mathbf{1}_{y_{ij} \notin I_{ij}} = 1$), the contribution to the likelihood is the usual $\pypsiij(y_{ij} | \psi_i )$, whereas if it is censored, the contribution is $\prob{y_{ij} \in I_{ij}|\psi_i}$.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Extensions to multidimensional continuous observations == <br /> <br /> <br /> &lt;ul&gt;<br /> * Extension to multidimensional observations is straightforward. If $d$ outcomes are simultaneously measured at $t_{ij}$, then $y_{ij}$ is a now a vector in $\Rset^d$ and we can suppose that equation [[#nlme|(1)]] still holds for each component of $y_{ij}$. Thus, for $1\leq m \leq d$,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> y_{ijm}=f_m(t_{ij},\psi_i)+ g_m(t_{ij},\psi_i)\teps_{ijm} , \ \ 1\leq i \leq N,<br /> \ \ 1 \leq j \leq n_i.<br /> &lt;/math&gt;}}<br /> <br /> : It is then possible to introduce correlation between the components of each observation by assuming that $\teps_{ij} = (\teps_{ijm} , 1\leq m \leq d)$ is a random vector with mean 0 and correlation matrix $R_{\teps_{ij}}$.<br /> <br /> <br /> * Suppose instead that $K$ replicates of the same measurement are taken at time $t_{ij}$. 
Then, the model becomes, for $1 \leq k \leq K$,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> y_{ijk}=f(t_{ij},\psi_i)+ g(t_{ij},\psi_i)\teps_{ijk} ,\ \ 1\leq i \leq N,<br /> \ \ 1 \leq j \leq n_i .<br /> &lt;/math&gt; }}<br /> <br /> : Following what can be done for decomposing random effects into inter-individual and inter-occasion components, we can decompose the residual error into inter-measurement and inter-replicate components:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> y_{ijk}=f(t_{ij},\psi_i)+ g_{I\!M}(t_{ij},\psi_i)\vari{\teps}{ij}{I\!M} + g_{I\!R}(t_{ij},\psi_i)\vari{\teps}{ijk}{I\!R} .<br /> &lt;/math&gt; }}<br /> &lt;/ul&gt;<br /> &lt;br&gt;&lt;br&gt;<br /> -----------------------------------------------<br /> &lt;br&gt;&lt;br&gt;<br /> <br /> {{Summary<br /> |title=Summary <br /> |text= <br /> A model for continuous data is completely defined by:<br /> <br /> *The structural model $f$<br /> *The residual error model $g$<br /> *The probability distribution of the residual errors $(\teps_{ij})$<br /> *Possibly a transformation $\transy$ of the data<br /> <br /> <br /> The model is associated with a design which includes:<br /> <br /> <br /> *The observation times $(t_{ij})$<br /> <br /> *Possibly some additional regression variables $(x_{ij})$<br /> <br /> *Possibly the inputs $(u_i)$ (e.g., the dosing regimen for a PK model)<br /> <br /> *Possibly a censoring process $(I_{ij})$<br /> <br /> }}<br /> <br /> <br /> == $\mlxtran$ for continuous data models == <br /> <br /> <br /> <br /> {{ExampleWithCode<br /> |title1=Example 1:<br /> |title2=<br /> <br /> |text= <br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \psi &amp;=&amp; (A,\alpha,B,\beta, a) \\<br /> f(t,\psi) &amp;=&amp; A\, e^{- \alpha \, t} + B\, e^{- \beta \, t} \\<br /> y_{ij} &amp;=&amp; f(t_{ij} , \psi_i) + a\, \teps_{ij}<br /> \end{eqnarray}&lt;/math&gt;<br /> |code=<br /> {{MLXTranForTable<br /> |name=<br /> |text=<br /> &lt;pre
style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> INPUT:<br /> input = {A, B, alpha, beta, a}<br /> <br /> EQUATION:<br /> f = A*exp(-alpha*t) + B*exp(-beta*t)<br /> <br /> DEFINITION:<br /> y = {distribution=normal, prediction=f, std=a}&lt;/pre&gt;<br /> }}<br /> <br /> }}<br /> <br /> <br /> {{ExampleWithCode<br /> |title1=Example 2:<br /> |title2=<br /> <br /> |text=<br /> |equation= &lt;math&gt; \begin{eqnarray}<br /> \psi &amp;=&amp; (\delta, c , \beta, p, s, d, \nu,\rho, a) \\<br /> t_0 &amp;=&amp;0 \\[0.2cm]<br /> {\rm if \quad t&lt;t_0} \\[0.2cm]<br /> \quad \nitc &amp;=&amp; \delta \, c/( \beta \, p) \\<br /> \quad \itc &amp;=&amp; (s - d\,\nitc) / \delta \\<br /> \quad \vl &amp;=&amp; p \, \itc / c. \\[0.2cm] <br /> {\rm else \quad \quad }\\[0.2cm] <br /> \quad \dA{\nitc}{} &amp; =&amp; s - \beta(1-\nu) \, \nitc(t) \, \vl(t) - d\,\nitc(t) \\<br /> \quad \dA{\itc}{} &amp; = &amp;\beta(1-\nu) \, \nitc(t) \, \vl(t) - \delta \, \itc(t) \\<br /> \quad \dA{\vl}{} &amp; = &amp;p(1-\rho) \, \itc(t) - c \, \vl(t) \\<br /> \quad \log(y_{ij}) &amp;= &amp;\log(V(t_{ij} , \psi_i)) + a\, \teps_{ij} <br /> \end{eqnarray}&lt;/math&gt;<br /> |code=<br /> {{MLXTranForTable<br /> |name=<br /> |text=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> INPUT:<br /> input = {delta, c, beta, p, s, d, nu, rho, a}<br /> <br /> EQUATION:<br /> t0=0<br /> N_0 = delta*c/(beta*p)<br /> I_0 = (s - d*N_0)/delta<br /> V_0 = p*I_0/c<br /> ddt_N = s - beta*(1-nu)*N*V - d*N<br /> ddt_I = beta*(1-nu)*N*V - delta*I<br /> ddt_V = p*(1-rho)*I - c*V<br /> <br /> DEFINITION:<br /> y = {distribution=logNormal, prediction=V, std=a}<br /> &lt;/pre&gt; }} <br /> }}<br /> <br /> &lt;br&gt;&lt;br&gt;<br /> <br /> <br /> ==Bibliography==<br /> <br /> <br /> &lt;bibtex&gt;<br /> @book{davidian1995,<br /> author = {Davidian, M. and Giltinan, D.M. 
},<br /> title = {Nonlinear Models for Repeated Measurements Data },<br /> publisher = {Chapman &amp; Hall.},<br /> address = {London},<br /> edition = {},<br /> year = {1995}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{demidenko2005mixed,<br /> title={Mixed Models: Theory and Applications},<br /> author={Demidenko, E.},<br /> isbn={9780471726135},<br /> series={Wiley Series in Probability and Statistics}, url={http://books.google.fr/books/about/Mixed_Models.html?id=IWQR8d_UZHoC&amp;redir_esc=y}, <br /> year={2005}, publisher={Wiley}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{fitzmaurice2008longitudinal,<br /> title={Longitudinal Data Analysis},<br /> author={Fitzmaurice, G. and Davidian, M. and Verbeke, G. and Molenberghs, G.},<br /> isbn={9781420011579},<br /> lccn={2008020681},<br /> series={Chapman &amp; Hall/CRC Handbooks of Modern Statistical Methods},url={http://books.google.fr/books?id=zVBjCvQCoGQC},<br /> year={2008},publisher={Taylor &amp; Francis}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{jiang2007,<br /> author = {Jiang, J.},<br /> title = {Linear and Generalized Linear Mixed Models and Their Applications},<br /> publisher = {Springer Series in Statistics},<br /> year = {2007},<br /> address = {New York}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{laird1982,<br /> author = {Laird, N.M. and Ware, J.H.},<br /> title = {Random-Effects Models for Longitudinal Data},<br /> journal = {Biometrics},<br /> volume = {38},<br /> pages = {963-974},<br /> year = {1982}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @article{lindstrom1990Nonlinear,<br /> author = {Lindstrom, M.J. and Bates, D.M. 
},<br /> title = {Nonlinear mixed-effects models for repeated measures},<br /> journal = {Biometrics},<br /> volume = {46},<br /> pages = {673-687},<br /> year = {1990}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{littell2006sas,<br /> title={SAS for mixed models},<br /> author={Littell, R.C.},<br /> year={2006},<br /> publisher={SAS institute}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{mcculloch2011generalized,<br /> title={Generalized, Linear, and Mixed Models},<br /> author={McCulloch, C.E. and Searle, S.R.},<br /> isbn={9781118209967},<br /> series={Wiley Series in Probability and Statistics}, url={http://books.google.fr/books/about/Generalized_Linear_and_Mixed_Models.html?id=bWDPukohugQC&amp;redir_esc=y}, year={2004}, publisher={Wiley &amp; Sons} <br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{verbeke2009linear,<br /> title={Linear Mixed Models for Longitudinal Data},<br /> author={Verbeke, G. and Molenberghs, G.},<br /> isbn={9781441902993},<br /> lccn={2010483807},<br /> series={Springer Series in Statistics},<br /> url={http://books.google.fr/books?id=jmPkX4VU7h0C},<br /> year={2009},<br /> publisher={Springer}<br /> }<br /> &lt;/bibtex&gt;<br /> &lt;bibtex&gt;<br /> @book{west2006linear,<br /> title={Linear Mixed Models: A Practical Guide Using Statistical Software},<br /> author={West, B. and Welch, K.B. 
and Galecki, A.T.},<br /> isbn={9781584884804},<br /> lccn={2006045440},year={2006},publisher={Taylor &amp; Francis}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> <br /> {{Back&amp;Next<br /> |linkBack=Modeling the observations <br /> |linkNext=Models for count data }}</div> Admin http://wiki.webpopix.org/index.php/Continuous_data_models Continuous data models 2013-06-07T13:38:23Z <p>Admin : /* Transforming the data */</p> <hr /> <div>== The data ==<br /> <br /> Continuous data is data that can take any real value within a given range. For instance, a concentration takes its values in $\Rset^+$, the log of the viral load in $\Rset$, an effect expressed as a percentage in $[0,100]$.<br /> <br /> The data can be stored in a table and represented graphically.
Here is some simple pharmacokinetics data involving four individuals.<br /> <br /> <br /> {| cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; <br /> | style=&quot;width:60%&quot; align=&quot;center&quot;| <br /> :[[File:continuous_graf0a_1.png|link=]]<br /> | style=&quot;width: 40%&quot; align=&quot;left&quot;| <br /> {| class=&quot;wikitable&quot; style=&quot;width: 70%;font-size:7pt; &quot;<br /> !| ID || TIME ||CONCENTRATION<br /> |- <br /> |1 || 1.0 || 9.84 <br /> |-<br /> |1 || 2.0 || 8.19 <br /> |-<br /> |1 || 4.0 || 6.91 <br /> |-<br /> |1 || 8.0 || 3.71 <br /> |-<br /> |1 || 12.0 || 1.25 <br /> |-<br /> |2 || 1.0 || 17.23 <br /> |-<br /> |2 || 3.0 || 11.14 <br /> |-<br /> |2 || 5.0 || 4.35 <br /> |-<br /> |2 || 10.0 || 2.92 <br /> |-<br /> |3 || 2.0 || 9.78 <br /> |-<br /> |3 || 3.0 || 10.40 <br /> |-<br /> |3 || 4.0 || 7.67 <br /> |-<br /> |3 || 6.0 || 6.84 <br /> |-<br /> |3 || 11.0 || 1.10 <br /> |-<br /> |4 || 4.0 || 8.78 <br /> |-<br /> |4 || 6.0 || 3.87 <br /> |-<br /> |4 || 12.0 || 1.85 <br /> |}<br /> |}<br /> <br /> <br /> Instead of individual plots, we can plot them all together. Such a figure is usually called a ''spaghetti plot'':<br /> <br /> <br /> ::[[File:continuous_graf0b_1.png|link=]]<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == The model ==<br /> <br /> <br /> For continuous data, we are going to consider scalar outcomes ($y_{ij}\in \Yr \subset \Rset$) and assume the following general model:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;nlme&quot; &gt;&lt;math&gt;y_{ij}=f(t_{ij},\psi_i)+ g(t_{ij},\psi_i)\teps_{ij}, \quad\ \quad 1\leq i \leq N, \quad \ 1 \leq j \leq n_i. 
&lt;/math&gt;&lt;/div&gt;<br /> |reference=(1)<br /> }}<br /> <br /> where $g(t_{ij},\psi_i)\geq 0$.<br /> <br /> Here, the residual errors $(\teps_{ij})$ are standardized random variables (mean zero and standard deviation 1).<br /> In this case, it is clear that $f(t_{ij},\psi_i)$ and $g(t_{ij},\psi_i)$ are the mean and standard deviation of $y_{ij}$, i.e.,<br /> <br /> {{Equation1<br /> |equation= &lt;math&gt;\begin{eqnarray} \esp{y_{ij} {{!}} \psi_i} &amp;=&amp; f(t_{ij},\psi_i) \\ <br /> \std{y_{ij} {{!}} \psi_i} &amp;=&amp; g(t_{ij},\psi_i).<br /> \end{eqnarray}&lt;/math&gt;}}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == The structural model == <br /> <br /> <br /> $f$ is known as the ''structural model'' and aims to describe the time evolution of the phenomena under study. For a given subject $i$ and vector of individual parameters $\psi_i$, $f(t_{ij},\psi_i)$ is the prediction of the observed variable at time $t_{ij}$. In other words, it is the value that would be measured at time $t_{ij}$ if there were no error ($\teps_{ij}=0$).<br /> <br /> In the current example, we choose the structural model $f=A\exp\left(-\alpha t \right)$.<br /> Here are some example curves for various combinations of $A$ and $\alpha$:<br /> <br /> <br /> ::[[File:continuous_graf1bis.png|link=]]<br /> <br /> <br /> Other models involving more complicated dynamical systems can be imagined, such as those defined as solutions of systems of ordinary or partial differential equations. Real-life examples are found in the study of HIV, pharmacokinetics and tumor growth.<br /> <br /> <br /> <br /> &lt;br&gt;<br /> == The residual error model ==<br /> <br /> <br /> For a given structural model $f$, the conditional probability distribution of the observations $(y_{ij})$ is completely defined by the residual error model, i.e., the probability distribution of the residual errors $(\teps_{ij})$ and the standard deviation $g(t_{ij},\psi_i)$.
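Model [[#nlme|(1)]] can be illustrated with a short simulation. The following Python sketch is not part of the original text: it uses the exponential structural model $f=A\exp(-\alpha t)$ introduced above, and the parameter values ($A=10$, $\alpha=0.3$) and the constant residual standard deviation $a=0.5$ are purely illustrative assumptions.

```python
import numpy as np

# Structural model from the example: f(t, psi) = A * exp(-alpha * t),
# with individual parameters psi = (A, alpha).
def f(t, A, alpha):
    return A * np.exp(-alpha * t)

def simulate(t, A, alpha, a, rng):
    # Model (1) with a constant standard deviation g = a:
    #   y_ij = f(t_ij, psi_i) + a * eps_ij,   eps_ij ~ N(0, 1),
    # so E[y | psi] = f(t, psi) and Sd[y | psi] = a.
    return f(t, A, alpha) + a * rng.standard_normal(np.shape(t))

rng = np.random.default_rng(0)
t = np.array([1.0, 2.0, 4.0, 8.0, 12.0])      # observation times for one individual
y = simulate(t, A=10.0, alpha=0.3, a=0.5, rng=rng)
```

Replacing the constant $a$ by another function of $f(t,\psi)$ yields the other residual error models discussed below.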
The residual error model can take many forms. For example,<br /> <br /> <br /> &lt;ul&gt;<br /> * A constant error model assumes that $g(t_{ij},\psi_i)=a_i$. Model [[#nlme|(1)]] then reduces to<br /> <br /> {{EquationWithRef <br /> |equation=&lt;div id=&quot;nlme1&quot; &gt;&lt;math&gt;y_{ij}=f(t_{ij},\psi_i)+ a_i\teps_{ij}, \quad \quad \ 1\leq i \leq N,<br /> \quad \ 1 \leq j \leq n_i. &lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> :The figure below shows four simulated sequences of observations $(y_{ij}, 1\leq i \leq 4, 1\leq j \leq 10)$ with their respective structural model $f(t,\psi_i)$ in blue. Here, $a_i=2$ is the standard deviation of $y_{ij}$ for all $(i,j)$.<br /> <br /> <br /> ::[[File: continuous_graf2a1.png|link=]]<br /> <br /> <br /> :Let $\hat{y}_{ij}=f(t_{ij},\psi_i)$ be the prediction of $y_{ij}$ given by the model [[#nlme1|(2)]]. The figure below shows for 50 individuals:<br /> <br /> <br /> &lt;ul&gt;<br /> ::'''-left''': prediction errors $e_{ij}=y_{ij}-\hat{y}_{ij}$ vs. predictions $(\hat{y}_{ij})$. The pink line is the mean $\esp{e_{ij}}=0$; the green lines are $\pm$ 1 standard deviations: $[-\std{e_{ij}} , +\std{e_{ij}}]$ where $\std{e_{ij}}=a_i=0.5$. <br /> &lt;br&gt;<br /> ::'''-right''': observations $(y_{ij})$ vs. predictions $(\hat{y}_{ij})$. The pink line is the identity $y=\hat{y}$, the green lines represent an interval of $\pm 1$ standard deviations around $\hat{y}$: $[\hat{y}-\std{e_{ij}} , \hat{y}+\std{e_{ij}}]$.<br /> &lt;/ul&gt;<br /> <br /> <br /> ::[[File:continuous_graf2a2.png|link=]]<br /> <br /> <br /> :These figures are typical for constant error models. The standard deviation of the prediction errors does not depend on the value of the predictions $(\hat{y}_{ij})$, so both intervals have constant amplitude.<br /> <br /> <br /> * A proportional error model assumes that $g(t_{ij},\psi_i) =b_i f(t_{ij},\psi_i)$.
Model [[#nlme|(1)]] then becomes<br /> <br /> <br /> {{EquationWithRef <br /> |equation=&lt;div id=&quot;nlme2&quot;&gt;&lt;math&gt; y_{ij}=f(t_{ij},\psi_i)(1 + b_i\teps_{ij}), \quad\ \quad 1\leq i \leq N,<br /> \quad \ 1 \leq j \leq n_i . &lt;/math&gt;&lt;/div&gt;<br /> |reference=(3) }}<br /> <br /> :The standard deviation of the prediction error $e_{ij}=y_{ij}-\hat{y}_{ij}$ is proportional to the prediction $\hat{y}_{ij}$. Therefore, the amplitude of the $\pm 1$ standard deviation intervals increases linearly with $f$:<br /> <br /> <br /> ::[[File:continuous_graf2b.png|link=]]<br /> <br /> <br /> * A combined error model combines a constant and a proportional error model by assuming $g(t_{ij},\psi_i) =a_i + b_i f(t_{ij},\psi_i)$, where $a_i&gt;0$ and $b_i&gt;0$. The standard deviation of the prediction error $e_{ij}$ and thus the amplitude of the intervals are now affine functions of the prediction $\hat{y}_{ij}$:<br /> <br /> <br /> ::[[File:continuous_graf2c.png|link=]]<br /> <br /> <br /> * An alternative combined error model is $g(t_{ij},\psi_i) =\sqrt{a_i^2 + b_i^2 f^2(t_{ij},\psi_i)}$. This gives intervals that look fairly similar to the previous ones, though they are no longer affine.<br /> <br /> <br /> ::[[File:continuous_graf2d.png|link=]]<br /> &lt;/ul&gt;<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Extension to autocorrelated errors == <br /> <br /> <br /> For any subject $i$, the residual errors $(\teps_{ij},1\leq j \leq n_i)$ are usually assumed to be independent random variables. Extension to autocorrelated errors is possible by assuming for instance that $(\teps_{ij})$ is a stationary ARMA (Autoregressive Moving Average) process.<br /> For example, an autoregressive process of order 1, AR(1), assumes that autocorrelation decreases exponentially:<br /> <br /> {{EquationWithRef <br /> |equation=&lt;div id=&quot;autocorr1&quot;&gt;&lt;math&gt; {\rm corr}(\teps_{ij},\teps_{i\,{j+1} }) = \rho_i^{(t_{i\,j+1}-t_{ij})}.
&lt;/math&gt;&lt;/div&gt;<br /> |reference=(4) }}<br /> <br /> where $0\leq \rho_i &lt;1$ for each individual $i$.<br /> If we assume that $t_{ij}=j$ for any $(i,j)$, then $t_{i,j+1}-t_{i,j}=1$ and the autocorrelation function $\gamma$ is given by:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \begin{eqnarray}<br /> \gamma(\tau) &amp;=&amp; {\rm corr}(\teps_{ij},\teps_{i\,j+\tau}) \\ &amp;= &amp;\rho_i^{\tau} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The figure below displays 3 different sequences of residual errors simulated with 3 different autocorrelations $\rho_1=0.1$, $\rho_2=0.6$ and $\rho_3=0.95$. The autocorrelation functions $\gamma(\tau)$ are also displayed.<br /> <br /> <br /> ::[[File:continuousGraf3.png|link=]]<br /> <br /> <br /> <br /> &lt;br&gt;<br /> == Distribution of the standardized residual errors ==<br /> <br /> <br /> The distribution of the standardized residual errors $(\teps_{ij})$ is usually assumed to be the same for each individual $i$ and any observation time $t_{ij}$.<br /> Furthermore, for identifiability reasons it is also assumed to be symmetrical around 0, i.e., $\prob{\teps_{ij}&lt;-u}=\prob{\teps_{ij}&gt;u}$ for all $u\in \Rset$.<br /> Thus, for any $(i,j)$ the distribution of the observation $y_{ij}$ is also symmetrical around its prediction $f(t_{ij},\psi_i)$. This $f(t_{ij},\psi_i)$ is therefore both the mean and the median of the distribution of $y_{ij}$: $\esp{y_{ij}|\psi_i}=f(t_{ij},\psi_i)$ and $\prob{y_{ij}&gt;f(t_{ij},\psi_i)} = \prob{y_{ij}&lt;f(t_{ij},\psi_i)} = 1/2$. If we make the additional hypothesis that 0 is the mode of the distribution of $\teps_{ij}$, then $f(t_{ij},\psi_i)$ is also the mode of the distribution of $y_{ij}$.<br /> <br /> A widely used bell-shaped distribution for modeling residual errors is the normal distribution.
If we assume that $\teps_{ij}\sim {\cal N}(0,1)$, then $y_{ij}$ is also normally distributed: $y_{ij}\sim {\cal N}(f(t_{ij},\psi_i),\, g^2(t_{ij},\psi_i))$.<br /> <br /> Other distributions can be used, such as [http://en.wikipedia.org/wiki/Student's_t-distribution Student's t-distribution] (also known simply as the $t$-distribution) which is also symmetric and bell-shaped but with heavier tails, meaning that it is more prone to producing values that fall far from its prediction.<br /> <br /> <br /> ::[[File:continuous_graf4_bis.png|link=]]<br /> <br /> <br /> If we assume that $\teps_{ij}\sim t(\nu)$, then $y_{ij}$ has a non-standardized Student's $t$-distribution.<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> == The conditional likelihood ==<br /> <br /> <br /> The conditional likelihood for given observations $\by$ is defined as<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; {\like}(\bpsi; \by) \ \ \eqdef \ \ \pcypsi(\by {{!}} \bpsi), &lt;/math&gt; }}<br /> <br /> where $\pcypsi(\by | \bpsi)$ is the conditional density function of the observations. <br /> If we assume that the residual errors $(\teps_{ij},\ 1\leq i \leq N,\ 1\leq j \leq n_i)$ are i.i.d., then this conditional density is straightforward to compute:<br /> <br /> {{EquationWithRef <br /> |equation=&lt;div id=&quot;likeN_model1&quot;&gt;&lt;math&gt; \begin{eqnarray}\pcypsi(\by {{!}} \bpsi ) &amp; = &amp; \prod_{i=1}^N \pcyipsii(\by_i {{!}} \psi_i ) \\<br /> &amp; = &amp; \prod_{i=1}^N \prod_{j=1}^{n_i} \pypsiij(y_{ij} {{!}} \psi_i ) \\<br /> &amp; = &amp; \prod_{i=1}^N \prod_{j=1}^{n_i} \displaystyle{\frac{1}{g(t_{ij},\psi_i)} } \, \qeps\left(\frac{y_{ij} - f(t_{ij},\psi_i)}{g(t_{ij},\psi_i)}\right) ,<br /> \end{eqnarray} &lt;/math&gt;&lt;/div&gt;<br /> |reference=(5) }}<br /> <br /> where $\qeps$ is the pdf of the i.i.d.
residual errors ($\teps_{ij}$).<br /> <br /> For example, if we assume that the residual errors $\teps_{ij}$ are Gaussian random variables with mean 0 and variance 1, then $\qeps(x) = e^{-{x^2}/{2}}/\sqrt{2 \pi}$, and<br /> <br /> {{EquationWithRef <br /> |equation=&lt;div id=&quot;likeN_model2&quot; &gt;&lt;math&gt; \begin{eqnarray}<br /> \pcypsi(\by {{!}} \bpsi ) &amp; = &amp;<br /> \prod_{i=1}^N \prod_{j=1}^{n_i} \displaystyle{ \frac{1}{\sqrt{2 \pi} g(t_{ij},\psi_i)} }\, \exp\left\{-\frac{1}{2}\left(\frac{y_{ij} - f(t_{ij},\psi_i)}{g(t_{ij},\psi_i)}\right)^2\right\} .<br /> \end{eqnarray} &lt;/math&gt;&lt;/div&gt;<br /> |reference=(6) }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Transforming the data==<br /> <br /> <br /> The assumption that the distribution of any observation $y_{ij}$ is symmetrical around its predicted value is a very strong one. If this assumption does not hold, we may decide to transform the data to make it more symmetric around its (transformed) predicted value. In other cases, constraints on the values that observations can take may also lead us to want to transform the data.<br /> <br /> Model [[#nlme|(1)]] can be extended to include a transformation of the data:<br /> <br /> {{EquationWithRef <br /> |equation=&lt;div id=&quot;def_t&quot; &gt;&lt;math&gt; \transy(y_{ij})=\transy(f(t_{ij},\psi_i))+ g(t_{ij},\psi_i)\teps_{ij} &lt;/math&gt;&lt;/div&gt;<br /> |reference=(7) }}<br /> <br /> where $\transy$ is a monotonic transformation (a strictly increasing or decreasing function).<br /> As you can see, both the data $y_{ij}$ and the structural model $f$ are transformed by the function $\transy$ so that $f(t_{ij},\psi_i)$ remains the prediction of $y_{ij}$.<br /> <br /> <br /> <br /> {{Example<br /> |title=Examples: <br /> | text=<br /> 1. If $y$ takes non-negative values, a log transformation can be used: $\transy(y) = \log(y)$.
We can then present the model with one of two equivalent representations:<br /> <br /> &lt;!-- Therefore, $y=f e^{g\teps}$. --&gt;<br /> <br /> {{Equation1<br /> |equation= &lt;math&gt; \begin{eqnarray}<br /> \log(y_{ij})&amp;=&amp;\log(f(t_{ij},\psi_i))+ g(t_{ij},\psi_i)\teps_{ij}, \\<br /> y_{ij}&amp;=&amp;f(t_{ij},\psi_i)\, e^{ \displaystyle{ g(t_{ij},\psi_i)\teps_{ij} } }.<br /> \end{eqnarray}&lt;/math&gt;<br /> }}<br /> <br /> <br /> ::[[File: continuous_graf5a.png|link=]]<br /> <br /> <br /> 2. If $y$ takes its values between 0 and 1, a logit transformation can be used:<br /> &lt;!-- %\begin{eqnarray*}<br /> %\transy(y)&amp;=&amp;\log(y/(1-y)) \\<br /> % y&amp;=&amp;\frac{f}{f+(1-f) e^{-g\teps}} .<br /> %\end{eqnarray*} --&gt;<br /> <br /> {{Equation1<br /> |equation= &lt;math&gt; \begin{eqnarray}<br /> \logit(y_{ij})&amp;=&amp;\logit(f(t_{ij},\psi_i))+ g(t_{ij},\psi_i)\teps_{ij} , \\<br /> y_{ij}&amp;=&amp; \displaystyle{\frac{ f(t_{ij},\psi_i) }{ f(t_{ij},\psi_i) + (1- f(t_{ij},\psi_i)) \, e^{ -g(t_{ij},\psi_i)\teps_{ij} } } }.<br /> \end{eqnarray}&lt;/math&gt;<br /> }}<br /> <br /> <br /> ::[[File:continuous_graf5b.png|link=]]<br /> <br /> <br /> 3.
The logit error model can be extended if the $y_{ij}$ are known to take their values in an interval $[A,B]$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \begin{eqnarray}<br /> \transy(y_{ij})&amp;=&amp;\log((y_{ij}-A)/(B-y_{ij})), \\<br /> y_{ij}&amp;=&amp;A+(B-A)\displaystyle{\frac{f(t_{ij},\psi_i)-A}{f(t_{ij},\psi_i)-A+(B-f(t_{ij},\psi_i)) e^{-g(t_{ij},\psi_i)\teps_{ij} } } }\, .<br /> \end{eqnarray}&lt;/math&gt;<br /> }}<br /> &lt;!-- [[File:continuous_graf5c.png]] --&gt;<br /> }}<br /> <br /> <br /> Using the transformation proposed in [[#def_t|(7)]], the conditional density $\pcypsi$ becomes<br /> <br /> {{EquationWithRef<br /> |equation= &lt;div id=&quot;likeN_model3&quot; &gt;&lt;math&gt; \begin{eqnarray}<br /> \pcypsi(\by {{!}} \bpsi ) &amp; = &amp; \prod_{i=1}^N \prod_{j=1}^{n_i} \pypsiij(y_{ij} {{!}} \psi_i ) \\<br /> &amp; = &amp; \prod_{i=1}^N \prod_{j=1}^{n_i} \transy^\prime(y_{ij}) \, \ptypsiij(\transy(y_{ij}) {{!}} \psi_i ) \\<br /> &amp; = &amp; \prod_{i=1}^N \prod_{j=1}^{n_i} \displaystyle{ \frac{\transy^\prime(y_{ij})}{g(t_{ij},\psi_i)} } \, \qeps\left(\frac{\transy(y_{ij}) - \transy(f(t_{ij},\psi_i))}{g(t_{ij},\psi_i)}\right)<br /> \end{eqnarray}<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(8) }}<br /> <br /> For example, if the observations are log-normally distributed given the individual parameters ($\transy(y) = \log(y)$), with a constant error model ($g(t;\psi_i)=a$), then<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \pcypsi(\by {{!}} \bpsi ) = \prod_{i=1}^N \prod_{j=1}^{n_i} \displaystyle{ \frac{1}{\sqrt{2 \pi a^2} \, y_{ij} } }\, \exp\left\{-\frac{1}{2 \, a^2}\left(\log(y_{ij}) - \log(f(t_{ij},\psi_i))\right)^2\right\}.<br /> &lt;/math&gt; }} <br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Censored data ==<br /> <br /> <br /> Censoring occurs when the value of a measurement or observation is only partially known.<br /> For continuous data measurements in the longitudinal context, censoring refers to the values of 
the measurements, not the times at which they were taken.&lt;br /&gt; &lt;br /&gt; For example, in analytical chemistry, the lower limit of detection (LLOD) is the lowest quantity of a substance that can be distinguished from the absence of that substance. Therefore, any time the quantity is below the LLOD, the &amp;quot;measurement&amp;quot; is not a number but the information that the quantity is less than the LLOD.&lt;br /&gt; &lt;br /&gt; Similarly, in pharmacokinetic studies, measurements of the concentration below a certain limit referred to as the lower limit of quantification (LLOQ) are so low that their reliability is considered suspect. A measuring device can also have an upper limit of quantification (ULOQ) such that any value above this limit cannot be measured and reported.&lt;br /&gt; &lt;br /&gt; As hinted above, censored values are not typically reported as a number, but their existence is known, as well as the type of censoring. Thus, the observation $\repy_{ij}$ (i.e., what is reported) is the measurement $y_{ij}$ if not censored, and the type of censoring otherwise.&lt;br /&gt; &lt;br /&gt; We usually distinguish three types of censoring: left, right and interval. We now introduce these, along with illustrative data sets.&lt;br /&gt; &lt;br /&gt; &lt;br /&gt; * '''Left censoring''': a data point is below a certain value $L$ but it is not known by how much:&lt;br /&gt; &lt;br /&gt; {{Equation1&lt;br /&gt; |equation = &amp;lt;math&amp;gt; &lt;br /&gt; \repy_{ij} = \left\{ \begin{array}{cc}&lt;br /&gt; y_{ij} &amp;amp; {\rm if } \ y_{ij} \geq L \\&lt;br /&gt; y_{ij} &amp;lt; L &amp;amp; {\rm otherwise.}&lt;br /&gt; \end{array} \right. &amp;lt;/math&amp;gt; }} &lt;br /&gt; &lt;br /&gt; &amp;lt;blockquote&amp;gt;In the figures below, the &amp;quot;data&amp;quot; below the limit $L=-0.30$, shown in gray, is not observed. The values are therefore not reported in the dataset. An additional column {{Verbatim|cens}} can be used to indicate if an observation is left-censored ({{Verbatim|cens{{-}}1}}) or not ({{Verbatim|cens{{-}}0}}).
The column of observations {{Verbatim|log-VL}} displays the observed log-viral load when it is above the limit $L=-0.30$, and the limit $L=-0.30$ otherwise.&lt;/blockquote&gt;<br /> <br /> <br /> {| cellspacing=&quot;0&quot; cellpadding=&quot;0&quot; <br /> | style=&quot;width=60%&quot; |<br /> [[File:continuous_graf6a.png]]<br /> | style=&quot;width=40%&quot; align=&quot;right&quot;|<br /> {| class=&quot;wikitable&quot; style=&quot;width: 150%&quot;<br /> !| ID || TIME ||log-VL || cens<br /> |- <br /> | 1 || 1.0 || 0.26 || 0<br /> |-<br /> | 1 || 2.0 || 0.02 || 0<br /> |-<br /> | 1 || 3.0 || -0.13 || 0<br /> |-<br /> | 1 || 4.0 || -0.13 || 0<br /> |-<br /> | 1 || 5.0 || -0.30 || 1<br /> |-<br /> | 1 || 6.0 || -0.30 || 1<br /> |-<br /> | 1 || 7.0 || -0.25 || 0<br /> |-<br /> | 1 || 8.0 || -0.30 || 1<br /> |-<br /> | 1 || 9.0 || -0.29 || 0<br /> |-<br /> | 1 || 10.0 || -0.30 || 1<br /> |}<br /> |}<br /> <br /> <br /> * '''Interval censoring:''' if a data point is in interval $I$, its exact value is not known:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \repy_{ij} = \left\{ \begin{array}{cc}<br /> y_{ij} &amp; {\rm if } \ y_{ij}\notin I \\<br /> y_{ij} \in I &amp; {\rm otherwise.}<br /> \end{array} \right. &lt;/math&gt; }}<br /> <br /> &lt;blockquote&gt;For example, suppose we are measuring a concentration which naturally only takes non-negative values, but again we cannot measure it below the level $L = 1$. Therefore, any data point $y_{ij}$ below $1$ will be recorded only as &quot;$y_{ij} \in [0,1)$&quot;. In the table, an additional column {{Verbatim|llimit}} is required to indicate the lower bound of the censoring interval.&lt;/blockquote&gt;<br /> <br /> <br /> {| cellspacing=&quot;0&quot; cellpadding=&quot;0&quot; <br /> | style=&quot;width=60%&quot; |<br /> [[File:continuous_graf6b.png]]<br /> | style=&quot;width=40%&quot; align=&quot;right&quot;|<br /> {| class=&quot;wikitable&quot; style=&quot;width: 150%&quot;<br /> !| ID || TIME ||CONC. 
|| llimit || cens<br /> |-<br /> | 1 || 0.3 || 1.20 || . || 0<br /> |-<br /> | 1 || 0.5 || 1.93 || . || 0<br /> |-<br /> | 1 || 1.0 || 3.38 || . || 0<br /> |-<br /> | 1 || 2.0 || 3.88 || . || 0<br /> |-<br /> | 1 || 4.0 || 3.24 || . || 0<br /> |-<br /> | 1 || 6.0 || 1.82 || . || 0<br /> |-<br /> | 1 || 8.0 || 1.07 || . || 0<br /> |-<br /> | 1 || 12.0 || 1.00 || 0.00 || 1<br /> |-<br /> | 1 || 16.0 || 1.00 || 0.00 || 1<br /> |-<br /> | 1 || 20.0 || 1.00 || 0.00 || 1<br /> |}<br /> |}<br /> <br /> <br /> <br /> * '''Right censoring:''' when a data point is above a certain value $U$, it is not known by how much:<br /> <br /> {{Equation1<br /> |equation= &lt;math&gt; \repy_{ij} = \left\{ \begin{array}{cc}<br /> y_{ij} &amp; {\rm if } \ y_{ij}\leq U \\<br /> y_{ij} &gt; U &amp; {\rm otherwise.}<br /> \end{array} \right. <br /> &lt;/math&gt; }}<br /> <br /> &lt;blockquote&gt;Column {{Verbatim|cens}} is used to indicate if an observation is right-censored ({{Verbatim|cens{{-}}-1}}) or not ({{Verbatim|cens{{-}}0}}).<br /> &lt;/blockquote&gt;<br /> <br /> {| cellspacing=&quot;0&quot; cellpadding=&quot;0&quot; <br /> | style=&quot;width=60%&quot; |<br /> [[File:continuous_graf6c.png]]<br /> | style=&quot;width=40%&quot; align=&quot;right&quot; |<br /> {| class=&quot;wikitable&quot; style=&quot;width: 150%&quot;<br /> !| ID || TIME ||VOLUME || CENS<br /> |-<br /> | 1 || 2.0 || 1.85 || 0<br /> |-<br /> | 1 || 7.0 || 2.40 || 0<br /> |-<br /> | 1 || 12.0 || 3.27 || 0<br /> |-<br /> | 1 || 17.0 || 3.28 || 0<br /> |-<br /> | 1 || 22.0 || 3.62 || 0<br /> |- <br /> | 1 || 27.0 || 3.02 || 0<br /> |-<br /> | 1 || 32.0 || 3.80 || -1<br /> |-<br /> | 1 || 37.0 || 3.80 || -1<br /> |-<br /> | 1 || 42.0 || 3.80 || -1<br /> |-<br /> | 1 || 47.0 || 3.80 || -1<br /> |}<br /> |}<br /> <br /> <br /> <br /> {{Remarks<br /> |title=Remarks<br /> <br /> |text= &amp;#32;<br /> * Different censoring limits and intervals can be in play at different times and for different individuals.<br /> * Interval 
censoring covers the other two types as special cases: take $I=(-\infty,L)$ for left censoring and $I=(U,+\infty)$ for right censoring.&lt;br /&gt; }}&lt;br /&gt; &lt;br /&gt; &lt;br /&gt; The likelihood needs to be computed carefully in the presence of censored data. To cover all three types of censoring in one go, let $I_{ij}$ be the (finite or infinite) censoring interval existing for individual $i$ at time $t_{ij}$. Then,&lt;br /&gt; &lt;br /&gt; {{EquationWithRef&lt;br /&gt; |equation = &amp;lt;div id=&amp;quot;likeN_model4&amp;quot;&amp;gt;&amp;lt;math&amp;gt; &lt;br /&gt; \begin{eqnarray} \pcypsi(\brepy {{!}} \bpsi ) &amp;amp; = &amp;amp; \prod_{i=1}^N \prod_{j=1}^{n_i} \pypsiij(y_{ij} {{!}} \psi_i )^{\mathbf{1}_{y_{ij} \notin I_{ij} } } \, \prob{y_{ij} \in I_{ij} {{!}} \psi_i}^{\mathbf{1}_{y_{ij} \in I_{ij} } },&lt;br /&gt; \end{eqnarray}&lt;br /&gt; &amp;lt;/math&amp;gt;&amp;lt;/div&amp;gt;&lt;br /&gt; |reference=(9) }}&lt;br /&gt; &lt;br /&gt; where&lt;br /&gt; &lt;br /&gt; {{Equation1&lt;br /&gt; |equation=&amp;lt;math&amp;gt; \prob{y_{ij} \in I_{ij} {{!}} \psi_i} = \int_{I_{ij} } \qypsiij(u {{!}} \psi_i )\, du . &amp;lt;/math&amp;gt; }}&lt;br /&gt; &lt;br /&gt; We see that if $y_{ij}$ is not censored (i.e., $\mathbf{1}_{y_{ij} \notin I_{ij}} = 1$), the contribution to the likelihood is the usual $\pypsiij(y_{ij} | \psi_i )$, whereas if it is censored, the contribution is $\prob{y_{ij} \in I_{ij}|\psi_i}$.&lt;br /&gt; &lt;br /&gt; &lt;br /&gt; &amp;lt;br&amp;gt;&lt;br /&gt; &lt;br /&gt; == Extensions to multidimensional continuous observations == &lt;br /&gt; &lt;br /&gt; &lt;br /&gt; &amp;lt;ul&amp;gt;&lt;br /&gt; * Extension to multidimensional observations is straightforward. If $d$ outcomes are simultaneously measured at $t_{ij}$, then $y_{ij}$ is now a vector in $\Rset^d$ and we can suppose that equation [[#nlme|(1)]] still holds for each component of $y_{ij}$.
Thus, for $1\leq m \leq d$,&lt;br /&gt; &lt;br /&gt; {{Equation1&lt;br /&gt; |equation=&amp;lt;math&amp;gt;&lt;br /&gt; y_{ijm}=f_m(t_{ij},\psi_i)+ g_m(t_{ij},\psi_i)\teps_{ijm} , \ \ 1\leq i \leq N,&lt;br /&gt; \ \ 1 \leq j \leq n_i.&lt;br /&gt; &amp;lt;/math&amp;gt;}}&lt;br /&gt; &lt;br /&gt; : It is then possible to introduce correlation between the components of each observation by assuming that $\teps_{ij} = (\teps_{ijm} , 1\leq m \leq d)$ is a random vector with mean 0 and correlation matrix $R_{\teps_{ij}}$.&lt;br /&gt; &lt;br /&gt; &lt;br /&gt; * Suppose instead that $K$ replicates of the same measurement are taken at time $t_{ij}$. Then, the model becomes, for $1 \leq k \leq K$,&lt;br /&gt; &lt;br /&gt; {{Equation1&lt;br /&gt; |equation=&amp;lt;math&amp;gt;&lt;br /&gt; y_{ijk}=f(t_{ij},\psi_i)+ g(t_{ij},\psi_i)\teps_{ijk} ,\ \ 1\leq i \leq N,&lt;br /&gt; \ \ 1 \leq j \leq n_i .&lt;br /&gt; &amp;lt;/math&amp;gt; }}&lt;br /&gt; &lt;br /&gt; : Following what can be done for decomposing random effects into inter-individual and inter-occasion components, we can decompose the residual error into inter-measurement and inter-replicate components:&lt;br /&gt; &lt;br /&gt; {{Equation1&lt;br /&gt; |equation=&amp;lt;math&amp;gt;&lt;br /&gt; y_{ijk}=f(t_{ij},\psi_i)+ g_{I\!M}(t_{ij},\psi_i)\vari{\teps}{ij}{I\!M} + g_{I\!R}(t_{ij},\psi_i)\vari{\teps}{ijk}{I\!R} .&lt;br /&gt; &amp;lt;/math&amp;gt; }}&lt;br /&gt; &amp;lt;/ul&amp;gt;&lt;br /&gt; &amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt; -----------------------------------------------&lt;br /&gt; &amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt; &lt;br /&gt; {{Summary&lt;br /&gt; |title=Summary &lt;br /&gt; |text= &lt;br /&gt; A model for continuous data is completely defined by:&lt;br /&gt; &lt;br /&gt; *The structural model $f$&lt;br /&gt; *The residual error model $g$&lt;br /&gt; *The probability distribution of the residual errors $(\teps_{ij})$&lt;br /&gt; *Possibly a transformation $\transy$ of the data&lt;br /&gt; &lt;br /&gt; &lt;br /&gt; The model is associated with a design which includes:&lt;br /&gt; &lt;br /&gt; &lt;br /&gt; - the observation times $(t_{ij})$&lt;br /&gt; &lt;br /&gt; - possibly some additional regression variables $(x_{ij})$&lt;br /&gt; &lt;br /&gt; - possibly the inputs $(u_i)$ (e.g., the dosing regimen for a PK model)&lt;br /&gt; &lt;br /&gt; - possibly a censoring
process $(I_{ij})$<br /> <br /> }}<br /> <br /> <br /> == $\mlxtran$ for continuous data models == <br /> <br /> <br /> <br /> {{ExampleWithCode<br /> |title1=Example 1:<br /> |title2=<br /> <br /> |text= <br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \psi &amp;=&amp; (A,\alpha,B,\beta, a) \\<br /> f(t,\psi) &amp;=&amp; A\, e^{- \alpha \, t} + B\, e^{- \beta \, t} \\<br /> y_{ij} &amp;=&amp; f(t_{ij} , \psi_i) + a\, \teps_{ij}<br /> \end{eqnarray}&lt;/math&gt;<br /> |code=<br /> {{MLXTranForTable<br /> |name=<br /> |text=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> INPUT:<br /> input = {A, B, alpha, beta, a}<br /> <br /> EQUATION:<br /> f = A*exp(-alpha*t) + B*exp(-beta*t)<br /> <br /> DEFINITION:<br /> y = {distribution=normal, prediction=f, std=a}&lt;/pre&gt;<br /> }}<br /> <br /> }}<br /> <br /> <br /> {{ExampleWithCode<br /> |title1=Example 2:<br /> |title2=<br /> <br /> |text=<br /> |equation= &lt;math&gt; \begin{eqnarray}<br /> \psi &amp;=&amp; (\delta, c , \beta, p, s, d, \nu,\rho, a) \\<br /> t_0 &amp;=&amp;0 \\[0.2cm]<br /> {\rm if \quad t&lt;t_0} \\[0.2cm]<br /> \quad \nitc &amp;=&amp; \delta \, c/( \beta \, p) \\<br /> \quad \itc &amp;=&amp; (s - d\,\nitc) / \delta \\<br /> \quad \vl &amp;=&amp; p \, \itc / c. 
\\[0.2cm] &lt;br /&gt; {\rm else \quad \quad }\\[0.2cm] &lt;br /&gt; \quad \dA{\nitc}{} &amp;amp; =&amp;amp; s - \beta(1-\nu) \, \nitc(t) \, \vl(t) - d\,\nitc(t) \\&lt;br /&gt; \quad \dA{\itc}{} &amp;amp; = &amp;amp;\beta(1-\nu) \, \nitc(t) \, \vl(t) - \delta \, \itc(t) \\&lt;br /&gt; \quad \dA{\vl}{} &amp;amp; = &amp;amp;p(1-\rho) \, \itc(t) - c \, \vl(t) \\&lt;br /&gt; \quad \log(y_{ij}) &amp;amp;= &amp;amp;\log(V(t_{ij} , \psi_i)) + a\, \teps_{ij} &lt;br /&gt; \end{eqnarray}&amp;lt;/math&amp;gt;&lt;br /&gt; |code=&lt;br /&gt; {{MLXTranForTable&lt;br /&gt; |name=&lt;br /&gt; |text=&lt;br /&gt; &amp;lt;pre style=&amp;quot;background-color: #EFEFEF; border:none&amp;quot;&amp;gt;&lt;br /&gt; INPUT:&lt;br /&gt; input = {delta, c, beta, p, s, d, nu, rho, a}&lt;br /&gt; &lt;br /&gt; EQUATION:&lt;br /&gt; t0=0&lt;br /&gt; N_0 = delta*c/(beta*p)&lt;br /&gt; I_0 = (s - d*N_0)/delta&lt;br /&gt; V_0 = p*I_0/c&lt;br /&gt; ddt_N = s - beta*(1-nu)*N*V - d*N&lt;br /&gt; ddt_I = beta*(1-nu)*N*V - delta*I&lt;br /&gt; ddt_V = p*(1-rho)*I - c*V&lt;br /&gt; &lt;br /&gt; DEFINITION:&lt;br /&gt; y = {distribution=logNormal, prediction=V, std=a}&lt;br /&gt; &amp;lt;/pre&amp;gt; }} &lt;br /&gt; }}&lt;br /&gt; &lt;br /&gt; &amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt; &lt;br /&gt; &lt;br /&gt; {{Back&amp;amp;Next&lt;br /&gt; |linkBack=Modeling the observations &lt;br /&gt; |linkNext=Models for count data }}</div> Admin http://wiki.webpopix.org/index.php/Extension_to_multivariate_distributions Extension to multivariate distributions 2013-06-07T13:34:34Z <p>Admin : /* The Gaussian model */</p> <hr /> <div>== The Gaussian model ==&lt;br /&gt; &lt;br /&gt; We would now like to extend the model defined for a unique individual scalar parameter $\psi_i$ to the case where $\psi_i$ is a vector $(\psi_{i,1},\psi_{i,2}, \ldots,\psi_{i,d})$ of individual parameters.&lt;br /&gt; &lt;br /&gt; To begin with, we are going to merely generalize the basic model to each component of $\psi_i$.
To this end, we suppose that there exists a vector of covariates $c_i = (c_{i,1}, \ldots, c_{i,L})$ and:<br /> <br /> <br /> &lt;ul&gt;<br /> * $d$ monotonic transformations $h_1$, $h_2$, $\ldots$, $h_d$<br /> <br /> * $d$ vectors of fixed coefficients $\bbeta_1$, $\bbeta_2, \ldots, \bbeta_d$<br /> <br /> * $d$ functions $\hmodel_1$, $\hmodel_2$, $\ldots$, $\hmodel_d$<br /> <br /> * a vector of random effects $\beeta_i = (\eta_{i,1},\eta_{i,2},\ldots , \eta_{i,d})$,<br /> &lt;/ul&gt;<br /> <br /> <br /> such that, for each $\iparam=1,2,\ldots,d$,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \hpsi_{i,\iparam} &amp;=&amp; \hmodel_\iparam(\bbeta_\iparam,c_i) \\<br /> h_\iparam(\psi_{i,\iparam}) &amp; =&amp; h_\iparam(\hpsi_{i,\iparam}) +\eta_{i,\iparam} \\<br /> &amp; =&amp; \mmodel_\iparam(\bbeta_\iparam,c_i) +\eta_{i,\iparam}.<br /> \end{eqnarray} &lt;/math&gt; }}<br /> <br /> For instance, a linear covariate model supposes that for each $\iparam=1,2,\ldots,d$, we have:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> h_\iparam(\hpsi_{i,\iparam}) = h_\iparam(\psi_{ {\rm pop},\iparam})+ \bbeta_{\iparam,1}(c_{i,1} - c_{\rm pop,1}) + \bbeta_{\iparam,2}(c_{i,2} - c_{\rm pop,2}) + \ldots + \bbeta_{\iparam,L}(c_{i,L} - c_{\rm pop,L}) .<br /> &lt;/math&gt; }}<br /> <br /> Dependency can be introduced between parameters by supposing that the random effects $(\eta_{i,\iparam})$ are not independent. 
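The linear covariate model above is easy to sketch numerically. The following Python snippet is a minimal illustration only, not taken from the text: the choice $h_\iparam=\log$ for every parameter, the single weight covariate, all parameter values and the correlation structure of the random effects are assumptions. It computes the transformed predicted values $\mmodel_\iparam(\bbeta_\iparam,c_i)$ and draws one vector of (correlated) individual parameters.

```python
import numpy as np

# Minimal sketch of the linear covariate model on the transformed scale:
#   log(psi_i) = log(psi_pop) + beta * (c_i - c_pop) + eta_i
# All values below are hypothetical.
rng = np.random.default_rng(0)

psi_pop = np.array([1.0, 8.0, 0.5])     # (ka_pop, V_pop, Cl_pop), assumed
beta    = np.array([0.0, 1.0, 0.75])    # covariate coefficients, assumed
omega   = np.array([0.3, 0.2, 0.25])    # standard deviations of the eta's
R = np.array([[1.0, 0.0, 0.0],          # correlation matrix of the eta's
              [0.0, 1.0, 0.6],
              [0.0, 0.6, 1.0]])
Omega = np.diag(omega) @ R @ np.diag(omega)   # variance-covariance matrix

c_i, c_pop = np.log(90.0), np.log(70.0)       # individual vs population log-weight
m_i = np.log(psi_pop) + beta * (c_i - c_pop)  # transformed predicted values

eta_i = rng.multivariate_normal(np.zeros(3), Omega)  # correlated random effects
psi_i = np.exp(m_i + eta_i)                   # back-transform to the natural scale

print(psi_i)
```

With $\eta_i = 0$ the back-transform returns the predicted values $\hpsi_i$ themselves; the nonzero off-diagonal entry of $R$ makes the volume and clearance random effects dependent.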
In the special case where the random effects are Gaussian, this means considering them to be correlated, i.e., we suppose there exists a $d\times d$ variance-covariance matrix $\Omega$ such that<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \esp{\beeta_i}&amp;=&amp;0 \\<br /> \esp{\beeta_i \beeta_i^\prime} &amp;=&amp; \Omega .<br /> \end{eqnarray} &lt;/math&gt; }}<br /> <br /> Here,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \Omega = \left(<br /> \begin{array}{cccc}<br /> \omega_1^2 &amp; \omega_{1,2} &amp; \ldots &amp; \omega_{1,d} \\<br /> \omega_{1,2} &amp; \omega_2^2 &amp; \ldots &amp; \omega_{2,d} \\<br /> \vdots &amp; \vdots &amp; \ddots &amp; \vdots \\<br /> \omega_{1,d} &amp; \omega_{2,d} &amp; \ldots &amp; \omega_d^2<br /> \end{array}<br /> \right), <br /> &lt;/math&gt; }}<br /> <br /> where $\omega_\iparam^2$ is the variance of $\eta_{i,\iparam}$ and $\omega_{\iparam,\iparam^\prime}$ the covariance between $\eta_{i,\iparam}$ and $\eta_{i,\iparam^\prime}$.<br /> <br /> It will be useful in the following to have a diagonal decomposition of $\Omega$. 
To this end, let us define the correlation matrix $R=(R_{\iparam,\iparam^\prime}, 1 \leq \iparam,\iparam^\prime \leq d)$ of the vector $\eta_i$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; R_{\iparam,\iparam^\prime} = \left\{<br /> \begin{array}{ll}<br /> 1 &amp; {\rm if \quad } \iparam=\iparam^\prime \\<br /> \rho_{\iparam,\iparam^\prime}=\frac{\omega_{\iparam,\iparam^\prime} }{\omega_{\iparam}\omega_{\iparam^\prime} } &amp; \hbox{otherwise,}<br /> \end{array}<br /> \right.<br /> &lt;/math&gt; }}<br /> <br /> and let $D=(D_{\iparam,\iparam^\prime})$ be a diagonal matrix which contains the standard deviations $(\omega_\iparam)$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;D_{\iparam,\iparam^\prime} = \left\{<br /> \begin{array}{ll}<br /> \omega_{\iparam} &amp; {\rm if \quad } \iparam=\iparam^\prime \\<br /> 0 &amp; {\rm otherwise.}<br /> \end{array}<br /> \right.<br /> &lt;/math&gt; }}<br /> <br /> Then we have the diagonal decomposition: $\Omega = D \, R \, D$.<br /> <br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text=Consider a PK model with three PK parameters: the absorption rate constant $ka$, the volume $V$ and the clearance $Cl$. 
Here, $\psi_i=(ka_i,V_i,Cl_i)$.&lt;br /&gt; &lt;br /&gt; &lt;br /&gt; &amp;lt;li&amp;gt; Assuming that $\eta_{i,V}$ and $\eta_{i,Cl}$ are correlated means that the log-volume and the log-clearance are linearly correlated, with correlation:&lt;br /&gt; &lt;br /&gt; {{Equation1&lt;br /&gt; |equation= &amp;lt;math&amp;gt;\begin{eqnarray}&lt;br /&gt; \rho_{V,Cl} &amp;amp; = &amp;amp; \corr{\eta_{i,V},\eta_{i,Cl} } \\&lt;br /&gt; &amp;amp; = &amp;amp; \corr{\log(V_i),\log(Cl_i)} .&lt;br /&gt; \end{eqnarray}&amp;lt;/math&amp;gt; }}&lt;br /&gt; &lt;br /&gt; &lt;br /&gt; &amp;lt;li&amp;gt; Assuming that $ka$ is fixed in the population means that $ka_i = ka_{\rm pop}$ for any $i$, which implies that $\eta_{i,ka}=0$, and thus $\omega_{ka}=0$.&lt;br /&gt; &lt;br /&gt; The correlation matrix $R$ and the variance-covariance matrix $\Omega$ of $(\eta_{i,ka}, \eta_{i,V}, \eta_{i,Cl})$ are therefore&lt;br /&gt; &lt;br /&gt; {{Equation1&lt;br /&gt; |equation=&amp;lt;math&amp;gt; R = \left(&lt;br /&gt; \begin{array}{ccc}&lt;br /&gt; 1 &amp;amp; 0 &amp;amp; 0 \\&lt;br /&gt; 0 &amp;amp; 1 &amp;amp; \rho_{V,Cl} \\&lt;br /&gt; 0 &amp;amp; \rho_{V,Cl} &amp;amp; 1 \\&lt;br /&gt; \end{array}&lt;br /&gt; \right)&lt;br /&gt; , \quad \quad&lt;br /&gt; \Omega = DRD = \left(&lt;br /&gt; \begin{array}{ccc}&lt;br /&gt; 0 &amp;amp; 0 &amp;amp; 0 \\&lt;br /&gt; 0 &amp;amp; \omega_V^2 &amp;amp; \omega_V\omega_{Cl}\, \rho_{V,Cl} \\&lt;br /&gt; 0 &amp;amp; \omega_V\omega_{Cl}\, \rho_{V,Cl} &amp;amp; \omega_{Cl}^2 \\&lt;br /&gt; \end{array}&lt;br /&gt; \right) .&lt;br /&gt; &amp;lt;/math&amp;gt; }}&lt;br /&gt; }}&lt;br /&gt; &lt;br /&gt; &lt;br /&gt; &lt;br /&gt; &amp;lt;br&amp;gt;&lt;br /&gt; &lt;br /&gt; == The probability distribution function ==&lt;br /&gt; &lt;br /&gt; &lt;br /&gt; &lt;br /&gt; We now have all the elements needed for computing the pdf of $\psi_i=(\psi_{i,1},\psi_{i,2}, \ldots,\psi_{i,d})$.&lt;br /&gt; Here, $\theta = (\psi_{{\rm pop},1}, \ldots, \psi_{{\rm pop},d}, \bbeta_1, \ldots,\bbeta_d,\Omega)$.&lt;br /&gt; &lt;br /&gt; &lt;br /&gt; &amp;lt;ul&amp;gt;&lt;br /&gt; * If $\Omega$ is a positive-definite matrix, it can be inverted and a straightforward extension of the pdf proposed in [[Covariate_models#indiv_cov6|(9) of The covariate model]] for a scalar variable gives&lt;br /&gt; &lt;br /&gt; {{EquationWithRef&lt;br /&gt; |equation=&amp;lt;div
id=&amp;quot;indiv_multi1&amp;quot;&amp;gt;&amp;lt;math&amp;gt;&lt;br /&gt; \ppsii(\psi_i;c_i,\theta )= \left( \prod_{\iparam=1}^d h_\iparam^\prime(\psi_{i,\iparam}) \right)&lt;br /&gt; (2 \pi)^{-\frac{d}{2} } {{!}}\Omega{{!}}^{-\frac{1}{2} }&lt;br /&gt; {\rm exp} \left\{-\frac{1}{2} ( h(\psi_i) - \mmodel(\bbeta,c_i) )^\prime \Omega^{-1} ( h(\psi_i) - \mmodel(\bbeta,c_i) ) \right\} ,&lt;br /&gt; &amp;lt;/math&amp;gt;&amp;lt;/div&amp;gt;&lt;br /&gt; |reference=(1) }}&lt;br /&gt; &lt;br /&gt; : where $h(\psi_i)$ is the column vector $(h_1(\psi_{i,1}), h_2(\psi_{i,2}), \ldots, h_d(\psi_{i,d}))^\prime$ and $\mmodel(\bbeta,c_i)$ the column vector $(h_1(\hpsi_{i,1}), h_2(\hpsi_{i,2}), \ldots, h_d(\hpsi_{i,d}))^\prime$.&lt;br /&gt; &lt;br /&gt; &lt;br /&gt; * If the variance of some of the random effects is zero, $\Omega$ is not positive-definite. The pdf in [[#indiv_multi1|(1)]] then no longer applies to the complete $d$-vector $\psi_i$, but only to the $d_1$-vector subset $\psi_i^{(1)}$ of $\psi_i$ whose variance matrix $\Omega_1$ is positive-definite. The distribution of the remaining fixed parameters $\psi_i^{(0)}$ is a Dirac delta distribution. Let $I_0$ be the indices of the parameters $\psi_i^{(0)}$ and $I_1$ those of the parameters $\psi_i^{(1)}$, i.e., $\omega_\iparam =0$ if $\iparam \in I_0$ and $\omega_\iparam &amp;gt;0$ if $\iparam \in I_1$.
Then,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \ppsii(\psi_i;c_i,\theta ) = \pmacro(\psi_i^{(0)};c_i,\theta )\,\,\pmacro(\psi_i^{(1)};c_i,\theta ) ,<br /> &lt;/math&gt; }}<br /> <br /> : where<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pmacro(\psi_i^{(0)};c_i,\theta ) &amp;= &amp; \prod_{\iparam \in I_0} \delta_{ \{ h(\psi_{i,\iparam})=\mmodel_{\iparam}(\bbeta_{\iparam},c_i) \} } \\<br /> \pmacro(\psi_i^{(1)};c_i,\theta )&amp;=&amp; \left( \prod_{\iparam \in I_1} h_\iparam^\prime(\psi_{i,\iparam}) \right)<br /> (2 \pi)^{-\frac{d_1}{2} } {{!}}\Omega_1{{!}}^{-\frac{1}{2} }<br /> {\rm exp} \left\{ -\frac{1}{2} ( h(\psi_i) - \mmodel(\bbeta,c_i) )^{(1)^\prime} \Omega_1^{-1} ( h(\psi_i) - \mmodel(\bbeta,c_i) )^{(1)} \right\},<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> : with $( h(\psi_i) - \mmodel(\bbeta,c_i) )^{(1)}$ the same as $( h(\psi_i) - \mmodel(\bbeta,c_i) )$ but with the $I_0$ entries removed.<br /> <br /> <br /> * There exist other situations where $\Omega$ is not positive-definite. This is the case for instance when two random effects are equal: $\eta_{i,\iparam} \equiv \eta_{i,\iparam^\prime}$. 
For them, we can calculate a joint distribution &lt;br /&gt; &lt;br /&gt; {{Equation1&lt;br /&gt; |equation=&amp;lt;math&amp;gt;&lt;br /&gt; \pmacro(\psi_{i,\iparam}, \psi_{i,\iparam^\prime},\eta_{i,\iparam}; \bbeta_{\iparam},\bbeta_{\iparam^\prime},\omega^2_\iparam ,c_i ) = \pmacro(\psi_{i,\iparam} {{!}} \eta_{i,\iparam}; \bbeta_{\iparam}, c_i ) \&lt;br /&gt; \pmacro(\psi_{i,\iparam^\prime} {{!}} \eta_{i,\iparam};\bbeta_{\iparam^\prime} , c_i ) \&lt;br /&gt; \pmacro(\eta_{i,\iparam} ; \omega^2_\iparam) , &amp;lt;/math&amp;gt; }}&lt;br /&gt; &lt;br /&gt; : where&lt;br /&gt; &lt;br /&gt; {{Equation1&lt;br /&gt; |equation=&amp;lt;math&amp;gt;\begin{eqnarray}&lt;br /&gt; \pmacro(\psi_{i,\iparam}{{!}} \eta_{i,\iparam}; \bbeta_{\iparam}, c_i ) &amp;amp;=&amp;amp; \delta_{\{h(\psi_{i,\iparam})=\mmodel_{\iparam}(\bbeta_{\iparam},c_i)+\eta_{i,\iparam} \} } \\&lt;br /&gt; \pmacro( \psi_{i,\iparam^\prime} {{!}} \eta_{i,\iparam};\bbeta_{\iparam^\prime} , c_i ) &amp;amp;=&amp;amp; \delta_{\{h(\psi_{i,\iparam^\prime})=\mmodel_{\iparam^\prime}(\bbeta_{\iparam^\prime},c_i)+\eta_{i,\iparam} \} } \\&lt;br /&gt; \pmacro(\eta_{i,\iparam} ; \omega^2_\iparam) &amp;amp;=&amp;amp; \displaystyle{ \frac{ 1}{\sqrt{2 \, \pi \omega_\iparam^2 } } }\ \exp\left\{-\displaystyle{ \frac{\eta_{i,\iparam}^2}{2 \omega_\iparam^2} }\right\}.&lt;br /&gt; \end{eqnarray}&amp;lt;/math&amp;gt; }}&lt;br /&gt; &amp;lt;/ul&amp;gt;&lt;br /&gt; &lt;br /&gt; &lt;br /&gt; All kinds of combinations are possible, including parameters with and without variability, algebraic relationships between random effects, etc. In each case, an adequate decomposition can be found that allows the pdf to be characterized.
This pdf turns out to play a fundamental role for tasks such as population parameter estimation with maximum likelihood, where we start with the observations $\by = (y_i , 1\leq i \leq N)$ and the individual parameters $(\psi_i)$ are not observed.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> &lt;!--<br /> == $\mlxtran$ for the covariance model ==<br /> <br /> {{ExampleWithCode<br /> |title1=Example 1:<br /> |title2=<br /> |text= TO DO<br /> |equation=<br /> |code =<br /> }}<br /> --&gt;<br /> <br /> <br /> {{Back&amp;Next<br /> |linkBack=Model with covariates<br /> |linkNext=Additional levels of variability }}</div> Admin http://wiki.webpopix.org/index.php/Extension_to_multivariate_distributions Extension to multivariate distributions 2013-06-07T13:31:24Z <p>Admin : /* $\mlxtran$ for the covariance model */</p> <hr /> <div>&lt;!-- Menu for the Individual Parameters chapter --&gt;<br /> &lt;sidebarmenu&gt;<br /> +[[Modeling the individual parameters]]<br /> *[[Modeling the individual parameters| Introduction ]] | [[Gaussian models]] | [[Model with covariates]] | [[Extension to multivariate distributions]] | [[Additional levels of variability]] <br /> &lt;/sidebarmenu&gt;<br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> == The Gaussian model ==<br /> <br /> We would now like to extend the model defined for a unique individual scalar parameter $\psi_i$ to the case where $\psi_i$ is a vector $(\psi_{i,1},\psi_{i,2}, \ldots,\psi_{i,d})$ of individual parameters.<br /> <br /> To begin with, we are going to merely generalize the basic model to each component of $\psi_i$. 
To this end, we suppose that there exists a vector of covariates $c_i = (c_{i,1}, \ldots, c_{i,L})$ and:<br /> <br /> <br /> &lt;ul&gt;<br /> * $d$ monotonic transformations $h_1$, $h_2$, $\ldots$, $h_d$<br /> <br /> * $d$ vectors of fixed coefficients $\bbeta_1$, $\bbeta_2, \ldots, \bbeta_d$<br /> <br /> * $d$ functions $\hmodel_1$, $\hmodel_2$, $\ldots$, $\hmodel_d$<br /> <br /> * a vector of random effects $\beeta_i = (\eta_{i,1},\eta_{i,2},\ldots , \eta_{i,d})$,<br /> &lt;/ul&gt;<br /> <br /> <br /> such that, for each $\iparam=1,2,\ldots,d$,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \hpsi_{i,\iparam} &amp;=&amp; \hmodel_\iparam(\bbeta_\iparam,c_i) \\<br /> h_\iparam(\psi_{i,\iparam}) &amp; =&amp; h_\iparam(\hpsi_{i,\iparam}) +\eta_{i,\iparam} \\<br /> &amp; =&amp; \mmodel_\iparam(\bbeta_\iparam,c_i) +\eta_{i,\iparam}.<br /> \end{eqnarray} &lt;/math&gt; }}<br /> <br /> For instance, a linear covariate model supposes that for each $\iparam=1,2,\ldots,d$, we have:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> h_\iparam(\hpsi_{i,\iparam}) = h_\iparam(\psi_{ {\rm pop},\iparam})+ \bbeta_{\iparam,1}(c_{i,1} - c_{\rm pop,1}) + \bbeta_{\iparam,2}(c_{i,2} - c_{\rm pop,2}) + \ldots + \bbeta_{\iparam,L}(c_{i,L} - c_{\rm pop,L}) .<br /> &lt;/math&gt; }}<br /> <br /> Dependency can be introduced between parameters by supposing that the random effects $(\eta_{i,\iparam})$ are not independent. 
In the special case where the random effects are Gaussian, this means considering them to be correlated, i.e., we suppose there exists a $d\times d$ variance-covariance matrix $\Omega$ such that<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \esp{\beeta_i}&amp;=&amp;0 \\<br /> \esp{\beeta_i \beeta_i^\prime} &amp;=&amp; \Omega .<br /> \end{eqnarray} &lt;/math&gt; }}<br /> <br /> Here,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \Omega = \left(<br /> \begin{array}{cccc}<br /> \omega_1^2 &amp; \omega_{1,2} &amp; \ldots &amp; \omega_{1,d} \\<br /> \omega_{1,2} &amp; \omega_2^2 &amp; \ldots &amp; \omega_{2,d} \\<br /> \vdots &amp; \vdots &amp; \ddots &amp; \vdots \\<br /> \omega_{1,d} &amp; \omega_{2,d} &amp; \ldots &amp; \omega_d^2<br /> \end{array}<br /> \right), <br /> &lt;/math&gt; }}<br /> <br /> where $\omega_\iparam^2$ is the variance of $\eta_{i,\iparam}$ and $\omega_{\iparam,\iparam^\prime}$ the covariance between $\eta_{i,\iparam}$ and $\eta_{i,\iparam^\prime}$.<br /> <br /> It will be useful in the following to have a diagonal decomposition of $\Omega$. 
To this end, let us define the correlation matrix $R=(R_{\iparam,\iparam^\prime}, 1 \leq \iparam,\iparam^\prime \leq d)$ of the vector $\eta_i$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; R_{\iparam,\iparam^\prime} = \left\{<br /> \begin{array}{ll}<br /> 1 &amp; {\rm if \quad } \iparam=\iparam^\prime \\<br /> \rho_{\iparam,\iparam^\prime}=\frac{\omega_{\iparam,\iparam^\prime} }{\omega_{\iparam}\omega_{\iparam^\prime} } &amp; \hbox{otherwise,}<br /> \end{array}<br /> \right.<br /> &lt;/math&gt; }}<br /> <br /> and let $D=(D_{\iparam,\iparam^\prime})$ be a diagonal matrix which contains the standard deviations $(\omega_\iparam)$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;D_{\iparam,\iparam^\prime} = \left\{<br /> \begin{array}{ll}<br /> \omega_{\iparam} &amp; {\rm if \quad } \iparam=\iparam^\prime \\<br /> 0 &amp; {\rm otherwise.}<br /> \end{array}<br /> \right.<br /> &lt;/math&gt; }}<br /> <br /> Then we have the diagonal decomposition: $\Omega = D \, R \, D$.<br /> <br /> <br /> <br /> {{Example<br /> |title=Example:<br /> |text=Consider a PK model with three PK parameters: the absorption rate constant $ka$, the volume $V$ and the clearance $Cl$. 
Here, $\psi_i=(ka_i,V_i,Cl_i)$.<br /> <br /> <br /> &lt;li&gt; If we make the assumption that $\eta_{i,V}$ and $\eta_{i,Cl}$ are correlated, it means that the log-volume and the log-clearance are linearly correlated, with correlation:<br /> <br /> {{Equation<br /> |equation= &lt;math&gt;\begin{eqnarray}<br /> \rho_{V,Cl} &amp; = &amp; \corr{\eta_{i,V},\eta_{i,Cl} } \\<br /> &amp; = &amp; \corr{\log(V_i),\log(Cl_i)} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> &lt;li&gt; Assuming that $ka$ is fixed in the population means that $ka_i = ka_{\rm pop}$ for all $i$, which implies that $\eta_{i,ka}=0$, and thus $\omega_{ka}=0$.<br /> <br /> The correlation matrix $R$ and the variance-covariance matrix $\Omega$ of $(\eta_{i,ka}, \eta_{i,V}, \eta_{i,Cl})$ are therefore<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; R = \left(<br /> \begin{array}{ccc}<br /> 1 &amp; 0 &amp; 0 \\<br /> 0 &amp; 1 &amp; \rho_{V,Cl} \\<br /> 0 &amp; \rho_{V,Cl} &amp; 1 \\<br /> \end{array}<br /> \right)<br /> , \quad \quad<br /> \Omega = DRD = \left(<br /> \begin{array}{ccc}<br /> 0 &amp; 0 &amp; 0 \\<br /> 0 &amp; \omega_V^2 &amp; \omega_V\omega_{Cl}\, \rho_{V,Cl} \\<br /> 0 &amp; \omega_V\omega_{Cl}\, \rho_{V,Cl} &amp; \omega_{Cl}^2 \\<br /> \end{array}<br /> \right) .<br /> &lt;/math&gt; }}<br /> }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> == The probability distribution function ==<br /> <br /> <br /> <br /> We now have all the elements needed to compute the pdf of $\psi_i=(\psi_{i,1},\psi_{i,2}, \ldots,\psi_{i,d})$.<br /> Here, $\theta = (\psi_{{\rm pop},1}, \ldots, \psi_{{\rm pop},d}, \bbeta_1, \ldots,\bbeta_d,\Omega)$.<br /> <br /> <br /> &lt;ul&gt;<br /> * If $\Omega$ is a positive-definite matrix, it can be inverted and a straightforward extension of the pdf proposed in [[Covariate_models#indiv_cov6|(9) of The covariate model]] for a scalar variable gives<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div
id=&quot;indiv_multi1&quot;&gt;&lt;math&gt;<br /> \ppsii(\psi_i;c_i,\theta )= \left( \prod_{\iparam=1}^d h_\iparam^\prime(\psi_{i,\iparam}) \right)<br /> (2 \pi)^{-\frac{d}{2} } {{!}}\Omega{{!}}^{-\frac{1}{2} }<br /> {\rm exp} \left\{-\frac{1}{2} ( h(\psi_i) - \mmodel(\bbeta,c_i) )^\prime \Omega^{-1} ( h(\psi_i) - \mmodel(\bbeta,c_i) ) \right\} ,<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> : where $h(\psi_i)$ is the column vector $(h_1(\psi_{i,1}), h_2(\psi_{i,2}), \ldots, h_d(\psi_{i,d}))^\prime$ and $\mmodel(\bbeta,c_i)$ the column vector $(h_1(\hpsi_{i,1}), h_2(\hpsi_{i,2}), \ldots, h_d(\hpsi_{i,d}))^\prime$.<br /> <br /> <br /> * If the variance of some of the random effects is zero, $\Omega$ is not positive-definite. The pdf in [[#indiv_multi1|(1)]] then no longer applies to the complete $d$-vector $\psi_i$, but only to the $d_1$-vector subset $\psi_i^{(1)}$ of $\psi_i$ whose variance matrix $\Omega_1$ is positive-definite. The distribution of the remaining fixed parameters $\psi_i^{(0)}$ is a Dirac delta distribution. Let $I_0$ be the set of indices of the parameters $\psi_i^{(0)}$ and $I_1$ that of the parameters $\psi_i^{(1)}$, i.e., $\omega_\iparam =0$ if $\iparam \in I_0$ and $\omega_\iparam &gt;0$ if $\iparam \in I_1$.
Then,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \ppsii(\psi_i;c_i,\theta ) = \pmacro(\psi_i^{(0)};c_i,\theta )\,\,\pmacro(\psi_i^{(1)};c_i,\theta ) ,<br /> &lt;/math&gt; }}<br /> <br /> : where<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pmacro(\psi_i^{(0)};c_i,\theta ) &amp;= &amp; \prod_{\iparam \in I_0} \delta_{ \{ h_\iparam(\psi_{i,\iparam})=\mmodel_{\iparam}(\bbeta_{\iparam},c_i) \} } \\<br /> \pmacro(\psi_i^{(1)};c_i,\theta )&amp;=&amp; \left( \prod_{\iparam \in I_1} h_\iparam^\prime(\psi_{i,\iparam}) \right)<br /> (2 \pi)^{-\frac{d_1}{2} } {{!}}\Omega_1{{!}}^{-\frac{1}{2} }<br /> {\rm exp} \left\{ -\frac{1}{2} ( h(\psi_i) - \mmodel(\bbeta,c_i) )^{(1)^\prime} \Omega_1^{-1} ( h(\psi_i) - \mmodel(\bbeta,c_i) )^{(1)} \right\},<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> : with $( h(\psi_i) - \mmodel(\bbeta,c_i) )^{(1)}$ the same as $( h(\psi_i) - \mmodel(\bbeta,c_i) )$ but with the $I_0$ entries removed.<br /> <br /> <br /> * There exist other situations where $\Omega$ is not positive-definite. This is the case, for instance, when two random effects are equal: $\eta_{i,\iparam} \equiv \eta_{i,\iparam^\prime}$.
In this case, we can calculate the joint distribution <br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \pmacro(\psi_{i,\iparam}, \psi_{i,\iparam^\prime},\eta_{i,\iparam}; \bbeta_{\iparam},\bbeta_{\iparam^\prime},\omega^2_\iparam ,c_i ) = \pmacro(\psi_{i,\iparam} {{!}} \eta_{i,\iparam}; \bbeta_{\iparam}, c_i ) \<br /> \pmacro(\psi_{i,\iparam^\prime} {{!}} \eta_{i,\iparam};\bbeta_{\iparam^\prime} , c_i ) \<br /> \pmacro(\eta_{i,\iparam} ; \omega^2_\iparam) , &lt;/math&gt; }}<br /> <br /> : where<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pmacro(\psi_{i,\iparam}{{!}} \eta_{i,\iparam}; \bbeta_{\iparam}, c_i ) &amp;=&amp; \delta_{\{h_\iparam(\psi_{i,\iparam})=\mmodel_{\iparam}(\bbeta_{\iparam},c_i)+\eta_{i,\iparam} \} } \\<br /> \pmacro( \psi_{i,\iparam^\prime} {{!}} \eta_{i,\iparam};\bbeta_{\iparam^\prime} , c_i ) &amp;=&amp; \delta_{\{h_{\iparam^\prime}(\psi_{i,\iparam^\prime})=\mmodel_{\iparam^\prime}(\bbeta_{\iparam^\prime},c_i)+\eta_{i,\iparam} \} } \\<br /> \pmacro(\eta_{i,\iparam} ; \omega^2_\iparam) &amp;=&amp; \displaystyle{ \frac{ 1}{\sqrt{2 \, \pi \omega_\iparam^2 } } }\ \exp\left\{-\displaystyle{ \frac{\eta_{i,\iparam}^2}{2 \omega_\iparam^2} }\right\}.<br /> \end{eqnarray}&lt;/math&gt; }}<br /> &lt;/ul&gt;<br /> <br /> <br /> All kinds of combinations are possible, including parameters with and without variability, algebraic relationships between random effects, etc. In every case, an adequate decomposition can be found that lets us characterize the pdf.
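The decomposition $\Omega = D\,R\,D$ and the handling of parameters without variability can be sketched numerically. The following illustration uses hypothetical values for the $(ka,V,Cl)$ example above, with $\omega_{ka}=0$ and correlated $(\eta_{i,V},\eta_{i,Cl})$; NumPy is assumed available:

```python
import numpy as np

# Hypothetical values for the ka/V/Cl example: ka has no variability.
omega = np.array([0.0, 0.30, 0.25])      # sds (omega_ka, omega_V, omega_Cl)
rho_V_Cl = 0.6                            # correlation of eta_V and eta_Cl

# Correlation matrix R and diagonal matrix D of standard deviations
R = np.array([[1.0, 0.0,      0.0],
              [0.0, 1.0,      rho_V_Cl],
              [0.0, rho_V_Cl, 1.0]])
D = np.diag(omega)

# Diagonal decomposition Omega = D R D
Omega = D @ R @ D

# Since omega_ka = 0, the first row and column of Omega are zero:
# only the (V, Cl) sub-block Omega_1 is positive-definite.
Omega_1 = Omega[1:, 1:]

# Simulate the random sub-vector (V_i, Cl_i), assuming log-normal
# distributions: log(psi) = log(psi_pop) + eta
rng = np.random.default_rng(0)
psi_pop = np.array([90.0, 10.0])          # hypothetical (V_pop, Cl_pop)
eta = rng.multivariate_normal(np.zeros(2), Omega_1, size=1000)
psi = np.exp(np.log(psi_pop) + eta)       # rows: individuals; cols: (V_i, Cl_i)
```

This is a sketch of the decomposition only, not of any particular software implementation: the fixed component $ka_i = ka_{\rm pop}$ would simply be copied for every individual, matching the Dirac part of the pdf.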
This pdf turns out to play a fundamental role for tasks such as population parameter estimation with maximum likelihood, where we start with the observations $\by = (y_i , 1\leq i \leq N)$ and the individual parameters $(\psi_i)$ are not observed.<br /> <br /> <br /> &lt;br&gt;<br /> <br /> &lt;!--<br /> == $\mlxtran$ for the covariance model ==<br /> <br /> {{ExampleWithCode<br /> |title1=Example 1:<br /> |title2=<br /> |text= TO DO<br /> |equation=<br /> |code =<br /> }}<br /> --&gt;<br /> <br /> <br /> {{Back&amp;Next<br /> |linkBack=Model with covariates<br /> |linkNext=Additional levels of variability }}</div> Admin http://wiki.webpopix.org/index.php/Gaussian_models Gaussian models 2013-06-07T13:28:33Z <p>Admin : /* Extensions of the normal distribution */</p> <hr /> <div>&lt;!-- Menu for the Individual Parameters chapter --&gt;<br /> &lt;sidebarmenu&gt;<br /> +[[Modeling the individual parameters]]<br /> *[[Modeling the individual parameters| Introduction ]] | [[Gaussian models]] | [[Model with covariates]] | [[Extension to multivariate distributions]] | [[Additional levels of variability]] <br /> &lt;/sidebarmenu&gt;<br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> == The normal distribution ==<br /> <br /> Gaussian models have several advantages, including the capacity of describing with ease both the predicted value of a random variable and its fluctuations around this value. 
Indeed, if we consider a Gaussian random variable $\psi$ with mean $\mu$ and standard deviation $\omega$, we can work with two entirely equivalent mathematical representations:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;indiv_gaussian1&quot;&gt;&lt;math&gt; \begin{eqnarray}<br /> \psi &amp;\sim&amp; {\cal N}(\mu , \omega^2) <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;indiv_gaussian2&quot;&gt;&lt;math&gt; \begin{eqnarray}<br /> \psi &amp;=&amp; \mu + \eta, \quad {\rm where }\ \quad \ \eta \sim {\cal N}(0,\omega^2) .<br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(2) }}<br /> <br /> The form [[#indiv_gaussian1|(1)]] provides an explicit description of the distribution of $\psi$ from which we can deduce the pdf and other characteristics such as the median, mode and quantiles. The figure below shows the pdf of a normal distribution with mean $\mu$ and standard deviation $\omega$. <br /> Each vertical band contains 10% of the distribution.<br /> <br /> <br /> :{{ImageWithCaption|image=Ndistrib.png|caption=The ${\cal N}(\mu,\omega^2)$ distribution}}<br /> <br /> <br /> This type of graphical representation is powerful and helps us to better visualize the types of values the random variable can take and those values that are more likely than others.<br /> <br /> Examples of normal distributions with various parameters are shown in the next figure.<br /> <br /> <br /> {{ImageWithCaption|image=distrib1.png|caption=Normal distributions}}<br /> <br /> <br /> Representation [[#indiv_gaussian2|(2)]] lets us separate the random and non-random components of $\psi$. If we define as the predicted value the value obtained in the absence of randomness ($\eta=0$), we get that $\hat{\psi}=\mu$. In the particular case of a normal distribution, this predicted value is the mean, median and mode of $\psi$. 
We can therefore rewrite equations [[#indiv_gaussian1|(1)]] and [[#indiv_gaussian2|(2)]] using $\hpsi$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \psi &amp;\sim&amp; {\cal N}(\hpsi , \omega^2) \\<br /> \psi &amp;=&amp; \hpsi + \eta, \quad {\rm where } \quad \ \ \eta \sim {\cal N}(0,\omega^2) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Extensions of the normal distribution == <br /> <br /> Clearly, not all distributions are Gaussian. To begin with, the normal distribution has the support $\Rset$, unlike many parameters that take values in precise ranges; some variables take only positive values (e.g., concentrations and volumes) and others are restricted to bounded intervals (e.g., bioavailability).<br /> <br /> Furthermore, the [http://en.wikipedia.org/wiki/Gaussian_distribution Gaussian distribution] is symmetric, which is not a property shared by all distributions. One way to extend the use of [http://en.wikipedia.org/wiki/Gaussian_distribution Gaussian distributions] is to consider that some transform of the parameters we are interested in is Gaussian,<br /> i.e., assume the existence of a monotonic function $h$ such that $h(\psi)$ is normally distributed. Then, there exists some $\mu$ and $\omega$ such that $h(\psi) \sim {\cal N}(\mu , \omega^2)$.<br /> <br /> For a given transformation $h$, we can parametrize using $\hat{\psi}$, the predicted value of $\psi$. 
Indeed, the predicted value of $h(\psi)$ is $\mu=h(\hat{\psi})$, and<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;indiv_gaussian3&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> h(\psi) &amp;\sim&amp; {\cal N}(h(\hat{\psi}) , \omega^2) <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(3) }}<br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;indiv_gaussian4&quot;&gt;&lt;math&gt;\begin{eqnarray}<br /> h(\psi) &amp;=&amp; h(\hat{\psi}) + \eta , \quad {\rm where } \quad \ \eta \sim {\cal N}(0,\omega^2). <br /> \end{eqnarray}&lt;/math&gt;&lt;/div&gt;<br /> |reference=(4) }}<br /> <br /> It is possible to derive the pdf of $\psi$ from [[#indiv_gaussian3|(3)]]:<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;indiv_gaussian5&quot;&gt;&lt;math&gt;<br /> \ppsi(\psi)=\displaystyle{ \frac{h^\prime(\psi)}{\sqrt{2 \pi \omega^2} } } \ \exp\left\{-\displaystyle{ \frac{1}{2 \, \omega^2} } (h(\psi) - h(\hpsi))^2 \right\}. &lt;/math&gt;&lt;/div&gt; <br /> |reference=(5) }}<br /> <br /> Let us now see some examples of transformed normal pdfs:<br /> <br /> <br /> &lt;br&gt;<br /> ===Log-normal distribution===<br /> <br /> The log-normal distribution is widely used for describing the distribution of PK/PD parameters. This choice is usually justified by the fact that it ensures non-negative values, and rarely because it is shown to properly describe the population distribution of the parameter of interest.<br /> <br /> Let $\psi$ be a log-normally distributed random variable with parameters $(\mu,\omega)$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\log(\psi) \sim {\cal N}( \mu, \omega^2). &lt;/math&gt; }}<br /> <br /> This distribution can also be parameterized with $(m,\omega)$, where $m = e^{\mu} = \hat{\psi}$.
Then, $\log(\psi) \sim {\cal N}( \log(m), \omega^2)$ and<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \ppsi(\psi)=\displaystyle{ \frac{1}{\psi \, \sqrt{2 \pi \omega^2} } }\ \exp\left\{- \displaystyle{\frac{1}{2 \, \omega^2} (\log(\psi) - \log(m))^2} \right\}.<br /> &lt;/math&gt; }}<br /> <br /> We display below some log-normal pdfs obtained with different parameters $(m,\omega)$.<br /> <br /> <br /> {{ImageWithCaption|image=distrib2.png|caption=Log-normal distributions}}<br /> <br /> <br /> We see that for a given standard deviation $\omega$, the pdfs obtained for different $m$ are simply rescaled.<br /> &lt;!-- {{Equation1|equation=&lt;math&gt; f_{\alpha m,\omega}(x) = \frac{f_{m,\omega}(x/\alpha)}{\alpha} &lt;/math&gt; }} --&gt;<br /> On the other hand, for a given $m$ the asymmetry of the distribution increases when the standard deviation $\omega$ increases.<br /> <br /> <br /> <br /> {{Remarks<br /> |title=Remarks<br /> |text=<br /> Note that the log-normal distribution takes its values in $(0,+\infty)$. It is straightforward to define a shifted distribution on $(a,+\infty)$:<br /> <br /> {{Equation1<br /> |equation= <br /> &lt;math&gt;\begin{eqnarray}<br /> \log(\psi-a) &amp;\sim&amp; {\cal N}( \log(m-a), \omega^2).<br /> \end{eqnarray}&lt;/math&gt; }}<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> ===Power-normal (or Box-Cox) distribution===<br /> <br /> <br /> This is the distribution of a random variable $\psi$ for which the Box-Cox transformation of $\psi$,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> h(\psi) = \displaystyle{ \frac{\psi^\lambda -1}{\lambda} }<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> (with $\lambda &gt; 0$) follows a normal distribution ${\cal N}( \mu, \omega^2)$ truncated so that $h(\psi) &gt; -1/\lambda$, i.e., $\psi &gt; 0$.
It therefore takes its values in $(0,+\infty)$.<br /> The distribution converges to the log-normal distribution when $\lambda \to 0$ and to a truncated normal distribution when $\lambda \to 1$.<br /> The main interest of the power-normal distribution is its ability to represent distributions &quot;between&quot; the log-normal and the normal.<br /> <br /> Here, $m = \hat{\psi} = (\lambda \mu + 1)^{1/\lambda}$.<br /> We display below several power-normal pdfs obtained with various parameter sets $(\lambda,m,\omega)$.<br /> <br /> <br /> {{ImageWithCaption|image=distrib3.png|caption=Power-normal distributions }}<br /> <br /> <br /> &lt;br&gt;<br /> ===Logit-normal and probit-normal distributions===<br /> <br /> A random variable $\psi$ with a logit-normal distribution takes its values in $(0,1)$. The logit of $\psi$ is normally distributed, i.e.,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \logit(\psi) &amp;= &amp;\log \left(\displaystyle{ \frac{\psi}{1-\psi} }\right) \<br /> \sim \ \ {\cal N}( \mu, \omega^2) \\<br /> m &amp;=&amp; \displaystyle{ \frac{1}{1+e^{-\mu} } }.<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> This means that $\mu=\logit(m)$.<br /> <br /> A random variable $\psi$ with a probit-normal distribution also takes its values in $(0,1)$.
Then, the &lt;balloon title=&quot;The probit function is the inverse cumulative distribution function (quantile function) &amp;Phi;⁻¹ associated with the standard normal distribution N(0,1).&quot; style=&quot;color:#177245&quot;&gt;probit&lt;/balloon&gt; of $\psi$ is normally distributed:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \probit(\psi) &amp;= &amp;\Phi^{-1}(\psi) \<br /> \sim \ {\cal N}( \mu, \omega^2) \\<br /> m &amp;=&amp; \Phi(\mu).<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> This means that $\mu=\probit(m)$.<br /> <br /> We can see in the figures below that the pdfs of the logit-normal and probit-normal distributions with the same $m$ and well-chosen $\omega$ are very similar.<br /> Thus, these two distributions can be used interchangeably for modeling the distribution of a parameter that takes its values in $(0,1)$.<br /> <br /> <br /> {{ImageWithCaption|image=distribution4.png|caption=Logit-normal and probit-normal distributions }}<br /> <br /> <br /> Logit and probit transformations can be generalized to any interval $(a,b)$ by setting<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \psi = a + (b-a)\tilde{\psi}, &lt;/math&gt; }}<br /> <br /> where $\tilde{\psi}$ is a random variable that takes its values in $(0,1)$ with a logit-normal (or probit-normal) distribution.<br /> <br /> Furthermore, it is easy to show that the probit-normal distribution with $m=0.5$ and $\omega=1$ is the uniform distribution on $(0,1)$.<br /> Thus, any uniform distribution can easily be derived from the probit-normal distribution.<br /> <br /> <br /> &lt;br&gt;<br /> === Extension to transformed Student's $t$-distributions ===<br /> <br /> These extensions (log-$t$, power-$t$, etc.) can be obtained simply by replacing the normal distribution of the random effects with a [http://en.wikipedia.org/wiki/Student%27s_t-distribution Student $t$-distribution].
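For instance, a log-$t$ distributed parameter can be simulated by drawing the random effect from a Student $t$-distribution instead of a normal one. This is a hypothetical numerical sketch (the median $m$, scale $\omega$ and degrees of freedom are illustrative values, and NumPy is assumed available):

```python
import numpy as np

# Hypothetical sketch of a "log-t" distribution: the normal random effect of a
# log-normal model is replaced by a scaled Student t random effect (3 d.f.),
# giving a positive parameter with heavier tails than the log-normal.
rng = np.random.default_rng(42)
m, omega, df = 10.0, 0.3, 3          # illustrative median, scale and d.f.
t = rng.standard_t(df, size=10_000)  # heavy-tailed random effects
psi = np.exp(np.log(m) + omega * t)  # log(psi) = log(m) + omega * t
```

Since the $t$-distribution is symmetric around zero, $m$ remains the median of $\psi$, while the tails of $\psi$ are heavier than those of the corresponding log-normal distribution.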
Such extensions can be useful for modeling heavy-tailed distributions.<br /> Several [http://en.wikipedia.org/wiki/Student%27s_t-distribution Student's $t$-distributions] with different degrees of freedom (d.f.) are displayed below. The [http://en.wikipedia.org/wiki/Student%27s_t-distribution Student's $t$-distribution] converges to the normal distribution as the d.f. increases, whereas heavy tails are obtained for small d.f.<br /> <br /> <br /> {{ImageWithCaption|image=student.png|caption=Standardized normal and Student's $t$ probability distribution functions }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == $\mlxtran$ for the Gaussian model== <br /> <br /> <br /> {{ExampleWithCode<br /> |title1=Example<br /> |title2=<br /> |text=<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \logit(F_i) &amp;\sim&amp; {\cal N}(\logit(F_{\rm pop}), \omega_F^2) \\<br /> \log(ka_i) &amp;\sim&amp; {\cal N}(\log(ka_{\rm pop}), \omega_{ka}^2) \\<br /> V_i &amp;\sim&amp; {\cal N}(V_{\rm pop}, \omega_V^2) \\<br /> \displaystyle{\frac{Cl_i^{\lambda_{Cl} } - 1}{\lambda_{Cl} } } &amp;\sim&amp; {\cal N}(\frac{Cl_{\rm pop}^{\lambda_{Cl} } - 1}{\lambda_{Cl} }, \omega_{Cl}^2) <br /> \end{eqnarray}&lt;/math&gt; <br /> |code= <br /> {{MLXTranForTable<br /> |name=<br /> |text=<br /> &lt;pre style=&quot;background-color: #EFEFEF; border:none&quot;&gt;<br /> [INDIVIDUAL]<br /> input={F_pop, ka_pop, V_pop, Cl_pop, lambda_Cl, <br /> omega_F, omega_ka, omega_V, omega_Cl}<br /> <br /> DEFINITION:<br /> F = {distribution=logitnormal,reference=F_pop,sd=omega_F}<br /> ka = {distribution=lognormal,reference=ka_pop,sd=omega_ka}<br /> V = {distribution=normal,reference=V_pop,sd=omega_V}<br /> Cl = {distribution=powernormal,<br /> reference=Cl_pop,power=lambda_Cl,sd=omega_Cl}<br /> &lt;/pre&gt; }}<br /> <br /> }}<br /> <br /> {{Back&amp;Next<br /> |linkBack=Modeling the individual parameters<br /> |linkNext=Model with covariates }}</div> Admin http://wiki.webpopix.org/index.php/Mod%C3%A8le:Rcode Modèle:Rcode 2013-06-07T13:05:50Z <p>Admin : </p> <hr /> <div>&lt;div class=&quot;noprint&quot; style=&quot; background-color:#EFEFEF; border: 1px solid darkgray; border-radius:1em; margin-left:5%; margin-right:15%&quot;&gt;<br /> :&lt;div style=&quot;margin-top:1em&quot;&gt;[[Image:Rstudio.png|33px|left|top]]&lt;/div&gt;<br /> &lt;div style=&quot;text-align: left; padding-left: 4em; font-family:'courier new'; font-size:14pt; font-weight:bold; color: #007FFF; margin-top:1em&quot;&gt;R &lt;span style=&quot;font-size:11pt;font-weight:normal&quot;&gt; {{{name|{{{1}}}}}}&lt;/span&gt;&lt;/div&gt;<br /> &lt;br&gt;<br /> &lt;div style=&quot;text-align: left; padding-left: 1em; font-family:'courier new';font-size:10pt;margin-bottom:1em&quot;&gt;{{{code|{{{2}}}}}}&lt;/div&gt;<br /> &lt;/div&gt;<br /> &lt;noinclude&gt;</div> Admin http://wiki.webpopix.org/index.php/Overview Overview 2013-06-07T13:04:42Z <p>Admin : </p> <hr /> <div>The desire to model a biological or physical phenomenon often arises when we are able to record some observations issued from that phenomenon.
Nothing would be more natural therefore than to begin this introduction by looking at some observed data.<br /> <br /> <br /> {{ExampleWithImage<br /> |text= This first plot displays the viral load of four patients with hepatitis C who started a treatment at time $t=0$.<br /> |image = NEWintro1.png<br /> }} <br /> <br /> <br /> {{ExampleWithImage<br /> |text=This second example involves weight data for rats measured over 14 weeks, for a sub-chronic toxicity study related to the question of genetically modified corn.<br /> |image = NEWintro2.png}}<br /> <br /> <br /> {{ExampleWithImage<br /> |text= In this third example, data are fluorescence intensities measured over time in a cellular biology experiment.<br /> |image=NEWintro3.png }}<br /> <br /> <br /> {{ExampleWithImage<br /> |text= Note that repeated measurements are not necessarily always functions of time.<br /> For example, we may be interested in corn production as a function of fertilizer quantity.<br /> |image= NEWintro4.png}}<br /> <br /> <br /> Even though these examples come from quite different domains, in each case the data is made up of repeated measurements on several individuals from a population. What we will call a &quot;population approach&quot; is therefore relevant for characterizing and modeling this data. The modeling goal is thus twofold: first, to characterize the biological or physical phenomena observed for each individual; and second, to characterize the variability seen between individuals.<br /> <br /> In the example with the rats, the model needs to integrate a growth model that describes how a rat's weight increases with time, and a statistical model that describes why these kinetics can vary from one rat to another. 
The goal is thus to end up with a &quot;typical&quot; curve for the population (in red) and to be able to explain the variability of the individual curves (in green) around this population curve.<br /> <br /> <br /> ::[[File:NEWintro5.png]]<br /> <br /> <br /> The model will explain some of this variability by individual covariates such as sex or diet (rats 1 and 3 are male while rats 2 and 4 are female), but some of the variability will remain unexplained and will be considered as random. Integrating both fixed and random effects into the same model leads naturally to the use of mixed-effects models.<br /> <br /> An alternative yet equivalent approach considers this model as a hierarchical one: each curve is described by a single model, and the variability between individual models is described by a population model. In the case of parametric models, this means that the observations for a given individual are described by a model of the observations that depends on a vector of individual parameters: this is the classic individual approach. The population approach is then a direct extension of [[The individual approach|the individual approach]]: we add a component to the model that describes the variability of the individual parameters within the population.<br /> <br /> A model can thus be seen as a [[What is a model? A joint probability distribution! | joint probability distribution]], which can easily be extended to the case where other variables in the model are considered as random variables: covariates, population parameters, the design, etc. The hierarchical structure of the model leads to a natural decomposition of the joint distribution into a product of conditional and marginal distributions.<br /> <br /> Models for [[Modeling the individual parameters |individual parameters]] and models for [[Modeling the observations | observations]] are described in the [[Introduction_%26_notation|Models]] chapter. 
In particular, models for [[Continuous data models|continuous observations]], [[Model for categorical data|categorical data]], [[Models for count data|count data]] and [[ Models for time-to-event data | survival data]] are presented and illustrated by various examples. Extensions for [[ Mixture models|mixture models]], [[Hidden Markov models|hidden Markov models]] and [[Stochastic differential equations based models| stochastic differential equation based models]] are also presented.<br /> <br /> The Tasks &amp; Tools chapter presents practical examples of using these models: [[Visualization|exploration and visualization]], [[Estimation|estimation]], [[Model evaluation#Model diagnostics|model diagnostics]], [[Model evaluation#Model selection|model selection]] and [[Simulation|simulation]]. All approaches and proposed methods are rigorously detailed in the [[Introduction and notation|Methods]] chapter.<br /> <br /> The main purpose of a model is to be used. Mathematical modeling and statistics remain useful tools for many disciplines (biology, agronomy, environmental studies, pharmacology, etc.), but it is important that these tools are used properly. The various software packages used in this wiki have been developed with this in mind: they serve the modeler well, while fully complying with a coherent mathematical formalism and using well-known and theoretically justified methods.<br /> <br /> Tools for model exploration ($\mlxplore$), modeling ($\monolix$) and simulation ($\simulix$) use the same model coding language $\mlxtran$. 
This allows us to define a complete workflow using the same model implementation, i.e., to run several different tasks based on the same model.<br /> <br /> $\mlxtran$ is extremely flexible and well-adapted to implementing complex mixed-effects models.<br /> With $\mlxtran$ we can easily write ODE-based models, implement [[Introduction_to_PK_modeling_using_MLXPlore_-_Part_I|pharmacokinetic models]] with complex administration schedules, include inter-individual variability in parameters, define statistical models for covariates, etc.<br /> Another crucial property of $\mlxtran$ is that it rigorously adopts the model representation formalism proposed in $\wikipopix$. In other words, the model implementation is fully consistent with its mathematical representation.<br /> <br /> $\mlxplore$ provides a clear graphical interface that allows us to visualize not only the structural model but also the statistical model, which is of fundamental importance in the population approach. We can visualize, for instance, the impact of covariates and of inter-individual variability of model parameters on predictions. This makes $\mlxplore$ an ideal tool for teaching, or for discovering what a [[Introduction_to_PK_modeling_using_MLXPlore_-_Part_I|pharmacokinetic model]] is, for example.<br /> <br /> The algorithms implemented in $\monolix$ ([http://en.wikipedia.org/wiki/Stochastic_approximation Stochastic Approximation] of EM, [http://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo MCMC], [http://en.wikipedia.org/wiki/Simulated_Annealing Simulated Annealing], [http://en.wikipedia.org/wiki/Importance_sampling Importance Sampling], etc.) are extremely efficient for a wide variety of complex models. Furthermore, convergence of [[The SAEM algorithm for estimating population parameters|SAEM]] and its extensions ([[Mixture models|mixture models]], [[Hidden Markov models|hidden Markov models]], [[Stochastic differential equations based models|SDE-based models]], censored data, etc.) 
has been rigorously proved and published in statistical journals.<br /> <br /> $\simulix$ is a model computation engine which enables us to simulate a $\mlxtran$ model from within various environments. $\simulix$ is now available for the Matlab and R platforms, allowing any user to combine the flexibility of R and Matlab scripts with the power of $\mlxtran$ in order to easily encode complex models and simulate data.<br /> <br /> For these reasons, $\wikipopix$ and these tools can be used with confidence for training and teaching. This is even more the case because $\mlxplore$, $\monolix$ and $\simulix$ are free for academic research and education purposes.<br /> <br /> <br /> {{Next<br /> |link=The individual approach }}</div> Admin http://wiki.webpopix.org/index.php/Estimation Estimation 2013-06-07T12:59:47Z <p>Admin : </p> <hr /> <div>== Introduction ==<br /> <br /> In the modeling context, we usually assume that we have data that includes observations $\by$, measurement times $\bt$ and possibly additional regression variables $\bx$. There may also be individual covariates $\bc$, and in pharmacological applications the dose regimen $\bu$. For clarity, in the following notation we will omit the design variables $\bt$, $\bx$ and $\bu$, and the covariates $\bc$.<br /> <br /> Here, we find ourselves in the classical framework of incomplete data models. Indeed, only $\by = (y_{ij})$ is observed in the joint model $\pypsi(\by,\bpsi;\theta)$.<br /> <br /> Estimation tasks are common ones seen in statistics:<br /> <br /> <br /> &lt;ol&gt;<br /> &lt;li&gt; Estimate the population parameter $\theta$ using the available observations and any a priori information that may be available.&lt;/li&gt;<br /> <br /> &lt;li&gt;Evaluate the precision of the proposed estimates.&lt;/li&gt;<br /> <br /> &lt;li&gt;Reconstruct the missing data, here the individual parameters $\bpsi=(\psi_i, 1\leq i \leq N)$. 
&lt;/li&gt;<br /> <br /> &lt;li&gt;Estimate the log-likelihood for a given model, i.e., for a given joint distribution $\qypsi$ and value of $\theta$.&lt;/li&gt;<br /> &lt;/ol&gt;<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Maximum likelihood estimation of the population parameters== <br /> <br /> &lt;br&gt;<br /> === Definitions ===<br /> <br /> <br /> ''Maximum likelihood estimation'' consists of maximizing with respect to $\theta$ the ''observed likelihood'' defined by:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \like(\theta ; \by) &amp;\eqdef&amp; \py(\by ; \theta) \\<br /> &amp;=&amp; \int \pypsi(\by,\bpsi ;\theta) \, d \bpsi .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> Maximum likelihood estimation of the population parameter $\theta$ requires:<br /> <br /> &lt;blockquote&gt;<br /> * A model, i.e., a joint distribution $\qypsi$. Depending on the software used, the model can be implemented using a script or a graphical user interface. $\monolix$ is extremely flexible and allows us to combine both. It is possible for instance to code the structural model using $\mlxtran$ and use the GUI for implementing the statistical model. Whatever the options selected, the complete model can always be saved as a text file. &lt;br&gt;&lt;br&gt;<br /> * Inputs $\by$, $\bc$, $\bu$ and $\bt$. All of these variables are typically stored in a single data file (see the [[Visualization#Data exploration | Data Exploration ]] Section). &lt;br&gt;&lt;br&gt;<br /> * An algorithm which allows us to maximize $\int \pypsi(\by,\bpsi ;\theta) \, d \bpsi$ with respect to $\theta$. Each software package has its own implemented algorithms. It is not our goal here to rate and compare the various algorithms and implementations. 
We will use exclusively the SAEM algorithm as described in [[The SAEM algorithm for estimating population parameters | The SAEM algorithm]] and implemented in $\monolix$ as we are entirely satisfied by both its theoretical and practical qualities: &lt;br&gt;&lt;br&gt;<br /> ** The algorithms implemented in $\monolix$ including SAEM and its extensions (mixture models, hidden Markov models, SDE-based models, censored data, etc.) have been published in statistical journals. Furthermore, convergence of SAEM has been rigorously proved.&lt;br&gt;&lt;br&gt;<br /> ** The SAEM implementation in $\monolix$ is extremely efficient for a wide variety of complex models.&lt;br&gt;&lt;br&gt;<br /> ** The SAEM implementation in $\monolix$ was done by the same group that proposed the algorithm and studied in detail its theoretical and practical properties.<br /> &lt;/blockquote&gt;<br /> <br /> <br /> {{Remarks<br /> |title=Remark<br /> |text= It is important to highlight the fact that for a parameter $\psi_i$ whose distribution is the transformation of a normal one (log-normal, logit-normal, etc.), the MLE $\hat{\psi}_{\rm pop}$ of the reference parameter $\psi_{\rm pop}$ is neither the mean nor the mode of the distribution. It is in fact the median.<br /> <br /> To show why this is the case, let $h$ be a nonlinear, twice continuously differentiable and strictly increasing function such that $h(\psi_i)$ is normally distributed.<br /> <br /> <br /> * First we show that it is not the mean. By definition, the MLE of $h(\psi_{\rm pop})$ is $h(\hat{\psi}_{\rm pop})$. Thus, the estimated distribution of $h(\psi_i)$ is the normal distribution with mean $h(\hat{\psi}_{\rm pop})$, i.e., $\esp{h(\psi_i)} = h(\hat{\psi}_{\rm pop})$. Since $h$ is nonlinear, Jensen's inequality gives $h(\esp{\psi_i}) \neq \esp{h(\psi_i)}$ in general, and therefore $\esp{\psi_i} \neq \hat{\psi}_{\rm pop}$. In other words, $\hat{\psi}_{\rm pop}$ is not the mean of the estimated distribution of $\psi_i$.<br /> <br /> <br /> * Next we show that it is not the mode. 
Let $f$ be the pdf of $\psi_i$ and let $f_h$ be the pdf of $h(\psi_i)$. By definition, for any $t\in \mathbb{R}$,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> f(t) = h^\prime(t)f_h(h(t)) . &lt;/math&gt; }}<br /> <br /> : Thus,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> f^\prime(t) = h^{\prime \prime}(t)f_h(h(t)) + h^{\prime 2}(t)f_h^\prime(h(t)) .<br /> &lt;/math&gt; }}<br /> <br /> : By definition of the mode, $f_h^\prime(h(\hat{\psi}_{\rm pop}))=0$. Since $h$ is nonlinear, in general $h^{\prime \prime}(\hat{\psi}_{\rm pop})\neq 0$, and therefore $f^\prime(\hat{\psi}_{\rm pop})\neq 0$. In other words, $\hat{\psi}_{\rm pop}$ is not the mode of the estimated distribution of $\psi_i$.<br /> <br /> <br /> * Now we show that it is the median. Since $h$ is a strictly increasing function,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \probs{\hat{\psi}_{\rm pop} }{\psi_i \leq \hat{\psi}_{\rm pop} } &amp;=&amp; \probs{\hat{\psi}_{\rm pop} }{h(\psi_i) \leq h(\hat{\psi}_{\rm pop})} \\<br /> &amp;=&amp; 0.5 .<br /> \end{eqnarray}&lt;/math&gt; }} <br /> <br /> : In other words, $\hat{\psi}_{\rm pop}$ is the median of the estimated distribution of $\psi_i$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> === Example ===<br /> <br /> Let us again look at the model used in the [[Visualization#Model exploration | Model Visualization]] Section. For the case of a single dose $D$ given at time $t=0$, the structural model is written:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> ke&amp;=&amp;Cl/V \\<br /> Cc(t) &amp;=&amp; \displaystyle{\frac{D \, ka}{V(ka-ke)} }\left(e^{-ke\,t} - e^{-ka\,t} \right) \\<br /> h(t) &amp;=&amp; h_0 \, \exp(\gamma\, Cc(t)) ,<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> where $Cc$ is the concentration in the central compartment and $h$ the hazard function for the event of interest (hemorrhaging). 
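As a quick sanity check before turning to the software implementation, the structural model above is easy to evaluate directly. The following Python sketch (the function names and any numerical values are our own, purely illustrative) computes $Cc(t)$ and $h(t)$:

```python
import math

def concentration(t, D, ka, V, Cl):
    """Cc(t) for a one-compartment model with first-order absorption,
    single dose D given at t=0 (requires ka != ke)."""
    ke = Cl / V  # elimination rate constant
    return D * ka / (V * (ka - ke)) * (math.exp(-ke * t) - math.exp(-ka * t))

def hazard(t, D, ka, V, Cl, h0, gamma):
    """h(t) = h0 * exp(gamma * Cc(t)): hazard of the event of interest."""
    return h0 * math.exp(gamma * concentration(t, D, ka, V, Cl))
```

Note that the closed-form expression requires $ka \neq ke$; the limiting case $ka = ke$ would need the usual l'Hôpital form. At $t=0$ the concentration is zero and the hazard reduces to the baseline $h_0$.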
Supposing a constant error model for the concentration, the model for the observations can easily be implemented using $\mlxtran$.<br /> <br /> <br /> {{MLXTran<br /> |name=joint1est_model.txt<br /> |text=&lt;pre style=&quot;background-color:#EFEFEF; border:none;&quot;&gt; <br /> INPUT:<br /> parameter = {ka, V, Cl, h0, gamma}<br /> <br /> EQUATION:<br /> ke=Cl/V<br /> Cc = amtDose*ka/(V*(ka-ke))*(exp(-ke*t) - exp(-ka*t))<br /> h = h0*exp(gamma*Cc)<br /> <br /> OBSERVATION:<br /> Concentration = {type=continuous, prediction=Cc, errorModel=constant}<br /> Hemorrhaging = {type=event, hazard=h}<br /> <br /> OUTPUT:<br /> output = {Concentration, Hemorrhaging}<br /> &lt;/pre&gt; }}<br /> <br /> <br /> Here, {{Verbatim|amtDose}} is a reserved keyword for the last administered dose.<br /> <br /> The model's parameters are the absorption rate constant $ka$, the volume of distribution $V$, the clearance $Cl$, the baseline hazard $h_0$ and the coefficient $\gamma$. The statistical model for the individual parameters can be defined in the $\monolix$ project file (left) and/or the $\monolix$ GUI (right):<br /> <br /> <br /> {{ExampleWithCode&amp;Image<br /> |title=<br /> |text=<br /> |code={{MLXTranForTable<br /> |name=<br /> |text=&lt;pre style=&quot;background-color:#EFEFEF; border:none;&quot;&gt; <br /> INDIVIDUAL:<br /> ka = {distribution=logNormal, iiv=yes}<br /> V = {distribution=logNormal, iiv=yes}<br /> Cl = {distribution=normal, iiv=yes}<br /> h0 = {distribution=probitNormal, iiv=yes}<br /> gamma = {distribution=logitNormal, iiv=yes}<br /> &lt;/pre&gt; }}<br /> |image=<br /> [[File:Vsaem1.png]]<br /> }}<br /> <br /> <br /> Once the model is implemented, tasks such as maximum likelihood estimation can be performed using the SAEM algorithm. Certain settings in SAEM must be provided by the user. 
Even though SAEM is quite insensitive to the initial parameter values,<br /> it is possible to perform a preliminary sensitivity analysis in order to select &quot;good&quot; initial values.<br /> <br /> <br /> {{ImageWithCaption|image=Vsaem2.png|caption=Looking for good initial values for SAEM}}<br /> <br /> <br /> <br /> Then, when we run SAEM, it converges easily and quickly to the MLE:<br /> <br /> <br /> {{JustCode<br /> |code=&lt;pre style=&quot;background-color:#EFEFEF; border:none;&quot;&gt;Estimation of the population parameters<br /> <br /> parameter<br /> ka : 0.974<br /> V : 7.07<br /> Cl : 2.00<br /> h0 : 0.0102<br /> gamma : 0.485<br /> <br /> omega_ka : 0.668<br /> omega_V : 0.365<br /> omega_Cl : 0.588<br /> omega_h0 : 0.105<br /> omega_gamma : 0.0901<br /> <br /> a_1 : 0.345<br /> &lt;/pre&gt; }}<br /> <br /> <br /> Parameter estimation can therefore be seen as estimating the reference values and the variances of the random effects.<br /> <br /> In addition to these numbers, it is important to be able to represent these distributions graphically in order to see and understand them better. Indeed, the interpretation of certain parameters is not always simple. Of course, we know what a normal distribution represents and in particular its mean, median and mode, which are equal (see the distribution of $Cl$ below for instance). These measures of central tendency can differ from one another for asymmetric distributions such as the log-normal (see the distribution of $ka$).<br /> <br /> Interpreting dispersion terms like $\omega_{ka}$ and $\omega_{V}$ is not obvious either when the parameter distributions are not normal. 
In such cases, quartiles or quantiles of order 5% and 95% (for example) may be useful for quantitatively describing the variability of these parameters.<br /> <br /> <br /> {{Remarks <br /> |title=Remarks<br /> |text=<br /> For a parameter $\psi$ whose distribution is log-normal, we can approximate the coefficient of variation of $\psi$ by the standard deviation $\omega_{\psi}$ of the random effect $\eta$ when the latter is fairly small. Indeed, when $\omega_{\psi}$ is small,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \psi &amp;=&amp; \psi_{\rm pop} e^{\eta} \\<br /> &amp;\approx &amp; \psi_{\rm pop}(1+ \eta) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> Thus<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \esp{\psi} &amp;\approx&amp; \psi_{\rm pop} \\<br /> \std{\psi} &amp;\approx &amp; \psi_{\rm pop}\omega_{\psi},<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> and<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> {\rm cv}(\psi) &amp;=&amp; \frac{\std{\psi} }{\esp{\psi} } \\<br /> &amp;\approx &amp; \omega_{\psi} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> Do not forget that this approximation is only valid when $\omega$ is small and in the case of log-normal distributions. It does not carry over to any other distribution. Thus, when $\omega_{h0}=0.1$ for a probit-normal distribution or $\omega_{\gamma}=0.09$ for a logit-normal one, there is no immediate interpretation available. 
Only by looking at the graphical display of the pdf or by calculating some quantiles of interest can we begin to get an idea of dispersion in the parameters $h0$ and $\gamma$.<br /> }}<br /> <br /> <br /> {{ImageWithCaption|image=saem3b.png|caption=Estimation of the population distributions of the individual parameters of the model }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Bayesian estimation of the population parameters==<br /> <br /> The ''Bayesian approach'' considers $\theta$ as a random vector with a ''prior distribution'' $\qth$. We can then define the posterior distribution of $\theta$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcthy(\theta {{!}} \by ) &amp;=&amp; \displaystyle{ \frac{\pth( \theta )\pcyth(\by {{!}} \theta )}{\py(\by)} }\\<br /> &amp;=&amp; \displaystyle{ \frac{\pth( \theta ) \int \pypsith(\by,\bpsi {{!}}\theta) \, d \bpsi}{\py(\by)} }.<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> We can estimate this conditional distribution and derive any statistics (posterior mean, standard deviation, percentiles, etc.) or derive the so-called ''Maximum a Posteriori'' (MAP) estimate of $\theta$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \hat{\theta}^{\rm MAP} &amp;=&amp; \argmax{\theta} \pcthy(\theta {{!}} \by ) \\<br /> &amp;=&amp; \argmax{\theta} \left\{ {\llike}(\theta ; \by) + \log( \pth( \theta ) ) \right\} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The MAP estimate therefore maximizes a penalized version of the observed likelihood. In other words, maximum a posteriori estimation reduces to penalized maximum likelihood estimation. Suppose for instance that $\theta$ is a scalar parameter and the prior is a normal distribution with mean $\theta_0$ and variance $\gamma^2$. 
Then, the MAP estimate is given by<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \hat{\theta}^{\rm MAP} =\argmax{\theta} \left\{ {\llike} (\theta ; \by) - \displaystyle{ \frac{1}{2\gamma^2} }(\theta - \theta_0)^2 \right\} .<br /> &lt;/math&gt; }}<br /> <br /> The MAP estimate is a trade-off between the MLE which maximizes ${\llike}(\theta ; \by)$ and $\theta_0$ which minimizes $(\theta - \theta_0)^2$. The weight given to the prior directly depends on the variance of the prior distribution: the smaller $\gamma^2$ is, the closer to $\theta_0$ the MAP estimate is. In the limiting case $\gamma^2=0$, the prior means that $\theta$ is fixed at $\theta_0$ and no longer needs to be estimated.<br /> <br /> Both the Bayesian and frequentist approaches have their supporters and detractors. But rather than being dogmatic and blindly following the same rule-book every time, we need to be pragmatic and ask the right methodological questions when confronted with a new problem.<br /> <br /> We have to remember that Bayesian methods have been extremely successful, in particular for numerical calculations. For instance, (Bayesian) MCMC methods allow us to estimate more or less any conditional distribution coming from any hierarchical model, whereas frequentist approaches such as maximum likelihood estimation can be much more difficult to implement.<br /> <br /> All things said, the problem comes down to knowing whether the data contains sufficient information to answer a given question, and whether some other information may be available to help answer it. This is the essence of the art of modeling: finding the right compromise between the confidence we have in the data and prior knowledge of the problem. Each problem is different and requires a specific approach. 
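This shrinkage toward $\theta_0$ can be checked numerically on a toy model (not the PK example of this wiki): for $n$ i.i.d. normal observations with known variance $\sigma^2$ and a normal prior ${\cal N}(\theta_0,\gamma^2)$ on the mean, the penalized likelihood is maximized in closed form by a precision-weighted average. A minimal Python sketch, with hypothetical names and values:

```python
def map_estimate(ybar, n, sigma2, theta0, gamma2):
    """MAP estimate of a normal mean theta from n i.i.d. observations
    with empirical mean ybar and known variance sigma2, under a
    N(theta0, gamma2) prior: maximizing
        llike(theta) - (theta - theta0)^2 / (2 * gamma2)
    yields a precision-weighted average of the MLE (ybar) and theta0."""
    w_data = n / sigma2        # precision carried by the data
    w_prior = 1.0 / gamma2     # precision carried by the prior
    return (w_data * ybar + w_prior * theta0) / (w_data + w_prior)
```

As $\gamma^2 \to \infty$ the estimate tends to the MLE $\bar{y}$, and as $\gamma^2 \to 0$ it tends to $\theta_0$, which is exactly the trade-off described above.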
For instance, if all the patients in a pharmacokinetic trial have essentially the same weight, it is pointless to estimate a relationship between weight and the model's PK parameters using the trial data. In this case, the modeler would be better served trying to use prior information based on physiological criteria rather than just a statistical model.<br /> <br /> Therefore, we can use information available to us, of course! Why not? But this information needs to be pertinent. Systematically using a prior for the parameters is not always meaningful. Can we reasonably suppose that we have access to such information? For continuous data for example, what does putting a prior on the residual error model's parameters mean in reality? A reasoned statistical approach consists of only including prior information for certain parameters (those for which we have real prior information) and having confidence in the data for the others.<br /> <br /> $\monolix$ allows this hybrid approach which reconciles the Bayesian and frequentist approaches. A given parameter can be:<br /> <br /> <br /> &lt;ul&gt;<br /> * a fixed constant if we have absolute confidence in its value or the data does not allow it to be estimated, essentially due to identifiability constraints.<br /> &lt;br&gt;<br /> <br /> * estimated by maximum likelihood, either because we have great confidence in the data or have no information on the parameter.<br /> &lt;br&gt;<br /> <br /> * estimated by introducing a prior and calculating the MAP estimate.<br /> &lt;br&gt;<br /> <br /> * estimated by introducing a prior and then estimating the posterior distribution.<br /> &lt;/ul&gt;<br /> <br /> <br /> In what follows, we put aside the fixed components of $\theta$. 
Here are some possible situations:<br /> <br /> <br /> &lt;ol&gt;<br /> &lt;li&gt; ''Combined maximum likelihood and maximum a posteriori estimation'': decompose $\theta$ into $(\theta_E,\theta_{M})$ where $\theta_E$ are the components of $\theta$ to be estimated with MLE and $\theta_{M}$ those with a prior distribution whose posterior distribution is to be maximized. Then, $(\hat{\theta}_E , \hat{\theta}_{M} )$ below maximizes the penalized likelihood of $(\theta_E,\theta_{M})$: &lt;/li&gt;<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> (\hat{\theta}_E , \hat{\theta}_{M} ) &amp;=&amp; \argmax{\theta_E , \theta_{M} } \log(\py(\by , \theta_{M}; \theta_E)) \\<br /> &amp;=&amp; \argmax{\theta_E , \theta_{M} } \left\{ {\llike}(\theta_E , \theta_{M}; \by) + \log( \pth( \theta_M ) ) \right\} ,<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> where ${\llike} (\theta_E , \theta_{M}; \by) \ \ \eqdef \ \ \log\left(\py(\by | \theta_{M}; \theta_E)\right).$<br /> <br /> <br /> &lt;li&gt; ''Combined maximum likelihood and posterior distribution estimation'': here, decompose $\theta$ into $(\theta_E,\theta_{R})$ where $\theta_E$ are the components of $\theta$ to be estimated with MLE and $\theta_{R}$ those with a prior distribution whose posterior distribution is to be estimated. We propose the following strategy for estimating $\theta_E$ and $\theta_{R}$: &lt;/li&gt;<br /> <br /> <br /> &lt;ol style=&quot;list-style-type:lower-roman&quot;&gt;<br /> &lt;li&gt; Compute the maximum likelihood of $\theta_E$: &lt;/li&gt;<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \hat{\theta}_E &amp;=&amp; \argmax{\theta_E} \log(\py(\by ; \theta_E)) \\<br /> &amp;=&amp; \argmax{\theta_E} \int \pmacro(\by , \theta_R ; \theta_E ) d \theta_R .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> &lt;li&gt; Estimate the conditional distribution $\pmacro(\theta_{R} | \by ;\hat{\theta}_E)$. 
&lt;/li&gt;<br /> &lt;/ol&gt;<br /> <br /> <br /> It is then straightforward to extend this approach to more complex situations where some components of $\theta$ are estimated with MLE, others using MAP estimation and others still by estimating their conditional distributions.<br /> &lt;/ol&gt;<br /> <br /> <br /> {{Example1<br /> |title1=Example<br /> |title2=A PK example<br /> |text=<br /> In this example we use only the pharmacokinetic data and aim to estimate the population parameter distributions of the PK parameters $ka$, $V$ and $Cl$. We assume log-normal distributions for these three parameters. All of the model's population parameters are estimated by maximum likelihood estimation except $ka_{\rm pop}$ for which a log-normal distribution is used as a prior:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \log(ka_{\rm pop}) \sim {\cal N}(\log(1.5), \gamma^2) . &lt;/math&gt; }}<br /> <br /> $\monolix$ allows us to compute the MAP estimate and to estimate the posterior distribution of $ka_{\rm pop}$ for various values of $\gamma$.<br /> <br /> <br /> &lt;div style=&quot;margin-left:17%; margin-right:17%; align:center&quot;&gt;<br /> {{{!}} class=&quot;wikitable&quot; align=&quot;center&quot; style=&quot;width:100%&quot;<br /> {{!}} $\gamma$ {{!}}{{!}} 0 {{!}}{{!}} 0.01 {{!}}{{!}} 0.025 {{!}}{{!}} 0.05 {{!}}{{!}} 0.1 {{!}}{{!}} 0.2 {{!}}{{!}} $+ \infty$ <br /> {{!}}-<br /> {{!}}$\hat{ka}_{\rm pop}^{\rm MAP}$ {{!}}{{!}} 1.5 {{!}}{{!}} 1.49 {{!}}{{!}} 1.47 {{!}}{{!}} 1.39 {{!}}{{!}} 1.22 {{!}}{{!}} 1.11 {{!}}{{!}} 1.05 <br /> {{!}}}&lt;/div&gt;<br /> <br /> {{ImageWithCaption|image=bayes1.png|caption=Prior and posterior distributions of $ka_{\rm pop}$ for different values of $\gamma$}}<br /> <br /> <br /> As expected, the posterior distribution converges to the prior distribution when the standard deviation $\gamma$ of the prior distribution decreases. 
Also, the mode of the posterior distribution converges to the maximum likelihood estimate of $ka_{\rm pop}$ when $\gamma$ increases.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> == Estimation of the Fisher information matrix ==<br /> <br /> The variance of the estimator $\thmle$ and thus confidence intervals can be derived from the [[Estimation of the observed Fisher information matrix|observed Fisher information matrix (F.I.M.)]], which itself is calculated using the observed likelihood (i.e., the pdf of the observations $\by$):<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ofim_intro3&quot;&gt;&lt;math&gt;<br /> \ofim(\thmle ; \by) \ \ \eqdef \ \ - \displaystyle{ \frac{\partial^2}{\partial \theta^2} }\log({\like}(\thmle ; \by)) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> Then, the variance-covariance matrix of the maximum likelihood estimator $\thmle$ can be estimated by the inverse of the observed F.I.M. Standard errors (s.e.) for each component of $\thmle$ are their standard deviations, i.e., the square-root of the diagonal elements of this covariance matrix. $\monolix$ also displays the (estimated) relative standard errors (r.s.e.), i.e., the (estimated) standard error divided by the value of the estimated parameter.<br /> <br /> <br /> {{JustCode<br /> |code=&lt;pre style=&quot;background-color:#EFEFEF; border:none;&quot;&gt;Estimation of the population parameters<br /> <br /> parameter s.e. (s.a.) r.s.e.(%)<br /> ka : 0.974 0.082 8<br /> V : 7.07 0.35 5<br /> Cl : 2 0.07 4<br /> h0 : 0.0102 0.0014 14<br /> gamma : 0.485 0.015 3<br /> <br /> omega_ka : 0.668 0.064 10<br /> omega_V : 0.365 0.037 10<br /> omega_Cl : 0.588 0.055 9<br /> omega_h0 : 0.105 0.032 30<br /> omega_gamma : 0.0901 0.044 49<br /> <br /> a_1 : 0.345 0.012 3<br /> &lt;/pre&gt; }}<br /> <br /> The F.I.M. can be used for detecting overparametrization of the structural model. 
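The passage from the F.I.M. to the s.e. and r.s.e. columns above is mechanical: invert the matrix, take the square roots of its diagonal, and divide by the estimates. A minimal Python sketch for a $2\times 2$ matrix (all numbers used with it are illustrative, not taken from the outputs shown here):

```python
import math

def se_rse_from_fim(fim, estimates):
    """Given a 2x2 observed F.I.M. (nested lists) and the parameter
    estimates, return (standard errors, relative standard errors in %):
    s.e. = sqrt(diag(FIM^-1)), r.s.e. = 100 * s.e. / |estimate|."""
    (a, b), (c, d) = fim
    det = a * d - b * c                 # near-zero det <=> poorly conditioned
    inv_diag = (d / det, a / det)       # diagonal of the 2x2 inverse
    se = [math.sqrt(v) for v in inv_diag]
    rse = [100.0 * s / abs(est) for s, est in zip(se, estimates)]
    return se, rse
```

With real models the F.I.M. is larger and is inverted numerically, but the definitions of the s.e. and r.s.e. are exactly these.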
In effect, if the model is poorly identifiable, certain estimators will be quite correlated and the F.I.M. will therefore be poorly conditioned and difficult to invert. Suppose for example that we want to fit a two-compartment PK model to the same data as before. The output is shown below. The large values for the relative standard errors for the inter-compartmental clearance $Q$ and the volume of the peripheral compartment $V_2$ mean that the data does not allow these two parameters to be estimated well.<br /> <br /> <br /> {{JustCode<br /> |code=&lt;pre style=&quot;background-color:#EFEFEF; border:none;&quot;&gt;Estimation of the population parameters<br /> <br /> parameter s.e. (lin) r.s.e.(%)<br /> ka : 0.246 0.0081 3<br /> Cl : 1.9 0.075 4<br /> V1 : 1.71 0.14 8<br /> Q : 0.000171 0.024 1.43e+04<br /> V2 : 0.00673 3.1 4.62e+04<br /> <br /> omega_ka : 0.171 0.026 15<br /> omega_Cl : 0.293 0.026 9<br /> omega_V1 : 0.621 0.062 10<br /> omega_Q : 5.72 1.4e+03 2.41e+04<br /> omega_V2 : 4.61 1.8e+04 3.94e+05<br /> <br /> a : 0.136 0.0073 5<br /> &lt;/pre&gt; }}<br /> <br /> <br /> The Fisher information criterion is also widely used in optimal experimental design. Indeed, minimizing the variance of the estimator corresponds to maximizing the information. Estimators and designs can then be evaluated by looking at certain summary statistics of the covariance matrix (like the determinant or trace for instance).<br /> <br /> &lt;br&gt;<br /> == Estimation of the individual parameters ==<br /> <br /> Once $\theta$ has been estimated, the conditional distribution $\pmacro(\psi_i | y_i ; \hat{\theta})$ of the individual parameters $\psi_i$ can be estimated for each individual $i$ using the [[The Metropolis-Hastings algorithm for simulating the individual parameters| Metropolis-Hastings algorithm]].
For each $i$, this algorithm generates a sequence $(\psi_i^{k}, k \geq 1)$ which converges in distribution to the conditional distribution $\pmacro(\psi_i | y_i ; \hat{\theta})$ and that can be used for estimating any summary statistic of this distribution (mean, standard deviation, quantiles, etc.).<br /> <br /> The mode of this conditional distribution can be estimated using this sequence or by maximizing $\pmacro(\psi_i | y_i ; \hat{\theta})$ using numerical methods.<br /> <br /> The choice of using the conditional mean or the conditional mode is arbitrary. By default, $\monolix$ uses the conditional mode, taking the philosophy that the &quot;most likely&quot; values of the individual parameters are the most suited for computing the &quot;most likely&quot; predictions.<br /> <br /> <br /> {{ImageWithCaption|image=mode1.png|caption=Predicted concentrations for 6 individuals using the estimated conditional modes of the individual PK parameters}} <br /> <br /> &lt;br&gt;<br /> <br /> == Estimation of the observed log-likelihood ==<br /> <br /> <br /> Once $\theta$ has been estimated, the observed log-likelihood of $\hat{\theta}$ is defined as<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \begin{eqnarray}<br /> {\llike} (\hat{\theta};\by) &amp;=&amp; \log({\like}(\hat{\theta};\by)) \\<br /> &amp;\eqdef&amp; \log(\py(\by;\hat{\theta})) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The observed log-likelihood cannot be computed in closed form for nonlinear mixed effects models, but can be estimated using the methods described in the [[Estimation of the log-likelihood]] Section. 
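To make the idea concrete, here is a toy Monte Carlo sketch (not the importance-sampling method of that Section): for a single "individual" with $y \sim {\cal N}(\psi, \sigma^2)$ and $\psi \sim {\cal N}(\mu, \omega^2)$, the marginal likelihood $\int \pmacro(y|\psi)\pmacro(\psi)\,d\psi$ can be estimated by averaging $\pmacro(y|\psi)$ over draws of $\psi$, and checked against the closed form, since here $y \sim {\cal N}(\mu, \sigma^2+\omega^2)$ after integrating $\psi$ out.

```python
import math
import random

def mc_loglikelihood(y, mu, omega, sigma, M=100000, seed=0):
    """Monte Carlo estimate of log p(y; theta) = log E_psi[ p(y | psi) ],
    sampling the latent individual parameter psi from N(mu, omega^2)."""
    rng = random.Random(seed)
    norm = sigma * math.sqrt(2 * math.pi)
    total = 0.0
    for _ in range(M):
        psi = rng.gauss(mu, omega)
        total += math.exp(-0.5 * ((y - psi) / sigma) ** 2) / norm
    return math.log(total / M)

def exact_loglikelihood(y, mu, omega, sigma):
    """Closed form for this toy model: y ~ N(mu, sigma^2 + omega^2)."""
    s2 = sigma ** 2 + omega ** 2
    return -0.5 * ((y - mu) ** 2 / s2 + math.log(2 * math.pi * s2))
```

With $y=1$, $\mu=0$ and $\omega=\sigma=1$, the two values agree to two decimal places for a moderate number of draws. In a real nonlinear mixed effects model the integral has no closed form, which is precisely why such simulation-based estimates are needed.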
The estimated log-likelihood can then be used for performing likelihood ratio tests and for computing information criteria such as AIC and BIC (see the [[Evaluation]] Section).<br /> <br /> <br /> &lt;br&gt;<br /> == Bibliography ==<br /> <br /> &lt;bibtex&gt;<br /> @article{Monolix,<br /> author = {Lixoft},<br /> title = {Monolix 4.2},<br /> year={2012},<br /> journal = {http://www.lixoft.eu/products/monolix/product-monolix-overview},<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> &lt;bibtex&gt;<br /> @article{comets2011package,<br /> title={saemix: Stochastic Approximation Expectation Maximization (SAEM) algorithm. R package version 0.96.1},<br /> author={Comets, E. and Lavenu, A. and Lavielle, M.},<br /> journal = {http://cran.r-project.org/web/packages/saemix/index.html},<br /> year={2013}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> &lt;bibtex&gt;<br /> @article{nlmefitsa,<br /> title={nlmefitsa: fit nonlinear mixed-effects model with stochastic EM algorithm. Matlab R2013a function},<br /> author={The MathWorks},<br /> journal = {http://www.mathworks.fr/fr/help/stats/nlmefitsa.html},<br /> year={2013}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> &lt;bibtex&gt;<br /> @article{beal1992nonmem,<br /> title={NONMEM users guides},<br /> author={Beal, S.L. and Sheiner, L.B. and Boeckmann, A. and Bauer, R.J.},<br /> journal={San Francisco, NONMEM Project Group, University of California},<br /> year={1992}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> &lt;bibtex&gt;<br /> @book{pinheiro2000mixed,<br /> title={Mixed effects models in S and S-PLUS},<br /> author={Pinheiro, J.C. and Bates, D.M.},<br /> year={2000},<br /> publisher={Springer Verlag}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> &lt;bibtex&gt;<br /> @article{pinheiro2010r,<br /> title={nlme: Linear and Nonlinear Mixed Effects Models. R package version 3.1-96},<br /> author={Pinheiro, J. and Bates, D. and DebRoy, S.
and Sarkar, D.},<br /> journal={R Foundation for Statistical Computing, Vienna},<br /> year={2010}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> &lt;bibtex&gt;<br /> @article{spiegelhalter2003winbugs,<br /> title={WinBUGS user manual},<br /> author={Spiegelhalter, D. and Thomas, A. and Best, N. and Lunn, D.},<br /> journal={Cambridge: MRC Biostatistics Unit},<br /> year={2003}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> &lt;bibtex&gt;<br /> @Manual{docSPSS,<br /> title = {Linear mixed-effects modeling in SPSS. An introduction to the MIXED procedure},<br /> author = {SPSS},<br /> year = {2002},<br /> note={Technical Report}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> &lt;bibtex&gt;<br /> @Manual{docSAS,<br /> title = {The NLMMIXED procedure, SAS/STAT 9.2 User's Guide},<br /> chapter = {61},<br /> pages = {4337--4435},<br /> author = {SAS},<br /> year = {2008}<br /> }<br /> &lt;/bibtex&gt;<br /> <br /> <br /> {{Back&amp;Next<br /> |linkBack=Visualization<br /> |linkNext=Model evaluation }}</div> Admin http://wiki.webpopix.org/index.php/Estimation Estimation 2013-06-07T12:58:39Z <p>Admin : /* Estimating the observed log-likelihood */</p> <hr /> <div>== Introduction ==<br /> <br /> In the modeling context, we usually assume that we have data that includes observations $\by$, measurement times $\bt$ and possibly additional regression variables $\bx$. There may also be individual covariates $\bc$, and in pharmacological applications the dose regimen $\bu$. For clarity, in the following notation we will omit the design variables $\bt$, $\bx$ and $\bu$, and the covariates $\bc$.<br /> <br /> Here, we find ourselves in the classical framework of incomplete data models. 
Indeed, only $\by = (y_{ij})$ is observed in the joint model $\pypsi(\by,\bpsi;\theta)$.<br /> <br /> The estimation tasks involved are classical ones in statistics:<br /> <br /> <br /> &lt;ol&gt;<br /> &lt;li&gt; Estimate the population parameter $\theta$ using the available observations and any a priori information that may be available.&lt;/li&gt;<br /> <br /> &lt;li&gt;Evaluate the precision of the proposed estimates.&lt;/li&gt;<br /> <br /> &lt;li&gt;Reconstruct missing data, here being the individual parameters $\bpsi=(\psi_i, 1\leq i \leq N)$. &lt;/li&gt;<br /> <br /> &lt;li&gt;Estimate the log-likelihood for a given model, i.e., for a given joint distribution $\qypsi$ and value of $\theta$.&lt;/li&gt;<br /> &lt;/ol&gt;<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Maximum likelihood estimation of the population parameters == <br /> <br /> &lt;br&gt;<br /> === Definitions ===<br /> <br /> <br /> ''Maximum likelihood estimation'' consists of maximizing with respect to $\theta$ the ''observed likelihood'' defined by:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \like(\theta ; \by) &amp;\eqdef&amp; \py(\by ; \theta) \\<br /> &amp;=&amp; \int \pypsi(\by,\bpsi ;\theta) \, d \bpsi .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> Maximum likelihood estimation of the population parameter $\theta$ requires:<br /> <br /> &lt;blockquote&gt;<br /> * A model, i.e., a joint distribution $\qypsi$. Depending on the software used, the model can be implemented using a script or a graphical user interface. $\monolix$ is extremely flexible and allows us to combine both. It is possible for instance to code the structural model using $\mlxtran$ and use the GUI for implementing the statistical model. Whatever the options selected, the complete model can always be saved as a text file. &lt;br&gt;&lt;br&gt;<br /> * Inputs $\by$, $\bc$, $\bu$ and $\bt$.
All of these variables are typically stored in a single data file (see the [[Visualization#Data exploration | Data Exploration ]] Section). &lt;br&gt;&lt;br&gt;<br /> * An algorithm which allows us to maximize $\int \pypsi(\by,\bpsi ;\theta) \, d \bpsi$ with respect to $\theta$. Each software package has its own algorithms implemented. It is not our goal here to rate and compare the various algorithms and implementations. We will use exclusively the SAEM algorithm as described in [[The SAEM algorithm for estimating population parameters | The SAEM algorithm]] and implemented in $\monolix$ as we are entirely satisfied by both its theoretical and practical qualities: &lt;br&gt;&lt;br&gt;<br /> ** The algorithms implemented in $\monolix$ including SAEM and its extensions (mixture models, hidden Markov models, SDE-based models, censored data, etc.) have been published in statistical journals. Furthermore, convergence of SAEM has been rigorously proved.&lt;br&gt;&lt;br&gt;<br /> ** The SAEM implementation in $\monolix$ is extremely efficient for a wide variety of complex models.&lt;br&gt;&lt;br&gt;<br /> ** The SAEM implementation in $\monolix$ was done by the same group that proposed the algorithm and studied in detail its theoretical and practical properties.<br /> &lt;/blockquote&gt;<br /> <br /> <br /> {{Remarks<br /> |title=Remark<br /> |text= It is important to highlight the fact that for a parameter $\psi_i$ whose distribution is a transformation of a normal one (log-normal, logit-normal, etc.) the MLE $\hat{\psi}_{\rm pop}$ of the reference parameter $\psi_{\rm pop}$ is neither the mean nor the mode of the distribution. It is in fact the median.<br /> <br /> To show why this is the case, let $h$ be a nonlinear, twice continuously differentiable and strictly increasing function such that $h(\psi_i)$ is normally distributed.<br /> <br /> <br /> * First we show that it is not the mean. By definition, the MLE of $h(\psi_{\rm pop})$ is $h(\hat{\psi}_{\rm pop})$.
Thus, the estimated distribution of $h(\psi_i)$ is the normal distribution with mean $h(\hat{\psi}_{\rm pop})$, so that $\esp{h(\psi_i)} = h(\hat{\psi}_{\rm pop})$. Since $h$ is nonlinear, $h(\esp{\psi_i}) \neq \esp{h(\psi_i)}$ in general, and therefore $\esp{\psi_i} \neq \hat{\psi}_{\rm pop}$. In other words, $\hat{\psi}_{\rm pop}$ is not the mean of the estimated distribution of $\psi_i$.<br /> <br /> <br /> * Next we show that it is not the mode. Let $f$ be the pdf of $\psi_i$ and let $f_h$ be the pdf of $h(\psi_i)$. By the change-of-variables formula, for any $t \in \mathbb{R}$,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> f(t) = h^\prime(t)f_h(h(t)) . &lt;/math&gt; }}<br /> <br /> : Thus,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> f^\prime(t) = h^{\prime \prime}(t)f_h(h(t)) + h^{\prime 2}(t)f_h^\prime(h(t)) .<br /> &lt;/math&gt; }}<br /> <br /> : By definition of the mode, $f_h^\prime(h(\hat{\psi}_{\rm pop}))=0$. Since $h$ is nonlinear, $h^{\prime \prime}(\hat{\psi}_{\rm pop})\neq 0$ a.s. and $f^\prime(\hat{\psi}_{\rm pop})\neq 0$ a.s. In other words, $\hat{\psi}_{\rm pop}$ is not the mode of the estimated distribution of $\psi_i$.<br /> <br /> <br /> * Now we show that it is the median. Since $h$ is a strictly increasing function,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \probs{\hat{\psi}_{\rm pop} }{\psi_i \leq \hat{\psi}_{\rm pop} } &amp;=&amp; \probs{\hat{\psi}_{\rm pop} }{h(\psi_i) \leq h(\hat{\psi}_{\rm pop})} \\<br /> &amp;=&amp; 0.5 .<br /> \end{eqnarray}&lt;/math&gt; }} <br /> <br /> : In other words, $\hat{\psi}_{\rm pop}$ is the median of the estimated distribution of $\psi_i$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> === Example ===<br /> <br /> Let us again look at the model used in the [[Visualization#Model exploration | Model Visualization]] Section.
For the case of a unique dose $D$ given at time $t=0$, the structural model is written:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> ke&amp;=&amp;Cl/V \\<br /> Cc(t) &amp;=&amp; \displaystyle{\frac{D \, ka}{V(ka-ke)} }\left(e^{-ke\,t} - e^{-ka\,t} \right) \\<br /> h(t) &amp;=&amp; h_0 \, \exp(\gamma\, Cc(t)) ,<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> where $Cc$ is the concentration in the central compartment and $h$ the hazard function for the event of interest (hemorrhaging). Supposing a constant error model for the concentration, the model for the observations can be easily implemented using $\mlxtran$.<br /> <br /> <br /> {{MLXTran<br /> |name=joint1est_model.txt<br /> |text=&lt;pre style=&quot;background-color:#EFEFEF; border:none;&quot;&gt; <br /> INPUT:<br /> parameter = {ka, V, Cl, h0, gamma}<br /> <br /> EQUATION:<br /> ke=Cl/V<br /> Cc = amtDose*ka/(V*(ka-ke))*(exp(-ke*t) - exp(-ka*t))<br /> h = h0*exp(gamma*Cc)<br /> <br /> OBSERVATION:<br /> Concentration = {type=continuous, prediction=Cc, errorModel=constant}<br /> Hemorrhaging = {type=event, hazard=h}<br /> <br /> OUTPUT:<br /> output = {Concentration, Hemorrhaging}<br /> &lt;/pre&gt; }}<br /> <br /> <br /> Here, {{Verbatim|amtDose}} is a reserved keyword for the last administered dose.<br /> <br /> The model's parameters are the absorption rate constant $ka$, the volume of distribution $V$, the clearance $Cl$, the baseline hazard $h_0$ and the coefficient $\gamma$. 
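For intuition, this structural model is also straightforward to evaluate outside $\mlxtran$; here is a minimal Python sketch. The formula assumes $ka \neq ke$, and any dose and parameter values passed to it are for illustration only.

```python
import math

def cc(t, D, ka, V, Cl):
    """Concentration in the central compartment after a single dose D at t=0
    (one-compartment model with first-order absorption; requires ka != Cl/V)."""
    ke = Cl / V  # elimination rate constant
    return D * ka / (V * (ka - ke)) * (math.exp(-ke * t) - math.exp(-ka * t))

def hazard(t, D, ka, V, Cl, h0, gamma):
    """Hazard of the event of interest (hemorrhaging), driven by the
    concentration: the baseline hazard h0 is scaled by exp(gamma * Cc(t))."""
    return h0 * math.exp(gamma * cc(t, D, ka, V, Cl))
```

At $t=0$ the concentration is zero and the hazard equals $h_0$; the concentration then rises, peaks and decays back towards zero, and the hazard follows it.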
The statistical model for the individual parameters can be defined in the $\monolix$ project file (left) and/or the $\monolix$ GUI (right):<br /> <br /> <br /> {{ExampleWithCode&amp;Image<br /> |title=<br /> |text=<br /> |code={{MLXTranForTable<br /> |name=<br /> |text=&lt;pre style=&quot;background-color:#EFEFEF; border:none;&quot;&gt; <br /> INDIVIDUAL:<br /> ka = {distribution=logNormal, iiv=yes}<br /> V = {distribution=logNormal, iiv=yes}<br /> Cl = {distribution=normal, iiv=yes}<br /> h0 = {distribution=probitNormal, iiv=yes}<br /> gamma = {distribution=logitNormal, iiv=yes}<br /> &lt;/pre&gt; }}<br /> |image=<br /> [[File:Vsaem1.png]]<br /> }}<br /> <br /> <br /> Once the model is implemented, tasks such as maximum likelihood estimation can be performed using the SAEM algorithm. Certain settings in SAEM must be provided by the user. Even though SAEM is quite insensitive to the initial parameter values,<br /> it is possible to perform a preliminary sensitivity analysis in order to select &quot;good&quot; initial values.<br /> <br /> <br /> {{ImageWithCaption|image=Vsaem2.png|caption=Looking for good initial values for SAEM}}<br /> <br /> <br /> <br /> Then, when we run SAEM, it converges easily and quickly to the MLE:<br /> <br /> <br /> {{JustCode<br /> |code=&lt;pre style=&quot;background-color:#EFEFEF; border:none;&quot;&gt;Estimation of the population parameters<br /> <br /> parameter<br /> ka : 0.974<br /> V : 7.07<br /> Cl : 2.00<br /> h0 : 0.0102<br /> gamma : 0.485<br /> <br /> omega_ka : 0.668<br /> omega_V : 0.365<br /> omega_Cl : 0.588<br /> omega_h0 : 0.105<br /> omega_gamma : 0.0901<br /> <br /> a_1 : 0.345<br /> &lt;/pre&gt; }}<br /> <br /> <br /> Parameter estimation can therefore be seen as estimating the reference values and the variances of the random effects.<br /> <br /> In addition to these numbers, it is important to be able to graphically represent these distributions in order to see them and therefore understand them better.
In effect, the interpretation of certain parameters is not always simple. Of course, we know what a normal distribution represents and in particular its mean, median and mode, which are equal (see the distribution of $Cl$ below for instance). These measures of central tendency differ from one another for asymmetric distributions such as the log-normal (see the distribution of $ka$).<br /> <br /> Interpreting dispersion terms like $\omega_{ka}$ and $\omega_{V}$ is not obvious either when the parameter distributions are not normal. In such cases, quartiles or quantiles of order 5% and 95% (for example) may be useful for quantitatively describing the variability of these parameters.<br /> <br /> <br /> {{Remarks <br /> |title=Remarks<br /> |text=<br /> For a parameter $\psi$ whose distribution is log-normal, we can approximate the coefficient of variation for $\psi$ by the standard deviation $\omega_{\psi}$ of the random effect $\eta$ if this is fairly small. In effect, when $\omega_{\psi}$ is small,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \psi &amp;=&amp; \psi_{\rm pop} e^{\eta} \\<br /> &amp;\approx &amp; \psi_{\rm pop}(1+ \eta) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> Thus<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \esp{\psi} &amp;\approx&amp; \psi_{\rm pop} \\<br /> \std{\psi} &amp;\approx &amp; \psi_{\rm pop}\omega_{\psi},<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> and<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> {\rm cv}(\psi) &amp;=&amp; \frac{\std{\psi} }{\esp{\psi} } \\<br /> &amp;\approx &amp; \omega_{\psi} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> Do not forget that this approximation is only valid when $\omega$ is small and in the case of log-normal distributions. It does not carry over to any other distribution.
Thus, when $\omega_{h0}=0.1$ for a probit-normal distribution or $\omega_{\gamma}=0.09$ for a logit-normal one, there is no immediate interpretation available. Only by looking at the graphical display of the pdf or by calculating some quantiles of interest can we begin to get an idea of dispersion in the parameters $h0$ and $\gamma$.<br /> }}<br /> <br /> <br /> {{ImageWithCaption|image=saem3b.png|caption=Estimation of the population distributions of the individual parameters of the model }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Bayesian estimation==<br /> <br /> The ''Bayesian approach'' considers $\theta$ as a random vector with a ''prior distribution'' $\qth$. We can then define the posterior distribution of $\theta$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcthy(\theta {{!}} \by ) &amp;=&amp; \displaystyle{ \frac{\pth( \theta )\pcyth(\by {{!}} \theta )}{\py(\by)} }\\<br /> &amp;=&amp; \displaystyle{ \frac{\pth( \theta ) \int \pypsith(\by,\bpsi {{!}}\theta) \, d \bpsi}{\py(\by)} }.<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> We can estimate this conditional distribution and derive any statistics (posterior mean, standard deviation, percentiles, etc.) or derive the so-called ''Maximum a Posteriori'' (MAP) estimate of $\theta$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \hat{\theta}^{\rm MAP} &amp;=&amp; \argmax{\theta} \pcthy(\theta {{!}} \by ) \\<br /> &amp;=&amp; \argmax{\theta} \left\{ {\llike}(\theta ; \by) + \log( \pth( \theta ) ) \right\} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The MAP estimate therefore maximizes a penalized version of the observed likelihood. In other words, maximum a posteriori estimation reduces to penalized maximum likelihood estimation. Suppose for instance that $\theta$ is a scalar parameter and the prior is a normal distribution with mean $\theta_0$ and variance $\gamma^2$. 
Then, the MAP estimate is given by<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \hat{\theta}^{\rm MAP} =\argmax{\theta} \left\{ {\llike} (\theta ; \by) - \displaystyle{ \frac{1}{2\gamma^2} }(\theta - \theta_0)^2 \right\} .<br /> &lt;/math&gt; }}<br /> <br /> The MAP estimate is a trade-off between the MLE which maximizes ${\llike}(\theta ; \by)$ and $\theta_0$ which minimizes $(\theta - \theta_0)^2$. The weight given to the prior directly depends on the variance of the prior distribution: the smaller $\gamma^2$ is, the closer to $\theta_0$ the MAP is. In the limiting case $\gamma^2=0$, the prior amounts to fixing $\theta$ at $\theta_0$, which then no longer needs to be estimated.<br /> <br /> Both the Bayesian and frequentist approaches have their supporters and detractors. But rather than being dogmatic and blindly following the same rule-book every time, we need to be pragmatic and ask the right methodological questions when confronted with a new problem.<br /> <br /> We have to remember that Bayesian methods have been extremely successful, in particular for numerical calculations. For instance, (Bayesian) MCMC methods allow us to estimate more or less any conditional distribution coming from any hierarchical model, whereas frequentist approaches such as maximum likelihood estimation can be much more difficult to implement.<br /> <br /> All things said, the problem comes down to knowing whether the data contains sufficient information to answer a given question, and whether some other information may be available to help answer it. This is the essence of the art of modeling: finding the right compromise between the confidence we have in the data and prior knowledge of the problem. Each problem is different and requires a specific approach.
For instance, if all the patients in a pharmacokinetic trial have essentially the same weight, it is pointless to estimate a relationship between weight and the model's PK parameters using the trial data. In this case, the modeler would be better served trying to use prior information based on physiological criteria rather than just a statistical model.<br /> <br /> Therefore, we can use information available to us, of course! Why not? But this information needs to be pertinent. Systematically using a prior for the parameters is not always meaningful. Can we reasonably suppose that we have access to such information? For continuous data for example, what does putting a prior on the residual error model's parameters mean in reality? A reasoned statistical approach consists of only including prior information for certain parameters (those for which we have real prior information) and having confidence in the data for the others.<br /> <br /> $\monolix$ allows this hybrid approach which reconciles the Bayesian and frequentist approaches. A given parameter can be:<br /> <br /> <br /> &lt;ul&gt;<br /> * a fixed constant if we have absolute confidence in its value or the data does not allow it to be estimated, essentially due to identifiability constraints.<br /> &lt;br&gt;<br /> <br /> * estimated by maximum likelihood, either because we have great confidence in the data or have no information on the parameter.<br /> &lt;br&gt;<br /> <br /> * estimated by introducing a prior and calculating the MAP estimate.<br /> &lt;br&gt;<br /> <br /> * estimated by introducing a prior and then estimating the posterior distribution.<br /> &lt;/ul&gt;<br /> <br /> <br /> We put aside dealing with the fixed components of $\theta$ in the following.
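The shrinkage behaviour of the MAP estimate described earlier can be made explicit in a toy setting: a sketch assuming, purely for illustration, $n$ observations $y_i \sim {\cal N}(\theta, \sigma^2)$ with prior $\theta \sim {\cal N}(\theta_0, \gamma^2)$, so that the penalized log-likelihood is quadratic and the MAP estimate has a closed form.

```python
def map_estimate(ybar, n, sigma2, theta0, gamma2):
    """Closed-form MAP for y_1..y_n ~ N(theta, sigma2) with prior N(theta0, gamma2).
    Maximizing llike(theta) - (theta - theta0)**2 / (2 * gamma2) yields a
    precision-weighted average of the MLE (ybar) and the prior mean theta0."""
    w_data = n / sigma2        # precision contributed by the data
    w_prior = 1.0 / gamma2     # precision of the prior
    return (w_data * ybar + w_prior * theta0) / (w_data + w_prior)
```

As $\gamma^2 \to 0$ the estimate is pinned at $\theta_0$; as $\gamma^2 \to \infty$ it tends to the MLE $\bar{y}$, mirroring the trade-off described above.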
Here are some possible situations:<br /> <br /> <br /> &lt;ol&gt;<br /> &lt;li&gt; ''Combined maximum likelihood and maximum a posteriori estimation'': decompose $\theta$ into $(\theta_E,\theta_{M})$ where $\theta_E$ are the components of $\theta$ to be estimated with MLE and $\theta_{M}$ those with a prior distribution whose posterior distribution is to be maximized. Then, $(\hat{\theta}_E , \hat{\theta}_{M} )$ defined below maximizes the penalized likelihood of $(\theta_E,\theta_{M})$: &lt;/li&gt;<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> (\hat{\theta}_E , \hat{\theta}_{M} ) &amp;=&amp; \argmax{\theta_E , \theta_{M} } \log(\py(\by , \theta_{M}; \theta_E)) \\<br /> &amp;=&amp; \argmax{\theta_E , \theta_{M} } \left\{ {\llike}(\theta_E , \theta_{M}; \by) + \log( \pth( \theta_M ) ) \right\} ,<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> where ${\llike} (\theta_E , \theta_{M}; \by) \ \ \eqdef \ \ \log\left(\py(\by | \theta_{M}; \theta_E)\right).$<br /> <br /> <br /> &lt;li&gt; ''Combined maximum likelihood and posterior distribution estimation'': here, decompose $\theta$ into $(\theta_E,\theta_{R})$ where $\theta_E$ are the components of $\theta$ to be estimated with MLE and $\theta_{R}$ those with a prior distribution whose posterior distribution is to be estimated. We propose the following strategy for estimating $\theta_E$ and $\theta_{R}$: &lt;/li&gt;<br /> <br /> <br /> &lt;ol style=&quot;list-style-type:lower-roman&quot;&gt;<br /> &lt;li&gt; Compute the maximum likelihood estimate of $\theta_E$: &lt;/li&gt;<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \hat{\theta}_E &amp;=&amp; \argmax{\theta_E} \log(\py(\by ; \theta_E)) \\<br /> &amp;=&amp; \argmax{\theta_E} \int \pmacro(\by , \theta_R ; \theta_E ) d \theta_R .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> &lt;li&gt; Estimate the conditional distribution $\pmacro(\theta_{R} | \by ;\hat{\theta}_E)$.
&lt;/li&gt;<br /> &lt;/ol&gt;<br /> <br /> <br /> It is then straightforward to extend this approach to more complex situations where some components of $\theta$ are estimated with MLE, others using MAP estimation and others still by estimating their conditional distributions.<br /> &lt;/ol&gt;<br /> <br /> <br /> {{Example1<br /> |title1=Example<br /> |title2=A PK example<br /> |text=<br /> In this example we use only the pharmacokinetic data and aim to estimate the population parameter distributions of the PK parameters $ka$, $V$ and $Cl$. We assume log-normal distributions for these three parameters. All of the model's population parameters are estimated by maximum likelihood estimation except $ka_{\rm pop}$ for which a log-normal distribution is used as a prior:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \log(ka_{\rm pop}) \sim {\cal N}(\log(1.5), \gamma^2) . &lt;/math&gt; }}<br /> <br /> $\monolix$ allows us to compute the MAP estimate and to estimate the posterior distribution of $ka_{\rm pop}$ for various values of $\gamma$.<br /> <br /> <br /> &lt;div style=&quot;margin-left:17%; margin-right:17%; align:center&quot;&gt;<br /> {{{!}} class=&quot;wikitable&quot; align=&quot;center&quot; style=&quot;width:100%&quot;<br /> {{!}} $\gamma$ {{!}}{{!}} 0 {{!}}{{!}} 0.01 {{!}}{{!}} 0.025 {{!}}{{!}} 0.05 {{!}}{{!}} 0.1 {{!}}{{!}} 0.2 {{!}}{{!}} $+ \infty$ <br /> {{!}}-<br /> {{!}}$\hat{ka}_{\rm pop}^{\rm MAP}$ {{!}}{{!}} 1.5 {{!}}{{!}} 1.49 {{!}}{{!}} 1.47 {{!}}{{!}} 1.39 {{!}}{{!}} 1.22 {{!}}{{!}} 1.11 {{!}}{{!}} 1.05 <br /> {{!}}}&lt;/div&gt;<br /> <br /> {{ImageWithCaption|image=bayes1.png|caption=Prior and posterior distributions of $ka_{\rm pop}$ for different values of $\gamma$}}<br /> <br /> <br /> As expected, the posterior distribution converges to the prior distribution when the standard deviation $\gamma$ of the prior distribution decreases. 
Also, the mode of the posterior distribution converges to the maximum likelihood estimate of $ka_{\rm pop}$ when $\gamma$ increases.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> === Estimating the Fisher information matrix ===<br /> <br /> The variance of the estimator $\thmle$ and thus confidence intervals can be derived from the [[Estimation of the observed Fisher information matrix|observed Fisher information matrix (F.I.M.)]], which itself is calculated using the observed likelihood (i.e., the pdf of the observations $\by$):<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ofim_intro3&quot;&gt;&lt;math&gt;<br /> \ofim(\thmle ; \by) \ \ \eqdef \ \ - \displaystyle{ \frac{\partial^2}{\partial \theta^2} }\log({\like}(\thmle ; \by)) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> Then, the variance-covariance matrix of the maximum likelihood estimator $\thmle$ can be estimated by the inverse of the observed F.I.M. Standard errors (s.e.) for each component of $\thmle$ are their standard deviations, i.e., the square roots of the diagonal elements of this covariance matrix. $\monolix$ also displays the (estimated) relative standard errors (r.s.e.), i.e., the (estimated) standard error divided by the value of the estimated parameter.<br /> <br /> <br /> {{JustCode<br /> |code=&lt;pre style=&quot;background-color:#EFEFEF; border:none;&quot;&gt;Estimation of the population parameters<br /> <br /> parameter s.e. (s.a.) r.s.e.(%)<br /> ka : 0.974 0.082 8<br /> V : 7.07 0.35 5<br /> Cl : 2 0.07 4<br /> h0 : 0.0102 0.0014 14<br /> gamma : 0.485 0.015 3<br /> <br /> omega_ka : 0.668 0.064 10<br /> omega_V : 0.365 0.037 10<br /> omega_Cl : 0.588 0.055 9<br /> omega_h0 : 0.105 0.032 30<br /> omega_gamma : 0.0901 0.044 49<br /> <br /> a_1 : 0.345 0.012 3<br /> &lt;/pre&gt; }}<br /> <br /> The F.I.M. can be used for detecting overparametrization of the structural model.
In effect, if the model is poorly identifiable, certain estimators will be quite correlated and the F.I.M. will therefore be poorly conditioned and difficult to invert. Suppose for example that we want to fit a two-compartment PK model to the same data as before. The output is shown below. The large values for the relative standard errors for the inter-compartmental clearance $Q$ and the volume of the peripheral compartment $V_2$ mean that the data does not allow these two parameters to be estimated well.<br /> <br /> <br /> {{JustCode<br /> |code=&lt;pre style=&quot;background-color:#EFEFEF; border:none;&quot;&gt;Estimation of the population parameters<br /> <br /> parameter s.e. (lin) r.s.e.(%)<br /> ka : 0.246 0.0081 3<br /> Cl : 1.9 0.075 4<br /> V1 : 1.71 0.14 8<br /> Q : 0.000171 0.024 1.43e+04<br /> V2 : 0.00673 3.1 4.62e+04<br /> <br /> omega_ka : 0.171 0.026 15<br /> omega_Cl : 0.293 0.026 9<br /> omega_V1 : 0.621 0.062 10<br /> omega_Q : 5.72 1.4e+03 2.41e+04<br /> omega_V2 : 4.61 1.8e+04 3.94e+05<br /> <br /> a : 0.136 0.0073 5<br /> &lt;/pre&gt; }}<br /> <br /> <br /> The Fisher information criterion is also widely used in optimal experimental design. Indeed, minimizing the variance of the estimator corresponds to maximizing the information. Estimators and designs can then be evaluated by looking at certain summary statistics of the covariance matrix (like the determinant or trace for instance).<br /> <br /> &lt;br&gt;<br /> === Estimating the individual parameters ===<br /> <br /> Once $\theta$ has been estimated, the conditional distribution $\pmacro(\psi_i | y_i ; \hat{\theta})$ of the individual parameters $\psi_i$ can be estimated for each individual $i$ using the [[The Metropolis-Hastings algorithm for simulating the individual parameters| Metropolis-Hastings algorithm]].
For each $i$, this algorithm generates a sequence $(\psi_i^{k}, k \geq 1)$ which converges in distribution to the conditional distribution $\pmacro(\psi_i | y_i ; \hat{\theta})$ and that can be used for estimating any summary statistic of this distribution (mean, standard deviation, quantiles, etc.).<br /> <br /> The mode of this conditional distribution can be estimated using this sequence or by maximizing $\pmacro(\psi_i | y_i ; \hat{\theta})$ using numerical methods.<br /> <br /> The choice of using the conditional mean or the conditional mode is arbitrary. By default, $\monolix$ uses the conditional mode, taking the philosophy that the &quot;most likely&quot; values of the individual parameters are the most suited for computing the &quot;most likely&quot; predictions.<br /> <br /> <br /> {{ImageWithCaption|image=mode1.png|caption=Predicted concentrations for 6 individuals using the estimated conditional modes of the individual PK parameters}} <br /> <br /> &lt;br&gt;<br /> <br /> === Estimating the observed log-likelihood ===<br /> <br /> <br /> Once $\theta$ has been estimated, the observed log-likelihood of $\hat{\theta}$ is defined as<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \begin{eqnarray}<br /> {\llike} (\hat{\theta};\by) &amp;=&amp; \log({\like}(\hat{\theta};\by)) \\<br /> &amp;\eqdef&amp; \log(\py(\by;\hat{\theta})) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The observed log-likelihood cannot be computed in closed form for nonlinear mixed effects models, but can be estimated using the methods described in the [[Estimation of the log-likelihood]] Section. 
The estimated log-likelihood can then be used for performing likelihood ratio tests and for computing information criteria such as AIC and BIC (see the [[Model evaluation]] Section).<br /> <br /> <br /> <br /> {{Next<br /> |link=Modeling}}</div> Admin http://wiki.webpopix.org/index.php/Estimation Estimation 2013-06-07T12:58:21Z <p>Admin : /* Bibliography */</p> <hr /> <div>== Introduction ==<br /> <br /> In the modeling context, we usually assume that we have data that includes observations $\by$, measurement times $\bt$ and possibly additional regression variables $\bx$. There may also be individual covariates $\bc$, and in pharmacological applications the dose regimen $\bu$. For clarity, in the following notation we will omit the design variables $\bt$, $\bx$ and $\bu$, and the covariates $\bc$.<br /> <br /> Here, we find ourselves in the classical framework of incomplete data models. Indeed, only $\by = (y_{ij})$ is observed in the joint model $\pypsi(\by,\bpsi;\theta)$.<br /> <br /> The estimation tasks are classical ones in statistics:<br /> <br /> <br /> &lt;ol&gt;<br /> &lt;li&gt; Estimate the population parameter $\theta$ using the available observations and any a priori information that may be available.&lt;/li&gt;<br /> <br /> &lt;li&gt;Evaluate the precision of the proposed estimates.&lt;/li&gt;<br /> <br /> &lt;li&gt;Reconstruct the missing data, here the individual parameters $\bpsi=(\psi_i, 1\leq i \leq N)$.
&lt;/li&gt;<br /> <br /> &lt;li&gt;Estimate the log-likelihood for a given model, i.e., for a given joint distribution $\qypsi$ and value of $\theta$.&lt;/li&gt;<br /> &lt;/ol&gt;<br /> <br /> <br /> &lt;br&gt;<br /> <br /> == Maximum likelihood estimation of the population parameters ==<br /> <br /> &lt;br&gt;<br /> === Definitions ===<br /> <br /> <br /> ''Maximum likelihood estimation'' consists of maximizing with respect to $\theta$ the ''observed likelihood'' defined by:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \like(\theta ; \by) &amp;\eqdef&amp; \py(\by ; \theta) \\<br /> &amp;=&amp; \int \pypsi(\by,\bpsi ;\theta) \, d \bpsi .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> Maximum likelihood estimation of the population parameter $\theta$ requires:<br /> <br /> &lt;blockquote&gt;<br /> * A model, i.e., a joint distribution $\qypsi$. Depending on the software used, the model can be implemented using a script or a graphical user interface. $\monolix$ is extremely flexible and allows us to combine both. It is possible for instance to code the structural model using $\mlxtran$ and use the GUI for implementing the statistical model. Whatever the options selected, the complete model can always be saved as a text file. &lt;br&gt;&lt;br&gt;<br /> * Inputs $\by$, $\bc$, $\bu$ and $\bt$. All of these variables are typically stored in a single data file (see the [[Visualization#Data exploration | Data Exploration ]] Section). &lt;br&gt;&lt;br&gt;<br /> * An algorithm which allows us to maximize $\int \pypsi(\by,\bpsi ;\theta) \, d \bpsi$ with respect to $\theta$. Each software package has its own algorithms implemented. It is not our goal here to rate and compare the various algorithms and implementations.
We will use exclusively the SAEM algorithm as described in [[The SAEM algorithm for estimating population parameters | The SAEM algorithm]] and implemented in $\monolix$ as we are entirely satisfied with both its theoretical and practical qualities: &lt;br&gt;&lt;br&gt;<br /> ** The algorithms implemented in $\monolix$ including SAEM and its extensions (mixture models, hidden Markov models, SDE-based models, censored data, etc.) have been published in statistical journals. Furthermore, convergence of SAEM has been rigorously proved.&lt;br&gt;&lt;br&gt;<br /> ** The SAEM implementation in $\monolix$ is extremely efficient for a wide variety of complex models.&lt;br&gt;&lt;br&gt;<br /> ** The SAEM implementation in $\monolix$ was done by the same group that proposed the algorithm and studied in detail its theoretical and practical properties.<br /> &lt;/blockquote&gt;<br /> <br /> <br /> {{Remarks<br /> |title=Remark<br /> |text= It is important to highlight the fact that for a parameter $\psi_i$ whose distribution is a transformation of a normal one (log-normal, logit-normal, etc.), the MLE $\hat{\psi}_{\rm pop}$ of the reference parameter $\psi_{\rm pop}$ is neither the mean nor the mode of the distribution. It is in fact the median.<br /> <br /> To show why this is the case, let $h$ be a nonlinear, twice continuously differentiable and strictly increasing function such that $h(\psi_i)$ is normally distributed.<br /> <br /> <br /> * First we show that it is not the mean. By definition, the MLE of $h(\psi_{\rm pop})$ is $h(\hat{\psi}_{\rm pop})$. Thus, the estimated distribution of $h(\psi_i)$ is the normal distribution with mean $h(\hat{\psi}_{\rm pop})$, i.e., $\esp{h(\psi_i)} = h(\hat{\psi}_{\rm pop})$. Since $h$ is nonlinear, $\esp{h(\psi_i)} \neq h(\esp{\psi_i})$ in general, so that $\esp{\psi_i} \neq \hat{\psi}_{\rm pop}$. In other words, $\hat{\psi}_{\rm pop}$ is not the mean of the estimated distribution of $\psi_i$.<br /> <br /> <br /> * Next we show that it is not the mode.
Let $f$ be the pdf of $\psi_i$ and let $f_h$ be the pdf of $h(\psi_i)$. By the change of variables formula, for any $t$,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> f(t) = h^\prime(t)f_h(h(t)) . &lt;/math&gt; }}<br /> <br /> : Thus,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; <br /> f^\prime(t) = h^{\prime \prime}(t)f_h(h(t)) + h^{\prime 2}(t)f_h^\prime(h(t)) .<br /> &lt;/math&gt; }}<br /> <br /> : By definition of the mode, $f_h^\prime(h(\hat{\psi}_{\rm pop}))=0$. Since $h$ is nonlinear, $h^{\prime \prime}(\hat{\psi}_{\rm pop})\neq 0$ a.s. and thus $f^\prime(\hat{\psi}_{\rm pop})\neq 0$ a.s. In other words, $\hat{\psi}_{\rm pop}$ is not the mode of the estimated distribution of $\psi_i$.<br /> <br /> <br /> * Now we show that it is the median. Since $h$ is a strictly increasing function,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \probs{\hat{\psi}_{\rm pop} }{\psi_i \leq \hat{\psi}_{\rm pop} } &amp;=&amp; \probs{\hat{\psi}_{\rm pop} }{h(\psi_i) \leq h(\hat{\psi}_{\rm pop})} \\<br /> &amp;=&amp; 0.5 .<br /> \end{eqnarray}&lt;/math&gt; }} <br /> <br /> : In other words, $\hat{\psi}_{\rm pop}$ is the median of the estimated distribution of $\psi_i$.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> <br /> === Example ===<br /> <br /> Let us again look at the model used in the [[Visualization#Model exploration | Model Visualization]] Section. For the case of a unique dose $D$ given at time $t=0$, the structural model is written:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> ke&amp;=&amp;Cl/V \\<br /> Cc(t) &amp;=&amp; \displaystyle{\frac{D \, ka}{V(ka-ke)} }\left(e^{-ke\,t} - e^{-ka\,t} \right) \\<br /> h(t) &amp;=&amp; h_0 \, \exp(\gamma\, Cc(t)) ,<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> where $Cc$ is the concentration in the central compartment and $h$ the hazard function for the event of interest (hemorrhaging).
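Before turning to the $\mlxtran$ implementation, the structural model above can be transcribed directly, for instance in Python. This is an illustrative sketch only; the parameter values used in the example call are hypothetical.

```python
import numpy as np

def concentration(t, D, ka, V, Cl):
    """Cc(t) for a one-compartment model with first-order absorption,
    single dose D administered at t = 0 (formula valid for ka != ke)."""
    ke = Cl / V
    return D * ka / (V * (ka - ke)) * (np.exp(-ke * t) - np.exp(-ka * t))

def hazard(t, D, ka, V, Cl, h0, gamma):
    """Hazard of the event of interest, driven by the concentration."""
    return h0 * np.exp(gamma * concentration(t, D, ka, V, Cl))

# Example: concentration profile over 24 time units (hypothetical values)
times = np.linspace(0.0, 24.0, 97)
cc = concentration(times, D=100.0, ka=1.0, V=7.0, Cl=2.0)
```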
Supposing a constant error model for the concentration, the model for the observations can be easily implemented using $\mlxtran$.<br /> <br /> <br /> {{MLXTran<br /> |name=joint1est_model.txt<br /> |text=&lt;pre style=&quot;background-color:#EFEFEF; border:none;&quot;&gt; <br /> INPUT:<br /> parameter = {ka, V, Cl, h0, gamma}<br /> <br /> EQUATION:<br /> ke=Cl/V<br /> Cc = amtDose*ka/(V*(ka-ke))*(exp(-ke*t) - exp(-ka*t))<br /> h = h0*exp(gamma*Cc)<br /> <br /> OBSERVATION:<br /> Concentration = {type=continuous, prediction=Cc, errorModel=constant}<br /> Hemorrhaging = {type=event, hazard=h}<br /> <br /> OUTPUT:<br /> output = {Concentration, Hemorrhaging}<br /> &lt;/pre&gt; }}<br /> <br /> <br /> Here, {{Verbatim|amtDose}} is a reserved keyword for the last administered dose.<br /> <br /> The model's parameters are the absorption rate constant $ka$, the volume of distribution $V$, the clearance $Cl$, the baseline hazard $h_0$ and the coefficient $\gamma$. The statistical model for the individual parameters can be defined in the $\monolix$ project file (left) and/or the $\monolix$ GUI (right):<br /> <br /> <br /> {{ExampleWithCode&amp;Image<br /> |title=<br /> |text=<br /> |code={{MLXTranForTable<br /> |name=<br /> |text=&lt;pre style=&quot;background-color:#EFEFEF; border:none;&quot;&gt; <br /> INDIVIDUAL:<br /> ka = {distribution=logNormal, iiv=yes}<br /> V = {distribution=logNormal, iiv=yes}<br /> Cl = {distribution=normal, iiv=yes}<br /> h0 = {distribution=probitNormal, iiv=yes}<br /> gamma = {distribution=logitNormal, iiv=yes}<br /> &lt;/pre&gt; }}<br /> |image=<br /> [[File:Vsaem1.png]]<br /> }}<br /> <br /> <br /> Once the model is implemented, tasks such as maximum likelihood estimation can be performed using the SAEM algorithm. Certain settings in SAEM must be provided by the user.
Even though SAEM is quite insensitive to the initial parameter values, it is possible to perform a preliminary sensitivity analysis in order to select &quot;good&quot; initial values.<br /> <br /> <br /> {{ImageWithCaption|image=Vsaem2.png|caption=Looking for good initial values for SAEM}}<br /> <br /> <br /> <br /> Then, when we run SAEM, it converges easily and quickly to the MLE:<br /> <br /> <br /> {{JustCode<br /> |code=&lt;pre style=&quot;background-color:#EFEFEF; border:none;&quot;&gt;Estimation of the population parameters<br /> <br /> parameter<br /> ka : 0.974<br /> V : 7.07<br /> Cl : 2.00<br /> h0 : 0.0102<br /> gamma : 0.485<br /> <br /> omega_ka : 0.668<br /> omega_V : 0.365<br /> omega_Cl : 0.588<br /> omega_h0 : 0.105<br /> omega_gamma : 0.0901<br /> <br /> a_1 : 0.345<br /> &lt;/pre&gt; }}<br /> <br /> <br /> Parameter estimation can therefore be seen as estimating the reference values and the variances of the random effects.<br /> <br /> In addition to these numbers, it is important to be able to graphically represent these distributions in order to see them and therefore understand them better. Indeed, the interpretation of certain parameters is not always simple. Of course, we know what a normal distribution represents and in particular its mean, median and mode, which are equal (see the distribution of $Cl$ below for instance). These measures of central tendency can differ for asymmetric distributions such as the log-normal (see the distribution of $ka$).<br /> <br /> Interpreting dispersion terms like $\omega_{ka}$ and $\omega_{V}$ is not obvious either when the parameter distributions are not normal.
In such cases, quartiles or quantiles of order 5% and 95% (for example) may be useful for quantitatively describing the variability of these parameters.<br /> <br /> <br /> {{Remarks <br /> |title=Remarks<br /> |text=<br /> For a parameter $\psi$ whose distribution is log-normal, we can approximate the coefficient of variation for $\psi$ by the standard deviation $\omega_{\psi}$ of the random effect $\eta$ if this is fairly small. Indeed, when $\omega_{\psi}$ is small,<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \psi &amp;=&amp; \psi_{\rm pop} e^{\eta} \\<br /> &amp;\approx &amp; \psi_{\rm pop}(1+ \eta) .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> Thus<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \esp{\psi} &amp;\approx&amp; \psi_{\rm pop} \\<br /> \std{\psi} &amp;\approx &amp; \psi_{\rm pop}\omega_{\psi},<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> and<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> {\rm cv}(\psi) &amp;=&amp; \frac{\std{\psi} }{\esp{\psi} } \\<br /> &amp;\approx &amp; \omega_{\psi} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> Do not forget that this approximation is only valid when $\omega$ is small and in the case of log-normal distributions. It does not carry over to any other distribution. Thus, when $\omega_{h0}=0.1$ for a probit-normal distribution or $\omega_{\gamma}=0.09$ for a logit-normal one, there is no immediate interpretation available.
Only by looking at the graphical display of the pdf or by calculating some quantiles of interest can we begin to get an idea of the dispersion in the parameters $h_0$ and $\gamma$.<br /> }}<br /> <br /> <br /> {{ImageWithCaption|image=saem3b.png|caption=Estimation of the population distributions of the individual parameters of the model }}<br /> <br /> <br /> <br /> &lt;br&gt;<br /> <br /> ==Bayesian estimation==<br /> <br /> The ''Bayesian approach'' considers $\theta$ as a random vector with a ''prior distribution'' $\qth$. We can then define the posterior distribution of $\theta$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \pcthy(\theta {{!}} \by ) &amp;=&amp; \displaystyle{ \frac{\pth( \theta )\pcyth(\by {{!}} \theta )}{\py(\by)} }\\<br /> &amp;=&amp; \displaystyle{ \frac{\pth( \theta ) \int \pypsith(\by,\bpsi {{!}}\theta) \, d \bpsi}{\py(\by)} }.<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> We can estimate this conditional distribution and compute any statistics (posterior mean, standard deviation, percentiles, etc.), or derive the so-called ''Maximum a Posteriori'' (MAP) estimate of $\theta$:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \hat{\theta}^{\rm MAP} &amp;=&amp; \argmax{\theta} \pcthy(\theta {{!}} \by ) \\<br /> &amp;=&amp; \argmax{\theta} \left\{ {\llike}(\theta ; \by) + \log( \pth( \theta ) ) \right\} .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> The MAP estimate therefore maximizes a penalized version of the observed log-likelihood. In other words, maximum a posteriori estimation reduces to penalized maximum likelihood estimation. Suppose for instance that $\theta$ is a scalar parameter and the prior is a normal distribution with mean $\theta_0$ and variance $\gamma^2$.
Then, the MAP estimate is given by<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;<br /> \hat{\theta}^{\rm MAP} =\argmax{\theta} \left\{ {\llike} (\theta ; \by) - \displaystyle{ \frac{1}{2\gamma^2} }(\theta - \theta_0)^2 \right\} .<br /> &lt;/math&gt; }}<br /> <br /> The MAP estimate is a trade-off between the MLE which maximizes ${\llike}(\theta ; \by)$ and $\theta_0$ which minimizes $(\theta - \theta_0)^2$. The weight given to the prior directly depends on the variance of the prior distribution: the smaller $\gamma^2$ is, the closer to $\theta_0$ the MAP is. In the limiting case $\gamma^2=0$, the prior fixes $\theta$ at $\theta_0$, which then no longer needs to be estimated.<br /> <br /> Both the Bayesian and frequentist approaches have their supporters and detractors. But rather than being dogmatic and blindly following the same rule-book every time, we need to be pragmatic and ask the right methodological questions when confronted with a new problem.<br /> <br /> We have to remember that Bayesian methods have been extremely successful, in particular for numerical calculations. For instance, (Bayesian) MCMC methods allow us to estimate more or less any conditional distribution coming from any hierarchical model, whereas frequentist approaches such as maximum likelihood estimation can be much more difficult to implement.<br /> <br /> All things said, the problem comes down to knowing whether the data contains sufficient information to answer a given question, and whether some other information may be available to help answer it. This is the essence of the art of modeling: finding the right compromise between the confidence we have in the data and prior knowledge of the problem. Each problem is different and requires a specific approach.
For instance, if all the patients in a pharmacokinetic trial have essentially the same weight, it is pointless to estimate a relationship between weight and the model's PK parameters using the trial data. In this case, the modeler would be better served trying to use prior information based on physiological criteria rather than just a statistical model.<br /> <br /> Therefore, we can use information available to us, of course! Why not? But this information needs to be pertinent. Systematically using a prior for the parameters is not always meaningful. Can we reasonably suppose that we have access to such information? For continuous data for example, what does putting a prior on the residual error model's parameters mean in reality? A reasoned statistical approach consists of only including prior information for certain parameters (those for which we have real prior information) and having confidence in the data for the others.<br /> <br /> $\monolix$ allows this hybrid approach which reconciles the Bayesian and frequentist approaches. A given parameter can be:<br /> <br /> <br /> &lt;ul&gt;<br /> * a fixed constant if we have absolute confidence in its value or the data does not allow it to be estimated, essentially due to identifiability constraints.<br /> &lt;br&gt;<br /> <br /> * estimated by maximum likelihood, either because we have great confidence in the data or have no information on the parameter.<br /> &lt;br&gt;<br /> <br /> * estimated by introducing a prior and calculating the MAP estimate.<br /> &lt;br&gt;<br /> <br /> * estimated by introducing a prior and then estimating the posterior distribution.<br /> &lt;/ul&gt;<br /> <br /> <br /> In the following, we set aside the fixed components of $\theta$.
Here are some possible situations:<br /> <br /> <br /> &lt;ol&gt;<br /> &lt;li&gt; ''Combined maximum likelihood and maximum a posteriori estimation'': decompose $\theta$ into $(\theta_E,\theta_{M})$ where $\theta_E$ are the components of $\theta$ to be estimated with MLE and $\theta_{M}$ those with a prior distribution whose posterior distribution is to be maximized. Then, $(\hat{\theta}_E , \hat{\theta}_{M} )$ below maximizes the penalized likelihood of $(\theta_E,\theta_{M})$: &lt;/li&gt;<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> (\hat{\theta}_E , \hat{\theta}_{M} ) &amp;=&amp; \argmax{\theta_E , \theta_{M} } \log(\py(\by , \theta_{M}; \theta_E)) \\<br /> &amp;=&amp; \argmax{\theta_E , \theta_{M} } \left\{ {\llike}(\theta_E , \theta_{M}; \by) + \log( \pth( \theta_M ) ) \right\} ,<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> where ${\llike} (\theta_E , \theta_{M}; \by) \ \ \eqdef \ \ \log\left(\py(\by | \theta_{M}; \theta_E)\right).$<br /> <br /> <br /> &lt;li&gt; ''Combined maximum likelihood and posterior distribution estimation'': here, decompose $\theta$ into $(\theta_E,\theta_{R})$ where $\theta_E$ are the components of $\theta$ to be estimated with MLE and $\theta_{R}$ those with a prior distribution whose posterior distribution is to be estimated. We propose the following strategy for estimating $\theta_E$ and $\theta_{R}$: &lt;/li&gt;<br /> <br /> <br /> &lt;ol style=&quot;list-style-type:lower-roman&quot;&gt;<br /> &lt;li&gt; Compute the maximum likelihood of $\theta_E$: &lt;/li&gt;<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt;\begin{eqnarray}<br /> \hat{\theta}_E &amp;=&amp; \argmax{\theta_E} \log(\py(\by ; \theta_E)) \\<br /> &amp;=&amp; \argmax{\theta_E} \int \pmacro(\by , \theta_R ; \theta_E ) d \theta_R .<br /> \end{eqnarray}&lt;/math&gt; }}<br /> <br /> <br /> &lt;li&gt; Estimate the conditional distribution $\pmacro(\theta_{R} | \by ;\hat{\theta}_E)$. 
&lt;/li&gt;<br /> &lt;/ol&gt;<br /> <br /> <br /> It is then straightforward to extend this approach to more complex situations where some components of $\theta$ are estimated with MLE, others using MAP estimation and others still by estimating their conditional distributions.<br /> &lt;/ol&gt;<br /> <br /> <br /> {{Example1<br /> |title1=Example<br /> |title2=A PK example<br /> |text=<br /> In this example we use only the pharmacokinetic data and aim to estimate the population parameter distributions of the PK parameters $ka$, $V$ and $Cl$. We assume log-normal distributions for these three parameters. All of the model's population parameters are estimated by maximum likelihood estimation except $ka_{\rm pop}$ for which a log-normal distribution is used as a prior:<br /> <br /> {{Equation1<br /> |equation=&lt;math&gt; \log(ka_{\rm pop}) \sim {\cal N}(\log(1.5), \gamma^2) . &lt;/math&gt; }}<br /> <br /> $\monolix$ allows us to compute the MAP estimate and to estimate the posterior distribution of $ka_{\rm pop}$ for various values of $\gamma$.<br /> <br /> <br /> &lt;div style=&quot;margin-left:17%; margin-right:17%; align:center&quot;&gt;<br /> {{{!}} class=&quot;wikitable&quot; align=&quot;center&quot; style=&quot;width:100%&quot;<br /> {{!}} $\gamma$ {{!}}{{!}} 0 {{!}}{{!}} 0.01 {{!}}{{!}} 0.025 {{!}}{{!}} 0.05 {{!}}{{!}} 0.1 {{!}}{{!}} 0.2 {{!}}{{!}} $+ \infty$ <br /> {{!}}-<br /> {{!}}$\hat{ka}_{\rm pop}^{\rm MAP}$ {{!}}{{!}} 1.5 {{!}}{{!}} 1.49 {{!}}{{!}} 1.47 {{!}}{{!}} 1.39 {{!}}{{!}} 1.22 {{!}}{{!}} 1.11 {{!}}{{!}} 1.05 <br /> {{!}}}&lt;/div&gt;<br /> <br /> {{ImageWithCaption|image=bayes1.png|caption=Prior and posterior distributions of $ka_{\rm pop}$ for different values of $\gamma$}}<br /> <br /> <br /> As expected, the posterior distribution converges to the prior distribution when the standard deviation $\gamma$ of the prior distribution decreases. 
Also, the mode of the posterior distribution converges to the maximum likelihood estimate of $ka_{\rm pop}$ when $\gamma$ increases.<br /> }}<br /> <br /> <br /> &lt;br&gt;<br /> === Estimating the Fisher information matrix ===<br /> <br /> The variance of the estimator $\thmle$ and thus confidence intervals can be derived from the [[Estimation of the observed Fisher information matrix|observed Fisher information matrix (F.I.M.)]], which itself is calculated using the observed likelihood (i.e., the pdf of the observations $\by$):<br /> <br /> {{EquationWithRef<br /> |equation=&lt;div id=&quot;ofim_intro3&quot;&gt;&lt;math&gt;<br /> \ofim(\thmle ; \by) \ \ \eqdef \ \ - \displaystyle{ \frac{\partial^2}{\partial \theta^2} }\log({\like}(\thmle ; \by)) .<br /> &lt;/math&gt;&lt;/div&gt;<br /> |reference=(1) }}<br /> <br /> Then, the variance-covariance matrix of the maximum likelihood estimator $\thmle$ can be estimated by the inverse of the observed F.I.M. Standard errors (s.e.) for each component of $\thmle$ are the standard deviations of these components, i.e., the square roots of the diagonal elements of this covariance matrix. $\monolix$ also displays the (estimated) relative standard errors (r.s.e.), i.e., the (estimated) standard error divided by the value of the estimated parameter.<br /> <br /> <br /> {{JustCode<br /> |code=&lt;pre style=&quot;background-color:#EFEFEF; border:none;&quot;&gt;Estimation of the population parameters<br /> <br /> parameter s.e. (s.a.) r.s.e.(%)<br /> ka : 0.974 0.082 8<br /> V : 7.07 0.35 5<br /> Cl : 2 0.07 4<br /> h0 : 0.0102 0.0014 14<br /> gamma : 0.485 0.015 3<br /> <br /> omega_ka : 0.668 0.064 10<br /> omega_V : 0.365 0.037 10<br /> omega_Cl : 0.588 0.055 9<br /> omega_h0 : 0.105 0.032 30<br /> omega_gamma : 0.0901 0.044 49<br /> <br /> a_1 : 0.345 0.012 3<br /> &lt;/pre&gt; }}<br /> <br /> The F.I.M. can be used for detecting overparametrization of the structural model.
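The passage from the observed F.I.M. to standard errors and relative standard errors can be sketched as follows. The matrix and parameter values here are hypothetical; in practice the F.I.M. is produced by the estimation run (e.g., by stochastic approximation or linearization).

```python
import numpy as np

# Hypothetical 2x2 observed F.I.M. for two parameters (illustrative values)
fim = np.array([[400.0, 20.0],
                [20.0, 25.0]])
theta_hat = np.array([0.974, 7.07])   # hypothetical estimates

cov = np.linalg.inv(fim)              # estimated variance-covariance matrix
se = np.sqrt(np.diag(cov))            # standard errors
rse = 100.0 * se / np.abs(theta_hat)  # relative standard errors (%)

# A very large condition number signals a poorly conditioned F.I.M.,
# i.e. strongly correlated estimators and possible overparametrization.
cond = np.linalg.cond(fim)
print(se, rse, cond)
```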