Continuous data models

De Popix
Aller à : Navigation, rechercher


The data

Continuous data is data that can take any real value within a given range. For instance, a concentration takes its values in $\Rset^+$, the log of the viral load in $\Rset$, an effect expressed as a percentage in $[0,100]$.

The data can be stored in a table and represented graphically. Here is some simple pharmacokinetics data involving four individuals.

Continuous graf0a 1.png
1 1.0 9.84
1 2.0 8.19
1 4.0 6.91
1 8.0 3.71
1 12.0 1.25
2 1.0 17.23
2 3.0 11.14
2 5.0 4.35
2 10.0 2.92
3 2.0 9.78
3 3.0 10.40
3 4.0 7.67
3 6.0 6.84
3 11.0 1.10
4 4.0 8.78
4 6.0 3.87
4 12.0 1.85

Instead of individual plots, we can plot them all together. Such a figure is usually called a spaghetti plot:

Continuous graf0b 1.png

The model

For continuous data, we are going to consider scalar outcomes ($y_{ij}\in \Yr \subset \Rset$) and assume the following general model:

\(y_{ij}=f(t_{ij},\psi_i)+ g(t_{ij},\psi_i)\teps_{ij}, \quad\ \quad 1\leq i \leq N, \quad \ 1 \leq j \leq n_i. \)

where $g(t_{ij},\psi_i)\geq 0$.

Here, the residual errors $(\teps_{ij})$ are standardized random variables (mean zero and standard deviation 1). In this case, it is clear that $f(t_{ij},\psi_i)$ and $g(t_{ij},\psi_i)$ are the mean and standard deviation of $y_{ij}$, i.e.,

\(\begin{eqnarray} \esp{y_{ij} | \psi_i} &=& f(t_{ij},\psi_i) \\ \std{y_{ij} | \psi_i} &=& g(t_{ij},\psi_i). \end{eqnarray}\)

The structural model

$f$ is known as the structural model and aims to describe the time evolution of the phenomena under study. For a given subject $i$ and vector of individual parameters $\psi_i$, $f(t_{ij},\psi_i)$ is the prediction of the observed variable at time $t_{ij}$. In other words, it is the value that would be measured at time $t_{ij}$ if there was no error ($\teps_{ij}=0$).

In the current example, we decide to model with the structural model $f=A\exp\left(-\alpha t \right)$. Here are some example curves for various combinations of $A$ and $\alpha$:

Continuous graf1bis.png

Other models involving more complicated dynamical systems can be imagined, such as those defined as solutions of systems of ordinary or partial differential equations. Real-life examples are found in the study of HIV, pharmacokinetics and tumor growth.

The residual error model

For a given structural model $f$, the conditional probability distribution of the observations $(y_{ij})$ is completely defined by the residual error model, i.e., the probability distribution of the residual errors $(\teps_{ij})$ and the standard deviation $g(x_{ij},\psi_i)$. The residual error model can take many forms. For example,

    • A constant error model assumes that $g(t_{ij},\psi_i)=a_i$. Model (1) then reduces to
    \(y_{ij}=f(t_{ij},\psi_i)+ a_i\teps_{ij}, \quad \quad \ 1\leq i \leq N \quad \ 1 \leq j \leq n_i. \)
    The figure below shows four simulated sequences of observations $(y_{ij}, 1\leq i \leq 4, 1\leq j \leq 10)$ with their respective structural model $f(t,\psi_i)$ in blue. Here, $a_i=2$ is the standard deviation of $y_{ij}$ for all $(i,j)$.

    Continuous graf2a1.png

    Let $\hat{y}_{ij}=f(t_{ij},\psi_i)$ be the prediction of $y_{ij}$ given by the model (2). The figure below shows for 50 individuals:

      -left: prediction errors $e_{ij}=y_{ij}-\hat{y}_{ij}$ vs. predictions $(\hat{y}_{ij})$. The pink line is the mean $\esp{e_{ij}}=0$; the green lines are $\pm$ 1 standard deviations: $[\std{e_{ij}} , +\std{e_{ij}}]$ where $\std{e_{ij}}=a_i=0.5$.

      -right: observations $(y_{ij})$ vs. predictions $(\hat{y}_{ij})$. The pink line is the identify $y=\hat{y}$, the green lines represent an interval of $\pm 1$ standard deviations around $\hat{y}$: $[\hat{y}-\std{e_{ij}} , \hat{y}+\std{e_{ij}}]$.

    Continuous graf2a2.png

    These figures are typical for constant error models. The standard deviation of the prediction errors does not depend on the value of the predictions $(\hat{y}_{ij})$, so both intervals have constant amplitude.

    • A proportional error model assumes that $g(t_{ij},\psi_i) =b_i f(t_{ij},\psi_i)$. Model (1) then becomes

    \( y_{ij}=f(t_{ij},\psi_i)(1 + b_i\teps_{ij}), \quad\ \quad 1\leq i \leq N, \quad \ 1 \leq j \leq n_i . \)
    The standard deviation of the prediction error $e_{ij}=y_{ij}-\hat{y}_{ij}$ is proportional to the prediction $\hat{y}_{ij}$. Therefore, the amplitude of the $\pm 1$ standard deviation intervals increases linearly with $f$:

    Continuous graf2b.png

    • A combined error model combines a constant and a proportional error model by assuming $g(t_{ij},\psi_i) =a_i + b_i f(t_{ij},\psi_i)$, where $a_1>0$ and $b_i>0$. The standard deviation of the prediction error $e_{ij}$ and thus the amplitude of the intervals are now affine functions of the prediction $\hat{y}_{ij}$:

    Continuous graf2c.png

    • Another alternative combined error model is $g(t_{ij},\psi_i) =\sqrt{a_i^2 + b_i^2 f^2(t_{ij},\psi_i)}$. This gives intervals that look fairly similar to the previous ones, though they are no longer affine.

    Continuous graf2d.png

Extension to autocorrelated errors

For any subject $i$, the residual errors $(\teps_{ij},1\leq j \leq n_i)$ are usually assumed to be independent random variables. Extension to autocorrelated errors is possible by assuming for instance that $(\teps_{ij})$ is a stationary ARMA (Autoregressive Moving Average) process. For example, an autoregressive process of order 1, AR(1), assumes that autocorrelation decreases exponentially:

\( {\rm corr}(\teps_{ij},\teps_{i\,{j+1} }) = \rho_i^{(t_{i\,j+1}-t_{ij})}. \)

where $0\leq \rho_i <1$ for each individual $i$. If we assume that $t_{ij}=j$ for any $(i,j)$. Then, $t_{i,j+1}-t_{i,j}=1$ and the autocorrelation function $\gamma$ is given by:

\( \begin{array} \gamma(\tau) &=& {\rm corr}(\teps_{ij},\teps_{i\,j+\tau}) \\ &= &\rho_i^{\tau} . \end{array}\)

The figure below displays 3 different sequences of residual errors simulated with 3 different autocorrelations $\rho_1=0.1$, $\rho_2=0.6$ and $\rho_3=0.95$. The autocorrelation functions $\gamma(\tau)$ are also displayed.


Distribution of the standardized residual errors

The distribution of the standardized residual errors $(\teps_{ij})$ is usually assumed to be the same for each individual $i$ and any observation time $t_{ij}$. Furthermore, for identifiability reasons it is also assumed to be symmetrical around 0, i.e., $\prob{\teps_{ij}<-u}=\prob{\teps_{ij}>u}$ for all $u\in \Rset$. Thus, for any $(i,j)$ the distribution of the observation $y_{ij}$ is also symmetrical around its prediction $f(t_{ij},\psi_i)$. This $f(t_{ij},\psi_i)$ is therefore both the mean and the median of the distribution of $y_{ij}$: $\esp{y_{ij}|\psi_i}=f(t_{ij},\psi_i)$ and $\prob{y_{ij}>f(t_{ij},\psi_i)} = \prob{y_{ij}<f(t_{ij},\psi_i)} = 1/2$. If we make the additional hypothesis that 0 is the mode of the distribution of $\teps_{ij}$, then $f(t_{ij},\psi_i)$ is also the mode of the distribution of $y_{ij}$.

A widely used bell-shaped distribution for modeling residual errors is the normal distribution. If we assume that $\teps_{ij}\sim {\cal N}(0,1)$, then $y_{ij}$ is also normally distributed: $ y_{ij}\sim {\cal N}(f(t_{ij},\bpsi_i),\, g(t_{ij},\bpsi_i))$.

Other distributions can be used, such as Student's $t$-distribution (also known simply as the $t$-distribution) which is also symmetric and bell-shaped but with heavier tails, meaning that it is more prone to producing values that fall far from its prediction.

Continuous graf4 bis.png

If we assume that $\teps_{ij}\sim t(\nu)$, then $y_{ij}$ has a non-standardized Student's $t$-distribution.

The conditional likelihood

The conditional likelihood for given observations $\by$ is defined as

\( {\like}(\bpsi; \by) \ \ \eqdef \ \ \pcypsi(\by | \bpsi), \)

where $\pcypsi(\by | \bpsi)$ is the conditional density function of the observations. If we assume that the residual errors $(\teps_{ij},\ 1\leq i \leq N,\ 1\leq j \leq n_i)$ are i.i.d., then this conditional density is straightforward to compute:

\( \begin{eqnarray}\pcypsi(\by | \bpsi ) & = & \prod_{i=1}^N \pcyipsii(\by_i | \psi_i ) \\ & = & \prod_{i=1}^N \prod_{j=1}^{n_i} \pypsiij(y_{ij} | \bpsi_i ) \\ & = & \prod_{i=1}^N \prod_{j=1}^{n_i} \displaystyle{\frac{1}{g(t_{ij},\psi_i)} } \, \qeps\left(\frac{y_{ij} - f(t_{ij},\psi_i)}{g(t_{ij},\psi_i)}\right) , \end{eqnarray} \)

where $\qeps$ is the pdf of the i.i.d. residual errors ($\teps_{ij}$).

For example, if we assume that the residual errors $\teps_{ij}$ are Gaussian random variables with mean 0 and variance 1, then $ \qeps(x) = e^{-{x^2}/{2}}/\sqrt{2 \pi}$, and

\( \begin{eqnarray} \pcypsi(\by | \psi ) & = & \prod_{i=1}^N \prod_{j=1}^{n_i} \displaystyle{ \frac{1}{\sqrt{2 \pi} g(t_{ij},\psi_i)} }\, \exp\left\{-\frac{1}{2}\left(\frac{y_{ij} - f(t_{ij},\psi_i)}{g(t_{ij},\psi_i)}\right)^2\right\} . \end{eqnarray} \)

Transforming the data

The assumption that the distribution of any observation $y_{ij}$ is symmetrical around its predicted value is a very strong one. If this assumption does not hold, we may decide to transform the data to make it more symmetric around its (transformed) predicted value. In other cases, constraints on the values that observations can take may also lead us to want to transform the data.

Model (1) can be extended to include a transformation of the data:

\( \transy(y_{ij})=\transy(f(t_{ij},\psi_i))+ g(t_{ij},\psi_i)\teps_{ij} \)

where $\transy$ is a monotonic transformation (a strictly increasing or decreasing function). As you can see, both the data $y_{ij}$ and the structural model $f$ are transformed by the function $\transy$ so that $f(t_{ij},\psi_i)$ remains the prediction of $y_{ij}$.


1. If $y$ takes non-negative values, a log transformation can be used: $\transy(y) = \log(y)$. We can then present the model with one of two equivalent representations:

\( \begin{eqnarray} \log(y_{ij})&=&\log(f(t_{ij},\psi_i))+ g(t_{ij},\psi_i)\teps_{ij}, \\ y_{ij}&=&f(t_{ij},\psi_i)\, e^{ \displaystyle{ -g(t_{ij},\psi_i)\teps_{ij} } }. \end{eqnarray}\)

Continuous graf5a.png

2. If $y$ takes its values between 0 and 1, a logit transformation can be used:

\( \begin{eqnarray} \logit(y_{ij})&=&\logit(f(t_{ij},\psi_i))+ g(t_{ij},\psi_i)\teps_{ij} , \\ y_{ij}&=& \displaystyle{\frac{ f(t_{ij},\bpsi_i) }{ f(t_{ij},\psi_i) + (1- f(t_{ij},\bpsi_i)) \, e^{ g(t_{ij},\psi_i)\teps_{ij} } } }. \end{eqnarray}\)

Continuous graf5b.png

3. The logit error model can be extended if the $y_{ij}$ are known to take their values in an interval $[A,B]$:

\( \begin{eqnarray} \transy(y_{ij})&=&\log((y_{ij}-A)/(B-y_{ij})), \\ y_{ij}&=&A+(B-A)\displaystyle{\frac{f(t_{ij},\psi_i)-A}{f(t_{ij},\psi_i)-A+(B-f(t_{ij},\psi_i)) e^{-g(t_{ij},\psi_i)\teps_{ij} } } }\, . \end{eqnarray}\)

Using the transformation proposed in (7), the conditional density $\pcypsi$ becomes

\( \begin{eqnarray} \pcypsi(\by | \bpsi ) & = & \prod_{i=1}^N \prod_{j=1}^{n_i} \pypsiij(y_{ij} | \psi_i ) \\ & = & \prod_{i=1}^N \prod_{j=1}^{n_i} \transy^\prime(y_{ij}) \, \ptypsiij(\transy(y_{ij}) | \psi_i ) \\ & = & \prod_{i=1}^N \prod_{j=1}^{n_i} \displaystyle{ \frac{\transy^\prime(y_{ij})}{g(t_{ij},\psi_i)} } \, \qeps\left(\frac{\transy(y_{ij}) - \transy(f(t_{ij},\psi_i))}{g(t_{ij},\psi_i)}\right) \end{eqnarray} \)

For example, if the observations are log-normally distributed given the individual parameters ($\transy(y) = \log(y)$), with a constant error model ($g(t;\psi_i)=a$), then

\( \pcypsi(\by | \bpsi ) = \prod_{i=1}^N \prod_{j=1}^{n_i} \displaystyle{ \frac{1}{\sqrt{2 \pi a^2} \, y_{ij} } }\, \exp\left\{-\frac{1}{2 \, a^2}\left(\log(y_{ij}) - \log(f(t_{ij},\psi_i))\right)^2\right\}. \)

Censored data

Censoring occurs when the value of a measurement or observation is only partially known. For continuous data measurements in the longitudinal context, censoring refers to the values of the measurements, not the times at which they were taken.

For example, in analytical chemistry, the lower limit of detection (LLOD) is the lowest quantity of a substance that can be distinguished from the absence of that substance. Therefore, any time the quantity is below the LLOD, the "measurement" is not a number but the information that the quantity is less than the LLOD.

Similarly, in pharmacokinetic studies, measurements of the concentration below a certain limit referred to as the lower limit of quantification (LLOQ) are so low that their reliability is considered suspect. A measuring device can also have an upper limit of quantification (ULOQ) such that any value above this limit cannot be measured and reported.

As hinted above, censored values are not typically reported as a number, but their existence is known, as well as the type of censoring. Thus, the observation $\repy_{ij}$ (i.e., what is reported) is the measurement $y_{ij}$ if not censored, and the type of censoring otherwise.

We usually distinguish three types of censoring: left, right and interval. We now introduce these, along with illustrative data sets.

  • Left censoring: a data point is below a certain value $L$ but it is not known by how much:

\( \repy_{ij} = \left\{ \begin{array}{c} y_{ij} & {\rm if } \ y_{ij} \geq L \\ y_{ij} < L & {\rm otherwise.} \end{array} \right. \)

In the figures below, the "data" below the limit $L=-0.30$, shown in gray, is not observed. The values are therefore not reported in the dataset. An additional column cens can be used to indicate if an observation is left-censored (cens=1) or not (cens=0). The column of observations log-VL displays the observed log-viral load when it is above the limit $L=-0.30$, and the limit $L=-0.30$ otherwise.

Continuous graf6a.png

ID TIME log-VL cens
1 1.0 0.26 0
1 2.0 0.02 0
1 3.0 -0.13 0
1 4.0 -0.13 0
1 5.0 -0.30 1
1 6.0 -0.30 1
1 7.0 -0.25 0
1 8.0 -0.30 1
1 9.0 -0.29 0
1 10.0 -0.30 1

  • Interval censoring: if a data point is in interval $I$, its exact value is not known:

\( \repy_{ij} = \left\{ \begin{array}{cc} y_{ij} & {\rm if } \ y_{ij}\notin I \\ y_{ij} \in I & {\rm otherwise.} \end{array} \right. \)

For example, suppose we are measuring a concentration which naturally only takes non-negative values, but again we cannot measure it below the level $L = 1$. Therefore, any data point $y_{ij}$ below $1$ will be recorded only as "$y_{ij} \in [0,1)$". In the table, an additional column llimit is required to indicate the lower bound of the censoring interval.

Continuous graf6b.png

ID TIME CONC. llimit cens
1 0.3 1.20 . 0
1 0.5 1.93 . 0
1 1.0 3.38 . 0
1 2.0 3.88 . 0
1 4.0 3.24 . 0
1 6.0 1.82 . 0
1 8.0 1.07 . 0
1 12.0 1.00 0.00 1
1 16.0 1.00 0.00 1
1 20.0 1.00 0.00 1

  • Right censoring: when a data point is above a certain value $U$, it is not known by how much:

\( \repy_{ij} = \left\{ \begin{array}{cc} y_{ij} & {\rm if } \ y_{ij}\leq U \\ y_{ij} > U & {\rm otherwise.} \end{array} \right. \)

Column cens is used to indicate if an observation is right-censored (cens=-1) or not (cens=0).

Continuous graf6c.png

1 2.0 1.85 0
1 7.0 2.40 0
1 12.0 3.27 0
1 17.0 3.28 0
1 22.0 3.62 0
1 27.0 3.02 0
1 32.0 3.80 -1
1 37.0 3.80 -1
1 42.0 3.80 -1
1 47.0 3.80 -1


  • Different censoring limits and intervals can be in play at different times and for different individuals.
  • Interval censoring covers any type of censoring, i.e., setting $I=(-\infty,L]$ for left censoring and $I=[U,+\infty)$ for right censoring.

The likelihood needs to be computed carefully in the presence of censored data. To cover all three types of censoring in one go, let $I_{ij}$ be the (finite or infinite) censoring interval existing for individual $i$ at time $t_{ij}$. Then,

\( \begin{eqnarray} \pcypsi(\brepy | \bpsi ) & = & \prod_{i=1}^N \prod_{j=1}^{n_i} \pypsiij(y_{ij} | \psi_i )^{\mathbf{1}_{y_{ij} \notin I_{ij} } } \, \prob{y_{ij} \in I_{ij} | \psi_i}^{\mathbf{1}_{y_{ij} \in I_{ij} } }. \end{eqnarray} \)


\( \prob{y_{ij} \in I_{ij} | \psi_i} = \int_{I_{ij} } \qypsiij(u | \psi_i )\, du \)

We see that if $y_{ij}$ is not censored (i.e., $ \mathbf{1}_{y_{ij} \notin I_{ij}} = 1$), the contribution to the likelihood is the usual $\pypsiij(y_{ij} | \psi_i )$, whereas if it is censored, the contribution is $\prob{y_{ij} \in I_{ij}|\psi_i}$.

Extensions to multidimensional continuous observations

    • Extension to multidimensional observations is straightforward. If $d$ outcomes are simultaneously measured at $t_{ij}$, then $y_{ij}$ is a now a vector in $\Rset^d$ and we can suppose that equation (1) still holds for each component of $y_{ij}$. Thus, for $1\leq m \leq d$,

    \( y_{ijm}=f_m(t_{ij},\psi_i)+ g_m(t_{ij},\psi_i)\teps_{ijm} , \ \ 1\leq i \leq N, \ \ 1 \leq j \leq n_i. \)

    It is then possible to introduce correlation between the components of each observation by assuming that $\teps_{ij} = (\teps_{ijm} , 1\leq m \leq d)$ is a random vector with mean 0 and correlation matrix $R_{\teps_{ij}}$.

    • Suppose instead that $K$ replicates of the same measurement are taken at time $t_{ij}$. Then, the model becomes, for $1 \leq k \leq K$,

    \( y_{ijk}=f(t_{ij},\psi_i)+ g(t_{ij},\bpsi_i)\teps_{ijk} ,\ \ 1\leq i \leq N, \ \ 1 \leq j \leq n_i . \)

    Following what can be done for decomposing random effects into inter-individual and inter-occasion components, we can decompose the residual error into inter-measurement and inter-replicate components:

    \( y_{ijk}=f(t_{ij},\psi_i)+ g_{I\!M}(t_{ij},\psi_i)\vari{\teps}{ij}{I\!M} + g_{I\!R}(x_{ij},\psi_i)\vari{\teps}{ijk}{I\!R} . \)


A model for continuous data is completely defined by:
  • The structural model $f$
  • The residual error model $g$
  • The probability distribution of the residual errors $(\teps_{ij})$
  • Possibly a transformation $\transy$ of the data

The model is associated with a design which includes:

- the observation times $(t_{ij})$

- possibly some additional regression variables $(x_{ij})$

- possibly the inputs $(u_i)$ (e.g., the dosing regimen for a PK model)

- possibly a censoring process $(I_{ij})$

$\mlxtran$ for continuous data models

Example 1:

\(\begin{eqnarray} \psi &=& (A,\alpha,B,\beta, a) \\ f(t,\psi) &=& A\, e^{- \alpha \, t} + B\, e^{- \beta \, t} \\ y_{ij} &=& f(t_{ij} , \psi_i) + a\, \teps_{ij} \end{eqnarray}\)
Monolix icon2.png
input = {A, B, alpha, beta, a}

f = A*exp(-alpha*t) + B*exp(-beta*t)

y = {distribution=normal, prediction=f, std=a}

Example 2:

\( \begin{eqnarray} \psi &=& (\delta, c , \beta, p, s, d, \nu,\rho, a) \\ t_0 &=&0 \\[0.2cm] {\rm if \quad t<t_0} \\[0.2cm] \quad \nitc &=& \delta \, c/( \beta \, p) \\ \quad \itc &=& (s - d\,\nitc) / \delta \\ \quad \vl &=& p \, \itc / c. \\[0.2cm] {\rm else \quad \quad }\\[0.2cm] \quad \dA{\nitc}{} & =& s - \beta(1-\nu) \, \nitc(t) \, \vl(t) - d\,\nitc(t) \\ \quad \dA{\itc}{} & = &\beta(1-\nu) \, \nitc(t) \, \vl(t) - \delta \, \itc(t) \\ \quad \dA{\vl}{} & = &p(1-\rho) \, \itc(t) - c \, \vl(t) \\ \quad \log(y_{ij}) &= &\log(V(t_{ij} , \psi_i)) + a\, \teps_{ij} \end{eqnarray}\)
Monolix icon2.png
input = {delta, c, beta, p, s, d, nu, rho, a}

N_0 = delta*c/(beta*p)
I_0 = (s - d*N_0)/delta
V_0 =  p*I_0/c
ddt_N = s - beta*(1-nu)*N*V - d*N
ddt_I = beta*(1-nu)*N*V - delta*I
ddt_V = p*(1-rho)*I - c*V

y = {distribution=logNormal, prediction=V, std=a}


Davidian, M., Giltinan, D.M. - Nonlinear Models for Repeated Measurements Data

Chapman & Hall., London,1995
Demidenko, E. - Mixed Models: Theory and Applications
Fitzmaurice, G., Davidian, M., Verbeke, G., Molenberghs, G. - Longitudinal Data Analysis
Taylor & Francis,2008
Jiang, J. - Linear and Generalized Linear Mixed Models and Their Applications
Springer Series in Statistics, New York,2007
Laird, N.M., Ware, J.H. - Random-Effects Models for Longitudinal Data
Biometrics 38:963-974,1982
Lindstrom, M.J., Bates, D.M. - Nonlinear mixed-effects models for repeated measures
Biometrics 46:673-687,1990
Littell, R.C. - SAS for mixed models
SAS institute,2006
McCulloch, C.E., Searle, S.R. - Generalized, Linear, and Mixed Models
Wiley & Sons,2004
Verbeke, G., Molenberghs, G. - Linear Mixed Models for Longitudinal Data
West, B., Welch, K.B., Galecki, A.T. - Linear Mixed Models: A Practical Guide Using Statistical Software
Taylor & Francis,2006


Outils personnels
Espaces de noms

Tasks & Tools
Download files
Boîte à outils