Mixture models
Introduction
Mixed-effects models are frequently used for modeling longitudinal data obtained from different individuals from the same population. These models allow us to take into account between-subject variability. One complicating factor arises when the data comes from a population with some underlying heterogeneity. If we assume that the population consists of several homogeneous subpopulations, a straightforward extension of mixed-effects models is a finite mixture of mixed-effects models.
As an example, the use of a mixture of mixed-effects models is particularly relevant when the response of patients to a drug therapy is heterogeneous. In any clinical efficacy trial, patients who respond, partially respond or do not respond at all can be considered different subpopulations with quite different profiles.
The introduction of a categorical covariate (e.g., sex, genotype, treatment, status, etc.) into such a model already supposes that the whole population can be decomposed into subpopulations. The covariate then serves as a label for assigning each individual to a subpopulation. In practice, the covariate may or may not be known.
Mixture models usually refer to models for which the categorical covariate is unknown, but whatever the case, the joint model that brings together all the parts (observations, individual parameters, covariates, labels, design, etc.) is the same. The difference appears when certain tasks need to be performed and in the methods needed to implement them. For instance, the task of simulation makes no distinction between the two situations because all the variables are simulated, whereas model construction is different depending on whether the labels are known or unknown: we have supervised learning if the labels are known and unsupervised learning otherwise.
There exist several types of mixture models which are useful in the context of mixed-effects models, e.g., mixtures of distributions, mixtures of residual error models, and mixtures of structural models. Indeed, heterogeneity in the response variable cannot always be adequately explained by interpatient variability of certain parameters alone. It can therefore be necessary to introduce diversity into the structural models themselves:
- Between-subject model mixtures assume that there exist subpopulations of individuals. Here, various structural models describe the responses of the different subpopulations, and each subject belongs to one subpopulation. One can imagine, for example, different structural models for responders, non-responders and partial responders to a given treatment.
- Within-subject model mixtures assume that there exist subpopulations (of cells, viruses, etc.) within each patient. Again, differing structural models describe the responses of the different subpopulations, but the proportion of each subpopulation depends on the patient.
Mixtures of mixed-effects models
For the sake of simplicity, we will consider a basic model that involves individual parameters $\bpsi=(\psi_i,1\leq i \leq N)$ and observations $\by=(y_i,1\leq i \leq N)$, where $y_i=(y_{ij},1\leq j \leq n_i)$. The simplest way to define a finite mixture model is to introduce a label sequence $\bz=(z_i , 1\leq i \leq N)$ that takes its values in $\{1,2,\ldots,M\}$ and is such that $z_i=m$ if subject $i$ belongs to subpopulation $m$.
In some situations, the label set $\bz$ is known and can then be used as a categorical covariate in the model. If $\bz$ is known and considered to be the realization of a random vector, the model is the conditional distribution
\(
\pcypsiz(\by,\bpsi \,|\, \bz;\theta) = \pccypsiz(\by \,|\, \bpsi , \bz)\pcpsiz(\bpsi \,|\, \bz;\theta) .
\)

(1) 
If $\bz$ is unknown, it is modeled as a random vector and the model is the joint distribution
\(
\pypsiz(\by,\bpsi, \bz;\theta) = \pcypsiz(\by,\bpsi \,|\, \bz;\theta)\pz(\bz;\theta) .
\)

(2) 
We therefore consider that $\bz=(z_i)$ is a set of independent random variables taking values in $\{1,2,\ldots,M\}$: for $i=1,2,\ldots, N$, there exist probabilities $\pw_{i,1},\pw_{i,2},\ldots,\pw_{i,M}$, with $\sum_{m=1}^{M} \pw_{i,m}=1$, such that
\(
\prob{z_i = m} = \pw_{i,m} , \quad m=1,2,\ldots,M .
\)
A simple model might assume that the $(z_i)$ are identically distributed: $\pw_{i,m} = \pw_{m}$ for $m=1,\ldots,M$. But more complex models can be considered, assuming for instance that an individual's probabilities depend on its covariate values.
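These label probabilities translate directly into simulation code. Below is a minimal Python sketch (all numerical values are made up) that draws labels $z_i$ first with identical proportions for every individual, and then with individual probabilities driven by a covariate through a softmax link; the softmax link is an illustrative assumption, not something prescribed above.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 1000, 2

# Identically distributed labels: the same proportions for every individual
pi = np.array([0.35, 0.65])
z = rng.choice(M, size=N, p=pi)          # z[i] in {0, ..., M-1}

# Covariate-dependent probabilities via a softmax link (an illustrative
# modeling choice, not prescribed by the text)
c = rng.normal(size=N)                   # one covariate value per individual
logits = np.stack([np.zeros(N), 0.5 * c], axis=1)
pi_i = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
z_cov = np.array([rng.choice(M, p=pi_i[i]) for i in range(N)])
```

Each row of `pi_i` sums to one, so each individual has their own valid label distribution.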
In its most general form, a mixture of mixed-effects models assumes that there exist $M$ joint distributions $\pyipsii_{1}$, ..., $\pyipsii_{M}$ and vectors of parameters $\theta_1$, ..., $\theta_M$ such that for any individual $i$, the joint distribution of $y_i$ and $\psi_i$ is
\(
\pyipsii(y_i,\psi_i;\theta) = \sum_{m=1}^M \prob{z_i = m} \, \pyipsii_{m}(y_i,\psi_i;\theta_m) ,
\)
where $\pyipsii_{m}$ is the joint distribution of $(y_i,\psi_i)$ in group $m$ and where $\theta=(\theta_1,\ldots,\theta_M)$.
The distribution of the observations $y_i$ is therefore itself a mixture of $M$ distributions:
\(\begin{array}{ccl}
\pyi(y_i;\theta) & = & \displaystyle{ \int \pyipsii(y_i,\psi_i;\theta) \, d \psi_i } \\
& = & \displaystyle{ \sum_{m=1}^M \prob{z_i = m} \left( \int \pyipsii_{m}(y_i,\psi_i;\theta_m) \, d \psi_i \right) } .
\end{array}\)

(3) 
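The hierarchical structure behind (3) gives a direct recipe for simulation: draw the label $z_i$, then the individual parameter $\psi_i$ given $z_i$, then the observations $y_i$ given $\psi_i$. A minimal Python sketch, assuming a scalar lognormal parameter in each group, a toy structural model and a constant error model (all numerical values are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
N, n_obs, M = 200, 5, 2

pi = np.array([0.35, 0.65])                 # P(z_i = m)
mu = np.array([np.log(70.0), np.log(42.0)]) # group means of log(psi)
omega, a = 0.3, 0.5                         # random-effect sd, residual sd
t = np.linspace(1.0, 10.0, n_obs)           # common observation times

z = rng.choice(M, size=N, p=pi)                   # 1) labels
psi = np.exp(mu[z] + omega * rng.normal(size=N))  # 2) individual parameters
f = 100.0 / psi[:, None] * np.exp(-0.1 * t)       # 3) a toy structural model
y = f + a * rng.normal(size=(N, n_obs))           # 4) observations
```

Marginally, each `y[i]` is then a draw from the mixture distribution (3), even though the code never evaluates that mixture density explicitly.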
The mixture can then be looked at via the distribution of the individual parameters $\qpsii$ and/or the conditional distribution of the observations $\qcyipsii$. Let us now see some examples of such mixture models.
- A latency structure can be introduced at the individual parameter level:
\(
\ppsii(\psi_i;\theta) = \sum_{m=1}^M \prob{z_i = m} \, \ppsii_{m}(\psi_i;\theta_m) ,
\)
where $\ppsii_{m}(\psi_i;\theta_m)$ is the distribution of the individual parameters in group $m$. For example, a mixture of linear Gaussian models for the individual parameters assumes that there exist $M$ population parameters $\psi_{{\rm pop},1}, \ldots, \psi_{{\rm pop},M}$, vectors of coefficients $\beta_{1}, \ldots, \beta_{M}$, variance matrices $\Omega_{1}, \ldots, \Omega_{M}$ and transformations $h_1,\ldots,h_M$ such that, if $z_i=m$,
\(
h_m(\psi_i) \sim {\cal N}\left( \mu_m , \Omega_m \right) ,
\)
where $\mu_m = h_m(\psi_{ {\rm pop},m})+ \langle \beta_m , c_i \rangle$.
 This is the most general representation possible because it allows the transformation, population parameters, covariate model and variance-covariance structure of the random effects all to vary from one group to the next. A simpler representation would fix one or all of these across the groups.
- A latency structure can also be introduced at the level of the conditional distribution of the observations $(y_{ij})$:
\(
\pcyipsii(y_i \,|\, \psi_i;\theta) = \sum_{m=1}^M \prob{z_i = m} \, \pcyipsii_{m}(y_i \,|\, \psi_i;\theta_m) ,
\)
where $\pcyipsii_{m}$ is the conditional distribution of the observations in group $m$. For example, the model for continuous data
\(
y_{ij} = f\left( t_{ij};\psi_i,z_i \right) + g\left( t_{ij};\psi_i,z_i \right)\teps_{ij} ,
\)
with $\teps_{ij} \sim {\cal N}(0,1)$, can be equivalently represented as
\(
y_{ij} = f_m\left( t_{ij};\psi_i \right) + g_m\left( t_{ij};\psi_i \right)\teps_{ij} \quad {\rm if} \ z_i = m ,
\)
for each $m=1,\ldots,M$. A mixture of conditional distributions therefore reduces to a mixture of structural models and/or residual error models. To give a precise example, a mixture of constant error models would assume that
\(
y_{ij} = f\left( t_{ij};\psi_i \right) + a_{z_i} \teps_{ij} .
\)
- Alternatively, between-subject model mixtures (BSMM) assume that the structural model is a mixture of $M$ different structural models:
\(
y_{ij} = f\left( t_{ij};\psi_i,z_i \right) + g\left( t_{ij};\psi_i,z_i \right)\teps_{ij}
\)

(4) 
\(
f\left( t_{ij};\psi_i,z_i \right) = \sum_{m=1}^M \one_{z_i = m} f_m\left( t_{ij};\psi_i \right) .
\)

(5) 
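Equations (4) and (5) say that each individual's observations are generated by the structural model of their own group. The sketch below illustrates this with two hypothetical structural models and a constant error model; the functions and numerical values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.linspace(0.0, 10.0, 20)

def f1(t, psi):
    """Hypothetical structural model for group 1 (exponential decay)."""
    return psi[0] * np.exp(-psi[1] * t)

def f2(t, psi):
    """Hypothetical structural model for group 2 (hyperbolic decay)."""
    return psi[0] / (1.0 + psi[1] * t)

def f_bsmm(t, psi, z):
    """f(t; psi_i, z_i) = sum_m 1_{z_i = m} f_m(t; psi_i), as in (5)."""
    return [f1, f2][z](t, psi)

# One individual from group 1, with a constant error model g = a (eq. (4))
psi_i, z_i, a = np.array([5.0, 0.4]), 0, 0.1
y_i = f_bsmm(t, psi_i, z_i) + a * rng.normal(size=t.size)
```

Only the label `z_i` decides which structural model is used; the same individual parameters could feed either model.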
Example 1: Mixtures of normal distributions
We consider here a simple PK model for a single oral administration:
\(
f(t ; ka, ke, V) = \frac{D \, ka}{V(ka-ke)} \left( e^{-ke \, t} - e^{-ka \, t} \right) ,
\)
where $D$ is the administered dose.
Here, the PK parameters are the absorption rate constant $ka$, the elimination rate constant $ke$ and the volume of distribution $V$.
We can model the PK parameters $\psi_i=(ka_i,V_i, ke_i)$ of individual $i$ randomly chosen from the population as a vector of independent random parameters.
The figure below shows the distribution obtained for the volume when it is modeled as a mixture of two lognormal distributions:
Two lognormal distributions $p_1$ and $p_2$ for the volume, and the mixture of these two distributions

Here, the structural model $f$ is a function of time and $f( t ; \psi_i)$ is the predicted concentration of the drug in individual $i$ at time $t$.
Then, $f( \, \cdot \, ; \psi_i)$ is a random function because it depends on a random parameter $\psi_i$.
The probability distribution of $f( \, \cdot \, ; \psi_i)$ is therefore that of the concentration predicted by the model. It represents the interindividual variability of the drug's pharmacokinetics in the population.
The figure below displays prediction intervals for the concentration $f( \, \cdot \, ; \psi_i)$ for one individual $i$ randomly chosen from the population, where $\psi_i=(ka_i,V_i,ke_i)$ are the PK parameters. In other words, this plot allows us to visualize the impact of the interindividual variability of the individual PK parameters on the exposure to the drug. Here, $V_i$ is a mixture of two lognormal distributions as described above, while $ka_i$ and $ke_i$ have lognormal distributions:
Prediction intervals for the predicted concentration kinetics $f( \, \cdot \, ; \psi_i)$

Here, the distribution of $f( \, \cdot \, ; \psi_i)$ is itself a mixture of 2 distributions since the distribution of $\psi_i$ is a mixture of distributions due to $V_i$. It is interesting to see the distribution of the predicted concentration in each subpopulation. Indeed, any individual $i$ will either have a log-volume from ${\cal N}(\log(70) , 0.3^2)$ (with probability 0.35) or a log-volume from ${\cal N}(\log(42) , 0.3^2)$ (with probability 0.65), so in order to visualize what really happens for a single individual $i$, we need to split the data into two plots: 35% of the individuals will have concentration kinetics distributed as in the left plot, and 65% as in the right one.
The probability distribution of the predicted concentration kinetics $f( \, \cdot \, ; \psi_i)$ in the two subpopulations
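This per-subpopulation view can be reproduced by Monte Carlo simulation: simulate many individuals, record which mixture component generated each volume, and summarize the concentration curves component by component. In the Python sketch below, the volume mixture follows the values given in the text, while the dose and the distributions of $ka_i$ and $ke_i$ are made-up assumptions, and the structural model is the standard one-compartment model with first-order absorption:

```python
import numpy as np

rng = np.random.default_rng(4)
N, D = 5000, 100.0                       # dose D is a made-up value
t = np.linspace(0.5, 24.0, 48)

# Mixture for the volume (values from the text): two lognormal components
comp = rng.choice(2, size=N, p=[0.35, 0.65])
logV_mean = np.where(comp == 0, np.log(70.0), np.log(42.0))
V = np.exp(logV_mean + 0.3 * rng.normal(size=N))

# ka and ke lognormal; population values and sds are made up
ka = np.exp(np.log(1.0) + 0.3 * rng.normal(size=N))
ke = np.exp(np.log(0.1) + 0.3 * rng.normal(size=N))

# One-compartment model with first-order absorption
f = (D * ka / (V * (ka - ke)))[:, None] * (
    np.exp(-ke[:, None] * t) - np.exp(-ka[:, None] * t)
)

# Per-component 90% prediction bands (5th, 50th, 95th percentiles)
bands = {m: np.percentile(f[comp == m], [5, 50, 95], axis=0) for m in (0, 1)}
```

Plotting `bands[0]` and `bands[1]` side by side reproduces the two-panel split described above: roughly 35% of individuals fall in the first component and 65% in the second.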

Example 2: Mixtures of structural models
Here we are interested in a study of treated HIV-infected patients. The data are the viral load evolutions of these patients. The figure below gives examples of patients with one of three "characteristic" viral load progressions:
- Non-responders (1) show no decline in viral load.
- Responders (2) exhibit a sustained viral load decline.
- Rebounders (3 and 4) exhibit an initial drop in viral load, then a rebound to higher viral load levels.
Viral load progression for 4 HIVinfected patients.
(1) non-responder; (2) responder; (3) and (4) rebounders. Red points indicate data below the level of quantification.
Within a few months of HIV infection, patients typically enter a steady state of chronic infection and have a stabilized concentration of HIV-1 in blood plasma. This concentration is modeled by an individual constant $A_{i,0}$. When antiretroviral treatment starts, the viral load of patients who respond shows an initial rapid exponential decay, usually followed by a slower second phase of exponential decay.
This two-phase decay in viral load can be approximated by the biexponential model $A_{1} e^{-\lambda_{1} t} + A_{2} e^{-\lambda_{2} t}$.
After the decrease in viral load level, some subjects show a rebound, which can be due to several factors (non-adherence to the therapy, emergence of drug-resistant virus strains, etc.). We propose to extend the biexponential model to these patients by adding a third phase, characterized by a logistic growth process $A_{3}/({1+e^{-\lambda_{3}(t-\tau)}})$, where $\tau$ is the inflection point of this growth process.
We can then describe the log-transformed viral load with a BSMM built from three simple models, corresponding to the three characteristic viral load progressions:
\(\begin{array}{ccl}
f_1(t;\psi_i) & = & A_{i,0} \\
f_2(t;\psi_i) & = & A_{i,1} e^{-\lambda_{i,1} t} + A_{i,2} e^{-\lambda_{i,2} t} \\
f_3(t;\psi_i) & = & A_{i,1} e^{-\lambda_{i,1} t} + A_{i,2} e^{-\lambda_{i,2} t} + \displaystyle{ \frac{A_{i,3}}{1+e^{-\lambda_{i,3}(t-\tau_i)}} } .
\end{array}\)
The log-transformed viral load can then be modeled by
\(
y_{ij} = \log \left( f(t_{ij};\psi_i,z_i) \right) + a \, \teps_{ij} ,
\)
where $f$ is the between-subject mixture of $f_1$, $f_2$ and $f_3$ given by (5), $y_{ij}$ is the log viral load for subject $i$ at time $t_{ij}$, and $\psi_i=(A_{i,0},A_{i,1},A_{i,2},A_{i,3},\lambda_{i,1},\lambda_{i,2},\lambda_{i,3},\tau_i)$ is the vector of individual parameters.
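The three progression models are straightforward to code. The Python sketch below follows the parameterization described above (a constant level, a biexponential decay, and a biexponential decay plus a logistic growth term); the parameter values in the example are made up:

```python
import numpy as np

def f1(t, psi):
    """Non-responder: constant viral load A_0."""
    return psi["A0"] * np.ones_like(t)

def f2(t, psi):
    """Responder: biexponential decay."""
    return psi["A1"] * np.exp(-psi["l1"] * t) + psi["A2"] * np.exp(-psi["l2"] * t)

def f3(t, psi):
    """Rebounder: biexponential decay plus a logistic growth phase."""
    rebound = psi["A3"] / (1.0 + np.exp(-psi["l3"] * (t - psi["tau"])))
    return f2(t, psi) + rebound

t = np.linspace(0.0, 200.0, 50)
psi = {"A0": 1e5, "A1": 9e4, "A2": 1e4, "A3": 5e4,   # made-up values
       "l1": 0.3, "l2": 0.03, "l3": 0.1, "tau": 120.0}
```

Evaluating `f1`, `f2` and `f3` on the same time grid with plausible parameters reproduces the three characteristic shapes seen in the figure.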
The figure below displays the predicted viral loads for the 4 patients using model $f_1$ for patient 1, $f_2$ for patient 2 and $f_3$ for patients 3 and 4, with the parameters given to the right of the figure:
Not all observed viral load progressions fall so easily into one of the three classes, as for example the patients shown in the next figure.
Viral load data for 4 patients with ambiguous progressions

In these cases, it does not seem so reasonable to model the data under the BSMM assumption that each patient belongs exclusively to one class. Instead, it is perhaps more natural to suppose that each patient is partially responding, partially non-responding and partially rebounding to the given drug treatment. The goal then becomes to find the relative strength of each process in each patient, and a within-subject model mixture (WSMM) is an ideal tool for doing this.
Without going further into the details, here are the resulting observed and predicted viral loads for these 4 individuals when each individual's response is a mixture of the three viral load progressions.
Observed and predicted viral load progression over time for 4 patients, using a WSMM
