Stochastic differential equations based models
Sommaire 
Introduction
Diffusion models are known to be a relevant tool for modeling stochastic dynamic phenomena, and are widely used in various fields including finance, physics, biology, physiology and control. In a population approach, a mixedeffects diffusion model describes each individual series of observations using a system of stochastic differential equations (SDE) while also taking into account variability between individuals.
For the sake of simplicity we will consider first a diffusion model for a single individual, and illustrate it with a very general dynamical system with linear transfers and PK examples. We will then show that the extension to mixed diffusion models is fairly straightforward.
Note that the conditional distribution $\qcypsi$ of the observations usually does not have a closedform expression. When the underlying system is a Gaussian linear dynamical one, the conditional pdf of the observations, $\pcypsi(y_i\psi_i)$ can be computed using the Kalman filter (KF). When the system is not linear, the extended Kalman filter (EKF) provides an approximation of the conditional pdf.
Diffusion model
We assume that one diffusion trajectory is observed with noise at discrete time points $t_1<\ldots<t_j<\ldots<t_n$. Let us note $(X(t),t>0) \in \Rset^d$ the underlying dynamical process and $y_j \in \Rset$ a noisy function of $X(t_j)$, $j=1,\ldots,n$. The general form of the diffusion model is given by:
\(
\left\{
\begin{array}{lll}
dX(t) &=& b(X(t),\psi)dt + \gamma(X(t),\psi)dW(t)\\[0.2cm]
y_{j} &=& c(X(t_{j}),\psi) + \varepsilon_{j} \\[0.2cm]
\varepsilon_{j} &\underset{i.i.d.}{\sim}& \mathcal{N}(0,a^2(\psi)), \quad j=1,\ldots,n ,
\end{array}
\right. \)

(1) 
with the initial condition $X(t_1) = x \in \Rset^d$. Here, $(W(t),t>0)$ is a standard Wiener process in $\Rset^d$ and $\varepsilon_j \in \Rset$ represents the measurement error occurring at the $j^{\mathrm{th}}$ observation, independent of $W(t)$. The measurement function $c: \ \Rset^d \times \Rset^p \rightarrow \Rset$, the drift function $b: \ \Rset^d \times \Rset^p \rightarrow \Rset^d$ and the diffusion function $\gamma: \ \Rset^d \times \Rset^p \rightarrow \mathcal{M}_d(\Rset)$, where $\mathcal{M}_d(\Rset)$ is the set of $d \times d$ matrices with real elements, are known functions that depend on an unknown parameter $\psi \in \Rset^p$.
We can in fact consider an SDEbased model as a ODEbased one with a stochastic component.
These kinetics are clearly stochastic. Nevertheless, they are not realistic because:
 they give an overly erratic description of the evolution of the drug concentration within the compartments of the human body.
 they do not comply with certain constraints on biological dynamics (sign, monotony).
A more relevant model might consider that some parameters of the model randomly fluctuate over time, rather than the observed variable itself, modeling for example the elimination rate "constant" $k$ as a stochastic process $k(t)$ that randomly varies around a typical value $k^\star$.
More generally, we can describe the fluctuations within a linear dynamical systems by considering the transfer rates, described below, as diffusion processes rather than the observed processes themselves.
Diffusion models for dynamical systems with linear transfers
Dynamical systems have applications in many fields. They can be used to model viral dynamics, population flows, interactions between cells, and drug pharmacokinetics. Dynamical systems involving linear transfers between different entities are usually modeled by means of a system of ODEs with the following general form:
\(
dA(t) = K\, A(t)dt,
\)

(4) 
where $A(t)$ is a vector whose $l^{\textrm{th}}$ component represents the condition of the $l^{\textrm{th}}$ entity at time $t$ and $K=(K_{l,l^\prime} \, 1\leq l , l^\prime \leq d)$ a deterministic matrix defined as:
\(
\left\{
\begin{array}{ll}
K_{l,l^\prime} = k_{l,l^\prime} & \textrm{if} \quad l \neq l^\prime\\
K_{l,l} =  k_{l,0}  \sum_{l^\prime} k_{l,l^\prime} ,
\end{array}
\right.
\)

(5) 
where $k_{l,l^\prime}$ represents the transfer rate from entity $l$ to entity $l^\prime$, and $k_{l,0}$ the elimination rate from entity $l$. An example of such a dynamical system with $3$ components is schematized below.
A dynamical system with $3$ components (circles) and linear transfers between components (arrows)

In this particular example, matrix $K$ would be defined as
The model defined by equations (4) and (5) is a deterministic model which assumes that transfers take place at the same rate at all times. This is often a restrictive assumption since in reality, dynamical systems usually exhibit some random behavior. It is therefore reasonable to consider that transfers are not constant but randomly fluctuate over time. This new assumption leads to the following dynamical system:
\(
dA(t) = K(t)A(t)dt,
\)

(6) 
where $K$ has the same structure as in (5) but now some components $k_{l,l^\prime}$ are stochastic processes which take nonnegative values and randomly fluctuate around a typical value $k_{l,l^\prime}^\star$.
Let us now illustrate the construction of such diffusion models using some specific examples in pharmacokinetics.
In both examples, the diffusion model can be easily extended to a population approach by defining the system's parameters $\psi$ as an individual random vector.
Mixedeffects diffusion models
Let us now consider model (1) with observations coming from several subjects. An adequate adaptation of model (1) in such a context consists of considering as many dynamical systems as individuals, and defining the parameters of the individual dynamical systems as independent random variables, in such a way to correctly reflect variability between the different trajectories. To standardize notation, we consider $N$ different subjects randomly chosen from a population and note $n_i$ the number of observations for individual $i$, so that $t_{i1}<\ldots<t_{i,n_i}$ are subject $i$'s observation time points. $(X_i(t),t>0) \in \Rset^d$ and $y_{ij} \in \Rset$ will respectively denote individual $i$'s diffusion and the observation $X_i(t_{ij})$. The $y_{ij}$, $i=1,\ldots,N$, $j=1,\ldots,n_i$ are governed by a mixedeffects model based on a $d$dimensional realvalued system of stochastic differential equations with the general form:
\(
\left\{
\begin{array}{l}
dX_i(t) = b(X_i(t),\psi_i)dt + \gamma(X_i(t),\psi_i)dW_i(t),\\[0.2cm]
y_{ij} = c(X_i(t_{ij}),\psi_i) + \teps_{ij},\\[0.2cm]
\teps_{ij} \underset{i.i.d.}{\sim} \mathcal{N}(0,a^2(\psi_i)) \; , \; j=1,\ldots, n_i \; , \; i=1,\ldots,N,\\
\end{array}
\right.
\)

(8) 
with initial condition $X_i(t_1) = x_{i1} \in \Rset^d$ for $i=1,\ldots,N$. The $\psi_i$'s are unobserved independent $d$dimensional random subjectspecific parameters, drawn from a distribution $\qpsi$ which depends on a set of population parameters $\theta$, $(W_1(t),t>0), \ldots, (W_N(t),t>0)$ are standard independent Wiener processes, and the $\teps_{ij}$ are independent Gaussian random variables representing residual errors such that the $\psi_i$, $W_i$ and $\teps_{ij}$ are mutually independent. The measurement function $c$, the drift function $b$ and the diffusion function $\gamma$ are known functions that are common to the $N$ subjects and depend on the unknown parameters $\psi_i$.
Assuming that the $N$ individuals are independent, the joint pdf is given by:
\(
\pcypsi(y_1,\ldots,y_N  \psi_1,\ldots,\psi_N) = \prod_{i=1}^{N}\pcyipsii(y_i  \psi_i).
\)

(9) 
Computing the conditional distribution $\pcyipsii$ of the observations for any individual $i$ requires here to compute the conditional distribution of each observation given the past:
Except in some very specific classes of mixedeffects diffusion models, the transition density $\pmacro(y_{i,j}y_{i,1},\ldots,y_{i,j1}  \psi_i)$ does not have a closedform expression since it involves the transition densities of the underlying diffusion processes $X_i$. When the underlying system is a Gaussian linear dynamical system, this density is a Gaussian density whose mean and variance can be computed using the Kalman filter. When the system is not linear, a first solution consists in approximating this density by a Gaussian density and using the extended Kalman filter for quickly computing the mean and the variance of this density. On the other hand, particle filters do not make any approximations of the transition density, but are very demanding in terms of simulation volume and computation time.
Bibliography
Delattre, M., Lavielle, M.  Coupling the SAEM algorithm and the extended Kalman filter for maximum likelihood estimation in mixedeffects diffusion models
 Statistics and Its Interface ,2013
 REVSTAT Statistical Journal 3:137153,2005
 ESAIM: Probability and Statistics 12:196218,2008
 Oxford Handbook of Nonlinear Filtering ,2011
 Computer methods and programs in biomedicine 94:279289,2009
 Journal of Pharmacokinetics and Pharmacodynamics 32:109141,2005
 Computational Statistics 23:51939,2008
Mortensen, S. B., Klim, S., Dammann, B., Kristensen, N. R., Madsen, H., Overgaard, R. V.  A Matlab framework for estimation of NLME models using stochastic differential equations  Applications for estimation of insulin secretion rates
 Journal of Pharmacokinetics and Pharmacodynamics 34:623642,2007
 Journal of Pharmacokinetics and Pharmacodynamics 32:85107,2005
 Scandinavian Journal of Statistics 37:6790,2010
 Computational Statistics and Data Analysis 55(3):14261444,2011
 Pharmaceutical Research 22:12471258,2005