Introduction
A model built for real-world applications can involve various types of variables, such as measurements, individual and population parameters, covariates, design, etc. The model allows us to represent relationships between these variables.
If we consider things from a probabilistic point of view, some of the variables will be random, so the model becomes a probabilistic one, representing the joint distribution of these random variables.
Defining a model therefore means defining a joint distribution. The hierarchical structure of the model will then allow it to be decomposed into submodels, i.e., the joint distribution decomposed into a product of conditional distributions.
Tasks such as estimation, model selection, simulation and optimization can then be expressed as specific ways of using this probability distribution.
 A model is a joint probability distribution.
 A submodel is a conditional distribution derived from this joint distribution.
 A task is a specific use of this distribution.
We will illustrate this approach starting with a very simple example that we will gradually make more sophisticated. Then we will see in various situations what can be defined as the model and what its inputs are.
An illustrative example
A model for the observations of a single individual
Let $y=(y_j, 1\leq j \leq n)$ be a vector of observations obtained at times $ t=(t_j, 1\leq j \leq n)$. We consider that the $y_j$ are random variables and we denote by $p_y$ the distribution (or pdf) of $y$. If we assume a parametric model, then there exists a vector of parameters $\psi$ that completely defines the distribution of $y$.
We can then explicitly represent this dependency with respect to ${\bf \psi}$ by writing $p_y( \, \cdot \, ; \psi)$ for the pdf of $y$.
If we wish to be even more precise, we can also make it clear that this distribution is defined for a given design, i.e., a given vector of times $ t$, and write $ p_y(\, \cdot \, ; \psi, t)$ instead.
By convention, the variables which are before the symbol ";" are random variables. Those that are after the ";" are nonrandom parameters or variables.
When there is no risk of confusion, the nonrandom terms can be left out of the notation.
In this context, the model is the distribution of the observations $p_y(\, \cdot \, ; \psi, t)$.
The inputs of the model are the parameters $\psi$ and the design $ t$.
Example:
500 mg of a drug is given by intravenous bolus to a patient at time 0. We assume that the evolution of the plasma concentration of the drug over time is described by the pharmacokinetic (PK) model
\( f(t;V,k) = \displaystyle{ \frac{500}{V} }e^{-k \, t} , \)
where $V$ is the volume of distribution and $k$ the elimination rate constant. The concentration is measured at times $(t_j, 1\leq j \leq n)$ with additive residual errors:
\( y_j = f(t_j;V,k) + e_j , \quad 1 \leq j \leq n . \)
Assuming that the residual errors $(e_j)$ are independent and normally distributed with constant variance $a^2$, the observed values $(y_j)$ are also independent random variables and
\(
y_j \sim {\cal N} \left( f(t_j ; V,k) , a^2 \right), \quad 1 \leq j \leq n. \qquad (1.4) \)
Here, the vector of parameters $\psi$ is $(V,k,a)$. $V$ and $k$ are the PK parameters for the structural PK model and $a$ the residual error parameter.
As the $y_j$ are independent, the joint distribution of $y$ is the product of their marginal distributions:
\( p_y(y ; \psi, t) = \prod_{j=1}^n p_j(y_j ; \psi,t_j) , \)
where $p_j$ is the normal distribution defined in (1.4).
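The following minimal Python sketch simulates observations from this model and evaluates $\log p_y(y;\psi,t)$ as a sum of normal log-densities, which is the log of the product above. The values of $V$, $k$, $a$ and the sampling times are hypothetical, and the function names are illustrative rather than taken from any library.

```python
import math
import random

DOSE = 500.0  # mg, intravenous bolus at time 0, as in the example


def conc(t, V, k):
    """Structural PK model f(t; V, k) = (DOSE / V) * exp(-k * t)."""
    return DOSE / V * math.exp(-k * t)


def simulate(times, V, k, a, rng):
    """Draw y_j = f(t_j; V, k) + e_j with independent e_j ~ N(0, a^2)."""
    return [conc(t, V, k) + rng.gauss(0.0, a) for t in times]


def log_likelihood(y, times, V, k, a):
    """log p_y(y; psi, t): log of the product of the N(f(t_j), a^2) densities."""
    ll = 0.0
    for yj, tj in zip(y, times):
        r = yj - conc(tj, V, k)
        ll += -0.5 * math.log(2 * math.pi * a**2) - r**2 / (2 * a**2)
    return ll


# Hypothetical design and parameter values, for illustration only
times = [0.5, 1, 2, 4, 8, 12]
V, k, a = 10.0, 0.2, 0.5
y = simulate(times, V, k, a, random.Random(1))
print(log_likelihood(y, times, V, k, a))
```

Because the $y_j$ are independent, the log-likelihood is a plain sum over $j$. Working on the log scale avoids the numerical underflow that multiplying many small densities would cause.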
A model for several individuals
Now let us move to $N$ individuals. It is natural to suppose that each is represented by the same basic parametric model, but not necessarily with the same parameter values. Thus, individual $i$ has parameters $\psi_i$. If we consider that individuals are randomly selected from the population, then we can treat the $\psi_i$ as random vectors. As both ${\bf y}=(y_i , 1\leq i \leq N)$ and ${\bf \psi}=(\psi_i , 1\leq i \leq N)$ are random, the model is now a joint distribution: $\mathrm{p}({\bf y},{\bf \psi})$. By the definition of conditional probability, this can be written as:
\(
{\mathrm p}({\bf y},{\bf \psi}) = \mathrm{p}({\bf y} | {\bf \psi}) \, \mathrm{p}({\bf \psi}) .\)
If $p_\psi$ is a parametric distribution that depends on a vector $\theta$ of population parameters and a set of individual covariates ${\bf c}=(c_i , 1\leq i \leq N)$, this dependence can be made explicit by writing $p_\psi(\, \cdot \,;\theta,{\bf c})$ for the pdf of ${\bf \psi}$.
Each individual $i$ has a potentially unique set of measurement times $t_i=(t_{i1},\ldots,t_{i n_i})$ in the design, and the number of observations $n_i$ can differ from one individual to another.
 In this context, the model is the joint distribution of the observations and the individual parameters:
\( {\mathrm p}({\bf y} , {\bf \psi}; \theta, {\bf c},{\bf t})= {\mathrm p}({\bf y} | {\bf \psi};{\bf t}) \, \mathrm{p}({\bf \psi};\theta,{\bf c}) . \)
 The inputs of the model are the population parameters $\theta$, the individual covariates ${\bf c}=(c_i , 1\leq i \leq N)$ and the measurement times
 ${\bf t}=(t_{ij} ,\ 1\leq i \leq N ,\ 1\leq j \leq n_i)$.
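The factorization can be illustrated with a small simulation. This sketch makes several assumptions purely for illustration:

- It reuses the PK model of the single-individual example.
- It takes $\psi_i=(V_i,k_i)$ to be log-normal, with a typical volume that scales with a weight covariate $c_i$. The population parameters `V_pop`, `k_pop`, `omega_V` and `omega_k` are hypothetical names.
- It keeps the residual parameter $a$ fixed and known.

The code first draws each $\psi_i$ from $\mathrm{p}(\psi;\theta,{\bf c})$ and then draws $y_i$ given $\psi_i$. It evaluates the joint log-density as the sum of the two log-factors.

```python
import math
import random

DOSE = 500.0  # mg, same bolus PK model as in the single-individual example


def conc(t, V, k):
    return DOSE / V * math.exp(-k * t)


def normal_logpdf(x, mean, sd):
    return -0.5 * math.log(2 * math.pi * sd**2) - (x - mean) ** 2 / (2 * sd**2)


def log_p_psi(psi_i, theta, c_i):
    """log p(psi_i; theta, c_i) under an ASSUMED log-normal model
    (density with respect to log V_i and log k_i):
    log V_i ~ N(log(V_pop * c_i / 70), omega_V^2), log k_i ~ N(log k_pop, omega_k^2)."""
    V, k = psi_i
    return (normal_logpdf(math.log(V), math.log(theta["V_pop"] * c_i / 70.0), theta["omega_V"])
            + normal_logpdf(math.log(k), math.log(theta["k_pop"]), theta["omega_k"]))


def log_p_y_given_psi(y_i, t_i, psi_i, a):
    """log p(y_i | psi_i; t_i): independent N(f(t_ij; V_i, k_i), a^2) observations."""
    V, k = psi_i
    return sum(normal_logpdf(yij, conc(tij, V, k), a) for yij, tij in zip(y_i, t_i))


def simulate(theta, c, t, a, rng):
    """Draw psi_i ~ p(psi; theta, c_i), then y_i given psi_i, for each individual."""
    psi, y = [], []
    for c_i, t_i in zip(c, t):
        V = theta["V_pop"] * c_i / 70.0 * math.exp(rng.gauss(0.0, theta["omega_V"]))
        k = theta["k_pop"] * math.exp(rng.gauss(0.0, theta["omega_k"]))
        psi.append((V, k))
        y.append([conc(tij, V, k) + rng.gauss(0.0, a) for tij in t_i])
    return psi, y


def log_joint(y, psi, theta, c, t, a):
    """log p(y, psi; theta, c, t) = log p(y | psi; t) + log p(psi; theta, c)."""
    return sum(log_p_y_given_psi(y_i, t_i, psi_i, a) + log_p_psi(psi_i, theta, c_i)
               for y_i, t_i, psi_i, c_i in zip(y, t, psi, c))


# Hypothetical population parameters, covariates (weights in kg) and designs;
# n_i differs between individuals.
theta = {"V_pop": 10.0, "k_pop": 0.2, "omega_V": 0.3, "omega_k": 0.2}
a = 0.5
c = [60.0, 75.0, 90.0]
t = [[1, 2, 4, 8], [0.5, 3, 12], [1, 6]]
psi, y = simulate(theta, c, t, a, random.Random(0))
print(log_joint(y, psi, theta, c, t, a))
```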
Remarks
Approximating the distribution of the ratio $(\hat{\psi}_k - \psi_k)/\widehat{\rm s.e}(\hat{\psi}_k)$ by a standard normal distribution is a good approximation only when the number of observations $n$ is large; a better approximation should be used for small $n$. In the model $y_j = f(t_j ; \phi) + a\varepsilon_j$, the distribution of $\hat{a}^2$ can be approximated by a scaled chi-square distribution with $n-d_\phi$ degrees of freedom, where $d_\phi$ is the dimension of $\phi$. The quantiles of the normal distribution can then be replaced by those of a Student's $t$-distribution with $n-d_\phi$ degrees of freedom.
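The effect of this correction on a confidence interval $\hat{\psi}_k \pm q\,\widehat{\rm s.e}(\hat{\psi}_k)$ can be seen numerically. Python's standard library has no Student's $t$ quantile function, so this sketch computes one by Simpson's rule and bisection; in R, `qt()` does this directly. `wald_interval` is a hypothetical helper name, and $d_\phi = 3$ in the usage lines is an illustrative choice.

```python
import math
from statistics import NormalDist


def t_pdf(x, nu):
    """Density of Student's t distribution with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + x * x / nu) ** (-(nu + 1) / 2)


def t_cdf(x, nu, steps=2000):
    """CDF via Simpson's rule on [0, |x|] (steps must be even) and symmetry about 0."""
    h = abs(x) / steps
    s = t_pdf(0.0, nu) + t_pdf(abs(x), nu)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    area = s * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p, nu, lo=0.0, hi=100.0):
    """Quantile for 0.5 <= p < 1, by bisection on t_cdf."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def wald_interval(psi_hat, se, n, d_phi, level=0.95):
    """psi_hat +/- q * se, with q the t quantile with n - d_phi degrees of freedom."""
    q = t_quantile(0.5 + level / 2, n - d_phi)
    return psi_hat - q * se, psi_hat + q * se


z = NormalDist().inv_cdf(0.975)
for n in (6, 10, 30, 100):  # d_phi = 3 (hypothetical)
    print(n, round(z, 3), round(t_quantile(0.975, n - 3), 3))
```

For small $n$, the $t$ quantile is noticeably larger than the normal quantile of about 1.96. The interval therefore widens, and the two quantiles converge as $n$ grows.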
Examples With Equations/Code/Tables
Example 1: Poisson model with time-varying intensity
\( \begin{array}{rcl}
\psi_i &=& (\alpha_i,\beta_i) \\[0.3cm]
\lambda(t,\psi_i) &=& \alpha_i + \beta_i\,t \\[0.3cm]
\mathbb{P}(y_{ij}=k) &=& \displaystyle{ \frac{\lambda(t_{ij} , \psi_i)^k}{k!} } e^{-\lambda(t_{ij} , \psi_i)}\\
\end{array}\)

MLXTran
INPUT:
input = {alpha, beta}
EQUATION:
lambda = alpha + beta*t
DEFINITION:
y ~ poisson(lambda)
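Here is a minimal Python sketch of the same model, using hypothetical values for $\alpha_i$, $\beta_i$ and the design. It does three things:

- It evaluates $\mathbb{P}(y_{ij}=k)$.
- It computes the log-likelihood of one individual's counts, assuming the $y_{ij}$ are independent given $\psi_i$.
- It draws counts with Knuth's multiplication method, a simple Poisson sampler that is adequate for small $\lambda$.

The sketch assumes $\alpha_i+\beta_i t>0$ over the design, which the linear intensity does not guarantee by itself.

```python
import math
import random


def intensity(t, alpha, beta):
    """lambda(t, psi_i) = alpha_i + beta_i * t; assumed > 0 on the design."""
    return alpha + beta * t


def poisson_pmf(k, lam):
    """P(y = k) = lam^k / k! * exp(-lam)."""
    return lam**k / math.factorial(k) * math.exp(-lam)


def log_likelihood(y_i, t_i, alpha, beta):
    """Sum over j of log P(y_ij = k_ij), assuming independence given psi_i."""
    ll = 0.0
    for k, t in zip(y_i, t_i):
        lam = intensity(t, alpha, beta)
        ll += k * math.log(lam) - lam - math.lgamma(k + 1)
    return ll


def draw_poisson(lam, rng):
    """Knuth's multiplication method: simple, adequate for small lam."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p < limit:
            return k
        k += 1


# Hypothetical individual parameters and design
alpha, beta = 2.0, 0.5
t_i = [0, 1, 2, 4, 8]
rng = random.Random(0)
y_i = [draw_poisson(intensity(t, alpha, beta), rng) for t in t_i]
print(y_i, log_likelihood(y_i, t_i, alpha, beta))
```

The log-likelihood uses `math.lgamma(k + 1)` for $\log k!$, which stays finite for large counts, where `math.factorial` followed by `math.log` would be needlessly expensive.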

R
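# Objective functions: -2 x log-likelihood (up to an additive constant) of
# y_j ~ N(f(t_j), g^2), where f is given by the prediction functions
# predc1() / predc2() (defined elsewhere) and g = x[4] is the residual s.d.
fmin1 <- function(x, y, t) {
  f <- predc1(t, x)
  g <- x[4]
  sum(((y - f) / g)^2 + log(g^2))
}
fmin2 <- function(x, y, t) {
  f <- predc2(t, x)
  g <- x[4]
  sum(((y - f) / g)^2 + log(g^2))
}
# MLE: minimize each objective with nlm() from the given initial values;
# y and t are passed through nlm's ... argument to the objective
pk.nlm1 <- nlm(fmin1, c(0.3, 6, 0.2, 1), y, t, hessian = TRUE)
psi1 <- pk.nlm1$estimate
pk.nlm2 <- nlm(fmin2, c(3, 10, 0.2, 4), y, t, hessian = TRUE)
psi2 <- pk.nlm2$estimate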

 Here are the parameter estimation results:
> cat(" psi1 =",psi1,"\n\n")
psi1 = 0.3240916 6.001204 0.3239337 0.4366948
> cat(" psi2 =",psi2,"\n\n")
psi2 = 3.203111 8.999746 0.229977 0.2555242

Equations
Here are some examples of these various types of data:
 Continuous data with a normal distribution:
\(y_{ij} \sim {\cal N}\left(f(t_{ij},\psi_i),\, g^2(t_{ij},\psi_i)\right)\)
 Here, $\lambda(t_{ij},\psi_i)=\left(f(t_{ij},\psi_i),\,g(t_{ij},\psi_i)\right)$, where $f(t_{ij},\psi_i)$ is the mean and $g(t_{ij},\psi_i)$ the standard deviation of $y_{ij}$.
 Categorical data with a Bernoulli distribution:
\( y_{ij} \sim {\cal B}\left(\lambda(t_{ij},\psi_i)\right) \)
 Here, $\lambda(t_{ij},\psi_i)$ is the probability that $y_{ij}$ takes the value 1.
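In both cases, the likelihood of one individual is obtained by plugging $\lambda(t_{ij},\psi_i)$ into the corresponding log-density and summing over $j$. This assumes that the $y_{ij}$ are independent given $\psi_i$. The sketch below illustrates this. The predictor functions `mean_sd` and `prob` are hypothetical examples, not models taken from the text.

```python
import math


def loglik_normal(y, lam):
    """Continuous case: lam = (f, g), the mean and standard deviation of y_ij."""
    f, g = lam
    return -0.5 * math.log(2 * math.pi * g * g) - (y - f) ** 2 / (2 * g * g)


def loglik_bernoulli(y, lam):
    """Categorical case: lam = P(y_ij = 1), and y_ij is 0 or 1."""
    return math.log(lam) if y == 1 else math.log(1.0 - lam)


def loglik_individual(y_i, t_i, psi_i, predict, loglik):
    """log p(y_i | psi_i) = sum_j loglik(y_ij, lambda(t_ij, psi_i))."""
    return sum(loglik(y, predict(t, psi_i)) for y, t in zip(y_i, t_i))


# Hypothetical predictors lambda(t, psi_i), for illustration only
def mean_sd(t, psi):
    """f = psi[0] * exp(-psi[1] * t), with constant g = psi[2]."""
    return psi[0] * math.exp(-psi[1] * t), psi[2]


def prob(t, psi):
    """Logistic link: P(y = 1) = 1 / (1 + exp(-(psi[0] + psi[1] * t)))."""
    return 1.0 / (1.0 + math.exp(-(psi[0] + psi[1] * t)))


print(loglik_individual([4.1, 2.3, 0.9], [1, 2, 4], (5.0, 0.4, 0.3), mean_sd, loglik_normal))
print(loglik_individual([0, 0, 1, 1], [0, 1, 2, 3], (-2.0, 1.2), prob, loglik_bernoulli))
```

Only the function mapping $(y_{ij},\lambda)$ to a log-density changes between data types. The structure of the likelihood stays the same.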
\(
y_{ij} \sim {\cal N}\left(f(t_{ij},\psi_i),\, g^2(t_{ij},\psi_i)\right) \qquad (2.1)
\)
\( {\cal L}(\theta ; \psi_1,\psi_2,\ldots, \psi_N) \ \ = \ \ \prod_{i=1}^{N}{\mathrm p}(\psi_i ; c_i , \theta). \)
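As an illustration, assume (hypothetically) a scalar individual parameter with $\psi_i \sim {\cal N}(\mu+\gamma c_i,\ \omega^2)$, so that $\theta=(\mu,\gamma,\omega)$. The likelihood above is then maximized in closed form: least squares gives $(\hat{\mu},\hat{\gamma})$, and $\hat{\omega}^2 = \frac{1}{N}\sum_i (\psi_i-\hat{\mu}-\hat{\gamma}c_i)^2$. The sketch computes both the log-likelihood and this maximizer on made-up values of $\psi_i$ and $c_i$.

```python
import math


def log_lik_theta(theta, psi, c):
    """log L(theta; psi_1, ..., psi_N) = sum_i log p(psi_i; c_i, theta), ASSUMING
    scalar psi_i ~ N(mu + gamma * c_i, omega^2) with theta = (mu, gamma, omega)."""
    mu, gamma, omega = theta
    return sum(-0.5 * math.log(2 * math.pi * omega**2)
               - (p - mu - gamma * ci) ** 2 / (2 * omega**2)
               for p, ci in zip(psi, c))


def mle_theta(psi, c):
    """Closed-form maximizer: least squares for (mu, gamma), then the mean squared residual."""
    n = len(psi)
    cbar, pbar = sum(c) / n, sum(psi) / n
    sxx = sum((ci - cbar) ** 2 for ci in c)
    sxy = sum((ci - cbar) * (p - pbar) for ci, p in zip(c, psi))
    gamma = sxy / sxx
    mu = pbar - gamma * cbar
    omega = math.sqrt(sum((p - mu - gamma * ci) ** 2 for p, ci in zip(psi, c)) / n)
    return mu, gamma, omega


# Made-up individual parameters and centered covariates, for illustration only
psi = [1.9, 2.6, 3.1, 3.4, 4.2]
c = [-1.0, -0.5, 0.0, 0.5, 1.0]
theta_hat = mle_theta(psi, c)
print(theta_hat, log_lik_theta(theta_hat, psi, c))
```

In practice the $\psi_i$ are not observed, so this likelihood is not directly usable for estimating $\theta$. The sketch only shows what ${\cal L}(\theta ; \psi_1,\ldots,\psi_N)$ computes when the individual parameters are given.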

