# The Metropolis-Hastings algorithm for simulating the individual parameters

We consider a joint model for the observations and individual parameters of individual $i$:

$$\pyipsii(y_i,\psi_i) = \pcyipsii(y_i | \psi_i) \ppsii(\psi_i) ,$$

where $\qcyipsii$ is the conditional distribution of the observations of individual $i$ (see Modeling the observations) and $\ppsii(\psi_i)$ the distribution of the individual parameters of individual $i$ (see Modeling the individual parameters).

Remark

This distribution depends on a vector of population parameters $\theta$ and possibly covariates $c_i$, regression variables $x_i$ and inputs $u_i$. We suppose that all of these components of the model are given, so it is not necessary to explicitly write them each time.

Our goal here is to generate values from the conditional distribution $\qcpsiiyi$:

$$\pcpsiiyi(\psi_i|y_i) = \displaystyle{ \frac{\pyipsii(y_i,\psi_i)}{\pyi(y_i)} } .$$

This distribution cannot usually be computed in closed-form when the model is not a linear Gaussian one. However, we will see that the Metropolis-Hastings (MH) algorithm allows us to draw a sequence $(\psi_i^{(\imh)}, \imh=1,2,\ldots)$ which converges (in distribution) to the target distribution $\qcpsiiyi$.

We will consider a very general model for the individual parameters:

 $$\psi_i = M(\beta,c_i,\eta_i) ,$$ (1)

where $\beta$ is a vector of fixed effects, $c_i$ a vector of (observed) individual covariates and $\eta_i$ a vector of random effects whose probability distribution is denoted $\qetai$.

The MH algorithm is used to simulate a sequence of random effects $(\eta_i^{(\imh)}, \imh=1,2,\ldots)$ with the target distribution being the conditional distribution $\qcetaiyi$ of the random effects $\eta_i$. We will then obtain the sequence $(\psi_i^{(\imh)}, \imh=1,2,\ldots)$ using (1).

The MH algorithm is iterative and requires an initial value $\eta_i^{(0)}$. Then, at iteration $\imh$, we

1. Draw a new value $\ceta_i^{(\imh)}$ with some proposal distribution $q_{\imh}(\cdot \, ; \eta_i^{(\imh-1)})$
2. Compute $\cpsi_i^{(\imh)} = M(\beta,c_i,\ceta_i^{(\imh)})$
3. Accept this new value, that is let $\eta_i^{(\imh)}=\ceta_i^{(\imh)}$, with probability

4. $$\begin{eqnarray} \alpha(\ceta_i^{(\imh)} ; \eta_i^{(\imh-1)} ) &=& \displaystyle{\frac{ q_{\imh}(\eta_i^{(\imh-1)}; \ceta_i^{(\imh)} ) \qcetaiyi(\ceta_i^{(\imh)}|y_i)} {q_{\imh}(\ceta_i^{(\imh)} ; \eta_i^{(\imh-1)}) \qcetaiyi(\eta_i^{(\imh-1)}|y_i) } } \\ &=& \displaystyle{\frac{ q_{\imh}(\eta_i^{(\imh-1)}; \ceta_i^{(\imh)} ) \qetai(\ceta_i^{(\imh)}) \qcyipsii(y_i | \cpsi_i^{(\imh)})} {q_{\imh}(\ceta_i^{(\imh)} ; \eta_i^{(\imh-1)}) \qetai(\eta_i^{(\imh-1)}) \qcyipsii(y_i | \psi_i^{(\imh-1)}) } } . \end{eqnarray}$$

Remarks

• In order to run this algorithm, we need to be able to calculate the transition density $q_{\imh}(\ceta_i;\eta_i)$, the random effects density $\petai(\eta_i)$ (which poses no problem if $\eta_i$ is a Gaussian vector for example), and in particular the conditional distribution $\pyipsii(y_i|\psi_i)$. This is why this calculation is explicitly performed in the various examples provided in Modeling the observations.

• We denote $q_{\imh}$ the proposal distribution used at iteration $\imh$ of the algorithm because different proposals can be used at different iterations.

Several proposal distributions are used in $\monolix$:

1. $q^{(1)}=\qetai$ is the marginal distribution of $\eta_i$, that is, the normal distribution ${\cal N}(0,\Omega)$. The acceptance probability for this kernel is
2. $$\alpha(\ceta_i^{(\imh)} ; \eta_i^{(\imh-1)} ) = \displaystyle{\frac{ \qcyipsii(y_i | \cpsi_i^{(\imh)})}{\qcyipsii(y_i | \psi_i^{(\imh-1)}) } } .$$

3. $q^{(2,\ieta)}$, for $\ieta=1,2,\ldots, d$ is the unidimensional Gaussian random walk for component $\ieta$ of $\eta_i$:
4. $$\ceta_{i,\ieta}^{(\imh)} = \eta_{i,\ieta}^{(\imh-1)} + \xi_{i,\ieta}^{(\imh)} ,$$

where $\xi_{i,\ieta}^{(\imh)} \sim {\cal N}(0, \upsilon_\ieta^{(\imh)})$. The variance $\upsilon_\ieta^{(\imh)}$ of this random walk is calibrated in order to reach an optimal acceptance rate $\alpha^\star$ ($\monolix$ uses $\alpha^\star = 0.3$ as default). Here, the transition kernel is symmetrical and

$$\alpha(\ceta_i^{(\imh)} ; \eta_i^{(\imh-1)} ) = \displaystyle{\frac{ \qetai(\ceta_i^{(\imh)}) \qcyipsii(y_i | \cpsi_i^{(\imh)})} {\qetai(\eta_i^{(\imh-1)}) \qcyipsii(y_i | \psi_i^{(\imh-1)}) } }.$$

5. $q^{(3,\Meta)}$, for $\Meta \subset \{1,2,\ldots,d\}$ is the multidimensional Gaussian random walk for the vector $\eta_\Meta = (\eta_\ieta , \ieta\in \Meta)$:
6. $$\ceta_{i,\Meta}^{(\imh)} = \eta_i^{(\imh-1)} + \xi_{i,\Meta}^{(\imh)} ,$$

where $\xi_{i,\Meta}^{(\imh)}=(\xi_{i,\ieta}^{(\imh)}, \ieta\in \Meta)$ is a Gaussian vector with diagonal variance matrix $\Upsilon_\Meta^{(\imh)}$. Here as well, the variance $\Upsilon_\Meta^{(\imh)}$ of this random walk is adjusted in order to reach the optimal acceptation rate $\alpha^\star$. Different subsets $\Meta$ are chosen at each iteration.

The MH algorithm consists then in using successively these different proposals for $i=1,2,\ldots , N$.

The variances $(\upsilon_\ieta,\, \ieta=1,2,\ldots, d)$ for proposal $q^{(2,\ieta)}$ are updated at iteration $\imh$ as follows:

$$\upsilon_\ieta^{(\imh)}=\upsilon_\ieta^{(\imh)}(1+\delta(\overline{\alpha}_\ieta^{(\imh-1)}-\alpha^\star)),$$

where $0<\delta<1$ is a constant and $\overline{\alpha}_\ieta^{(\imh)}$ the empirical acceptance rate at iteration $\imh$:

$$\overline{\alpha}_\ieta^{(\imh)} = \frac{1}{N} \sum_{i=1}^N \one_{\eta_i^{(\imh)}=\ceta_i^{(\imh)} } .$$

Indeed, a too small (resp. large) variance for the random walk leads to a too large (resp. small) acceptance probability. The strategy proposed here therefore allows us to adaptively correct the variance for a given acceptance probability $\alpha^\star$. A small value of $a$ allows us to smooth out the sequence $(\upsilon_\ieta^{(\imh)})$. $\monolix$ uses $a = 0.4$ as a default value.

We can use the same strategy for updating the diagonal variance matrices $\Upsilon_\Meta,\ \Meta \subset \{1,2,,\ldots,d\}$ for the kernel $q^{(3,M)}$.

The simulated sequence $(\psi_i^{(\imh)}, \imh=1,2,\ldots)$ can then be used for estimating empirically the conditional distribution $\qcpsiiyi$ and the conditional mean $\esp{F(\psi_i)| y_i}$ of any function $F$ such that $\esp{F^2(\psi_i)| y_i}<+\infty$. The accuracy of the estimation obviously depends on the length $K$ of the sequence $(\psi_i^{(\imh)})$ used for the estimation, since the variance of the estimator merely decreases as $1/K$. Estimation is also improved if the sequence starts from a point which is representative of the equilibrium distribution (burn-in is one method of finding a good starting point) or known to have reasonably high probability (the conditional mode or the last value obtained with SAEM for instance).