MCMC
Introduction
Markov Chain Monte-Carlo (MCMC) is a method for sampling from a distribution when it is difficult to sample from it directly. Combined with Monte-Carlo integration, it can be used to compute integrals numerically.
It explores the state space $X$ using a Markov chain mechanism and produces samples $x_i$ that mimic samples drawn from the distribution $p(x)$ directly. We assume that we can evaluate $p(x)$ up to a normalizing constant, but that it is difficult to draw samples from it.
Let us consider the situation when the number of possible states in a Markov chain is $n$. Then we can choose an initial distribution $q(x) \in \mathbb{R}^n$ and a transition matrix $T \in \mathbb{R}^{n \times n}$. Following the transition matrix for one step yields the distribution $qT$. If repeating the process converges to a distribution $p(x)$, we say the chain has converged to its invariant distribution. The process converges if the matrix $T$ has the properties of irreducibility and aperiodicity.
Definition 1. A transition matrix $T$ is irreducible if from any state of the Markov chain there is a positive probability of visiting every other state.
Definition 2. A transition matrix $T$ is aperiodic if the Markov chain does not get trapped in cycles.
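As an illustration, the following sketch (with a hypothetical 3-state transition matrix, chosen so that it is irreducible and aperiodic) repeatedly applies $T$ to an initial distribution $q$ and shows convergence to an invariant distribution:

```python
import numpy as np

# A hypothetical 3-state chain: T is irreducible (every state reachable
# from every other) and aperiodic (self-transitions have positive probability).
T = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

q = np.array([1.0, 0.0, 0.0])   # initial distribution, concentrated on state 0
for _ in range(100):            # repeated application of the transition matrix
    q = q @ T

p = q                           # the chain has converged
print(p)
print(np.allclose(p, p @ T))    # True: p is the invariant distribution, p = pT
```

Starting from any other initial distribution gives the same limit, which is what irreducibility and aperiodicity guarantee.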
A sufficient condition for the transition matrix to satisfy these requirements is the reversibility (detailed balance) condition:
$\displaystyle{p(x_i) T(x_j|x_i) = p(x_j) T(x_i|x_j).}$
In this case $p(x)$ is the invariant distribution. Indeed, since $\sum_{x_j} T(x_j|x_i) = 1$, summing both sides over $x_j$ gives
$\displaystyle{p(x_i) = \sum _{x_j} p(x_j) T(x_i|x_j).}$
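This derivation can be checked numerically. The sketch below (with a hypothetical 3-state distribution $p$) builds a reversible $T$ using Metropolis-style acceptance ratios and verifies both detailed balance and invariance:

```python
import numpy as np

# Hypothetical target distribution p on 3 states.
p = np.array([0.2, 0.3, 0.5])
n = len(p)

# Build T via a symmetric proposal (pick one of the other states uniformly)
# and Metropolis acceptance, so detailed balance holds by construction.
T = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j:
            T[i, j] = (1.0 / (n - 1)) * min(1.0, p[j] / p[i])
    T[i, i] = 1.0 - T[i].sum()   # rejected moves stay at state i

# Detailed balance: p_i T_ij == p_j T_ji for all pairs (i, j).
M = p[:, None] * T
balance = np.allclose(M, M.T)
# Summing both sides over x_j gives invariance: p = pT.
invariant = np.allclose(p, p @ T)
print(balance, invariant)   # True True
```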
In continuous state spaces the transition matrix $T$ becomes an integral kernel $K$, and $p(x)$ becomes the corresponding eigenfunction.
MCMC sampling is organized so that the desired $p(x)$ is the invariant distribution of the chain:
The Metropolis-Hastings algorithm
Let us have a proposal distribution $q(x^* | x)$ and invariant distribution $p(x)$. An MH step involves sampling a candidate value $x^*$ from the proposal distribution. The Markov chain then moves to $x^*$ with acceptance probability $A(x,x^*) = \min\{1,\frac{p(x^*)q(x|x^*)}{p(x)q(x^*|x)}\}$; otherwise it remains at $x$.
Initialize $x_0$.
For $i = 0, \ldots, N-1$:
- Sample $u \sim U_{[0,1]}$ from the uniform distribution.
- Sample $x^* \sim q(x^* | x_i)$ from the proposal distribution.
- If $u < A(x_i, x^*)$, then $x_{i+1} = x^*$; else $x_{i+1} = x_i$.
If $q(x^* | x) = q(x | x^*)$, the MH algorithm reduces to the Metropolis algorithm.
Integration using MCMC
To calculate the integral $\int f(x) p(x)\,dx$ it is possible to sample from the uniform distribution and apply Monte-Carlo integration, but sampling from $p(x)$ with MCMC may lead to much better convergence. First one performs a burn-in stage, during which the Markov chain, driven by the MH algorithm, approaches a distribution close to $p(x)$. Sampling then continues while the sum of the values $f(x_i)$ is accumulated. The resulting approximation of the integral is $\frac{1}{N_2} \sum_{j=N_1}^{N_1+N_2} f(x_j)$, where $N_1$ is the length of the burn-in stage and $N_2$ is the length of the sampling stage.
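A minimal sketch of this procedure, assuming an unnormalized standard normal target and the integrand $f(x) = x^2$ (for which the exact value of the integral is $1$):

```python
import numpy as np

rng = np.random.default_rng(1)

def p_unnorm(x):
    # Unnormalized standard normal target (an assumed example).
    return np.exp(-0.5 * x**2)

def f(x):
    return x**2          # integrand; the exact value of the integral here is 1

N1, N2 = 5_000, 50_000   # burn-in length and sampling length
x = 0.0
total = 0.0
for i in range(N1 + N2):
    x_star = x + rng.normal()        # symmetric Gaussian proposal
    if rng.uniform() < min(1.0, p_unnorm(x_star) / p_unnorm(x)):
        x = x_star                   # accept, else remain at x
    if i >= N1:                      # accumulate only after burn-in
        total += f(x)

estimate = total / N2   # (1/N2) * sum of f(x_j) over the sampling stage
print(estimate)         # close to the exact value 1
```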
MCMC is also used to solve optimization problems of the form $\hat x = \arg\max_{x \in X} p(x)$ by applying the so-called simulated annealing technique.
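A minimal simulated-annealing sketch, assuming a hypothetical bimodal target and a simple geometric cooling schedule (both are illustrative choices, not prescribed by the method):

```python
import numpy as np

rng = np.random.default_rng(2)

def p_unnorm(x):
    # Hypothetical bimodal target: the global maximum is near x = 2,
    # with a smaller local mode near x = -2.
    return np.exp(-0.5 * (x - 2.0)**2) + 0.5 * np.exp(-0.5 * (x + 2.0)**2)

x = -2.0                              # deliberately start near the local mode
best_x, best_p = x, p_unnorm(x)
for i in range(30_000):
    T = max(0.05, 0.9999**i)          # geometric cooling schedule
    x_star = x + rng.normal()         # symmetric random-walk proposal
    # Metropolis step targeting p(x)^(1/T); as T decreases, the chain
    # concentrates on the maxima of p(x).
    if rng.uniform() < min(1.0, (p_unnorm(x_star) / p_unnorm(x)) ** (1.0 / T)):
        x = x_star
    if p_unnorm(x) > best_p:
        best_x, best_p = x, p_unnorm(x)

print(best_x)   # close to 2, the global maximizer
```

The high-temperature phase lets the chain escape the local mode at $x=-2$; the low-temperature phase refines the estimate near the global maximum.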
Bibliography
- Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., Teller, E. Equation of State Calculations by Fast Computing Machines. Journal of Chemical Physics, 21(6): 1087–1092 (1953).
- Hastings, W.K. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1):97–109 (1970).
- Andrieu, C., de Freitas, N., Doucet, A., Jordan, M.I. An Introduction to MCMC for Machine Learning. Machine Learning, 50: 5–43 (2003).