Richard Hugtenburg
December 21, 2000
A PDF version of this file is available here
The following is a simple example to show two important properties of
a Markov chain Monte Carlo (MCMC) sampler and to illustrate the basic
functionality of the method and issues relating to it's usage.
A Markov chain is a series where the realisation of the next element in the series, Y, is dependent only on the current state, X, and occurs with probability, P(Y|X). So the even number series forms a Markov chain but the Fibonacci sequence does not. A particle trajectory is a Markov chain but the formation of a polymer is not.
MCMC is a technique for sampling from space that might be difficult to sample from because it does not have an analytical form or it may be difficult to integrate, in which case a normalising constant and a probability density function (pdf) cannot be found. The elements might be a numerical result from Monte Carlo sampling within a multivariate system, which can not be sampled directly, but are generated from other sampled variables.
MCMC enables independent sampling of a target distribution, f, by using samples from a integrable function, f* in a similar fashion to importance sampling. The first important property of MCMC samplers is that the weighting function need not be know in advance but is effectively generated by a stochastic process. Samples are taken from the distribution f* and accepted or rejected depending on the relative probabilities of the new state compared to the old state in the target distribution, f and the relative probabilities of selection in the integrable distribution f*. Consequently, the second important property of MCMC samplers is that the probability of a state need never be known to better than a normalising constant.
Hastings [1] introduces a general form for an MCMC
sampler that is stationary which requires that given a sample from the
target distribution, f, subsequent samples will remain in f. The
probability of an arbitrary transition is given by
where,
Stationarity is achieved under conditions of reversibility, where the
probability of being in state X and changing to Y is equal to the
probability of being in Y and changing to X, that
P(X)P(Y|X)=P(Y)P(X|Y). The choice of
gives a
symmetric relation that satisfies the reversibility condition.
| P(X)P(Y|X) | = | ||
| = | |||
| = | P(Y)P(X|Y) | (2) |
While his results guarantee stationarity, convergence is only
guaranteed if the Markov chain is irreducible, that there is a
positive probability of visiting all states in a finite number of
transitions. The quality of convergence is therefore dependent on the
quality of the visitation of states generated by Q. One condition
that ensures convergence is that
,
where
is a
number everywhere greater than zero. In other words this condition
dictates that the probability of a transition is never vanishingly
small. Intuitively, Q is often chosen to be similar in shape to P. In
a Bayesian context, Q encompasses our knowledge of P prior to the
calculation, hence Q(X)>P(X).
A typical method of sampling f* is the random walk sampler where
the sample Y is dependent only on the separation from X. In which
case,
Q(Y|X)=Q(|X-Y|). Subsequently,
.
An
independence sampler is where the new sample Y is independent of
the current value X. So,
Q(Y|X)=f*(Y).
An example will be presented of a one-sided normal distribution
sampled from an exponential decay distribution. A normal distribution
can be sampled so this example is a little trivial but helps to
illustrate several important characteristics of the method.
We take our normal distribution as being proportional to
.
The exponential function is defined
as
.
Y is sampled independently.
Equation 1 reduces to the following;
| (3) |
To calculate the sampled pdf, f*(x) is integrated over
and normalised to a total probability of one. A value for x is
sampled from the inverse of the integral of f* for a random
variable,
,
uniformly distributed on (0,1). As,
,
| = | |||
| = | (4) |
A second random variable,
,
is used to determine the nature of
the transition with respect to the calculated value of
.
Samples can be collected into groups. The samples, X, been collected
into those bounded by
,
the proportion of which
represent the first standard deviation.
![]() |
Figure 2 demonstrates the estimates as a function of iteration number and starting value. Note how the choice of starting value introduces a bias into the simulation as it is not an independent sample from the stationary distribution. An initial burn-in period is therefore included in a start-up procedure to minimise the significance of the starting value. Consequently, an MCMC sampler is best incorporated into a code as an object that runs continuously and independently of the main program.
This introductory material provides the background to a problem in radiation physics [2]. Monte Carlo is effective in determining macroscopic properties of radiation sources. A typical result from a Monte Carlo ``simulation'' of a radiation source would be the expectation or average energy or charge deposited within a particular region for a given (large) number of source particles. In many circumstances, the expectation corresponds to an actual measurement. Perhaps the energy or charge deposited is within the sensitive volume of a detector. The information from the detector adds a further constraint to the model and might be used to improve the accuracy of estimates of quantities that are more difficult to measure in the system.
Here is a multivariate system where a target quantity with a distribution defined by the accuracy of the measurement can only be generated from a dependent distributed parameter that may be sampled directly. To do so in such a way that is physically consistent such that unbiased estimates of other quantites can be made requires the Markov chain Monte Carlo method.
The distribution f* is generated from a series of samples from a
distribution defined by our knowledge of the dependent parameter,
.
Realisation of the Markov chain, from current state X to
next state Y which will tend towards the distribution f, is
complicated, Q(Y|X) is undefined as Y is not sampled directly
from
.
Random walk sampling is useful as it removes the dependence on the transition probability but suffers if the distribution has poor visitation properties such as would be caused by a lot of local probability maxima.
Independence sampling has the advantage of a more robust convergence
under these circumstances but can be less efficient if the
distribution is highly peaked around probability maxima, leading to a
lot of rejections. A simple way of obtaining an unbiased sample from
f*, such that Q(Y) can be determined with confidence, is to ensure
that the Monte Carlo generation from
is of high accuracy. In
practice, early estimates of f* can be of lesser accuracy and the
quality of the Monte Carlo simulation improved as the algorithm
proceeds. The method is accurate as long as quantities are examined
carefully for signs of convergence before they are accumulated. The
method is satifactory in some circumstances but quickly grinds down
in problems requiring high resolution such as spectroscopy.
For further applications and extensions of Hastings algorithm the reader is referred to Tierney (1994) [3] and to a growing number of MCMC examples on the web.
Richard Peter Hugtenburg