Monte Carlo methods rely on repeated random sampling to compute results. They generate random samples from a population according to a probability distribution and use them to obtain numerical results. The method was pioneered by J. von Neumann and S. Ulam during the Manhattan Project in the 1940s. Monte Carlo methods can be used to solve multidimensional integrals and converge better than classical numerical integration methods for dimensions greater than 4. The variance of a Monte Carlo estimate decreases as 1/N, where N is the number of samples, resulting in slow convergence; variance reduction techniques can improve the convergence rate.
1. Monte Carlo Methods
Frank Kienle
Senior Data Scientist
Blue Yonder (www.blue-yonder.com)
2. History
J. von Neumann and S. Ulam are commonly regarded as the founders of the
Monte Carlo method (United States Manhattan Project).
The method was originally devised to calculate the probability of winning
a card game of solitaire.
Metropolis and Ulam published the article 'The Monte Carlo Method' in 1949.
3. Monte Carlo Example:
How to calculate π with the help of a Monte Carlo simulation:
1. Uniformly scatter points throughout the square (by simulation)
2. Count the number of points lying inside the circle
3. The ratio of the points inside the circle (N1) to the overall number of
points (N2) approximates π/4
[Figure: random points scattered over a square with an inscribed circle of radius R]

$$A_1 = \pi R^2, \qquad A_2 = (2R)^2, \qquad \frac{A_1}{A_2} = \frac{\pi}{4}$$

$$\frac{N_1}{N_2} \approx \frac{\pi}{4} \quad\Rightarrow\quad \pi \approx 4\,\frac{N_1}{N_2}$$
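A minimal sketch of this estimator in Python (NumPy assumed; the sample
counts and seed are illustrative choices, not from the slides):

```python
import numpy as np

def estimate_pi(n_samples: int, rng: np.random.Generator) -> float:
    """Estimate pi by scattering points uniformly over [-1, 1]^2
    and counting how many fall inside the unit circle."""
    x = rng.uniform(-1.0, 1.0, n_samples)
    y = rng.uniform(-1.0, 1.0, n_samples)
    inside = (x**2 + y**2) <= 1.0       # points inside the circle (N1)
    return 4.0 * inside.mean()          # pi ~ 4 * N1 / N2

rng = np.random.default_rng(0)
print(estimate_pi(30, rng))       # crude estimate, high variance
print(estimate_pi(30_000, rng))   # much closer to 3.14159
```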
4. Monte Carlo:
π = 3.14159…
Example with 30 samples:
1st attempt: 3.4666
2nd attempt: 2.9333
3rd attempt: 3.2
…
The variance is large (calculated over T = 1000 attempts):

$$\sigma^2 = \frac{1}{T}\sum_{t=1}^{T} (X_t - \pi)^2 = 0.086, \qquad \sigma = 0.29$$

→ 68.4 % of all values lie within a distance of < 0.29 of the true value.
5. Monte Carlo:
π = 3.14159…
Example with 300 samples:
1st attempt: 3.213
2nd attempt: 3.106
3rd attempt: 3.32
…
The variance is smaller, but still significant (calculated over T = 1000 attempts):

$$\sigma^2 = \frac{1}{T}\sum_{t=1}^{T} (X_t - \pi)^2 = 0.0087, \qquad \sigma = 0.093$$

→ 68.4 % of all values lie within a distance of < 0.093 of the true value.
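A short sketch of this experiment (sample counts and seed are illustrative):
repeat the 30-sample and 300-sample estimators 1000 times each and measure
the empirical spread of the results.

```python
import numpy as np

def estimate_pi(n_samples, rng):
    x = rng.uniform(-1, 1, n_samples)
    y = rng.uniform(-1, 1, n_samples)
    return 4.0 * ((x**2 + y**2) <= 1.0).mean()

rng = np.random.default_rng(0)
for n in (30, 300):
    attempts = np.array([estimate_pi(n, rng) for _ in range(1000)])
    # empirical spread of the estimates around the true value
    sigma = np.sqrt(np.mean((attempts - np.pi) ** 2))
    print(f"N={n:4d}: sigma ~ {sigma:.3f}")
# roughly 0.29 for N=30 and 0.09 for N=300, matching the slides
```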
6. Monte Carlo Methods
Some sample size - some number of points - and we try to infer something
more general.
It is all about an application which is called:
Inferential Statistics
7. How to solve an integral via the Monte Carlo method
Monte Carlo Integration

[Figure: f(x) = e^x plotted over x ∈ [0, 1]]

$$I = \int_0^1 e^x\,dx = \lim_{\Delta x \to 0} \sum e^x\,\Delta x$$

e.g. 3 random samples of x ($\Delta x = 1/3$)

Monte Carlo approximation:

$$\bar I = \frac{1}{N}\sum e^{X_i} \quad \text{with } X_i \in [0, 1]$$
8. How to solve an integral via the Monte Carlo method (continued)
Monte Carlo Integration

[Figure: f(x) = e^x plotted over x ∈ [0, 1]]

$$I = \int_0^1 e^x\,dx = \lim_{\Delta x \to 0} \sum e^x\,\Delta x, \qquad \Delta x = \frac{1}{N} \to 0$$

Monte Carlo approximation:

$$\frac{1}{N}\sum e^{X_i} \;\xrightarrow{N \to \infty}\; \lim_{\Delta x \to 0} \sum e^x\,\Delta x$$
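A minimal sketch of this estimator (NumPy assumed; the exact value
e − 1 ≈ 1.71828 is used only to show the error):

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_integral_exp(n_samples):
    """Estimate I (the integral of e^x over [0,1]) as the mean of e^X, X ~ U[0,1]."""
    x = rng.uniform(0.0, 1.0, n_samples)
    return np.exp(x).mean()

exact = np.e - 1.0   # closed-form value of the integral
for n in (10, 1000, 100_000):
    est = mc_integral_exp(n)
    print(f"N={n:6d}: I_bar={est:.5f}, error={abs(est - exact):.5f}")
```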
9. Monte Carlo Methods
It is all about an application which is called:
Inferential statistics
Some sample size - some number of points - and we try to infer something
more general.
Why does it work:
A random sample tends to exhibit the same properties as the population
from which it is drawn.
10. Law of Large Numbers
For a sequence of independent, identically distributed variables $X_i$ for
$i = 1, 2, \ldots, N$ with expectation $\mu = E(X)$:

$$\bar X_N = \frac{1}{N}(X_1 + \cdots + X_N)$$

The arithmetic mean converges to the expected value:

$$\bar X_N \to \mu \quad \text{for } N \to \infty$$

Strong law of large numbers: the sample average converges almost surely to
the expected value:

$$\Pr\Big(\lim_{N \to \infty} \bar X_N = \mu\Big) = 1$$
11. Monte Carlo Methods
It is all about an application which is called:
Inferential statistics
Why does it work:
A random sample tends to exhibit the same properties as the population
from which it is drawn.
Calculations:
It is all about calculating the expectation of a random variable.
12. Expectation
A random variable $X$ has distribution $f_X(x)$.
The expectation of a function $g$ of $X$ is:

discrete: $$E(g(X)) = \sum_{x \in \mathcal{X}} g(x)\,f_X(x)$$

continuous: $$E(g(X)) = \int_{x \in \mathcal{X}} g(x)\,f_X(x)\,dx$$
13. Why is the expectation so useful
Solve probabilities:

$$P(Y \in A) = E(I_A(Y))$$

Solve integrals: with a continuous random variable $U$ with density
$f_U(u) = \frac{1}{b-a}$ on $[a, b]$,

$$\int_a^b q(x)\,dx = (b-a) \int_a^b q(x)\,\frac{1}{b-a}\,dx = (b-a)\,E(q(U))$$
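A small sketch of the integral identity (the integrand q and the interval
are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_integrate(q, a, b, n_samples=100_000):
    """Estimate the integral of q over [a, b] as (b - a) * E[q(U)], U ~ Uniform(a, b)."""
    u = rng.uniform(a, b, n_samples)
    return (b - a) * q(u).mean()

# Example: the integral of sin(x) over [0, pi] is exactly 2
print(mc_integrate(np.sin, 0.0, np.pi))   # ~2.0
```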
14. Why is the expectation so useful
Solve probabilities:

$$P(Y \in A) = E(I_A(Y))$$

Solve integrals:

$$\int_a^b q(x)\,dx = (b-a)\,E(q(U))$$

Discrete sums: with a random variable $W$ that takes values in $A$ with
equal probability $p$, where $\sum_{w \in A} p = 1$,

$$\sum_{x \in A} q(x) = \frac{1}{p} \sum_{x \in A} q(x)\,p = \frac{1}{p}\,E(q(W))$$
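A minimal sketch of the discrete-sum identity (the set A and the summand q
are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

A = np.arange(1, 101)        # finite set A = {1, ..., 100}
q = lambda x: x ** 2         # illustrative summand
p = 1.0 / len(A)             # W uniform on A, so p = 1/|A|

w = rng.choice(A, size=100_000)   # draw W with equal probability from A
estimate = q(w).mean() / p        # (1/p) * E(q(W))
print(estimate, q(A).sum())       # estimate vs the exact sum 338350
```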
20. Monte Carlo Simulation
How good is the Monte Carlo method:
As seen, the variance of the result (error) across different attempts can
be quite large.

$$Var(\bar X_N - \mu) = Var\left(\frac{1}{N}\sum_{i=1}^{N} X_i\right) = \frac{1}{N}\,Var(X)$$

The expected variance of a Monte Carlo simulation is of order

$$\sigma^2_{MC} \propto O\!\left(\frac{1}{N}\right)$$
21. Rate of convergence
The standard deviation (a more intuitive number) is of order

$$\sigma_{MC} \propto O\!\left(\frac{1}{\sqrt{N}}\right)$$

Every further digit of precision requires 100 times more simulations!
→ Very slow convergence to the correct result.
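A quick empirical check of the $1/\sqrt{N}$ rate, reusing the π estimator
from above (attempt and sample counts are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def estimate_pi(n):
    x, y = rng.uniform(-1, 1, (2, n))
    return 4.0 * ((x**2 + y**2) <= 1.0).mean()

for n in (100, 10_000, 1_000_000):   # each step: 100x more samples
    attempts = [estimate_pi(n) for _ in range(100)]
    print(f"N={n:8d}: sigma ~ {np.std(attempts):.4f}")
# sigma shrinks by ~10x per step: one more digit costs 100x the work
```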
22. Convergence of Monte Carlo Integration:

$$\sigma_{MC} \propto O\!\left(\frac{1}{\sqrt{N}}\right)$$

Convergence of numerical integration (trapezoid rule):

$$\epsilon_T \propto O\!\left(\frac{1}{N^2}\right)$$
23. Multidimensional Integral
Monte Carlo simulation is very effective for solving multidimensional integrals.

$$I = \int_0^1\!\int_0^1\!\int_0^1 e^x e^y e^z\,dx\,dy\,dz = e^3 - 3e^2 + 3e - 1 = 5.0732$$

Standard deviation for different numbers of samples (x, y, z all independent):

N = 100 → σ = 0.0725
N = 1000 → σ = 0.0074
N = 10000 → σ = 0.00067
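A minimal sketch of the three-dimensional estimator (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_integral_3d(n_samples):
    """Estimate the integral of e^x * e^y * e^z over the unit cube [0,1]^3."""
    x, y, z = rng.uniform(0.0, 1.0, (3, n_samples))
    return np.mean(np.exp(x) * np.exp(y) * np.exp(z))

exact = (np.e - 1.0) ** 3    # = e^3 - 3e^2 + 3e - 1 ~ 5.0732
for n in (100, 1000, 10_000):
    print(f"N={n:5d}: I_bar={mc_integral_3d(n):.4f} (exact {exact:.4f})")
```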
24. Random sampling in the 3D-Grid
With only N=100 samples the result is surprisingly good
25. Integration in d-Dimensions
Convergence of numerical integration (trapezoid rule):

$$\epsilon_T \propto O\!\left(\frac{1}{N^{2/d}}\right)$$

Convergence of Monte Carlo integration:

$$\sigma_{MC} \propto O\!\left(\frac{1}{\sqrt{N}}\right)$$

The Monte Carlo error is independent of the dimension, so for d > 4 the
convergence of Monte Carlo integration is better than that of classical
numerical integration.
26. Variance reduction method
• The main disadvantage of the (crude) Monte Carlo method is its slow
convergence.
• The standard deviation of the error only decreases with the square root
of the number of simulations.
• A faster decrease of the variance could speed up the computations,
i.e. achieving a desired accuracy would require fewer simulation runs.
Any such modification of the (crude) Monte Carlo method is called a
variance reduction method.
32. Importance Sampling
Idea: certain values of the input random variables in a simulation have
more impact on the parameter being estimated than others.
If these "important" values are emphasized by sampling them more frequently,
the estimator variance can be reduced.
34. Importance Sampling
Idea: certain values of the input random variables in a simulation have
more impact on the parameter being estimated than others.
Draw the samples $x_i$ from a proposal density $h$ and weight each sample
by $g(x_i)/h(x_i)$:

$$\bar I = \frac{1}{N} \sum_{i=1}^{N} \frac{g(X_i)}{h(X_i)}$$
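A minimal sketch under the assumption that we estimate $I = \int_0^1 g(x)\,dx$
with $g(x) = e^x$, using a proposal $h(x) = \frac{2}{3}(1+x)$ tilted toward
large x (both choices are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

g = np.exp   # integrand: I = integral of e^x over [0,1] = e - 1

# Crude Monte Carlo: X ~ Uniform(0, 1)
x = rng.uniform(0, 1, N)
crude = g(x)

# Importance sampling: proposal h(x) = (2/3)(1 + x) puts more mass where
# e^x is large; sampled by inverting its CDF H(x) = (2x + x^2)/3.
u = rng.uniform(0, 1, N)
xi = np.sqrt(1.0 + 3.0 * u) - 1.0               # X ~ h via inverse transform
weights = g(xi) / ((2.0 / 3.0) * (1.0 + xi))    # g(X_i) / h(X_i)

print("exact      :", np.e - 1)
print("crude MC   :", crude.mean(),   "var:", crude.var())
print("importance :", weights.mean(), "var:", weights.var())
# the weighted estimator shows a noticeably smaller variance
```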
35. Variance reduction method
Implementing and adapting variance reduction methods requires considerable
programming effort and mathematical consideration.
The gain in variance reduction should therefore be judged against this
additional effort:
Is it really worth using a variance reduction method in a specific situation?
38. Bit Interleaved Coded Modulation
Spatial multiplexing:
§ goal is to maximize the transmission rate
§ no rate loss by space coding; only time coding by the channel encoder

[Block diagram: Source → Channel Encoder → Π (interleaver) → QAM Mapper]
39. Channel Model
$M_T = M_R = 4$ transmit and receive antennas.
Received vector: $y_t = H_t s + n_t$
Quasi-static Rayleigh fading channel:
§ each entry of $H$ modelled as an independent, complex, zero-mean Gaussian random variable
§ $H$ remains constant for multiple time steps
Number of bits per transmission vector: $N = M_T \cdot Q$
41. Monte Carlo Method
Search for the nearest point by clever sampling:

$$\hat s_{ML} = \arg\min_s \|y_t - H_t s\|^2$$

Each candidate point is described by $H s_i$.
8 antennas and 1024-QAM → $2^{80}$ points
42. Gibbs Sampling
A Markov chain Monte Carlo algorithm.
At each step, replace the value of one variable with a sample drawn from
its distribution conditioned on the remaining variables:

1. Initialize $\{x_i : i = 1, \ldots, N\}$
2. For $\tau = 1, \ldots, T$:

$$x_1^{\tau+1} \sim P(x_1 \mid x_2^{\tau}, x_3^{\tau}, \ldots, x_N^{\tau})$$
$$x_2^{\tau+1} \sim P(x_2 \mid x_1^{\tau+1}, x_3^{\tau}, \ldots, x_N^{\tau})$$
$$\vdots$$
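A minimal sketch of Gibbs sampling for a toy target, a bivariate Gaussian
with correlation rho (the target and its conditionals are illustrative
assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.8          # correlation of the illustrative 2D Gaussian target
T = 10_000

# For a standard bivariate Gaussian, each full conditional is Gaussian:
# x1 | x2 ~ N(rho * x2, 1 - rho^2), and symmetrically for x2 | x1.
x1, x2 = 0.0, 0.0  # 1. initialize
samples = np.empty((T, 2))
for tau in range(T):  # 2. for tau = 1, ..., T
    x1 = rng.normal(rho * x2, np.sqrt(1 - rho**2))  # draw x1 given x2
    x2 = rng.normal(rho * x1, np.sqrt(1 - rho**2))  # draw x2 given new x1
    samples[tau] = (x1, x2)

print("empirical correlation:", np.corrcoef(samples.T)[0, 1])  # ~0.8
```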
43. Gibbs Sampling: MIMO Receiver
At each step, replace the value of one variable with a sample drawn from
its distribution conditioned on the remaining variables.

1. Initialize with the best linear solution (MMSE solution):

$$\hat s_{MMSE} = \left(H^H H + \frac{M_T}{SNR}\,I\right)^{-1} H^H y_t$$

2. For $\tau = 1, \ldots, T$, compute for each bit the log-likelihood ratio

$$\lambda(x_i)^{\tau+1} = \ln \frac{P(x_i = 0 \mid y, s^{\tau}_{\sim x_i})}{P(x_i = 1 \mid y, s^{\tau}_{\sim x_i})}$$
44. Summary: Monte Carlo Methods
Monte Carlo methods are a class of computational algorithms that rely on
repeated random sampling to compute their results.
It is all about how to draw random samples from an expected distribution.
Is the population we have available similar to the truth?
45. Inverse Transformation Method

[Figure: Gaussian probability density function and its cumulative
distribution function; a uniform random number u is mapped through
F^{-1} to a Gaussian-distributed x]

§ Cumulative distribution function:

$$F(x) = \int_{-\infty}^{x} f(t)\,dt$$

§ Inverse transform of a uniform random number generator:

$$x = F^{-1}(u)$$
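A minimal sketch of the method; since the Gaussian $F^{-1}$ has no closed
form, this example uses the exponential distribution, where
$F^{-1}(u) = -\ln(1-u)/\lambda$ is explicit (the rate λ is an illustrative
choice):

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0                            # rate of the illustrative exponential target

u = rng.uniform(0.0, 1.0, 100_000)   # uniform random number generator
x = -np.log(1.0 - u) / lam           # x = F^{-1}(u) for F(x) = 1 - e^{-lam*x}

print("sample mean:", x.mean())      # ~1/lam = 0.5
print("sample std :", x.std())       # ~1/lam = 0.5
```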
46. Hit-or-Miss Method
The inverse $x = F^{-1}(u)$ is not always simple to calculate.

• choose x (uniformly distributed) in the interval where $f(x) \neq 0$
• choose y (uniformly distributed) in the interval $[\min(f(x)), \max(f(x))]$
• return x when $y < f(x)$, else do not return a value
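A minimal sketch of the hit-or-miss loop for a Gaussian-shaped density on a
truncated interval (the target, interval, and bound are illustrative
assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    """Unnormalized standard Gaussian shape, nonzero on [-4, 4]."""
    return np.exp(-0.5 * x**2)

def hit_or_miss(n_samples):
    out = []
    f_max = 1.0                      # max of f on [-4, 4] (at x = 0)
    while len(out) < n_samples:
        x = rng.uniform(-4.0, 4.0)   # choose x where f(x) != 0
        y = rng.uniform(0.0, f_max)  # y in [min(f), max(f)]; min(f) ~ 0 here
        if y < f(x):                 # hit: keep x, else discard
            out.append(x)
    return np.array(out)

s = hit_or_miss(50_000)
print("mean:", s.mean(), "std:", s.std())   # ~0 and ~1
```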
50. Acceptance/Rejection Method
A combination of the hit-or-miss and inverse transform methods.
In the rejection sampling method, samples are drawn from a simple
distribution $q(z)$ and rejected if they fall in the grey area between the
unnormalized distribution $\tilde p(z)$ and the scaled distribution $k q(z)$.
The resulting samples are distributed according to $p(z)$, which is the
normalized version of $\tilde p(z)$.
First, we generate a number $z_0$ from the distribution $q(z)$.
Next, we generate a number $u_0$ from the uniform distribution over
$[0, k q(z_0)]$. This pair of random numbers has uniform distribution under
the curve of the function $k q(z)$.
Finally, if $u_0 > \tilde p(z_0)$ the sample is rejected; otherwise $z_0$ is retained.
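A minimal sketch with illustrative choices: target
$\tilde p(z) = e^{-z^2/2}$ (unnormalized Gaussian), proposal
$q = \text{Uniform}(-5, 5)$, and $k$ chosen so that $k q(z) \geq \tilde p(z)$
everywhere:

```python
import numpy as np

rng = np.random.default_rng(0)

p_tilde = lambda z: np.exp(-0.5 * z**2)   # unnormalized target
q_pdf = 1.0 / 10.0                        # density of Uniform(-5, 5)
k = 10.0                                  # k*q(z) = 1 >= max of p_tilde = 1

def rejection_sample(n):
    out = []
    while len(out) < n:
        z0 = rng.uniform(-5.0, 5.0)        # z0 ~ q(z)
        u0 = rng.uniform(0.0, k * q_pdf)   # u0 ~ Uniform[0, k q(z0)]
        if u0 <= p_tilde(z0):              # reject if u0 > p_tilde(z0)
            out.append(z0)                 # otherwise retain z0
    return np.array(out)

s = rejection_sample(50_000)
print("mean:", s.mean(), "std:", s.std())  # ~0 and ~1 (standard normal)
```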
51. Law of Large Numbers

$$\bar X_n = \frac{1}{n}(X_1 + \cdots + X_n)$$

converges to the expected value:

$$\bar X_n \to \mu \quad \text{for } n \to \infty$$

Weak law: for any nonzero margin ε, with a sufficiently large sample there
will be a very high probability that the average of the observations is
close to the expected value, i.e. within the margin:

$$\lim_{n \to \infty} \Pr\big(|\bar X_n - \mu| > \varepsilon\big) = 0$$

Strong law: the sample average converges almost surely to the expected value:

$$\Pr\Big(\lim_{n \to \infty} \bar X_n = \mu\Big) = 1$$
53. Big Picture

Statistics
• Frequentist: uses frequent measurements of a data set or experiment;
the trick is the sampling used to extract the desired information:
  - Time sampling: e.g. Nyquist theorem
  - Space sampling: e.g. integrals, Monte Carlo
  - Function sampling: e.g. wavelets, Fourier
• Bayesian theory: takes into account all available information and answers
the question of interest given the particular data set:
  - Maximum noise suppression: e.g. Wiener filter
  - Minimum variance estimator: e.g. Kalman filter (PLL)