Gaussian samples — Part 2
Contents
- How does the Box-Muller transform work?
- Derivation of Box-Muller transform
- Caveat for changing variables
- Sampling 2-D Gaussians with Box-Muller transform
Background
In part 1 of this project, I showed how to generate Gaussian samples using the common technique of inverse transform sampling:
- First, we sample from the uniform distribution between 0 and 1 — green points in the below animation. These uniform samples represent the cumulative probabilities of a Gaussian distribution i.e. the area under the distribution to the left of some point.
- Next, we apply the inverse Gaussian cumulative distribution function (CDF) to these uniform samples. This will transform them to our desired Gaussian samples — blue points in the below animation.
The bad news: the Gaussian inverse CDF has no closed-form expression, so we have to approximate it, and in part 1 a simple Taylor series was used. However, this approximation is only accurate near the Gaussian mean, and it under-samples the more extreme values at both tails of the distribution.
Therefore, in this part of the project, we will investigate a more “direct” sampling method that does not depend on the approximation of the Gaussian inverse CDF. This method is called the Box-Muller transform, after the two mathematicians who invented the method in 1958: the British George E. P. Box and the American Mervin E. Muller.
How does the Box-Muller transform work?
For this project, my goal is to generate Gaussian samples in two dimensions i.e. generating samples whose x and y coordinates are independent standard normals (Gaussian with zero mean and standard deviation of 1). In part 1, I used the inverse Gaussian CDF to generate separately the x and y coordinates from their respective uniform samples (U₁ and U₂):
For the Box-Muller transform, I will also start with the same two uniform samples. However, I will transform these uniform samples into the x and y coordinates using much simpler formulas:
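For reference, the two formulas are the standard Box-Muller transform:

```latex
x = \sqrt{-2 \ln U_1}\,\cos(2\pi U_2), \qquad
y = \sqrt{-2 \ln U_1}\,\sin(2\pi U_2)
```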
Despite the strong coupling between U₁ and U₂ in each of the two formulas above, the generated x and y coordinates, which are both standard Gaussians, are still surprisingly independent of each other! In the derivation of the Box-Muller transform that follows, I will demonstrate why this is indeed the case.
Derivation of Box-Muller transform
We know that for any two independent random variables x and y, the joint probability density f(x, y) is simply the product of the individual densities f(x) and f(y). Furthermore, the Pythagorean theorem allows us to combine the x² and y² terms from the two Gaussian density functions into r², where r is the distance from the origin to the 2-D Gaussian sample. This results in the -r²/2 term in the exponential of the joint distribution.
To make it simpler, we then define a variable s that is equal to r²/2. In other words, s is simply half of the squared distance from the origin of our Gaussian sample. Written this way, the joint PDF is simply the product of a constant (1/2π) and an exponential (e⁻ˢ).
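Written out, the derivation sketched above is:

```latex
f(x, y) = f(x)\,f(y)
        = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2} \cdot \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}
        = \frac{1}{2\pi}\, e^{-(x^2 + y^2)/2}
        = \frac{1}{2\pi}\, e^{-r^2/2}
        = \frac{1}{2\pi}\, e^{-s},
\qquad \text{where } r^2 = x^2 + y^2 \text{ and } s = \tfrac{r^2}{2}
```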
This is where it gets interesting since we can think of the 1/2π constant as a uniform distribution between 0 and 2π: Unif(0, 2π). On the other hand, the exponential term is nothing but an exponential distribution with the parameter λ=1: Expo(1). Furthermore, since these two terms are multiplied together in the joint distribution, the uniform and the exponential distributions are also independent of each other (see accompanying image).
Geometric interpretation
Geometrically, the uniform distribution between 0 and 2π represents the angle θ that our 2-D Gaussian sample makes with the positive x-axis. This aligns with our intuition about Gaussian samples in 2-D: they are sampled with an equal chance at any angle between 0 and 360 degrees, hence the characteristic round “blob” that they often make.
In contrast, these blobs are always concentrated near the origin, which fits well with the fact that s (half of the squared distance from the origin) follows an exponential distribution: you are more likely to encounter a sample as you move closer to the origin.
This leads us to the underlying principle of the Box-Muller transform:
Instead of sampling independent Gaussians for the x and y coordinates of the samples, we sample their independent uniform angles and exponential half-of-squared distances to origin.
Caveat for changing variables
Technically, when changing random variables of a joint distribution to different ones, we need to multiply the new distribution with the Jacobian of the variable transformation (see here for more details). This ensures that the new distribution is still a valid probability density function i.e. that the “area” under the distribution is still 1. Thankfully, the Jacobian of the transformation from {s, θ} to {x, y} is 1, which greatly simplifies our problem.
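As a quick check, with x = √(2s) cos θ and y = √(2s) sin θ, the Jacobian determinant of the transformation works out to:

```latex
\left| \frac{\partial(x, y)}{\partial(s, \theta)} \right| =
\begin{vmatrix}
\dfrac{\cos\theta}{\sqrt{2s}} & -\sqrt{2s}\,\sin\theta \\[8pt]
\dfrac{\sin\theta}{\sqrt{2s}} & \sqrt{2s}\,\cos\theta
\end{vmatrix}
= \cos^2\theta + \sin^2\theta = 1
```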
Sampling 2-D Gaussians with Box-Muller transform
Once we transform the original problem into sampling a uniform angle and an exponential half-of-squared-distance, the remaining steps are much easier. This is because both the uniform and the exponential distribution have very simple inverse CDFs. As a result, we can apply these inverse CDFs to any uniform sample between 0 and 1 to transform it into our desired uniform and exponential samples (see part 1 of the project for an explanation of inverse transform sampling).
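Concretely, with U a uniform sample between 0 and 1, the two inverse CDFs are:

```latex
\text{Unif}(0, 2\pi):\quad F(\theta) = \frac{\theta}{2\pi}
\;\Rightarrow\; F^{-1}(U) = 2\pi U
\\[6pt]
\text{Expo}(1):\quad F(s) = 1 - e^{-s}
\;\Rightarrow\; F^{-1}(U) = -\ln(1 - U)
```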
The animation below walks through all the steps needed to generate the Gaussian samples in 2-D:
- Sample from two separate uniform sample generators (U₁ and U₂)
- Apply the inverse CDF of the exponential distribution with λ=1 to U₁ to get half of the squared distance from the origin of the sample (s). For simplicity, the inverse CDF is modified from -ln(1-U₁) to -ln(U₁). As a result, this modified function is technically no longer the inverse CDF of the exponential, but it will still output samples that are exponentially distributed. This is because U₁ and 1-U₁ are both uniform samples between 0 and 1.
- Apply the inverse CDF of the uniform distribution between 0 and 2π to U₂ to get the angle of the sample (θ)
- Calculate the distance r from the origin for each sample from its half of squared distance: r=√(2s).
- Lastly, the x and y coordinates are found by simple trigonometry: r cos(θ) and r sin(θ) respectively. Combining the formulas from steps 2 to 5 gives us the formulas seen earlier to transform U₁ and U₂ into x and y.
Coding-wise, generating Gaussian samples using the Box-Muller transform could hardly be easier: the five steps above translate almost line-for-line into Python to generate 1,000 Gaussian samples in 2-D.
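A minimal NumPy sketch of those five steps (variable names and the seed are my own choices, not from the original post):

```python
import numpy as np

rng = np.random.default_rng(seed=42)
n = 1000  # number of 2-D Gaussian samples

# Step 1: two independent uniform samples in (0, 1)
u1 = rng.uniform(size=n)
u2 = rng.uniform(size=n)

# Step 2: exponential half-of-squared-distance, s = -ln(U1)
# (using -ln(U1) instead of -ln(1 - U1), since both are valid)
s = -np.log(u1)

# Step 3: uniform angle between 0 and 2*pi
theta = 2 * np.pi * u2

# Step 4: distance from the origin, r = sqrt(2s)
r = np.sqrt(2 * s)

# Step 5: x and y coordinates by simple trigonometry
x = r * np.cos(theta)
y = r * np.sin(theta)
```

Each of `x` and `y` should then be (approximately) standard-normal, and independent of each other.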