Generating Normal Random Variables from Uniforms
You have a reliable generator for $\text{Uniform}(0,1)$ random variables. Describe a method for producing standard normal $N(0,1)$ samples from these uniform draws.
Specifically:
- State the transformation and prove that it produces standard normals.
- Why does this work geometrically?
- Are there practical alternatives, and when would you prefer them?
Hints
- Think about the joint distribution of two independent standard normals. What does it look like in polar coordinates?
- In polar form, the angle is uniform on $[0, 2\pi)$ and the squared radius $R^2 \sim \text{Exp}(1/2)$. Both are easy to sample from a uniform.
- Set $R = \sqrt{-2 \ln U_1}$ and $\Theta = 2\pi U_2$, then convert back to Cartesian: $Z_1 = R\cos\Theta$, $Z_2 = R\sin\Theta$.
Worked Solution
How to Think About It: The core challenge is mapping a "flat" distribution (uniform) to a "bell-shaped" one (normal). The trick is to think in two dimensions: a single standard normal is hard to invert (the CDF has no closed form), but a pair of independent standard normals has a beautifully simple structure in polar coordinates. The joint density $\frac{1}{2\pi}e^{-(x^2+y^2)/2}$ separates into a radial part and an angular part -- and both of those are easy to sample from using uniforms. That is the key insight behind Box-Muller: go to 2D polar coordinates where everything factors.
Key Insight: Two independent standard normals, viewed as a point $(Z_1, Z_2)$ in the plane, have a joint distribution where the angle is uniform on $[0, 2\pi)$ and the squared radius is $\text{Exp}(1/2)$ (equivalently $\chi^2(2)$). Both of these are trivial to generate from uniforms.
The Method:
- Draw two independent uniforms $U_1, U_2 \sim U(0,1)$.
- Set the radius: $R = \sqrt{-2 \ln U_1}$.
- Set the angle: $\Theta = 2\pi U_2$.
- Convert to Cartesian coordinates:
$Z_1 = R \cos \Theta = \sqrt{-2 \ln U_1} \, \cos(2\pi U_2)$
$Z_2 = R \sin \Theta = \sqrt{-2 \ln U_1} \, \sin(2\pi U_2)$
Then $Z_1$ and $Z_2$ are independent $N(0,1)$ random variables.
Proof: Write the joint density of $(Z_1, Z_2)$ in polar coordinates $(r, \theta)$:
$f(z_1, z_2) = \frac{1}{2\pi} e^{-(z_1^2 + z_2^2)/2}$
Substituting $z_1 = r\cos\theta$, $z_2 = r\sin\theta$ with Jacobian $r$:
$f(r, \theta) = \frac{r}{2\pi} e^{-r^2/2}, \quad r \geq 0, \; \theta \in [0, 2\pi)$
This factors as $f(r, \theta) = \left(r \, e^{-r^2/2}\right) \cdot \frac{1}{2\pi}$. The angular part is $\text{Uniform}(0, 2\pi)$, generated by $\Theta = 2\pi U_2$. For the radial part, the CDF of $R$ is $F_R(r) = 1 - e^{-r^2/2}$, so setting $U_1 = 1 - e^{-R^2/2}$ and inverting gives $R = \sqrt{-2 \ln(1 - U_1)}$. Since
Practical Considerations:
- Trig is expensive. The two calls to $\cos$ and $\sin$ are the main bottleneck. The Marsaglia polar method avoids trig entirely: generate $(V_1, V_2)$ uniform on $(-1,1)^2$, reject if $S = V_1^2 + V_2^2 \geq 1$ (about 21% rejection rate). Then $Z_1 = V_1 \sqrt{-2 \ln S \,/\, S}$ and $Z_2 = V_2 \sqrt{-2 \ln S \,/\, S}$ are independent standard normals. This replaces trig with rejection, which is faster on most hardware.
- Inverse CDF method. You can also use $Z = \Phi^{-1}(U)$ directly. The normal CDF has no closed form, but rational approximations (e.g., Beasley-Springer-Moro) give machine-precision results. This is the standard approach in quasi-Monte Carlo and GPU implementations because it is one-to-one and preserves the low-discrepancy structure of the input sequence.
- Ziggurat algorithm. For maximum speed, the Ziggurat method uses precomputed horizontal rectangles to cover the normal density, achieving $O(1)$ expected time with minimal rejection. This is what most high-performance libraries use.
- Numerical caution. If $U_1$ is exactly 0, $\ln(0)$ is undefined. In practice, clamp $U_1$ away from 0 or use open-interval uniforms.
Answer: The Box-Muller transform converts two independent $U(0,1)$ draws into two independent $N(0,1)$ samples via $Z_1 = \sqrt{-2 \ln U_1}\cos(2\pi U_2)$ and $Z_2 = \sqrt{-2 \ln U_1}\sin(2\pi U_2)$. It works because a 2D standard normal in polar coordinates decomposes into a uniform angle and an exponentially-distributed squared radius, both trivially generated from uniforms. In practice, the Marsaglia polar method or the inverse CDF approach are preferred for speed or for quasi-Monte Carlo compatibility.
Intuition
The reason Box-Muller works is one of those beautiful accidents of dimension: a single normal is hard to invert (no closed-form CDF), but a pair of independent normals has a joint density that is perfectly radially symmetric. In polar coordinates, the angle and radius become independent, and both have distributions you can sample from with simple uniform transformations. You trade one hard problem (inverting $\Phi$) for two easy ones (exponential and uniform on a circle).
This idea -- that going to a higher dimension can simplify sampling -- shows up constantly in computational finance. Copulas factor joint distributions into marginals and dependence structure. Gibbs samplers break a hard multivariate distribution into easy conditional ones. Even importance sampling is really about finding a space where the target is easier to hit. The general lesson: when a distribution is hard to sample directly, look for a change of variables or a higher-dimensional representation where everything factors nicely.