This past evening, my former quiz bowl teammate turned co-worker (not to mention long-time friend) Michael Whittaker tossed a number of math problems my way. One in particular caught my interest:

Say you’ve got a normal random variable with mean zero and variance one. Draw a sample from that distribution. Now take that point you’ve just sampled and spawn a unit-variance normal distribution centered around it; draw a sample from this distribution. Continue this process $N$ times for arbitrary $N$. What is the distribution of the $N$th sample?

Michael told me that he and his friends at Cornell thought up this problem during a lull in a dinner-time conversation. The solution is after the break. A hint: It’s either simpler than it appears or, if you’ve done a bit of math, just as simple as it appears.

There are at least two ways to tackle this problem: the scientist’s way and the mathematician’s way. We’ll take a look at both; as is often the case, the scientific method builds intuition, which the mathematical method then reifies.

**The scientist’s way**

When working with distributions, simulations can help you better understand the problem at hand. Even if you can’t determine the shape of the desired distribution from first principles, you can at least get a sense of what it should look like by sampling and plotting using your scientific programming language of choice (Julia’s mine). Like so:

Lo and behold: a normal distribution! The scientist or engineer might call it a day and stop here; that’d be fine. The distribution does, after all, look irrefutably normal. But if an image doesn’t quite satisfy you, if you want to *know* that it’s normal, then read on.

**The mathematician’s way**

A helpful step in approaching a math problem is to cast it in mathematical language. Here’s a natural way to do so:

Say that we have a sequence $X_1$, $X_2$, $\ldots$, $X_N$ of random variables, where $X_i \sim \mathcal N(X_{i-1}, 1)$, $X_1 \sim \mathcal N(0, 1)$. What is the distribution of $X_N$?

Writing the problem in this form makes it clear that each $X_i$ depends only on the sample drawn directly before it (we say that the process obeys the Markov property ). A wonderful pattern emerges upon closer inspection: $X_i \sim \mathcal N(X_{i-1}, \sigma^2) = X_{i-1} + \mathcal N(0, \sigma^2)$. Unrolling the recursion, we get that $X_N \sim \sum_{i=1}^{N} Y_i$, where the $Y_i$ are i.i.d. zero-mean normal distributions with variance $\sigma^2$. We conclude that $X_N \sim \mathcal N(0, N\sigma^2)$. $\square$

Endnotes

[1] The process described in this problem is a Gaussian random walk, a type of Markov process with applications to stock prices.

[2] I’ve uploaded my IPython notebook to Github for reproducibility.