Gaussian Random Walks: The N-step Distribution

This past evening, my former quiz bowl teammate turned co-worker (not to mention long-time friend) Michael Whittaker tossed a number of math problems my way. One in particular caught my interest:

Say you’ve got a normal random variable with mean zero and variance one. Draw a sample from that distribution. Now take that point you’ve just sampled and spawn a unit-variance normal distribution centered around it; draw a sample from this distribution. Continue this process $N$ times for arbitrary $N$. What is the distribution of the $N$th sample?

Michael told me that he and his friends at Cornell thought up this problem during a lull in a dinner-time conversation. The solution is after the break. A hint: It’s either simpler than it appears or, if you’ve done a bit of math, just as simple as it appears.

* * *

There are at least two ways to tackle this problem: the scientist’s way and the mathematician’s way. We’ll take a look at both; as is often the case, the scientific method builds intuition, which the mathematical method then reifies.

The scientist’s way
When working with distributions, simulations can help you better understand the problem at hand. Even if you can’t determine the shape of the desired distribution from first principles, you can at least get a sense of what it should look like by sampling and plotting using your scientific programming language of choice (Julia’s mine). Like so:

Screen Shot 2016-07-26 at 12.47.30 AM

Screen Shot 2016-07-26 at 12.33.37 AM

Screen Shot 2016-07-26 at 12.32.32 AM

Lo and behold: a normal distribution! The scientist or engineer might call it a day and stop here; that’d be fine. The distribution does, after all, look irrefutably normal. But if an image doesn’t quite satisfy you, if you want to know that it’s normal, then read on.

A wild normal appears! No, I don't play Pokémon Go.

A wild normal appears! No, I don’t play Pokémon Go.

The mathematician’s way
A helpful step in approaching a math problem is to cast it in mathematical language. Here’s a natural way to do so:

Say that we have a sequence $X_1$, $X_2$, $\ldots$, $X_N$ of random variables, where $X_i \sim \mathcal N(X_{i-1}, 1)$, $X_1 \sim \mathcal N(0, 1)$. What is the distribution of $X_N$?

Writing the problem in this form makes it clear that each $X_i$ depends only on the sample drawn directly before it (we say that the process obeys the Markov property ). A wonderful pattern emerges upon closer inspection: $X_i \sim \mathcal N(X_{i-1}, \sigma^2) = X_{i-1} + \mathcal N(0, \sigma^2)$. Unrolling the recursion, we get that $X_N \sim \sum_{i=1}^{N} Y_i$, where the $Y_i$ are i.i.d. zero-mean normal distributions with variance $\sigma^2$. We conclude that $X_N \sim \mathcal N(0, N\sigma^2)$. $\square$

In Julia, the second parameter of the Normal function is the standard deviation of the distribution, not the variance.

In Julia, the second parameter of the Normal function is the standard deviation of the distribution, not the variance.

Screen Shot 2016-07-26 at 1.13.51 AM

Endnotes
[1] The process described in this problem is a Gaussian random walk, a type of Markov process with applications to stock prices.
[2] I’ve uploaded my IPython notebook to Github for reproducibility.

Leave a Reply

Your email address will not be published. Required fields are marked *