# Deriving the Poisson Distribution

Where does the Poisson Distribution come from?

A little bit of research^{1} tells us that the
distribution was originally introduced by Abraham de
Moivre in 1710 in an article called *“On the Measurement of
Chance, or, on the Probability of Events in Games Depending Upon
Fortuitous Chance”* ^{2} (not the original title).

The few steps that will get us there are laid out below.

Let’s start with a simple “rate” problem.

How likely is it that you’ll receive 12 visits next week given that you had 10 visitors this week?

This is clearly not a question that can be answered with so little
information. However, if we assume that *on average* you
get 10 visitors a week, then the question sounds a bit more
reasonable.

While we say the *rate* is 10 visitors per week, the visits are
obviously not going to be uniformly distributed. Maybe given the nature
of your site, people are more likely to visit over the weekend rather
than during the week. However, for the purpose of this analysis we are
going to assume that the rate is uniform.

Simplifying things a bit more, we can ask the question in a different way:

How likely is it that the rate of visitors for your site will be \frac{12}{7} per day next week given that the usual rate is \frac{10}{7} per day?

Or in another way:

How likely is it that the rate of visitors for your site will be \frac{12}{7 * 24} per hour next week given that the usual rate is \frac{10}{7 * 24} per hour?

Why do this? This reformulation gets us to a rate that’s less than one per unit of time being considered. This way we can now look at each time unit and assume that we’ll either get 1 visitor or none in any given hour.

So each hour, there’s a probability of \lambda that there will be a user, and 1 - \lambda that there won’t. Here we are
using \lambda = \frac{10}{7 * 24} for
convenience. The latter is also the “rate” of events or arrivals. The
term *arrivals* or *occurrences* is common in
literature.

The probability that there will be 12 visitors is the probability that exactly 12 of the one-hour slots have an arrival and the rest don’t.

Treating this as a list of one-hour “slots”, the outcome would look like:

0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | … | 0 |

The number in each box is the number of arrivals during that slot, which is either 1 or 0. So there are 2^n possible outcomes where n = 7 * 24. Each 1 has a probability of \lambda and each 0 has a probability of 1 - \lambda.

So for there to be exactly k arrivals – where k = 12 is our target – there need to be k ones and n - k zeros.

So the probability of any one particular such outcome is \lambda^k (1 - \lambda)^{n-k}.
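Before counting the arrangements, we can sanity-check the slot model with a quick Monte Carlo sketch. This is an illustration only – Python, the trial count, and the seed are my choices, not from the post – simulating many weeks of n = 7 * 24 hourly slots and counting how often exactly 12 of them see a visitor:

```python
import random

random.seed(42)  # fixed seed so repeated runs give the same estimate

n = 7 * 24          # one-hour slots in a week
lam = 10 / n        # probability of an arrival in any single slot
target = 12         # the number of visitors we're asking about
trials = 20_000

hits = 0
for _ in range(trials):
    # one simulated week: each slot independently holds 1 arrival or 0
    arrivals = sum(1 for _ in range(n) if random.random() < lam)
    if arrivals == target:
        hits += 1

print(hits / trials)  # fraction of simulated weeks with exactly 12 visitors
```

With this many trials the estimated fraction should land within a percent or so of the exact binomial probability, which works out to roughly 0.097.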

But there are lots of ways of choosing k boxes out of n. In fact there are exactly \binom{n}{k} ways.

So the probability of getting exactly k arrivals is:

\Pr(k) = \binom{n}{k} \lambda^k (1-\lambda)^{n - k}
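This is the binomial distribution, and we can evaluate it directly for our numbers. A minimal Python sketch (Python and the variable names are mine; n, \lambda, and k come from the setup above):

```python
from math import comb

n = 7 * 24          # number of one-hour slots in a week
lam = 10 / n        # per-slot arrival probability
k = 12              # target number of arrivals

# Pr(k) = C(n, k) * lambda^k * (1 - lambda)^(n - k)
pr = comb(n, k) * lam**k * (1 - lam)**(n - k)
print(pr)  # roughly 0.097
```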

With me so far?

Astute readers will note that we did some hand-waving up there. Specifically, we assumed that the only possibilities for any 1-hour block are there being an arrival or there not being one. This isn’t always the case, since there could be more than 1 arrival in any given hour.

One way to address this is to consider increasingly smaller units of time. For example, if we divide the one-hour block into m equal-sized pieces, then it’ll be a bit more likely – assuming m \gt 1 – that there will be at most 1 arrival during each smaller time period. In fact, if m is very large, then it is increasingly likely that the only choices are 1 or 0.

So the number of boxes is now nm, and the probability of an arrival during any single timeslot is now \frac{\lambda}{m}. Substituting these, our new probability is:

\Pr(k) = \binom{nm}{k} \left( \frac{\lambda}{m} \right)^k \left(1 - \frac{\lambda}{m}\right)^{nm - k}
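We can check numerically that refining the slots changes the answer only slightly, and that it settles down as m grows. Another small Python sketch, with the same assumed setup as before:

```python
from math import comb, exp, factorial

n = 7 * 24
lam = 10 / n
k = 12

for m in (1, 10, 100, 1000):
    slots = n * m                 # number of sub-slots in the week
    p = lam / m                   # per-sub-slot arrival probability
    pr = comb(slots, k) * p**k * (1 - p)**(slots - k)
    print(m, pr)

# the value these are settling toward: Poisson with mean n * lam = 10
print((n * lam)**k * exp(-n * lam) / factorial(k))
```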

Alright. We are getting somewhere now. The next step is to increase m all the way to \infty.

\begin{align*} \lim_{m \to \infty} \Pr(k) & = \lim_{m \to \infty} \binom{nm}{k} \left( \frac{\lambda}{m} \right)^k \left(1 - \frac{\lambda}{m}\right)^{nm - k} \\ & = \lim_{m \to \infty} \frac{(nm)!}{k! (nm - k)!} \left( \frac{\lambda}{m} \right)^k \left(1 - \frac{\lambda}{m}\right)^{nm - k} \end{align*}

The right side looks a bit daunting, but we can simplify things quite a bit by looking at each part separately.

For example:

\begin{align*} \binom{nm}{k}\left( \frac{\lambda}{m} \right)^k & = \frac{(nm)!}{k! (nm - k)!}\left( \frac{\lambda}{m} \right)^k \\ & = \frac{nm \cdot (nm - 1) \cdot (nm - 2) \cdots (nm - k + 1)}{k!}\left( \frac{\lambda}{m} \right)^k \\ & = \left( \frac{nm}{m} \right) \left( \frac{nm - 1}{m}\right) \cdots \left( \frac{nm - k + 1}{m}\right) \frac{\lambda^k}{k!} \end{align*}

Furthermore:

\left( \frac{nm}{m} \right) \left( \frac{nm - 1}{m}\right) \cdots \left( \frac{nm - k + 1}{m}\right) \frac{\lambda^k}{k!} = \frac{\lambda^k}{k!}\prod_{i=0}^{k-1} \left(n - \frac{i}{m}\right)

As m \to \infty, each \frac{i}{m} term tends to 0, so the product tends to n^k. This makes our lives a bit easier.

Now we have:

\begin{align*} \Pr(k) &= \lim_{m \to \infty}\frac{(n \lambda)^k}{k!} \left(1 - \frac{\lambda}{m} \right)^{nm - k} \\ &= \lim_{m \to \infty}\frac{(n \lambda)^k}{k!} \left(1 - \frac{\lambda}{m} \right)^{nm(1 - \frac{k}{nm})}\end{align*}

Remember that:

\lim_{n \to \infty} \left( 1 + \frac{x}{n} \right)^n = e^x

So substituting n \to nm and x \to - \lambda n gets us:

\lim_{m \to \infty} \left( 1 - \frac{\lambda}{m} \right)^{nm} = e^{- \lambda n}
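A quick numeric sanity check of that limit, using the n and \lambda from our running example:

```python
from math import exp

n = 7 * 24
lam = 10 / n

target = exp(-lam * n)  # e^(-lambda * n) = e^(-10)
for m in (1, 10, 100, 1000, 10_000):
    # the finite-m version marches toward the exponential as m grows
    approx = (1 - lam / m) ** (n * m)
    print(m, approx)
print(target)
```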

The leftover factor of \left(1 - \frac{k}{nm}\right) in the exponent tends to 1 as m \to \infty, so going back to \Pr(k):

\Pr(k) = \frac{(n \lambda)^k e^{- \lambda n}}{k!}

Let’s do another substitution. Remember that \lambda is the rate of arrivals per hour, so over n hours the expected number of arrivals is \lambda n. If we consider those n time periods (originally hours) to be a single unit of time, then our rate of arrivals is \lambda n per unit. So without loss of generality let’s substitute \lambda n \to \lambda.

This gives us:

\Pr(k) = \frac{\lambda^k e^{- \lambda}}{k!}

… which is the probability mass function (PMF) of the Poisson distribution.
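Putting it all together as a small Python function (the function name is mine): with the original numbers, a rate of 10 visitors per week, the probability of seeing exactly 12 visitors next week comes out to about 9.5%.

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """Pr(k arrivals) for a Poisson distribution with rate lam."""
    return lam**k * exp(-lam) / factorial(k)

# rate of 10 visitors per week, asking about exactly 12 visitors
print(poisson_pmf(12, 10))  # about 0.0948
```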

n.b.: In retrospect it would’ve been easier to consider the unit of time to be n hours in the first place, but I’m too lazy to go back and rewrite everything.
