Deriving the Poisson Distribution

Where does the Poisson Distribution come from?

A little bit of research[1] tells us that the distribution was originally introduced by Abraham de Moivre in 1710 in an article called “On the Measurement of Chance, or, on the Probability of Events in Games Depending Upon Fortuitous Chance”[2] (not the original title).

A few steps that will get us there are laid out below.

Let’s start with a simple “rate” problem.

How likely is it that you’ll receive 12 visits next week given that you had 10 visitors this week?

This clearly isn’t a question that can be answered from so little information. However, if we assume that on average you get 10 visitors a week, then the question sounds a bit more reasonable.

While we say the rate is 10 visitors per week, the visits are obviously not going to be uniformly distributed. Maybe given the nature of your site, people are more likely to visit over the weekend rather than during the week. However, for the purpose of this analysis we are going to assume that the rate is uniform.

Simplifying things a bit more, we can ask the question in a different way:

How likely is it that the rate of visitors for your site will be $\frac{12}{7}$ per day next week given that the usual rate is $\frac{10}{7}$ per day?

Or in another way:

How likely is it that the rate of visitors for your site will be $\frac{12}{7 \cdot 24}$ per hour next week given that the usual rate is $\frac{10}{7 \cdot 24}$ per hour?
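As a quick sanity check on the rescaling, here is a minimal sketch that converts the weekly counts into exact per-hour rates. The helper name `per_hour` and the constant `HOURS_PER_WEEK` are made up for this illustration.

```python
from fractions import Fraction

HOURS_PER_WEEK = 7 * 24  # 168 one-hour slots in a week


def per_hour(weekly_count):
    """Convert a weekly count into an exact per-hour rate (hypothetical helper)."""
    return Fraction(weekly_count, HOURS_PER_WEEK)


usual_rate = per_hour(10)   # the usual rate: 10 visitors per week
target_rate = per_hour(12)  # the rate being asked about: 12 visitors per week

# Both rates come out well below one visitor per hour.
print(usual_rate, float(usual_rate))
print(target_rate, float(target_rate))
```

Using `Fraction` keeps the rates exact, which makes it easy to see that both are less than one arrival per hour.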

Why do this? Without much explanation, this reformulation gets us to a rate that’s less than one per unit of time being considered. This way we can now look at each time unit and assume that we’ll get either 1 visitor or none in any given hour.

So each hour, there’s a probability of $\lambda$ that there will be a visitor, and $1 - \lambda$ that there won’t. Here we are using $\lambda = \frac{10}{7 \cdot 24}$ for convenience. The latter is also the “rate” of events or arrivals. The terms “arrivals” and “occurrences” are common in the literature.

The probability that there will be 12 visitors is the probability that there will be 12 hour slots where there’s an arrival and the rest will be slots without an arrival.

Treating this as a list of one-hour “slots”, the outcome would look like:

0 1 0 0 0 1 0 1 1 0 0 0 0 1 0 0
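To make the slot picture concrete, here is a small sketch that draws one random week of one-hour slots. The function name `simulate_week` and the fixed seed are assumptions for illustration only; each slot is treated as an independent coin flip, which is exactly the simplification used in this section.

```python
import random


def simulate_week(n_slots, p_arrival, rng):
    """Return one 0/1 value per slot; each slot is 1 with probability p_arrival."""
    return [1 if rng.random() < p_arrival else 0 for _ in range(n_slots)]


rng = random.Random(0)  # fixed seed, only so the run is repeatable
week = simulate_week(7 * 24, 10 / (7 * 24), rng)

print(week[:16])                     # the first 16 slots of the simulated week
print("arrivals this week:", sum(week))
```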

The number in each box is the number of arrivals during that slot, which is either 1 or 0. So there are $2^n$ possible outcomes where $n = 7 \cdot 24$. Each 1 has a probability of $\lambda$ and each 0 has a probability of $1 - \lambda$.

So for there to be exactly $k$ arrivals (where $k = 12$, which is the target) there need to be $k$ ones and $n - k$ zeros.

So the probability of any one particular such outcome is $\lambda^k (1 - \lambda)^{n-k}$.

But there are lots of ways of choosing $k$ boxes out of $n$. In fact there are exactly $\binom{n}{k}$ ways.
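The counting claim can be checked by brute force for a small case. The sketch below lists every 0/1 sequence of length $n$ and counts those with exactly $k$ ones. The function name `count_outcomes` and the small values of `n` and `k` are chosen only for illustration, since enumerating all $2^{168}$ outcomes is not feasible.

```python
from itertools import product
from math import comb


def count_outcomes(n, k):
    """Count the 0/1 sequences of length n that contain exactly k ones."""
    return sum(1 for slots in product((0, 1), repeat=n) if sum(slots) == k)


n, k = 10, 3
# The brute-force count and the binomial coefficient should agree.
print(count_outcomes(n, k), comb(n, k))
```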

So the probability of getting exactly $k$ arrivals is:

$$\Pr(k) = \binom{n}{k} \lambda^k (1-\lambda)^{n - k}$$
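Here is a minimal sketch of that formula in Python, evaluated for the running example ($n = 7 \cdot 24$ hourly slots, $k = 12$). The function name `binomial_pmf` is an assumption for this illustration.

```python
from math import comb


def binomial_pmf(k, n, lam):
    """Probability of exactly k arrivals in n slots, each with arrival probability lam."""
    return comb(n, k) * lam**k * (1 - lam) ** (n - k)


n = 7 * 24    # number of one-hour slots in a week
lam = 10 / n  # per-slot arrival probability for 10 visitors a week

print(binomial_pmf(12, n, lam))
```

Summing `binomial_pmf(k, n, lam)` over every `k` from `0` to `n` should give 1 up to floating-point error, which is a handy check that nothing was miscounted.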

With me so far?

Astute readers will note that we did some hand-waving up there. Specifically, we assumed that any 1-hour block contains either exactly one arrival or none. This isn’t always the case, since there could be more than 1 arrival in any given hour.

One way to address this is to consider increasingly smaller units of time. For example, if we divide the one-hour block into $m$ equal-sized pieces then it’ll be a bit more likely (assuming $m > 1$) that there will be at most 1 arrival during each of those smaller periods. In fact, as $m$ gets very large, it becomes increasingly likely that the only possibilities are 1 or 0.

So the number of boxes is now $nm$, and the probability of an arrival during any single timeslot is now $\frac{\lambda}{m}$. Substituting these, our new probability is:

$$\Pr(k) = \binom{nm}{k} \left( \frac{\lambda}{m} \right)^k \left(1 - \frac{\lambda}{m}\right)^{nm - k}$$
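To get a feel for what the subdivision does, the sketch below evaluates this refined probability for a few values of $m$. The function name `refined_pmf` and the particular values of `m` are assumptions for illustration. The printed values should change less and less as $m$ grows.

```python
from math import comb


def refined_pmf(k, n, lam, m):
    """Probability of k arrivals when each of the n slots is split into m pieces."""
    slots = n * m  # total number of smaller slots
    p = lam / m    # arrival probability in each smaller slot
    return comb(slots, k) * p**k * (1 - p) ** (slots - k)


n, k = 7 * 24, 12
lam = 10 / n

for m in (1, 10, 100, 1000):
    print(m, refined_pmf(k, n, lam, m))
```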

Alright. We are getting somewhere now. The next step is to increase $m$ all the way to $\infty$.

$$\begin{align*}
\lim_{m \to \infty} \Pr(k) & = \lim_{m \to \infty} \binom{nm}{k} \left( \frac{\lambda}{m} \right)^k \left(1 - \frac{\lambda}{m}\right)^{nm - k} \\
& = \lim_{m \to \infty} \frac{(nm)!}{k! (nm - k)!} \left( \frac{\lambda}{m} \right)^k \left(1 - \frac{\lambda}{m}\right)^{nm - k}
\end{align*}$$

The right side looks a bit daunting, but we can simplify things quite a bit by looking at each part separately.

For example:

$$\begin{align*}
\binom{nm}{k}\left( \frac{\lambda}{m} \right)^k & = \frac{(nm)!}{k! (nm - k)!}\left( \frac{\lambda}{m} \right)^k \\
& = \frac{nm \cdot (nm - 1) \cdot (nm - 2) \cdots (nm - k + 1)}{k!}\left( \frac{\lambda}{m} \right)^k \\
& = \left( \frac{nm}{m} \right) \left( \frac{nm - 1}{m}\right) \cdots \left( \frac{nm - k + 1}{m}\right) \frac{\lambda^k}{k!}
\end{align*}$$

Furthermore:

$$\left( \frac{nm}{m} \right) \left( \frac{nm - 1}{m}\right) \cdots \left( \frac{nm - k + 1}{m}\right) \frac{\lambda^k}{k!} = \frac{\lambda^k}{k!}\prod_{i=0}^{k-1} \left(n - \frac{i}{m}\right)$$

As $m \to \infty$, each $\frac{i}{m}$ tends to $0$, so the product tends to $n^k$. This makes our lives a bit easier.
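A numerical check of this step: the sketch below multiplies the $k$ factors $\frac{nm - i}{m}$ for $i = 0, \dots, k - 1$ and compares the result with $n^k$. The function name `falling_product` and the chosen values of `m` are assumptions for illustration. The printed ratio should creep up towards 1 as $m$ grows.

```python
from math import prod


def falling_product(n, m, k):
    """Product of (n*m - i) / m for i = 0 .. k-1."""
    return prod((n * m - i) / m for i in range(k))


n, k = 7 * 24, 12

for m in (1, 10, 100, 1000, 10000):
    # Ratio of the product to n**k; it approaches 1 from below.
    print(m, falling_product(n, m, k) / n**k)
```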

Now we have:

$$\begin{align*}
\Pr(k) &= \lim_{m \to \infty}\frac{(n \lambda)^k}{k!} \left(1 - \frac{\lambda}{m} \right)^{nm - k} \\
&= \lim_{m \to \infty}\frac{(n \lambda)^k}{k!} \left(1 - \frac{\lambda}{m} \right)^{nm\left(1 - \frac{k}{nm}\right)}
\end{align*}$$

Remember that:

$$\lim_{n \to \infty} \left( 1 + \frac{x}{n} \right)^n = e^x$$

So substituting $n \to nm$ and $x \to -\lambda n$ gets us:

$$\lim_{nm \to \infty} \left( 1 - \frac{\lambda}{m} \right)^{nm} = e^{- \lambda n}$$
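The limit is easy to check numerically. The sketch below evaluates $\left(1 - \frac{\lambda}{m}\right)^{nm}$ for growing $m$ and prints it next to $e^{-\lambda n}$, using the running example's values. The function name `power_term` is an assumption for this illustration.

```python
from math import exp


def power_term(n, lam, m):
    """Evaluate (1 - lam/m) ** (n*m) for a finite m."""
    return (1 - lam / m) ** (n * m)


n = 7 * 24
lam = 10 / n
target = exp(-lam * n)  # the claimed limit, e^(-lambda * n)

for m in (1, 10, 100, 1000, 10000):
    print(m, power_term(n, lam, m), target)
```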

Now going back to $\Pr(k)$, and noting that the extra factor $1 - \frac{k}{nm}$ in the exponent tends to $1$ as $m \to \infty$:

$$\Pr(k) = \frac{(n \lambda)^k e^{- \lambda n}}{k!}$$

Let’s do another substitution. Remember that $\lambda$ is the expected number of arrivals in a single time period (originally one hour), so $\lambda n$ is the expected number of arrivals across all $n$ time periods. If we consider those $n$ time periods to be a single unit of time, then our rate of arrivals is $\lambda n$. So without loss of generality let’s substitute $\lambda n \to \lambda$.

This gives us:

$$\Pr(k) = \frac{\lambda^k e^{- \lambda}}{k!}$$

… which is the probability mass function (PMF) of the Poisson distribution.
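With the final formula in hand, the question from the start of the post (12 visitors next week, given a usual rate of 10 per week) takes only a few lines to answer. This is a minimal sketch, and the function name `poisson_pmf` is an assumption for this illustration.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k arrivals when the expected number of arrivals is lam."""
    return lam**k * exp(-lam) / factorial(k)


# Probability of 12 visitors in a week that averages 10: roughly 0.095.
print(poisson_pmf(12, 10.0))
```

So under these assumptions there is a bit under a 1-in-10 chance of seeing exactly 12 visitors next week.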

n.b.: In retrospect it would’ve been easier to consider the unit of time to be $n$ hours in the first place, but I’m too lazy to go back and rewrite everything.


  1. Also known as reading the relevant Wikipedia article. In this case that’s the History section of the article on the Poisson distribution.

  2. The original article in Latin is available via The Royal Society Publishing.

Last modified: December 11, 2020