Written by Colin in statistics.

A student asked:

What's the link between the Poisson formula and the binomial?

... and I started to cry a little bit.

You use the Poisson distribution when you have events happening at a constant rate, on a continuous time-frame - as opposed to the binomial, which involves $n$ distinct trials, each with the same probability of success. They are, however, related: in fact, as $n$ gets large, the binomial looks more and more like a Poisson (the distinct trials get closer and closer together and, in the limit, become continuous).

The binomial probability of $r$ successes, when you expect $\lambda$ of them, is:

$^nC_r p^r (1-p)^{n-r}$, where $p := \frac{\lambda}{n}$. I plan to show that, as $n \rightarrow \infty$, that's the same thing as the Poisson formula, $ e^{-\lambda} \frac{\lambda^r}{r!}$.
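Before wading into the algebra, it's worth a numerical sanity check. Here's a quick Python sketch (the two helper functions are mine, built from the standard library) comparing the binomial pmf with $p = \frac{\lambda}{n}$ against the Poisson pmf as $n$ grows:

```python
from math import comb, exp, factorial

def binomial_pmf(r, n, p):
    """P(X = r) for X ~ Binomial(n, p): nCr * p^r * (1-p)^(n-r)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

def poisson_pmf(r, lam):
    """P(X = r) for X ~ Poisson(lam): e^(-lam) * lam^r / r!."""
    return exp(-lam) * lam**r / factorial(r)

lam, r = 3.0, 2
for n in (10, 100, 10_000):
    # with p = lam/n, the two columns should agree better as n grows
    print(n, binomial_pmf(r, n, lam / n), poisson_pmf(r, lam))
```

(The particular values $\lambda = 3$ and $r = 2$ are just my example; any modest choice shows the same convergence.)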

I've got to tell you, it's not very nice - even when you take natural logs: we're starting with $\ln\left( ^nC_r \right) + r \ln(p) + (n-r) \ln(1-p)$ and want to get to $-\lambda + r \ln(\lambda)- \ln(r!)$.

OK - well, $\ln(p) =\ln( \lambda) - \ln(n)$, and $\ln(1-p) = \ln(n-\lambda) - \ln (n)$. Let's sub those in:

$LHS = \ln \left( ^n C_r \right) + r\ln (\lambda) - r\ln(n) + (n-r) \ln(n-\lambda) - (n-r)\ln(n)$

$ = \ln \left( ^n C_r \right) + r \ln(\lambda) + (n-r) \ln(n-\lambda) - n \ln (n)$, as the $r\ln(n)$ terms vanish.

That's the good news. The bad news is, we have a lot more besides.

*Stirling's approximation* is a neat way to estimate factorials for large numbers: $\ln(z!) \simeq \frac 12 \ln(2\pi) + \left(z + \frac 12\right) \ln(z) - z$. It leads to a rather messy expression for $\ln\left(^nC_r\right) \simeq \left(n+\frac12\right) \ln (n) - \left(n -r + \frac 12\right) \ln(n-r) - r - \ln (r!)$.
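If you'd like to see how good Stirling's approximation is, here's a small Python check (using `math.lgamma`, since $\ln(z!) = \operatorname{lgamma}(z+1)$ exactly):

```python
from math import lgamma, log, pi

def stirling_ln_factorial(z):
    """Stirling's approximation: ln(z!) ~ (1/2)ln(2*pi) + (z + 1/2)ln(z) - z."""
    return 0.5 * log(2 * pi) + (z + 0.5) * log(z) - z

for z in (10, 100, 1000):
    # exact ln(z!) via lgamma(z + 1); the error shrinks roughly like 1/(12z)
    print(z, stirling_ln_factorial(z), lgamma(z + 1))
```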

You see why I cried a bit? Notice, too, that $r$ isn't necessarily very big, so I haven't Stirlinged it. (I've also got one eye on the final answer: I want an $r!$ in there!)

I can make it nicer, though - I know, because it's going to infinity, that $n$ is *much* bigger than $\frac 12$. $n - r$ is also much bigger than $\frac12$, so I'm going to kick out some halves.

$\ln\left(^nC_r\right) \simeq n\ln (n) - (n -r ) \ln(n-r) - r - \ln(r!)$.

Back to what we had:

$LHS = \ln \left( ^n C_r \right) + r \ln(\lambda) + (n-r) \ln(n-\lambda) - n \ln (n)$

$ = n\ln (n) - (n -r ) \ln(n-r) - r - \ln(r!) + r \ln(\lambda) + (n-r)\ln(n-\lambda) - n\ln(n)$ - aha! the $n\ln(n)$s vanish.

$ = - (n -r ) \ln(n-r) - r - \ln(r!) + r \ln(\lambda) + (n-r)\ln(n-\lambda) $ - now let's combine the things with $(n-r)$ at the front:

$ = (n-r) \left[ \ln(n-\lambda) - \ln(n-r) \right] - r - \ln(r!) + r\ln(\lambda)$. All we need now is to show that the first two terms converge towards $-\lambda$ and we have the RHS.

For smallish $z$, $\ln(1 + z) \simeq z$. Unfortunately, we don't have $\ln(1 + z)$, we have $\ln(n + \text{stuff})$. However, we can factor the $n$ out (and get the nice bonus that the $\ln(n)$s we produce cancel, too). That first term is equivalent to:

$(n-r) \left[ \ln\left(1-\frac{\lambda}{n}\right) - \ln\left(1-\frac{r}{n}\right) \right]$

For large $n$, that tends to:

$(n-r) \left[ - \frac{\lambda}{n} + \frac{r}{n} \right] = \frac{(\lambda-r)(r-n)}{n}$
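You can watch that limit happen numerically. A short Python sketch (with my own example values $\lambda = 3$ and $r = 2$, so the whole expression should head towards $r - \lambda = -1$):

```python
from math import log

lam, r = 3.0, 2

def first_term(n):
    """(n - r) * [ln(1 - lam/n) - ln(1 - r/n)], which should tend to r - lam."""
    return (n - r) * (log(1 - lam / n) - log(1 - r / n))

for n in (10, 100, 10_000):
    print(n, first_term(n))
```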

Back to where we were:

$LHS = (n-r) \left[ \ln(n-\lambda) - \ln(n-r) \right] - r - \ln(r!) + r\ln(\lambda)$

$ = \frac{(\lambda-r)(r-n)}{n} - r - \ln(r!) + r\ln(\lambda)$

Now, I'm going to move the rogue $r$ into the fraction and expand the bracket:

$ = \frac{(\lambda r - n \lambda - r^2 + rn) - rn}{n} - \ln(r!) + r\ln(\lambda)$

Oh, goody, the $rn$s vanish:

$ = \frac{\lambda r - n \lambda - r^2}{n} - \ln(r!) + r\ln(\lambda)$

$ = \frac{\lambda r}{n} - \lambda - \frac{r^2}{n} - \ln(r!) + r\ln(\lambda)$

As $n \rightarrow \infty$, the two fractional terms go to zero, leaving us with:

$= - \lambda - \ln(r!) + r\ln(\lambda) = RHS$, and a well-deserved $\blacksquare$.
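And to close the loop numerically: a Python sketch checking that the log of the binomial pmf (the LHS, with $p = \frac{\lambda}{n}$) really does approach $-\lambda + r\ln(\lambda) - \ln(r!)$ (the RHS), again with my own example values $\lambda = 3$, $r = 2$:

```python
from math import comb, factorial, log

lam, r = 3.0, 2

def lhs(n):
    """ln(nCr) + r*ln(p) + (n - r)*ln(1 - p), with p = lam/n."""
    p = lam / n
    return log(comb(n, r)) + r * log(p) + (n - r) * log(1 - p)

rhs = -lam + r * log(lam) - log(factorial(r))

for n in (10, 100, 10_000):
    # the gap should shrink roughly like 1/n
    print(n, lhs(n) - rhs)
```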

## Cav

Excellent post, although I feel you may have spent a bit too long in conversation with Dave “the king of stats” Gale!

I’m also shocked that you would mention “Stirling’s Approximation” and don’t let your readers know that all he did was work out a constant and steal the credit for what should, by rights, be called “de Moivre’s Approximation”!

## Colin

You’re right, the Ninja would never have let that slip by 😉