Why are there so many equations for variance?

A student asks:

Why are there so many equations for the variance?

In S1, depending on the board you're working with, you might need to know three equations for variance. For listed data, it's:

$\operatorname{Var}(X) = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2$

For grouped data, it's:

$\operatorname{Var}(X) = \frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2$

And for a discrete random variable, it's:

$\operatorname{Var}(X) = \sum px^2 - \left(\sum px\right)^2$

Wow. That's an awful lot of equations.

Until you realise there's just one formula

If you play 'spot the difference' with the three equations, you might notice a few similarities -- for example, they're always the difference between two things. The first term is usually something squared inside a fraction, and the second is usually a fraction squared. That's not a coincidence. You might even notice that the second term in each equation is the square of the mean of the variable. That's not a coincidence, either.

In fact, all of the variance formulas come from one, single master formula:

(Variance) = (mean of the squares) - (square of the mean).
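In symbols, write $\bar{x}$ for the mean of $x$ and $\overline{x^2}$ for the mean of $x^2$. If you take variance to be the average squared distance from the mean, $\overline{(x-\bar{x})^2}$ (the usual definition), a short expansion suggests why the master formula holds. The only trick is that $\bar{x}$ is a constant, so it can come outside the averaging:

$$\overline{(x-\bar{x})^2} = \overline{x^2} - 2\bar{x}\cdot\bar{x} + \bar{x}^2 = \overline{x^2} - 2\bar{x}^2 + \bar{x}^2 = \overline{x^2} - \bar{x}^2$$

That last expression is exactly (mean of the squares) - (square of the mean).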

Let's look at them one by one

The first one is probably the easiest:

$\operatorname{Var}(X) = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2$

The first term: you add up all of the squares of the numbers, and divide by how many things are in the list ($n$). That's the mean of the squares of the numbers. The second term is "add up all of the numbers, divide by how many there are, and square the result" -- that's simply squaring the mean.
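If you'd like to check this on a computer, here's a minimal Python sketch. The function name `listed_variance` and the data are made up for illustration; it just computes the two terms exactly as described and compares the answer with Python's `statistics.pvariance`, which also divides by $n$.

```python
import statistics

def listed_variance(xs):
    """Variance of a plain list of numbers: sum(x^2)/n - (sum(x)/n)^2."""
    n = len(xs)
    mean_of_squares = sum(x * x for x in xs) / n  # first term: mean of the squares
    square_of_mean = (sum(xs) / n) ** 2           # second term: square of the mean
    return mean_of_squares - square_of_mean

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up example values
print(listed_variance(data))      # 4.0
# statistics.pvariance also divides by n, so it should agree.
print(statistics.pvariance(data))
```

Watch out: `statistics.variance` (without the `p`) divides by $n - 1$ instead, which gives the sample variance and a different answer.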

The second isn't much harder:

$\operatorname{Var}(X) = \frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2$

If you start with the second term and accept that $\frac{\sum fx}{\sum f}$ is the mean of $x$ -- which it is, you've been doing that since GCSE -- then you pretty much have to accept that $\frac{\sum fx^2}{\sum f}$ is the mean of the squares of $x$.
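One way to convince yourself is to write a frequency table out longhand and check that the grouped formula gives the same answer as the listed one. The sketch below does that in Python. The table and the name `grouped_variance` are hypothetical, and for class intervals $x$ would be the class midpoints.

```python
def grouped_variance(xs, fs):
    """sum(f*x^2)/sum(f) - (sum(f*x)/sum(f))^2 for values xs with frequencies fs."""
    total = sum(fs)                                                   # sum f
    mean = sum(f * x for x, f in zip(xs, fs)) / total                 # sum fx / sum f
    mean_of_squares = sum(f * x * x for x, f in zip(xs, fs)) / total  # sum fx^2 / sum f
    return mean_of_squares - mean ** 2

# Hypothetical frequency table: value xs[i] occurs fs[i] times.
xs = [2, 4, 5, 7, 9]
fs = [1, 3, 2, 1, 1]

# Writing the table out as a plain list gives [2, 4, 4, 4, 5, 5, 7, 9].
expanded = [x for x, f in zip(xs, fs) for _ in range(f)]
listed = sum(x * x for x in expanded) / len(expanded) - (sum(expanded) / len(expanded)) ** 2

print(grouped_variance(xs, fs))  # 4.0
print(listed)                    # 4.0
```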

Lastly, the probability-based one:

$\operatorname{Var}(X) = \sum px^2 - \left(\sum px\right)^2$

This is really just the same as the previous one, only with $p_i = \frac{f_i}{\sum f_i}$ -- that is to say, the probability of each value is its frequency divided by the total of the frequencies. Once you see that, the template falls into place: $\sum px^2 = \frac{\sum fx^2}{\sum f}$ and $\sum px = \frac{\sum fx}{\sum f}$, so it's, again, the mean of the squares minus the square of the mean.
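Here's a small Python sketch of that substitution. The name `discrete_variance` and the frequency table are illustrative assumptions. It turns frequencies into probabilities with $p_i = \frac{f_i}{\sum f_i}$, using exact fractions so the answer doesn't pick up rounding error.

```python
from fractions import Fraction

def discrete_variance(xs, ps):
    """Variance of a discrete random variable: sum(p*x^2) - (sum(p*x))^2."""
    mean = sum(p * x for x, p in zip(xs, ps))                 # sum px: the mean
    mean_of_squares = sum(p * x * x for x, p in zip(xs, ps))  # sum px^2: mean of the squares
    return mean_of_squares - mean ** 2

# Hypothetical frequency table, turned into probabilities p_i = f_i / sum f.
xs = [2, 4, 5, 7, 9]
fs = [1, 3, 2, 1, 1]
ps = [Fraction(f, sum(fs)) for f in fs]

print(discrete_variance(xs, ps))  # 4, what the grouped-data formula gives for the same table
```

The same function works when the probabilities don't come from a frequency table at all: for a fair six-sided die, `discrete_variance(range(1, 7), [Fraction(1, 6)] * 6)` gives `35/12`.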

Knowing the master formula even helps you in later modules: for a probability density function $f(x)$, the mean of $x$ works out to be $\frac{\int xf(x) \,dx}{\int f(x) \,dx}$ between appropriate limits. (For a genuine pdf the denominator is 1, but keeping it makes the pattern obvious.) From that, you can jump straight to the variance formula for a continuous random variable:

$\operatorname{Var}(X) = \frac{\int x^2f(x) \,dx}{\int f(x) \,dx} - \left( \frac{\int xf(x) \,dx}{\int f(x) \,dx}\right)^2$.
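To see the continuous version behave the same way, here's a rough Python sketch. In an exam you'd do the integrals by hand. Here a simple midpoint-rule approximation stands in for them, and the names `integrate` and `continuous_variance` and the example pdf $f(x) = 2x$ on $[0, 1]$ are assumptions for illustration. For that pdf the exact variance is $\frac{1}{2} - \left(\frac{2}{3}\right)^2 = \frac{1}{18}$.

```python
def integrate(g, a, b, n=100_000):
    """Midpoint-rule approximation to the integral of g from a to b."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def continuous_variance(f, a, b):
    """Mean of x^2 minus square of the mean of x, weighting by f on [a, b]."""
    total = integrate(f, a, b)  # equals 1 for a genuine pdf
    mean = integrate(lambda x: x * f(x), a, b) / total
    mean_of_squares = integrate(lambda x: x * x * f(x), a, b) / total
    return mean_of_squares - mean ** 2

def pdf(x):
    """Example pdf f(x) = 2x on [0, 1], chosen for illustration."""
    return 2 * x

print(continuous_variance(pdf, 0, 1))  # very close to 1/18, about 0.0556
```

Because the sketch keeps the denominator $\int f(x) \,dx$, it gives the same answer if you feed it $f(x) = x$, which isn't normalised: dividing by the total does the job that $\sum f$ does for grouped data.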

Nice[1]!

Colin

Colin is a Weymouth maths tutor, author of several Maths For Dummies books and A-level maths guides. He started Flying Colours Maths in 2008. He lives with an espresso pot and nothing to prove.

[1] For not-very-nice values of nice.

