Written by Colin+ in geometry, statistics 1.
If I had £35 every time a student said "I don't get linear interpolation," I'd have pretty much the same business model as I do right now.
Everyone knows it's something to do with finding medians and quartiles, and something to do with the class width and... stuff. Some can even start writing down a fraction.
However, there's another way that involves less remembering and more... doing maths.
You remember at GCSE, if you wanted to find the median from a cumulative frequency graph, you'd draw a line across from the index of the median and follow it down to the axis. That's exactly what you do with linear interpolation, except that you don't actually need to draw the graph.
Well, if you were to draw the graph, it ought to go through the point at the end of the class below the median, and the end of the class containing the median. For example, if we had the table:
| Height | Frequency | Cumulative frequency |
|---|---|---|
| $0 \le x \lt 10$ | 5 | 5 |
| $10 \le x \lt 20$ | 8 | 13 |
| $20 \le x \lt 30$ | 19 | 32 |
| $30 \le x \lt 40$ | 4 | 36 |
Since there are 36 whatevers, the median will be the height of the 18th thing. That's in the third group - which begins at (20, 13) and ends at (30, 32). All we need to do is find the gradient of the line through those two points, and then the value of $x$ that gives us $y=18$.
So, the gradient is $\frac{32-13}{30-20} = 1.9$, which makes the equation of the line $y - 13 = 1.9(x-20)$, picking the lower point. It works just as well with the other.
Now, we want the point where $y=18$, so substitute that in to get $5 = 1.9 (x-20)$, which gives $x-20 = \frac{5}{1.9}$, or $x \approx 22.63$. That's your interpolated value for the median!
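If you'd rather let a computer do the arithmetic, here is a minimal Python sketch of the same idea. The function name `interpolate_position` and the way the table is stored (a list of class boundaries plus the cumulative frequency at each boundary, starting from 0) are my own choices for illustration, not a standard recipe.

```python
from bisect import bisect_left

def interpolate_position(boundaries, cumulative, position):
    """Return the x-value where the straight-line cumulative frequency
    graph reaches `position`.

    boundaries: class boundaries, e.g. [0, 10, 20, 30, 40]
    cumulative: cumulative frequency at each boundary, starting from 0
    """
    if not cumulative[0] <= position <= cumulative[-1]:
        raise ValueError("position lies outside the table")
    if position == cumulative[0]:
        return boundaries[0]
    # First boundary whose cumulative frequency reaches the position:
    # this is the upper end of the class we are interpolating in.
    upper = bisect_left(cumulative, position)
    x0, x1 = boundaries[upper - 1], boundaries[upper]
    y0, y1 = cumulative[upper - 1], cumulative[upper]
    gradient = (y1 - y0) / (x1 - x0)
    # Rearranged from y - y0 = gradient * (x - x0)
    return x0 + (position - y0) / gradient

# The table from this post
boundaries = [0, 10, 20, 30, 40]
cumulative = [0, 5, 13, 32, 36]
median = interpolate_position(boundaries, cumulative, 36 / 2)
print(round(median, 2))  # 22.63
```

It picks out the points (20, 13) and (30, 32), finds the gradient of 1.9 and solves for $x$, exactly as in the working above.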
I'd be remiss in my duties if I didn't tell you to watch out for sneaky frequency tables that don't quite join up - where the measurements are rounded to the nearest whole number or similar. In that case, the lower end is half a unit lower than you might expect, and the upper end half a unit higher. For example, if you had classes of 7-8, 9-11 and 12-16 (all to the nearest whole number), the middle class is really 8.5 to 11.5.
This bit of sneakiness affects any method you use for finding the median - you'd do well to watch out for it!
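As a rough sketch of that adjustment in Python (the helper name `true_boundaries` and its `unit` parameter are made up for this example), you widen each stated class by half a rounding unit at both ends. Treating the outermost ends the same way as the inner ones is an assumption here; the context of a question might suggest otherwise.

```python
def true_boundaries(classes, unit=1):
    """Widen each stated class by half a rounding unit at both ends.

    classes: list of (low, high) pairs as printed in the table
    unit: the rounding unit (1 means "to the nearest whole number")
    """
    half = unit / 2
    # Assumption: the first and last classes are widened like the others
    return [(low - half, high + half) for low, high in classes]

print(true_boundaries([(7, 8), (9, 11), (12, 16)]))
# [(6.5, 8.5), (8.5, 11.5), (11.5, 16.5)]
```

The adjusted classes now join up, so they can be used as the boundaries for the interpolation.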
(Edited 17/2/2014 to fix a typo)
srcav
Nice post on linear interpolation : http://t.co/baQLdRLxmG
nick
Hullo, we did linear interpolation today, please could I run a couple of things past you – in the example above, why is the first point (20, 13) and not (20, 14)? Also, you have the median position as n/2; we have been told today to always use (n+1)/2 and round up if need be. And then lastly, someone said something about this median position being different for AQA and Edexcel! Any truth in that? Oh, one last thing: we didn’t use this graphical way and the y=mx+c method at all. We drew the class interval as 13 to 32 on top with an 18 sandwiched in between, and 20 to 30 on the bottom, so the answer was got via a ratio calculation: (18-13)/(q-20) equals (32-13)/(30-20). Might be nice to add this method as well? But I’ll definitely mention the y=mx+c method tomorrow! Thanks!
Colin
Hi, Nick,
It’s (20, 13) rather than (20, 14) because otherwise we’d have two distinct whatevers with a height of 20 (the last in the 10-20 group and the first in the 20-30).
As for the $\frac n2$ versus $\frac{n+1}{2}$ question, my understanding is that you’d use $\frac{n+1}{2}$ for a list of numbers and $\frac{n}{2}$ for a distribution like this; in practice, it makes very little difference and (to the best of my recollection) the mark scheme allows either in this kind of question.
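To put a rough number on "very little difference" for the table in the post (this is just the same line equation with a different target): $\frac{n}{2} = 18$ gives

$$x = 20 + \frac{18 - 13}{1.9} \approx 22.63,$$

while $\frac{n+1}{2} = 18.5$ gives

$$x = 20 + \frac{18.5 - 13}{1.9} \approx 22.89.$$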
Towards the start of the post, I refer to the main method — the idea of the TMTOWTDI posts is to give an alternative to the way you’re normally taught.
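For what it's worth, the ratio set-up you describe comes out the same way for the table in the post. Writing $q$ for the median, as you did:

$$\frac{18 - 13}{q - 20} = \frac{32 - 13}{30 - 20} \quad\Rightarrow\quad q - 20 = \frac{5 \times 10}{19} \quad\Rightarrow\quad q \approx 22.63.$$

The right-hand side is the gradient of 1.9 from the post, so the two methods are the same calculation in different clothes.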
Tam
Colin thanks for the excellent explanation. I have just one quick question relating to the sneaky frequency tables mentioned at the end.
Say I had a table that went
0-2
3-5
6-8
9-11
I understand the second and third ranges are actually 2.5-5.5 and 5.5-8.5 respectively.
But what happens if the median was in the first or last ranges? Would they be 0-2.5 and 8.5-11 respectively?
Thanks
Colin
That’s a great question, and one I don’t have a one-size-fits-all answer for. I think:
* If you got a question where that detail was significant (which is unlikely), either answer would be accepted as correct.
* There may be some context that makes it clear – for example, it might make no sense for the thing you’re measuring to be negative, in which case 0 would be the lower bound.
* The whole thing is ridiculous – giving an answer “correct” to several significant figures when you have literally no idea of the distribution inside each range is a perfect example of precision rather than accuracy.
Gemgem
Hi. I have a question. This is my assignment. I hope you can help me. How do I find a decile with linear interpolation? Thanks in advance.
Colin
It works much the same way! If you’re interested in the first decile, instead of finding the middle number, you find the number a tenth of the way through – if you had 36 numbers, you’d want number 3.6 – and go through exactly the same process.
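As a sketch using the table from the post: position 3.6 falls in the first class, which (assuming the cumulative frequency starts at 0 when the height is 0) runs from (0, 0) to (10, 5). The gradient is $\frac{5 - 0}{10 - 0} = 0.5$, so the line is $y = 0.5x$, and

$$3.6 = 0.5x \quad\Rightarrow\quad x = 7.2.$$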