Collecting coupons

For all the grief I give @reflectivemaths on Wrong, But Useful, he does occasionally ask an interesting question.

In episode 45, he wondered how many packs of Lego cards one would need to acquire, on average, to complete the set of 140?1

A simpler case

Suppose, instead of 140 cards, there was a total of three to collect.

You start (obviously) with no cards. On average, you need to pick up a single card to make sure you have one card in your collection. By on average, I mean 'every time'. The expected time (in cards) to go from 0 cards to 1 card is 1.

If you have one card, how long do you expect to wait until you get to two? Now we're looking at a geometric distribution with a parameter of $\frac{2}{3}$ - every time you get a new card, there are two chances in three of it being the one you don't already have. If $X \sim Geo\left(p\right)$, then $E(X) = \frac{1}{p}$ - and in particular, it will take us on average 1.5 cards to go from one card to two.

And for the last card? Now it's another geometric distribution; this time, only one in three cards helps us, so we're drawing from $Geo\left(\frac{1}{3}\right)$, with an expected value of 3.

Adding these up gives $1 + \frac{3}{2} + 3 = 5.5$ - although it's not really the number we care about.

Breaking it down, this is three geometric distributions, with parameters $\frac{3}{3}$, $\frac{2}{3}$ and $\frac{1}{3}$, giving expected times of $\frac{3}{3}$, $\frac{3}{2}$ and $\frac{3}{1}$ - or $3 \left(1+\frac{1}{2}+\frac{1}{3}\right)$.

The general and particular cases

A few moments' thought shows that if you have $n$ cards to collect, you have $n$ geometric distributions, with parameters $\frac{n}{n}$, $\frac{n-1}{n}$, ... $\frac{1}{n}$ - and so your total expected time is $n\left(\frac 1n + \frac1{n-1} + \frac{1}{n-2}+ ... + \frac{1}{2} + 1\right)$.

That sum in the bracket? That's known as the $n$th harmonic number, $H_n$. In the Sainsbury's case, your expected value turns out to be $140 H_{140} \approx 773$ cards - or close to 200 packets.

What if you think about shiny and normal?

In fact, the packets are made of a single shiny card (of which there are 36 to collect) and three regular cards (numbering 104).

To collect 36 silver cards (forgetting anything else) would take $36 H_{36} \approx 150$ cards. Collecting 104 regular cards would require $104 H_{104} \approx 544$ cards. Altogether, having this fairly mild guarantee about the structure means you'd expect to need 694 cards, or 173 packs, to finish your collection.

Is there a quick way to calculate $H_n$?

Because $\int_1^n \frac{1}{x} \d x = \ln(n)$, it's reasonable to expect $\sum_1^n \frac{1}{n}$ to be related to $\ln(n)$ as well. It is - but there's a mysterious discrepancy. In fact, $\sum_1^n \frac{1}{n} \approx \ln(n) + \gamma$, where $\gamma$ is the Euler-Mascheroni constant, about 0.577.

In our original 140-card problem without silver cards, this would give $140 (\ln(140) + \gamma)$; $\ln(140) = \ln(7)+\ln(20) \approx 5$, and we end up with $140 \times 5.5 \approx 770$. That's not at all bad!

This puzzle is known as The Coupon Collector's Problem - and I love how it falls apart so logically!


Colin is a Weymouth maths tutor, author of several Maths For Dummies books and A-level maths guides. He started Flying Colours Maths in 2008. He lives with an espresso pot and nothing to prove.

  1. For the sake of simplicity, let's assume that the 140 cards are equally likely to show up in any given pack of four, and that there are infinitely many of each. []


Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Sign up for the Sum Comfort newsletter and get a free e-book of mathematical quotations.

No spam ever, obviously.

Where do you teach?

I teach in my home in Abbotsbury Road, Weymouth.

It's a 15-minute walk from Weymouth station, and it's on bus routes 3, 8 and X53. On-road parking is available nearby.

On twitter