A reader asks: how do you do the chain rule?

The chain rule was always my least favourite of the differentiation rules - although the quotient rule has now replaced it as an unnecessary evil. I suspect I didn’t like it when I learnt it because I learnt it more or less by rote rather than by understanding why it worked.

Sidetrack: that’s actually quite typical of the way I learn things. My usual process is “do what the experts tell me to until I get the hang of it, then start asking questions” - I think I just didn’t ask questions about the chain rule.

So, I’ll try to structure the article the same way: start with what you do, end with why you do it. All set?

When you have a complicated Thing to differentiate (let’s say $y = \sin(\ln(x^2 + 5))$, to take a contrived example), you break it down into smaller functions. For instance, you could say:

$y = \sin(u)$, where…

$u = \ln(v)$, where…

$v = x^2 + 5$

The idea is that you take a big, messy Thing that screams “I don’t know where to START!” and split it into parts (usually two or three) that you can differentiate easily. Let’s do that:

$\frac{dy}{du} = \cos(u)$

$\frac{du}{dv} = \frac{1}{v}$

$\frac{dv}{dx} = 2x$

What’s with the strange letters on the bottom? Ah, well, we’re differentiating with respect to whatever variable we put in the function. That is, if you’re talking about $\sin$ of $u$, the only thing it’s easy to differentiate with respect to is $u$.

Once you’ve done all of the differentiating, then you multiply your answers together:

$\frac{dy}{dx} = \cos(u) \times \frac{1}{v} \times 2x = \frac{2x \cos(u)}{v}$

But wait. We don’t want $u$s and $v$s knocking about; we need $x$s, dammit!

Luckily, we can replace all of the $u$s with $\ln(v)$ (as we defined earlier) to get:

$\frac{dy}{dx} = \frac{2x \cos(\ln(v))}{v}$

… and all of the $v$s with $x^2 + 5$, again from the definition:

$\frac{dy}{dx} = \frac{2x \cos(\ln(x^2 + 5))}{x^2 + 5}$
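If you want to reassure yourself that the final mess is right, one quick sanity check is to compare it with a numerical estimate of the slope. Here’s a minimal sketch in Python; the function names and the step size `h` are my own choices for illustration, and the central-difference estimate is only an approximation, so the two columns should agree to several decimal places rather than exactly.

```python
import math

def y(x):
    # The original function: y = sin(ln(x^2 + 5))
    return math.sin(math.log(x**2 + 5))

def dy_dx(x):
    # The chain-rule answer: 2x cos(ln(x^2 + 5)) / (x^2 + 5)
    return 2 * x * math.cos(math.log(x**2 + 5)) / (x**2 + 5)

def central_difference(f, x, h=1e-6):
    # Estimate f'(x) from the slope of a short chord centred on x
    # (h is an arbitrary small step chosen for this sketch)
    return (f(x + h) - f(x - h)) / (2 * h)

for x in [-2.0, 0.5, 1.0, 3.0]:
    print(f"x = {x:5.2f}   chain rule: {dy_dx(x):+.8f}   numerical: {central_difference(y, x):+.8f}")
```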

There we go! It’s a bit of a mess, but that’s to be expected (it was a mess to begin with, after all).

Quick recap of the steps:

1) ‘Unpack’ the function into things you can differentiate. You may need to use the product or quotient rule on some bits of it, if they’re feeling mean. Use a different letter for each argument to keep things clear.
2) Differentiate each function as usual. No funny business!
3) Multiply the answers together.
4) Replace all the letters you’ve made up with the letter (usually $x$) you started with.
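The four steps translate fairly directly into code. In this sketch (Python, with a list of (function, derivative) pairs as a made-up way of representing the ‘unpacked’ layers), we work from the inside out: each layer’s derivative is evaluated at whatever that layer is fed, which quietly takes care of step 4, and the results are multiplied together as in step 3. It only handles a straight chain of one-variable layers; anything that needs the product or quotient rule is beyond it.

```python
import math

# Step 1: unpack y = sin(ln(x^2 + 5)) into layers, innermost first.
# Each layer is a (function, derivative) pair: a representation chosen
# just for this sketch, not a standard library structure.
layers = [
    (lambda x: x**2 + 5, lambda x: 2 * x),  # v = x^2 + 5, dv/dx = 2x
    (math.log, lambda v: 1 / v),            # u = ln(v),   du/dv = 1/v
    (math.sin, math.cos),                   # y = sin(u),  dy/du = cos(u)
]

def chain_rule(layers, x):
    """Return (y, dy/dx) at x by working through the layers from the inside out."""
    value = x          # the input to the current layer (x, then v, then u)
    derivative = 1.0   # running product of the layer derivatives
    for f, df in layers:
        # Step 2: differentiate this layer, evaluated at its own input.
        # Feeding it the actual value of v or u is what step 4 does by hand.
        # Step 3: multiply the result onto the running product.
        derivative *= df(value)
        value = f(value)
    return value, derivative

y_value, dydx = chain_rule(layers, 1.0)
print(y_value, dydx)
```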

Easy enough?

To explain why it works, I’m going to hand over to the Mathematical Cowboy, who observes:

If you have $\frac{dy}{du} \times \frac{du}{dv} \times \frac{dv}{dx}$, the $du$s sort of cancel, and the $dv$s sort of cancel too - you’re left with a $dy$ on the top and a $dx$ on the bottom, making $\frac{dy}{dx}$. Simple!
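You can see where the Cowboy’s instinct comes from by swapping $du$, $dv$ and $dx$ for small but finite changes $\Delta u$, $\Delta v$ and $\Delta x$. Those are honest numbers, so the fractions really do cancel, and as $\Delta x$ shrinks each ratio creeps towards the matching derivative. The sketch below (Python; the helper `finite_ratios` and the step sizes are made up for illustration) checks this at $x = 1$. It’s an illustration, not a proof: the rigorous version has to deal with taking the limit, including cases where $\Delta u$ can be exactly zero.

```python
import math

def v_of(x):
    return x**2 + 5          # v = x^2 + 5

def u_of(v):
    return math.log(v)       # u = ln(v)

def y_of(u):
    return math.sin(u)       # y = sin(u)

def finite_ratios(x, h):
    """Return (product of the three finite ratios, dy/dx as a single finite ratio)."""
    dx = h
    dv = v_of(x + h) - v_of(x)
    du = u_of(v_of(x + h)) - u_of(v_of(x))
    dy = y_of(u_of(v_of(x + h))) - y_of(u_of(v_of(x)))
    # With finite changes the du's and dv's genuinely cancel
    # (provided none of them is zero, which holds here for h > 0 at x = 1)
    return (dy / du) * (du / dv) * (dv / dx), dy / dx

x = 1.0
exact = 2 * x * math.cos(math.log(x**2 + 5)) / (x**2 + 5)  # the chain-rule answer
for h in [0.1, 0.01, 0.001]:
    product, direct = finite_ratios(x, h)
    print(f"h = {h}: product of ratios = {product:.6f}, "
          f"finite dy/dx = {direct:.6f}, exact dy/dx = {exact:.6f}")
```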

This is the kind of thing that got the Mathematical Cowboy kicked off his university analysis course: it works perfectly well at A-level, but is seriously frowned upon by people who care deeply about mathematical rigour. (If you care deeply about rigour, you probably don’t struggle too much with the chain rule, now, do you?) It’s like knowing “you stick a zero on to multiply by ten”: it works perfectly well in a vast subset of cases, but it breaks down the further you go (the stick-a-zero-on rule doesn’t work for decimals, for example). In the same way, ‘cancelling’ $du$s and $dv$s is a good rule of thumb for now, but you should appreciate there’s more to it than that.