As I write this, classical chess’s top two players are Magnus Carlsen of Norway (rated 2835) and the USA’s Fabiano Caruana, who has a rating of 2832. Very close! But what do the rankings mean?

FIDE1 uses the Elo rating system, a methodical – and mathematical – way of distilling results into a number.

The Elo system works on two levels: as a descriptor of results (which it does very well), and as a predictor of performance (where it’s a bit more ropey).

Let’s start with the prediction element: if two players – let’s call them Arpad and Bobby – have ratings of $R_A$ and $R_B$ respectively, then over the long term you would expect Arpad to win $\frac{1}{1+ 10^{k(R_B - R_A)}}$ of the available points, where $k = \frac{1}{400}$.

That makes some sense: if two players have identical ratings, you would expect each to win half of the points. Carlsen’s advantage of three points over Caruana corresponds to an expectation of winning 50.4% of the points available. Against an average player rated 1500, he’d win 99.95% of the points2.
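The formula is short enough to sketch directly in Python (the ratings are the ones given above):

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Expected fraction of available points for a player rated r_a against r_b."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

# Carlsen (2835) vs Caruana (2832): a three-point gap barely matters.
print(round(expected_score(2835, 2832), 3))   # 0.504
# Carlsen against a 1500-rated player:
print(round(expected_score(2835, 1500), 4))   # 0.9995
```

Note that only the *difference* in ratings matters, not their absolute values.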

The way it’s calculated is quite neat. Suppose, over a given time period, Arpad plays a series of games. He’s 100 points better than Bobby and wins 1.5 points in two games; he has the same score as Judit and loses his game against her; against Garry, who is 300 points better than him, he scores 0.5 out of 3.

How many points *should* Arpad have won from those six games? Against Bobby, the system predicts 1.28 points; playing Judit, half a point; against Garry, 0.45. Overall, he should have won about 2.23 points; in fact, he only won two - so he scored 0.23 points fewer than he should have done.
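Arpad’s arithmetic can be replayed in a few lines; the rating gaps and game counts are the ones from the story above:

```python
def expected_score(r_a, r_b):
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

games = [
    (100, 2),   # 100 points above Bobby, two games
    (0, 1),     # level with Judit, one game
    (-300, 3),  # 300 points below Garry, three games
]
expected = sum(n * expected_score(diff, 0) for diff, n in games)
print(round(expected, 2))            # 2.23
actual = 1.5 + 0 + 0.5               # Arpad's real haul
print(round(actual - expected, 2))   # -0.23
```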

Arpad’s rating would be adjusted slightly downwards (as he has underperformed). More precisely, we would adjust it downwards by $0.23K$, where $K$ is a constant chosen to prevent ratings changing too quickly.3 Assuming Arpad is not a master chess player, he would lose about seven points off the back of this performance.
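A minimal sketch of the update rule. FIDE’s actual K-factors vary with rating and experience; the $K = 32$ below (a common choice for sub-master players) and Arpad’s starting rating of 1600 are illustrative assumptions, not taken from the text:

```python
def update_rating(rating, actual_points, expected_points, k=32):
    """Adjust a rating by K times the over- or under-performance."""
    return rating + k * (actual_points - expected_points)

# Arpad underperformed by 0.23 points over the six games:
print(round(update_rating(1600, 2.0, 2.23), 1))  # 1592.6
```

So an underperformance of 0.23 points costs roughly seven rating points at this K.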

(By comparison, when Carlsen and Caruana drew all twelve of their classical championship games, Carlsen won about 0.05 points fewer than he expected to; with the K-factor of 10 that FIDE applies to top players, that corresponds to a rating change of around half a point. The published ratings didn’t change – perhaps because FIDE publishes ratings rounded to whole points, though I am not certain of the reasons.)

Until Elo devised the system, chess ratings were a bit hit and miss. The first known system, the Harkness system, used the average rating of players in a tournament and updated players’ scores based on their success percentage in the tournament. That sounds reasonable… but didn’t really have a statistical underpinning.

Elo was commissioned to change that. Instead of attaching rewards to each tournament, he developed a system that would track each player’s (recent) average performance.

He made some simplifying assumptions: that each player’s performance in any given game is a normally distributed random variable with a fixed standard deviation. (He was aware of the model’s limitations. In a 1962 issue of *Chess Life*, he compared rating players to measuring “the position of a cork bobbing up and down on the surface of agitated water with a yard stick tied to a rope and which is swaying in the wind”, which rather puts a three-point gap in perspective.)

One of the benefits of Elo’s scheme is its *transitivity*. For example, using the ratings given above, Garry should win 0.85 points a game against Arpad, who expects to win 0.64 points a game against Bobby. In other words, Garry is nearly six times better than Arpad (an expected 0.85 points won for every 0.15 conceded), while Arpad is a little short of twice as good as Bobby (0.64 against 0.36). Multiplying those odds together (using the unrounded numbers) gives exactly 10.

Garry - 400 points ahead of Bobby - wins a bit less than 91% of the points between them. This also comes out as being ten times better!
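A sketch of that transitivity, reading “n times better” as the ratio of expected points won to expected points conceded:

```python
def expected_score(r_a, r_b):
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def odds(diff):
    """Ratio of expected points won to expected points conceded for a player
    rated diff points higher; algebraically this equals 10 ** (diff / 400)."""
    e = expected_score(diff, 0)
    return e / (1 - e)

print(round(odds(300), 2))              # 5.62  (Garry over Arpad)
print(round(odds(100), 2))              # 1.78  (Arpad over Bobby)
print(round(odds(300) * odds(100), 2))  # 10.0  (Garry over Bobby, 400 points)
```

Because the odds are exponentials of the rating difference, multiplying them simply adds the differences – which is why the scheme is transitive.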

Árpád Imre Élő, born in Hungary (then part of Austria-Hungary) in 1903, moved to the US with his parents in 1913. That's where (as far as I can tell) he lost the accents on his name. Eight times Wisconsin State chess champion, he was also a professor of physics at Marquette University.

He died in Wisconsin in 1992.