Monday saw
the conclusion of the 2012-13 series of University Challenge where,
much to my annoyance, Manchester avenged their quarter-final defeat to University College London to take home the trophy. It's the first time the title has been retained since Magdalen College, Oxford secured back-to-back victories in 1997 and 1998 (an institution Manchester now also join on four victories in the prestigious quiz).
Teams from
Cottonopolis have become a familiar sight on Monday night BBC2 TV schedules of late: in the last eight years they've lifted the trophy four times and only failed to make the semi-finals once (when their team didn't make the cut to appear on the programme). There seems little question about their status as the most consistent institution over the years, but do the data back this up? While we're at it, can we identify the Best Team Ever? It's time for a statistical adventure.
First up: what data? University Challenge first aired in 1963, running for 25 years before being taken off air in 1987. Picked up again eight years later, it has been a fixture of television schedules ever since, and with
full round-by-round scores available from 1995 onwards it's this 'Paxman era' where we'll be focusing our attention.
Now we have some data, the next question is how to compare teams both within and across series. At a basic level things are straightforward: it seems fair to assume that the series champions were the best team in that series, while the runners-up were - by definition - second best. But how do you compare the two losing semi-finalists, or compare this year's winners to the champions from 1995?
There are, of course, numerous ways we could derive metrics to compare teams (indeed, there are plenty of established methods in existence) but I wanted to build my own, as-simple-as-possible, model based on three 'intuitive' principles:
1) Progressing further in the competition is better
2) Losing by a small margin is better than losing by a large margin
3) Losing to a team that goes on to do well is better than losing to a team that goes on to do badly
The first of these is made straightforward by the (relatively) consistent tournament structure on the show. Since 1995 every series has featured five rounds, so I decided to assign every team a Baseline score from 1 to 6 based on their stage of elimination from the show: 1 point for losers in the first round (or highest scoring losers who lose their playoff match), 2 if you went out in the second round, and so on up to 5 points for the losing finalists and 6 for the series champions. This measure is the first element of comparing teams within a series: a higher Baseline score means a better performance. The problem is how to separate teams who were eliminated at the same stage of competition, which is where we try and incorporate the second and third principles.
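As a rough illustration of the idea (the stage labels and function below are my own shorthand, not taken from any actual dataset), the Baseline score is just a lookup on the stage at which a team bowed out:

```python
# Sketch only: Baseline score keyed by the stage at which a team was eliminated.
# The stage labels are my own shorthand for the five-round format described above.
BASELINE = {
    "first_round": 1,    # includes highest-scoring losers beaten in the playoff
    "second_round": 2,
    "quarter_final": 3,
    "semi_final": 4,
    "final": 5,          # losing finalists
    "champion": 6,       # series winners
}

def baseline_score(stage_eliminated: str) -> int:
    """Return the Baseline score for the stage at which a team went out."""
    return BASELINE[stage_eliminated]
```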
How far a team progressed in a series is one half of how good they are: the other, of course, is how they fared against - and the quality of - the opponents they met along the way. For opponent strength we have a ready-made statistic in their Baseline score, and we can use the scoreline from each of their games to see how well they did. A typical approach in tournaments is to look at the margin of victory or defeat - the 'spread' - but I decided instead to look at the proportion of the total points scored in a game that were picked up by either team. This means that the effect of varying question difficulty across rounds (or even series) is moderated, and it also gives us a handy metric of 'performance' in a game in the form of a percentage: if a team lose 150-50 then they pick up 25% of the points in that game, while if they were pipped 155-150 it would be almost 50%.
By multiplying the percentage of points scored in a game by the opponent's Baseline score, we get a measure of performance which I've imaginatively called Performance score. For example, suppose you lost in the first round 150-50 to a team who went on to win the series. Your opponent's Baseline score would be 6, while your points percentage for that game is 25%. Combining these gives you a Performance score of 25% x 6 = 1.5. Your opponents, meanwhile, bagged 75% of the points available, but since you went out in the first round your Baseline score is just 1, so they pick up a Performance score of 75% x 1 = 0.75. It might seem a bit odd that you get more points for losing than they do for winning, but remember that this measure is only ever used to compare teams who were eliminated at the same stage of the competition, so comparing the two numbers directly doesn't really mean anything.
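As a minimal sketch of that calculation (the function name and arguments are my own invention), the Performance score for a single game is just the share of points won multiplied by the opponent's Baseline score:

```python
def performance_score(points_for: int, points_against: int, opponent_baseline: int) -> float:
    """Share of a game's total points won by a team, weighted by the opponent's Baseline score."""
    share = points_for / (points_for + points_against)
    return share * opponent_baseline

# The worked example above: a 150-50 first-round defeat to the eventual champions.
performance_score(50, 150, 6)   # losers: 0.25 * 6 = 1.5
performance_score(150, 50, 1)   # winners: 0.75 * 1 = 0.75
```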
From here, we can calculate every team's average Performance score across all of their games, giving a measure of the strength of their opponents and how well they fared against them. We can then use this metric as a tie-breaker to separate teams who have the same number of wins. For example, if we apply this strategy for the current series, we find that of the two losing semi-finalists (New College, Oxford, and Bangor) Bangor would snatch third place. (Admittedly, I was a little surprised by this as New College seemed the much stronger team, but a quick look at
the results for the series suggests that this isn't reflected in the scores. For example, Bangor defeated King's College, Cambridge, far more convincingly than New College did.)
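A sketch of that tie-breaking step might look something like this, assuming each team's games are stored as (points for, points against, opponent Baseline) tuples - a data layout of my own choosing rather than anything official:

```python
from statistics import mean

def average_performance(games):
    """Average Performance score across a team's games.

    Each game is a (points_for, points_against, opponent_baseline) tuple."""
    return mean(pf / (pf + pa) * ob for pf, pa, ob in games)

def break_tie(teams):
    """Order teams eliminated at the same stage, best first, by average Performance score."""
    return sorted(teams, key=lambda name: average_performance(teams[name]), reverse=True)
```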
In the same way we can also compare the 19 Paxman-era champions to see which team were the most dominant in their series. It will come as little surprise to regular viewers that the 2009 Corpus Christi, Oxford team (aka
Corpus Christi Trimble) would have topped this particular list, but as they were disqualified for fielding an ineligible player we instead find the 1998 Magdalen, Oxford squad come out on top. This team were a little before my time, but a poke through
that series suggests that the scoring algorithm is doing a reasonable job: their quarter- and semi-finals were Trimble-like demolitions before a relatively narrow victory against Birkbeck to lift the trophy. (Coincidentally, Magdalen also take second in the overall standings, with their 2011 team posting similarly strong statistics.)
What of our original question, though? Which institution has been the most successful at University Challenge in the last 19 years? For this I assigned every team a rank within their series (first based on how far they got in the competition, then using the average Performance score above to break ties). From here there are two ways to identify the 'best' institution: their average rank or their total rank. If we go with the former then, predictably, it's a team with only one appearance who tops the list: London Metropolitan may have only made it onto the show once, but their third place in the 2004 series gives them a hard-to-beat average. Really, though, as getting through the show's audition exam is itself an achievement, it's total rank that best reflects a truly consistent institution, and on this metric it's Manchester who take the crown. The top of this list is, however, dominated by teams with multiple appearances: with a whopping 15 appearances Durham are second despite never winning a series, while Magdalen, Oxford are down in seventh with 'only' 9 appearances.
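Purely as illustration of the difference between the two summaries (the numbers below are made up, and I've scored ranks so that a higher number means a better finish, which is what lets the total reward repeated appearances):

```python
from statistics import mean

# Made-up per-series ranks, scored so that a higher number means a better finish.
series_ranks = {
    "One-hit wonders": [26],          # a single strong run
    "Regulars": [24, 20, 27, 22],     # decent results year after year
}

average_rank = {team: mean(r) for team, r in series_ranks.items()}
total_rank = {team: sum(r) for team, r in series_ranks.items()}
# Averaging favours the one-off appearance; summing rewards consistently qualifying at all.
```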
So there you have it, unequivocal proof that Manchester are doing something right, although if you're not totally convinced by my methods I wouldn't blame you. Even while writing this up I spotted at least half a dozen holes one could pick in my metric, and I fully anticipate being alerted to some better, more established approach. Still, it's hard to deny its simplicity, and in any case the most important thing is who comes out on top. I don't think there are many systems that would suggest anything other than what mine has here: Manchester will once again be the team to beat next year, and Corpus Christi wuz robbed.