Tuesday, 8 July 2014

Bonus question
Was Series 9 of Only Connect the toughest yet?

Only Connect had its last hurrah on BBC 4 last night, with the end of series 9 heralding its long-anticipated move to BBC 2 in the autumn. Much has been made of the supposed threat this might pose to the show's famed difficulty, with many a doubter worried about an inevitable 'dumbing down' as it tries to find its feet in more populist waters. Conversely, the latest series - featuring a brand new question editor - has been cited as the 'toughest ever', with some going as far as to say the questions have been "impossible", "unfair", and "a masterpiece in obscurity". Now that the series is complete, what can we take from the scorecards the latest inductees to the Only Connect Cadet Force have turned in?

Our first exhibit is the simplest: average total score (of both teams) per episode for each series, contrasted with the series 1-8 average.
Average (combined) per-episode scores of Only Connect teams, series 1-9.

It probably won't come as too much of a surprise that, with an average combined score per game of just under 34 points, series 9 is indeed the lowest scoring yet. What's slightly interesting, however, is that this puts it just a couple of points lower than the previous record holder: series 2's teams averaged 36 points per game between them. Nevertheless, series 9 is rather out on its own, and a standard statistical test suggests that this fluctuation isn't just down to chance, so something's going on (and for those of you who yearn for p-values, it's 0.01).
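The post doesn't say which test produced that p-value, and the per-episode scores aren't reproduced here, but the general shape of such a comparison can be sketched with a simple permutation test on entirely hypothetical data (series 9 averaging around 34, earlier series somewhat higher):

```python
import random

# HYPOTHETICAL per-episode combined scores - the real data aren't in the
# post; these are illustrative numbers with roughly the right averages.
series_1_8 = [40, 38, 42, 36, 39, 41, 37, 43, 38, 40, 39, 41]
series_9 = [33, 35, 32, 34, 36, 31, 34, 33]

def perm_test(a, b, n_iter=10_000, seed=0):
    """Two-sided permutation test for a difference in group means:
    how often does a random relabelling of the pooled scores produce
    a gap at least as large as the one observed?"""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        x, y = pooled[:len(a)], pooled[len(a):]
        if abs(sum(x) / len(x) - sum(y) / len(y)) >= observed:
            extreme += 1
    return extreme / n_iter

print(perm_test(series_1_8, series_9))  # a very small p-value
```

A permutation test is just one reasonable choice here; a two-sample t-test on the episode totals would be the more conventional route, and either could plausibly yield a p-value like the 0.01 quoted.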

It's fairly clear, then, that scoring this series was abnormally low, but it gets a little more interesting if we look at round-by-round scoring to see exactly where these points have been lost. The next graph splits average scores per game into individual rounds (you may want to click it to make it slightly bigger, or even right-click and open in a new tab).

Average (combined) per-episode round-by-round scores of Only Connect teams, series 1-9.

Series 9 doesn't stand out quite so much on these plots, as there is understandably rather more natural variation in scores when we look at things in more detail. Nevertheless, series 9 saw the lowest average scores in sequences and on the walls, and was well below average for connections and missing vowels too. Compared with the series 1-8 average, series 9 episodes saw around one point fewer scored in each of rounds 1 and 2, a 2.5-point drop on the walls and another two points or so in missing vowels. While this points to a general across-the-board fall in scoring, the wall scores are responsible for slightly more than their fair share of the drop-off.

Let's take a more detailed look at those wall scores. The following compares the distribution of scores on individual walls for series 1 to 8 with those of series 9: the height of a bar indicates what proportion of walls were solved for that particular score. It's here where a big part of series 9's lower scores is hiding.

Particularly striking are the bars at the far end. Out of series 9's 26 walls just 4 - under 1 in 6 - were solved for the maximum of 10 points. In contrast, across the 224 walls in series 1 to 8 a whopping 78 - over 1 in 3 - were maxed. While comparing the relatively small series 9 dataset with series 1-8 leaves a lot of room for statistical noise to creep in, it's still a fairly remarkable change in scoring, and again there is some reasonable statistical evidence this isn't just down to chance (p = 0.02, stats-fans).
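Unlike the episode totals, the maxed-wall counts are quoted in full (4 of 26 walls in series 9 against 78 of 224 in series 1 to 8), so this comparison can be checked directly. A minimal sketch using a plain Pearson chi-square on the 2x2 table - the post doesn't say which test gave p = 0.02, so this is just one reasonable choice and needn't match that figure exactly:

```python
# Maxed-wall counts from the post: series 9 solved 4 of 26 walls for the
# full 10 points; series 1-8 solved 78 of 224.
def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for
    the 2x2 contingency table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d)
    )

# Rows: series 9 vs series 1-8; columns: maxed vs not maxed.
stat = chi2_2x2(4, 22, 78, 146)
print(round(stat, 2))  # ~3.99, above the 3.84 cutoff for p < 0.05 at 1 df
```

With only 26 walls in the series 9 column, an exact test (Fisher's) or a continuity correction would be defensible alternatives, and would give somewhat different p-values.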

Go hard or go home?

So where does this little tour of Only Connect numbers leave us? One thing that's inescapable is that the scoring in series 9 of TV's toughest quiz was not merely the lowest yet, but also so low that the odds of this being down to random chance alone are low enough to interest a statistician (and we're very interesting people). In other words, there's reasonable evidence that something is underpinning the drop. What exactly that something is, however, is debatable. Harder questions, perhaps? Or could it simply be this series' contestants being a touch sub-par?

The data, alas, cannot distinguish these two possibilities (much like the never-ending debate over whether exams are getting easier or kids are getting smarter). Personally, though, I'm largely in the "ouch, that was a bit tricky" camp. Like every one that has preceded it, this series featured contestants with some serious quizzing pedigrees along with some entirely new faces, and I certainly don't think any of them were shown to be chumps (I'd welcome anyone who says otherwise to apply!). On the other hand, the questions have seemed a touch stiffer from the comfort of my sofa, and I think the change in question editor has shown through in a slight shift in the styles of puzzles we've been faced with. How this will develop for the show's first (of hopefully many) series on BBC 2 remains to be seen, but for now at least I think fears of Only Connect going soft can be put to one side.

Although really, if there's one show that can afford to dumb down, it's this one.


  1. I'm surprised to see how low the average for missing vowels was in the first series as I'm fairly sure there was no delay between clues being broadcast and displayed in the studio so the teams had more time at their disposal.

    Of course, time's the key factor in missing vowels and, whilst they might have gone at a greater pace, other differences in format (such as the teams introducing themselves) may have left less time for questions - not to mention needing more clues in the first two rounds.

    All of which, of course, is entirely unscientific (and rambling) speculation but I'm saving my chi-squares for testing the hypothesis that the music round is no harder than any other.

    1. Yes, I think you're correct about the delay issue (I rewatched some series 1 episodes recently and it certainly seems like there's no delay, or at least a much shorter one). That said, I wonder how much of it might be that this will have been the first time most of the contestants will have ever seen that sort of puzzle before - I certainly benefited from seven series of missing vowels before going on!

      And yes, I'd quite like to test the music question thing (my guess is that it's no harder in terms of going unsolved, but that it almost always goes for only 1 or 2 points), but suspect the sheer effort required to collect question-by-question data will forever prove beyond me.

  2. What confused me was the discrepancy in question difficulty in the final. Most of the questions were as hard as I'd come to expect from this series, but then the Europhiles - meaning no disrespect - got two of the easiest sequences I've seen in a while (the presidential assassins and the alphabetical capitals). I get that there's always an element of luck in question choice, but that seemed a bit much.

    1. Hrm. I am genuinely quite reluctant to judge relative difficulty of questions, although I agree those two sequences were very much 'quizzer bread and butter'. However, I thought the citrus fruit connection and Ring Cycle were similarly gettable, but that could equally just be a reflection of my wheelhouse. I could imagine the quantum numbers set would be a doddle to anyone with a degree in the correct area (but as I don't this could be entirely misjudged).