My previous Jeopardy
analyzer was built using a base of about 30 daily Coryat scores. This
one has more than 1600 scores that were either recorded directly,
e-mailed to me, or scraped from the forum at jboard.tv. Here we look
at the consistency of tournament effects for different at-home
players, and some long-term trends.
After recording 90 days over 2017, it's become apparent that I'm not getting any better at Jeopardy just by playing the game at home, as shown in Figure 1.*
The average Coryat is fitted
using a loess smoother (the loess function found in base R). It's
a more flexible model, and simpler to code and manipulate, than
the previous one. The smooth line in Figure 1 shows that the Coryat
score at the beginning of the year is roughly the same as it is at
the end of the year. The two tournaments I recorded, the College
Championship and the Tournament of Champions, are not used in the
smoother; instead, the fitted curve is linearly interpolated across
those days. To measure the effect of each tournament, I compare these
linearly interpolated values to the mean score in that tournament.
These results are shown in Table 1, including the estimate of the
tournament effect from my previous analysis, which gives roughly the
same value. According to
these results, I get an average of 3128 more points in a College
Championship game than a normal game, and 1726 fewer points in a
Tournament of Champions game than normal.
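The comparison described above can be sketched in a few lines. This is a minimal Python illustration, not the R code behind this post: the function names, the idea of passing the smoothed baseline as two (day, score) points, and all of the numbers are invented for the example.

```python
def interpolate(x, x0, y0, x1, y1):
    """Linearly interpolate between (x0, y0) and (x1, y1) at x."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def tournament_effect(days, scores, baseline_before, baseline_after):
    """Mean of (score - interpolated baseline) over the tournament days.

    baseline_before and baseline_after are hypothetical (day, smoothed score)
    pairs for the regular games on either side of the tournament.
    """
    (d0, b0), (d1, b1) = baseline_before, baseline_after
    diffs = [s - interpolate(d, d0, b0, d1, b1) for d, s in zip(days, scores)]
    return sum(diffs) / len(diffs)


# Toy numbers, invented for illustration only.
effect = tournament_effect(
    days=[11, 12, 13],
    scores=[18000, 19000, 20000],
    baseline_before=(10, 15000),
    baseline_after=(14, 17000),
)
print(round(effect))
```

A positive result means the tournament games scored above the interpolated regular-game baseline, and a negative result means they scored below it.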
Player | n | Regular Coryat | College | Champions | Teacher's
Jack | 90 | 15570 | +3128 | -1726 | NA
A | 480 | 30637 | +5443 | +1866 | +2354
B | 707 | 34192 | +4405 | -86 | +2090
C | 55 | 35207 | -15466 (n=1) | -370 | -3829
D | 27 | 16419 | NA | NA | NA
E | 45 | 23682 | NA | NA | NA
F | 119 | 18795 | +3329 | -2044 | NA
Table 1
Another trend I was
interested in was more short-term. Looking at this chart of
day-to-day scores, it almost seems like the scores are oscillating, with a
high score followed by a low score and vice versa. If this is a real
effect, we should be able to see it as a negative autocorrelation
between one day's score and the next. Figure 8 shows a scatter plot
of one day's scores on the X axis and the previous day's scores on the
Y axis.
The estimated Pearson correlation coefficient is almost exactly zero.
Furthermore, this lack of correlation is not an artifact of some
nonlinear effect, because the same lack of pattern shows up when we
compare the ranks of the scores from one day to the next and take
the Spearman correlation coefficient of these ranked values instead.
In short, there are no day-to-day balancing effects and no hot or
cold streaks. It's all just regression to the mean.
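For readers who want to try the same check on their own scores, here is a minimal sketch of a lag-1 Pearson and Spearman correlation. It is written in plain Python rather than the R used for this post, the helper names are made up for the example, and the score list is invented (and deliberately oscillating, so the correlation comes out negative).

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def lag1_pairs(scores):
    """Pair each day's score with the previous day's score."""
    return scores[1:], scores[:-1]


# Invented scores, for illustration only.
scores = [14000, 18000, 13000, 19000, 12000, 20000]
today, yesterday = lag1_pairs(scores)
print(pearson(today, yesterday), spearman(today, yesterday))
```

This sketch treats consecutive entries in the list as consecutive days; if there are gaps in the record, the pairs that span a gap would need to be dropped first.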
Even if the correlation did
show up as statistically significant, it wouldn't seem to be
particularly meaningful. My hypothesis was that topics were chosen
from day to day such that viewers at home would be more likely to
have a category or two that they could excel in every couple of days,
and to avoid having long stretches where home viewers may feel
frustrated. Another effect of such a negative correlation would be
that champions who stay on a long time truly would be outstanding
players in multiple fields rather than specialists. However, I was
unable to find any evidence of such a topic-shuffling strategy in my
own scores, and I would consider myself a fairly typical at-home
player in my degree of specialization.**
Now let's try these analyses
with some other players and see if the same sort of trends appear, as
shown in Figures 2-7 and Table 1. According to the figures, a whole
range of trends appear. The only common one seems to be quick
improvement at the beginning of tracking Coryat scores. From the
table, we see that the Tournament of Champions tends to vex people.
The other tournaments not so much. One note about player C's College
Championship effect is that it represents only a single measurement,
which you can see by the triangle on their chart on day 53.
Figure 2 | Figure 3 | Figure 4 | Figure 5 | Figure 6 | Figure 7
What about that streak or
balance hypothesis? Does the near-zero correlation hold for other
players as well? Only for player F did a statistically significant
correlation appear (p = 0.005, before multiple testing adjustments),
and even then that could be an artifact of gradual improvement (a
similar check on the model residuals could adjust for improvement,
but we're p-hacking at this point). Figures 8 to 10 show the
scatterplots for me and two of the other players, to show how weak such
a relationship is, if there is one at all.
Player | Pearson r | Spearman r
Jack | 0.004 | -0.006
A | 0.045 | 0.036
B | 0.112 | 0.100
C | 0.013 | 0.019
D | -0.062 | 0.091
E | 0.242 | 0.206
F | 0.252 | 0.309
Table 2
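To make the multiple-testing caveat concrete, here is a small Python sketch. It assumes a Bonferroni adjustment and counts one test for each of the seven players in Table 2; both the choice of adjustment and the test count are assumptions made for this example, not something settled in the analysis above.

```python
def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_tests)


# p = 0.005 is player F's unadjusted p-value from the text.
# n_tests = 7 is an assumption: one test for each player in Table 2.
adjusted = bonferroni(0.005, 7)
print(round(adjusted, 3))
```

Under those assumptions the adjusted value is 0.005 x 7 = 0.035. Counting the Pearson and Spearman tests separately (14 tests) would double it to 0.07.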
Another hypothesis I wanted
to test was about the nature of Coryat scores. The smoother
model in this analysis relies on the scores having a linear
relationship to underlying skill, rather than something non-linear
like an exponential one. That is, someone who averages 15,000 would
be just as much better than someone whose average is 12,000 as that
second person would be better than someone else whose average is
9,000. Or, in other words, every point of increase in average Coryat
represents the same amount of latent skill improvement.
This sort of relationship
may seem a given, but it isn't in a lot of games and sports. Consider
bowling, either 5 or 10 pin. Scores in bowling tend to compound
because consecutive strikes are worth more than individual,
unconnected ones. The distribution of a typical bowler's scores is
right-skewed (positively skewed), meaning that there are more unusually
high scores than unusually low ones. (For extremely good bowlers, the
opposite is true because they will typically play close to
perfection.)
So do Coryat scores follow a
similar set of patterns? Consider the histograms in the right half of
Figures 8-10. Player A is very strong, and their scores exhibit the
negative skew that we would expect of someone consistently at
their peak. Player F's scores are approximately symmetric about
15,000. My scores are positively skewed, implying that I'm
consistently bad, but occasionally get lucky.
Figure 8 | Figure 9 | Figure 10
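The skew visible in a histogram can also be put into a single number. The sketch below uses the simple moment-based skewness (third central moment divided by the 1.5 power of the second), which is one common definition among several. It is plain Python rather than the R used for this post, and the three score lists are invented to show a positive, a negative, and a zero skew.

```python
def skewness(values):
    """Moment-based sample skewness: m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Invented scores, for illustration only.
mostly_low = [12000, 13000, 13000, 14000, 14000, 15000, 26000]
mostly_high = [30000, 40000, 41000, 41000, 42000, 42000, 43000]
symmetric = [13000, 14000, 15000, 16000, 17000]

print(skewness(mostly_low) > 0)   # one unusually high game: positive skew
print(skewness(mostly_high) < 0)  # one unusually low game: negative skew
print(skewness(symmetric))        # evenly spread around the mean
```

A value near zero is what a roughly symmetric histogram would give, while a clearly positive or negative value points to a long tail on the high or low side.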
In this link, I have included the updated R code necessary to do these analyses with your own at-home scores, as well as a sample dataset of the scores of a couple of people who gave me explicit permission to share their scores.
I would be thrilled to have
more data from more players, in order to further analyze the at-home
experience of Jeopardy. I could use this data to refine the answers to
the questions posed in this post, as well as answer queries from other
players.
In the future, I would like
to compare the difficulties of Jeopardy! vs Double Jeopardy!, and to
see how dependent a typical score is on a few categories. Both
questions could be answered with data of enough volume and resolution.
Please feel free to add your own analysis questions in the comment
section, or send them to my email (jackd@sfu.ca) or Twitter
(@jack_davis_sfu).
Thanks for reading!
* I did improve from 16/50
to 30/50 on the annual online test, but that could mostly be
attributed to bad luck on the first test and good luck on the second.
**Specifically, I nail the
science questions and do reasonably well on the academic questions, but
I'm left silent when it comes to Americana and opera.
Link to the first Jeopardy analysis: http://www.stats-et-al.com/2017/03/analyzing-jeopardy-in-r-college.html