Paper 1, Section II, J

Statistical Modelling
Part II, 2017

The Cambridge Lawn Tennis Club organises a tournament in which every match consists of 11 games, all of which are played. The player who wins 6 or more games is declared the winner.

For players $a$ and $b$, let $n_{a b}$ be the total number of games they play against each other, and let $y_{a b}$ be the number of these games won by player $a$. Let $\tilde{n}_{a b}$ and $\tilde{y}_{a b}$ be the corresponding numbers of matches.
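To make the game-level and match-level counts concrete, here is a minimal sketch that collapses hypothetical per-match records for one pair of players into $(n_{a b}, y_{a b}, \tilde{n}_{a b}, \tilde{y}_{a b})$. The function name `summarise_pair` and the input format (a list giving, for each match, the number of games won by $a$) are illustrative assumptions, not part of the question.

```python
GAMES_PER_MATCH = 11  # every match consists of 11 games, all played
WIN_THRESHOLD = 6     # a player winning 6 or more games wins the match


def summarise_pair(games_won_by_a):
    """Collapse per-match records for players a and b into
    (n_ab, y_ab, n_tilde_ab, y_tilde_ab).

    games_won_by_a: one entry per match, the number of games (0..11) won by a.
    """
    for g in games_won_by_a:
        if not 0 <= g <= GAMES_PER_MATCH:
            raise ValueError(f"games won must be in 0..{GAMES_PER_MATCH}, got {g}")
    n_ab = GAMES_PER_MATCH * len(games_won_by_a)  # games played
    y_ab = sum(games_won_by_a)                     # games won by a
    n_tilde_ab = len(games_won_by_a)               # matches played
    # 11 is odd, so each match has exactly one winner: a wins iff a won >= 6 games
    y_tilde_ab = sum(1 for g in games_won_by_a if g >= WIN_THRESHOLD)
    return n_ab, y_ab, n_tilde_ab, y_tilde_ab
```

For example, three matches in which $a$ won 7, 3 and 6 games give `summarise_pair([7, 3, 6]) == (33, 16, 3, 2)`. Once only the match winners are kept, $y_{a b}$ can no longer be recovered from $\tilde{y}_{a b}$.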

A statistician analysed the tournament data using a Binomial Generalised Linear Model (GLM) with outcome $y_{a b}$. The probability $P_{a b}$ that $a$ wins a game against $b$ is modelled by

$$\log \left(\frac{P_{a b}}{1-P_{a b}}\right)=\beta_{a}-\beta_{b}, \qquad (*)$$

with an appropriate corner point constraint. You are asked to re-analyse the data, but the game-level results have been lost and you only know which player won each match.
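As an illustration of the game-level model $(*)$, the sketch below evaluates $P_{a b}$ and the Binomial log-likelihood contribution of a pair for given abilities $\beta$. The player names and $\beta$ values are hypothetical; the corner-point constraint is imposed by fixing the first player's $\beta$ at $0$, since only differences $\beta_a - \beta_b$ enter the model. This is not a fitting routine, only the model's probability and likelihood terms.

```python
import math


def game_win_prob(beta, a, b):
    """P_ab: probability that player a wins a single game against b,
    under logit(P_ab) = beta[a] - beta[b]."""
    eta = beta[a] - beta[b]
    return 1.0 / (1.0 + math.exp(-eta))


def binomial_loglik(beta, a, b, y_ab, n_ab):
    """Binomial log-likelihood of y_ab game wins out of n_ab games,
    omitting log C(n_ab, y_ab), which does not depend on beta."""
    p = game_win_prob(beta, a, b)
    return y_ab * math.log(p) + (n_ab - y_ab) * math.log(1.0 - p)


# Hypothetical abilities; the corner-point constraint sets beta["alice"] = 0.
beta = {"alice": 0.0, "bob": math.log(3.0), "carol": -0.5}
```

With these values `game_win_prob(beta, "bob", "alice")` is $3/(1+3) = 0.75$ up to rounding, and $P_{a b} + P_{b a} = 1$ for every pair, as the antisymmetry of $\beta_a - \beta_b$ requires.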

We define a new GLM for the outcomes $\tilde{y}_{a b}$ with $\tilde{P}_{a b}=\mathbb{E} \tilde{y}_{a b} / \tilde{n}_{a b}$ and $g(\tilde{P}_{a b})=\beta_{a}-\beta_{b}$, where the $\beta_{a}$ are defined in $(*)$. That is, $\beta_{a}-\beta_{b}$ is the log-odds that $a$ wins a game against $b$, not a match.

Derive the form of the new link function $g$. [You may express your answer in terms of a cumulative distribution function.]
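One way to sketch the relationship, assuming (as the Binomial GLM for games implicitly does) that the 11 games in a match are independent, each won by $a$ with probability $P_{a b}$: the number of games $a$ wins in a match is then $\mathrm{Bin}(11, P_{a b})$, so $\tilde{P}_{a b} = h(P_{a b})$ with $h(p) = 1 - F(5; 11, p)$, where $F(\cdot\,; n, p)$ is the Binomial$(n, p)$ CDF. Substituting $P_{a b} = e^{\eta}/(1+e^{\eta})$ with $\eta = \beta_a - \beta_b$ gives the inverse link, and $g(\tilde{P}) = \log\{h^{-1}(\tilde{P}) / (1 - h^{-1}(\tilde{P}))\}$. The code below evaluates both directions numerically; the function names are hypothetical, and inverting $h$ by bisection is my own choice (it works because $h$ is strictly increasing on $(0, 1)$).

```python
import math

N_GAMES = 11  # games per match, all played
K_WIN = 6     # games needed to win a match


def binom_cdf(k, n, p):
    """F(k; n, p) = P(Y <= k) for Y ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))


def match_win_prob(p_game):
    """h(p) = 1 - F(5; 11, p): probability of winning at least 6 of 11
    independent games, each won with probability p_game."""
    return 1.0 - binom_cdf(K_WIN - 1, N_GAMES, p_game)


def inverse_link(eta):
    """g^{-1}(eta): match-win probability when eta = beta_a - beta_b
    is the log-odds of winning a single game."""
    p_game = 1.0 / (1.0 + math.exp(-eta))
    return match_win_prob(p_game)


def link(p_match, tol=1e-12):
    """g(p_match) = logit(h^{-1}(p_match)), with h inverted by bisection."""
    if not 0.0 < p_match < 1.0:
        raise ValueError("p_match must lie strictly between 0 and 1")
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if match_win_prob(mid) < p_match:
            lo = mid
        else:
            hi = mid
    p_game = 0.5 * (lo + hi)
    return math.log(p_game / (1.0 - p_game))
```

By the symmetry $Y \mapsto 11 - Y$, the same inverse link can also be written as $F(5; 11, 1 - P_{a b})$. Because a match is a best-of-11 majority, `inverse_link(eta)` exceeds the single-game probability $e^{\eta}/(1+e^{\eta})$ whenever $\eta > 0$, which is why $g$ differs from the logit.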