Paper 4, Section II, J

Statistical Modelling
Part II, 2019

A sociologist collects a dataset on friendships among $m$ Cambridge graduates. Let $y_{i, j}=1$ if persons $i$ and $j$ are friends 3 years after graduation, and $y_{i, j}=0$ otherwise. Let $z_{i}$ be a categorical variable for person $i$'s college, taking values in the set $\{1,2, \ldots, C\}$. Consider logistic regression models,

$$\mathbb{P}\left(y_{i, j}=1\right)=\frac{e^{\theta_{i, j}}}{1+e^{\theta_{i, j}}}, \quad 1 \leqslant i<j \leqslant m$$

with parameters either

  1. $\theta_{i, j}=\beta_{z_{i}, z_{j}}$; or,

  2. $\theta_{i, j}=\beta_{z_{i}}+\beta_{z_{j}}$; or,

  3. $\theta_{i, j}=\beta_{z_{i}}+\beta_{z_{j}}+\beta_{0} \delta_{z_{i}, z_{j}}$, where $\delta_{z_{i}, z_{j}}=1$ if $z_{i}=z_{j}$ and 0 otherwise.
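As an illustration only, the three parametrisations can be sketched numerically. The sketch below assumes colleges are coded $0, \ldots, C-1$ rather than $1, \ldots, C$, that $\beta$ in model 1 is stored as a symmetric $C \times C$ array, and that the function names `theta` and `friend_prob` are hypothetical helpers, not part of the question.

```python
import numpy as np

def theta(z, model, beta, beta0=0.0):
    """Return the m x m array of theta_{i,j}; only entries with i < j are meaningful.

    Assumptions: z holds college codes 0..C-1; for model 1, beta is a
    symmetric C x C array; for models 2 and 3, beta is a length-C vector.
    """
    z = np.asarray(z)
    beta = np.asarray(beta, dtype=float)
    if model == 1:
        # One parameter per unordered pair of colleges.
        return beta[z[:, None], z[None, :]]
    # Additive college effects shared by models 2 and 3.
    additive = beta[z][:, None] + beta[z][None, :]
    if model == 2:
        return additive
    # Model 3: extra term beta0 when both persons share a college.
    same_college = (z[:, None] == z[None, :]).astype(float)
    return additive + beta0 * same_college

def friend_prob(theta_matrix):
    """Logistic link: P(y_{i,j} = 1) = e^theta / (1 + e^theta)."""
    return 1.0 / (1.0 + np.exp(-theta_matrix))

# Toy data: persons 0 and 1 share college 0, person 2 is in college 1.
z = [0, 0, 1]
beta_vec = [0.5, -1.0]
t2 = theta(z, 2, beta_vec)
t3 = theta(z, 3, beta_vec, beta0=2.0)
```

In this toy example the two models agree on pairs from different colleges and differ by `beta0` on pairs from the same college.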

(a) Write the likelihood of the models.

(b) Show that the three models are nested and specify the order. Suggest a statistic to compare models 1 and 3, give its definition and specify its asymptotic distribution under the null hypothesis, citing any necessary theorems.

(c) Suppose persons $i$ and $j$ are in the same college $k$; consider the number of friendships, $M_{i}$ and $M_{j}$, that each of them has with people in college $\ell \neq k$ ($\ell$ and $k$ fixed). In each of the models above, compare the distribution of these two random variables. Explain why this might lead to a poor quality of fit.

(d) Find a minimal sufficient statistic for model 3. [You may use the following characterisation of a minimal sufficient statistic: let $f(\beta ; y)$ be the likelihood in this model, where $\beta=\left(\beta_{k}\right)_{k=0,1, \ldots, C}$ and $y=\left(y_{i, j}\right)_{i, j=1, \ldots, m}$; suppose $T=t(y)$ is a statistic such that $f(\beta ; y) / f\left(\beta ; y^{\prime}\right)$ is constant in $\beta$ if and only if $t(y)=t\left(y^{\prime}\right)$; then, $T$ is a minimal sufficient statistic for $\beta$.]