Paper 4, Section II, J
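A sociologist collects a dataset on friendships among $m$ Cambridge graduates. Let $y_{i,j} = 1$ if persons $i$ and $j$ are friends 3 years after graduation, and $y_{i,j} = 0$ otherwise. Let $z_i$ be a categorical variable for person $i$'s college, taking values in the set $\{1, 2, \ldots, C\}$. Consider logistic regression models,
$$\mathbb{P}(y_{i,j} = 1) = \frac{e^{\theta_{i,j}}}{1 + e^{\theta_{i,j}}}, \quad 1 \le i < j \le m,$$
with parameters either
1. $\theta_{i,j} = \beta_{z_i, z_j}$; or,
2. $\theta_{i,j} = \beta_{z_i} + \beta_{z_j}$; or,
3. $\theta_{i,j} = \beta_{z_i} + \beta_{z_j} + \beta_0 \delta_{z_i, z_j}$, where $\delta_{z_i, z_j} = 1$ if $z_i = z_j$ and 0 otherwise.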
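As an informal illustration of the three parameterisations, the following Python sketch computes the linear predictor and the friendship probability for a pair of people from their college labels. It assumes model 1 uses one parameter per unordered pair of colleges, model 2 an additive college effect, and model 3 the additive effect plus a single same-college term stored under key 0. The function names `theta` and `friend_prob` and the dictionary layout of `beta` are hypothetical choices made for this sketch only.

```python
import math


def theta(model, zi, zj, beta):
    """Linear predictor for a pair of people in colleges zi and zj.

    The layout of `beta` is an assumption of this sketch:
      model 1: dict keyed by unordered college pair (a, b) with a <= b
      model 2: dict keyed by college label
      model 3: dict keyed by college label, plus key 0 for the same-college term
    """
    if model == 1:
        return beta[(min(zi, zj), max(zi, zj))]
    if model == 2:
        return beta[zi] + beta[zj]
    if model == 3:
        same_college = beta[0] if zi == zj else 0.0
        return beta[zi] + beta[zj] + same_college
    raise ValueError("model must be 1, 2 or 3")


def friend_prob(model, zi, zj, beta):
    """Logistic link: e^theta / (1 + e^theta), written as 1 / (1 + e^-theta)."""
    t = theta(model, zi, zj, beta)
    return 1.0 / (1.0 + math.exp(-t))
```

For example, `friend_prob(3, 1, 1, {0: 0.5, 1: -1.0, 2: 0.2})` evaluates the same-college case of model 3 for two members of college 1. In every model the probability depends on the two people only through their college labels.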
(a) Write the likelihood of the models.
(b) Show that the three models are nested and specify the order. Suggest a statistic to compare models 1 and 3, give its definition and specify its asymptotic distribution under the null hypothesis, citing any necessary theorems.
(c) Suppose persons $i$ and $j$ are in the same college $k$; consider the number of friendships, $M_i$ and $M_j$, that each of them has with people in college $\ell \neq k$ ($\ell$ and $k$ fixed). In each of the models above, compare the distribution of these two random variables. Explain why this might lead to a poor quality of fit.
(d) Find a minimal sufficient statistic for $\beta$ in model 3. [You may use the following characterisation of a minimal sufficient statistic: let $f(\beta; y)$ be the likelihood in this model, where $\beta = (\beta_k)_{k=0,1,\ldots,C}$, and suppose $T$ is a statistic such that $f(\beta; y)/f(\beta; y')$ is constant in $\beta$ if and only if $T(y) = T(y')$; then, $T$ is a minimal sufficient statistic for $\beta$.]