Paper 4, Section I, 5K

Statistical Modelling
Part II, 2014

Consider the normal linear model where the $n$-vector of responses $Y$ satisfies $Y = X\beta + \varepsilon$ with $\varepsilon \sim N_{n}(0, \sigma^{2} I)$ and $X$ is an $n \times p$ design matrix with full column rank. Write down a $(1-\alpha)$-level confidence set for $\beta$.

Define the Cook's distance for the observation $(Y_{i}, x_{i})$, where $x_{i}^{T}$ is the $i$th row of $X$, and give its interpretation in terms of confidence sets for $\beta$.
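As an illustration only, the following Python sketch computes Cook's distances by brute-force leave-one-out refitting. It assumes the common definition $D_i = \|X(\hat\beta_{(-i)} - \hat\beta)\|^2 / (p\,\tilde\sigma^2)$ with $\tilde\sigma^2 = \|Y - X\hat\beta\|^2/(n-p)$, where $\hat\beta_{(-i)}$ is the least squares estimate with observation $i$ deleted. The function name `cooks_distances`, the simulated data and the coefficient values are hypothetical and not part of the question.

```python
import numpy as np

def cooks_distances(X, Y):
    """Cook's distance for every observation, by refitting without it.

    Assumed definition: D_i = ||X (beta_(-i) - beta_hat)||^2 / (p * sigma2_tilde),
    where sigma2_tilde = RSS / (n - p) comes from the full fit.
    """
    n, p = X.shape
    beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta_hat
    sigma2_tilde = resid @ resid / (n - p)

    D = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # boolean mask dropping observation i
        beta_minus_i, *_ = np.linalg.lstsq(X[keep], Y[keep], rcond=None)
        diff = X @ (beta_minus_i - beta_hat)  # change in all n fitted values
        D[i] = diff @ diff / (p * sigma2_tilde)
    return D

# Simulated data with n = 100 and p = 4 (illustrative values only).
rng = np.random.default_rng(0)
n, p = 100, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(size=n)
D = cooks_distances(X, Y)
```

Refitting $n$ times is wasteful but mirrors the definition directly. In practice the same values can be obtained from the residuals and leverages of the full fit.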

In the model above with $n=100$ and $p=4$, you observe that one observation has Cook's distance 3.1. Would you be concerned about the influence of this observation? Justify your answer.

[Hint: You may find some of the following facts useful:

  1. If $Z \sim \chi_{4}^{2}$, then $\mathbb{P}(Z \leqslant 1.06) = 0.1$, $\mathbb{P}(Z \leqslant 7.78) = 0.9$.

  2. If $Z \sim F_{4,96}$, then $\mathbb{P}(Z \leqslant 0.26) = 0.1$, $\mathbb{P}(Z \leqslant 2.00) = 0.9$.

  3. If $Z \sim F_{96,4}$, then $\mathbb{P}(Z \leqslant 0.50) = 0.1$, $\mathbb{P}(Z \leqslant 3.78) = 0.9$.]
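
One way the hinted quantiles might be brought to bear is sketched below. It assumes the standard $F$-based confidence set and the usual definition of Cook's distance, with $\hat\beta$ the least squares estimator, $\hat\beta_{(-i)}$ the estimator with observation $i$ deleted, and $F_{p,n-p}(\alpha)$ the upper $\alpha$-point of the $F_{p,n-p}$ distribution. This notation is an assumption, not taken from the question.

```latex
\[
  C_\alpha = \Bigl\{ \beta :
    \frac{\|X(\hat\beta - \beta)\|^2}{p\,\tilde\sigma^2} \le F_{p,n-p}(\alpha) \Bigr\},
  \qquad
  \tilde\sigma^2 = \frac{\|Y - X\hat\beta\|^2}{n-p},
\]
\[
  D_i = \frac{\|X(\hat\beta_{(-i)} - \hat\beta)\|^2}{p\,\tilde\sigma^2}
      = 3.1 > 2.00 = F_{4,96}(0.1)
  \;\Longrightarrow\;
  \hat\beta_{(-i)} \notin C_{0.1}.
\]
```

Under these assumptions only the second hint is relevant: deleting the observation would move the estimate outside the 90% confidence set for $\beta$, which suggests the observation is influential enough to warrant concern.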