Paper 1, Section II, H

Statistics
Part IB, 2013

Consider the general linear model $Y = X\theta + \epsilon$ where $X$ is a known $n \times p$ matrix, $\theta$ is an unknown $p \times 1$ vector of parameters, and $\epsilon$ is an $n \times 1$ vector of independent $N(0, \sigma^{2})$ random variables with unknown variance $\sigma^{2}$. Assume the $p \times p$ matrix $X^{T} X$ is invertible. Let

$$\begin{aligned} \hat{\theta} &= \left(X^{T} X\right)^{-1} X^{T} Y \\ \hat{\epsilon} &= Y - X \hat{\theta} \end{aligned}$$

What are the distributions of $\hat{\theta}$ and $\hat{\epsilon}$? Show that $\hat{\theta}$ and $\hat{\epsilon}$ are uncorrelated.
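
One possible route to the uncorrelatedness claim is sketched below. The projection matrix $P = X\left(X^{T} X\right)^{-1} X^{T}$ is notation introduced for this sketch and is not part of the question. It is symmetric and satisfies $X^{T} P = X^{T}$, and $\operatorname{Cov}(Y) = \sigma^{2} I$ by the model assumptions.

$$\begin{aligned} \hat{\epsilon} &= Y - X\hat{\theta} = (I - P)\,Y, \\ \operatorname{Cov}(\hat{\theta}, \hat{\epsilon}) &= \left(X^{T} X\right)^{-1} X^{T} \operatorname{Cov}(Y)\,(I - P)^{T} \\ &= \sigma^{2} \left(X^{T} X\right)^{-1} \left(X^{T} - X^{T} P\right) = 0 . \end{aligned}$$

The same linear-map argument gives $\hat{\theta} \sim N\left(\theta, \sigma^{2}\left(X^{T} X\right)^{-1}\right)$ and $\hat{\epsilon} \sim N\left(0, \sigma^{2}(I - P)\right)$, since both are linear functions of the normal vector $Y$.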

Four apple trees stand in a $2 \times 2$ rectangular grid. The annual yield of the tree at coordinate $(i, j)$ conforms to the model

$$y_{ij} = \alpha_{i} + \beta x_{ij} + \epsilon_{ij}, \quad i, j \in \{1, 2\},$$

where $x_{ij}$ is the amount of fertilizer applied to tree $(i, j)$, $\alpha_{1}, \alpha_{2}$ may differ because of varying soil across rows, and the $\epsilon_{ij}$ are $N(0, \sigma^{2})$ random variables that are independent of one another and from year to year. The following two possible experiments are to be compared:

$$\mathrm{I}: \left(x_{ij}\right) = \begin{pmatrix} 0 & 1 \\ 2 & 3 \end{pmatrix} \quad \text{and} \quad \mathrm{II}: \left(x_{ij}\right) = \begin{pmatrix} 0 & 2 \\ 3 & 1 \end{pmatrix}.$$

Represent these as general linear models, with $\theta = \left(\alpha_{1}, \alpha_{2}, \beta\right)$. Compare the variances of estimates of $\beta$ under I and II.
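
The comparison can be checked numerically. The sketch below assumes the observations are ordered (1,1), (1,2), (2,1), (2,2) and the columns of the design matrix correspond to alpha_1, alpha_2, beta. The helper names `solve`, `design`, `gram` and `beta_variance_factor` are made up for this illustration. It uses exact rational arithmetic from the standard library, so the factors multiplying sigma^2 come out as exact fractions.

```python
from fractions import Fraction


def solve(A, b):
    """Solve A v = b exactly by Gauss-Jordan elimination (A square, invertible)."""
    n = len(A)
    # Augmented matrix [A | b] with exact rational entries.
    M = [[Fraction(a) for a in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        # Swap in a row at or below the diagonal that has a nonzero pivot.
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        pivot = M[col][col]
        M[col] = [v / pivot for v in M[col]]
        # Clear this column in every other row.
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def design(x):
    """Design matrix for a 2x2 fertilizer layout x (assumed row ordering)."""
    (x11, x12), (x21, x22) = x
    return [[1, 0, x11], [1, 0, x12], [0, 1, x21], [0, 1, x22]]


def gram(X):
    """Return X^T X."""
    p = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]


def beta_variance_factor(x):
    """Return the (beta, beta) entry of (X^T X)^{-1}; Var(beta_hat) = sigma^2 * factor."""
    G = gram(design(x))
    # Solving G v = e_3 gives the third column of G^{-1}; its last entry is the factor.
    return solve(G, [0, 0, 1])[2]


factor_I = beta_variance_factor([[0, 1], [2, 3]])
factor_II = beta_variance_factor([[0, 2], [3, 1]])
print(factor_I, factor_II)  # prints: 1 1/4
```

Under these assumptions the variance of the estimate of beta is sigma^2 under I and sigma^2/4 under II, because II spreads the fertilizer levels further apart within each row.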

With II the following yields are observed:

$$\left(y_{ij}\right) = \begin{pmatrix} 100 & 300 \\ 600 & 400 \end{pmatrix}.$$

Forecast the total yield that will be obtained next year if no fertilizer is used. What is the $95\%$ predictive interval for this yield?
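
A rough numerical sketch of the forecast follows, under stated assumptions. "Total yield" is read as the sum over all four trees, so the target is 2 alpha_1 + 2 alpha_2 plus four fresh independent errors. The interval is taken to be the usual t-based predictive interval on n - p = 1 degree of freedom. The critical value 12.706 is the upper 2.5% point of t_1 from standard tables. The helper `solve` and all variable names are illustrative.

```python
from fractions import Fraction
from math import sqrt


def solve(A, b):
    """Solve A v = b exactly by Gauss-Jordan elimination (A square, invertible)."""
    n = len(A)
    M = [[Fraction(a) for a in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        # Swap in a row at or below the diagonal that has a nonzero pivot.
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        pivot = M[col][col]
        M[col] = [v / pivot for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


# Design II, rows ordered (1,1), (1,2), (2,1), (2,2); columns alpha_1, alpha_2, beta.
X = [[1, 0, 0], [1, 0, 2], [0, 1, 3], [0, 1, 1]]
y = [100, 300, 600, 400]
n, p = len(X), len(X[0])

G = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]  # X^T X
Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]          # X^T y
theta_hat = solve(G, Xty)                                                # least squares estimate

fitted = [sum(xi * t for xi, t in zip(r, theta_hat)) for r in X]
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
sigma2_hat = rss / (n - p)  # unbiased variance estimate on n - p degrees of freedom

# Total yield of four unfertilized trees: c^T theta with c = (2, 2, 0).
c = [2, 2, 0]
forecast = sum(ci * t for ci, t in zip(c, theta_hat))

# Predictive variance = sigma^2 * (4 new errors + c^T (X^T X)^{-1} c).
pred_factor = 4 + sum(ci * vi for ci, vi in zip(c, solve(G, c)))

T_CRIT = 12.706  # upper 2.5% point of the t distribution with 1 degree of freedom
half_width = T_CRIT * sqrt(sigma2_hat * pred_factor)

print("theta_hat:", [str(t) for t in theta_hat])
print("forecast:", forecast, "half-width:", half_width)
```

With the yields given, the fitted values reproduce the observations exactly, so the residual sum of squares is zero. Under this reading the estimated variance is zero and the interval collapses to the point forecast of 800.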