The body mass index (BMI) of your closest friend is a good predictor of your own BMI. A scientist applies polynomial regression to understand the relationship between these two variables among 200 students in a sixth form college. The R commands
> fit.1 <- lm(BMI ~ poly(friendBMI, 2, raw=T))
> fit.2 <- lm(BMI ~ poly(friendBMI, 3, raw=T))
fit the models $Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \varepsilon$ and $Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + \varepsilon$, respectively, with $\varepsilon \sim N(0, \sigma^2)$ in each case.
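For concreteness, here is a minimal sketch on simulated data (the seed, sample size, and coefficient values below are illustrative assumptions, not the scientist's measurements) showing that raw=T uses the raw powers of friendBMI as regressors:

> set.seed(1)                                    # illustrative data only
> friendBMI <- rnorm(200, mean=23, sd=3)
> BMI <- 20 + 0.1*friendBMI + 0.01*friendBMI^2 + rnorm(200)
> fit.1 <- lm(BMI ~ poly(friendBMI, 2, raw=T))
> # with raw=T the design matrix columns are X and X^2, so the coefficients
> # agree with fitting the powers explicitly:
> coef(fit.1)
> coef(lm(BMI ~ friendBMI + I(friendBMI^2)))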
Setting the argument raw to FALSE:
> fit.3 <- lm(BMI ~ poly(friendBMI, 2, raw=F))
> fit.4 <- lm(BMI ~ poly(friendBMI, 3, raw=F))
fits the models $Y = \beta_0 + \beta_1 P_1(X) + \beta_2 P_2(X) + \varepsilon$ and $Y = \beta_0 + \beta_1 P_1(X) + \beta_2 P_2(X) + \beta_3 P_3(X) + \varepsilon$, with $\varepsilon \sim N(0, \sigma^2)$. The function $P_i$ is a polynomial of degree $i$. Furthermore, the design matrix output by the function poly with raw=F satisfies:
> t(poly(friendBMI, 3, raw=F)) %*% poly(friendBMI, 3, raw=F)
              1             2             3
1  1.000000e+00  1.288032e-16  3.187554e-17
2  1.288032e-16  1.000000e+00 -6.201636e-17
3  3.187554e-17 -6.201636e-17  1.000000e+00
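The printed matrix is, up to rounding error, the 3x3 identity: with raw=F, poly returns columns that are orthonormal. A minimal sketch of the same check on simulated data (friendBMI below is a placeholder, not the original sample):

> set.seed(1)
> friendBMI <- rnorm(200, mean=23, sd=3)
> P <- poly(friendBMI, 3, raw=F)     # orthogonal polynomial basis (no intercept column)
> round(crossprod(P), 10)            # crossprod(P) = t(P) %*% P; essentially the identity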
How does the variance of $\hat\beta$ differ between the models fit.2 and fit.4? What about the variance of the fitted values $\hat Y = X\hat\beta$?
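One way to explore this numerically before answering is sketched below, again on simulated placeholder data (the conclusions should still be justified analytically):

> set.seed(1)
> friendBMI <- rnorm(200, mean=23, sd=3)
> BMI <- 20 + 0.1*friendBMI + 0.01*friendBMI^2 + rnorm(200)
> fit.2 <- lm(BMI ~ poly(friendBMI, 3, raw=T))
> fit.4 <- lm(BMI ~ poly(friendBMI, 3, raw=F))
> vcov(fit.2)                               # estimated covariance of beta-hat, raw basis
> vcov(fit.4)                               # estimated covariance of beta-hat, orthogonal basis
> max(abs(fitted(fit.2) - fitted(fit.4)))   # compare the fitted values of the two models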
Finally, consider the output of the commands
> anova(fit.1, fit.2)
> anova(fit.3, fit.4)
Define the test statistic computed by this function and specify its distribution. Which command yields a higher statistic?
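As a numerical cross-check, the sketch below (simulated placeholder data again) computes the standard F statistic for comparing two nested least-squares fits by hand and compares it with the value anova reports for lm objects:

> set.seed(1)
> friendBMI <- rnorm(200, mean=23, sd=3)
> BMI <- 20 + 0.1*friendBMI + 0.01*friendBMI^2 + rnorm(200)
> fit.1 <- lm(BMI ~ poly(friendBMI, 2, raw=T))
> fit.2 <- lm(BMI ~ poly(friendBMI, 3, raw=T))
> rss1 <- sum(resid(fit.1)^2); rss2 <- sum(resid(fit.2)^2)
> # F = ((RSS_small - RSS_big) / number of extra parameters) / (RSS_big / residual df of big model)
> Fstat <- ((rss1 - rss2) / 1) / (rss2 / fit.2$df.residual)
> Fstat
> anova(fit.1, fit.2)$F[2]    # should match the hand-computed value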