Paper 4, Section I, J

Statistical Modelling
Part II, 2020

Suppose you have a data frame with variables response, covar1, and covar2. You run the following commands in R.

covar2      0.3755     2.5978    0.145    0.886

...

(a) Consider the following three scenarios:

(i) All the output you have is the abbreviated output of summary(model) above.

(ii) You have the abbreviated output of summary(model) above together with

Residual standard error: 0.8097 on 47 degrees of freedom

Multiple R-squared: 0.8126, Adjusted R-squared: 0.8046

F-statistic: 101.9 on 2 and 47 DF, p-value: < 2.2e-16

(iii) You have the abbreviated output of summary(model) above together with

Residual standard error: 0.9184 on 47 degrees of freedom

Multiple R-squared: 0.000712, Adjusted R-squared: -0.04181

F-statistic: 0.01674 on 2 and 47 DF, p-value: 0.9834

What conclusion can you draw about which variables explain the response in each of the three scenarios? Explain.
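As a hedged consistency check on the figures in scenarios (ii) and (iii), the sketch below recomputes the adjusted R-squared and the overall F-statistic from the multiple R-squared alone, using adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p) and F = (R^2/(p - 1)) / ((1 - R^2)/(n - p)). It assumes that model contains an intercept plus covar1 and covar2 (p = 3 coefficients), so that 47 residual degrees of freedom imply n = 50. The helper names adj_r2 and f_stat are illustrative only.

```r
# Assumption: model has an intercept plus covar1 and covar2 (p = 3).
res_df <- 47            # residual degrees of freedom from the output
p      <- 3             # intercept + covar1 + covar2 (assumed)
n      <- res_df + p    # implied sample size

# Adjusted R^2 = 1 - (1 - R^2) (n - 1) / (n - p)
adj_r2 <- function(r2) 1 - (1 - r2) * (n - 1) / (n - p)

# Overall F-statistic for H0: both slope coefficients are zero,
# referred to an F distribution on (p - 1, n - p) degrees of freedom
f_stat <- function(r2) (r2 / (p - 1)) / ((1 - r2) / (n - p))

r2 <- c(ii = 0.8126, iii = 0.000712)

signif(adj_r2(r2), 4)                                   # compare with the printed adjusted R^2
signif(f_stat(r2), 4)                                   # compare with the printed F-statistics
signif(pf(f_stat(r2), p - 1, n - p, lower.tail = FALSE), 4)

# Residual and total sums of squares implied by each scenario:
# RSS = (residual standard error)^2 * residual df, TSS = RSS / (1 - R^2)
rse <- c(ii = 0.8097, iii = 0.9184)
rss <- rse^2 * res_df
tss <- rss / (1 - r2)
rbind(RSS = rss, TSS = tss)
```

Both scenarios' printed adjusted R-squared, F-statistic and (for (iii)) p-value are reproduced to the displayed precision under the assumed n = 50 and p = 3.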

(b) Assume now that you have the abbreviated output of summary(model) above together with

anova(lm(response ~ 1), lm(response ~ covar1), model)

What are the values of the entries with a question mark? [You may express your answers as arithmetic expressions if necessary].
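For part (b), a hedged sketch of how the covar2 row of the coefficient table connects to a sequential anova of nested fits. It assumes model is lm(response ~ covar1 + covar2) fitted to n = 50 observations. In R, anova() on a list of nested lm fits divides each row's mean square by the residual mean square of the largest model (here RSS/47), so adding the single term covar2 last gives F = t^2 on (1, 47) degrees of freedom, with the same p-value as the t-test. It uses base R's pt and pf only and does not reconstruct the anova table itself.

```r
# Assumption: model <- lm(response ~ covar1 + covar2), n = 47 + 3 = 50.
est    <- 0.3755   # covar2 estimate from summary(model)
se     <- 2.5978   # its standard error
res_df <- 47       # residual df of model

t_val <- est / se                          # t value = estimate / standard error
p_t   <- 2 * pt(-abs(t_val), df = res_df)  # two-sided t-test p-value

# Last row of anova(m1, m2, model): one extra parameter, denominator RSS/47,
# so the F-statistic is the square of the t value for covar2.
f_val <- t_val^2
p_f   <- pf(f_val, df1 = 1, df2 = res_df, lower.tail = FALSE)

c(t = t_val, F = f_val, p_t = p_t, p_F = p_f)

# Residual df of the three nested fits: n - 1, n - 2, n - 3
n <- res_df + 3
c(n - 1, n - 2, n - 3)
```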