A4.14

Computational Statistics and Statistical Modelling
Part II, 2002

Assume that the $n$-dimensional observation vector $Y$ may be written as $Y = X\beta + \epsilon$, where $X$ is a given $n \times p$ matrix of rank $p$, $\beta$ is an unknown vector with $\beta^{T}=\left(\beta_{1}, \ldots, \beta_{p}\right)$, and

$$\epsilon \sim N_{n}\left(0, \sigma^{2} I\right) \qquad (*)$$

where $\sigma^{2}$ is unknown. Find $\hat{\beta}$, the least-squares estimator of $\beta$, and describe (without proof) how you would test

$$H_{0}: \beta_{\nu}=0$$

for a given $\nu$.
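By way of illustration (an editorial sketch, not part of the question), the standard least-squares solution $\hat{\beta} = (X^{T}X)^{-1}X^{T}Y$ and the $t$-statistic for testing a single $\beta_{\nu}$ can be computed numerically; the data below are simulated purely for the example, and the dimensions are chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 41, 3                          # arbitrary sample size and number of columns
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([1.0, 2.0, 0.0])
Y = X @ beta_true + rng.normal(scale=0.5, size=n)

# Least-squares estimator: beta_hat = (X'X)^{-1} X'Y
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y

# Unbiased estimate of sigma^2 from the residual sum of squares
resid = Y - X @ beta_hat
sigma2_hat = resid @ resid / (n - p)

# t-statistic for H0: beta_nu = 0; under H0 it follows a t distribution
# on n - p degrees of freedom
nu = 2
t_stat = beta_hat[nu] / np.sqrt(sigma2_hat * XtX_inv[nu, nu])
```

Under $(*)$, $\hat{\beta}$ and the residual sum of squares are independent, which is what justifies referring `t_stat` to the $t_{n-p}$ distribution.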

Indicate briefly two plots that you could use as a check of the assumption $(*)$.
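For instance (an illustrative sketch, not part of the question), the two usual diagnostics can be computed directly: residuals against fitted values, and ordered residuals against standard normal quantiles (a normal Q-Q plot). The data here are again simulated:

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(1)
n = 41
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
fitted = X @ beta_hat
resid = Y - fitted

# Plot 1: residuals against fitted values (no systematic pattern expected under (*))
plot1 = list(zip(fitted, resid))

# Plot 2: normal Q-Q plot -- ordered residuals against standard normal quantiles
qq_x = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
plot2 = list(zip(qq_x, sorted(resid)))
```

Under $(*)$ the first plot should show an unstructured horizontal band, and the second should be close to a straight line.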

Sulphur dioxide is one of the major air pollutants. A data-set presented by Sokal and Rohlf (1981) was collected on 41 US cities in 1969-71, corresponding to the following variables:

$Y$ = sulphur dioxide content of air in micrograms per cubic metre

$X1$ = average annual temperature in degrees Fahrenheit

$X2$ = number of manufacturing enterprises employing 20 or more workers

$X3$ = population size (1970 census) in thousands

$X4$ = average annual wind speed in miles per hour

$X5$ = average annual precipitation in inches

$X6$ = average number of days with precipitation per year

Interpret the R output that follows below, quoting any standard theorems that you need to use.

> next.lm <- lm(log(Y) ~ X1 + X2 + X3 + X4 + X5 + X6)
> summary(next.lm)

Call:
lm(formula = log(Y) ~ X1 + X2 + X3 + X4 + X5 + X6)

Residuals:
     Min       1Q   Median       3Q      Max
-0.79548 -0.25538 -0.01968  0.28328  0.98029

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  7.2532456  1.4483686   5.008 1.68e-05 ***
X1          -0.0599017  0.0190138  -3.150  0.00339 **
X2           0.0012639  0.0004820   2.622  0.01298 *
X3          -0.0007077  0.0004632  -1.528  0.13580
X4          -0.1697171  0.0555563  -3.055  0.00436 **
X5           0.0173723  0.0111036   1.565  0.12695
X6           0.0004347  0.0049591   0.088  0.93066
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.448 on 34 degrees of freedom
Multiple R-Squared: 0.6541
F-statistic: 10.72 on 6 and 34 degrees of freedom, p-value: 1.126e-06
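As a quick consistency check (an editorial addition, not part of the exam output), the quoted F-statistic can be recovered from the quoted $R^2$ via the standard identity $F = \dfrac{R^2/q}{(1-R^2)/(n-q-1)}$, with $q = 6$ regressors and $n - q - 1 = 41 - 6 - 1 = 34$ residual degrees of freedom:

```python
# Recover the F-statistic from R^2 using the figures quoted in the R output
r_squared = 0.6541
q, df_resid = 6, 34                   # regressors and residual degrees of freedom

f_stat = (r_squared / q) / ((1 - r_squared) / df_resid)
# Agrees with the reported value of 10.72 to two decimal places
```

Under $H_0: \beta_1 = \cdots = \beta_6 = 0$ (all slopes zero), this statistic has an $F_{6,34}$ distribution, which is the theorem needed to interpret the quoted p-value of 1.126e-06.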