A4.14

Computational Statistics and Statistical Modelling
Part II, 2002

Assume that the $n$-dimensional observation vector $Y$ may be written as $Y = X\beta + \epsilon$, where $X$ is a given $n \times p$ matrix of rank $p$, $\beta$ is an unknown vector with $\beta^{T}=\left(\beta_{1}, \ldots, \beta_{p}\right)$, and

$$\epsilon \sim N_{n}\left(0, \sigma^{2} I\right) \qquad (*)$$

where $\sigma^{2}$ is unknown. Find $\hat{\beta}$, the least-squares estimator of $\beta$, and describe (without proof) how you would test

$$H_{0}: \beta_{\nu}=0$$

for a given $\nu$.
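By way of illustration (an editorial sketch, not part of the question), the standard least-squares solution $\hat{\beta} = (X^{T}X)^{-1}X^{T}Y$ and the $t$-statistic for testing a single $\beta_{\nu}$ can be computed numerically; the data below are simulated purely for the example, and the dimensions are chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 41, 3                          # arbitrary sample size and number of columns
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([1.0, 2.0, 0.0])
Y = X @ beta_true + rng.normal(scale=0.5, size=n)

# Least-squares estimator: beta_hat = (X'X)^{-1} X'Y
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y

# Unbiased estimate of sigma^2 from the residual sum of squares
resid = Y - X @ beta_hat
sigma2_hat = resid @ resid / (n - p)

# t-statistic for H0: beta_nu = 0; under H0 it follows a t distribution
# on n - p degrees of freedom
nu = 2
t_stat = beta_hat[nu] / np.sqrt(sigma2_hat * XtX_inv[nu, nu])
```

Under $(*)$, $\hat{\beta}$ and the residual sum of squares are independent, which is what justifies referring `t_stat` to the $t_{n-p}$ distribution.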

Indicate briefly two plots that you could use as a check of the assumption $(*)$.
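For instance (an illustrative sketch, not part of the question), the two usual diagnostics can be computed directly: residuals against fitted values, and ordered residuals against standard normal quantiles (a normal Q-Q plot). The data here are again simulated:

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(1)
n = 41
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
fitted = X @ beta_hat
resid = Y - fitted

# Plot 1: residuals against fitted values (no systematic pattern expected under (*))
plot1 = list(zip(fitted, resid))

# Plot 2: normal Q-Q plot -- ordered residuals against standard normal quantiles
qq_x = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
plot2 = list(zip(qq_x, sorted(resid)))
```

Under $(*)$ the first plot should show an unstructured horizontal band, and the second should be close to a straight line.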

Sulphur dioxide is one of the major air pollutants. A data-set presented by Sokal and Rohlf (1981) was collected on 41 US cities in 1969-71, corresponding to the following variables:

$Y$ = sulphur dioxide content of air in micrograms per cubic metre

$X1$ = average annual temperature in degrees Fahrenheit

$X2$ = number of manufacturing enterprises employing 20 or more workers

$X3$ = population size (1970 census) in thousands

$X4$ = average annual wind speed in miles per hour

$X5$ = average annual precipitation in inches

$X6$ = average number of days with precipitation per year

Interpret the R output that follows below, quoting any standard theorems that you need to use.

> next.lm <- lm(log(Y) ~ X1 + X2 + X3 + X4 + X5 + X6)
> summary(next.lm)

Call:
lm(formula = log(Y) ~ X1 + X2 + X3 + X4 + X5 + X6)

Residuals:
     Min       1Q   Median       3Q      Max
-0.79548 -0.25538 -0.01968  0.28328  0.98029

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  7.2532456  1.4483686   5.008 1.68e-05 ***
X1          -0.0599017  0.0190138  -3.150  0.00339 **
X2           0.0012639  0.0004820   2.622  0.01298 *
X3          -0.0007077  0.0004632  -1.528  0.13580
X4          -0.1697171  0.0555563  -3.055  0.00436 **
X5           0.0173723  0.0111036   1.565  0.12695
X6           0.0004347  0.0049591   0.088  0.93066
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.448 on 34 degrees of freedom
Multiple R-Squared: 0.6541
F-statistic: 10.72 on 6 and 34 degrees of freedom, p-value: 1.126e-06
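As a quick consistency check (an editorial addition, not part of the exam output), the quoted F-statistic can be recovered from the quoted $R^2$ via the standard identity $F = \dfrac{R^2/q}{(1-R^2)/(n-q-1)}$, with $q = 6$ regressors and $n - q - 1 = 41 - 6 - 1 = 34$ residual degrees of freedom:

```python
# Recover the F-statistic from R^2 using the figures quoted in the R output
r_squared = 0.6541
q, df_resid = 6, 34                   # regressors and residual degrees of freedom

f_stat = (r_squared / q) / ((1 - r_squared) / df_resid)
# Agrees with the reported value of 10.72 to two decimal places
```

Under $H_0: \beta_1 = \cdots = \beta_6 = 0$ (all slopes zero), this statistic has an $F_{6,34}$ distribution, which is the theorem needed to interpret the quoted p-value of 1.126e-06.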