Paper 4, Section I, 5I

Statistical Modelling
Part II, 2009

Sulphur dioxide is one of the major air pollutants. A dataset by Sokal and Rohlf (1981) was collected on 41 US cities/regions in 1969-1971. The annual measurements obtained for each region include (average) sulphur dioxide content, temperature, number of manufacturing enterprises employing more than 20 workers, population size in thousands, wind speed, precipitation, and the number of days with precipitation. The data are displayed in R as follows (abbreviated):

Describe the model being fitted by the following R commands.

> fit <- lm(log(so2) ~ temp + manuf + pop + wind + precip + days)
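
As a sketch of what this call specifies, the model can be written as below. It assumes the usual normal linear model that underlies the tests reported for lm fits. The index i for the city/region and the symbols beta, epsilon and sigma are notation introduced here, not taken from the question.

```latex
\[
\log(\mathrm{so2}_i) = \beta_0 + \beta_1\,\mathrm{temp}_i + \beta_2\,\mathrm{manuf}_i
  + \beta_3\,\mathrm{pop}_i + \beta_4\,\mathrm{wind}_i
  + \beta_5\,\mathrm{precip}_i + \beta_6\,\mathrm{days}_i + \varepsilon_i,
\]
\[
\varepsilon_i \stackrel{\mathrm{iid}}{\sim} N(0, \sigma^2), \qquad i = 1, \dots, 41.
\]
```

The response is on the log scale, so each coefficient describes an additive effect on log(so2), that is, a multiplicative effect on sulphur dioxide content itself.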

Explain the (slightly abbreviated) output below, describing in particular how the hypothesis tests are performed and your conclusions based on their results:

Based on the summary above, suggest an alternative model.

Finally, what is the value obtained by the following command?

> sqrt(sum(resid(fit)^2)/fit$df)
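
For orientation, the quantity computed by this command can be written as below. The sketch assumes n = 41 observations with no missing values and p = 7 fitted coefficients (the intercept plus six covariates). The symbols n, p, RSS and the residuals are notation introduced here. In R, `fit$df` resolves by partial matching to `fit$df.residual`, the residual degrees of freedom n - p.

```latex
\[
\hat{\sigma}
  = \sqrt{\frac{\sum_{i=1}^{n} \hat{\varepsilon}_i^{\,2}}{n - p}}
  = \sqrt{\frac{\mathrm{RSS}}{41 - 7}}
  = \sqrt{\frac{\mathrm{RSS}}{34}}
\]
```

This is the quantity that summary(fit) reports as the residual standard error, so under these assumptions its numerical value can be read from that line of the output.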