Paper 2, Section I, K

Statistical Modelling
Part II, 2012

The purpose of the following study is to investigate differences among certain treatments on the lifespan of male fruit flies, after allowing for the effect of the variable 'thorax length' (thorax) which is known to be positively correlated with lifespan. Data was collected on the following variables:

longevity lifespan in days

thorax (body) length in mm\mathrm{mm}

treat a five level factor representing the treatment groups. The levels were labelled as follows: "00", "10", "80", "11", "81".

No interactions were found between thorax length and the treatment factor. A linear model with thorax as the covariate, treat as a factor (having the above 5 levels) and longevity as the response was fitted and the following output was obtained. There were 25 males in each of the five groups, which were treated identically in the provision of fresh food.

Coefficients :

EstimateStd. Errort valuePr(>t) (Intercept) 49.9810.614.716.7e06 treat10 2.652.980.890.37 treat11 7.022.972.360.02 treat80 3.933.001.310.19 treat81 19.953.016.641.0e09 thorax 135.8212.4410.92<2e16\begin{array}{lrrrr} & \text{Estimate} & \text{Std. Error} & \text{t value} & \operatorname{Pr}(>|t|) \\ \text { (Intercept) } & -49.98 & 10.61 & -4.71 & 6.7 e-06 \\ \text { treat10 } & 2.65 & 2.98 & 0.89 & 0.37 \\ \text { treat11 } & -7.02 & 2.97 & -2.36 & 0.02 \\ \text { treat80 } & 3.93 & 3.00 & 1.31 & 0.19 \\ \text { treat81 } & -19.95 & 3.01 & -6.64 & 1.0 e-09 \\ \text { thorax } & 135.82 & 12.44 & 10.92 & <2 \mathrm{e}-16\end{array}

Residual standard error: 10.510.5 on 119 degrees of freedom

Multiple R-Squared: 0.6560.656, Adjusted R-squared: 0.6420.642

F-statistics: 45.545.5 on 5 and 119 degrees of freedom, p-value: 0

(a) Assuming the same treatment, how much longer would you expect a fly with a thorax length 0.1 mm0.1 \mathrm{~mm} greater than another to live?

(b) What is the predicted difference in longevity between a male fly receiving treatment treat 10 and treat81 assuming they have the same thorax length?

(c) Because the flies were randomly assigned to the five groups, the distribution of thorax lengths in the five groups are essentially equal. What disadvantage would the investigators have incurred by ignoring the thorax length in their analysis (i.e., had they done a one-way ANOVA instead)?

(d) The residual-fitted plot is shown in the left panel of Figure 1 overleaf. Is it possible to determine if the regular residuals or the studentized residuals have been used to construct this plot? Explain.

(e) The Box-Cox procedure was used to determine a good transformation for this data. The plot of the log-likelihood for λ\lambda is shown in the right panel of Figure 1 . What transformation should be used to improve the fit and yet retain some interpretability?

Figure 1: Residual-Fitted plot on the left and Box-Cox plot on the right