Paper 1, Section II, J

We consider a subset of the data on car insurance claims from Hallin and Ingenbleek (1983). For each customer, the dataset includes total payments made per policy-year, the amount of kilometres driven, the bonus from not having made previous claims, and the brand of the car. The amount of kilometres driven is a factor taking values $1,2,3,4$ , or 5 , where a car in level $i+1$ has driven a larger number of kilometres than a car in level $i$ for any $i=1,2,3,4$ . A statistician from an insurance company fits the following model on $R$ .

$>$ model1 <- Im(Paymentperpolicyyr as numeric(Kilometres) $+$ Brand $+$ Bonus)

(i) Why do you think the statistician transformed variable Kilometres from a factor to a numerical variable?

(ii) To check the quality of the model, the statistician applies a function to model1 which returns the following figure:

What does the plot represent? Does it suggest that model1 is a good model? Explain. If not, write down a model which the plot suggests could be better.

(iii) The statistician fits the model suggested by the graph and calls it model2. Consider the following abbreviated output:

$>\operatorname{summary}(\operatorname{model2})$

$\cdots$

Coefficients:

$\begin{array}{lrrrr}\text { (Intercept) } & 6.514035 & 0.186339 & 34.958 & <2 \mathrm{e}-16 * * * \\ \text { as.numeric(Kilometres) } & 0.057132 & 0.032654 & 1.750 & 0.08126 . \\ \text { Brand2 } & 0.363869 & 0.186857 & 1.947 & 0.05248 .\end{array}$

Brand2

$\cdots$

Brand9

$\begin{array}{lllll}0.125446 & 0.186857 & 0.671 & 0.50254\end{array}$

Bonus

Signif. codes: 0 '' $0.001$ '' $0.01$ '' $0.05$ '.' $0.1$ ' 1

Residual standard error: $0.7817$ on 284 degrees of freedom ..

Using the output, write down a $95 \%$ prediction interval for the ratio between the total payments per policy year for two cars of the same brand and with the same value of Bonus, one of which has a Kilometres value one higher than the other. You may express your answer as a function of quantiles of a common distribution, which you should specify.

(iv) Write down a generalised linear model for Paymentperpolicyyr which may be a better model than model1 and give two reasons. You must specify the link function.