Paper 1, Section II, J

Statistical Modelling
Part II, 2020

We consider a subset of the data on car insurance claims from Hallin and Ingenbleek (1983). For each customer, the dataset includes total payments made per policy-year, the amount of kilometres driven, the bonus from not having made previous claims, and the brand of the car. The amount of kilometres driven is a factor taking values 1, 2, 3, 4, or 5, where a car in level $i+1$ has driven a larger number of kilometres than a car in level $i$ for any $i = 1, 2, 3, 4$. A statistician from an insurance company fits the following model on R.

> model1 <- lm(Paymentperpolicyyr ~ as.numeric(Kilometres) + Brand + Bonus)
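
For readers less familiar with R formula notation, the call above can be read as the following linear model. This is only a sketch: the symbols $k_i$, $b_i$, $\gamma_j$ and the normal-error assumption are notation introduced here, not part of the question. It assumes Brand is a factor with levels 1 to 9 and level 1 as baseline (R's default treatment contrasts, consistent with the coefficients Brand2, ..., Brand9 in the output of part (iii)), and that Bonus enters as a single numerical covariate.

```latex
% Y_i : Paymentperpolicyyr for observation i
% k_i : as.numeric(Kilometres), taking values in {1,...,5}
% b_i : Bonus, treated as numerical (assumption)
\[
  Y_i \;=\; \beta_0 \;+\; \beta_1 k_i
        \;+\; \sum_{j=2}^{9} \gamma_j \,\mathbf{1}\{\mathrm{Brand}_i = j\}
        \;+\; \beta_2 b_i \;+\; \varepsilon_i ,
  \qquad
  \varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2).
\]
```

Under this reading, Kilometres contributes a single slope $\beta_1$ rather than one coefficient per non-baseline level.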

(i) Why do you think the statistician transformed variable Kilometres from a factor to a numerical variable?

(ii) To check the quality of the model, the statistician applies a function to model1 which returns the following figure:

What does the plot represent? Does it suggest that model1 is a good model? Explain. If not, write down a model which the plot suggests could be better.

(iii) The statistician fits the model suggested by the graph and calls it model2. Consider the following abbreviated output:

> summary(model2)

...

Coefficients:

                       Estimate Std. Error t value Pr(>|t|)
(Intercept)            6.514035   0.186339  34.958   <2e-16 ***
as.numeric(Kilometres) 0.057132   0.032654   1.750  0.08126 .
Brand2                 0.363869   0.186857   1.947  0.05248 .
...
Brand9                 0.125446   0.186857   0.671  0.50254
Bonus

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.7817 on 284 degrees of freedom

...

Using the output, write down a 95% prediction interval for the ratio between the total payments per policy year for two cars of the same brand and with the same value of Bonus, one of which has a Kilometres value one higher than the other. You may express your answer as a function of quantiles of a common distribution, which you should specify.

(iv) Write down a generalised linear model for Paymentperpolicyyr which may be a better model than model1 and give two reasons. You must specify the link function.